A short while ago we wrote about why it is a good idea to get friendly with AI before it arrives in your everyday work environment. We introduced the text-to-art algorithm Midjourney as an easy and entertaining partner for this venture.
But how do these algorithms work? How can they translate our words (Superman and Wonder Woman in Starbucks) into (more or less) fitting images? And how is it possible that while these models get better and better, their IQ remains zero?
Welcome to the wonderful world of Generative Adversarial Networks, or GANs for short. These are a type of A.I. (a pair of neural networks, to be precise) that can generate data from scratch. And although the topic is quite complicated, the basic principle of how they learn is easy to understand.
The famous case of the forger and the art detective
Imagine a Talented Youngster who decides to make his fortune by forging Picassos. He has a severe disadvantage though: although he has read about the painter, he has never seen any of his paintings. On the upside, he has an excellent style guide that describes the important characteristics of the painter’s works, like the colours he often used, his typical brushstroke techniques and the typical topics of his paintings. And he has a superpower: he paints with incredible speed, generating tons of new pictures every day.
Then there is the Art Detective, who knows a good deal about Picasso’s paintings, as he examined all of them in great detail during his training.
So how does our young forger get to the point when he can produce a painting to be auctioned as a genuine Picasso? He goes for trial and error.
He starts painting completely random things using the info from his Picasso style guide, and submits all his work to the art detective, asking for a judgement on whether the painting is a genuine Picasso. He carefully watches the reaction. Most of the time the detective won’t even give the painting a second look before rejecting it.
When this happens, our forger gets the feedback: what he painted was not plausible.
But every now and then the detective takes a few seconds before saying no, perhaps even taking out his magnifying glass to examine some details. And our painter watches and learns, generating better and better images in the style of Picasso.

Through this constant cycle of trial and feedback, the painter improves, and eventually reaches the point where he can create a plausible painting in the style of Picasso.
Both the painter and the detective need to learn during this process: as the forged Picassos get more convincing, the detective needs to improve to catch them. The better the detective gets, the more the painter needs to improve to trick him. And so on, and so on.
The generator and the discriminator
A GAN contains two neural networks, called the generator and the discriminator. The generator is our painter, creating outputs with the goal of fooling the discriminator.
The discriminator is our detective. It is trained on real data, and its job is to judge whether the generator’s output could pass for the real thing.
They play a zero-sum game: if the generator fools the discriminator, the generator wins; if the discriminator catches the output as a fake, the discriminator wins. They teach each other in a process called adversarial training, a form of unsupervised learning.
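To make this back-and-forth concrete, here is a minimal sketch of the game in PyTorch. The network sizes, the learning rate and the choice of flattened grayscale images are illustrative assumptions for the example, not the setup of Midjourney or of any production system.

```python
# Minimal GAN sketch: a "forger" (generator) and an "art detective"
# (discriminator) trained against each other. Illustrative sizes only.
import torch
import torch.nn as nn

latent_dim = 64   # size of the random "inspiration" vector fed to the forger
data_dim = 784    # e.g. a flattened 28x28 grayscale image

# The forger: turns random noise into a candidate image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# The detective: outputs the probability that its input is a real image.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def training_step(real_batch: torch.Tensor) -> None:
    """One round of the zero-sum game, given a batch of real images."""
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1) Train the detective: accept real paintings, reject the forger's fakes.
    fakes = generator(torch.randn(batch_size, latent_dim)).detach()
    d_loss = bce(discriminator(real_batch), real_labels) + \
             bce(discriminator(fakes), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the forger: it "wins" when the detective labels its fakes as real.
    g_loss = bce(discriminator(generator(torch.randn(batch_size, latent_dim))),
                 real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Usage: call training_step(batch) for every batch of real images,
# e.g. flattened MNIST digits scaled to [-1, 1] to match the Tanh output.
```

Each call improves the detective on the forger’s latest fakes and then improves the forger against the freshly updated detective, which is exactly the escalating cycle described in the analogy above.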
Why is their IQ still zero?
As the above images – created by Midjourney again – suggest, these algorithms have advanced a lot. The first take on a woman in the style of Picasso is a pretty good one. So how come they are still said to have no intelligence in the sense we usually mean by the word?
Playing with the algorithm helps you understand this. Midjourney has indeed learned a lot about the various parameters of images flagged as “woman”. However, this learning is distinctly different from the mental construct we humans have when thinking about a woman.
From Midjourney’s point of view, a woman is a set of pixels distributed in a form that passes the discriminator’s judgement. Where we see long hair or boobs, Midjourney sees pixels arranged in a format that passes the test.
The more you experiment with it, the clearer it becomes that although it is able to make a picture of a woman, it has absolutely no idea what a woman is. It can paint a picture of a father and a son but has no idea what being a father and/or a son means, or what the relationship between them is.
This lack of conceptual understanding is pretty easy to catch in the Superman and Wonder Woman image above. The algorithm was able to identify visual clues of both characters but mixed and matched them in a random pattern. Midjourney kind of understands that these are important details, but has no grasp of who Superman or Wonder Woman is, or what they should be doing in Starbucks. Although the coffee part is ok.
Experiencing this gap between being able to picture a woman and understanding what a woman is will help you grasp why you shouldn’t worry that A.I. will take your job. (Not yet, anyway. Properly functioning artificial general intelligence and artificial superintelligence will change this forever, and then we might find ourselves in a scary movie. We will let you know in time.)
So why is it good for healthcare?
GANs, despite their zero IQ, can serve medicine well. Here are two examples of where they could deliver in the near future.
1. They can create endless amounts of synthetic patient data and datasets for other algorithms
Based on the above-discussed principles, these algorithms can be trained to generate “patient data” that passes the discriminator’s judgement. Trained on huge real datasets, they can create synthetic patient data that resembles the real thing in all important characteristics but carries no privacy issues. (We wrote about the potential, the limitations and possible faults in this article.) Synthetic patient data can be used for a number of things, including clinical trials and medical education.
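As a rough illustration of what sampling synthetic patients could look like, here is a tiny Python sketch. The column names, network sizes and untrained weights are made-up assumptions for the example; a real pipeline would first train the generator adversarially on de-identified records and handle data types, scaling and validation with far more care.

```python
# Hypothetical example: drawing synthetic patient records from a GAN generator.
# The generator here is untrained, so its output is structurally valid but
# medically meaningless; it only shows the sampling mechanics.
import torch
import torch.nn as nn
import pandas as pd

latent_dim = 32
columns = ["age", "systolic_bp", "cholesterol", "bmi"]  # hypothetical features

# Stand-in for a generator that has already been through adversarial training.
generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, len(columns)),
)

def sample_synthetic_patients(n: int) -> pd.DataFrame:
    """Draw n synthetic patient records from the generator."""
    with torch.no_grad():
        noise = torch.randn(n, latent_dim)
        records = generator(noise).numpy()
    return pd.DataFrame(records, columns=columns)

# Five rows that look like patient data but are tied to no real person.
print(sample_synthetic_patients(5))
```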
Radiology is probably THE medical specialty that benefits the most from A.I. An algorithm capable of going through thousands of images a day and flagging the ones with suspected tumours is a huge help. But to have an algorithm like this, someone has to teach it, which requires radiologists to mark each and every tumour pixel in each and every image of the training dataset, so the A.I. can learn what to look for. This represents a very real barrier to creating sufficiently large samples of data.
GANs can bridge this gap: they can be trained to create “valid” medical images with the tumours already marked, so that other algorithms have enough data to learn from.
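To show what “images with the tumours already marked” could mean in code, here is a hedged sketch of a generator with two output heads, one for the scan and one for its tumour mask. The architecture and sizes are assumptions made up for the illustration, not a published medical-imaging GAN; the matching discriminator would judge the image and the mask as a pair, so only plausible scans with plausible annotations pass the test.

```python
# Sketch of a generator that emits a synthetic scan together with its tumour
# mask, so downstream models receive pre-labelled training data.
import torch
import torch.nn as nn

class ImageAndMaskGenerator(nn.Module):
    def __init__(self, latent_dim: int = 128, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        self.backbone = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
        )
        # Two heads share one backbone: the scan itself and its annotation.
        self.image_head = nn.Linear(1024, image_size * image_size)
        self.mask_head = nn.Linear(1024, image_size * image_size)

    def forward(self, noise: torch.Tensor):
        features = self.backbone(noise)
        image = torch.tanh(self.image_head(features))    # grayscale scan in [-1, 1]
        mask = torch.sigmoid(self.mask_head(features))   # per-pixel tumour probability
        s = self.image_size
        return image.view(-1, 1, s, s), mask.view(-1, 1, s, s)

generator = ImageAndMaskGenerator()
scan, tumour_mask = generator(torch.randn(4, 128))
print(scan.shape, tumour_mask.shape)  # both torch.Size([4, 1, 64, 64])
```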
2. GANs can create a virtual workforce for healthcare (too)
This is an uncanny example. Synthesia generates training and sales videos featuring photorealistic, synthetic talking heads that read personalised scripts in any of 34 languages, Wired reported a while ago.
Synthetic media lets us create pictures from text prompts, produce realistic talking-head videos just by typing in a script, and get high-quality text by simply telling the A.I. what we want written and how. From here, it is only one more step to use these networks to generate new colleagues who are ready to take over some tasks.
These virtual employees are outfitted with programmed personalities and generated smiles, and it is getting harder to tell them apart from real people. Don’t be surprised if you soon start seeing them in the administration and billing departments of hospitals, or as the representatives talking to you from your health insurance company.