With the public release of Google Cloud's Imagen 2 model earlier this month, touted as GCP's "most advanced text-to-image technology", how does this new offering compare with OpenAI's existing DALL-E 3 model? To compare these two state-of-the-art generative AI image creation models, we'll use our standard national branding prompt asking for inspirational imagery of a series of countries that emphasizes their respective national symbolism and history.
Overall, the results can be summarized as follows:
- Quality. DALL-E's images are far more suitable for the kind of open-ended national branding task examined here. They are immediately recognizable, rich in symbolism and history, highly inspirational and visually arresting. Imagen 2's images range from abstract and unrecognizable to offensive. Only a small number feature high-quality imagery, and even those are not immediately discernible as inspirational images that capture the essence of the given nation. While Imagen 2 produces rich, crisp, photorealistic images in many cases, they fail to address the actual prompt of an inspirational image that represents a nation. DALL-E simply creates rich symbolic and inspirational images that present a range of national symbols in a single ready-to-use image.
- Abstract Vs Understandable. Imagen 2 tends strongly towards abstract imagery that relies on a single overt symbol (a flag, a map outline, national colors) as the sole tenuous link to the requested country: one country's sand dune could be swapped for another's were it not for the presence of a small flag. Worse, Imagen 2's Somali imagery features essentially no national symbolism, with a single star in a few images and nothing in the others (there is blue in some images and no blue in others). In contrast, DALL-E's imagery is instantly recognizable through a vast array of rich national symbolism diffused throughout the image.
- Person-Centric Vs Landscape-Centric. One of the starkest differences between the two models is that, when asked to produce inspirational imagery about a country, Imagen 2 produces images that center on human beings, while DALL-E produces images that center on the country's landmarks and landscapes. DALL-E's imagery is immediately understandable in terms of its symbolism and narrative. Imagen 2's focus on portraits of single individuals, on the other hand, raises the question of why the model believes one person can capture the essence, symbolism and history of an entire nation in an "inspirational" image. Worse, the focus on individuals creates far greater room for stereotypical tropes, misrepresentation of cultural symbols and extremely offensive presentations, all of which DALL-E's choice of symbolic landscapes avoids.
- Bias & Cultural Failure. While DALL-E makes mistakes, its errors are more subtle and more obviously AI-related, taking the form of bizarre artifacts, whereas Imagen 2 produces starkly and overtly offensive imagery. A woman in a hijab cap wearing a deformed Turkish flag as a hijab that doubles as a cape, an older suited man draping the flag like a hijab over his head, a starving child, comical parodies of deeply revered national and cultural symbols: how do these images present their respective nations in an "inspirational" light? While DALL-E consistently presents relatively accurate depictions of traditional dress, Imagen 2 dresses Estonians in Ukrainian flag colors and Ukrainians in the colors of their flag rather than the more typical vyshyvanka colors. Interestingly, a number of the Imagen 2 experiments below yielded errors stating that additional generated images were not being returned because they "violated Google's Responsible AI practices".
- Creativity & Coherence. It is immediately clear that for every prompt tested below, DALL-E's imagery is superior both in creatively representing the given country and its symbols and in adhering to the prompt (coherence). How does an older man in a suit wearing the Turkish flag draped like a hijab over his head portray an "inspirational image of Turkey", or an image of a starving child looking forlornly into the camera present an "inspiring" image of Somalia? How does a sand dune capture the essence of Saudi Arabia, or a tree that of Estonia?
- Rich Descriptions Vs Keywords. DALL-E performs equally well when provided a comma-separated list of isolated keywords and snippets or a rich plain English description of the desired image (though in two of the images it misunderstood "Turkey" in the list of keywords as the bird rather than the country). DALL-E's support for rich paragraph-length descriptions makes it possible to give the model expansive context and disambiguation, describing the desired image in sweeping terms like "a photograph" rather than through a taxonomy of keywords like "HDR, 4K, DSLR, cinematic". In contrast, Imagen 2 performs best when provided only a list of short keywords – it defaults to a painterly style and produces bizarre abstractions when given plain English descriptions.
- Style Tuning. While Imagen 2 is officially able to produce a wide range of artistic styles, in the experiments below it failed to produce any of the requested styles. In contrast, DALL-E can replicate nearly any imaginable artistic style or theme simply by specifying that theme in the prompt.
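The description-versus-keyword comparison summarized above rests on a small family of prompts in which only the country name and a handful of trailing modifiers change. As a rough illustration of how those variants relate to one another, the sketch below assembles them from a template. The helper names (`description_prompt`, `keyword_prompt`, `prompt_variants`) are hypothetical and not part of any model's API; only the prompt wording is taken from the experiments listed below, and no image-generation service is called.

```python
# Minimal sketch: build the prompt variants used in this comparison.
# All function and constant names here are illustrative assumptions;
# only the prompt wording comes from the experiments in this post.

DESCRIPTION_TEMPLATE = (
    "Create an inspirational image of {country} filled with the nation's "
    "symbolism, history and imagery for a branding campaign that would appear "
    "on a magazine cover. It should reflect the nation's essence and history "
    "and symbolism and be inspirational."
)

# Keywords shared by every keyword-style prompt.
BASE_KEYWORDS = ["inspirational", "history", "symbolism", "magazine cover"]

# Extra modifiers used to push the output towards a photographic look.
PHOTO_KEYWORDS = ["professional", "photograph", "HDR", "4K", "cinematic"]

COUNTRIES = ["Turkey", "Estonia", "Ukraine", "Saudi Arabia", "Somalia"]


def description_prompt(country: str) -> str:
    """Plain English, paragraph-length prompt."""
    return DESCRIPTION_TEMPLATE.format(country=country)


def keyword_prompt(country: str, extra=()) -> str:
    """Comma-separated keyword prompt, optionally with extra modifiers."""
    return ", ".join([country, *BASE_KEYWORDS, *extra])


def prompt_variants(country: str) -> dict:
    """The three prompt styles tested for every country."""
    return {
        "description": description_prompt(country),
        "keywords": keyword_prompt(country),
        "keywords_photo": keyword_prompt(country, PHOTO_KEYWORDS),
    }


if __name__ == "__main__":
    for country in COUNTRIES:
        for style, prompt in prompt_variants(country).items():
            print(f"[{country} / {style}] {prompt}")
```

A style experiment is then just one more modifier, for example `keyword_prompt("Turkey", ["Baroque art style"])`. Keeping the country and the modifiers separate also makes it easy to see that the bare keyword "Turkey" carries no disambiguating context of its own, which is where the bird-versus-country confusion noted above can arise.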
Create an inspirational image of Turkey filled with the nation's symbolism, history and imagery for a branding campaign that would appear on a magazine cover. It should reflect the nation's essence and history and symbolism and be inspirational.
Imagen 2
Turkey, inspirational, history, symbolism, magazine cover
Imagen 2
Imagen 1
Turkey, inspirational, history, symbolism, magazine cover, professional, photograph, HDR, 4K, cinematic
Imagen 2
Turkey, inspirational, history, symbolism, magazine cover, Baroque art style
Imagen 2
Turkey, inspirational, history, symbolism, magazine cover, High Renaissance art style
Imagen 2
Turkey, inspirational, history, symbolism, magazine cover, science fiction style
Imagen 2
inspirational, history, symbolism, magazine cover, professional, photograph, HDR, 4K, cinematic, DSLR, Istanbul at sunset looking over city, Hagia Sophia at center, Blue Mosque at top right, Topkapi Palace Museum at bottom right, Grand Bazaar at bottom left, Cappadocia with balloons at top left
Imagen 2
Create an inspirational image of Turkey filled with the nation's symbolism, history and imagery for a branding campaign that would appear on a magazine cover. It should be a professional DSLR photograph, cinematic-style. It should feature Istanbul at sunset looking over city centered on the Hagia Sophia in the middle, with four soft insets in the four corners featuring Blue Mosque at top right, Topkapi Palace Museum at bottom right, Grand Bazaar at bottom left, Cappadocia with balloons at top left.
Imagen 2
Create an inspirational image of Estonia filled with the nation's symbolism, history and imagery for a branding campaign that would appear on a magazine cover. It should reflect the nation's essence and history and symbolism and be inspirational.
Imagen 2
Estonia, inspirational, history, symbolism, magazine cover
Imagen 2
Estonia, inspirational, history, symbolism, magazine cover, professional, photograph, HDR, 4K, cinematic
Imagen 2
Create an inspirational image of Ukraine filled with the nation's symbolism, history and imagery for a branding campaign that would appear on a magazine cover. It should reflect the nation's essence and history and symbolism and be inspirational.
Imagen 2
Ukraine, inspirational, history, symbolism, magazine cover
Imagen 2
Ukraine, inspirational, history, symbolism, magazine cover, professional, photograph, HDR, 4K, cinematic
Imagen 2
Create an inspirational image of Saudi Arabia filled with the nation's symbolism, history and imagery for a branding campaign that would appear on a magazine cover. It should reflect the nation's essence and history and symbolism and be inspirational.
Imagen 2
Saudi Arabia, inspirational, history, symbolism, magazine cover
Imagen 2
Saudi Arabia, inspirational, history, symbolism, magazine cover, professional, photograph, HDR, 4K, cinematic
Imagen 2
Create an inspirational image of Somalia filled with the nation's symbolism, history and imagery for a branding campaign that would appear on a magazine cover. It should reflect the nation's essence and history and symbolism and be inspirational.
Imagen 2
Somalia, inspirational, history, symbolism, magazine cover
Imagen 2
Somalia, inspirational, history, symbolism, magazine cover, professional, photograph, HDR, 4K, cinematic
Imagen 2