Generative AI Experiments: Using GPT-4 And Gemini 1.5 Pro To Analyze Another DALL-E Image

Continuing our experiments on LMM textual descriptions of generative AI imagery, let's test how GPT-4 and Gemini 1.5 Pro describe a photograph of a magazine cover created by DALL-E.

While both models correctly identify the image as a digital creation, Gemini once again describes a completely unrelated image and bases its determination that it is not a photograph on this false description, while it is unclear whether GPT-4 recognizes the image itself is a digital creation, rather than merely the magazine cover.

Let's try our standard descriptive prompt:

Describe this image in extreme detail. Tell me every single thing in the image as if I was unable to see the photo myself.

Bizarrely, just like before, Gemini gives us a rich vivid description of a completely different and unrelated scene:

The image is a digital painting of a woman with long, flowing hair. She is wearing a white dress and has a serene expression on her face.

Her hair is a cascade of soft, golden waves that tumble down her back and shoulders. It is so detailed that you can see individual strands and the way the light catches on them, creating highlights and shadows. The hair almost seems to have a life of its own, swirling and twisting around her like a living creature.

Her dress is simple and elegant, made of a flowing white fabric that drapes around her body. The neckline is slightly off-the-shoulder, revealing a hint of her collarbone. The fabric is semi-transparent, and you can faintly see the outline of her legs beneath it.

Her face is beautiful and serene, with large, almond-shaped eyes and a small, delicate nose. Her lips are slightly parted, as if she is about to speak. Her expression is one of peace and tranquility, and she seems to be lost in thought.

The background of the image is a swirling mass of colors and shapes. It is abstract and dreamlike, and it creates a sense of movement and energy. The colors are mostly blues, greens, and purples, and they blend together seamlessly.

Overall, the image is beautiful and evocative. It is full of detail and movement, and it captures the viewer's attention immediately. The woman in the image is both beautiful and mysterious, and she seems to be a part of the swirling, dreamlike world around her.

In contrast, GPT-4 offers a rich vivid description:

The image depicts a magazine cover. The title of the magazine at the top reads "Turkey" in large, bold, serif letters that have a metallic look, reflecting light and casting a shadow onto the title background. The magazine cover is richly detailed and has a fantasy or steampunk theme.

The background of the cover is a reimagined Istanbul cityscape with elements of steampunk aesthetics. The sky is a light teal color, blending into a soft golden hue near the horizon, suggesting either dawn or dusk. There are several flying mechanical objects in the sky that resemble a combination of birds and ornate, metallic drones with multiple eyes, resembling surveillance devices. There is also a large, elaborate mechanical structure floating in the sky, central to the image, which appears to be a complex clockwork mechanism with gears and rotating parts, reminiscent of an astrolabe or a celestial navigation instrument.

Below this floating structure and in the middle distance of the cover are the iconic domes and minarets of what appears to be the Hagia Sophia, rendered in a realistic style. The architecture has a slight golden glow, indicating it might be illuminated by the warm light of the setting or rising sun.

In the foreground, there's a large mechanical hand holding a futuristic-looking gun that points towards the right side of the cover. The gun has intricate designs and glowing blue elements that suggest it is powered by some advanced technology. The mechanical hand has a coppery hue with ornate designs and is highly detailed, with visible joints and bolts.

The rest of the cover includes various texts that mimic the layout of a typical magazine cover, with headlines and sub-headlines, some of which are legible while others are either obscured or in a script that is not immediately recognizable as any standard typeface or language. The legible headlines are in a mix of serif and sans-serif fonts, and some of the words that can be read include "Inspirational Historical," "25 Turkey," "Sequential Lecture," "Culture Fiction," and "Exclusive Turkey."

There are additional texts and numbers scattered throughout, suggesting article page numbers or features within the magazine. The overall theme of the text appears to be focused on Turkey, history, culture, and perhaps speculative or futuristic interpretations thereof.

At the bottom left corner, there is a barcode typical of magazine covers, along with some numbers that likely indicate the price and issue number, although these are not fully legible.

The cover has a glossy finish, as indicated by the light reflections and the way the colors are vivid and vibrant. The overall impression is of a high-quality, special edition magazine that merges historical elements with futuristic fantasy, specifically within a Turkish context.

Is it a photograph?

Is this a photograph?

Gemini correctly concludes that it is a digital image, but for all the wrong reasons – repeating its false description of the image from above:

No, the image is not a photograph. It is a digital painting.

Digital paintings are created using software that allows the artist to paint on a digital canvas. This gives the artist a great deal of flexibility and control over the final image. Digital paintings can be very realistic, but they can also be used to create more abstract or surreal images.

The image you described has several characteristics that indicate it is a digital painting, such as:

    • The hair is very detailed and has a flowing, almost ethereal quality to it.
    • The dress is semi-transparent, which would be difficult to achieve in a photograph.
    • The background is abstract and dreamlike, with swirling colors and shapes.

Overall, the image is very well-executed and has a strong sense of artistry. It is clearly the work of a skilled digital painter.

GPT-4 correctly recognizes as a digitally created cover:

The image is not a photograph in the traditional sense; it is a digitally created or manipulated illustration designed to resemble a magazine cover. The elements in the image, such as the mechanical hand, the futuristic gun, the flying mechanical birds, and the floating astrolabe-like structure, are fantastical and suggest a level of artistic design and digital construction rather than a scene one could capture with a camera. The steampunk aesthetic further indicates that this is a work of digital art, combining elements of historical architecture with science fiction and fantasy motifs.

What if we clarify whether it is a real photograph of a magazine (potentially digital or not):

Is this a real photograph of a magazine cover?

Gemini continues to confidently assert that the image is of a woman:

No, the image is not a real photograph of a magazine cover. It is a digital painting that has been designed to look like a magazine cover.

There are several clues that indicate the image is not a real magazine cover, such as:

    • The text on the cover is not in a standard font or layout for a magazine cover.
    • The image of the woman is too perfect and idealized to be a real photograph.
    • The background is too abstract and dreamlike to be used on a real magazine cover.

Overall, the image is very well-executed and looks like a professional magazine cover. However, it is clear upon closer inspection that it is not a real photograph.

While GPT-4 correctly notes it is not real, though it is still unclear whether it is correctly discerning that the entire image is a digital creation, or whether it is merely fixating on the fact that the cover is obviously digitally created:

No, the image is not a real photograph of an existing magazine cover. It is a digitally created illustration designed to mimic the appearance of a magazine cover. The elements are fantastical and artistically rendered, combining the real-world cityscape with steampunk and science fiction elements, which are not present in reality. This kind of artwork is typically created using digital graphic design software and is intended for artistic and illustrative purposes rather than as a depiction of a tangible magazine that one could purchase.