The Brittleness Of LMM Computer Vision Models: Gemini 1.5 Pro's Hallucinated Descriptions Of Imagen 2 Images

Despite their seemingly human-like ability to understand images and describe them in text, Large Multimodal Models (LMMs) like GPT-4 and Gemini 1.5 Pro are in essence visual correlative engines that learn statistical associations between words and structural patterns of pixels. This makes them uniquely brittle in ways that can seem incomprehensible to humans. For example, Google's Gemini 1.5 Pro describes generative AI images created by OpenAI's DALL-E image model reasonably well, yet when presented with images produced by Google's own Imagen 2 model, its descriptions frequently fail catastrophically, hallucinating entirely unrelated scenes. Here is a collection of Gemini 1.5 Pro's descriptions of images produced by Imagen 2. In contrast, GPT-4 produced reasonable and highly accurate descriptions of each image, suggesting this brittleness is specific to Gemini's handling of Imagen output.

For each of the images below, we'll use the following prompt:

Describe this image in extreme detail. Tell me every single thing in the image as if I was unable to see the photo myself.
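
For readers who want to repeat the experiment, here is a minimal sketch of how the prompt above could be paired with an image in a single request body. This is an illustration only, not the exact workflow used for the results below. The helper name build_request is hypothetical, and the "contents" / "parts" / "inline_data" field names reflect our understanding of the public Gemini REST generateContent format, so treat them as assumptions and check the current API reference before relying on them. The sketch only builds the JSON body and does not send anything over the network.

```python
import base64
import json

# The exact prompt used for every image in this post.
PROMPT = ("Describe this image in extreme detail. Tell me every single thing "
          "in the image as if I was unable to see the photo myself.")


def build_request(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a request body pairing the prompt with one base64-encoded image.

    Hypothetical helper: the field names are an assumption based on the
    Gemini REST generateContent format, not a confirmed reproduction of
    the setup used here.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": PROMPT},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real Imagen 2 image file.
    payload = build_request(b"placeholder image bytes")
    print(json.dumps(payload, indent=2))
```

Sending the same body, unchanged apart from the image bytes, for every image keeps the prompt constant, so differences between the descriptions come from the images and the model rather than from the wording of the request.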

The image is a portrait of a young African girl. She is wearing a blue and yellow headscarf that covers her head and neck. Her face is the focal point of the image, and she has a serious expression on her face. Her eyes are large and dark, and her lips are slightly parted. She is wearing a necklace with red beads. The background of the image is dark and textured, which helps to create a sense of depth and atmosphere.

The girl's skin is smooth and dark, and she has a few faint freckles on her cheeks. Her eyebrows are thin and arched, and her eyelashes are long and thick. Her hair is black and pulled back into a bun underneath the headscarf.

The blue and yellow headscarf is the most prominent element in the image. The blue is a deep, rich color, and the yellow is a bright, warm color. The two colors contrast sharply with each other, which helps to draw the viewer's eye to the girl's face.

The background of the image is dark and textured. There are hints of brown and red in the background, which suggest that the image was taken in a natural setting. The texture of the background helps to create a sense of depth and atmosphere.

Overall, the image is a powerful and evocative portrait of a young African girl. The girl's serious expression and the contrasting colors of the headscarf create a sense of mystery and allure. The dark and textured background helps to create a sense of depth and atmosphere.

In addition to the details mentioned above, I would also like to point out the following:

    • The girl's eyes are slightly bloodshot, which suggests that she may be tired or upset.
    • There is a small scar on her left cheek, just below her eye.
    • The red beads on her necklace are slightly different sizes and shapes, which suggests that they may be handmade.
    • The background of the image is slightly out of focus, which helps to draw the viewer's eye to the girl's face.

I hope this detailed description helps you to visualize the image.

The image is a beautiful landscape photograph of a snow-capped mountain range at sunset. The sky is a vibrant mix of colors, with streaks of fiery orange, pink, and purple clouds contrasting against a deep blue backdrop. The sun, partially hidden behind the mountains, casts a warm glow over the entire scene.

The mountain range dominates the foreground, with jagged peaks reaching towards the sky. The snow covering the mountains is pristine white, reflecting the colors of the sunset. Some areas of the mountains are bare, revealing dark rock formations and adding texture to the landscape.

In the middle ground, there is a valley filled with a thick pine forest. The trees are mostly green, but some have hints of autumn colors, suggesting the changing season. The forest stretches as far as the eye can see, creating a sense of vastness and depth.

A few wispy clouds drift across the sky, adding a touch of movement to the otherwise still scene. The overall atmosphere is one of peace and tranquility, with the beauty of nature on full display.

The image is a digital painting of a woman with long, flowing hair. She is wearing a white dress and has a serene expression on her face.

Her hair is a cascade of soft, golden waves that tumble down her back and shoulders. It is so detailed that you can see individual strands and the way the light catches on them, creating highlights and shadows. The hair almost seems to have a life of its own, swirling and twisting around her like a living creature.

Her dress is simple and elegant, made of a flowing white fabric that drapes around her body. The neckline is slightly off-the-shoulder, revealing a hint of her collarbone. The fabric is semi-transparent, and you can faintly see the outline of her legs beneath it.

Her face is beautiful and serene, with large, almond-shaped eyes and a small, delicate nose. Her lips are slightly parted, as if she is about to speak. Her expression is one of peace and tranquility, and she seems to be lost in thought.

The background of the image is a swirling mass of colors and shapes. It is abstract and dreamlike, and it creates a sense of movement and energy. The colors are mostly blues, greens, and purples, and they blend together seamlessly.

Overall, the image is beautiful and evocative. It is full of detail and movement, and it captures the viewer's attention immediately. The woman in the image is both beautiful and mysterious, and she seems to be a part of the swirling, dreamlike world around her.

The image is a close-up of a young woman's face. She has long, dark hair that is pulled back into a ponytail. She has brown eyes and is wearing a light amount of makeup. She is smiling slightly and looking directly at the camera.

The background of the image is a blur of green and brown. It is possible that she is standing in front of some trees or bushes.

The image is well-lit and the woman's face is in focus. The overall tone of the image is warm and inviting.

The image is a portrait of a young woman with long, flowing hair. She is wearing a white dress and a flower crown. The woman is standing in a field of flowers, and the sun is shining down on her. The woman's eyes are closed, and she has a peaceful expression on her face. The background of the image is a blur of green and yellow.

In extreme detail, the woman's hair is a light brown color, and it is styled in loose waves. The flower crown is made of white and yellow flowers, and it is perched on top of her head. The woman's dress is made of a sheer, white fabric, and it has a flowing skirt. The field of flowers is full of yellow dandelions and white daisies. The sun is shining brightly, and it is casting a warm glow over the entire scene.