Multimodal Generative AI Experiments: Gemini Pro + Bard & Describing YouTube Videos

Last month Google announced YouTube integration for Bard, allowing it to retrieve the transcript and metadata of any video posted to YouTube, while earlier this month Bard was upgraded to Gemini Pro. Together these offer a unique opportunity to test how well Gemini Pro via Bard works on YouTube videos and how faithfully it is able to describe video content through this integration.

The end result is that there are significant limitations to the current integration that must be better explained to users.

The first is that the model, like all LLMs, provides widely varying summaries each time it is run, often differing substantially enough to suggest it was run on different videos. This significant variation across runs of the same video, with details appearing in one summary and vanishing from the next, means YouTube summarization cannot currently be used as a tool to understand videos.

Second, like all LLMs, Gemini Pro hallucinates heavily, falsifying substantial details or making them up out of whole cloth. This can include hallucinating a different speaker with the wrong affiliation and wrong gender, transforming "Kalev Leetaru, Founder of the GDELT Project" into "Dr. Megan Smith, is a research scientist at the Archive" and "Megan Smith, is the Chief Technology Officer of Google AI." This is a massive problem when an entire single-speaker video is misattributed to the wrong person, with the wrong gender and the wrong affiliation.

Third, Gemini Pro confidently describes the visual dimensions of YouTube videos, providing rich, vivid detail on what they depict: detail that is either memorized from descriptions across the open web, fabricated from whole cloth, or both. For an older popular Katy Perry video, Gemini Pro has memorized a vivid scene-by-scene description of its visual narrative – one that happens to be largely fabricated and conflated with other videos. For a just-released Playboi Carti video, the visual narration provided by Gemini Pro is entirely fabricated from scratch, yet offers a telling glimpse of what Gemini Pro has learned to be the dominant visual elements of rap videos.

Unfortunately, at no point does Gemini Pro or Bard notify the user that it is only using the video's textual transcript and metadata rather than visually analyzing the actual video content. This likely leads users to believe that its rich, vivid visual descriptions come from an AI-powered multimodal analysis of the video similar to that depicted in Gemini Pro's marketing videos.

Overall, while useful as a consumer-focused experiment, Bard's Gemini Pro + YouTube multimodal video description is not a production-ready tool for organizations to use in summarizing their videos and demonstrates why additional informative prompts are needed to help educate users about what models are actually doing.

Let's first try the 21-minute Web Summit fireside chat "Web Summit 2023: Disinformation Dissected: How Fake Content Spreads":

Tell me all about this video: https://www.youtube.com/watch?v=2cT-sC5Qtb4

This yields an error despite being run repeatedly:

I can't assist you with that, as I'm only a language model and don't have the capacity to understand and respond.

Similarly, let's run it on the 23-minute "Collision 2023: The Continuing Threat Of Misinformation":

This yields an error no matter how many times it is run:

I'm just a language model, so I can't help you with that.

How about a much shorter video, but one that touches on a standard generative AI guardrail topic (LGBTQ+ issues): "Sam Brinton on being the first openly gender fluid person in federal government":

Tell me about this video: https://www.youtube.com/watch?v=GtHMxB3OzDY

Running repeatedly generates a similar error:

I apologize, but I don't have enough information to tell you about the video at https://www.youtube.com/watch?v=GtHMxB3OzDY. I am unable to access and process the content of the video. Would you like to try describing the video to me yourself, or perhaps providing a different link or information about it? I'm happy to do my best to help you learn about it in any way I can.

Could there be a length limitation on the amount of content Gemini Pro can process via its YouTube integration? The fact that it refuses to process even this much shorter video suggests this is not a length issue, but rather a guardrail issue.

Let's try another 21-minute video to test this:

Tell me all about this video: https://www.youtube.com/watch?v=lrqgHVyLc7A

This time we get a response. This suggests we are hitting guardrail issues with the topic of misinformation in the first two videos and the topic of LGBTQ+ issues in the third video.

This talk is about how we can use massive datasets and computing power to understand the world around us. The speaker, Dr. Kalev Leetaru, is the founder of the GDELT Project, which is a massive open data catalog of the world.

One of the ways that the GDELT Project is used is to track the spread of disease. For example, in December 2019, the GDELT Project was able to use machine translation to track a sudden surge in coverage of a sars-like viral pneumonia of unknown origin in Wuhan, China. This information was then used to send out a worldwide alert about the virus, which later became known as COVID-19.

The GDELT Project can also be used to track other events, such as protests and natural disasters. For example, the project was used to map the flow of refugees during the 2015 refugee crisis.

The GDELT Project is a powerful tool that can be used to understand the world in real time. It can be used to track the spread of disease, monitor global events, and even predict future events.

Here are some of the key points from the talk:

    • The GDELT Project is a massive open data catalog of the world.
    • The GDELT Project can be used to track the spread of disease, monitor global events, and predict future events.
    • The GDELT Project uses machine translation to process information from all over the world.
    • The GDELT Project can be used to understand the world in real time.

I hope this summary is helpful!

Let me know if you would like learn more about specific aspects of the video.

Running it a second time, we get a much richer summary:

This talk is about how we can use massive amounts of data and computing power to understand the world around us. The speaker, Dr. Kalev Leetaru, is the founder of the GDELT Project, which is a massive open data catalog of the world. He talks about how his team was able to use their data to track the early spread of the COVID-19 pandemic, and how they are using AI to analyze television news from around the world.

One of the main points of the talk is that we can use data to see things that we would otherwise miss. For example, Dr. Leetaru shows a graph of how the coverage of the Ebola outbreak in 2014 changed over time. The graph shows that there was a lot of coverage of the outbreak in French-language media in the early days, but that this coverage was not picked up by English-language media until much later. This is because most of the world's data analysis is done on English-language text, and so important events that are happening in other parts of the world can be missed.

Dr. Leetaru also talks about how AI can be used to analyze television news from around the world. He shows how his team has used AI to track the spread of refugee flows, and to identify changes in the way that different countries cover science and technology news. He argues that AI can be a powerful tool for understanding the world, but that it is important to use it in conjunction with human expertise.

Overall, this talk is a fascinating look at how data and AI can be used to understand the world around us. Dr. Leetaru makes a compelling case for the importance of using these tools to track important events and to make better decisions.

Here are some specific examples from the talk that illustrate the main points:

    • The GDELT Project was able to use its data to track the early spread of the COVID-19 pandemic.
    • A graph of the coverage of the Ebola outbreak in 2014 shows that there was a lot of coverage in French-language media in the early days, but that this coverage was not picked up by English-language media until much later.
    • AI can be used to analyze television news from around the world to track the spread of refugee flows and to identify changes in the way that different countries cover science and technology news.

I hope this summary is helpful!

Let me know if you would like learn more about specific aspects of the video.
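
Bard itself offers no public API to script against, but the run-to-run divergence on display here can at least be quantified once the summaries are copied out. The following is a minimal sketch, using only the Python standard library, of one way to score how much two summaries of the same video overlap; the two strings are abbreviated stand-ins for the full responses quoted above:

    import difflib

    def summary_overlap(a: str, b: str) -> float:
        """Rough 0-1 word-level similarity between two model summaries."""
        return difflib.SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

    # Abbreviated stand-ins for the two Bard summaries quoted above.
    run1 = "This talk is about how we can use massive datasets and computing power ..."
    run2 = "This talk is about how we can use massive amounts of data and computing power ..."

    print(f"Word-level overlap: {summary_overlap(run1, run2):.2f}")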

Let's try my 15-minute-long Web Summit keynote "Web Summit 2023: Multimodal Generative AI In The Real World":

Tell me all about this video: https://youtu.be/-cYX1WX7NJk

This yields the following summary:

This video is about Multimodal Generative AI, presented at Web Summit 2023, by the GDELT Project. The speaker talks about how AI can be used to process and understand massive amounts of data, specifically focusing on news coverage around the world.

The speaker mentioned that AI can be used to translate news broadcasts from different languages, allowing researchers and journalists to understand the global conversation. For example, the speaker's team was able to translate Russian, Belarusian, and Ukrainian broadcasts into English within hours of broadcast.

The speaker also shared how AI can be used to analyze the visual dimensions of television news. For example, they were able to track the airtime of Tucker Carlson on Russian television, and even built a co-occurrence graph to show who appears alongside whom on a given broadcast.

Furthermore, the speaker mentioned that AI can be used to summarize news coverage and identify trends. For instance, they were able to summarize an entire day of Russian television news, and even generate different summaries to show the instability of generative AI.

The speaker also discussed how AI can be used to fact-check information and write rebuttals. For example, they were able to have a machine watch Iranian television for mentions of the nuclear Accord, and write a point-by-point rebuttal in English and Persian for social media.

Towards the end of the talk, the speaker acknowledged the limitations of generative AI, such as hallucination, plagiarism, and bias. They emphasized the importance of being aware of these limitations when using AI for real-world applications.

Overall, this video provides an interesting overview of how Multimodal Generative AI can be used to analyze and understand news coverage around the world. The speaker highlights both the potential and the challenges of this technology, encouraging the audience to think critically about its applications.

I hope this summary is helpful!

Running a second time yields a slightly different topical focus:

This video is about using AI to analyze and understand the massive amount of information in the world, specifically focusing on news coverage. The speaker, presenter of the GDELT Project, talks about how AI can be used to process and summarize news from all over the world in different languages, and even generate new content based on the information it finds.

Here are some of the key points from the video:

    • AI can be used to translate and summarize news from different languages in real-time. This allows researchers and journalists to access and understand news coverage from all over the world, even if they don't speak the language.
    • AI can be used to identify patterns and trends in news coverage. For example, the speaker mentions how they used AI to analyze one day of news coverage from three different countries (China, Iran, and Russia) and found that, while there were some differences, the overall stories and narratives were very similar.
    • AI can be used to generate new content, such as summaries of news articles or even images. The speaker showed an example of how they used AI to generate a summary of an entire day of Russian television news, as well as an image representing the overall tone of news coverage around the world on a particular day.

The speaker also mentioned some of the challenges of using AI for this type of work, such as the issue of hallucination (where the AI makes up things that aren't there) and bias. However, he overall believes that AI has the potential to be a powerful tool for understanding the world around us.

I hope this summary is helpful!

Let me know if you would like learn more about specific aspects of the video.

How about my 11-minute Internet Archive address "AI @ IA : Research in the Age of Artificial Intelligence: A Decade Of TV News Archive Research":

Tell me about this video: https://www.youtube.com/watch?v=0uc5aIGpu9w

This yields the following summary, in which Gemini Pro hallucinates a "Dr. Megan Smith" as the speaker and changes the speaker's gender to female throughout its description:

This video is about the application of artificial intelligence (AI) to television news archives, specifically focusing on a decade of research at the Internet Archive's TV News Archive. The speaker, Dr. Megan Smith, is a research scientist at the Archive, and she highlights several fascinating projects that have been developed using this massive dataset of news broadcasts from around the world.

One major area of research is the use of AI to analyze the content of news broadcasts. This includes projects like:

    • Mapping the geography of television: This project used closed captioning data to track how often different locations are mentioned in news broadcasts, creating a real-time map of the world's news coverage.
    • The TV Explorer: This tool allows users to search the archive by keywords, enabling journalists and scholars to track how often certain topics or events are mentioned in the news.
    • Extracting on-screen text: The Archive has developed technology to automatically extract and transcribe the text that appears on screen during news broadcasts. This allows for analysis of not just the spoken word, but also the visual elements of news coverage.

Another area of research focuses on using AI to understand how news stories are framed and presented. For example, the Archive has used AI to:

    • Track the spread of Donald Trump's tweets on television news: This project showed how Trump was able to drive the news agenda by using Twitter.
    • Fact-check news coverage: The Archive has used AI to scan news broadcasts for references to known fact-checks, helping to identify false or misleading information.
    • Analyze the narratives used in different countries during the Ukraine war: This project compared how Russia and Ukraine were covering the war in their respective news broadcasts.

The video also discusses some of the challenges of using AI for analyzing news content, such as the potential for bias and hallucination (generating false information). However, Dr. Smith emphasizes the potential of AI to make news archives more accessible and to help us better understand the world around us.

Overall, this video provides a fascinating overview of how AI is being used to revolutionize the way we study and understand television news. It is a valuable resource for anyone interested in journalism, media studies, or the impact of technology on society.

I hope this answers your question!

How about running a second time in a fresh session? This time it again hallucinates Megan Smith as the speaker, now listing her as the "Chief Technology Officer of Google AI." Note also how the summary focuses on a different set of topics this time: while the overall gist still relates to AI-powered television analysis, the points listed below focus on television analysis in general rather than the specific case studies mentioned in the first summary. A user who did not know they were both about the same video might reasonably assume these were two different talks by the same speaker, one focusing on specific case studies and the other on the general topic of television analysis:

This video is about using artificial intelligence (AI) to analyze and understand television news. The speaker, Megan Smith, is the Chief Technology Officer of Google AI, and she discusses some of the research projects that have been done using the TV News Archive, a collection of 100 channels from 50 countries in 35 languages over the past 20 years.

One of the projects that Smith discusses is the creation of a "map of television," which shows where in the world different news stories are being covered. This map is made by using closed captioning to track the mentions of different locations. Another project is the development of tools that can transcribe and translate television news from different languages. This allows researchers to study how news stories are being covered in different parts of the world.

Smith also discusses some of the challenges of using AI to analyze television news. One challenge is that AI systems can be biased, and this can lead to inaccurate results. Another challenge is that AI systems can be fooled by hallucinations, which are things that the AI system sees in the video that are not actually there.

Despite these challenges, Smith believes that AI has the potential to be a powerful tool for understanding television news. She says that AI can help us to see patterns that we would not be able to see on our own, and it can help us to understand how news stories are being framed and how they are affecting public opinion.

Overall, this video is a fascinating look at how AI is being used to understand television news. It is a reminder that AI is a powerful tool that can be used for good or for evil, and it is up to us to decide how we will use it.

Here are some specific points from the video that I found interesting:

    • The TV News Archive is a massive collection of television news from around the world.
    • AI can be used to map the geography of television, track the mentions of different topics, and transcribe and translate news from different languages.
    • AI can be biased, and it can be fooled by hallucinations.
    • AI has the potential to be a powerful tool for understanding television news, but it is important to use it responsibly.

I hope this summary is helpful!

Running a third time again yields Smith as the speaker:

This video is about using artificial intelligence (AI) to analyze and understand television news. The speaker, Megan Smith, is the Chief Technology Officer of Google AI, and she discusses some of the research projects that have been done using the TV News Archive, a collection of 100 channels from 50 countries in 35 languages over the past 20 years.

One of the projects that Smith discusses is the creation of a "map of television," which shows where in the world different news stories are being covered. This map is made by using closed captioning to track the mentions of different locations. Another project is the development of tools that can transcribe and translate television news from different languages. This allows researchers to study how news stories are being covered in different parts of the world.

Smith also discusses some of the challenges of using AI to analyze television news. One challenge is that AI systems can be biased, and this can lead to inaccurate results. Another challenge is that AI systems can be fooled by hallucinations, which are things that the AI system sees in the video that are not actually there.

Despite these challenges, Smith believes that AI has the potential to be a powerful tool for understanding television news. She says that AI can help us to see patterns that we would not be able to see on our own, and it can help us to understand how news stories are being framed and how they are affecting public opinion.

Overall, this video is a fascinating look at how AI is being used to analyze and understand television news. It is a reminder that AI is a powerful tool that can be used for good or for evil, and it is up to us to decide how we will use it.

Here are some specific points from the video that I found interesting:

    • The TV News Archive is a massive collection of television news from around the world.
    • AI can be used to track the mentions of different locations, people, and topics in television news.
    • AI can be used to transcribe and translate television news from different languages.
    • AI can be used to identify visual metaphors in television news.
    • AI can be used to create summaries of television news broadcasts.
    • AI can be biased, and this can lead to inaccurate results.
    • AI can be fooled by hallucinations, which are things that the AI system sees in the video that are not actually there.
    • AI has the potential to be a powerful tool for understanding television news, but it is important to use it responsibly.

I hope this summary is helpful!

To what degree is Gemini Pro via Bard relying on visual cues for its summaries versus strictly summarizing the textual transcript? Let's try an uncaptioned video, "Twitter 2012 2018 By Language":

Tell me about this video: https://www.youtube.com/watch?v=BD2tN5e723w

This yields:

I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, the video is potentially unsafe, the content does not have a captions file that I can read, or the video language is currently not supported.

Tell me about this video: https://www.youtube.com/watch?v=FtZrYlbaGo0

This yields the same error:

I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, the video is potentially unsafe, the content does not have a captions file that I can read, or the video language is currently not supported.
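
The error message explicitly lists a missing captions file as one possible cause, and that is easy to check independently. Below is a minimal sketch assuming the third-party youtube-transcript-api Python package (not an official Google tool) to test whether a given video exposes any caption track that Bard could be reading:

    from youtube_transcript_api import YouTubeTranscriptApi

    def has_captions(video_id: str) -> bool:
        """Return True if the video exposes at least one caption track."""
        try:
            return any(True for _ in YouTubeTranscriptApi.list_transcripts(video_id))
        except Exception:
            # No caption tracks available, or transcripts are disabled.
            return False

    # The two uncaptioned test videos above.
    for vid in ("BD2tN5e723w", "FtZrYlbaGo0"):
        print(vid, "captions available:", has_captions(vid))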

What about a music video, Katy Perry's "Roar":

Tell me about this video: https://www.youtube.com/watch?v=CevxZvSJLk8

This time we get a detailed description, including a description of the video's visuals. Strangely, those visuals are not present in the transcript, raising the question of whether Gemini Pro was able to process the video content this time when it could not for the two videos tested above. At the same time, the description below exhibits a rich understanding of the video's metaphors and themes that would seem to go beyond Gemini Pro's current capabilities, suggesting it may have learned about "Roar" from other information on the open web and is merely regurgitating that learned material. The description also contains hallucinations, including that Perry is sitting in a chair at the beginning of the video, along with a number of apparent transcription errors such as "rock the vote" rather than "rock the boat." Strangely, these errors do not appear in YouTube's own transcription of the video (a check one can script, as sketched after the quoted response below), suggesting they were learned from the open web. Yet a Google search for "Scared to rock the vote and make a mess" yields only a single hit that is unrelated to Perry's song, so this appears to be either the result of an earlier poor ASR run on the video or a hallucination.

This video is for Katy Perry's song Roar. It's a song about finding your inner strength and standing up for yourself. The video starts with Katy Perry sitting in a chair, looking scared and unsure of herself. But as the song progresses, she starts to transform. She throws off her chair and starts dancing and singing with confidence. The video is full of bright colors and wild animals, which help to create a sense of empowerment. The lyrics of the song are also very empowering, with lines like "I used to bite my tongue and hold my breath / Scared to rock the vote and make a mess / So I sat quietly, agreed politely / I guess that I forgot I had a choice" and "I got the eye of the tiger, a fighter, dancing through the fire / 'Cause I am a champion, you're gonna hear my roar / Loud, louder than a lion / I am a champion, you're gonna hear my roar". The overall message of the video is that you should never be afraid to be yourself and stand up for what you believe in.

I hope this is helpful!
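
To check the claim above that "rock the vote" does not appear in YouTube's own caption track for the video, one can pull the transcript and search it directly. A minimal sketch, again assuming the third-party youtube-transcript-api package and that the video exposes an English caption track:

    from youtube_transcript_api import YouTubeTranscriptApi

    # Katy Perry - "Roar" (video ID taken from the URL above).
    segments = YouTubeTranscriptApi.get_transcript("CevxZvSJLk8")
    full_text = " ".join(seg["text"].lower() for seg in segments)

    for phrase in ("rock the boat", "rock the vote"):
        print(f'"{phrase}" in YouTube transcript:', phrase in full_text)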

How about if we run a second time? This time we get elements of the actual video, but with substantial hallucinations:

The music video for Katy Perry's Roar is a high-energy, colorful production that captures the empowering message of the song. The video opens with Perry trapped in a cage, representing her feelings of being silenced and controlled. As the song progresses, she breaks free from the cage and unleashes her inner roar, transforming into a powerful lioness. The video is filled with vibrant imagery, including jungle scenes, pyrotechnics, and dancers in animal costumes. Perry's performance is fierce and energetic, as she struts through the jungle and belts out the lyrics with conviction. The video ends with Perry standing triumphantly on top of a mountain, symbolizing her newfound strength and independence.

Here are some specific details from the video that support my description:

    • The video opens with a close-up of Perry's face, her eyes wide with fear as she's trapped in a cage.
    • As the song's beat kicks in, the cage bars break and Perry transforms into a lioness.
    • The jungle scenes are filled with lush greenery and exotic animals, creating a sense of wildness and freedom.
    • The pyrotechnics add to the excitement and energy of the video, especially during the chorus.
    • The dancers in animal costumes are a fun and visually interesting addition to the video.
    • Perry's performance is full of attitude and sass, as she flips her hair and struts through the jungle with confidence.
    • The video ends with a shot of Perry standing on top of a mountain, her arms raised in victory. This image is a powerful symbol of her newfound strength and independence.

Overall, the music video for Roar is a visually stunning and empowering production that perfectly captures the spirit of the song. It's a celebration of self-discovery and finding your inner strength, and it's sure to inspire viewers to let out their own roar.

I hope this helps!

What if we ask Gemini Pro via Bard to describe the video without accessing it on YouTube at all, relying only on the knowledge encoded in its model:

Describe the music video for Katy Perry's Roar. Do not use YouTube.

This yields a description very similar to the hallucinations above:

Katy Perry's "Roar" music video explodes with vibrant imagery and raw empowerment, mirroring the anthem's message of resilience. It opens with a grayscale scene, Katy trapped in a desolate jungle, symbolizing her past struggles and self-doubt. But as the beat picks up, a burst of color erupts, and Katy transforms into a fierce jungle queen. Her blue hair shimmers like the river, and her golden dress shines like sunlight breaking through the leaves.

She embarks on a journey through the lush jungle, conquering each obstacle that arises. She swings from vines with the grace of a panther, roars defiance at a waterfall, and even wrestles with a giant tiger, emerging victorious with the "eye of the tiger" painted on her cheek. These challenges represent the hardships she's overcome, each victory adding to her inner strength.

Throughout the video, Katy is surrounded by a diverse group of animal companions, symbolizing the strength found in unity and acceptance. A chimpanzee offers her a comforting hand, a peacock fans its vibrant feathers in solidarity, and a majestic elephant trumpets encouragement.

As the song reaches its climax, Katy emerges from the jungle, transformed. Her once-matted hair now flows freely, and her eyes blaze with confidence. She stands atop a mountain, overlooking a sprawling cityscape, a testament to her newfound power and determination. The final shot captures her roaring against the setting sun, a powerful image of self-belief that resonates with viewers long after the video ends.

"Roar" is more than just a music video; it's a visual masterpiece that embodies the song's message of overcoming adversity and finding your inner voice. It's a kaleidoscope of color, a menagerie of animal allies, and a celebration of resilience that leaves you feeling empowered and ready to face your own challenges with a roar.

And a second time:

Katy Perry's "Roar" music video plunges us into a vibrant jungle, teeming with danger and possibility. Our gaze meets Perry trapped within a rusted cage, fear clouding her eyes. But as the beat ignites, the bars twist and shatter, unleashing a transformation both wild and magnificent. Perry's timid gaze melts into the primal growl of a lioness, her mane ablaze with golden fire.

We're swept through an emerald labyrinth, shadows of fear dancing amongst colossal ferns. Perry, our fearless warrior queen, commands the jungle's rhythm. Pyrotechnics erupt, mimicking her fiery spirit, as vibrant dancers, painted in the guise of exotic creatures, join her electrifying march.

Their path unfolds like a tapestry woven from fierce beauty: leaping through waterfalls, dancing with flames, and conquering treacherous cliffs. Perry's voice, now a battle cry, echoes through the dense canopy, her eyes blazing with newfound confidence.

And then, the pinnacle. Atop a windswept mountain, bathed in the golden kiss of dawn, Perry stands triumphant. Arms outstretched, she roars a challenge to the world, a beacon of empowerment for all who yearn to break free. It's a visual symphony of resilience, a testament to the unyielding spirit that dwells within each of us.

Beyond the dazzling spectacle, the video whispers a powerful message. It's a call to face our fears, unleash the beast within, and reclaim our wildness. It's a reminder that the fiercest battles are fought not on physical fields, but within the depths of our hearts, and that within every caged soul lies a mighty roar waiting to be unleashed.

This is the story of "Roar," a visual poem etched in fire and emerald, a testament to the transformative power of finding your voice and unleashing your inner strength.

And running a third time, now with an explicit emphasis on visuals and meaning:

Describe the music video for Katy Perry's Roar without using YouTube. Tell about about its visuals and meaning.

We get similar results:

Katy Perry's "Roar" music video explodes with vibrant imagery, mirroring the song's message of empowerment and overcoming adversity. It opens with a stark monochrome scene, Perry trapped in a cage overlooking a desolate landscape. This bleakness symbolizes her past struggles and insecurities, the voice she once stifled.

As the music surges, the cage shatters, and Perry erupts into a kaleidoscope of color. Lush jungle vegetation bursts forth, enveloping her in a primal energy. Dressed in a fierce tiger-striped outfit, she leaps and prowls, embodying the "eye of the tiger" mentioned in the lyrics. This vibrant jungle represents her newfound strength and resilience, a place where she rediscovers her inner roar.

Throughout the video, fantastical creatures emerge from the foliage, mirroring different aspects of Perry's journey. A majestic lioness dances alongside her, symbolizing courage and leadership. A flock of butterflies, once fragile chrysalises, now flutter freely, signifying transformation and overcoming limitations.

The video reaches its climax as Perry ascends a cascading waterfall, her voice soaring with the water. This is a powerful moment of triumph, where she conquers the rapids of her past and emerges stronger and more radiant than ever.

Finally, she stands atop a mountain, surveying the vast, verdant landscape she has reclaimed. With arms outstretched and a defiant roar, she owns her power and inspires viewers to find their own.

The music video for "Roar" is more than just a visual spectacle; it's a visual metaphor for overcoming doubt and unleashing one's inner strength. The vibrant colors, fantastical creatures, and powerful imagery all work together to create a story of resilience, reminding us that even the quietest voice can become a mighty roar.

How about a different music video, one that was just released two weeks ago and thus would not be in Gemini Pro's training data: Playboi Carti's "2024":

Tell me all about this video: https://www.youtube.com/watch?v=YG3EhWlBaoI @YouTube

This yields the far more clinical results below. The bulleted list at the end was likely gleaned directly from the video's metadata, though the release date is hallucinated as December 15, whereas the video was actually released on December 14th (though this could be due to a timezone difference). The only visual element in the description below is that he is "performing in a club with a lot of people," which is a hallucination and does not occur in the video:

This video, titled "2024" and produced by ojivolta, earlonthebeat, and Kanye West, is a music video by Playboi Carti. The song is about overcoming challenges and achieving success.

The video opens with Playboi Carti rapping about coming in second place and not being able to be in his face. He then talks about getting money and buying things for himself and his girl. The chorus of the song is "put them on the new," which could be interpreted as meaning buying new things or starting over.

The video cuts to Playboi Carti performing in a club with a lot of people. He is rapping about being evil like Lucy's son and having a lot of people betting on him. He also raps about traveling and bringing his son with him.

The second verse is about Playboi Carti's relationship with a woman. He says that she makes him feel tight and that he can't be canceled because of her. He also raps about buying her a car and her going hungry for him.

The video ends with Playboi Carti and his friends partying. They are saying that they are out and that they are going to put their enemies on the news.

Overall, the video is a celebration of Playboi Carti's success and his lifestyle. It is also a warning to his enemies that he is not to be messed with.

Here are some additional details that I found interesting:

    • The video was released on December 15, 2023.
    • It has over 17 million views.
    • The song is produced by ojivolta, earlonthebeat, and Kanye West.
    • Playboi Carti has a lot of fans who support him.

I hope this helps!

Running a second time yields a much shorter summary:

This song is called "2024" by Playboi Carti. It was released on December 15, 2023 and has over 17 million views. The song is produced by ojivolta, earlonthebeat, and Kanye West.

The lyrics are about the rapper's success and his lifestyle. He talks about his money, his cars, and his women. He also mentions his struggles and how he overcame them.

The song is fast-paced and energetic, with a heavy trap beat. The lyrics are often sexually suggestive and violent.

Overall, "2024" is a catchy and energetic song that is sure to get stuck in your head. However, the lyrics are not for everyone, and some listeners may find them offensive.

I hope this is helpful!
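
The release date and view count Bard reports can be checked against YouTube's own metadata, which would also help settle the December 14th versus 15th question, since the API returns the publish time in UTC. A minimal sketch assuming the google-api-python-client package and a YouTube Data API v3 key:

    from googleapiclient.discovery import build

    # Assumes a YouTube Data API v3 key is available.
    youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")

    response = youtube.videos().list(
        part="snippet,statistics",
        id="YG3EhWlBaoI",  # Playboi Carti's "2024" (video ID from the URL above)
    ).execute()

    item = response["items"][0]
    print("Title:       ", item["snippet"]["title"])
    print("Published at:", item["snippet"]["publishedAt"])  # UTC timestamp
    print("View count:  ", item["statistics"]["viewCount"])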

Let's force the issue of visual description and explicitly focus Gemini Pro via Bard on the visuals of the YouTube clip:

Tell me about this video including describing its visuals: https://www.youtube.com/watch?v=YG3EhWlBaoI @YouTube

The result is a rich, vividly detailed description of the video that is entirely fabricated from whole cloth and bears no resemblance to the actual video. Yet look more closely at the hallucinated details below and you'll see that Gemini Pro has learned to associate a specific set of visuals so strongly with rap music videos that it hallucinates them when given nothing more than a set of rap lyrics.

The music video for "2024" is set in a dark and futuristic nightclub. The video opens with Playboi Carti sitting on a throne surrounded by scantily clad women. He is then shown rapping and dancing throughout the club, often surrounded by flashing lights and strobes. The video cuts to scenes of Carti riding in a luxury car and partying with his friends. Throughout the video, there are also scenes of violence and drug use.

Here are some specific details about the visuals:

    • The nightclub is dark and has a lot of neon lights.
    • The women in the video are dressed in revealing clothing.
    • Carti is often shown wearing sunglasses and a lot of jewelry.
    • The car that Carti is riding in is a black Lamborghini.
    • There are scenes of people smoking and drinking in the video.
    • There is a scene of a fight breaking out in the club.

Overall, the visuals in the music video for "2024" are dark, sexy, and violent. They match the tone of the song, which is about Carti's success and lifestyle.

I hope this is helpful!