The Achilles Heel Of GPT-4 As Data Scientist: It Writes Code But Fails As Multimodal Or Data Analyst

Multimodal generative AI systems like GPT-4, also known as Large Multimodal Models (LMMs), are widely touted in evangelist communities as drop-in replacements for corporate and governmental data scientists. Splashy demonstrations abound on social media and in marketing decks showcasing nearly unbelievable examples of one-click data science: GPT-4 is handed a collection of graphs and instantly returns a rich report that summarizes their key trends, findings and deeper meaning, frequently vastly outperforming its human analyst counterparts. To what degree do these examples reflect the performance actually achieved on the kinds of real-world charts, capturing real-world data, that organizations deal with on a daily basis?

Here we'll examine a trivial timeline chart from the TV Explorer that plots daily mentions of Gaza vs Ukraine since the start of last year. We have purposely chosen as trivial a chart as possible to give GPT-4 every chance to analyze it correctly, while still reflecting the kind of real-world media charts we deal with every day. To examine GPT-4's analytical capabilities, we'll test three approaches: multimodal, data analysis and data analysis + interpretation.

The multimodal analysis leverages GPT-4's much-touted multimodal computer vision capabilities: we provide it the TV Explorer timeline graph that we use in our articles and ask it to analyze it. Despite multiple attempts across two different versions of the graph and a wide array of prompting strategies, we were unable to get correct results, with GPT-4 returning wildly different descriptions of the graph's findings that were utterly wrong and changed with each prompt. Bizarrely, the most advanced prompts that incorporate all of OpenAI's prompt engineering recommendations actually yield some of the worst results, offering a poignant reminder that prompts are not magical spells that can be strung together to yield successively better results.

The second analysis tests GPT-4's ability to write and execute Python for data analysis. Rather than handing GPT-4 the timeline itself, we provide it with the underlying CSV spreadsheet used to produce the timeline. Instead of analyzing the spreadsheet itself, GPT-4 parses the prompt and autonomously writes a small Python script that transforms the textual questions in the prompt into actual Python code. It then executes the code and writes up the results in a simple textual narrative. This mode yielded the correct results, as GPT-4 needed only to transform the basic tasks into simple Python code, execute it and return the results. In this way GPT-4 can also act as a "copilot" for a human data analyst, helping them transform standard data analysis tasks into the underlying Python code to run on the analyst's specific dataset. At the same time, GPT-4 is merely writing Python code and returning the results as-is, such as reporting the peaks, troughs, sums, etc., but is notably not interpreting those results in any way.
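To give a sense of how little code such "peaks, troughs and sums" questions require, here is a minimal sketch using pandas. It is not the script GPT-4 wrote: the toy CSV, its shortened column names (`Gaza`, `Ukraine`) and all of its values are invented purely for illustration, and only the general shape (a `Date` column followed by one column per series) mirrors the real export.

```python
import io

import pandas as pd

# Toy stand-in for the exported CSV. The column names and values here are
# invented for illustration; the real export uses the full query strings as
# column headers and has one row per day.
csv_text = """Date,Gaza,Ukraine
01-01-2022,10,20
02-24-2022,15,400
10-08-2023,600,30
12-01-2023,200,25
"""

data = pd.read_csv(io.StringIO(csv_text))
series_columns = [col for col in data.columns if col != "Date"]

summary = {}
for col in series_columns:
    # idxmax/idxmin return the row label of the first maximum/minimum.
    peak_row = data.loc[data[col].idxmax()]
    trough_row = data.loc[data[col].idxmin()]
    summary[col] = {
        "peak_date": peak_row["Date"],
        "peak_value": int(peak_row[col]),
        "trough_date": trough_row["Date"],
        "trough_value": int(trough_row[col]),
        "total": int(data[col].sum()),
    }

# The series with the largest total is the one that received the most mentions.
most_covered = max(summary, key=lambda col: summary[col]["total"])

for col, stats in summary.items():
    print(col, stats)
print("Most mentions overall:", most_covered)
```

Everything here is a direct lookup or aggregation: none of it requires judging what the numbers mean, which is the distinction drawn above between reporting results and interpreting them.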

The third analysis tested GPT-4's ability to actually interpret the results above. Rather than merely asking the model to report the dates and values of the peaks in the data and series sums as-is, we asked it to comment on the overall trends of the time series. Here GPT-4 failed once again, consistently reporting that Israeli mentions increased linearly and Ukraine mentions were steady with no major change throughout the entire period.

The end result is that, far from the overhyped superhuman "data scientist in a box", GPT-4's ability to analyze even a trivial dataset like this is limited to translating textual commands into Python code and executing them – anything more and it fails miserably.

GPT-4's Multimodal Native Analysis

 

Let's start with a simple prompt that doesn't tell the LLM anything about the image and asks it to craft an ALT tag. This tests the kind of ingest workflow that might be seen in a production CMS environment in which images are uploaded out of context:

Write a rich ALT description of this image.

The resulting text correctly describes a timeline with two series and correctly ascribes their keywords and channels. However, while correctly describing the initial peak of the Ukraine series, it fails to similarly describe the Israeli series as peaking. Critically, it even summarizes that Gaza failed to attract significant attention with "coverage or events caused less frequent but notable spikes in attention or action", yielding a false summary of the graph. Under this ALT text we would understand the graph to show Ukraine receiving substantial attention, but Gaza little.

The image is a line graph with two data sets represented in two different shades of blue. The x-axis represents time, with dates ranging from January 1, 2022, to December 2, 2023. The y-axis represents a quantity scale that goes from 0 to 8K (likely standing for thousands).

Two lines track the frequency or quantity of something over time. The darker blue line, with a label that includes terms like "Ukraine," "Ukrainian," "Kyiv," or "Kiev," and mentions media outlets like CNN, FOXNEWS, and MSNBC, shows a high peak early in 2022, then a gradual decline with some fluctuations over time. It suggests that the subjects related to Ukraine and Kyiv were mentioned frequently in the media around the early part of 2022, with the interest or events causing mentions fluctuating but generally declining over time.

The lighter blue line, labeled with terms such as "Israel," "Israeli," "Palestinian," "Palestinians," "Gaza," and the same media outlets, shows a generally lower level of frequency or quantity than the Ukraine-related line, with several distinct peaks indicating periods of increased mentions or activity. This line suggests that these topics received sporadic attention with specific events causing spikes in mentions throughout the given timeline.

The graph could be indicative of media coverage, public interest, or the occurrence of events related to Ukraine/Kyiv and Israel/Palestine as reported by the named news organizations. The data suggests that there were significant events at the start of 2022 affecting Ukraine that caught media attention, with continued but lesser coverage throughout the two years. For Israel and Palestine, coverage or events caused less frequent but notable spikes in attention or action throughout the same period.

What if we ask for additional detail?

Write a rich ALT description of this image that describes it in detail.

This time it does even worse, claiming that the Israeli peak is at the start of the timeline, when that is actually the Ukrainian peak. It attributes the Israeli near-7K peak to Ukraine, though it correctly places the Ukrainian peak at the beginning of the timeline. Under this ALT text, we would understand both Gaza and Ukraine to have peaked near each other in early 2022:

The image is a line graph with a horizontal time axis spanning dates from January 1, 2022, to December 31, 2023. The vertical axis represents quantity, with markers from 0 to 8K, probably indicating the number of mentions or frequency of occurrence.

There are two lines on the graph, one labeled with search terms related to Israel, such as "Israel," "Israeli," "Israelis," "Palestinian," "Palestinians," "Gaza," and associated with CNN, FOXNEWS, and MSNBC. This line fluctuates with various peaks and troughs throughout the time period. The highest peak is just below 6K, occurring near the start of the time range.

The second line represents search terms related to Ukraine, such as "Ukraine," "Ukrainian," "Kyiv," or "Kiev," also in relation to the mentioned news stations. This line shows a significant peak early in 2022, reaching just above 7K, indicating a high frequency of occurrence or mentions. After this peak, the line trends downward significantly, with smaller peaks that do not exceed 2K for the remainder of the time frame.

Both lines exhibit periodic spikes, suggesting intermittent increases in mentions or relevance in the news cycle. The background of the graph is white, and the lines are in different shades of blue for clear distinction. The lines are quite jagged, indicating daily or frequent data points. There is a legend or a label box overlapping the lines at the top of the graph, which is partially obscuring the view of the lines behind it. The exact dates of peaks and troughs are not visible due to the resolution of the image.

What if we get even more specific with our instructions and use all of the usual recommendations like "explain yourself" and "reason step-by-step"?

Describe each time series in this timeline in detail, telling me what they represent and what they show, including any peaks and the time they occurred and their overall pattern. Explain yourself and your reasoning step-by-step.

Even with all of these instructions, it still fails, reporting that the Israeli peak is in the beginning of the timeline alongside that of Ukraine:

The image depicts a timeline with two distinct time series, each representing the frequency of mentions or occurrence of specific search terms in relation to major news outlets (CNN, FOXNEWS, and MSNBC).

**First Time Series (Israel-Related Terms):**
– **Representation**: This line tracks the combined frequency of mentions for terms related to Israel and Palestine.
– **Terms Included**: "Israel", "Israeli", "Israelis", "Palestinian", "Palestinians", "Gaza".
– **Overall Pattern**: This series shows less volatility compared to the Ukraine-related series. The frequency remains generally lower, with periodic spikes that suggest events or incidents that drew media attention.
– **Peaks**: The highest peak for this series is just below 6K, indicating a significant event or heightened media coverage around that time. The exact date of this peak isn't clear from the image, but it appears to be in the first quarter of the timeline, possibly in early 2022.
– **Trends**: The series does not show a declining or increasing trend but rather a consistent pattern of occasional spikes.

**Second Time Series (Ukraine-Related Terms):**
– **Representation**: This line represents the frequency of mentions for terms related to Ukraine.
– **Terms Included**: "Ukraine", "Ukrainian", "Kyiv", "Kiev".
– **Overall Pattern**: The Ukraine-related series shows a significant peak early in the timeline, which rapidly declines but then continues with periodic smaller spikes. This pattern suggests a major event that caused an initial surge in mentions, followed by continued but lesser media focus over time.
– **Peaks**: The most prominent peak surpasses 7K around the start of the timeline, indicating a major event related to Ukraine that caught the media's attention, such as a political crisis, conflict, or another significant incident. After this initial surge, subsequent peaks do not reach the same level, suggesting that while Ukraine remained in the news, the level of focus diminished.
– **Trends**: After the initial surge, there is a clear downward trend, indicating decreasing mentions or relevance in the news cycle over time. However, the presence of smaller peaks indicates that certain events or developments caused temporary increases in media coverage.

**Reasoning for Analysis**:
– **Data Source and Nature**: The terms are combined with major news networks, suggesting the data represents the frequency of these terms being mentioned in the news coverage by these outlets.
– **Peaks**: Peaks in a time series often correspond to events of high interest or importance. The size of the peak can indicate the level of urgency or the public's interest in the event.
– **Trends**: The overall trend of a time series can provide insight into whether an issue is gaining or losing media attention over time.

**Limitations of Analysis**:
– The specific details of the events causing the peaks are not provided in the image, so one must infer based on the terms and the known historical context around those dates.
– The exact dates and numerical values of the peaks are not readable from the image provided, so the analysis can't pinpoint the precise timing or scale of the events.

To obtain a more detailed and accurate interpretation, we would need a higher-resolution image or access to the underlying data set.

What might be confusing the model? Perhaps it is the series labels that appear in the graph. Such labels are common in many domains, but less common in the generic templated Excel timelines that so dominate the web and that likely comprise much of the web-scraped training data for LMMs. Let's try simply removing those two series labels from the graph, but otherwise leaving the graph as-is:

Despite removing the labels, we get even worse results this time. The model correctly recognizes that the light blue time series represents Gaza, but falsely claims that its peak is towards the beginning of the timespan and that it then falls and remains low for the remainder of the graph other than brief spikes. This time, despite recognizing that the dark blue series represents Ukraine, it falsely claims that its peak is towards the end of the timeline. Noticeably, for the first time it correctly recognizes that one series peaks early and the other late, but swaps their labels:

The image is a line graph with two lines of different shades of blue, plotted over a time series from January 1, 2022, to around the end of 2023. The horizontal axis represents time, with dates formatted as month/day/year, and the vertical axis represents the frequency of mentions, with a scale from 0 to 8K (presumably 'thousands').

The first line, which is a lighter shade of blue, corresponds to the frequency of mentions combining several keywords: "Israel," "Israeli," "Israelis," "Palestinian," "Palestinians," and "Gaza," within the context of certain media outlets, namely CNN, Fox News, and MSNBC. This line shows spikes in mentions with the highest peak reaching just above 6K, occurring early in the time series, and then it sharply declines. There are several smaller peaks throughout the time period, indicating periodic increases in mentions.

The second line, in a darker shade of blue, represents the frequency of mentions of keywords related to Ukraine, including "Ukraine," "Ukrainian," "Kyiv," or "Kiev," combined with the same media outlets. This line exhibits more volatility with numerous peaks and troughs. The highest peak surpasses 7K mentions and occurs in the latter half of the time series, with other notable spikes and activity throughout.

Both lines show fluctuations over time, which could reflect the coverage of events related to these regions in the specified media outlets. The background is white, and the graph is labeled in the top right corner with the keywords and media outlets mentioned. The labels for the lines correspond to the color of the line they describe, with the lighter blue label for the Israel/Palestine related terms, and the darker blue label for the Ukraine related terms. The graph does not have a legend, instead, the information is directly labeled on the graph lines.

What if we apply our most advanced prompt to this modified graph? Does this finally fix the problem?

Describe each time series in this timeline in detail, telling me what they represent and what they show, including any peaks and the time they occurred and their overall pattern. Explain yourself and your reasoning step-by-step.

Even after removing the series labels and using our most advanced prompt, which incorporates all of the standard prompt best practices and OpenAI's prompt engineering recommendations, it still fails. It reports that the Israeli peak occurred at the beginning of the graph and that no other period comes close, while for Ukraine it reports that the highest peak occurs somewhere in late 2022 and 2023, with the majority of major spikes in that same period and a smaller but measurable peak in mid-2022. This is therefore the worst of all of the outputs, completely failing on all counts:

The image shows two time series represented by lines on a graph, each tracking the frequency of mentions of specific keywords on major news stations CNN, Fox News, and MSNBC over a nearly two-year period.

**Light Blue Line: Israel/Palestine Related Keywords**
– **Keywords Tracked**: Israel, Israeli, Israelis, Palestinian, Palestinians, Gaza
– **Initial Peak**: The first notable peak occurs at the beginning of the timeline, around early 2022, where mentions shoot up to just above 6,000 before quickly declining.
– **Pattern**: After the initial peak, the line shows a sharp decline, settling to a baseline frequency of mentions that fluctuates but generally remains well below 2,000.
– **Subsequent Activity**: There are multiple smaller peaks throughout the timeline, indicating periods of increased news coverage. These peaks are much lower than the initial one, suggesting that while the topic remains in the news, it does not reach the same level of coverage intensity as it did at the beginning of 2022.
– **End of Timeline**: Towards the end of 2023, there is a modest increase, but it does not reach the heights of the initial spike.

**Dark Blue Line: Ukraine Related Keywords**
– **Keywords Tracked**: Ukraine, Ukrainian, Kyiv, Kiev
– **Overall Pattern**: The dark blue line is more volatile, with numerous peaks and troughs throughout the timeline, reflecting periods of varying intensity in news coverage.
– **Significant Peaks**:
– **Mid-2022**: A notable peak occurs, though it is not as high as the largest peaks. This suggests a period of increased news coverage, possibly due to a significant event.
– **Late 2022 and 2023**: There are several significant spikes, with the highest surpassing 7,000 mentions. These spikes are particularly prominent, indicating major news events related to Ukraine that caught the attention of these media outlets.
– **End of Timeline**: The frequency of mentions declines towards the end of 2023 but remains higher than the baseline level for the Israel/Palestine-related coverage, suggesting ongoing news events of interest related to Ukraine.

**Analysis and Reasoning:**
– The peaks in the graph likely correlate with specific events of high interest or importance related to the regions and topics mentioned. For instance, the early 2022 peak in the Israel/Palestine related line might coincide with significant political or military events in that region.
– The more frequent and higher peaks on the Ukraine line suggest a series of events that consistently drew media attention, which could be related to ongoing conflicts or other significant developments in Ukraine.
– The overall higher frequency of peaks in the Ukraine line as compared to the Israel/Palestine line suggests that, during this period, Ukraine was a more consistent focus in the news cycle than Israel/Palestine.
– The reason for the detailed analysis of each peak and trough would require additional context not provided by the graph alone, such as correlating the dates of the peaks with historical events. The graph provides a quantitative measure of media attention but not the qualitative reasons behind it.

To fully interpret the meaning behind these peaks and the overall patterns, one would need to cross-reference these dates with historical events or news archives to understand what specific events led to these spikes in media coverage.

How about an even more advanced prompt that breaks things down step-by-step for the model?

Count how many time series there are in this graph. Describe what each time series represents. Then tell me the date where each time series has its highest peak and its value. Then tell me any other major dates where it peaks or collapses and the value there. Then tell me the overall pattern/trend of each series. Think step-by-step and explain your reasoning.

Despite the addition of nearly every major prompt engineering recommendation for LMMs overall and for GPT-4 specifically, including those by OpenAI itself custom designed for the model, it fails miserably once again, this time performing even worse than before:

There are two time series represented in the graph:

**1. Light Blue Line: Israel/Palestine Related Keywords**
– **What It Represents**: This line tracks the frequency of mentions of terms related to Israel and Palestine (Israel, Israeli, Israelis, Palestinian, Palestinians, Gaza) on major news stations (CNN, Fox News, and MSNBC).
– **Highest Peak**: The highest peak for this line occurs at the start of the timeline, around January 2022. The exact date isn’t visible, but it appears to be in early January, with a value just above 6,000 mentions.
– **Other Significant Points**:
– After the initial peak, the line shows a sharp decline to below 2,000 mentions, suggesting a significant drop in media coverage or interest.
– The line has multiple smaller peaks at various points, but without specific dates and values visible on the graph, they cannot be quantified precisely.
– **Overall Pattern/Trend**: The trend for this line shows an initial burst of coverage, followed by a decline and then a pattern of lower-level fluctuations. This could indicate an event of significant interest related to Israel/Palestine at the beginning of 2022, followed by a return to more normal levels of news coverage.

**2. Dark Blue Line: Ukraine Related Keywords**
– **What It Represents**: This line tracks the frequency of mentions of terms related to Ukraine (Ukraine, Ukrainian, Kyiv, Kiev) on the same news stations.
– **Highest Peak**: The highest peak is in the latter half of the time series, which seems to be late in 2023. The exact date is not clear, but the value surpasses 7,000 mentions.
– **Other Significant Points**:
– There are several notable spikes in coverage throughout the timeline. For instance, there's a peak that appears to be in late 2022 or early 2023, with a value around 4,000 mentions.
– After the highest peak, the line shows a sharp decline, suggesting a drop in coverage.
– **Overall Pattern/Trend**: The Ukraine-related coverage is characterized by volatility with many peaks, indicating multiple events causing spikes in news coverage. The highest peak late in 2023 suggests a particularly newsworthy event or development related to Ukraine.

**Analysis and Reasoning**:
– The graph indicates two distinct patterns of media coverage for the two sets of keywords. The Israel/Palestine-related coverage has an initial spike, indicating a significant event or series of events that caused a surge in media attention, followed by less frequent and lower-level coverage.
– The Ukraine-related coverage shows more frequent and higher peaks, suggesting ongoing events that consistently drew media attention over the period.
– The specific reasons for the peaks would relate to political, social, or military events, but without additional context, the graph only informs us about the frequency of media mentions, not the causes.

To identify the exact dates and values of each peak or collapse, and to understand the causes behind these patterns, one would need to look at the raw data used to create the graph or have access to news archives from the period in question. The graph provides an overview but does not allow for precise analysis without additional information.

GPT-4's Data Analysis Capabilities (Python Writing & Executing)

Clearly, the multimodal capabilities of GPT-4 to visually examine a timeline leave much to be desired and are far too brittle to be useable in the real world, yielding false results in every single example we tested here. Yet, GPT-4 has another trick up its sleeve: autonomous code writing and execution of Python for guided data analysis. Rather than attempt to analyze the data itself, GPT-4 merely translates the prompt into Python code, executes it on the data and then summarizes the results. Does that yield better results?

Instead of the TV Explorer's natively generated timeline visualization, we'll download the table underlying the chart via the CSV export feature, which yields:

We drag-drop this onto GPT-4 (this required multiple attempts, as the first dozen or so times it errored and reported that it does not support CSV files) and use the following prompt:

Count how many time series there are in the spreadsheet. Describe what each time series represents. Then tell me the date where each time series has its highest peak and its value. Then tell me any other major dates where it peaks or collapses and the value there. Then tell me the overall pattern/trend of each series. Think step-by-step and explain your reasoning.

Rather than attempt to analyze the spreadsheet itself, GPT-4 takes the prompt and translates it into actual Python code, which it then seamlessly executes against the spreadsheet:

# Identify the number of time series and their descriptions
time_series_count = len(data.columns) - 1  # Subtracting 1 for the 'Date' column
time_series_descriptions = data.columns[1:]

# Describing each time series
description_of_each_series = {col: f"Time series '{col}' represents the frequency of terms related to a specific geopolitical topic in media outlets CNN, FOXNEWS, and MSNBC." for col in time_series_descriptions}

# Finding the highest peak date and value for each time series
peak_dates_and_values = {}
major_peaks_and_collapses = {}

for series in time_series_descriptions:
    # Finding the peak date and value
    peak_value = data[series].max()
    peak_date = data[data[series] == peak_value]['Date'].iloc[0]
    peak_dates_and_values[series] = {'Peak Date': peak_date, 'Peak Value': peak_value}

    # Finding other major peaks and collapses
    # Here, we consider major peaks and collapses as the top 5 highest and lowest points, excluding the absolute peak
    sorted_series = data.sort_values(by=series, ascending=False)
    top_peaks = sorted_series.head(6)
    top_collapses = sorted_series.tail(5)
    major_peaks_and_collapses[series] = {
        'Other Major Peaks': top_peaks[['Date', series]].iloc[1:].to_dict(orient='records'),
        'Major Collapses': top_collapses[['Date', series]].to_dict(orient='records')
    }

# Analyzing the overall trend of each series
# We will use a simple approach of comparing the start, midpoint, and end values to determine if the trend is increasing, decreasing, or stable
overall_trends = {}
mid_index = len(data) // 2

for series in time_series_descriptions:
    start_value = data[series].iloc[0]
    mid_value = data[series].iloc[mid_index]
    end_value = data[series].iloc[-1]

    if start_value < end_value and mid_value < end_value:
        trend = 'Increasing'
    elif start_value > end_value and mid_value > end_value:
        trend = 'Decreasing'
    else:
        trend = 'Stable or Fluctuating'
    
    overall_trends[series] = trend

description_of_each_series, peak_dates_and_values, major_peaks_and_collapses, overall_trends

This yields the following output:

({'(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': "Time series '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined' represents the frequency of terms related to a specific geopolitical topic in media outlets CNN, FOXNEWS, and MSNBC.",
  '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': "Time series '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined' represents the frequency of terms related to a specific geopolitical topic in media outlets CNN, FOXNEWS, and MSNBC."},
 {'(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': {'Peak Date': '10-08-2023',
   'Peak Value': 6494},
  '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': {'Peak Date': '02-24-2022',
   'Peak Value': 4679}},
 {'(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': {'Other Major Peaks': [{'Date': '10-09-2023',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 6328},
    {'Date': '10-14-2023',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 5959},
    {'Date': '10-15-2023',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 5773},
    {'Date': '10-13-2023',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 5729},
    {'Date': '10-10-2023',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 5571}],
   'Major Collapses': [{'Date': '06-16-2022',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 0},
    {'Date': '08-03-2022',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 0},
    {'Date': '08-04-2022',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 0},
    {'Date': '06-08-2022',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 0},
    {'Date': '08-18-2022',
     '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 0}]},
  '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': {'Other Major Peaks': [{'Date': '02-26-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 4472},
    {'Date': '02-25-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 4338},
    {'Date': '03-06-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 3743},
    {'Date': '03-01-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 3722},
    {'Date': '03-10-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 3712}],
   'Major Collapses': [{'Date': '11-24-2023',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 39},
    {'Date': '09-29-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 35},
    {'Date': '01-07-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 29},
    {'Date': '01-06-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 4},
    {'Date': '01-05-2022',
     '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 3}]}},
 {'(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 'Increasing',
  '(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined': 'Stable or Fluctuating'})

GPT-4 then examines this output and produces the following textual summary. Miraculously, for the first time in this entire series of tests, we get the correct results:

The spreadsheet contains two time series. Here's an analysis of each:

1. **Time Series: Israel/Palestinian in Media Outlets**
– **Description**: This series represents the frequency of terms related to Israel, Palestinians, and Gaza in CNN, FOXNEWS, and MSNBC.
– **Highest Peak**:
– **Date**: October 8, 2023
– **Value**: 6494 mentions
– **Other Major Peaks**:
– Oct 9, 2023: 6328 mentions
– Oct 14, 2023: 5959 mentions
– Oct 15, 2023: 5773 mentions
– Oct 13, 2023: 5729 mentions
– **Major Collapses**:
– Dates ranging from Aug 18, 2022, to Nov 24, 2023, with values dropping to 0 mentions.
– **Overall Trend**: Increasing trend over time.

2. **Time Series: Ukraine in Media Outlets**
– **Description**: This series tracks the frequency of terms related to Ukraine, Kyiv, and Kiev in the same media outlets.
– **Highest Peak**:
– **Date**: February 24, 2022
– **Value**: 4679 mentions
– **Other Major Peaks**:
– Feb 26, 2022: 4472 mentions
– Feb 25, 2022: 4338 mentions
– Mar 6, 2022: 3743 mentions
– Mar 1, 2022: 3722 mentions
– **Major Collapses**:
– Dates ranging from Jan 5, 2022, to Nov 24, 2023, with values as low as 3 mentions.
– **Overall Trend**: Stable or fluctuating, without a clear increasing or decreasing trend.

The analysis suggests that the media coverage of these geopolitical topics has varied significantly over time, with certain dates experiencing particularly high or low coverage. The Israel/Palestinian topic shows an increasing trend, possibly indicating escalating conflicts or media focus over time. In contrast, the Ukraine topic displays a more stable or fluctuating trend, reflecting variable media attention.
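
For readers who want to check lists like these by hand, the peak and collapse rankings are simple to reproduce in pandas. The following is a minimal sketch under our own assumptions, not the code GPT-4 generated: the toy values and the `mentions` column name are invented for illustration, and we assume that "peaks" and "collapses" simply mean the largest and smallest daily values.

```python
import pandas as pd

# Invented toy data standing in for one column of the spreadsheet
data = pd.DataFrame({
    "Date": pd.date_range("2023-10-01", periods=10, freq="D"),
    "mentions": [0, 3, 40, 55, 30, 0, 120, 6000, 5800, 4100],
})
series = data.set_index("Date")["mentions"]

# Rank the days by value: the first row is the highest peak,
# the remaining rows are the "other major peaks"
ranked = series.nlargest(4)
highest_peak = ranked.iloc[:1]
other_peaks = ranked.iloc[1:]

# The lowest-valued days play the role of the "major collapses"
collapses = series.nsmallest(3)

print(highest_peak)
print(other_peaks)
print(collapses)
```

Ranking by raw value treats every low day the same way, so a day with zero mentions and a day with a handful of mentions both land in the collapse list without any indication of why the count was low.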

What about a simple question of which series received the most attention?

Which series in the spreadsheet received the most attention? Which had the highest peak?

Once again, this yields the correct results:

The time series related to Ukraine (including terms like "Ukraine", "Ukrainian", "Kyiv", "Kiev") in media outlets CNN, FOXNEWS, and MSNBC received the most overall attention, with a total of 391,469 mentions.

However, the highest peak was observed in the series related to Israel/Palestinian (including terms like "Israel", "Israeli", "Palestinians", "Gaza"), which had its highest peak at 6,494 mentions on a single day.

It also provides the Python code it used:

# Finding the series with the highest overall attention and the highest peak
total_mentions_per_series = data.sum()[1:]  # Sum of mentions for each series, excluding the Date column
most_attention_series = total_mentions_per_series.idxmax()
most_attention_value = total_mentions_per_series.max()

highest_peak_series = max(peak_dates_and_values, key=lambda k: peak_dates_and_values[k]['Peak Value'])
highest_peak_value = peak_dates_and_values[highest_peak_series]['Peak Value']

most_attention_series, most_attention_value, highest_peak_series, highest_peak_value

And the output that code provided back to GPT-4:

('(Ukraine OR Ukrainian OR Kyiv OR Kiev) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined',
 391469,
 '(Israel OR Israeli OR Israelis OR Palestinian OR Palestinians OR Gaza) (station:CNN OR station:FOXNEWS OR station:MSNBC): Combined',
 6494)
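
The distinction GPT-4 draws here, between the series with the largest total and the series with the highest single-day peak, can be verified independently. Below is a minimal sketch with invented numbers and shortened, hypothetical column names (`Gaza`, `Ukraine`). It also drops the `Date` column by name, since the positional slice `[1:]` in the code above appears to rely on `Date` being the first column.

```python
import pandas as pd

# Invented toy data; the real spreadsheet has one row per day
data = pd.DataFrame({
    "Date": pd.to_datetime(["2022-02-24", "2022-02-25", "2023-10-07", "2023-10-08"]),
    "Gaza": [10, 12, 900, 6000],
    "Ukraine": [4000, 3500, 200, 150],
})

# Keep only the numeric series, removing Date by name rather than by position
counts = data.drop(columns="Date")

# Total attention: sum of daily mentions per series
totals = counts.sum()
most_attention_series = totals.idxmax()

# Highest peak: maximum single-day value per series, plus the day it occurred
peaks = counts.max()
highest_peak_series = peaks.idxmax()
peak_date = data.loc[counts[highest_peak_series].idxmax(), "Date"]

print(most_attention_series, int(totals[most_attention_series]))
print(highest_peak_series, int(peaks[highest_peak_series]), peak_date.date())
```

In this toy data, as in the real spreadsheet, the two questions have different answers: one series accumulates more mentions overall while the other has the single busiest day.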

What if we ask GPT-4 to visualize the timeline? Rather than attempting to draw the graph directly, GPT-4 writes and executes a block of Python code to render it:

Create a timeline visualizing the spreadsheet.

This yields the accurate but fairly unremarkable and unmediagenic visual:

Critically, it provides the Python code it wrote to make the graph, offering the user a template they can tweak to achieve a more desirable visual if they know Python:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Convert 'Date' column to datetime for plotting
data['Date'] = pd.to_datetime(data['Date'])

# Plotting
plt.figure(figsize=(15, 8))

# Plotting each time series
for series in time_series_descriptions:
    plt.plot(data['Date'], data[series], label=series)

# Formatting the plot
plt.title('Timeline of Media Coverage on Israel/Palestinian and Ukraine Topics')
plt.xlabel('Date')
plt.ylabel('Mentions')
plt.legend()
plt.grid(True)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=1))
plt.gcf().autofmt_xdate()  # Rotate date labels

plt.show()

What if we ask for something higher-quality?

Create a beautiful high-quality timeline visualizing the spreadsheet suitable for a magazine publication.

This isn't any better, suggesting that GPT-4 has not yet been tuned with example Python code to produce higher-quality publication-ready data visualizations:

As before, it outputs the code it used:

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for the plot
sns.set(style="whitegrid")

# Create a figure and a set of subplots
fig, ax = plt.subplots(figsize=(15, 10))

# Plotting each time series with better aesthetics
for series in time_series_descriptions:
    sns.lineplot(x=data['Date'], y=data[series], label=series, ax=ax)

# Setting titles and labels
ax.set_title('Media Coverage Timeline: Israel/Palestinian and Ukraine Topics', fontsize=20)
ax.set_xlabel('Date', fontsize=15)
ax.set_ylabel('Number of Mentions', fontsize=15)

# Formatting the date axis
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
plt.xticks(rotation=45)

# Enhancing the legend
ax.legend(title='Media Topics', title_fontsize='13', fontsize='11')

# Adjusting layout
plt.tight_layout()

# Show the plot
plt.show()
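
Since the generated code is meant to serve as a template, here is a sketch of the kind of tweaks a Python-literate user might apply to it. This is only one possible direction under our own assumptions, not a claim about what makes a chart publication-ready: the stand-in data is random, the long query strings are replaced with a hypothetical `short_labels` mapping, the lines are smoothed with a 7-day rolling average, and the date axis is thinned to one tick per quarter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data; the column names mimic the long query strings
dates = pd.date_range("2022-01-01", "2023-11-30", freq="D")
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "Date": dates,
    "(Israel OR ... OR Gaza): Combined": rng.integers(0, 200, size=len(dates)),
    "(Ukraine OR ... OR Kiev): Combined": rng.integers(0, 200, size=len(dates)),
})

# Hypothetical mapping from the long column names to short legend labels
short_labels = {
    "(Israel OR ... OR Gaza): Combined": "Gaza",
    "(Ukraine OR ... OR Kiev): Combined": "Ukraine",
}

fig, ax = plt.subplots(figsize=(12, 6))
for column, label in short_labels.items():
    # A centered 7-day average reduces day-to-day jitter in the line
    smoothed = data[column].rolling(7, center=True, min_periods=1).mean()
    ax.plot(data["Date"], smoothed, label=label, linewidth=1.8)

# One tick per quarter with compact labels instead of one per month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Remove the top and right borders and the legend frame
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.set_ylabel("Daily mentions (7-day average)")
ax.legend(frameon=False)
fig.tight_layout()
```

Calling `fig.savefig` with a filename and a `dpi` value would then write the figure to disk at print resolution.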

GPT-4's Data Analysis Capabilities + GPT-4 Interpretation Of The Results

In each of the cases above, GPT-4 wrote Python code to perform the actual analysis and merely regurgitated the results back to us. What if we ask it to actually interpret those results rather than merely return them as-is in textual form?

How about a generic question about which series is more interesting or unusual, asking it to perform some more free-form analysis?

Which time series is more interesting or unusual? Explain your reasoning step-by-step.

Unfortunately, now that GPT-4 itself is responsible for performing the analysis (rather than writing Python code to perform the analysis for it), we're back to where we were before, with it claiming that the Gaza series is increasing steadily over time and Ukraine merely moving slightly around a baseline with no major changes.

To determine which time series is more interesting or unusual, we need to consider several aspects of the data:

1. **Magnitude of Peaks and Troughs**:
– **Israel/Palestinian Series**: Exhibited a very high peak value of 6,494 mentions. Additionally, it shows periods where mentions drop to zero, indicating significant fluctuations.
– **Ukraine Series**: While having a high overall count, its highest peak was 4,679 mentions, lower than the Israel/Palestinian series. Its troughs, though notable, do not reach zero.

2. **Overall Trend**:
– **Israel/Palestinian Series**: Shows an increasing trend over time. This could indicate escalating tensions or increasing media focus, which is noteworthy.
– **Ukraine Series**: Exhibits a stable or fluctuating trend without a clear direction. This suggests consistent but varied attention over time.

3. **Consistency of Coverage**:
– **Israel/Palestinian Series**: Shows dramatic fluctuations, including periods with no coverage. This inconsistency is unusual and may point to specific events causing spikes in media attention.
– **Ukraine Series**: Appears to have more consistent coverage over time, albeit with fluctuations.

4. **Contextual Considerations**:
– **Israel/Palestinian Series**: The extreme peaks and drops may correspond to specific geopolitical events or crises, making it interesting from a news analysis perspective.
– **Ukraine Series**: While significant, the coverage seems more evenly spread, suggesting a steady stream of newsworthy events rather than sporadic, high-impact ones.

**Conclusion**: The Israel/Palestinian series is more interesting or unusual due to its higher peak, increasing trend, and dramatic fluctuations, including periods of no coverage. These characteristics suggest a more volatile media interest, potentially tied to specific, high-impact events. In contrast, the Ukraine series, while important, shows a more consistent pattern of media coverage.
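
One way to see how an "increasing trend" label can sit on top of a series that is actually flat with a late surge is to fit a line to synthetic data of that shape. The sketch below is purely illustrative: the numbers are randomly generated, and we are assuming, without knowing how the trend label was actually computed, that it came from something like a whole-period slope.

```python
import numpy as np

# Synthetic daily counts (invented): roughly flat for 640 days, then a 30-day surge
rng = np.random.default_rng(0)
flat = rng.integers(80, 120, size=640)
burst = rng.integers(3000, 6000, size=30)
series = np.concatenate([flat, burst]).astype(float)
days = np.arange(series.size)

# Slope of a straight line fitted to the entire series
slope = np.polyfit(days, series, 1)[0]

# Slope of a straight line fitted only to the days before the surge
pre_slope = np.polyfit(days[:640], series[:640], 1)[0]

print(f"whole-period slope: {slope:.3f} mentions/day")
print(f"pre-surge slope:    {pre_slope:.3f} mentions/day")
```

The whole-period slope comes out clearly positive while the pre-surge slope is close to zero, so a single summary word like "increasing" is compatible with both a steady climb and a flat line followed by a spike. Comparing the two fits is a cheap check before accepting a narrative of steady growth.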

What if we ask it to be a journalist and write the graph up as an article?

Write a brief news article publishing the findings of this graph to a general news audience.

While more verbose, the generated text merely repeats the same false findings as above that Gaza mentions steadily increase over the course of the data, while Ukraine remains constant:

**Headline:** Dramatic Fluctuations in Media Coverage: Israel/Palestinian Topics Peak, Ukraine Maintains Steady Attention

**Byline:** [Your Name], Data Analysis Correspondent

**Date:** [Today's Date]

**Body:**

In a revealing analysis of media focus, a recent study has unveiled the fluctuating patterns of media coverage on critical geopolitical issues, specifically regarding the Israel/Palestinian conflict and Ukraine. The study, which analyzed mentions in major news outlets CNN, FOXNEWS, and MSNBC, offers a unique glimpse into how media attention ebbs and flows in response to global events.

The data, spanning from early 2022 to late 2023, illustrates a stark contrast in coverage patterns. The Israel/Palestinian topic, involving key terms like Israel, Palestinian, and Gaza, displayed a highly volatile media presence. It reached an astonishing peak of 6,494 mentions in a single day, eclipsing the Ukraine series, which peaked at 4,679 mentions. This peak, coinciding with significant dates in the region, highlights moments of intense global focus.

However, it's not just the peaks that draw attention. The Israel/Palestinian series also showed periods of complete silence, with zero mentions on some days. This rollercoaster pattern of attention is indicative of the sporadic yet intense nature of media coverage on this issue. In contrast, the Ukraine series, encompassing terms like Ukraine, Kyiv, and Kiev, showed a more consistent level of media attention. Despite fluctuations, it maintained a stable presence in the news cycle, reflecting ongoing international interest in the region.

The overall trend further differentiates the two series. While the Israel/Palestinian series shows an increasing trend over time, indicating escalating tensions or growing media scrutiny, the Ukraine series demonstrates a stable or fluctuating pattern. This difference might suggest that the media's approach to the Israel/Palestinian conflict is reactive, surging around specific events, whereas coverage of Ukraine is more evenly distributed, possibly due to a continuous stream of newsworthy developments.

In conclusion, this data-driven analysis offers a fascinating lens into the world of media coverage. It underscores the impact of geopolitical events on news focus and highlights the varying nature of media attention across different global issues. As the world continues to grapple with complex international dynamics, the role of media in shaping public perception remains as crucial as ever.

**End of Article.**

(Note: Replace "[Your Name]" with the author's name and "[Today's Date]" with the actual date of publication.)