The GDELT Project

Visualizing A Decade Of Twitter's Evolution: Jan 2012 – Mar 2022

Each year we publish a "State of the Twitterverse" that looks back at how Twitter has evolved since the start of 2012, when we first began analyzing Twitter in depth. Last year's analysis examined the impacts of Covid-19 and the 2020 US presidential election on Twitter's growth. This year marks a major milestone: one full decade of Twitter's evolution in our graphs. They show how fundamentally the platform has changed over the past 10 years, especially since its growth leveled off in 2013. They also show the lasting impact of Twitter's brief 2020 election "friction" misinformation experiment, which appears to have nearly reversed its pandemic gains.

Tweets Per Day

Twitter's growth leveled off in mid-2013, then entered a period of slow decline before reversing in early 2019. The early pandemic lockdowns yielded explosive growth for Twitter almost overnight in March 2020, as daily tweet volume rose almost to its July 2013 level, just short of 500 million tweets a day. All of this growth collapsed in a single day, October 22, 2020, coinciding with Twitter's rollout of added "friction" to retweeting as part of the company's efforts to combat election misinformation. Tweet volume has since recovered slightly but has never returned to its pandemic highs, suggesting that the change, even though it was rolled back on December 16, 2020, permanently damaged Twitter's pandemic-fueled resurgence.

On a typical day today, around 300 million distinct users send around 400 million tweets.
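For readers who want to reproduce figures like these from their own tweet collections, the sketch below counts tweets and distinct users per day. It is an illustrative sketch, not GDELT's pipeline: it assumes each tweet has already been reduced to a dict with hypothetical `day` and `user_id` keys.

```python
from collections import defaultdict


def daily_volume(tweets):
    """Return {day: (tweet_count, distinct_user_count)}.

    Assumes each tweet is a dict with hypothetical keys 'day'
    (e.g. '2022-03-01') and 'user_id'.
    """
    counts = defaultdict(int)   # tweets per day
    users = defaultdict(set)    # distinct user IDs per day
    for tweet in tweets:
        counts[tweet["day"]] += 1
        users[tweet["day"]].add(tweet["user_id"])
    return {day: (counts[day], len(users[day])) for day in counts}


sample = [
    {"day": "2022-03-01", "user_id": 1},
    {"day": "2022-03-01", "user_id": 1},
    {"day": "2022-03-01", "user_id": 2},
    {"day": "2022-03-02", "user_id": 3},
]
print(daily_volume(sample))  # {'2022-03-01': (3, 2), '2022-03-02': (1, 1)}
```

At the volumes cited above, 400 million tweets from 300 million users works out to roughly 1.3 tweets per active user per day.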

Zooming into the period from January 2020 to March 2022, the pandemic rise and 2020 election fall in Twitter's tweet volume are clearly visible.

Verified Users

Only a very small fraction of Twitter users have the coveted "verified" blue checkmark beside their name, and they collectively account for just a fraction of a percent of all tweets sent each day.

However, the story changes dramatically when retweets of those verified users' tweets are included. If both tweets by verified users and all retweets by ordinary users of those verified tweets are included, the combined total is around 8-10% of the entire daily tweet volume, despite verified users making up less than 0.4% of all Twitter users. This shows the outsized impact this tiny group of users has on the overall Twitter discourse.
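A minimal sketch of this "verified reach" measure, assuming tweets in the shape of Twitter's v1.1 JSON, where `user.verified` flags the author and a retweet carries the original tweet in `retweeted_status`. The `include_retweets` switch is a hypothetical convenience for comparing the two numbers discussed above; it is not a documented part of GDELT's method.

```python
def verified_share(tweets, include_retweets=True):
    """Fraction of tweets authored by verified users, optionally also counting
    retweets (by anyone) of tweets whose original author is verified."""
    if not tweets:
        return 0.0

    def counts(tweet):
        if tweet["user"]["verified"]:
            return True
        original = tweet.get("retweeted_status")
        return include_retweets and original is not None and original["user"]["verified"]

    return sum(1 for t in tweets if counts(t)) / len(tweets)


# One verified tweet, one ordinary user's retweet of it, eight ordinary tweets.
sample = (
    [{"user": {"verified": True}}]
    + [{"user": {"verified": False}, "retweeted_status": {"user": {"verified": True}}}]
    + [{"user": {"verified": False}} for _ in range(8)]
)
print(verified_share(sample, include_retweets=False))  # 0.1
print(verified_share(sample))                          # 0.2
```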

Notably, there has been a permanent decline in the reach of verified users, from around 10% of all daily tweets towards the end of 2020 to around 8% today.

Zooming into the period from 2020 to the present, the combined percentage of the Twitterverse that was either a tweet by a verified user or a retweet of a verified tweet by an ordinary user collapsed in the aftermath of the 2020 election. On November 9, 2020, two days after Biden was called as the winner of the presidential election, retweeting of verified users dropped sharply and continued to decline through November 14. It never recovered and has remained remarkably stable in the year and a half since: the most stable it has been in the past decade.

Retweets

A decade ago, Twitter was a place to hear from users themselves, with just 20% of daily tweets being retweets. By late 2018, more than half of all tweets were retweets, as Twitter became an echo chamber of amplification. The impact of Twitter's "added friction" experiment is starkly visible, though retweet volume has recovered to just under its pre-2020 election levels.

Replying To Other Users

How often are tweets a reply to another user, suggesting a conversation? Replies fell from just over a quarter of all tweets in 2012 to a low of around 15% in 2017, climbed to a record high in 2020 and returned to their level of a decade ago by the start of this year. On a given day this year, just over a quarter of tweets are replies, suggesting that nearly three quarters of tweets are simply users shouting into the fray rather than engaging in conversation.

Tweets Mentioning Other Users

What percentage of all tweets mention another Twitter user in any way? From around 55% of all tweets in 2012, mentions increased steadily until 2017, then accelerated rapidly until the start of the pandemic, when they leveled off. Mentions collapsed with Twitter's added retweet "friction" but had recovered by January 2021; today they account for just over three-quarters of all tweets. Given that replies account for just over 25% of all tweets, this suggests that the majority of tweets mentioning another user are the Twitterverse equivalent of shouting another person's name in the hope they hear you, rather than a dialogue between users.
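The inference in the last sentence can be made explicit with a simple bound. Writing m for the share of tweets that mention another user and r for the share that are replies, and plugging in the approximate values above (m ≈ 0.75, r ≈ 0.25), at most r/m of mentioning tweets can be replies, whatever the overlap between the two groups:

```latex
\begin{aligned}
P(\text{reply} \mid \text{mention}) &= \frac{P(\text{reply} \cap \text{mention})}{m} \le \frac{r}{m}, \\
P(\text{not a reply} \mid \text{mention}) &\ge 1 - \frac{r}{m} \approx 1 - \frac{0.25}{0.75} = \frac{2}{3}.
\end{aligned}
```

So roughly two thirds or more of mentioning tweets are not replies. Because both inputs are only "just over" the stated values, this is an approximate bound rather than an exact figure.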

Comparing Retweets, Mentions And Replies

The graph below overlays the three, showing their convergent and divergent trajectories.
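One way to compute the three overlaid series is sketched below, assuming tweets in Twitter's v1.1 JSON layout: a retweet carries `retweeted_status`, a reply has a non-null `in_reply_to_status_id`, and mentioned accounts appear in `entities.user_mentions`. The categories overlap, so the shares need not sum to 100%; in this format a retweet's `user_mentions` typically includes the retweeted account, so retweets usually count as mentions too.

```python
def interaction_shares(tweets):
    """Fraction of tweets that are retweets, replies, and that mention a user."""
    n = len(tweets)
    if n == 0:
        return {"retweet": 0.0, "reply": 0.0, "mention": 0.0}
    retweets = sum(1 for t in tweets if "retweeted_status" in t)
    replies = sum(1 for t in tweets if t.get("in_reply_to_status_id") is not None)
    # An empty or missing user_mentions list counts as "no mention".
    mentions = sum(1 for t in tweets if t.get("entities", {}).get("user_mentions"))
    return {"retweet": retweets / n, "reply": replies / n, "mention": mentions / n}


sample = [
    {"retweeted_status": {}, "entities": {"user_mentions": [{"screen_name": "a"}]}},
    {"in_reply_to_status_id": 123, "entities": {"user_mentions": [{"screen_name": "b"}]}},
    {"in_reply_to_status_id": None, "entities": {"user_mentions": [{"screen_name": "c"}]}},
    {"entities": {"user_mentions": []}},
]
print(interaction_shares(sample))  # {'retweet': 0.25, 'reply': 0.25, 'mention': 0.75}
```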

Tweeting Of Links

Twitter quickly rose to prominence as a major way to draw attention to information published elsewhere through the sharing of links. Link-sharing peaked in 2017, then declined through late 2020 before recovering through late 2021, and has since leveled off at around 40% of all daily tweets. This suggests that a majority of daily tweets are self-contained, communicating their intended message without requiring the user to visit an external site.

Hashtag Use

Hashtags were once heralded as the soon-to-be ubiquitous way the digital world would be organized. While popular on other platforms, they never took off on Twitter, where they appear in only around 17% of all tweets today.

The Total Daily Size Of The Twitterverse In Characters & Bytes

Just how big is the Twitterverse?

The size of the average tweet in characters and bytes (the two differ for multibyte character sets), after removing retweets, username references and hyperlinks, has remained relatively stable over time. From late 2018 through late 2020, average bytes per tweet dropped sharply alongside a drop in average character length, suggesting both that tweets were becoming shorter and that users were shifting away from multibyte emoji, which count as a single character but occupy multiple bytes. Since late 2020, both average character length and average byte size have increased, with byte size increasing dramatically, suggesting either more tweeting in languages written with multibyte characters or increased use of multibyte characters like emoji.
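A rough sketch of the character-versus-byte measurement, under stated assumptions: the regular expressions for hyperlinks and username references are simplified stand-ins, whitespace is collapsed after removal, retweets are flagged by a hypothetical `is_retweet` key, and bytes are counted in UTF-8. It also returns the daily byte total of the kind plotted in the timeline below.

```python
import re

# Simplified, hypothetical patterns for hyperlinks and @username references.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_text(text):
    """Strip links and @mentions, then collapse leftover whitespace."""
    text = URL_RE.sub("", text)  # remove links first, so any '@' inside a URL goes too
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


def size_stats(tweets):
    """Return (mean characters, mean UTF-8 bytes, total bytes) over non-retweets."""
    texts = [clean_text(t["text"]) for t in tweets if not t.get("is_retweet")]
    if not texts:
        return 0.0, 0.0, 0
    chars = [len(s) for s in texts]                   # Unicode code points
    nbytes = [len(s.encode("utf-8")) for s in texts]  # UTF-8 bytes
    return sum(chars) / len(texts), sum(nbytes) / len(texts), sum(nbytes)


tweets = [
    {"text": "Hello @gdelt 😀 https://t.co/abc"},
    {"text": "RT @someone: copied text", "is_retweet": True},
    {"text": "Plain tweet"},
]
print(clean_text(tweets[0]["text"]))  # Hello 😀
print(size_stats(tweets))             # (9.0, 10.5, 21)
```

Python's `len` counts Unicode code points, so the single emoji above counts as one character but four UTF-8 bytes. Emoji built from several code points (skin-tone or flag sequences, for instance) would count as more than one character under this sketch.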

What about the sum total of the entire Twitterverse?

The timeline below shows the total number of bytes of content posted to Twitter each day after removing retweets, username references and hyperlinks. On a typical day in 2022, just 12.8GB of text is posted to Twitter, around the same as just over a decade ago in early 2012. Even with the record-high number of tweets during the pandemic, the platform hovered at just over 15GB of novel content per day, around its November 2015 level and far below its mid-2013 peak, suggesting that pandemic-era tweets were shorter and consisted more of retweets, user mentions and hyperlinks.

Notably, while the total number of daily tweets at the start of the pandemic nearly recovered to its mid-2013 level, the total volume of novel tweeted content each day is far lower than it was in 2013. This is because retweets, which are excluded from these byte totals, constitute a far higher portion of Twitter's daily volume than they did at its previous peak.

Importantly, while Twitter is often held up as a quintessential "big data" dataset, the total volume of novel text published on it each day is relatively small compared with other textual datasets.

The Dominance Of English Twitter

What percentage of all tweets are in English? There was a sharp decline from 2012 through early 2015, a resurgence through early 2016, and then a stable, very slow decrease ever since, with around 43% of tweets today in English. The pandemic appears to have led to a noticeable increase in English tweets, though even this increase only returns it to early 2017 levels. Note that these results were computed using CLD2 and exclude cases where no language could be determined (such as a tweet that contains only a hyperlink).
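A minimal sketch of the English-share calculation, assuming each tweet's text has already been run through a language detector such as CLD2 and reduced to a language code. CLD2 reports undetermined text as "un"; this sketch also treats a missing code (`None`) as undetermined, and excludes both from the denominator, in line with the note above.

```python
def english_share(lang_codes):
    """Share of English among tweets whose language could be determined."""
    known = [code for code in lang_codes if code and code != "un"]
    if not known:
        return 0.0
    return sum(1 for code in known if code == "en") / len(known)


codes = ["en", "es", "un", "en", "ja", None]
print(english_share(codes))  # 0.5
```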

Average Age Of Twitter's User Accounts

Given that Twitter is not growing, its flat user volume could be explained either by little change in its user base since its early days or by rapid churn in which older users are constantly replaced by newer ones. The timeline below shows that the average age of tweeting accounts grew linearly through mid-2018 but leveled off through 2020, suggesting an influx of new users over that period. The pandemic era shows a sharp decline in average account age, suggesting that a much larger number of newer users have been tweeting.

Median account age has remained stable since the start of 2021, suggesting a steady replacement rate of new users.

What about the median account creation year over time? Again, these numbers suggest a newer cohort of users have begun tweeting more heavily during the pandemic.
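The three account-age measures can be sketched as follows. This assumes a hypothetical per-user weighting (one creation date per distinct account tweeting that day); the post does not say whether its averages are taken per user or per tweet.

```python
from datetime import date
from statistics import mean, median


def account_age_stats(day, creation_dates):
    """Return (mean age in days, median age in days, median creation year).

    Assumes creation_dates holds one account-creation date per distinct user
    who tweeted on `day` (a hypothetical per-user weighting).
    """
    ages = [(day - created).days for created in creation_dates]
    years = [created.year for created in creation_dates]
    return mean(ages), median(ages), median(years)


# Two newer accounts and one from 2012, all tweeting on 1 March 2022.
created = [date(2022, 2, 1), date(2021, 3, 1), date(2012, 3, 1)]
avg_age, med_age, med_year = account_age_stats(date(2022, 3, 1), created)
print(med_age, med_year)  # 365 2021
```

The single 2012 account pulls the mean age up to about 1,348 days, while the median age (365 days) and median creation year (2021) track the newer cohort.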

The Geography Of Twitter

Finally, what about the geography of Twitter?

What percentage of all tweets and Twitter users provide a non-null "Location" field? This number has continued to decline since mid-2018, falling to around 51% today from around 70% in 2012 and just over 60% in 2018. Given that the location field is simply a user-provided text box that is not validated and can contain any value, including fictional locations like Rivendell, this decline suggests that Twitter's users see location as playing an ever smaller role.

What percentage of all tweets were geocoded with a GPS coordinate, a Place location or GPS coordinates in the Location field, and are thus precisely mappable using verified coordinates? (Geocoding the user-provided Location field increases the number of mappable tweets, but since this textual location is user-definable, it is not sensor-verified. Twitter itself cautioned in 2009 that “since anything can be written in this field, it’s interesting but not very dependable.”) The percentage continues to collapse, suggesting that the era of "Mappable Twitter" is drawing to a close.
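The three-way test described above might look roughly like the sketch below. It assumes v1.1-style fields (`coordinates` for an exact GPS point, `place` for a tagged Place, and `user.location` for the free-text field). The regular expression for spotting a "lat,lon" pair in the Location text is a hypothetical heuristic, not the rule GDELT used.

```python
import re

# Hypothetical pattern for a decimal "lat,lon" pair typed into the free-text
# Location field; the lookbehind stops it matching inside a longer number.
LATLON_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def location_field_coords(location):
    """Return (lat, lon) if the Location text contains a valid pair, else None."""
    if not location:
        return None
    match = LATLON_RE.search(location)
    if not match:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    return None


def is_mappable(tweet):
    """GPS point, Place, or coordinates written into the user's Location field."""
    if tweet.get("coordinates") or tweet.get("place"):
        return True
    return location_field_coords(tweet.get("user", {}).get("location")) is not None


print(location_field_coords("ÜT: 40.7128,-74.0060"))   # (40.7128, -74.006)
print(is_mappable({"user": {"location": "Rivendell"}}))  # False
```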

The density of such coordinates fell especially sharply during the pandemic, dropping more than a full percentage point since the start of lockdowns, which suggests that users disabled location tracking when tweeting from home. A sharp drop in January 2022 appears to have been temporary, but today only around 0.65% of tweets are directly mappable.

The numbers above reflect a combination of centroid and GPS-tagged tweets. Centroid-level "Place" locations merely report the fixed centroid of an entire city, so every tweet labeled as "New York City" is assigned the same centroid coordinate. Billions of tweets that all report the same New York City centroid make fine-grained mapping futile. In contrast, precise GPS coordinates yield the beautiful and intricate maps we've come to associate with Twitter, tracing transportation corridors and city life.
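To see why centroids defeat fine-grained mapping, the sketch below resolves each tweet to a single point, preferring an exact GPS point and otherwise taking the midpoint of the Place's bounding box. It assumes v1.1-style `coordinates` and `place.bounding_box` fields; using the bounding-box midpoint as the "centroid" is an assumption of this sketch, and the box itself is made up.

```python
from collections import Counter


def place_centroid(place):
    """Midpoint (lon, lat) of a v1.1-style Place bounding box."""
    ring = place["bounding_box"]["coordinates"][0]  # list of [lon, lat] corners
    lons = [corner[0] for corner in ring]
    lats = [corner[1] for corner in ring]
    return (min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2


def tweet_point(tweet):
    """Exact GPS point if present, otherwise the Place centroid, else None."""
    if tweet.get("coordinates"):
        lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order: lon, lat
        return lon, lat
    if tweet.get("place"):
        return place_centroid(tweet["place"])
    return None


# Made-up "city" box: every tweet tagged with it lands on the same point.
city = {"bounding_box": {"type": "Polygon", "coordinates": [[
    [-74.0, 40.0], [-74.0, 41.0], [-73.0, 41.0], [-73.0, 40.0]]]}}
tweets = [{"place": city} for _ in range(3)] + [
    {"coordinates": {"type": "Point", "coordinates": [-73.98, 40.75]}},
    {"coordinates": {"type": "Point", "coordinates": [-73.95, 40.78]}},
]
points = Counter(tweet_point(t) for t in tweets)
print(points.most_common(1))  # [((-73.5, 40.5), 3)]
```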

What percentage of tweets have included GPS coordinates over time? By and large, the era of precision Twitter mapping has come to an end. Today around 0.08% of tweets have GPS coordinates, with a visible drop-off as Covid-19 lockdowns began. Such tweets continue to decline slowly and steadily as a percentage of the Twitterverse, suggesting that precision Twitter maps become less and less populated each day.

Statuses

Every time a user tweets, the total number of posts ("statuses") they have sent up until that point is recorded. This offers a glimpse at the "prolificness" of users over time. The daily mean status count of users tweeting each day levels off in late 2015, then becomes more variable and bursty from late 2017 onward.

Mean status counts are easily overwhelmed by highly prolific users, so the median offers a better view of user-scale trends. It shows a rise through mid-2017, a decline from late 2019 that turns into a collapse from March 2020 through mid-May 2020, a rise through mid-2021 and a decline since the start of this year.
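A toy example with made-up numbers shows why the mean is a poor summary here: a single extremely prolific account drags the mean far above what a typical user looks like, while the median stays with the middle user.

```python
from statistics import mean, median

# Made-up status counts for five users tweeting on the same day; the last is a
# hypothetical extremely prolific account.
status_counts = [120, 450, 800, 2_300, 5_000_000]

print(mean(status_counts))    # pulled to about a million by one account
print(median(status_counts))  # 800, the middle user
```

The same reasoning applies to the follower counts below, where a handful of very highly followed accounts can dominate the mean.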

Followers

The average daily follower count of all users tweeting each day rises through late 2017, declines through late 2020 and then rises again.

Since mean follower counts are easily overwhelmed by highly followed users, the median daily follower count is more representative: it shows an increase through mid-2015, a decline through mid-2020, a rise through late 2020 and a decline to the present.

Conclusion

For more detail on the methodology and an explanation of the implications of each of the graphs above, see the original study from March 2019: Visualizing Seven Years Of Twitter's Evolution: 2012-2018.