Many recent papers have shown that social media has changed society, but the power of Twitter goes far beyond its impact on its users. This column reports on a new research project, relying on nearly two billion tweets and an innovative empirical approach, which shows that not only does Twitter set the agenda of media coverage in a quantitatively meaningful way, it also influences mainstream media due to short-term considerations generated by advertising revenue-bearing clicks. The findings also suggest that journalists’ reliance on Twitter might distort the information they produce compared to what citizens actually prefer.
In April 2022, tech billionaire Elon Musk tried to buy Twitter, saying the social media company needs to be transformed as a private company. Notably, the founder of PayPal, Tesla, and SpaceX argues that he wants to restore free speech on the platform. Many have since pointed out – rightly – that Musk’s track record with free speech is problematic, to say the least. Yet, there is another reason why Musk’s acquisition of Twitter may put democracy at risk: by controlling the platform, the self-described “free speech absolutist” will also influence the mainstream media agenda.
Many recent papers have shown that social media has changed society (e.g. Fujiwara et al. 2021, Levy 2021). But the power of Twitter goes far beyond its impact on its users. In a new research project, relying on nearly two billion tweets and an innovative empirical approach, we quantify what many long suspected – that Twitter affects publishers’ production and editorial decisions (Cagé et al. 2022).
To do so, we proceed in three steps. First, we collect a representative sample of all the tweets produced in French between August 2018 and July 2019 and combine it with the content published online by all the mainstream media outlets (encompassing newspapers, television channels, radio stations, pure online media, and news agencies’ dispatches). Our dataset, which contains around 1.8 billion tweets, encompasses around 70% of all the tweets in French (including retweets) during this time period. Figure 1 plots the daily distribution of the number of tweets.
Figure 1 Daily distribution of the number of tweets in the sample
Notes: The figure plots the daily number of tweets included in our dataset. The red line plots all the tweets, the blue dotted line shows these tweets once we apply the filter, and the green dashed line plots only the original tweets. Time period is 18 June 2018 – 10 August 2019. The few days without information correspond to rare occasions when the server crashed and we were thus unable to capture the tweets in real time.
For each of these tweets, we collect information on its ‘success’ on Twitter (number of likes, comments, etc.), as well as information on the characteristics of the user at the time of the tweet (e.g. their number of followers). To construct this unique dataset, we combined the Sample and Filter Twitter Application Programming Interfaces (APIs) with a set of selected keywords. Figure 2 summarises our data collection setup.
Figure 2 Diagram of our experimental setup to select the best tweet collection method
Second, we develop novel algorithms to identify all the ‘news stories’ covered both on social and traditional media. An event here is a cluster of documents (tweets and media articles) that discuss the same news story. So, for example, all the documents (tweets and media articles) discussing the Hokkaido Eastern Iburi earthquake on 6 September 2018 will be classified as part of the same event. Our algorithms detect events by grouping documents that share sufficient semantic similarity. In a nutshell, for Twitter, our approach consists in modelling the event detection problem as a dynamic clustering problem, using a ‘first story detection’ (FSD) algorithm (see Mazoyer et al. 2022 for more details). To detect the news events among the stories published online by traditional media outlets, we follow Cagé et al. (2020): we describe each news article by a semantic vector (using TF-IDF) and use the cosine distance to measure the semantic similarity between articles. Combined with temporal constraints, this similarity measure allows us to cluster the articles into events. Finally, to generate the intersection between social media events and mainstream media events, we rely on the Louvain community detection algorithm (Blondel et al. 2008), as illustrated in Figure 3.
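To make the clustering logic concrete, here is a minimal sketch of threshold-based first story detection over TF-IDF vectors with a time window. It is an illustration under simplifying assumptions, not the algorithm used in the project: the function names, the whitespace tokeniser, the similarity threshold of 0.2, the 24-hour window and the toy headlines are all hypothetical, and the TF-IDF weights are computed once on the whole toy corpus rather than updated as the stream arrives.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_events(docs, timestamps, threshold=0.2, window=24.0):
    """Assign an event id to each document, processing them in time order.

    A document joins the event of its most similar earlier document, provided
    that document is at most `window` hours older and the similarity reaches
    `threshold`. Otherwise it opens a new event (a "first story").
    """
    vectors = tfidf_vectors(docs)
    order = sorted(range(len(docs)), key=lambda i: timestamps[i])
    labels = {}
    next_event = 0
    for i in order:
        best_sim, best_j = 0.0, None
        for j in labels:  # documents already processed, i.e. earlier ones
            if timestamps[i] - timestamps[j] > window:
                continue  # temporal constraint: too old to be the same story
            sim = cosine(vectors[i], vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim >= threshold:
            labels[i] = labels[best_j]
        else:
            labels[i] = next_event
            next_event += 1
    return [labels[i] for i in range(len(docs))]


# Toy headlines (invented for illustration) with timestamps in hours.
docs = [
    "earthquake hits hokkaido japan",
    "hokkaido earthquake cuts power in japan",
    "central bank raises interest rates",
    "hokkaido earthquake rescue teams deployed",
    "central bank raises interest rates again",
]
hours = [0.0, 2.0, 3.0, 5.0, 40.0]

print(detect_events(docs, hours))  # → [0, 0, 1, 0, 2]
```

In this toy run the last headline is almost identical to the third one but arrives 37 hours later, so the time window places it in a new event. Widening the window merges the two, which shows how the temporal constraint shapes the clusters.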
We identify 3,992 joint events, i.e. events that are covered both on social and on traditional media, out of which 3,904 originate first on Twitter.
Third, we rely on the structure of the social media network – and in particular, on the centrality of its users – to isolate ‘exogenous’ shocks to the popularity of the stories on Twitter (measured by the number of tweets, retweets, likes, etc.). In other words, we isolate variations in the popularity of stories on Twitter independent of the intrinsic interest of these stories. To do so, we leverage the sheer size of our dataset to propose a novel instrumental variable strategy: our instrument is the interaction between the first Twitter users’ centrality in the network (measured by computing PageRank centrality just before the event) and the news pressure on social media at the time of the first tweets on the event. Our identification assumption is that, once we control for the direct effect of centrality and news pressure, the interaction between users’ centrality and news pressure should only affect traditional news production through its effect on the tweets’ visibility on Twitter.
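The construction of such an instrument can be sketched as follows, under loud assumptions. The code computes PageRank by power iteration on a tiny invented interaction graph, then multiplies the average centrality of an event's first users by a news pressure value. The graph, the edge direction (from the user who retweets to the user being retweeted), the averaging over first users, the scalar `news_pressure` and the function names are all hypothetical choices made for illustration; they are not taken from the paper.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def build_instrument(first_users, rank, news_pressure):
    """Return the two direct terms and their interaction for one event.

    The direct terms are kept because the identification assumption described
    above controls for centrality and news pressure separately; only the
    interaction serves as the instrument.
    """
    centrality = sum(rank.get(user, 0.0) for user in first_users) / len(first_users)
    return {
        "centrality": centrality,
        "news_pressure": news_pressure,
        "interaction": centrality * news_pressure,
    }


# Invented graph: an edge (a, b) means that user a retweeted user b.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a"), ("d", "b")]
rank = pagerank(edges)

# Two hypothetical events breaking at the same news pressure, one first
# tweeted by a central user and one by a peripheral user.
central_event = build_instrument(["hub"], rank, news_pressure=0.5)
peripheral_event = build_instrument(["d"], rank, news_pressure=0.5)
print(central_event["interaction"] > peripheral_event["interaction"])
```

In an actual estimation, the interaction term would enter the first stage of a two-stage least squares regression of news coverage on Twitter popularity, with the two direct terms included as controls in both stages.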
Our findings are enlightening. Everything else equal – and, in particular, independently of the newsworthiness of a story – a 50% increase in the number of tweets posted before the first media article on a story leads to an increase in the number of news articles covering the story corresponding to 17% of the mean. In other words, Twitter sets the agenda of media coverage in a quantitatively meaningful way.
Why is this so? First, a growing literature in journalism studies highlights the fact that social media plays an important role as a news source. Consistent with this idea, we show that the magnitude of the effect is higher for the media outlets that have a high number of journalists with a Twitter account, pointing towards the role played by the monitoring of Twitter by journalists.
But the use of the platforms as journalistic sources is not the only factor at play here. In particular, we investigate whether the magnitude of the contagion between social and mainstream media depends on the outlets’ business model. For each of the media in our dataset, we collect information on whether it uses a paywall (at the time of the data collection), the characteristics of this paywall (e.g. soft versus hard), and the date of introduction of the paywall. This information is summarised in Figure 4.
Notes: The Figure reports the share of the media outlets in our sample depending on their online business model. 52% of the media in our sample do not have a paywall (“no paywall”), and 4.3% condition the reading of the paid articles on the fact of watching an ad (“paid articles can be accessed by watching an ad”). Of the outlets that do have a paywall, we distinguish between three models: hard paywall, metered paywall, and soft paywall (“some articles locked behind paywall”).
We show that the magnitude of our effects is much greater for the media outlets that rely fully or strongly on advertising revenues than for those whose online content is behind a paywall (and thus mainly depend on subscriptions). For the former, a 50% increase in popularity leads to an increase in news coverage corresponding to 22.0% (no paywall), 20.3% (soft paywall) and 21.1% (‘watch-an-ad’ paywall) of the mean, compared to 6.2% of the mean for the outlets using a metered paywall, a coefficient that is furthermore not statistically significant. In other words, Twitter influences mainstream media because of short-term considerations generated by advertising revenue-bearing clicks.
While there are widespread fears that new technologies are worsening editorial quality – in particular because they have led to savings in the newsroom, which in turn have reduced the quality of news provision and the production of original content (Cagé et al. 2017) – our findings thus imply that they are disproportionately worsening the quality for people who cannot afford or are unwilling to pay for news. Put another way, because media outlets whose content is available online for free tend to be more influenced by the popularity of stories on Twitter than those using a paywall, the platform generates an increase in information inequality, making disadvantaged voters even more vulnerable to manipulation (Kennedy and Prat 2019).
Besides, our findings – which capture the effects of a variation in popularity that is uncorrelated with a story’s underlying newsworthiness – suggest that social media may provide a biased signal of what readers want, which may in turn explain why, as highlighted by survey data, a significant share of the population is not interested in the news produced by the media (and might thus decide not to consume news). Twitter users are indeed not representative of the general news-reading population. This points to a negative effect of social media driven by the production side, consistent with recent changes in both The Guardian and The New York Times social media guidelines, which highlight the fact that journalists tend to rely too much on Twitter as both a reporting and feedback tool and that it may distort their view of who their audience is.
Turning to the demand for news and using audience data, we finally show that the news articles covering events that are more popular on Twitter do not get more views compared to the other articles, further reflecting the fact that the journalists’ reliance on Twitter might distort the information they produce compared to what citizens actually prefer.
Whether Elon Musk will actually buy Twitter remains an open question. Whether the new European regulations such as the Digital Markets Act (Crémer et al. 2022) and the Digital Services Act (DSA) will be effective at regulating content on social networks has yet to be proven, even if the DSA is a step in the right direction. In the meantime, it is vital to keep in mind that social media matters for democracy beyond what anyone could have expected. Indeed, not only does it affect the users who spend time on the platforms, but there is also a contagion from social to mainstream media. This contagion casts doubt on the business model of the legacy media, as well as the welfare effects of the platforms. In particular, our results raise the question of whether citizens would be better informed in the absence of Twitter, and whether social media may be harmful to both journalism and democracy.
Blondel, V D, J-L Guillaume, R Lambiotte, and E Lefebvre (2008), “Fast Unfolding of Communities in Large Networks”, Journal of Statistical Mechanics: Theory and Experiment 2008(10): P10008.
Cagé, J, N Hervé and M-L Viaud (2017), "The commercial value of news in the internet era", VoxEU.org, 19 June.
Cagé, J, N Hervé, and M-L Viaud (2020), “The Production of Information in an Online World”, The Review of Economic Studies 87(5): 2126–64.
Cagé, J, N Hervé, and B Mazoyer (2022), “Social Media Influence Mainstream Media: Evidence from Two Billion Tweets”, CEPR Discussion Paper No. 17358.
Crémer, J, D Dinielli, A Fletcher, P Heidhues, M Schnitzer and F Scott Morton (2022), "The Digital Markets Act: An economic perspective on the final negotiations", VoxEU.org, 11 February.
Fujiwara, T, K Muller, and C Schwarz (2021), “The Effect of Social Media on Elections: Evidence from the United States”, NBER Working Paper No. 28849.
Kennedy, P J, and A Prat (2019), “Where Do People Get Their News?”, Economic Policy 34(97): 5–47.
Mazoyer, B, N Hervé, C Hudelot, and J Cagé (2022), “Short-Text Embeddings for Unsupervised Event Detection in a Stream of Tweets”, Advances in Knowledge Discovery and Management 10, forthcoming.