Despite efforts by major platforms to limit its spread, copies of the widely debunked conspiracy video “Plandemic” continued to multiply and spread largely through niche online conspiracy communities in early May 2020.

The DFRLab used the CrowdTangle API and an R package called CooRNet, developed by Fabio Giglietto, Nicola Righetti, and Luca Rossi, to track the spread of the viral conspiracy through hundreds of Facebook groups.

This document walks through the data analysis portion of the research, providing reproducible code for the key visualizations.

Pulling in the data from CrowdTangle

The first step was to get a dataset from CrowdTangle of posts promoting the Plandemic conspiracy that were shared to public Facebook groups and contained URLs. The goal was to capture posts that linked either to a copy of the video hosted outside Facebook, such as on YouTube or dedicated domains, or to other content that furthered the conspiracy (blog posts, op-eds, etc.).

We created a saved search in CrowdTangle for posts containing the Plandemic video. We then used CrowdTangle's Historical Data feature to download every post from that saved search that contained a link and was posted between May 3 and May 10, 2020.
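CrowdTangle's Historical Data export applied the link and date filters for us. For anyone starting from a broader export, the sketch below shows the same filter in base R. It is a minimal illustration that rests on several assumptions. The dataframe, its column names (created, link), the in_window function and the sample rows are all hypothetical. Timestamps are assumed to be in UTC, and May 10 is treated as inclusive.

```r
# Hypothetical toy export: column names and values are invented for illustration
posts <- data.frame(
  created = as.POSIXct(c("2020-05-02 23:59:00", "2020-05-04 08:15:00",
                         "2020-05-07 19:30:00", "2020-05-10 22:00:00",
                         "2020-05-11 00:00:01"), tz = "UTC"),
  link = c("https://example.com/a", "https://example.com/b", NA,
           "https://example.com/c", "https://example.com/d"),
  stringsAsFactors = FALSE
)

# Keep posts that contain a non-empty link and were created from the start of
# May 3 up to (but not including) May 11, 2020, i.e. May 10 inclusive (UTC assumed)
in_window <- function(posts,
                      start = as.POSIXct("2020-05-03 00:00:00", tz = "UTC"),
                      end   = as.POSIXct("2020-05-11 00:00:00", tz = "UTC")) {
  keep <- !is.na(posts$link) & nzchar(posts$link) &
    posts$created >= start & posts$created < end
  posts[keep, ]
}

in_window(posts)
```

On these sample rows, the filter keeps only the two linked posts inside the window. It drops the post without a link and the two posts that fall outside the window.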

Exploring the data

After running CooRNet on the CrowdTangle data, we now have a dataframe, highly_connected_coordinated_entities, of the entities that repeatedly shared the same URLs within the coordination interval.

And now we’ll display the top 50 entities sorted by coord.shares in an inline table:

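# Load DT package for displaying inline tables
library(DT)

# Sort by coord.shares before head(), so the table keeps the 50 most coordinated groups
top_entities <- head(highly_connected_coordinated_entities_names[order(-highly_connected_coordinated_entities_names$coord.shares), ], 50)

# Display inline table; with DT's default row names in column 0, a column's DT order index equals its R position
datatable(top_entities, options = list(order = list(list(which(names(top_entities) == "coord.shares"), 'desc'))))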

But what is the threshold that defines a rapid link share for these highly connected entities? To determine that, we ran the estimate_coord_interval function in CooRNet.
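CooRNet's estimate_coord_interval handles this estimation; see ?estimate_coord_interval for its exact definition. To give a rough sense of the kind of calculation involved, the sketch below implements a deliberately simplified heuristic in base R. For each URL shared at least twice, it takes the gap between the first and second share. It then keeps the quickest fraction q of URLs and returns the median of their gaps. This is not CooRNet's algorithm; among other differences, it ignores the p parameter. The function name simple_coord_interval and the sample timings are hypothetical.

```r
# Hypothetical toy data: seconds since each URL's first share (invented values)
share_times <- data.frame(
  url  = c("u1", "u1", "u1", "u2", "u2", "u3", "u3", "u4", "u4", "u5"),
  secs = c(0, 5, 40, 0, 12, 0, 300, 0, 3600, 0)
)

# Simplified heuristic, NOT CooRNet's estimate_coord_interval algorithm:
# 1. gap between first and second share for each URL shared at least twice
# 2. keep URLs whose gap is at or below the q-quantile of all gaps
# 3. return the median gap (in seconds) of the kept URLs
simple_coord_interval <- function(share_times, q = 0.1) {
  gaps <- sapply(split(share_times$secs, share_times$url), function(s) {
    s <- sort(s)
    if (length(s) < 2) NA_real_ else s[2] - s[1]
  })
  gaps <- gaps[!is.na(gaps)]  # drop URLs shared only once
  quickest <- gaps[gaps <= quantile(gaps, q)]
  median(quickest)
}

simple_coord_interval(share_times, q = 0.1)
simple_coord_interval(share_times, q = 0.5)
```

With q = 0.1, only the quickest URL (a 5-second gap) survives, so the sketch returns 5. With q = 0.5, it returns 8.5, the median of the 5- and 12-second gaps.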

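# Estimate the coordination interval from the timing of the shares (see ?estimate_coord_interval for q and p)
cord_int <- estimate_coord_interval(ctshares, q = 0.1, p = 0.5)

cord_int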
## [[1]]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       9      14    7399    4251  103950 
## 
## [[2]]
## [1] "14 secs"

This returned a coordination interval of 14 seconds, which matches the median in the timing summary above. Any share of the same link by two groups within 14 seconds of each other was therefore treated as unusually rapid relative to the rest of the dataset.
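To make the threshold concrete, the base R sketch below flags pairs of groups that shared the same URL no more than a given number of seconds apart. It is a simplified pairwise check for illustration, not CooRNet's own detection code. The function name rapid_pairs and the group names, URLs and timestamps are all hypothetical.

```r
# Hypothetical toy data: one row per share of a URL by a Facebook group
shares <- data.frame(
  group = c("A", "B", "C", "A", "D"),
  url   = c("u1", "u1", "u1", "u2", "u2"),
  time  = as.POSIXct(c("2020-05-05 10:00:00", "2020-05-05 10:00:09",
                       "2020-05-05 10:05:00", "2020-05-06 12:00:00",
                       "2020-05-06 12:00:30"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Return every pair of distinct groups that shared the same URL
# no more than interval_secs seconds apart
rapid_pairs <- function(shares, interval_secs = 14) {
  # Self-join on URL to get all pairs of shares of the same link
  pairs <- merge(shares, shares, by = "url", suffixes = c(".1", ".2"))
  # Keep each unordered pair of groups once and drop self-pairs
  pairs <- pairs[pairs$group.1 < pairs$group.2, ]
  pairs$gap_secs <- abs(as.numeric(difftime(pairs$time.2, pairs$time.1, units = "secs")))
  pairs[pairs$gap_secs <= interval_secs, c("url", "group.1", "group.2", "gap_secs")]
}

rapid_pairs(shares, interval_secs = 14)
```

With the 14-second interval, the check flags only groups A and B, which shared u1 nine seconds apart. Widening the interval to 30 seconds would also flag A and D on u2.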

The Top Shared URLs

We also wanted to see which URLs in the dataset were being rapidly shared among the groups most often. We first got the list of URLs using the get_top_coord_urls function in CooRNet.

# Load dplyr for select() and the %>% pipe
library(dplyr)

# Get the top 6 coordinated URLs, ordered by number of shares
top_urls_all <- get_top_coord_urls(output, order_by = "shares", component = FALSE, top = 6)

# Keep only the expanded URL, share count and engagement columns
top_urls_all <- select(top_urls_all, expanded, shares, engagement)

# Display as inline table
datatable(top_urls_all) %>%
  formatStyle(names(top_urls_all), lineHeight = '1%')

All of the top URLs shared by the entities were links to the Plandemic movie on YouTube, Vimeo, or PlandemicMovie.com.
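One way to check a claim like this is to tabulate the host names of the top URLs. The base R sketch below extracts and counts hosts using regular expressions. The url_host helper and the placeholder URLs are hypothetical, and a dedicated URL parser would handle edge cases such as ports, user info and internationalized domains more robustly.

```r
# Placeholder URLs for illustration only; these are not the URLs from the dataset
urls <- c("https://www.youtube.com/watch?v=PLACEHOLDER1",
          "https://youtu.be/PLACEHOLDER2",
          "https://vimeo.com/000000000",
          "https://plandemicmovie.com/",
          "http://www.plandemicmovie.com/watch")

# Extract the host name: drop the scheme, then everything from the first "/", "?" or "#"
url_host <- function(u) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", u)
  host <- sub("[/?#].*$", "", host)
  # Normalise case and strip a leading "www."
  sub("^www\\.", "", tolower(host))
}

table(url_host(urls))
```

On the placeholder list, the sketch counts two plandemicmovie.com links and one link each for vimeo.com, youtu.be and youtube.com. This also shows that short links such as youtu.be may need to be grouped with youtube.com.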

Visualizing the network in Gephi

One of our outputs, highly_connected_g, was a large igraph object representing the network. The next step was to prepare this network for analysis in Gephi, starting with summary statistics for degree. In network analysis, a node's degree is the number of connections it has to other nodes. In our data, the nodes were individual Facebook groups, and a connection between two groups meant they had shared the same URL within the coordination interval.
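To illustrate the definition, the base R sketch below computes degree from a small hypothetical edge list in which each row is an undirected connection between two groups. The group names and the node_degree helper are invented for the example. For an igraph object like highly_connected_g, igraph's degree() function performs this count directly.

```r
# Hypothetical undirected edge list: each row links two Facebook groups
edges <- data.frame(
  from = c("G1", "G1", "G1", "G2", "G3"),
  to   = c("G2", "G3", "G4", "G3", "G4"),
  stringsAsFactors = FALSE
)

# Degree of each node = number of edges that touch it (at either endpoint)
node_degree <- function(edges) {
  counts <- table(c(edges$from, edges$to))
  setNames(as.integer(counts), names(counts))
}

node_degree(edges)
summary(node_degree(edges))
```

Here G1 and G3 each have three connections, and G2 and G4 each have two. The degrees sum to 10, twice the number of edges, as they must in any undirected graph.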

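# Load igraph for V() and other graph functions
library(igraph)

# Summary statistics for the degree vertex attribute of the network
summary(V(highly_connected_g)$degree)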

To make the graph less cluttered, we filtered it by deleting all vertices with a degree below 1000. This left only the most connected Facebook groups.
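One consequence of deleting vertices is worth keeping in mind: removing a group also removes its edges. The groups that remain can therefore end up with a much lower degree in the filtered graph than the threshold suggests. The base R sketch below mimics the filtering on a hypothetical edge list; the filter_by_degree helper and the data are invented. It keeps the surviving node list separately, so groups left without any edges are not lost.

```r
# Hypothetical undirected edge list between Facebook groups (invented names)
edges <- data.frame(
  from = c("G1", "G1", "G1", "G2", "G3"),
  to   = c("G2", "G3", "G4", "G3", "G4"),
  stringsAsFactors = FALSE
)

# Keep nodes whose degree in the ORIGINAL graph is >= min_degree, then keep
# only the edges whose endpoints both survive (deleting a vertex deletes its edges)
filter_by_degree <- function(edges, min_degree) {
  deg <- table(c(edges$from, edges$to))
  keep <- names(deg)[deg >= min_degree]
  list(
    nodes = keep,
    edges = edges[edges$from %in% keep & edges$to %in% keep, ]
  )
}

filtered <- filter_by_degree(edges, min_degree = 3)
filtered$nodes
filtered$edges

# Degrees recomputed in the filtered graph are much lower than the threshold
table(factor(c(filtered$edges$from, filtered$edges$to), levels = filtered$nodes))
```

With a threshold of 3 on this toy graph, G1 and G3 survive. However, each has only one remaining connection: the edge between them.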

library(igraph)

# Remove vertices whose degree attribute is below 1000
g <- delete_vertices(highly_connected_g, V(highly_connected_g)[degree < 1000])

# Export the graph as a GraphML file for Gephi
write_graph(g, file = "g.graphml", format = "graphml")

That completed the R portion of the analysis, and we were ready to move to Gephi.

After exporting the GraphML file from R, we imported it into Gephi. The result looked something like this: