Data Collection
Web Scraping
Using the web scraping tool snscrape, we scraped 2,730 unique tweets. Due to time constraints, we were not able to review and classify every scraped tweet as disinformation or not. From these tweets, we identified 203 disinformation tweets from 158 accounts. We also identified 265 non-disinformation tweets to serve as a control dataset for our data analysis.
Data Columns
The following are all the columns automatically populated by the scraper.
| tweet_id | tweet_url | keywords | account_handle | account_name |
| account_bio | account_bio_rendered | account_verified | joined | following |
| followers | location | tweet | tweet_rendered | date_posted |
| likes | replies | retweets | quote_tweets | views |
| source_url | source_label | links_url | media | retweeted_tweet_id |
| quoted_tweet_id | in_reply_to_tweet_id | in_reply_to_user_id | conversation_id |
Data Labeling
In addition to the columns above, we added the following columns, which we labeled manually:
- leni_sentiment - The tweet's sentiment (negative, neutral, positive) towards former Vice President Robredo.
- marcos_sentiment - The tweet's sentiment (negative, neutral, positive) towards President Marcos Jr.
- incident - The incident associated with the disinformation tweet. The incidents are described below.
- account_type - Indicates whether the account is anonymous, identified, or media.
- tweet_type - Indicates whether the tweet is text, reply, image, URL, video, or a combination of these.
- content_type - Indicates whether the tweet is rational, emotional, transactional, or a combination of these.
- country - Indicates the country of the account, based on its profile location field.
- has_leni_ref - Indicates whether the tweet contains references to former Vice President Robredo (labeled as 1 if there is a reference, 0 otherwise).
- alt-text - The alt-text of the tweet in case it contains videos, images, or articles.
The Allegations/Incidents
We have identified five disinformation topics/incidents about the Robredo sisters.
- Jillian Robredo heckling at Baguio (codename: Baguio)
- Alleged ladder incident involving Tricia Robredo (codename: Ladder)
- Alleged sensitive videos of Aika and Tricia Robredo (codename: Scandal)
- Alleged quarantine violation by the Robredos (codename: Quarantine)
- Other topics, including "dissemination of anti-BBM flyers" and "accusing Leni of using public funds for her daughter's Harvard tuition" (codename: Others)
The codenames above are the values used in the incident column.
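As an illustration of how the codenames might be used, the sketch below counts labeled tweets per incident. It assumes the incident column stores exactly one of the five codenames as a string; the function name `tally_incidents` is hypothetical, and no counts from our dataset are implied.

```python
from collections import Counter

# The five codenames listed above.
INCIDENT_CODENAMES = ("Baguio", "Ladder", "Scandal", "Quarantine", "Others")


def tally_incidents(rows):
    """Count rows per incident codename and collect unrecognized values.

    rows is an iterable of dicts that each have an "incident" key.
    Returns (counts, unknown), where counts maps every codename to its
    frequency (zero included) and unknown lists values that match no codename.
    """
    counts = Counter({name: 0 for name in INCIDENT_CODENAMES})
    unknown = []
    for row in rows:
        value = str(row.get("incident", "")).strip()
        if value in counts:
            counts[value] += 1
        else:
            unknown.append(value)
    return dict(counts), unknown
```

Keeping unrecognized values in a separate list, rather than silently dropping them, makes labeling typos visible.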