25 Users’ Data: Legal & Ethical Considerations
Before we dive into collecting data from the internet, we need to discuss some serious questions. Is it legal or ethical to computationally collect data from the internet? Is it legal or ethical to publish research that includes internet users’ data without their knowledge?
25.1 Legal Considerations
If internet data is publicly available (e.g., tweets from a public Twitter account), it is generally considered legal in the United States to collect this data, even if a particular platform says that you cannot. In 2019, the Ninth Circuit Court of Appeals ruled in hiQ Labs v. LinkedIn that scraping publicly accessible websites likely does not violate federal anti-hacking laws, specifically the Computer Fraud and Abuse Act. You can read more about this legal ruling from the Electronic Frontier Foundation.
25.2 Institutional Review Boards (IRBs)
Research that involves human participants (e.g., surveys, interviews, blood draws) needs to be approved by an Institutional Review Board (IRB). But research about publicly available internet data does not typically require IRB approval.
However, the Cornell Institutional Review Board recommends caution when mining data from the internet, and it advises researchers to seek “formal confirmation of non-human participant research status”:
If the individual or social media/network site has not placed any restrictions on access to information about himself/herself (e.g., information available on a public website, blog, twitter feed, chat room, etc.), the following best practices should be followed:
- The researcher should send a project description to the IRB office and seek a formal confirmation of non-human participant research status for the study. We believe that in most cases, this will not be considered human participant research, but caution is recommended before a researcher makes his/her own determination, because of the emerging ethical sensitivities in this area.
25.3 Publishing, Privacy, & Citation
Just because something is legal or gets approved by an IRB does not mean it is ethical. Collecting, sharing, and publishing internet data created by or about individuals can lead to unwanted public scrutiny, harm, and other negative consequences for those individuals. For these reasons, some researchers attempt to anonymize internet data before sharing it or before publishing an article that quotes a specific post. Yet anonymizing internet data also denies internet users credit as creators and authors.
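As one illustration of what anonymizing a dataset before sharing might involve, the sketch below replaces usernames with salted hashes. It is a minimal example under stated assumptions, not a recommended standard: the `pseudonymize` function, the `author` and `text` field names, and the sample post are all hypothetical. Strictly speaking this is pseudonymization rather than full anonymization, because the text of a post can often still be searched to find its author.

```python
import hashlib
import secrets


def pseudonymize(username: str, salt: bytes) -> str:
    """Map a username to a stable pseudonym using a salted SHA-256 hash.

    The same username and salt always give the same pseudonym, so posts by
    one author stay linked to each other without exposing the real name.
    """
    digest = hashlib.sha256(salt + username.encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


# A random salt, generated once per project and kept private (assumption:
# the salt is never published alongside the data).
salt = secrets.token_bytes(16)

# Hypothetical sample data; the field names are illustrative only.
posts = [{"author": "example_user", "text": "An example post."}]

# Build a copy of the data with the author field replaced by a pseudonym.
shared_posts = [
    {**post, "author": pseudonymize(post["author"], salt)} for post in posts
]
```

Whether a step like this is appropriate depends on the project: it reduces exposure for the people involved, but it also removes the attribution that some creators may want.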
There is no single, simple answer to the many difficult questions raised by internet data collection. It is important to develop an ethical framework that responds to the specifics of your particular research project or use case (e.g., the platform, the people involved, the context, the potential consequences, etc.).
In my own research, I have started seeking explicit permission from internet users when I want to quote them in a published article. In this book, I only share internet data that meets a certain threshold of publicness, such as tweets from verified Twitter accounts or Reddit posts with a certain number of upvotes. This is an approach that I have developed based on some of the models and readings included below.
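A publicness threshold like the one described above could be expressed as a simple filter. The following is only a sketch, assuming posts are stored as dictionaries: the field names (`platform`, `verified`, `upvotes`), the `MIN_UPVOTES` value, and the sample posts are hypothetical, since the text does not specify an exact cutoff.

```python
# Hypothetical cutoff; the text says only "a certain number of upvotes".
MIN_UPVOTES = 1000


def is_public_enough(post: dict, min_upvotes: int = MIN_UPVOTES) -> bool:
    """Return True if a post meets the assumed publicness threshold."""
    if post.get("platform") == "twitter":
        # Tweets qualify only when they come from a verified account.
        return bool(post.get("verified", False))
    if post.get("platform") == "reddit":
        # Reddit posts qualify only above the upvote cutoff.
        return post.get("upvotes", 0) >= min_upvotes
    # Anything from an unrecognized platform is excluded by default.
    return False


# Hypothetical sample data for illustration.
sample_posts = [
    {"platform": "twitter", "verified": True, "text": "Tweet A"},
    {"platform": "twitter", "verified": False, "text": "Tweet B"},
    {"platform": "reddit", "upvotes": 2500, "text": "Post C"},
    {"platform": "reddit", "upvotes": 12, "text": "Post D"},
]

shareable = [post for post in sample_posts if is_public_enough(post)]
```

Excluding unrecognized platforms by default is a deliberately cautious choice: a post is shared only when it clearly meets the threshold.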
25.5 Further Recommended Reading
- Doc Now White Paper, Bergis Jules, Ed Summers, Dr. Vernon Mitchell, Jr.
- No Robots, Spiders, or Scrapers: Legal and Ethical Regulation of Data Collection Methods in Social Media Terms of Service, Casey Fiesler, Nathan Beard, Brian C. Keegan
- #transform(ing)DH Writing and Research: An Autoethnography of Digital Humanities and Feminist Ethics, Moya Bailey
- The #TwitterEthics Manifesto, Dorothy Kim and Eunsong Kim