SAVE THE DATE! Please join us at the Social Science Research Commons (SSRC), Woodburn Hall 200, on Wednesday, January 17, 2018 at 4:00 p.m. for an information session on this new resource at our upcoming Open Science Meeting. Research scientist Giovanni Luca Ciampaglia will give a brief demonstration of how to use the tool and an overview of the scientific applications of social media data. For more information, see the official announcement.
Starting this November, a novel data resource is available to the research and academic community at Indiana University. IU researchers and students can now obtain access to the Twitter stream through a new data API, provided by the Observatory on Social Media (OSoMe), Indiana University’s warehouse for social media data.
The new API provides historical access to billions of tweets, with tens of millions of tweets sampled every day directly from the Twitter stream. Building on Twitter’s Enterprise streaming API endpoints, OSoMe’s new Enhanced Data API provides advanced filtering and analytics capabilities, which can be used to query the collection for fine-grained data retrieval, analysis, and presentation.
New data snapshots are added to the collection on a daily basis. Access to raw data, in JSON format, is provided, with no caps on the amount of data available for download. An interactive, Web-based query builder (Moe’s Tavern) is available for users who wish to get started immediately without having to learn how to use Hadoop or other big data engines.
This new data resource is available thanks to support from the Indiana University Network Science Institute (IUNI) and the Digital Science Center (DSC).
By virtue of Indiana University’s academic partnership with Twitter, free access to the data is provided for research and academic purposes only. All faculty, students, and staff at IU are eligible to apply for access to the data. For more information, please visit: http://iuni.iu.edu/resources/osome
Frequently Asked Questions
Background and how to apply
How do I access the data?
There is a form that any IU user can complete to request access to the dataset; requests are evaluated by IUNI’s data steward. The full access policy is available here, together with the link to the form: http://iuni.iu.edu/resources/osome#enhanced-data
Is this a free resource?
Yes, but it can be used only for academic and educational purposes.
I am an IU faculty/staff/student but I am not based on the Bloomington campus. May I still access this resource?
Yes, you may. People from all IU campuses are eligible to apply for access to the data.
Do I need to file for IRB review to use this resource?
It depends. As a general rule of thumb, access to the OSoMe Enhanced Data API does not absolve a user from any responsibilities required by the Office of Research Compliance.
Do I need to create a Twitter account to collect these data?
No, you do not have to create a Twitter account.
Am I going to breach the privacy of Twitter users by collecting these data?
No. Only data from the public Twitter stream can be accessed via the Enhanced Data API. In particular, we comply with the terms of the Twitter Developer Agreement and Policy, and we honor the intent of Twitter users by removing from our database any tweet that they decide to delete. Please note that in order to apply for access to the Enhanced Data API you also have to agree to and comply with those terms.
Data characteristics, sampling, tools and resources
Can I download the data? How large is the file?
The data are available for download only through the OSoMe Enhanced Data API. File size will depend on the type of query you run.
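As a rough illustration of working with downloaded data, the sketch below counts hashtags in a set of tweets. It assumes that a query result has been saved as text with one tweet JSON object per line; the actual export layout may differ, and the function name and sample records are hypothetical. The entities/hashtags/text fields follow Twitter's standard tweet JSON.

```python
import json
from collections import Counter


def count_hashtags(lines):
    """Count hashtags in an iterable of JSON-encoded tweets (one per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Standard tweet JSON stores hashtags under entities -> hashtags -> text.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1  # case-insensitive count
    return counts


# Tiny in-memory sample standing in for a downloaded file (assumed layout).
sample = [
    '{"id_str": "1", "text": "Hello #IU", "entities": {"hashtags": [{"text": "IU"}]}}',
    '{"id_str": "2", "text": "#iu #OSoMe", "entities": {"hashtags": [{"text": "iu"}, {"text": "OSoMe"}]}}',
]
print(count_hashtags(sample).most_common())
```

To run the same function on a saved file, pass an open file handle instead of the sample list, since iterating over a text file yields one line at a time.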
How much data is available?
The API will return tweets from the last 18 months. The collection is based on a 10% sample of all public tweets.
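Because the collection is a 10% sample, a count obtained from it has to be scaled up to estimate the corresponding number of tweets on Twitter as a whole. The sketch below is one simple way to do this, under the simplifying assumption that each public tweet is included independently with probability 0.1; the function name is hypothetical and the error estimate is only a rough guide.

```python
import math

SAMPLE_RATE = 0.10  # the collection is based on a 10% sample of public tweets


def estimate_total(sample_count, rate=SAMPLE_RATE):
    """Scale a count observed in the sample to an estimate for all public tweets.

    Returns (estimate, standard_error), treating each tweet as independently
    included with probability `rate` (a simplifying assumption).
    """
    estimate = sample_count / rate
    # Under independent inclusion, Var(n / p) = N (1 - p) / p, which is
    # estimated by n (1 - p) / p^2 after plugging in N = n / p.
    std_error = math.sqrt(sample_count * (1 - rate)) / rate
    return estimate, std_error


# Example: 2,500 matching tweets found in the sample.
estimate, std_error = estimate_total(2500)
print(f"about {estimate:.0f} tweets, give or take {std_error:.0f}")
```

For rare hashtags or short time windows the sample count is small, so the relative uncertainty of the scaled estimate is correspondingly large.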
I need to access tweets that are older than 18 months. Can you please make older data available?
Unfortunately, due to the limited size of our cluster, we cannot maintain more than 18 months’ worth of data at any given time. If you are interested in supporting the expansion of the Observatory on Social Media, please drop us a line and we would be happy to discuss it with you.
How is this different from the standard Twitter API?
The free Twitter Search API returns only results from the last seven days. The new Premium 30-day Search API extends to the last 30 days but is not free. OSoMe’s Enhanced Data API returns results from the last 18 months, is free to all IU students, faculty, and staff, and includes a range of advanced data aggregation and filtering methods.
Is the sample statistically valid?
The API takes advantage of Twitter’s Enterprise real-time sampling algorithms to collect a statistically valid sample of 10% of all public tweets. Please note that this is different from the free Twitter Streaming API, which has been shown to be affected by sampling biases.
I have used the OSoMe tools before. How does this differ from the rest of the tools of the Observatory on Social Media?
To comply with Twitter’s Developer Agreement, the amount of data that you can export from our current public front-end tools is capped. Moreover, you can only export numerical tweet and user IDs. This means that even when your data fits within the limit, you still have to use the Twitter API to collect each individual tweet. The new Enhanced Data API is not restricted by these limits, but is available only to IU users by virtue of an academic partnership between Indiana University and Twitter, Inc.
I have heard that a lot of Twitter users are fake or automated. How can I tell which tweets are from human users?
Please check Botometer, our free social bot detection tool.
Can I visualize these data?
With the Observatory on Social Media you can visualize temporal trends, geographic distributions, and diffusion networks, and even make movies. For the full list of public tools, please see osome.iuni.iu.edu/tools
Can I read more about OSoMe? How can I acknowledge this resource in my work?
For more information and/or a suggested citation, please refer to our open-access publication on the Observatory on Social Media:
Davis CA, Ciampaglia GL, Aiello LM, Chung K, Conover MD, Ferrara E, Flammini A, Fox GC, Gao X, Gonçalves B, Grabowicz PA, Hong K, Hui P, McCaulay S, McKelvey K, Meiss MR, Patil S, Peli Kankanamalage C, Pentchev V, Qiu J, Ratkiewicz J, Rudnick A, Serrette B, Shiralkar P, Varol O, Weng L, Wu T, Younge AJ, Menczer F. (2016) OSoMe: the IUNI observatory on social media. PeerJ Computer Science 2:e87 https://doi.org/10.7717/peerj-cs.87
If you use OSoMe and/or the new Enhanced Data API in your work, please acknowledge support from the Indiana University Network Science Institute. Suggested acknowledgement formula:
Access to Twitter data was obtained thanks to support from the Indiana University Network Science Institute through the Observatory on Social Media (osome.iuni.iu.edu), which is also supported by the Digital Science Center (dsc.sice.indiana.edu) and the Center for Complex Networks and Systems Research (cnets.indiana.edu).