Data Collection

This page really needs one of those old-school "under construction" images (and, for this weekend, an "under snow" one too).

A warning

The first thing to note is that Twitter does not guarantee to return all matching tweets, whatever method is used (unless you have direct access to their "fire hose", which I don't). There is no public information, to my knowledge, about exactly how Twitter determines which tweets to include or exclude; the best I have found is the documentation on Twitter Search Best Practices. This filtering should be taken into account when looking at the analysis presented here. You can see the result of this when comparing the values I get for the most-popular retweets with those from Twitter; my retweet counts are up to 10 percent lower.

What is a sensible search?

A few years ago there was discussion amongst the attendees about whether to use #AAS or #AAS<meeting number>, but fortunately the AAS have been promoting the use of the latter for the last few meetings. This makes data collection easier (assuming people read the tweets of the AAS Office!). As I also wanted to follow the participation in the first ever hack day at AAS, I decided to track the following three terms: aas221, aas and 221, and hackaas. Note that case does not matter, that I am not actually searching on the hash tags, and that having the middle term means that I would select a tweet saying "The 221st meeting of the AAS was awesome!". An open question is how much irrelevant material was returned by this approach; anecdotal evidence suggests the rate is low but I should try to quantify this.
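
As a rough illustration of how the three track terms combine, here is a minimal Python sketch (it is not the Haskell code used for the search). It assumes the documented behaviour of the track parameter: commas separate phrases, any one of which may match; spaces within a phrase require every term to be present; and case is ignored. The tokenizer and the names tokens and matches_track are assumptions made for illustration, and Twitter's own tokenization has more edge cases, so the results are only an approximation.

```python
import re

# The three track phrases described above, in the comma-separated form
# that the track parameter expects.
TRACK = "aas221,aas 221,hackaas"

def tokens(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def matches_track(text, track=TRACK):
    """True if any comma-separated phrase has all of its terms in the text."""
    found = tokens(text)
    for phrase in track.split(","):
        terms = phrase.lower().split()
        if terms and all(term in found for term in terms):
            return True
    return False
```

With this sketch, matches_track("Heading to #AAS221 tomorrow") is true because the hash sign is dropped by the tokenizer, whereas a tweet that mentions only "AAS" is not selected.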

As an aside, the use of #AAS for the meetings, although having the advantage of saving 3 characters, can lead to a lot of noise due to unrelated tweets. I did not try it this year but it has been the case in previous years.

For users following along with a meeting, I would suggest using the search #aas<meeting number> -RT to hide re-tweets, since they form a significant fraction of the volume, but as I am interested in seeing what is re-tweeted, and by whom, I've left these in.

How did I search?

Unlike the AAS 219 meeting, my primary analysis was based on a search using the Twitter Streaming API. The code to do this is available at my astrosearch bitbucket account. The process requires running the astroserver, which acts as the database, and then astrosearch, which deals with the Twitter search. Building it is likely to be non-trivial since it is written in Haskell - which most astronomers will not have installed on their system - and requires several patched modules, which are listed in the README. In previous searches this approach was not robust; rather than fixing the issues I ended up treating the search service as a daemon which would be automatically restarted whenever it crashed. This only happened once during the run, much later than the main conference, and led to a down time of less than 2 seconds.
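
The restart-on-crash idea can be sketched in a few lines of Python. This is not the mechanism actually used for astrosearch (which is not described here); the function name supervise, its parameters, and the restart budget are all assumptions made for illustration.

```python
import subprocess
import time

def supervise(command, max_restarts=5, delay=1.0, run=subprocess.call):
    """Re-run command until it exits cleanly or the restart budget is used up.

    Returns the number of restarts that were needed. The run argument is
    only there so that the process launcher can be swapped out in tests.
    """
    restarts = 0
    while True:
        status = run(command)  # blocks until the process exits
        if status == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"giving up after {restarts} restarts")
        restarts += 1
        time.sleep(delay)  # brief pause before starting the process again
```

A call such as supervise(["astrosearch"]) would keep the search running, although the real command line is not given here and that invocation is hypothetical. A short delay keeps the gap in coverage small while avoiding a tight crash loop.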

Unlike previous meetings I did not run a search using the Twitter REST API with my grabtweets code since it was not needed. However, I did take advantage of the TAGSExplorer archive and visualization system which does use the REST API.

I did not use the Archivist site since this service seems to have evolved since I last used it, and didn't offer something that met my needs.

When was the search run?

The search started at 2013-01-02 14:20:56.941684 UTC and ended around Fri Feb 1 20:34:01 EST 2013 (apologies for the mix in precision, time zone, and format). As explained below there are actually two tweets included in the dataset that were made before the search started; they have been left in since they are not going to significantly bias the results of any analysis I present.
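
To put the two timestamps on a common footing, the following Python sketch converts both to UTC and computes the length of the run. It assumes that EST means a fixed offset of five hours behind UTC; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is a fixed UTC-05:00 offset (no daylight saving in January/February).
EST = timezone(timedelta(hours=-5), "EST")

# Start time, already given in UTC.
start = datetime.strptime(
    "2013-01-02 14:20:56.941684", "%Y-%m-%d %H:%M:%S.%f"
).replace(tzinfo=timezone.utc)

# End time, with the "EST" label removed from the string and applied by hand,
# since parsing time zone names with strptime is platform dependent.
end_local = datetime.strptime(
    "Fri Feb 1 20:34:01 2013", "%a %b %d %H:%M:%S %Y"
).replace(tzinfo=EST)
end = end_local.astimezone(timezone.utc)

duration = end - start
print(end.isoformat())
print(duration)
```

Under that assumption the search ended early on February 2 in UTC, so it ran for a little over 30 days.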

Transformation

Since I used the Twitter Streaming API, the search continually produced results, which were written to disk as JSON (in previous versions I had tried converting them to Haskell data structures but to simplify things I did no processing in the search program). At intervals I would process all the matches, creating an RDF graph of the results, which was passed to a 4store instance. This instance was then queried using SPARQL to produce the results, as described in the Analysis section.
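
The JSON-to-RDF step might look something like the Python sketch below, which turns one line of streamed JSON into N-Triples statements. The namespace, the predicate names, and the function names are all hypothetical and are not the vocabulary used in the real conversion; only the id_str, text, and user.screen_name fields are taken from the Twitter 1.1 tweet format.

```python
import json

BASE = "http://example.org/"  # hypothetical namespace, not the vocabulary used for this site

def literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def tweet_to_ntriples(line):
    """Convert one line of streamed JSON into a list of N-Triples statements."""
    message = json.loads(line)
    if "id_str" not in message or "text" not in message:
        return []  # not a tweet (for example a delete or limit notice)
    subject = f"<{BASE}tweet/{message['id_str']}>"
    author = f"<{BASE}user/{message['user']['screen_name']}>"
    return [
        f"{subject} <{BASE}vocab#author> {author} .",
        f"{subject} <{BASE}vocab#text> {literal(message['text'])} .",
    ]
```

The resulting statements could then be concatenated into a file and loaded into a triple store; how that was done for 4store is not covered by this sketch.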

I used the 1.1 POST statuses/filter API for the search, using the track parameter. When transforming the JSON from Twitter, two types were processed: tweets and retweets (although you can see other message types, I didn't). The retweets include full information on the original tweet; as well as letting me link the two tweets together in the RDF, this let me find a few tweets which had not been matched by Twitter. Two of these are due to re-tweets of messages which were sent before the search was started.
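
The way retweets reveal unmatched originals can be sketched as follows, in Python rather than the Haskell and SPARQL actually used. It relies on the retweeted_status field that the 1.1 API embeds in each retweet; the function name find_missing and the use of plain dictionaries are assumptions made for illustration.

```python
def find_missing(messages):
    """Return originals seen only inside retweets, keyed by tweet id."""
    matched = set()   # ids of every tweet delivered directly by the search
    embedded = {}     # originals carried inside retweets, keyed by id
    for message in messages:
        if "id_str" not in message or "text" not in message:
            continue  # skip other message types
        matched.add(message["id_str"])
        original = message.get("retweeted_status")
        if original is not None:
            embedded[original["id_str"]] = original
    # A tweet is "missing" if it only ever appeared as a retweet's original.
    return {tid: tweet for tid, tweet in embedded.items() if tid not in matched}
```

Any tweet that was never retweeted cannot be recovered this way, which is why the count of missing tweets is only a lower limit.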

Missing tweets and the AAS

Overall, 30 "missing" tweets were found, so it is not a huge number, but they do point to a phenomenon observed at this meeting (and one I have seen at other AAS meetings), namely that the AAS Twitter accounts do not seem to have achieved enough "twitter-cred" to be included in searches:

twitter experts: @aas_office is having some visibility issues. they've been tweeting but not showing up in #aas221 search. can u help?
— Kelle Cruz (@kellecruz) January 10, 2013

Below are the missing tweets, grouped by author, and excluding the two that were from before the main search started.

Note that all the posts from AAS_Office and AAS_Press were missed by the search (i.e. they were only found because they were retweeted). This means I will have missed any posts from these two accounts that were not retweeted.

AAS Executive Office

Explore Long Beach #aas221 visitlongbeach.com
— AAS Executive Office (@AAS_Office) January 5, 2013
We're trending!! #aas221 twitter.com/AAS_Office/sta…
— AAS Executive Office (@AAS_Office) January 7, 2013
Reminder: Enter to win $100 AMEX GC by following us before 8pm tonight @aas_office. Must be registered at #aas221 #randomdrawing
— AAS Executive Office (@AAS_Office) January 8, 2013
SPS Evening of Undergraduate Science #aas221 twitter.com/AAS_Office/sta…
— AAS Executive Office (@AAS_Office) January 9, 2013
Trouble in the Blue today at 12:45pm MOVED to Ballroom E #aas221
— AAS Executive Office (@AAS_Office) January 9, 2013
Tonight's Space Science & Public Policy Talk at 8:00pm MOVED to 103B #aas221
— AAS Executive Office (@AAS_Office) January 9, 2013
Battery running low? Check out the new charging station at #aas221 sponsored by #northropgrumman
— AAS Executive Office (@AAS_Office) January 10, 2013
2013 Rodger Doxsey Travel Prize Winners & Runner-Ups #aas221 aas.org/grants/rodger_…
— AAS Executive Office (@AAS_Office) January 18, 2013

AAS Press Office

FYI: Correct hashtag for the 221st American Astronomical Society (AAS) meeting now under way in Long Beach, CA, is #aas221, not #aas.
— AAS Press Office (@AAS_Press) January 6, 2013
NRAO: Massive Outburst in Neighbor Galaxy [NGC 660] Surprises Astronomers. #aas221 tinyurl.com/a38vmlk
— AAS Press Office (@AAS_Press) January 7, 2013
#aas221 press-conference webcast problems seem to be behind us now. tinyurl.com/bkkwfb6
— AAS Press Office (@AAS_Press) January 7, 2013
CfA: At Least One in Six Stars Has an Earth-Sized Planet.#AAS221 tinyurl.com/axcy6vn
— AAS Press Office (@AAS_Press) January 7, 2013
Did you know that AAS press conferences are open to all attendees? We're in room 204, Long Beach Convention Center. #aas221
— AAS Press Office (@AAS_Press) January 7, 2013
JPL: NASA'S Kepler Discovers 461 New Planet Candidates.#AAS221 tinyurl.com/bfvp4tx
— AAS Press Office (@AAS_Press) January 8, 2013
NASA/CXC: New Chandra Movie Features Neutron Star Action [Vela pulsar's jet]. #aas221 tinyurl.com/ac94jzb tinyurl.com/a5rure2
— AAS Press Office (@AAS_Press) January 8, 2013
NWU: Radio wave technique uncovers shadows of clouds and stars in Milky Way’s center. #aas221 tinyurl.com/by4dlo9
— AAS Press Office (@AAS_Press) January 8, 2013
UTA: UT Arlington Researchers Try new Approach For Simulating Supernovas #AAS221 goo.gl/k0xgo
— AAS Press Office (@AAS_Press) January 8, 2013
NASA/JPL: NASA, ESA Telescopes Find Evidence for Asteroid Belt Around Vega. #aas221 tinyurl.com/b2jf8ry
— AAS Press Office (@AAS_Press) January 8, 2013
UCB: Exocomets may be as common as exoplanets. #aas221 tinyurl.com/avfg6w5
— AAS Press Office (@AAS_Press) January 8, 2013
CfA: First "Bone" of the Milky Way Identified. #AAS221 tinyurl.com/ajuyx4j
— AAS Press Office (@AAS_Press) January 8, 2013
Including on-site registrants, attendee count at #aas221 AAS meeting in Long Beach is now 2,929. aas.org/meetings/aas221
— AAS Press Office (@AAS_Press) January 9, 2013
This morning's AAS press conference (10:30 am, Room 204) is on supernovae & dark energy & features Nobel laureate Saul Perlmutter. #aas221
— AAS Press Office (@AAS_Press) January 9, 2013
LBNL: The Farthest Supernova Yet for Measuring Cosmic History. #aas221 tinyurl.com/a35pwlj
— AAS Press Office (@AAS_Press) January 9, 2013
NRAO: Mapping the Milky Way - Radio Telescopes Give Clues to Structure, History. #aas221 tinyurl.com/awkjcca
— AAS Press Office (@AAS_Press) January 9, 2013
Gemini: Next-Generation Adaptive Optics Brings Remarkable Details to Light in Stellar Nursery. #aas221 gemini.edu/node/11925
— AAS Press Office (@AAS_Press) January 9, 2013
Keck: Surprise! Earth-sized Planets Are Common. #aas221 keckobservatory.org/news/surprise_…
— AAS Press Office (@AAS_Press) January 10, 2013
Caltech: A Cloudy Mystery - A puzzling cloud near the galaxy's center may hold clues to how stars are born. #aas221 caltech.edu/content/cloudy…
— AAS Press Office (@AAS_Press) January 11, 2013
AAS: News-briefing videos from #aas221 in Long Beach, Jan. 7-10, are now on our archived-press-conferences page: aas.org/press/archived…
— AAS Press Office (@AAS_Press) January 22, 2013

So, what should the AAS Twitter accounts do?

So, it looks like the AAS accounts need to improve their "Twitteriness", presumably by tweeting regularly outside the conference and by being involved in conversations (i.e. replying to, and being replied to by, other accounts), although this is a guess on my part. I wonder whether other scholarly societies see this (or have seen this)?

Analysis

To write.

Credits

The data collection and analysis code is written in Haskell, using version 7.4.2 of the ghc Haskell compiler, and uses a bunch of packages from the Haskell package database (Hackage).

The visualizations presented on this web site use the d3.js Javascript library to create groovy data-driven documents. I have also used Gephi and BioFabric to visualize and explore the user network (i.e. the hair ball and matrix views).

Last, but not least, thank you to all the astronomers who used Twitter to discuss the meeting, and to those who followed along.
