Wednesday, November 11, 2015

Visualize Twitter Online Interactions in Social Network

Twitter users can mention or reply to other Twitter users in a single tweet. Visualizing these online interactions provides a better understanding of online human behavior, such as identifying the most important information sources and how information is transmitted online.

To visualize Twitter users' online interactions, we first have to extract tweets and separate the mentioned/replied users from the tweet texts. This Python script uses the Twitter REST API to extract mentioned/replied users from tweets, and stores the results in a CSV file. To maximize the usage of the REST API, the since_id and max_id parameters are used to retrieve more tweets from a single timeline.
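The separation step can be sketched as follows. The sample tweet dict below is illustrative, but the entities.user_mentions field is what the REST API attaches to each tweet (replies appear there too, since the replied-to user is mentioned at the start of the text):

```python
import csv
import io

def extract_mentions(tweet):
    """Return (author, mentioned_user) pairs from one tweet's JSON dict."""
    author = tweet["user"]["screen_name"]
    return [(author, m["screen_name"])
            for m in tweet.get("entities", {}).get("user_mentions", [])]

# A minimal tweet dict shaped like a REST API response (illustrative data).
tweet = {
    "user": {"screen_name": "alice"},
    "text": "@bob have you seen this? cc @carol",
    "entities": {"user_mentions": [{"screen_name": "bob"},
                                   {"screen_name": "carol"}]},
}

pairs = extract_mentions(tweet)

# Write the author/mention pairs to CSV, one interaction per row.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["author", "mentioned_user"])
writer.writerows(pairs)
print(buf.getvalue())
```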

Once the mentioned/replied users are separated, we can use NetworkX to create a network graph in which the mentioned/replied users and the author of each tweet are connected. The resulting network can be exported to Gephi for further visualization and analysis. A simple script is available here.
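A minimal sketch of this graph-building step, assuming NetworkX is installed (the edge list here is illustrative; a real one would come from the extracted CSV file):

```python
import networkx as nx

# Author -> mentioned-user pairs extracted from tweets (illustrative data).
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

# A directed graph: an edge points from the tweet's author to each
# user mentioned or replied to in that tweet.
G = nx.DiGraph()
G.add_edges_from(edges)

# In-degree counts how often a user is mentioned -- a rough measure
# of how important an information source they are.
mention_counts = dict(G.in_degree())
print(mention_counts)   # carol is mentioned twice here

# GEXF is one of the formats Gephi reads directly.
nx.write_gexf(G, "twitter_mentions.gexf")
```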




Saturday, October 10, 2015

Extract Historic Tweets

Extracting historic Twitter data is always problematic, because the Twitter REST API can only retrieve a user's 3,200 most recent tweets. For example, if you want to find what YouTube tweeted between 2013 and 2014, the task is almost impossible using the REST or Streaming API alone.

However, Twitter Advanced Search provides historic tweets based on a user-defined query, including the time each tweet was posted. For example, this link from Twitter Advanced Search gives you the full list of tweets that YouTube posted between 2013 and 2014.

Since the parameters in the request URL of Twitter Advanced Search can be customized, it is possible to extract historic tweet information by sending requests to Twitter Advanced Search and extracting tweet information from the returned webpages.

Here is a simple Python script that can extract the historic tweet IDs for specific Twitter users. The logic is straightforward:

  1. Customize the URL to request the historic tweets of specific Twitter users during a defined time period;
  2. Get the response page from Twitter Advanced Search;
  3. If the results fit on a single page, use the BeautifulSoup package to extract the tweet IDs from the webpage;
  4. If the results span more than one page, use Selenium to scroll down the webpage and load all the tweet IDs;
  5. Use the Twitter Search API to extract the tweet contents from the collected tweet IDs.








Thursday, March 26, 2015

Twitter Data Acquisition in Python



Prerequisites:
 Install Python at https://www.python.org/downloads/

 Create a Twitter Application
1)      Register a Twitter Application at https://apps.twitter.com/
2)      After you have successfully created a Twitter Application, write down your CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET.
 Install necessary python libraries
1)      Go to https://pip.pypa.io/en/latest/installing.html, download the get-pip.py;
2)      Add Installation Folder/Python27/Scripts to the Path Variable in My Computer/ Properties/ Advanced system settings/ System Environment Variables
3)      Right click on the downloaded get-pip.py , choose Edit with IDLE, Run … Run Module (F5)
4)      Go to Windows/Start, in the Search programs and files type cmd
5)      In the pop-up window, type pip install twitter; the twitter library will be installed automatically.

6)      Type pip install dbf; the dbf library will be installed automatically
7)    Download the  GenerateDBFTable.py and GetTweetByQuery.py at https://github.com/xbwei/GetTwitter/tree/master/Twitter

Create dbf table: 

 Right click on the GenerateDBFTable.py file, choose Edit with IDLE, Run … Run Module (F5); a dbf table named Tweet will be created

Customize python script:
    Open the GetTweetByQuery.py file with IDLE (right click on the file, choose Open with IDLE)
    Fill in your CONSUMER_KEY, CONSUMER_SECRET, OAUTH(Access)_TOKEN, and OAUTH(Access)_TOKEN_SECRET in the OAUTH section

In the define query section, modify the following parameters:
1)      q: define the text that must be contained in the tweets returned by the REST API
2)      count: define the maximum number of tweets returned by the REST API
3)      lang: specify the language of the tweets returned by the REST API
4)      geocode: define the latitude, longitude, and radius of the area where the tweets will be collected by the REST API
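For reference, the four parameters can be sketched as a query dict; the values are illustrative, and the actual API call is left commented out because it needs the credentials filled in above:

```python
# Query parameters as they would be passed to the REST API's
# search/tweets endpoint (illustrative values).
query = {
    "q": "flood",                  # text the collected tweets must contain
    "count": 100,                  # per-request maximum (the API caps this at 100)
    "lang": "en",                  # restrict results to English tweets
    "geocode": "40.7,-74.0,10mi",  # latitude,longitude,radius
}

# With the `twitter` library installed via pip, the request would look like:
#   from twitter import Twitter, OAuth
#   t = Twitter(auth=OAuth(ACCESS_TOKEN, ACCESS_TOKEN_SECRET,
#                          CONSUMER_KEY, CONSUMER_SECRET))
#   results = t.search.tweets(**query)

print(query["geocode"])
```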


Collect Tweets:
Right click on the customized GetTweetByQuery.py, choose Edit with IDLE, Run … Run Module (F5)
    Open the Tweet.dbf in Excel to view the collected tweets.

Wednesday, February 4, 2015

Calculate Spatial Importance of Road Network in ArcGIS

A recent study found that the Random Walk algorithm can be used to rank the spatial importance of road networks. The basic idea is that, by simulating a person walking randomly through a road network, the road segments or intersections that are walked through many times are considered spatially important. This spatial importance is evidenced by its close correlation with socio-economic characteristics of the surrounding urban areas structured by the road network, e.g., population density, job density, or even house prices. More details can be found in this article: The Random Walk Value for Ranking Spatial Characteristics in Road Networks.
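A minimal sketch of the idea, independent of the ArcGIS tool: walk randomly over a toy road graph and count how often each junction is visited. Well-connected junctions accumulate the most visits:

```python
import random

# Toy road network: nodes are junctions, each entry lists the
# neighbouring junctions reachable by one road segment (illustrative).
roads = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C"],
}

def random_walk_visits(graph, steps=10000, seed=42):
    """Walk randomly over the graph and count visits to each junction."""
    rng = random.Random(seed)
    visits = {n: 0 for n in graph}
    node = rng.choice(list(graph))
    for _ in range(steps):
        node = rng.choice(graph[node])  # step to a random neighbour
        visits[node] += 1
    return visits

visits = random_walk_visits(roads)
# Junction B connects the dead-end A to the rest of the network,
# so it collects the most visits.
busiest = max(visits, key=visits.get)
print(busiest)
```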

An ArcGIS tool has been developed to implement this random walk simulation.

Four functions are provided in this ArcGIS Tool:
  1. Construct graph object

    Open a road network shapefile in ArcGIS. Open the Construct Graph Network tool in the ArcGIS toolbox and select the road shapefile as the edge layer. The weight field can be any numerical attribute of the road network, such as width or design speed. In the node layer field, select the nodes that will be included in the random walk simulation, such as bus stops. Select the X and Y coordinates of the node layer, and define the output folder and network name. This tool will create a graph object.
    If you don't have specific road nodes, or you want to include all the road junctions in the random walk simulation, you can create road nodes by adding a network dataset of the road shapefile in ArcGIS.
  2. Simulate random walk

    Open the Calculate Random Walking Value tool in the ArcGIS toolbox, and select the created graph in the Network File field. Define the output folder, field name, threshold of loop value, weight, and simulation method of the random walk simulation. The definitions of these parameters can be found in The Random Walk Value for Ranking Spatial Characteristics in Road Networks.

    Random walk simulation may take several minutes. After the calculation is done, you can import the calculated edge and node shapefiles into ArcGIS. A .wlk file recording the walking paths is also created in the defined folder.

  3. Visualize random walk paths

    You can visualize the simulated random walk paths in ArcGIS by using the Check Random Walking Paths tool. In the tool, select the .wlk file created in step 2, define how many walking paths you want to check, and define the output folder if you want to save those walking paths as shapefiles.
  4. Calculate other network measures of road networks (using Networkx)

    This tool can also calculate other network measures, such as PageRank, betweenness, and closeness. To do so, open the Calculate PageRank Value tool in the ArcGIS toolbox, select the network graph created in step 1, and define the output folder for the calculation. The network measures will be saved in a table in the defined folder.
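With NetworkX installed, the same measures can be computed directly on a small graph; the toy edges below stand in for the graph object built from a real road shapefile:

```python
import networkx as nx

# Toy road graph: junction B connects the dead-end A to the rest
# of the network (illustrative edges).
G = nx.Graph([("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")])

pagerank = nx.pagerank(G)
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)

# B has the highest degree and lies on every path out of A,
# so it scores highest on all three measures.
print(max(pagerank, key=pagerank.get))   # 'B'
```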


Sunday, February 1, 2015

How to Extract, Visualize and Analyze Facebook and Twitter Data

LBSocial is a data mining website developed on Google App Engine that can extract, visualize, and analyze social media data from Facebook and Twitter. If you have a Facebook account, you can use LBSocial to get your friends' posts on Facebook or public tweets from Twitter.

Facebook Data Collection

LBSocial utilizes Facebook API to extract Facebook data.
  1. Go to www.lbsocial.net, log in with your Facebook Account. This website is an official Facebook App, and has no access to your Facebook password. You can also use our test account to explore the functions of this website:
    ·         Test user account: bob_nvqpewi_a@tfbnw.net
    ·         Password: 1234
  2. Once you log in, you will see the number of places you have posted on Facebook.
  3. You can click the Collect Friends' Facebook Data button, and the website will read your Facebook friends' statuses and photos and extract the posts that contain location information.
  4. After you finish the data collection of your Facebook friends, you can click View Collected Friends Data in Table. This will give you a list of posts from your Facebook friends, including place, time, contents and participants.
  5. You can also view your friends' posts interactively on a map by clicking View Collected Friends Data on Map. LBSocial uses the GeoJSON data format and Leaflet to display location-based Facebook activities.
  6. You can also see the social network of your Facebook friends by clicking Analyze Facebook Social Network. LBSocial uses NetworkX to construct and analyze social networks, and displays the results with D3.
  7. The Collect Facebook Data From An Account function can help you to collect  posts from a specific Facebook friend or a public Facebook account.

Twitter Data Collection

LBSocial combines the Twitter REST API and the Twitter Search website to harvest tweets.
  1. Type the Twitter account that you want to collect in the Twitter Account form. You can also specify the time period that you want to harvest. LBSocial will gather a list of tweet IDs from the Twitter Search website based on the defined query, and then extract the tweets from the ID list by using the REST API.
  2. You can also collect tweets by tweet contents or by locations where tweets are published.
  3. You can view the collected tweets in a table or on a map.
  4. The network analysis function will analyze and visualize the interactions of Twitter users based on the collected tweets.
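One detail of that workflow worth noting: the REST API's statuses/lookup endpoint accepts at most 100 IDs per request, so the scraped ID list has to be batched. A sketch, with the actual library call commented out because it needs credentials:

```python
def chunk_ids(tweet_ids, size=100):
    """Split a tweet ID list into batches of at most `size` IDs,
    matching the statuses/lookup per-request limit."""
    return [tweet_ids[i:i + size] for i in range(0, len(tweet_ids), size)]

# Stand-in for IDs scraped from the Twitter Search website.
ids = [str(n) for n in range(250)]
batches = chunk_ids(ids)
print(len(batches), len(batches[-1]))   # 3 50

# Each batch would then be fetched in one REST call, e.g. with the
# `twitter` library:
#   t.statuses.lookup(_id=",".join(batch))
```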