pogo

MIT Hackathon project correlating social media discussion with consumer behavior.

Overview

Pogo is a simple project exploring the relationship between consumer sentiment about products, as expressed in the natural-language text of product reviews, Twitter feeds and NYTimes articles, and the explicit product ratings on Best Buy. It was built as a team project for the MIT Hackathon at the Hack/Reduce space in Cambridge, MA, in March 2015.

Secondary goals were to try out the APIs provided by the sponsors (Indico.io, Knowledgent, Basis, Tamr), tap interesting public datasets, and integrate diverse tools of interest to the team, including R, Shiny and Python, so that they work together.

Data sources

Indico.io for sentiment analysis of natural language text
Best Buy API for product search, consumer ratings (numeric) and reviews (text)
NYTimes API for news articles mentioning specific products
Twitter API for Twitter feeds

Tools / platforms

Languages: R, Python
R Studio
Shiny application server for R

Methodology

For a specific product search term, e.g. "iPhone", Pogo:

Searches the Best Buy products API to find all products matching the search term
Extracts list of matching SKUs
Retrieves product ratings and customer review text verbatims from the Best Buy Reviews API for those SKUs
Calculates the sentiment value of the review text using Indico.io
Searches the Twitter Developer API for tweets matching the search term, and calculates their sentiment
Searches NYTimes Articles API for news articles matching the search term
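The SKU-matching and scoring steps above can be sketched in Python as follows. This is a minimal, offline sketch rather than the project's code: the response shapes (a `products` list of `sku`/`name` records and a `reviews` list with `sku`, `rating` and `comment` fields) are assumptions, and the `sentiment` callable is a hypothetical stand-in for the Indico.io call, injected so the pipeline can be exercised without network access.

```python
from typing import Callable, Dict, List

# Assumed (hypothetical) response shapes:
#   products response: {"products": [{"sku": int, "name": str}, ...]}
#   reviews response:  {"reviews":  [{"sku": int, "rating": int, "comment": str}, ...]}


def extract_skus(products_response: Dict) -> List[int]:
    """Extract the SKU of every product returned by a product search."""
    return [p["sku"] for p in products_response.get("products", [])]


def score_reviews(
    reviews_response: Dict,
    skus: List[int],
    sentiment: Callable[[str], float],
) -> List[Dict]:
    """Keep reviews for the matched SKUs and attach a sentiment score to each."""
    wanted = set(skus)
    return [
        {"sku": r["sku"], "rating": r["rating"], "sentiment": sentiment(r["comment"])}
        for r in reviews_response.get("reviews", [])
        if r["sku"] in wanted
    ]


if __name__ == "__main__":
    # Hypothetical toy scorer standing in for a real sentiment service.
    def toy_sentiment(text: str) -> float:
        return 0.9 if "love" in text.lower() else 0.2

    products = {"products": [{"sku": 1722009, "name": "Apple - iPhone 5c 16GB Cell Phone - Pink (AT&T)"}]}
    reviews = {"reviews": [
        {"sku": 1722009, "rating": 5, "comment": "I love this phone"},
        {"sku": 9999999, "rating": 1, "comment": "Review of an unrelated product"},
    ]}
    print(score_reviews(reviews, extract_skus(products), toy_sentiment))
```

Injecting the scorer keeps the data plumbing separate from any particular sentiment provider, so the same functions could wrap Indico.io or any other service.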

Results

Below is a screen image of one of the Pogo visualizations, showing the mean sentiment score of the text comments written by customers (e.g. "The is everything that was promised to Apple customers. I suggest this phone to upgraders and everyone else!") versus the numerical product rating on Best Buy (e.g. "5" stars). The graph shows a strong correlation between the numerical rating and the sentiment score of the review text, spanning the entire 0.0 to 1.0 scale. Of course, you'd expect more positive words in more positive reviews; the striking thing was how well the sentiment API scores reflected our intuitive sense of the sentiment when reading the words.
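The comparison behind that graph (mean sentiment per star rating, plus how strongly the two move together) can be sketched in plain Python. This is an illustrative sketch, not the project's R code: it assumes each review has already been reduced to a `(rating, sentiment)` pair, and it uses the Pearson coefficient as one reasonable measure of correlation.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def mean_sentiment_by_rating(pairs: Sequence[Tuple[int, float]]) -> Dict[int, float]:
    """Average the sentiment score of reviews at each star rating."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for rating, score in pairs:
        buckets[rating].append(score)
    return {rating: sum(s) / len(s) for rating, s in sorted(buckets.items())}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For per-review pairs, `pearson([r for r, _ in pairs], [s for _, s in pairs])` gives the review-level correlation; on Python 3.10+ `statistics.correlation` computes the same coefficient.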

For each product, Pogo also produces word clouds of the text in the Best Buy product reviews and NYTimes articles.
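A word cloud is drawn from word frequencies, so the input step amounts to tokenizing the text and counting words. The sketch below assumes simple lowercase tokenization and a deliberately tiny, hypothetical stop-word list; the project itself builds its clouds in R, and a real run would use a fuller stop-word list.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Tiny illustrative stop-word list (hypothetical); extend it for real text.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "this", "for", "i"}


def word_frequencies(texts: Iterable[str], top_n: int = 50) -> List[Tuple[str, int]]:
    """Return the top_n most frequent non-stop-words across all texts."""
    counts: Counter = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)
```

For example, `word_frequencies(["Great phone, great battery", "The battery is great"], top_n=2)` ranks "great" first and "battery" second; the resulting `(word, count)` pairs are what a word-cloud renderer sizes its words by.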

Twitter sentiment is displayed as a histogram of sentiment value vs. word frequency. The graph below shows that for PlayStation, tweets referencing #playstation skewed most commonly toward positive sentiment, although individual tweets span a wide distribution of sentiment.
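Such a histogram starts by binning per-tweet sentiment scores. This sketch assumes scores on the same 0.0 to 1.0 scale used for the reviews and equal-width bins (the bin count is an arbitrary choice here); it only counts, leaving the plotting to whatever charting tool is in use.

```python
from typing import List, Sequence


def sentiment_histogram(scores: Sequence[float], bins: int = 10) -> List[int]:
    """Count scores in `bins` equal-width bins over [0.0, 1.0].

    A score of exactly 1.0 is placed in the last bin.
    """
    if bins < 1:
        raise ValueError("bins must be >= 1")
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        counts[min(int(s * bins), bins - 1)] += 1
    return counts
```

With four bins, `sentiment_histogram([0.05, 0.55, 0.85, 0.95, 1.0], bins=4)` puts one score in the lowest bin, one in the third and three in the top bin, which is the kind of positive skew described above.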

Findings

The Indico.io API for sentiment analysis was very easy to work with, and yielded scores which jibe with both

Challenges

The Best Buy Products API has relatively coarse search capabilities. For instance, a search for "iPhone" returns a list of products, many of which are actually iPhones, but the list also includes accessories such as iPhone cases and iPhone speakers. The alternative is an exact text search, which requires the user to enter the product name with great precision.

  [
    { "sku": 1722009, "name": "Apple - iPhone 5c 16GB Cell Phone - Pink (AT&T)" },
    { "sku": 1724671, "name": "Apple - iPhone 5c 16GB Cell Phone - Pink (Sprint)" },
    { "sku": 6704115, "name": "ADOPTED - Cushion Wrap Case for Apple® iPhone® 5 and 5s - Black/Rose Gold" }
  ]
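One way to soften the coarse search is to post-filter the results client-side. The sketch below is a hypothetical keyword heuristic, not something the project is known to use: it keeps records whose name contains a device marker ("Cell Phone" in the records above) and drops names containing accessory words. Plain substring matching can misfire (e.g. "case" inside "showcase"), so the word lists would need tuning per product category.

```python
from typing import Dict, List, Sequence

# Hypothetical accessory keywords; tune per product category.
ACCESSORY_WORDS = ("case", "cover", "speaker", "charger", "cable", "screen protector")


def likely_devices(
    products: Sequence[Dict],
    device_marker: str = "cell phone",
    accessory_words: Sequence[str] = ACCESSORY_WORDS,
) -> List[Dict]:
    """Keep products whose name looks like the device itself, not an accessory."""
    kept = []
    for p in products:
        name = p["name"].lower()
        if device_marker in name and not any(w in name for w in accessory_words):
            kept.append(p)
    return kept


if __name__ == "__main__":
    results = [
        {"sku": 1722009, "name": "Apple - iPhone 5c 16GB Cell Phone - Pink (AT&T)"},
        {"sku": 1724671, "name": "Apple - iPhone 5c 16GB Cell Phone - Pink (Sprint)"},
        {"sku": 6704115, "name": "ADOPTED - Cushion Wrap Case for Apple® iPhone® 5 and 5s - Black/Rose Gold"},
    ]
    print([p["sku"] for p in likely_devices(results)])
```

Applied to the three records listed above, this keeps the two phones and drops the case.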

The Best Buy, Twitter and NYTimes APIs have relatively stringent rate limits, both in queries per second and in total queries over longer windows such as 15 minutes, an hour or a day. During development, test queries can inadvertently exhaust these limits and lock you out of the API for a period of time.
Shiny and Python integrate well on local development machines, but require advanced configuration, such as buildpacks, when deploying to cloud-based servers.
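To avoid locking yourself out of these rate-limited APIs during development, a client-side guard can throttle requests before they are sent. The sketch below is a generic sliding-window limiter, not part of Pogo; the limits you pass in (e.g. a count per 15-minute window) are placeholders, since the actual quotas differ per API and are not given here. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window`-second span."""

    def __init__(self, max_calls: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        if max_calls < 1:
            raise ValueError("max_calls must be >= 1")
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def try_acquire(self) -> bool:
        """Record a call and return True if a slot is free, else return False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a slot is free, then record the call."""
        while not self.try_acquire():
            # The oldest call leaves the window after this many seconds.
            sleep(max(0.0, self.window - (self.clock() - self.calls[0])))
```

Calling `limiter.acquire()` before each API request makes the script wait instead of tripping the server-side limit.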

Installation and setup

Clone the repo: git clone https://github.com/pietersv/pogo
Sign up for developer access to the APIs listed under Data sources
Create a file called secret in the pogo directory (the root of the cloned repo) with these entries:

BEST_BUY_API_APPLICATION=pogo 
BEST_BUY_API_KEY=
NYTIMES_ARTICLE_API_KEY=
NYTIMES_BOOKREVIEWS_API_KEY=
TWITTER_API_KEY=
TWITTER_API_SECRET=
TWITTER_ACCESS_TOKEN=
TWITTER_ACCESS_SECRET=

Launch R Studio and, in the console:
- load Shiny: library(shiny)
- define a variable with the location of the secret file: secretLoc <- "/Users/you/projects/pogo/secret"
- define a variable with the app directory: shinyLoc <- "/Users/you/projects/pogo"
- start the Shiny server: runApp(shinyLoc, launch.browser=TRUE) (to deploy, use runApp(shinyLoc, host="0.0.0.0", port=80) instead)
- replace any stray hard-coded paths (grep for /Users)
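If the Python side needs the same keys, the secret file can be read with a small parser. This sketch assumes the file holds plain KEY=VALUE lines exactly as listed above (blank lines and lines starting with # are skipped); how the project's R code actually reads secretLoc is not shown here, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Key names taken from the secret file template above.
REQUIRED_KEYS = (
    "BEST_BUY_API_KEY",
    "NYTIMES_ARTICLE_API_KEY",
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def parse_secret(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, trimming whitespace around keys and values."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        entries[key.strip()] = value.strip()
    return entries


def missing_keys(entries: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """List required keys that are absent or left empty."""
    return [k for k in required if not entries.get(k)]


def load_secret(path: str) -> Dict[str, str]:
    """Read and parse the secret file at `path` (a leading ~ is expanded)."""
    return parse_secret(Path(path).expanduser().read_text(encoding="utf-8"))
```

Checking `missing_keys(load_secret(...))` at startup surfaces an unfilled key with a clear message instead of a failed API call later.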
