Sunday, February 23, 2014

Group Homework for Sessions 4 and 5 (Text and Social network analysis)

Hi all,

This is the HW for sessions 4 and 5, and the final HW in this course. It involves data collection, analysis and inference.

It will require you to go through and understand the R code I used in class. Pls *ensure* you can replicate the classwork examples and exercises before attempting the homework.

This is a group homework, so ensure you divide it amongst your group and co-ordinate. That way, too much burden won't fall on any individual.

If your group formation is not yet done, let Krishna Pusuluri know and he'll assign you to a group.

Any doubts, issues etc, let me know through the blog or via email.

### The following Qs are for text analysis on your survey data, using the R code from class ###

Q1a. One Q in your survey asks you to "List some brands (five or more) you are personally loyal to." Text analyze this component for the entire class by building a TDMN and a wordcloud. Comment on which brands seem most popular and which categories they come from. Comment on why this may be the case.
Q1b. Now build a simple semantic network for the terms found above: basically, which brands co-occur in documents. (See my R code for a function on how to build simple semantic networks from term-document matrices.) Speculate on which brands seem to be preferred together by people.
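As a pointer, here's a minimal sketch of that pipeline, assuming the tm, wordcloud and igraph packages are installed. The documents below are made-up stand-ins for your survey answers; use your actual class data.

```r
library(tm)        # corpus handling and term-document matrices
library(wordcloud) # wordcloud() plotting
library(igraph)    # graph plotting for the semantic network

# Made-up stand-ins for the survey answers (one document per respondent)
docs <- c("google apple nike google", "apple samsung levis", "google nike adidas")
corp <- Corpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)

# Term-document matrix and overall term frequencies
tdm  <- TermDocumentMatrix(corp)
m    <- as.matrix(tdm)
freq <- sort(rowSums(m), decreasing = TRUE)

# Wordcloud of the most frequent brands
wordcloud(names(freq), freq, min.freq = 1)

# Simple semantic network: two terms are linked if they co-occur in a document
adj <- m %*% t(m)   # term-by-term co-occurrence counts
diag(adj) <- 0      # drop self-loops
g <- graph.adjacency(adj, weighted = TRUE, mode = "undirected")
plot(g)
```

The co-occurrence trick is that `m %*% t(m)` counts, for every pair of terms, how often they appear in the same documents, which is exactly the edge weight you want in the semantic network.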

Q2. Text analyze the answers to the Q: "List two places OUTSIDE India that you would like to visit. Explain why in a few lines for each place." Build wordclouds under both TF and TFIDF. Comment on what can be inferred from the wordclouds.
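The only thing that changes between the two clouds is the weighting option in the TDM call; a minimal sketch with tm, on toy documents:

```r
library(tm)
library(wordcloud)

corp <- Corpus(VectorSource(c("paris art food", "tokyo tech food", "paris art museums")))

# TF: raw term counts (the default weighting)
tdm_tf <- TermDocumentMatrix(corp)

# TF-IDF: down-weights terms that show up in most documents
tdm_tfidf <- TermDocumentMatrix(corp,
  control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))

# The same wordcloud code works for either matrix
for (tdm in list(tdm_tf, tdm_tfidf)) {
  freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
  freq <- freq[freq > 0]   # TF-IDF can zero out terms present in every document
  wordcloud(names(freq), freq, min.freq = 0)
}
```

Comparing the two clouds side by side is the point: terms every respondent uses dominate the TF cloud but shrink or vanish under TF-IDF.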

Q3a. Text analyze the responses to the Q: "What are your career goals in the short, medium and long terms? Explain in a few lines." Build a wordcloud under both TF and TFIDF. Comment on what can be inferred from the wordcloud.
Q3b. Build a semantic network connecting the terms for this Q. Which terms occur together the most in documents? What can be inferred?

### The following Qs are for web extraction of data from Amazon ###

Q4. Collect 100-odd reviews from Amazon for the Xbox 360. Build and analyze a wordcloud. What themes seem to emerge?
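If you're doing the extraction with rvest (one option; the class code may use a different package), here's a hedged sketch. The URL pattern and the ".review-text" CSS selector below are placeholders, since Amazon's markup changes often; inspect the live page to find the right selector.

```r
library(rvest)  # read_html(), html_nodes(), html_text()

# Pull review text from several pages of a review listing.
# 'base_url' and 'selector' are placeholders to adapt to the live page.
get_reviews <- function(base_url, pages = 10, selector = ".review-text") {
  out <- c()
  for (pg in seq_len(pages)) {
    page <- read_html(paste0(base_url, pg))
    out  <- c(out, html_text(html_nodes(page, selector)))
  }
  out
}

# Example (not run): roughly 10 reviews/page x 10 pages gives the 100-odd reviews
# reviews <- get_reviews("https://www.amazon.com/product-reviews/PRODUCT_ID/?pageNumber=")
# writeLines(reviews, "xbox_reviews.txt")  # save them all to a file for reuse
```

Saving the scraped reviews with writeLines() means you only hit the site once and can rerun the wordcloud analysis on the local file.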

Q5. Analyze the positive wordcloud. What are the Xbox's seeming strengths? What can Microsoft position around?

Q6. Analyze the negative wordcloud. What are the Xbox's seeming weaknesses? What should they prioritize and fix?
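For Q5 and Q6, one way to split the cloud is to score terms against positive and negative word lists. The tiny lexicons below are purely illustrative; for real use, swap in a full opinion lexicon (e.g. Hu and Liu's) as in the class sentiment code.

```r
library(tm)
library(wordcloud)

# Tiny illustrative lexicons; swap in a full opinion lexicon for real use
pos_words <- c("great", "love", "awesome", "smooth")
neg_words <- c("noisy", "expensive", "laggy", "broken")

# Toy reviews standing in for the scraped Amazon data
reviews <- c("great graphics love the controller",
             "fan is noisy and accessories are expensive",
             "awesome games but the console gets noisy")
tdm  <- TermDocumentMatrix(Corpus(VectorSource(reviews)))
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Positive cloud: keep only terms in the positive list; likewise for negative
pos_freq <- freq[names(freq) %in% pos_words]
neg_freq <- freq[names(freq) %in% neg_words]
wordcloud(names(pos_freq), pos_freq, min.freq = 1)
wordcloud(names(neg_freq), neg_freq, min.freq = 1)
```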

The deadline is before the exam. Submissions must be in the form of PPTs only. Write your group name, individual members' names and ISB IDs on the title slide, and use your group name as the file name. A dropbox will be created for this.

Any Qs etc, contact me.

Sudhir

15 comments:

  1. Dear sir,
    I tried the code for Q4 (data extraction). It's working fine but shows the error: could not find function "%do%".
    Also, how do we save the 99 reviews to a file? In RStudio we are able to see only the first few reviews...

    ReplyDelete
    Replies
    1. Hi Sray,

      I've asked my RA, Mr Ankit Anand, to look into this and he'll respond to you soon. I'm traveling currently.

      Sudhir

      Delete
  2. Hi Sir,
    I need some help. I was trying to execute the R code that is part of Session 4 in the LMS, in the file "text and sentiment analysis code ver2". While executing the following line:
    tdm1 = TermDocumentMatrix(x1, control = list(weighting = function(x) weightTfidf(x, normalize = FALSE, stopwords = TRUE)));

    I have encountered the error :
    "Error in weighting(x) : could not find function "weightTfidf""

    I did not find much help in Google on this error. Any suggestions how to resolve this issue ? Thanks.

    ReplyDelete
    Replies
    1. Sorry, my mistake.

      It should be 'weightTfIdf' rather than 'weightTfidf'; the 'I' is capitalized in the actual function name. R is case sensitive. Pls try now and lemme know if it runs OK.

      Sudhir

      Delete
  3. Thank you, professor. After changing 'weightTfidf' to 'weightTfIdf', it is working. I have also changed the statement "dtm = t(tdm11)" to "dtm = tfidf(tdm11)" to troubleshoot the other issue. Just thought of sharing.

    I have encountered some other issue downstream. Debugging the code now. Will request if I need any additional help. Thanks.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. After changing 'weightTfidf' to 'weightTfIdf', it now shows: Error in weightTfIdf(x, normalize = FALSE, stopwords = TRUE) :
    unused argument (stopwords = TRUE)
    So should we remove stopwords = TRUE as well?

    ReplyDelete
    Replies
      Sure, try removing it and see. In general, keep trying new things; you don't have to stick with my code, which is indicative only. You're encouraged to branch out and explore your own applications on R once you're a little comfortable with the platform.
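      For reference, 'stopwords' goes in the control list itself rather than inside weightTfIdf(), so a call like this should run (the toy corpus x1 below is just for illustration):

```r
library(tm)
x1 <- Corpus(VectorSource(c("sample brand text", "another sample text")))

# 'stopwords' is a separate control entry, not an argument to weightTfIdf()
tdm1 <- TermDocumentMatrix(x1,
  control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE),
                 stopwords = TRUE))
```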

      By the way, qdap is a natural language processing package now available on R.

      Sudhir

      Delete
    2. Yes, by changing/removing parameters, I'm able to see the flow of R functions.

      Thank you.

      Delete
  6. I'm trying to do tweet extraction in R using twitteR, with this code:

    twitCred<-OAuthFactory$new(consumerKey=consumerKey,
    consumerSecret=consumerSecret,
    requestURL=reqURL,
    accessURL=accessURL,
    authURL=authURL)

    (I've created the consumerKey and secret), but after running the above code I'm still getting the error "OAuth authentication is required with Twitter's API v1.1".
    I tried to generate the authentication PIN but was not able to. I got a page titled "OAuth Signing Results" with some string and an authorization header. How can I get the authorization PIN?

    Request you to please help me on this.

    ReplyDelete
  7. URLs:
    reqURL<-"https://api.twitter.com/oauth/request_token"
    accessURL<-"https://api.twitter.com/oauth/access_token"
    authURL<-"https://api.twitter.com/oauth/authorize"

    ReplyDelete
  8. Hi Ashutosh,

    I used the code I shared last year. It seems Twitter's API connection protocols have changed since. I haven't had occasion to use it again and don't really have the bandwidth to investigate that aspect now. Should you find code that gets through to Twitter, pls share it here for everyone's benefit. Thanks.

    Sudhir

    ReplyDelete
  9. Hello Sir,

    Need your advice. When I try the TDMN function it shows the term frequencies correctly, but I do not see those terms in the wordcloud even though they have higher frequencies. For e.g., in the case of the Brands example, I could see "Google" having a higher term frequency than the others, but it was missing from the word cloud.
    Could the memory cache be the reason? Please advise.

    Thanks,
    Rashmi

    ReplyDelete
    Replies
      Sorry abt the delayed reply. Pls send me the code you're using and send a copy to Ankit.

      Sudhir

      Delete
  10. Hello Sir, came across facial analytics. Reminded me of the New York Museum example from class. http://www.informationweek.com/big-data/big-data-analytics/facial-analytics-what-are-you-smiling-at/d/d-id/1127726

    ReplyDelete