Thursday, November 13, 2014

Session 4 Homework for CBA Batch 3

Class,

Individual homework:

Fill out the survey below (on perceptions of what constitutes IT capabilities in a firm). If you have any issues with doing so, let me know and I will assign alternate individual homework.

IT capabilities survey



Group HW:

1. Pick any well-known brand, whether a product or a service, e.g. Xbox 360, Jabong, iPhone 6 or Nike.

2. Collect 3 sets of data for it:

  • (a) 100+ consumer reviews from either Flipkart or Amazon India
  • (b) 500+ tweets
  • (c) 50+ articles from Google News or any other news aggregator site.

3. Feel free to use R or any other means you know of to collect the data (e.g. Python, a Chrome scraper etc.). Whichever you use, clearly mention the data collection tool.

4. For each set of data, perform the following analyses:

  • (a) General wordcloud using both TF and TFIDF weighting schemes. Update the stopword list to filter out noisy or irrelevant terms.
  • (b) Sentiment analysis. Display wordclouds separately for the top 50 most positive and most negative words.
  • (c) Identify the top few most positive and most negative documents. Read them and speculate on why they are so positive or negative about the brand.

5. Session 4 HW submission format:

  • Use a plain white blank PPT.
  • On the title slide, write your group name and the names + ISB student IDs of all group members.
  • Give your homework an informative title (include name of the product/brand you chose).
  • Have 3 sections in your PPT, one per data source, divided by separator slides.
  • On each separator slide, mention the source of the data. E.g., "Data source: Amazon consumer reviews" or "Data source: Twitter" and so on.
  • For slide headers, use the format "TF Wordcloud" or "Positive wordcloud" and so on.
  • Save the slide deck as session4HW_yourgroup.ppt.
  • Put all the raw data you collected, the code you used and your PPT in a zip folder (so that I can replicate your analysis if the need arises). Save the folder as session4HW_yourgroup.zip and upload it to the dropbox on LMS before the deadline.
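
For step 4(a), the difference between the two weighting schemes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the stopword list, the sample reviews and the function names are made up for the example, and the TF-IDF formula used here (term count times log of N over document frequency, summed across documents) is just one common variant. Your R package or other tool may weight and normalize differently.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; extend it with noisy terms you see in your own data.
STOPWORDS = {"the", "a", "is", "it", "and", "this", "i", "to", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def tf_weights(docs):
    """TF scheme: total count of each term across the whole corpus."""
    totals = Counter()
    for doc in docs:
        totals.update(tokenize(doc))
    return totals


def tfidf_weights(docs):
    """TF-IDF scheme (one variant): sum over docs of count * log(N / doc frequency)."""
    per_doc_counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(per_doc_counts)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for counts in per_doc_counts:
        doc_freq.update(counts.keys())

    weights = Counter()
    for counts in per_doc_counts:
        for term, count in counts.items():
            weights[term] += count * math.log(n_docs / doc_freq[term])
    return weights


# Made-up sample reviews, for illustration only.
docs = [
    "The phone is great and the camera is great",
    "The phone battery is poor",
    "Great phone, great price",
]

tf = tf_weights(docs)        # sizes for the TF wordcloud
tfidf = tfidf_weights(docs)  # sizes for the TFIDF wordcloud

for term, _ in tf.most_common():
    print(f"{term:10s} TF={tf[term]:<3d} TFIDF={tfidf[term]:.3f}")
```

Under this formula, a term like "phone" that appears in every review gets a TFIDF weight of zero even though its raw count is high, which is why the TF and TFIDF wordclouds for the same data can look quite different.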

If you have any questions, let Atreyee or me know. Feel free to use the comments section of this post for any Q&A or discussions.

Sudhir

4 comments:

  1. Is this also due on Dec 10?
  2. Hi Professor

    In the Session 4 group assignment, question 4(c) asks us to identify the top few most positive and most negative documents, read them, and speculate on why they are so positive or negative.

    Could you please clarify this in a bit more detail? Everybody in our group has a different interpretation of what is being asked.

    Thanks
    Bhavik K
    Replies
    1. Hi Bhavik,

      The ranking of the top few most +ve or -ve docs is not subjective but comes from the count of sentiment-laden words contained in each doc (the sentiment score from the qdap package).
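
The idea of ranking documents by a count of sentiment-laden words can be sketched roughly as below. This is a simplified Python illustration and not qdap's implementation: the two tiny word lists, the sample documents and the function names are assumptions made up for the example, and the score here is simply positive-word count minus negative-word count.

```python
import re

# Tiny hypothetical lexicons, for illustration only; real lexicons are far larger.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"poor", "bad", "hate", "terrible"}


def sentiment_score(text):
    """Score = number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def rank_documents(docs):
    """Return (score, doc) pairs sorted from most positive to most negative."""
    scored = [(sentiment_score(d), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample documents, for illustration only.
docs = [
    "Good phone but poor battery",
    "I love this phone, excellent camera and great battery",
    "Terrible service, bad packaging, I hate it",
]

ranked = rank_documents(docs)
for score, doc in ranked:
    print(f"{score:+d}  {doc}")
```

The documents at the top and bottom of such a ranking are the ones to read and comment on for 4(c).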

      Differing interpretations are fine, just go with the majority view. Take a vote or something.

      Sudhir