Thursday, March 26, 2015

Re the DC Final Exam

Class,

The Data Collection final exam is on 28th March, 2-4 p.m. I'm almost done making the paper. Here's what you can expect.

1. The exam will be conducted via a Qualtrics web survey. The survey will be launched and the weblink sent to you via email at 2 pm on the 28th. By 4 pm, you should have completed and 'submitted' the survey/exam. Once the 2-hour time limit is up, the survey will be disabled.

2. No surprise that a lot of the Qs will be of the short answer, multiple-choice, fill-in-the-blanks, true-false type. However, do expect a few short written types also (~ 50 - 75 words max).

3. This is an open-book, open-notes exam. You are also allowed to use R, Excel or any other software as required. (Yes, I may give you a small dummy dataset for basic summarizing and interpretation. Nothing sophisticated, just very simple stuff, and Excel may well be enough.)

4. Pls remember to strictly adhere to the honor code - Strictly no communication with any third person during the exam, no copying or saving of exam questions etc.

5. Pls ensure you have a good net connection, some sort of power backup if required (laptop is ideal for this) and access to all your course material.

6. All restrictions that generally apply to websurveys apply here. You may not be able to go back to a previous question after a page is turned. Once submitted, a submission will be counted as final.

Any Qs etc, let me or Atreyee know.

Good luck!

Sudhir

Sunday, March 15, 2015

Final Individual Homework for D.C. (and Other Updates)

Class,

Pls find below your final individual homework - different surveys for different sections.

Survey link for section A

Survey link for Section B

First, a little bit of background. You may recall the ice-cream survey example I had covered in class. Well, I mined semantic topics or themes from the responses to it.

The attempt now is to evaluate the degree of concordance between machine classification of customer responses and human classification of the same. So basically, I'm using these surveys to collect data from you on how humans understand and classify these responses.
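To make "degree of concordance" concrete, here is a minimal Python sketch. It assumes concordance is measured by simple percent agreement and by Cohen's kappa (agreement corrected for chance); the post does not say which measure will actually be used, and the topic labels below are made up purely for illustration.

```python
from collections import Counter

def percent_agreement(machine, human):
    """Share of responses that both raters put in the same class."""
    matches = sum(m == h for m, h in zip(machine, human))
    return matches / len(machine)

def cohens_kappa(machine, human):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(machine)
    p_o = percent_agreement(machine, human)
    m_counts = Counter(machine)
    h_counts = Counter(human)
    labels = set(m_counts) | set(h_counts)
    # Agreement expected if both raters labelled at random with
    # their own observed label proportions.
    p_e = sum((m_counts[l] / n) * (h_counts[l] / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for six survey responses (not real data).
machine_labels = ["taste", "price", "taste", "brand", "price", "taste"]
human_labels = ["taste", "price", "brand", "brand", "price", "taste"]

agreement = percent_agreement(machine_labels, human_labels)
kappa = cohens_kappa(machine_labels, human_labels)
print(round(agreement, 3), round(kappa, 3))
```

Kappa is usually the more informative of the two, since raw agreement can look high simply because one class dominates.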

Pls follow the instructions and complete the survey. I expect it won't take more than 10-12 minutes on average.

Submission deadline is a week from now, i.e. midnight of 22-March-2015 (Sunday).

Any Queries etc, let me know.

Other updates:

Apparently quite a few queries on R code remain even after the tutorial y'day.

I will see if I can conduct a follow-up tutorial myself next Saturday (if a slot is available). Attendance etc is optional, of course.

Monday, March 9, 2015

Individual Homework - D.C. on Word of Mouth

Hi Class,

This is individual homework, to be done by each of you independently. It is about primary data collection on Word-of-mouth (henceforth, WOM) communications.

Read carefully the following 5 steps for your homework.

Instructions:

1. For one full weekday, make a quick note of every instance of offline WOM communication about *any* brand (product or service) that you come across.

Thus, for instance, if you happened to mention to your friend that you liked "The Theory of Everything" (movie), then make a note of it.

If your colleague happens to mention to you that s/he was at Continental Hospital for a checkup, make a note of that too.

Or it could be that you were the third party at a conversation between two people arguing over whether 'The Times of India' is better than 'The Hindu'. Make a note of that too.

2. Mind you, this is just for 24 hours. In the course of your regular day, make a mental note of all the products, brands, services etc. that you come across via interpersonal WOM (offline only), and then record them in a notepad or an Excel sheet or some such place.

Important: Do NOT deliberately indulge in WOM for the homework. Only record that WOM which happens naturally, in the course of your everyday routines.

3. I want you to record 3 things:

  • (a) Name of the product/brand etc and which category/ industry it belongs to.

  • (b) Who was the source of the WOM (was it you? a colleague? family member? etc.) and who was the recipient?

  • (c) what was the time of the day (roughly) when the WOM exchange took place.

4. Repeat steps 1-3 for any 24-hour period during a weekend or holiday.

5. Finally, enter the primary data you collected into an Excel sheet with 5 columns: brand/product, industry or category, WOM source, WOM recipient and Date-time.

Name the Excel sheet "YourName_ISB student number.xls" and upload it to the requisite dropbox in LMS.
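For those who prefer to keep the log programmatically, here is a minimal Python sketch of the 5-column layout. It is only an illustration: it writes CSV text (which Excel can open and re-save as .xls) rather than a native .xls file, the two rows reuse the examples from step 1, and the dates, times and categories are invented.

```python
import csv
import io

# The five columns requested in step 5.
COLUMNS = [
    "brand/product",
    "industry or category",
    "WOM source",
    "WOM recipient",
    "Date-time",
]

def wom_rows_to_csv(rows):
    """Return CSV text with the five homework columns as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical entries; the date-times are made up.
example_rows = [
    {
        "brand/product": "The Theory of Everything",
        "industry or category": "Movies",
        "WOM source": "Me",
        "WOM recipient": "Friend",
        "Date-time": "2015-03-10 13:00",
    },
    {
        "brand/product": "Continental Hospital",
        "industry or category": "Healthcare",
        "WOM source": "Colleague",
        "WOM recipient": "Me",
        "Date-time": "2015-03-10 16:30",
    },
]

csv_text = wom_rows_to_csv(example_rows)
print(csv_text)
```

One row per WOM instance keeps the weekday and weekend logs in the same sheet; the Date-time column is what tells them apart.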

Deadline for this individual homework is 10 days from now - i.e. 18 March 2015 midnight.

Any queries etc pls let me or Atreyee know.

Sudhir

Session 4 based Group Homework for Batch 4

Class,

This group homework is based on session 4 - text analytics.

I did toy with the idea of inserting a latent topic modeling and interpretation component to this homework but decided against it as it isn't strictly in D.C.'s domain.

The code required to do this HW will be up soon on LMS.

Group HW:

1. Pick any well-known brand - product or service. E.g. Xbox 360 or Jabong or iPhone 6 or Nike.

2. Collect 3 sets of data for it:

  • (a) 100+ consumer reviews from either Flipkart or Amazon India
  • (b) 500+ tweets
  • (c) 50+ articles from Google News or any other news aggregator site.

3. Feel free to use either R or any other means you know of to collect the data (e.g. Python, Chrome scraper etc.). But clearly mention the data collection tool used.

4. For each set of data, perform the following analyses:

  • (a) General wordcloud using both TF and TFIDF weighting schemes. Update the stopwords list to filter out noisy or irrelevant terms.
  • (b) Sentiment analysis. Display wordclouds separately for the top 50 most positive and most negative words.
  • (c) Identify the top few most positive and most negative documents. Read them and speculate on why they are so positive or negative about the brand.

5. Session 4 HW submission format:

  • Use a plain white blank PPT.
  • On the title slide, write your group name and the names + ISB student IDs of all group members.
  • Give your homework an informative title (include name of the product/brand you chose).
  • Have 3 sections in your PPT - one per data source, separated by separator slides.
  • On the separator slides, mention the source of the data. E.g., "Data source: Amazon Consumer reviews" or "Data source: Twitter" and so on.
  • For slide headers, use format "TF Wordcloud" or "Positive wordcloud" and so on.
  • Save the slide deck as session4HW_yourgroup.ppt.
  • Put all the raw data you collected, the code you used and your PPT in a zip folder (so that I can replicate your analysis if the need arises). Save the folder as session4HW_yourgroup.zip and upload it to the dropbox on LMS before the deadline.
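As a rough illustration of what step 4 asks for, here is a minimal Python sketch of TF weighting, one common variant of TFIDF weighting (corpus term count times log of N over document frequency), and a simple lexicon-based sentiment score. This is not the R code that will be posted on LMS; the function names, the three toy documents, the stopword list and the tiny positive/negative word lists are all assumptions made for illustration, and other TFIDF variants exist.

```python
import math
from collections import Counter

def tokenize(text, stopwords):
    """Lowercase, strip surrounding punctuation, drop stopwords."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return [w for w in words if w and w not in stopwords]

def term_frequencies(docs, stopwords):
    """TF weighting: total count of each term across all documents."""
    tf = Counter()
    for doc in docs:
        tf.update(tokenize(doc, stopwords))
    return tf

def tfidf_weights(docs, stopwords):
    """TFIDF weighting: corpus count times log(N / document frequency)."""
    n_docs = len(docs)
    tf = Counter()
    df = Counter()
    for doc in docs:
        tokens = tokenize(doc, stopwords)
        tf.update(tokens)
        df.update(set(tokens))  # count each term once per document
    return {term: tf[term] * math.log(n_docs / df[term]) for term in tf}

def sentiment_score(doc, positive, negative, stopwords):
    """Number of positive words minus number of negative words."""
    tokens = tokenize(doc, stopwords)
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

# Toy documents and word lists (hypothetical, not real reviews).
docs = [
    "Great phone, great camera.",
    "The phone battery is terrible.",
    "Terrible service but great price.",
]
stopwords = {"the", "is", "but"}
positive, negative = {"great"}, {"terrible"}

tf = term_frequencies(docs, stopwords)
tfidf = tfidf_weights(docs, stopwords)
scores = [sentiment_score(d, positive, negative, stopwords) for d in docs]
print(tf.most_common(3), scores)
```

Note how TFIDF down-weights terms that appear in most documents: "camera" (one document) outranks "phone" (two documents) under TFIDF even though "phone" has the higher raw count. Sorting documents by the sentiment score gives the candidates for 4(c).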

Any Qs etc., let Atreyee or me know. Feel free to use the comments section to this post for any Q&A or discussions.

There are two more homeworks coming your way - both individual - and only one of them is a survey based one.

Sudhir

Saturday, March 7, 2015

Individual Homework 2

Hi Class,

First off, happy Holi to you all.

Below is the second of your three individual homeworks in the course - all of which involve survey filling only.

This survey is divided into 3 sections (or 'blocks' in Qualtrics' terminology), viz. a psychographic questionnaire, an open-ended elicitation of brand preferences and finally, a close-ended (i.e. scaled) elicitation of preferences over particular brands. Pls note that there are no 'right' or 'wrong' answers in the survey. This homework is graded on completeness and timeliness.

I expect the survey to take up to 20-25 minutes, so kindly set aside that much time for it.

The survey can be found at this link.

Deadline is 15-March Sunday midnight.

Any queries etc, pls contact me.

Sudhir

Tuesday, March 3, 2015

More readings and material for DC

Hi Class,

I'd put up a set of additional and optional reading material in an earlier blog post for CBA batch 4.

This is more along the same lines. However, the readings I point to here are not optional in the sense that questions based on these readings may feature in your exam.

Readings relating technology to data collection and data use (from the Economist):

1. The first article, titled 'Little Brother' (in an obvious play on George Orwell's famous 'Big Brother' theme), details the impact of digital on advertising spends of firms worldwide.

2. The second article, 'Getting to know you', is about the various ways in which data is collected about consumers online.

3. The third article in this series, 'The world wild web', extrapolates some of what we are seeing into the future and asks 'Where are we going?'.

Ideally, I'd like you to read and discuss these articles within your groups. Again, remember, questions based on ideas and facts in these articles are fair game in your final exam for DC. Happy reading.

If you have come across such material which may be of interest to the class, you may email me or put up links to that material in the comments section below.

For instance, Nikhil Maddirala from your batch emailed me with information regarding a useful webscraping tool. Recall the Chrome scraper extension/plug-in tool I showed you in class? Well, it seems there's a way to tweak the tool to scrape multiple pages in one go.

See this link here on Scraping multiple Pages using the Scraper Extension and Refine.

Ciao

Sudhir

R code and Data files on LMS (Sessions 4 and 5)

Hi all,

Sorry about the delay in updating the blog and the LMS with the R code and data from sessions 4 and 5.

Pls find folders on LMS containing R code, data and instructions.

As CBA students, my expectation is that you will:

(i) diligently follow the instructions given,

(ii) read and understand the R code line-by-line before running it,

(iii) run the code and replicate the classwork examples,

(iv) discuss any issues etc that arise here on this blog by using the comments section,

(v) solve the next group homework by tweaking and customizing the R code as required, and

(vi) provide constructive feedback where possible.

Instructions:

1. Unzip contents of the zip folder

2. Open RStudio. File menu --> Open File --> textanalysis R code.R

3. The textanalysis R code.R file will open as an additional window (on the top left) in RStudio.

4. To run any lines, select them and click the Run icon on the top right of the window. Ensure internet is connected.

5. Read the lines before running them, as some require input from your side (which files to read in etc.).

6. The zip folder contents are self-contained and hopefully should run smoothly. However, if you encounter issues, pls let us know.

7. Pls email aashish_pandey@isb.edu with a copy to Atreyee in case of any R related issues. Your group homework for this session will be up soon, in a few days. Pls ensure you are comfortable with this code before the homework arrives.

Thanks.

Sudhir

Monday, March 2, 2015

Group Homework 1

Hi Class,

This group homework covers sessions 1-3 in DC, i.e. problem formulation in terms of D.P.s and R.O.s, construct formulation around the D.P.s and R.O.s, questionnaire design around the constructs uncovered, and actually programming a websurvey on ISB's Qualtrics websurvey software.

Read the following recent Businessweek article:

Coke's big fat problem.

Imagine you are in the shoes of Sandy Douglas. Now, do the following...

(i) From his 'messy reality', extract a relevant and pressing R.O. (stated clearly in words).

(ii) Map that R.O. onto 'information requirements' (see session 2 slides) that are built around some critical constructs of interest. Give these constructs a descriptive name.

In real life, we'd use exploratory/qualitative work extensively at this stage. Assume you have done so already.

(iii) Now, further break down the construct(s) you identified above into one-dimensional aspects that can be captured using Likerts.

(iv) Define your target audience/ target segment as teenagers. Develop a questionnaire for this target audience that can be taken in under 12 minutes.

Use of SKIP logic and any other Qualtrics features is welcome.

(v) Program your questionnaire as a websurvey in Qualtrics. The survey URL (obtained upon launching) is the deliverable and should be pasted along with your group name in this Google form.

(vi) The first page of your survey should be descriptive text only, meant for me and the AAs. Pls write cogently the answers to parts (i) to (iv) above in that space.

Update: In the past, I got quite a few Qs asking if a scale other than Likert can be used etc. Sure, it can. Likert is important in the context of behavioral constructs. For regular, descriptive Qs, use other scales by all means. *Not* every Q has to be a Likert.

The instructions for how to get a Qualtrics account will be put up on LMS, if they haven't been already.

Deadline for this is midnight of 22-March (Sunday).

Sudhir

Individual Homework 1

Hi Class,

There are at least two individual homeworks in DC. Both involve filling up surveys, on time and with care.

The first homework is described below. I will put up details for the second one shortly.

Pls watch this ~ 20 minute video carefully. It features Scott McDonald of Condé Nast holding forth on where Marketing Research is headed in the next decade.

“Social Technological and Economic forces affecting Marketing Research over the next decade”

Now, for your HW, pls answer a few simple Qs (True-False, fill in the blanks variety) about the above talk in the following survey:

Questions for Make-up Homework.

HW Notes:

(i) This is an individual-only HW. Since it involves no R, consulting peers is not permitted.

(ii) I found that using earphones works great in making out what the speaker is saying much more clearly than ordinary speakers. FYI.

(iii) Deadline: The HW should be completed and submitted latest by midnight 15-March.

Any Qs etc, pls feel free to email me or use the comments section below.

Sudhir Voleti

Links to additional material for DC

Hi class,

We've seen a wide range of topics in the DC course. Some of you asked for more sources and reading material. Pls find the same below (in no particular order). All of it is totally optional.

1. Recall the Google Glass example somebody had raised in class? Well, here's a Gigaom article on the Future of the wearables market.

2. Recall the first example in the network analytics class on world international call patterns? Well, here's the associated Atlantic article on a World mapped by phone calls. It nicely illustrates how much visualization of networks can tell us.

3. More from the Atlantic on how it's now technologically feasible to arrive at one's identity: Big Data Can Guess Who You Are Based on Your Zip Code.

4. Recall the habit patterns class we'd covered? Here's an article from HBR blogs on How Customers Get Hooked on Products.

5. There's an undercurrent somewhere in the program that spells the words "data science". This link here offers a rounded perspective on what precisely data science is. This follow-on link here describes 8 concrete steps you must take to become a data scientist. Yes, R features there. Apt read for all CBA students, IMO.

For sessions 1-3, which focussed more on constructs, designing questionnaires around constructs etc., here below is some interesting material which you may consider browsing at leisure. It's basically meant to help those folks who may have felt the coverage in class was not detailed enough on certain topics.

This is a Wikipedia link to Quantitative psychology as a subject area. It provides a nice, concise and precise introduction to the area in general and has a good number of downstream links that you can pick up on as and when necessary.

This is the Wiki entry to Scaling techniques in general in the social sciences. As you can see, the comparative versus noncomparative dichotomy comes in early on here. More links to detailed topics are also available.

This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

Recall that in one of our sessions (2 or 3?), there was much debate about k-means and other clustering (or, in Marketing speak, 'Segmentation') algorithms? There was an element of affinity analysis there as well.

A conceptual introduction to these terms can be found online as well - for instance, here for market segmentation, for cluster analysis and for affinity analysis in retail analytics.
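For those who want to see the k-means idea in code rather than prose, here is a minimal Python sketch of the standard assign-then-update loop (Lloyd's algorithm) on 2-D points. It is a toy under stated assumptions: the six "customers", their two made-up features and the starting centroids are all hypothetical, and real work would use a library routine with multiple random restarts.

```python
def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm on 2-D points with caller-supplied start centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (store visits per month, items per visit).
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
final_centroids, segments = kmeans(customers, [(1, 2), (8, 9)])
print(final_centroids)
print(segments)
```

Each resulting cluster plays the role of a 'segment' in Marketing speak; the centroid is the segment's average customer.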

And of course, there's always Google available to produce reports and summaries at varying levels of detail on any subject under the sun.

-------------------------------------------------

These links below are more technical in nature, and are even more optional reading than the ones above. I'd suggest revisiting the links below after a couple more terms are done in the program.

6. This will be kinda boring to many perhaps. But here's an Academic journal paper on Behavior prediction using social networks

7. And here is an excellent set of slides for computing basic metrics in network data from r-bloggers.com. BTW, you should consider subscribing to their newsletter, if you are into R.

8. More R here. An excellent intro to general R and then some network basics along with code and examples workshop style.

That's it for now. Will update as more comes in. Your Homeworks will be up next. And also, data+R code to replicate classwork examples.

Ciao.

Sudhir