Wednesday, October 7, 2015

Session 5 Homeworks

Class,

Your last set of two homeworks, corresponding to session 5, is coming up. One is short and simple. The other is a tad challenging and more of a capstone. I realize I couldn't offer any homework practice on the crowdsourcing and location-based DC part, but I guess this is quite solid for a 5-session course.

Individual Homework:

Pls fill up these surveys according to your section.

Survey for section A

Survey for section B

These surveys are again based on latent topic interpretation, similar to what you've done previously. Shouldn't take too much time.

Deadline is a week from now: Midnight of Thursday, 14-Oct-2015. Do remember that the individual homeworks are graded on timeliness and completeness.

----------------------------------------

Group Homework:

First, some context. Recall the Google maps example we did for spatially plotting commercial entities of interest in Hyderabad.

There, we were trying to know the distribution of purchasing power over Hyderabad [or, more generally, *any* other city]. How do we find that out quickly, cheaply, scalably and reliably?

i. Replicate that classwork example at home. Download code from LMS. Pls view the LMS video on how to create your own account and get data.

ii. Now pick a city as your focal city. Any city except Hyderabad (since it's already done in class).

iii. Pick a sector-Industry. For example Food--> Pizzerias or High End restaurants. Finance --> ATMs, bank branches etc. Consider yourself to be a consultant for a client who wants to enter the focal city in that business.

iv. Profile your client's target segment, who could be either individuals or organizations [e.g., lower, middle or upper SEC; high net worth Individuals; startups, SMEs, service businesses such as in education or healthcare, large MNCs etc]

v. Pick a list of entities from the entity list that Google provides that could serve as proxies for the presence / purchasing power / needs of your target segment. Pick around 2-3 proxies in all. E.g., in the classwork example, we picked banks, malls and hospitals as proxy entities to indicate nearby presence of middle and upper middle class SEC population.

vi. Collect data on these proxy entities in the focal city from the Google Maps API and plot them on Google maps. Interpret what the map is saying.

Your deliverable will be a PPT with screen caps of these maps pasted on the slides. Highlight at least 2-3 areas of particular interest for your client using ovals and textboxes. In a separate slide, explain why you chose those areas as interesting for your client.

vii. Bonus points: Run a simple clustering based on a distance matrix for the entities chosen. Display the clusters in a separate map.
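
For the bonus clustering step, here is a minimal sketch in Python. The classwork code is in R, so treat this as an illustration of the idea and not as the classwork method. It assumes you already have latitude and longitude for each proxy entity; the names and coordinates in `places` are made up, and the 2 km threshold is an arbitrary choice. It builds a great-circle (haversine) distance matrix and then groups entities that lie within the threshold of one another (single-linkage clustering with a distance cut-off).

```python
import math

# Hypothetical proxy entities as (name, latitude, longitude).
# These are made-up points for illustration, not real API output.
places = [
    ("bank_1", 12.9716, 77.5946),
    ("mall_1", 12.9750, 77.6050),
    ("hospital_1", 12.9352, 77.6245),
    ("bank_2", 12.9340, 77.6100),
    ("mall_2", 13.0358, 77.5970),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def distance_matrix(points):
    """Square matrix of pairwise haversine distances."""
    n = len(points)
    return [
        [haversine_km(points[i][1], points[i][2], points[j][1], points[j][2]) for j in range(n)]
        for i in range(n)
    ]


def threshold_clusters(dist, max_km):
    """Single-linkage clustering: any two points within max_km share a cluster."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= max_km:
                parent[find(i)] = find(j)

    roots = sorted({find(i) for i in range(n)})
    label_of = {root: k for k, root in enumerate(roots)}
    return [label_of[find(i)] for i in range(n)]


dist = distance_matrix(places)
labels = threshold_clusters(dist, max_km=2.0)
for (name, _, _), label in zip(places, labels):
    print(name, "-> cluster", label)
```

`labels` holds one cluster id per entity, which you can then use to colour the markers on a separate map. K-means or hierarchical clustering on the same distance matrix would be equally valid choices.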

----------------------------------------

Deliverable:

PPT form as per the instructions below.

a. Title slide: City and your Client's Business. Also, names and roll numbers of group members. Name the ppt as group_name.pptx [Pick a group name if you haven't already]

b. Problem Formulation slide: State in brief your client's business problem, using, say 1 D.P and 1-2 R.Os

c. Description slide(s): State why you picked your focal city in 1-2 lines. Describe why you picked your client's business in 1-2 lines. Bonus points for slightly out-of-the-way (or non-mainstream) instances.

d. Proxy List: List the proxy entities you are going to search for. Justify your list in 1-2 lines for each entity type you have chosen.

e. Result slides: Paste google map screen cap with proxy entities on it. Highlight using ovals and arrows which areas of interest you have chosen. Choose 2-3 promising areas.

f. Interpretation slide: Justify your choice for the areas picked in a few lines.

g. Bonus points if you could build a distance matrix, cluster the proxy entities and display the clustered entities on a slide. [Check the classwork for this, I did something similar there].

h. Bonus points if you submit your code which we can test and run using our AppIDs here. Submit R code as

i. Submit the ppt in the dropbox before deadline.

Deadline is midnight of Sunday, 1-Nov-2015.

Any queries etc, contact aashish_pandey@isb.edu or me. Use the comments sections for general FAQs.

Sudhir

Additional (Optional) Readings

Class,

Here is a set of readings from sessions 4 and 5 that may be of interest. I do realize I got delayed in putting them out and some of you had followed up with me regarding this.

Recall the Xerox-evolv caselet we did in class - The one where psychographic Likerts were combined with straightforward machine learning? We then discussed implications, pros and cons etc, and speculated on when such practices may spread worldwide and into India. Well, speaking of India...

Startups, and India Inc use psychometric tests to peek into potential recruits’ minds

This is the session 5 reading from NYT on 'Data janitor' work: For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

This is the source for the session 5 table on Data munging functions in R from computer world: Great R packages for data import, wrangling & visualization

This is the Dell Ideastorm website we talked about in 'DC from crowds' ...

...and this is the landing page for the P&G connect + develop FMCG idea-sourcing endeavor.

This is the NYT article on 'the data driven life' about the quantified self movement.

From session 3, this is the API directory from Programmable web. Recommended that as CBA folks, you register with the site and get periodic updates (if you haven't already). These are all optional reads, so read at leisure.

One last set of homeworks for session 5 are on their way. Another survey to fill and some Google Maps data to pull and play with. Trust that's doable. Well, shall close here.

Ciao

Sudhir

Friday, October 2, 2015

Session 4 homework - Text analytics and topic modeling

Class,

This group homework is based on session 4 - text analytics and topic-mining.

I hope you are comfortable replicating the classwork R code on LMS. There's also an instructional video Pandeyji put up on twitteR. I urge you to leverage the tutorial session with Aashish this Saturday to clarify all issues in this space.

Homework Instructions:

1. Pick up any well-known brand- product or service. E.g. Xbox360 or Jabong or iphone6 or Nike.

2. Collect 3 sets of data for it:

(a) 100+ consumer reviews from either flipkart or Amazon India
(b) 500+ tweets
(c) 50+ articles from Googlenews or any other news aggregator sites.

3. Feel free to either use R or any other means you know of to collect the data (e.g. Python, chrome scraper etc.). But clearly mention the data collection tool used.

4. For each set of data, perform the following analyses:

(a) General wordcloud using both TF and TFIDF weighting schemes. Update stopwords list to filter out noisy or irrelevant terms.
(b) Sentiment analysis. Display wordclouds separately for the top 50 most positive and most negative words.
(c) Identify the top few most positive and most negative documents. Read them and speculate on why they are so positive or negative about it.
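
To make the TF versus TFIDF distinction in 4(a) concrete, here is a small Python sketch. It is not the classwork R code. The three "documents" and the stopword list are made up, and the IDF used is one common variant, log(N / df); other libraries use smoothed versions. The output is one weight per term, which is what a wordcloud function would take as input.

```python
import math
from collections import Counter

# Made-up mini corpus standing in for reviews, tweets or news articles.
docs = [
    "battery life is great and the camera is great",
    "battery drains fast and the screen is dim",
    "great camera great screen poor battery",
]

# Extend this set with noisy or irrelevant terms as you inspect the clouds.
stopwords = {"is", "and", "the"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in stopwords]


def tf_weights(documents):
    """Term frequency: raw count of each term across the whole corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def tfidf_weights(documents):
    """Sum over documents of count * log(N / df), one common TF-IDF variant."""
    n_docs = len(documents)
    tokenized = [tokenize(doc) for doc in documents]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = Counter()
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            weights[term] += count * math.log(n_docs / df[term])
    return weights


tf = tf_weights(docs)
tfidf = tfidf_weights(docs)
print("Top TF terms:   ", tf.most_common(3))
print("Top TFIDF terms:", tfidf.most_common(3))
```

Notice that a term appearing in every document gets a TFIDF weight of zero under this variant, even if its raw count is high. That is the reason the two wordclouds can look quite different for the same corpus.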

5. Latent topic mining: Topic mine a corpus from any one dataset. Use no more than 2 or 3 topics. Make wordclouds of the topic's tokens. Interpret in a few lines what the topics are saying.

6. Session 4 HW submission format:

i. Use a plain white blank PPT.

ii. On the title slide, write your group name and the names + ISB student IDs of all group members.

iii. Give your homework an informative title (include name of the product/brand you chose).

iv. Have 3 sections in your PPT - one for each data source, separated by separator slides.

v. On the separator slides, mention the source of the data. E.g., "Data source: Amazon Consumer reviews" or "Data Source: Twitter" and so on.

vi. For slide headers, use format "TF Wordcloud" or "Positive wordcloud" and so on.

vii. Save the slide deck as session4HW_yourgroup.ppt.

viii. Put all the raw data you collected, the code you used and your PPT in a zip folder (so that I can replicate your analysis if need arises). Save the folder as session4HW_yourgroup.zip and upload it in the dropbox on LMS before the deadline.

ix. Any Qs etc., let Aashish or me know. Feel free to use the comments section to this post for any Q&A or discussions.

There's just one more [group] homework left - on spatial DC using Google Maps, based on a business problem.

Deadline: Sunday 25-Oct Midnight.

Sudhir

Thursday, October 1, 2015

Session 3 Homeworks - Individual and Group

Hi Class,

Recall that in Session 3 we covered two distinct topics - 'DC from Qualitative research' and 'DC from APIs'. Corresponding to these two session 3 topics, are the following two homeworks - one individual (to be done and submitted independently) and one group (one submission per group).

Individual homework: Qualitative DC

This individual homework is about primary DC on Word-of-mouth (henceforth, WOM) communications.

Read carefully the following 5 steps for your homework.

Instructions:

1. For one full weekday, make a quick note of every instance of offline WOM communication about *any* brand (product or service) that you come across.

Thus for instance, if you happened to mention to your friend that you liked "Bahubali" (movie), then make a note of it.

If your colleague happens to mention to you that s/he was at Continental Hospital for a checkup, make a note of that too.

Or it could be that you were the third party at a conversation between two people arguing over whether 'The Times of India' is better than 'The Hindu'. Make a note of that too.

2. Mind you, this is just for 24 hours, when, in the course of your regular day, you make a mental note of what all products, brands, services etc that you came across via interpersonal WOM (offline only) and then record them in a notepad or an excel sheet or some such place.

Important: Do NOT deliberately indulge in WOM for the homework. Only record that WOM which happens naturally, in the course of your everyday routines.

3. I want you to record 3 things:

(a) Name of the product/brand etc and which category/ industry it belongs to.
(b) Who was the source of the WOM (was it you? a colleague? family member? etc.) and who was the recipient?
(c) what was the time of the day (roughly) when the WOM exchange took place.

Model the worksheet columns as shown in the example below:

4. Repeat steps 1-3 for any 24 hour period during a weekend or holiday.

5. Finally, write your primary data collected into an excel sheet with 5 columns: brand/product, industry or category, WOM source, WOM recipient and Date-time.

Name the excel sheet as "YourName_ISB student number.xls" and upload it to the requisite dropbox in LMS.

Deadline for this individual homework is 15 days from now - i.e. 16 October 2015, Friday, midnight.

Any queries etc pls let me or Suresh know.

-----------------------------------------------------

Group Homework: DC from APIs

One submission per group. Use any tools/platform you prefer, not necessarily R.

Instructions:

1. Replicate at home the classwork exercises on API based DC (including viewing the related video instructions on LMS).

2. Google for free traffic APIs available. A few I found, for instance, were:

Microsoft Bing

Yahoo's traffic API

HERE traffic API

3. Read the relevant documentation. Connect to the API. This is an HCC Level 1 assignment - meaning, one can consult peers and other groups for help as required.

4. There are typically two types of information given out in Traffic APIs - incident data (accidents, crashes etc) and flow data. Your task is to obtain either of the two for any major US city.

5. Display the output as a table (or dataframe or matrix object) with well-defined columns and a few rows for illustration.

6. Submit a PPT with your group members' names in the title slide, your chosen API's details in the next slide, code used to pull data from the API in the third slide (at least the URL constructed) and a snapshot of the output on the fourth slide.
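
As a rough illustration of steps 3 to 5, here is a Python sketch of constructing a request URL and flattening a JSON response into a table. Everything specific in it is an assumption: the endpoint `api.example.com`, the parameter names (`bbox`, `key`, `format`) and the fields in the sample response are all hypothetical, and no real traffic API is being described. Read your chosen API's documentation for the actual URL format and response fields.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only. Not a real traffic API.
BASE_URL = "https://api.example.com/traffic/incidents"


def build_url(bbox, api_key):
    """Construct the request URL from a bounding box and an API key.

    The parameter names here are assumptions; real APIs name them differently.
    """
    params = {
        "bbox": ",".join(str(v) for v in bbox),  # south, west, north, east
        "key": api_key,
        "format": "json",
    }
    return BASE_URL + "?" + urlencode(params)


# Made-up response body in the shape such an API might return.
SAMPLE_RESPONSE = """
{"incidents": [
  {"id": "a1", "type": "accident", "severity": 3, "road": "Main St"},
  {"id": "b2", "type": "construction", "severity": 1, "road": "River Rd"}
]}
"""


def to_rows(payload, columns):
    """Flatten the JSON payload into a list of rows with well-defined columns."""
    data = json.loads(payload)
    return [[item.get(col) for col in columns] for item in data["incidents"]]


columns = ["id", "type", "severity", "road"]
url = build_url((40.7, -74.02, 40.8, -73.93), "YOUR_KEY")
rows = to_rows(SAMPLE_RESPONSE, columns)

print(url)
print(" | ".join(columns))
for row in rows:
    print(" | ".join(str(value) for value in row))
```

In a real run you would fetch the URL (for example with `urllib.request.urlopen`) and pass the response text to `to_rows` in place of `SAMPLE_RESPONSE`.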

Submission Deadline is 15 days from now: midnight of 16-Oct Friday.

Any queries etc pls let me or Aashish know.

Sudhir

Tuesday, September 29, 2015

Individual Homework 2

Hi Class,

Whereas a lot of DC homeworks are group homeworks, some (such as this one) are individual homeworks. This means, each individual submits the homework and no consultation with others is permitted. One survey-based individual homework was already put up.

Recall the subject of latent topic-mining we'd covered in DC Session 4. The machine "discovers" topics latent in text corpora and assigns probabilities for topic membership of word-tokens and for topic-presence in documents. This survey aims to assess machine classification of phrase-tokens and 'documents' when pitted against a superior human standard of classification.

There are no right or wrong answers in what follows, so just diligently solve the questions posed to you. This homework is graded on timeliness and completeness.

We've test-run the surveys. I think most people should be able to complete it comfortably within 15-20 minutes. Hence, a relatively short deadline is given.

This is the survey link for Section A people.

This one is for Section B people.

Deadline for filling up: By midnight Sunday 04-October. Any queries, clarifications etc, contact me.

Sudhir

Session 2 Based Homework

Hi Class,

Recall that Session 2 was based on survey design - a critical and widely-used primary DC tool. IMO, it's hard to know the intricacies, advantages and limitations of the tool without getting your hands dirty actually designing a survey questionnaire.

This is a group based homework. Only one submission per group. If you don't know who your group is, pls ask Suresh Dasari about this.

Your client is a player in the app-based on-demand home services space in India's top metros. The client wants to know what the demand levels are like for such services, at what price, who to target and how.

To understand the problem context, first read the following (small) newspaper articles from the past few months.

On demand homemade meals gaining ground with companies like Foodcloud, Holachef, Biteclub

If security concerns not addressed, on-demand service startups could spell disaster for firms

On-demand home services firm Zimmber in final stages to raise $12 million

Your task is to design a questionnaire that:

a. surveys target segment respondents on their propensity to use app-based on-demand home services (e.g., food, housekeeping, hairdressing, maybe even nursing or other medical help etc).

b. can be taken in <15 minutes on a good net connection

c. collects info on the distribution of quantities of interest (such as awareness levels about these services, interest levels in using them, what all services are being considered, what price levels might be viable, etc.)

My suggestions before starting: Come up with a (sharply defined) D.P. and corresponding R.O.s for the client's problem.

1. Pick a city. Pick a few home services your client is offering. Define who the target segment is.

2. To understand this target segment's needs and preferences, do some preliminary, quick qualitative research: E.g., conduct a few interviews (these could be casual conversations or telephonic ones) with a few people in that target segment about the subject. Find out what they think, what they need, what they see others around them doing etc.

3. From the qualitative research, draw up a list of topics to be covered --> information requirements --> in the questionnaire. Don't pick too many topics - just a few which you can cover well (your R.O.s should have explicitly spelt them out).

4. Then map these information requirements into a set of survey questions which meet the Dos and Don'ts of questionnaire design we covered in class.

5. Write (or 'Program') your survey into Qualtrics. Obtain the "launch" survey link.

6. Bonus points for using SKIP logic in Qualtrics, pretesting the survey with a few folks first, accounting for order effects etc in questionnaire design, etc.

Submission format: Start with a plain white PPT.

Title slide: Homework name and names+ roll numbers of group members.

First slide: Statement of D.P and corresponding R.O.s

Second slide: Description of qualitative research carried out to first narrow-down what topics to cover in the survey.

Third slide: Listing of the topics (or 'constructs') covered in the survey and corresponding number and type of questions per topic. For example, "Topic: Awareness level of home-based services Questions: 3 Likerts, 2 MCQs".

Fourth slide: Deliverable - qualtrics websurvey link. Also, attach the PDF version of your questionnaire onto this slide.

Fifth slide: Any learnings you as a group made - E.g., what constructs were the easiest to measure? hardest? Etc.

Update: In the past, I got quite a few Qs asking if a scale other than Likert can be used etc. Sure, it can. Likert is important in the context of behavioral constructs. For regular, descriptive Qs, use other scales by all means. *Not* every Q has to be a likert.

The instructions for how to get a qualtrics account will be put up on LMS, if that hasn't been done already.

Deadline for this is midnight of 11-Oct (Sunday).

Any queries etc, let me know.

Sudhir

Sunday, September 27, 2015

Some additional readings for DC

Hi Class,

We've seen a wide range of topics in the DC course. Some of you asked for more sources and reading material. Pls find the same below (in no particular order). Some readings are mandatory and others are entirely optional.

The readings below are NOT optional in the sense that questions based on these readings may feature in your exam.

Readings relating technology to data collection and data use (from the Economist):

1. The first article titled 'Little Brother' (in an obvious play on George Orwell's famous 'big Brother' theme) details the impact of digital on advertising spends of firms worldwide.

2. The second article, 'Getting to know you', is about the various ways in which data is collected about consumers online.

3. The third article in this series, 'The world wild web', extrapolates some of what we are seeing into the future and asks 'Where are we going?'.

Ideally, I'd like you to read and discuss these articles within your groups. Again, remember, questions based on ideas and facts in these articles are fair game in your final exam for DC. Happy reading.

Now, these readings that follow below are optional, more for leisure reading and folks with interest in particular topics/ verticals etc.

a. More from the Atlantic on how it's now technologically feasible to arrive at one's Identity. Big Data Can Guess Who You Are Based on Your Zip Code.

b. Recall the habit patterns class we'd covered? Here's an article from HBR blogs on How Customers Get Hooked on Products.

c. There's an undercurrent somewhere in the program that spells the words "data science". This link here offers a rounded perspective on what precisely is data science. This follow-on link here describes 8 concrete steps you must take to become a data scientist. Yes, R features there. Apt read for all CBA students, IMO.

d. For sessions 1-3 which focussed more on constructs, designing questionnaires around constructs etc., here below is some interesting material which you may consider browsing at leisure. They're basically to help understanding for those folks who may have felt the coverage in class was not detailed enough on certain topics:

i. This is a Wikipedia link to Quantitative psychology as a subject area. It provides a nice, concise and precise introduction to the area in general and has a good number of downstream links that you can pick up on as and when necessary.

ii. This is the Wiki entry to Scaling techniques in general in the social sciences. As you can see the comparative versus noncomparative dichotomy comes in early on here. More links to detailed topics are also available.

iii. This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

e. Recall that in one of our sessions (2 or 3?), there was much debate about k-means and other clustering (or, in Marketing speak 'Segmentation') algorithms? There was also an element of affinity analysis there.

A conceptual introduction to these terms can be found online as well - for instance, here for market segmentation, for cluster analysis and for affinity analysis in retail analytics.

And of course, there's always google available to produce reports and summaries at varying levels of detail on any subject under the sun.

f. This will be kinda boring to many perhaps. But here's an Academic journal paper on Behavior prediction using social networks.

If you have come across such material which may be of interest to the class, you may email me or put up links to that material in the comments section below.

For instance, Nikhil Maddirala from the previous batch emailed me with information regarding a useful webscraping tool. Recall the Chrome scraper extension/ plug in tool I showed you in class? Well, it seems there's a way to tweak the tool to scrape multiple pages in one go.

See this link here on Scraping multiple Pages using the Scraper Extension and Refine.

Ciao

Sudhir

Saturday, September 26, 2015

Homework Assignment 1 (Individual homework)

Hi Class,

A series of homeworks is coming your way. The first is described below. I will put up details for the others shortly.

Pls watch this ~ 20 minute video carefully. It features Scott McDonald of Condé Nast holding forth on where Marketing Research is headed in the next decade.

“Social Technological and Economic forces affecting Marketing Research over the next decade”

Now, for your HW, pls answer a few simple Qs (True-False, fill in the blanks variety) about the above talk in the following survey:

Questions based on the video

HW Notes:

(i) This is an individual-only HW. Since it involves no R, consulting peers is not permitted.

(ii) I found that using earphones works great in making out what the speaker is saying much more clearly than ordinary speakers. FYI.

(iii) Deadline: The HW should be completed and submitted latest by midnight 04-October Sunday.

Any Qs etc, pls feel free to email me or use the comments section below.

Sudhir Voleti

Thursday, September 24, 2015

Hi again

Folks,

Got done with our 5 sessions together and it went by quickly and breezily.

The real work of consolidating the things we learnt and building upon them starts now - by getting our hands dirty with data and minds challenged by reality.

Pls keep an eye on these pages for your homeworks, extra links of interest etc. LMS alerts will be sent for all homework related posts.

Am happy to note that some of you have already started using the comments section of the blog to communicate with me and post interesting and relevant links. This below was sent by an 'anonymous' commenter.

the internet would influence US$ 35 billion of FMCG sales in India by 2020

Here are the class fotos with the two sections.

Yup. From now on, shall make it a tradition to click a picture on the last day with every class I teach. :)

Sudhir

Friday, September 18, 2015

Hi Class

A big "Hi" to everybody.

This is a pro-forma welcome message from me to CBA batch 5 taking Data Collection (DC) in Sept 2015.

DC will use some R. This blog can be a repository for related R code and assistance. Feedback, Q&A etc are always welcome via the comments sections.

Pls download and install both R and Rstudio from LMS, if you haven't already.

Looking forward to smooth sailing.

Sudhir Voleti
Assistant Professor of Marketing
ISB Hyderabad

Thursday, March 26, 2015

Re the DC Final Exam

Class,

The Data Collection Final exam is on 28th March, from 2 to 4 p.m. I'm almost done making the paper. Here's what you can expect.

1. The exam will be conducted via a qualtrics web survey. The survey will be launched and weblink sent to you via email at 2 pm on 28th. By 4 pm, you should have completed and 'submitted' the survey/exam. Once the 2 hours time limit is up, the survey will be disabled.

2. No surprise that a lot of the Qs will be of the short answer, multiple-choice, fill-in-the-blanks, true-false type. However, do expect a few short written types also (~ 50 - 75 words max).

3. This is an open-book open-notes exam. You are also allowed to use R, Excel or any other software as required. (Yes, I may give you a small dummy dataset for basic summarizing and interpretation. Nothing sophisticated, but very simple stuff and Excel may well be enough.).

4. Pls remember to strictly adhere to the honor code - Strictly no communication with any third person during the exam, no copying or saving of exam questions etc.

5. Pls ensure you have a good net connection, some sort of power backup if required (laptop is ideal for this) and access to all your course material.

6. All restrictions that generally apply to websurveys apply here. You may not be able to go back to a previous question after a page is turned. Once submitted, a submission will be counted as final.

Any Qs etc, let me or Atreyee know.

Good luck!

Sudhir

Sunday, March 15, 2015

Final Individual Homework for D.C. (and Other Updates)

Class,

Pls find below your final individual homework - different surveys for different sections.

Survey link for section A

Survey link for Section B

First, a little bit of background. You may recall the ice-cream survey example I had covered in class. Well, I mined semantic topics or themes from them.

The attempt now is to evaluate the degree of concordance between machine classification of customer responses and human classification of the same. So basically, I'm using these surveys to collect data from you on how humans understand and carry out such classification.
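
For the curious, here is one way such concordance could be quantified once both sets of labels are in hand. This is a sketch under my own assumptions and not necessarily the measure I will end up using: it computes simple percent agreement and Cohen's kappa, which corrects agreement for chance. The topic labels ("taste", "price") and both label lists are made up for illustration.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items on which the two classifiers assign the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)

    # Expected agreement if both raters labelled at random with their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up topic assignments for six survey responses.
machine = ["taste", "price", "taste", "price", "taste", "price"]
human = ["taste", "price", "price", "price", "taste", "taste"]

print("Percent agreement:", round(percent_agreement(machine, human), 3))
print("Cohen's kappa:    ", round(cohens_kappa(machine, human), 3))
```

A kappa near 1 means the machine and human labels agree far more often than chance would predict, while a kappa near 0 means the agreement is about what chance alone would give.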

Pls follow the instructions and solve the survey. I expect it won't take more than 10-12 minutes on average.

Submission deadline is a week from now, i.e. midnight of 22-March-2015 (Sunday).

Any Queries etc, let me know.

Other updates:

Apparently quite a few queries on R code remain even after the tutorial y'day.

I will see if I can conduct a follow-up tutorial myself next Saturday (if a slot is available). Attendance etc is optional, of course.

Monday, March 9, 2015

Individual Homework - D.C. on Word of Mouth

Hi Class,

This is individual homework, to be done by each of you independently. It is about primary data collection on Word-of-mouth (henceforth, WOM) communications.

Read carefully the following 5 steps for your homework.

Instructions:

1. For one full weekday, make a quick note of every instance of offline WOM communication about *any* brand (product or service) that you come across.

Thus for instance, if you happened to mention to your friend that you liked "The theory of everything" (movie), then make a note of it.

If your colleague happens to mention to you that s/he was at Continental Hospital for a checkup, make a note of that too.

Or it could be that you were the third party at a conversation between two people arguing over whether 'The Times of India' is better than 'The Hindu'. Make a note of that too.

2. Mind you, this is just for 24 hours, when, in the course of your regular day, you make a mental note of what all products, brands, services etc that you came across via interpersonal WOM (offline only) and then record them in a notepad or an excel sheet or some such place.

Important: Do NOT deliberately indulge in WOM for the homework. Only record that WOM which happens naturally, in the course of your everyday routines.

3. I want you to record 3 things:

  • (a) Name of the product/brand etc and which category/ industry it belongs to.

  • (b) Who was the source of the WOM (was it you? a colleague? family member? etc.) and who was the recipient?

  • (c) what was the time of the day (roughly) when the WOM exchange took place.

4. Repeat steps 1-3 for any 24 hour period during a weekend or holiday.

5. Finally, write your primary data collected into an excel sheet with 5 columns: brand/product, industry or category, WOM source, WOM recipient and Date-time.

Name the excel sheet as "YourName_ISB student number.xls" and upload it to the requisite dropbox in LMS.

Deadline for this individual homework is 10 days from now - i.e. 18 March 2015 midnight.

Any queries etc pls let me or Atreyee know.

Sudhir

Session 4 based Group Homework for Batch 4

Class,

This group homework is based on session 4 - text analytics.

I did toy with the idea of inserting a latent topic modeling and interpretation component to this homework but decided against it as it isn't strictly in D.C.'s domain.

The code required to do this HW will be up soon on LMS.

Group HW:

1. Pick up any well-known brand- product or service. E.g. Xbox360 or Jabong or iphone6 or Nike.

2. Collect 3 sets of data for it:

  • (a) 100+ consumer reviews from either flipkart or Amazon India
  • (b) 500+ tweets
  • (c) 50+ articles from Googlenews or any other news aggregator sites.

3. Feel free to either use R or any other means you know of to collect the data (e.g. Python, chrome scraper etc.). But clearly mention the data collection tool used.

4. For each set of data, perform the following analyses:

  • (a) General wordcloud using both TF and TFIDF weighting schemes. Update stopwords list to filter out noisy or irrelevant terms.
  • (b) Sentiment analysis. Display wordclouds separately for the top 50 most positive and most negative words.
  • (c) Identify the top few most positive and most negative documents. Read them and speculate on why they are so positive or negative about it.
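
To illustrate what 4(b) and 4(c) involve, here is a bare-bones Python sketch of lexicon-based sentiment scoring. It is an assumption-laden toy and not the code that will go up on LMS: the positive and negative word lists are tiny, hypothetical stand-ins for a proper opinion lexicon, the documents are made up, and the score is simply positive matches minus negative matches.

```python
# Tiny hypothetical lexicons for illustration; a real analysis would use
# a full opinion lexicon with thousands of entries.
POSITIVE = {"great", "love", "good", "fast"}
NEGATIVE = {"poor", "bad", "slow", "broken"}


def sentiment_score(text):
    """Count of positive words minus count of negative words in the text."""
    tokens = text.lower().split()
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def rank_documents(documents):
    """Order documents from most positive to most negative."""
    return sorted(documents, key=sentiment_score, reverse=True)


# Made-up documents standing in for reviews or tweets.
docs = [
    "great phone love the camera",
    "slow delivery and broken screen",
    "good price but poor battery",
]

ranked = rank_documents(docs)
for doc in ranked:
    print(sentiment_score(doc), "|", doc)
```

The documents at the two ends of `ranked` are the ones to read for 4(c). Word matching like this ignores negation and sarcasm, which is one reason to read the extreme documents yourself and not trust the scores blindly.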

5. Session 4 HW submission format:

  • Use a plain white blank PPT.
  • On the title slide, write your group name and the names + ISB student IDs of all group members.
  • Give your homework an informative title (include name of the product/brand you chose).
  • Have 3 sections in your PPT - one for each data source, separated by separator slides.
  • On the separator slides, mention the source of the data. E.g., "Data source: Amazon Consumer reviews" or "Data Source: Twitter" and so on.
  • For slide headers, use format "TF Wordcloud" or "Positive wordcloud" and so on.
  • Save the slide deck as session4HW_yourgroup.ppt.
  • Put all the raw data you collected, the code you used and your PPT in a zip folder (so that I can replicate your analysis if need arises). Save the folder as session4HW_yourgroup.zip and upload it in the dropbox on LMS before the deadline.

Any Qs etc., let Atreyee or me know. Feel free to use the comments section to this post for any Q&A or discussions.

There are two more homeworks coming your way - both individual - and only one of them is a survey based one.

Sudhir

Saturday, March 7, 2015

Individual Homework 2

Hi Class,

First off, happy holi to you all.

This below is your second of three individual homeworks - all of which involve survey filling only - in the course.

This survey is divided into 3 sections (or 'blocks' in Qualtrics' terminology), viz. a psychographic questionnaire, an open-ended elicitation of brand preferences and finally, a close-ended (i.e scaled) elicitation of preferences over particular brands. Pls note that there are no 'right' or 'wrong' answers in the survey. This homework is graded on completeness and timeliness.

I expect the survey to take up to 20-25 minutes, so kindly set aside that much time for it.

The survey can be found here at this link.

Deadline is 15-March Sunday midnight.

Any queries etc, pls contact me.

Sudhir

Tuesday, March 3, 2015

More readings and material for DC

Hi Class,

I'd put up a set of additional and optional reading material in an earlier blog post for CBA batch 4.

This is more along the same lines. However, the readings I point to here are not optional in the sense that questions based on these readings may feature in your exam.

Readings relating technology to data collection and data use (from the Economist):

1. The first article titled 'Little Brother' (in an obvious play on George Orwell's famous 'Big Brother' theme) details the impact of digital on advertising spends of firms worldwide.

2. The second article, 'Getting to know you', is about the various ways in which data is collected about consumers online.

3. The third article in this series, 'The world wild web', extrapolates some of what we are seeing into the future and asks 'Where are we going?'.

Ideally, I'd like you to read and discuss these articles within your groups. Again, remember, questions based on ideas and facts in these articles are fair game in your final exam for DC. Happy reading.

If you have come across such material which may be of interest to the class, you may email me or put up links to that material in the comments section below.

For instance, Nikhil Maddirala from your batch emailed me with information regarding a useful webscraping tool. Recall the Chrome scraper extension/plug-in tool I showed you in class? Well, it seems there's a way to tweak the tool to scrape multiple pages in one go.

See this link here on Scraping multiple Pages using the Scraper Extension and Refine.

Ciao

Sudhir

R code and Data files on LMS (Sessions 4 and 5)

Hi all,

Sorry about the delay in updating the blog and the LMS with the R code and data from sessions 4 and 5.

Pls find folders on LMS containing R code, data and instructions.

As CBA students, my expectation is that you will:

(i) diligently follow the instructions given,

(ii) read and understand the R code line-by-line before running it,

(iii) run the code and replicate the classwork examples,

(iv) discuss any issues etc that arise here on this blog by using the comments section,

(v) solve the next group homework by tweaking and customizing the R code as required, and

(vi) provide constructive feedback where possible.

Instructions:

1. Unzip contents of the zip folder

2. Open RStudio. File menu --> Open File --> textanalysis R code.R

3. The textanalysis R code.R file will open as an additional window (on the top left) in RStudio.

4. To run any lines, select them and click the Run icon on the top right of the window. Ensure internet is connected.

5. Read the lines before running as some require input from your side (which files to read in etc)

6. The zip folder contents are self-contained and hopefully should run smoothly. However, if you encounter issues, pls let us know.

7. Pls email aashish_pandey@isb.edu with a copy to Atreyee in case of any R related issues. Your group homework for this session will be up soon, in a few days. Pls ensure you are comfortable with this code before the homework arrives.

Thanks.

Sudhir

Monday, March 2, 2015

Group Homework 1

Hi Class,

This group homework covers sessions 1-3 in DC, i.e. problem formulation in terms of D.P.s and R.O.s, construct formulation around the D.P.s and R.O.s, questionnaire design around the constructs uncovered, and actually programming a websurvey on ISB's Qualtrics websurvey software.

Read the following recent Businessweek article:

Coke's big fat problem.

Imagine you are in the shoes of Sandy Douglas. Now, do the following...

(i) From his 'messy reality', extract a relevant and pressing R.O. (stated clearly in words).

(ii) Map that R.O. onto 'information requirements' (see session 2 slides) that are built around some critical constructs of interest. Give these constructs a descriptive name.

In real life, we'd use exploratory/qualitative work extensively at this stage. Assume you have done so already.

(iii) Now, further break down the construct(s) you identified above into one-dimensional aspects that can be captured using Likerts.

(iv) Define your target audience/ target segment as teenagers. Develop a questionnaire for this target audience that can be taken in under 12 minutes.

Use of SKIP logic and any other Qualtrics features is welcome.

(v) Program your questionnaire as a websurvey in Qualtrics. The survey URL (obtained upon launching) is the deliverable and should be pasted along with your group name in this Google form.

(vi) The first page of your survey should be descriptive text only, meant for me and the AAs. Pls write cogently the answers to parts (i) to (iv) above in that space.

Update: In the past, I got quite a few Qs asking if a scale other than Likert can be used etc. Sure, it can. Likert is important in the context of behavioral constructs. For regular, descriptive Qs, use other scales by all means. *Not* every Q has to be a Likert.

The instructions for how to get a Qualtrics account will be put up on LMS, if that hasn't been done already.

Deadline for this is midnight of 22-March (Sunday).

Sudhir

Individual Homework 1

Hi Class,

There are at least two individual homeworks in DC. Both involve filling up surveys, and both need to be done on time and with care.

The first homework is described below. I will put up details for the second one shortly.

Pls watch this ~ 20 minute video carefully. It features Scott McDonald of Condé Nast holding forth on where Marketing Research is headed in the next decade.

“Social Technological and Economic forces affecting Marketing Research over the next decade”

Now, for your HW, pls answer a few simple Qs (True-False, fill in the blanks variety) about the above talk in the following survey:

Questions for Make-up Homework.

HW Notes:

(i) This is an individual-only HW. Since it involves no R, consulting peers is not permitted.

(ii) I found that using earphones works great in making out what the speaker is saying much more clearly than ordinary speakers. FYI.

(iii) Deadline: The HW should be completed and submitted latest by midnight 15-March.

Any Qs etc, pls feel free to email me or use the comments section below.

Sudhir Voleti

Links to additional material for DC

Hi class,

We saw a wide range of topics in the DC course. Some of you asked for more sources and reading material. Pls find the same below (in no particular order), all of it totally optional:

1. Recall the Google Glass example somebody had raised in class? Well, here's a Gigaom article on the Future of the wearables market.

2. Recall the first example in the network analytics class on world international call patterns? Well, here's the associated Atlantic article on a World mapped by phone calls. It nicely illustrates how much visualization of networks can tell us.

3. More from the Atlantic on how it's now technologically feasible to arrive at one's identity: Big Data Can Guess Who You Are Based on Your Zip Code

4. Recall the habit patterns class we'd covered? Here's an article from HBR blogs on How Customers Get Hooked on Products.

5. There's an undercurrent somewhere in the program that spells the words "data science". This link here offers a rounded perspective on what precisely is data science. This follow-on link here describes 8 concrete steps you must take to become a data scientist. Yes, R features there. Apt read for all CBA students, IMO.

For sessions 1-3, which focussed more on constructs, designing questionnaires around constructs etc., here below is some interesting material which you may consider browsing at leisure. It's basically meant to help those folks who may have felt the coverage in class was not detailed enough on certain topics.

This is a Wikipedia link to Quantitative psychology as a subject area. It provides a nice, concise and precise introduction to the area in general and has a good number of downstream links that you can pick up on as and when necessary.

This is the Wiki entry to Scaling techniques in general in the social sciences. As you can see, the comparative versus noncomparative dichotomy comes in early on here. More links to detailed topics are also available.

This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

Recall that in one of our sessions (2 or 3?), there was much debate about k-means and other clustering (or, in Marketing speak, 'Segmentation') algorithms? There was also an element of affinity analysis there.

A conceptual introduction to these terms can be found online as well - for instance, here for market segmentation, here for cluster analysis and here for affinity analysis in retail analytics.
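
For those who'd like to see k-means in action before reading up on it, here is a minimal sketch using base R's `kmeans()`. It is not from the classwork; the six respondents and the two item names (`price_sensitivity`, `brand_loyalty`) are made up purely for illustration, and the choice of 2 segments is an assumption.

```r
set.seed(1)  # k-means starts from random centers, so fix the seed

# Hypothetical survey scores: two 7-point items for 6 respondents.
x <- rbind(c(1, 2), c(2, 1), c(1, 1),
           c(6, 7), c(7, 6), c(7, 7))
colnames(x) <- c("price_sensitivity", "brand_loyalty")

# Ask for 2 segments; nstart = 10 tries 10 random starts and keeps the best.
fit <- kmeans(x, centers = 2, nstart = 10)

print(fit$cluster)  # segment membership of each respondent
print(fit$centers)  # average profile of each segment
```

Each respondent gets assigned to the segment whose center (average profile) is nearest, which is all that 'segmentation' means at its simplest.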

And of course, there's always google available to produce reports and summaries at varying levels of detail on any subject under the sun.

-------------------------------------------------

These links below are more technical in nature, and are even more optional reading than the ones above. I'd suggest revisiting them after a couple more terms are done in the program.

6. This will be kinda boring to many perhaps. But here's an Academic journal paper on Behavior prediction using social networks

7. And here is an excellent set of slides for computing basic metrics in network data from r-bloggers.com. BTW, you should consider subscribing to their newsletter, if you are into R.

8. More R here. An excellent intro to general R and then some network basics along with code and examples workshop style.

That's it for now. Will update as more comes in. Your Homeworks will be up next. And also, data+R code to replicate classwork examples.

Ciao.

Sudhir

Saturday, February 21, 2015

Hi Class

This is a welcome message to CBA batch 4 taking Data Collection (DC) in Feb 2015.

DC will use some R. This blog can be a repository for related R code and assistance. Feedback, Q&A etc are always welcome via the comments sections.

Pls download and install both R and RStudio from LMS, if you haven't already.

Looking forward to smooth sailing.

Sudhir Voleti
Assistant Professor of Marketing
ISB Hyderabad