Analytics Yogi: 2014

Monday, November 17, 2014

Homework for Session 5 - Batch 3 CBA

Hi all,

Please find here the individual homework for session 5.

Pls ensure you are able to replicate classwork examples with the R code sent before you try this one.

The idea is simple. I will require you to:

1. Pull your Facebook (FB) data, i.e. your friends' list. Pls use the Rfacebook package and the instructions from the slides.

2. Run the community-detection algorithm on it.

3. Paste a screenshot of the network with communities on a slide. Identify the top few clearly identified groups that you can see (like I'd shown for my FB pull in the class slides).

4. Analyze the 5 largest communities you got in terms of (i) size, transitivity, density, centralities, and (ii) meaning (how does the community relate to the ego or focal person).
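For those who want to see what the metrics in point 4 actually compute, here is a minimal sketch in plain Python (not the R code from class, and not the Rfacebook workflow). The function name `community_metrics` and the toy edge list are made up for illustration; it assumes an undirected friendship network and that community membership is already known from the detection step.

```python
from itertools import combinations

def community_metrics(edges, members):
    """Size, density, transitivity and degree centrality of one community.

    edges   : iterable of (person_a, person_b) friendship pairs (undirected)
    members : the people assigned to this community (assumed already known)
    """
    members = set(members)
    # Keep only the ties with both endpoints inside the community.
    adj = {v: set() for v in members}
    for a, b in edges:
        if a in members and b in members and a != b:
            adj[a].add(b)
            adj[b].add(a)

    n = len(members)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each tie counted twice

    # Density: observed ties / possible ties.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Transitivity: closed connected triples / all connected triples.
    triples = closed = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    transitivity = closed / triples if triples else 0.0

    # Degree centrality: share of the other members each person is tied to.
    degree = {v: (len(adj[v]) / (n - 1) if n > 1 else 0.0) for v in members}

    return {"size": n, "density": density,
            "transitivity": transitivity, "degree_centrality": degree}

# Toy network (hypothetical): a triangle A-B-C plus D hanging off C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
result = community_metrics(toy_edges, ["A", "B", "C", "D"])
print(result["size"], round(result["density"], 3), result["transitivity"])
```

In the toy network, C is tied to everyone else, so it has the highest degree centrality. In your own pull, the person playing that role in a community is often a good clue to what the community means to the ego.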

Submission format and deadline are the same as in the past. Save your PPT as (your.full.name).pptx

Any queries etc, contact me.

Thanks.

Sudhir

Saturday, November 15, 2014

Mailbag

Hi all,

Received this in the mail today and responded to it. Am putting up the exchange here coz I think it merits further dissemination.

The email I got:

Hi Sudhir,
I am a CBA Batch -3 student from Section-A
I am facing issues relating to very fundamental meanings of terminology introduced in DCBA. It may be due to the reason that I am not a business guy who is well versed with business terminology.
eg I am not comfortable with the following keywords: construct, dichotomy, Costs of Capital, trickier proposition, business meta-process and so many keywords introduced on Slide 16 Problem Formulations Examples of R.O.s, and then in psychometric scaling
Considering the example of Baskin Robbins. I am not able to get how psychology is coming into the picture here?
I may sound as asking stupid questions but I know I need to do something about it so that I can get comfortable with this subject.
Please provide me some directions..
Thanks,
P

My response:

Hi P,
Let me try to systematically answer what I can.
1. Regarding what a 'construct' means in our context, pls refer to this blogpost (from the PGP class):
http://marketing-yogi.blogspot.in/2014/09/session-2-exposition-what-are.html
2. The definitions of 'dichotomy', 'business processes' and meta-processes, 'cost of capital' can be had from a google search. A dichotomy means a branching into two separate streams. Thus, Data types exhibit a dichotomy - primary versus secondary data etc.
3. 'trickier proposition' is an expression in speech that means "is more problematic" or "is more challenging".
4. Not every construct need have profound psychological drivers. Many are fairly routine and habit driven.
5. The Baskin Robbins example has nothing to do with psychology. It's merely meant to illustrate the primary-secondary dichotomy.
I hope that helps clarify things somewhat, at least. Thanks for reaching out. I might put up this entire exchange on the blog, in case other students are also facing the same problem.
Thanks,
Sudhir

***************************

Updates. Received two more email queries. My responses are also put up below.

Hi Professor,
I am a student of CBA batch 3. I just had a query around the R code for text analysis (filename: textanalysis R code.R), I have gone through the entire code and wanted to understand the last part i.e. Bayes Factor Model selection and thereon. Can you kindly guide me on this?
I am not able to conceptually grasp the concept of Factor Model and the output from that point onwards.
Look forward to hearing from you.
RT

My response:

Hi RT,
>> I have gone through the entire code and wanted to understand the last part i.e. Bayes Factor Model selection and thereon. Can you kindly guide me on this?
Your query concerns what we call 'model selection' in statistics. A model is a set of relations which we fit upon data to explain them and/or make predictions about them. However, there may be multiple models that fit the same data.
One way to sort through this multiplicity of models and select the *best* one is to first find how well each model 'fits' the data (e.g. which one has the least squared error). Accordingly, various 'goodness of fit' criteria have been developed and deployed. The Bayes Factor is one such very important metric in Bayesian statistics: it compares two models by the ratio of how probable the observed data are under each.
For our purposes, just take the model results and use them to select the model with the optimal number of components (optimal, as decided by the log bayes factor). Going beyond that would be beyond the scope of the DC course. Wikipedia and other web resources are available however, in case you want to do a deep-dive into fit statistics in general and Bayes Factors in particular.
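To make the selection step concrete, here is a small sketch in plain Python (not the class R code). It assumes you already have a log marginal likelihood for each candidate number of components; the numbers below are hypothetical, and the function name `select_by_log_bayes_factor` is made up for illustration. The log Bayes factor of a model against a baseline is simply the difference of their log marginal likelihoods.

```python
def select_by_log_bayes_factor(log_marginal_likelihoods, baseline):
    """Return the model with the largest log Bayes factor versus a baseline.

    log_marginal_likelihoods : dict mapping number of components -> log p(data | model)
    baseline                 : the key of the model to compare everything against
    """
    base = log_marginal_likelihoods[baseline]
    # log BF(model vs baseline) = log p(data | model) - log p(data | baseline)
    log_bf = {k: ll - base for k, ll in log_marginal_likelihoods.items()}
    best = max(log_bf, key=log_bf.get)
    return best, log_bf

# Hypothetical values for models with 2 to 5 components.
log_ml = {2: -1250.0, 3: -1210.5, 4: -1198.2, 5: -1203.7}
best_k, log_bf = select_by_log_bayes_factor(log_ml, baseline=2)
print(best_k)  # the number of components with the highest log Bayes factor
```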
>> I am not able to conceptually grasp the concept of Factor Model and the output from that point onwards.
When we 'factorize' something (say, a), we break it down into pieces (say, b, c and d) such that the product b*c*d yields a again.
In general, any whole number greater than 1 can be 'factorized' into a product of primes. Similarly, when we factorize a matrix, we break it down into 'factors' whose product yields the original matrix again.
We took the TDM and 'factorized' it (conceptually only; the LDA is more complex in its assumptions and its estimation) into 'factors' - groups of terms that together can be interpreted as topics.
For our purposes, all we need to know is that using the latent topic factor model, we 'broke down' the corpus into distinct 'topics' or themes that can be interpreted and used for further analysis.
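The factorization idea can be illustrated with a short plain-Python sketch (again, not the class R code). It is an analogy only: LDA does not literally multiply two small matrices like this, and the toy matrices and the names `term_topic` and `topic_doc` are made up for illustration.

```python
def prime_factors(n):
    """Break a whole number > 1 into primes whose product gives n back."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

print(prime_factors(60))  # 2 * 2 * 3 * 5 gives 60 back

# Toy 'factors': 3 terms x 2 topics, and 2 topics x 3 documents.
term_topic = [[2, 0],
              [1, 0],
              [0, 3]]
topic_doc = [[1, 0, 2],
             [0, 1, 1]]

# Their product is a 3 terms x 3 documents matrix, i.e. a toy TDM.
tdm = matmul(term_topic, topic_doc)
print(tdm)
```

Read in reverse, this is the conceptual picture: we start from the TDM and look for a small number of topic 'factors' that could have produced it.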
I hope that helps clarify.
Sudhir

Another one below:

Respected Professor,
I'm a CBA student from technology background. I need your help regarding data collection:
1. Is there any book that I can refer? I feel I'm lost with so much of info/topics. Also with no audio for first class, it seems I don't have way to revisit the fundamentals discussed.
2. It will be extremely helpful if you can please provide some practice papers and solutions. (hope that's possible)
3. Could you please also clarify whether Facbook assignment is group or individual H.W.?
SP

My response:

Hi SP,
>> 1. Is there any book that I can refer? I feel I'm lost with so much of info/topics. Also with no audio for first class, it seems I don't have way to revisit the fundamentals discussed.
I don't use any one textbook for DC. The material is collected and collated from multiple sources. However, Wikipedia is your friend in case you need more detail on particular topics. Also pls check the early blogposts for your batch on analytics-yogi.blogspot.in where some additional links and material were put up.
>> 2. It will be extremely helpful if you can please provide some practice papers and solutions. (hope that's possible)
The exam is open book-open notes. The questions are all short-answer questions (no essay-length stuff) for easier grading and more objectivity. I can't make any promises regarding the practice exam at this point, as I plan to modify the exams I have from previous batches for this batch as well.
>> 3. Could you please also clarify whether Facbook assignment is group or individual H.W.?
Individual. Because each of you has to pull up your own FB data.
Hope that clarifies.
Sudhir

Ciao.

Friday, November 14, 2014

Make-up Assignment

Hi,

Make-up Assignment in lieu of survey filling:

Pls watch this ~20 minute video carefully. It features Scott McDonald of Condé Nast holding forth on where MKTR is headed.

“Social Technological and Economic forces affecting Marketing Research over the next decade”

Now, for your HW, pls answer a few simple Qs (True-False, fill in the blanks variety) about the above talk in the following survey:

Questions for Make-up Homework.

HW Notes:

(i) This is an individual-only HW. Since it involves no R, consulting peers is not permitted.

(ii) I found that using earphones works great in making out what the speaker is saying much more clearly than ordinary speakers. FYI.

(iii) Deadline: The HW should be completed and submitted latest by midnight 10-December.

Any Qs etc, pls feel free to email me or use the comments section below.

Sudhir Voleti

Thursday, November 13, 2014

Interesting links from different facets of the DC course

Hi class,

Wide range of topics we'd seen in the DC course. Some of you asked for more sources and reading material. Pls find the same below (in no particular order) and totally optional only:

1. Recall the google glass example we'd seen in class? Well, here's a Gigaom article on the Future of the wearables market.

2. Recall the first example in the network analytics class on world international call patterns? Well, here's the associated Atlantic article on a World mapped by phone calls. It nicely illustrates how much visualization of networks can tell us.

3. More from the Atlantic on how it's now technologically feasible to arrive at one's identity. Big Data Can Guess Who You Are Based on Your Zip Code

4. Recall the habit patterns class we'd covered? Here's an article from HBR blogs on How Customers Get Hooked on Products.

5. There's an undercurrent somewhere in the program that spells the words "data science". This link here offers a rounded perspective on what precisely is data science. This follow-on link here describes 8 concrete steps you must take to become a data scientist. Yes, R features there. Apt read for all CBA students, IMO.

-------------------------------------------------

These links below are more technical in nature. And are even more optional reading than the ones above. I'd suggest revisiting the below links after a couple of more terms are done in the program.

6. This will be kinda boring to many perhaps. But here's an Academic journal paper on Behavior prediction using social networks

7. And here is an excellent set of slides for computing basic metrics in network data from r-bloggers.com. BTW, you should consider subscribing to their newsletter, if you are into R.

8. More R here. An excellent intro to general R and then some network basics along with code and examples workshop style.

That's it for now. Will update as more comes in.

Ciao.

Sudhir

Session 4 Homework for CBA Batch 3

Class,

Individual homework:

Fill up this survey below (on perceptions of what constitutes IT capabilities in a firm). If you have any issues with doing so, let me know and I will assign alternate individual homework.

IT capabilities survey

Group HW:

1. Pick any well-known brand - product or service. E.g. Xbox 360 or Jabong or iPhone 6 or Nike.

2. Collect 3 sets of data for it:

(a) 100+ consumer reviews from either flipkart or Amazon India

(b) 500+ tweets

3. Feel free to either use R or any other means you know of to collect the data (e.g. Python, chrome scraper etc.). But clearly mention the data collection tool used.

4. For each set of data, perform the following analyses:

(a) General wordcloud using both TF and TFIDF weighting schemes. Update the stopwords list to filter out noisy or irrelevant terms.

(b) Sentiment analysis. Display wordclouds separately for the top 50 most positive and most negative words.

(c) Identify the top few most positive and most negative documents. Read them and speculate on why they are so positive or negative about it.
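In case the difference between the TF and TFIDF weighting schemes in 4(a) is unclear, here is a minimal sketch in plain Python (not the class R code). It assumes the common definition tf * log(N / df); the R package you use may apply a different log base or normalization. The toy reviews, the stopword list and the function names are made up for illustration, and tokenizing is just lowercasing and splitting on spaces (no punctuation handling).

```python
import math
from collections import Counter

def term_frequencies(docs, stopwords=()):
    """TF: raw count of each term in each document, minus stopwords."""
    stop = set(stopwords)
    return [Counter(w for w in doc.lower().split() if w not in stop)
            for doc in docs]

def tfidf(tf_counts):
    """TFIDF: down-weight terms that appear in many documents."""
    n_docs = len(tf_counts)
    doc_freq = Counter()
    for counts in tf_counts:
        doc_freq.update(counts.keys())  # number of documents containing each term
    return [{t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
            for counts in tf_counts]

# Hypothetical mini-corpus of three reviews.
docs = ["great phone great camera", "phone battery poor", "camera is great"]
tf = term_frequencies(docs, stopwords=["is"])
weights = tfidf(tf)

print(sum(tf, Counter()).most_common(3))                # what a TF wordcloud would emphasize
print(sorted(weights[1].items(), key=lambda kv: -kv[1]))  # TFIDF weights for the second review
```

Under TF, 'great' dominates because it is frequent everywhere. Under TFIDF, terms like 'battery' that are specific to one review get the larger weight, which is why the two wordclouds can look quite different.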

5. Session 4 HW submission format:

Use a plain white blank PPT.
On the title slide, write your group name and the names + ISB student IDs of all group members.
Give your homework an informative title (include name of the product/brand you chose).
Have 3 sections in your PPT - one per data source, separated by separator slides.
On the separator slides, mention the source of the data. E.g., "Data source: Amazon Consumer reviews" or "Data Source: Twitter" and so on.
For slide headers, use format "TF Wordcloud" or "Positive wordcloud" and so on.
Save the slide deck as session4HW_yourgroup.ppt.
Put all the raw data you collected, the code you used and your PPT in a zip folder (so that I can replicate your analysis if the need arises). Save the folder as session4HW_yourgroup.zip and upload it in the dropbox on LMS before the deadline.

Any Qs etc., let Atreyee or me know. Feel free to use the comments section to this post for any Q&A or discussions.

Sudhir

Session 2 Group Homework for CBA Batch 3

Hi all,

This homework covers sessions 1,2 and 3, i.e. problem formulation, construct assessment through qualitative research, and questionnaire design for primary data collection.

Group HW:

Consider the following Business problem.

A firm is planning to build a smartphone app that offers location-based services.
The app will collect details about deals, discounts etc from stores on one side and inform subscribers about these deals when they are within one cell tower range (roughly a km) of the business establishments where these deals are being offered.
The firm is targeting people below age 35 in the middle and upper-middle class in metropolitan India. The firm however wants to know about the target segment's app usage habits in general. What types of apps do people use? Why? How many have used apps to transact business online (e.g., pay for orders placed) etc?
Your tasks will be to (1). conduct some exploratory/ qualitative research to find out what constructs underlie people's app based propensities and behaviors. Think of running a small focus group, or conducting a few in-depth interviews with knowledgeable people. (2). Formulate the problem in terms of a D.P. and a few R.O.s that correspond to it. (3). Design a questionnaire centered around measuring the constructs of interest you have identified. Read parts 1,2 and 3 in the HW below.

HW Part 1: Problem Formulation

Q.1.1. Write a decision problem (D.P.) to describe the business problem of interest.

Q.1.2. Write a few research objectives (R.O.s) to address this D.P. (pls use format specified for R.O. in class slides).

HW Part 2: Construct Analysis

Q.2.1. Conduct some exploratory/ qualitative research to find out what constructs underlie people's app based propensities and behaviors. For example, you could run a small focus group discussion, or conduct a few in-depth interviews with knowledgeable people.

Q.2.2. List a few major constructs you find from your data collected in Q.2.1. that are of business interest.

Q.2.3. Pick any one construct you have listed in Q.2.2. and break it down into a few aspects. Ask yourself what motives, means and opportunities drive the behavior associated with the construct.

Q.2.4. Make a table with 2 columns. In the first column, write the names of the aspects you came up with. In the second column, corresponding to each aspect, write a Likert statement that you might use in a Survey Questionnaire to measure that aspect.

HW Part 3: Web-Survey Programming

Q.3.1. Build a web survey using any free online websurvey tool of your choice. E.g., surveymonkey.com or zoomerang.com offer free websurvey services.

Alternately, you can try Qualtrics, the ISB subscribed survey software. Instructions for how to set up a Qualtrics account using your ISB email have been uploaded on LMS.

Ensure your questionnaire is "complete" i.e. has an introduction, a section for the psychographic Likerts, a demographic section, and some gateway questions and SKIP logic.

Session 2 HW submission format:

Use a plain white blank PPT.
On the title slide, write your group name and the names + ISB student IDs of all group members.
Give your homework an informative title.
For slide headers, use format "HW Part 1: [Slide content description]" and so on.
Pls mention clearly the Question numbers you are solving in the slide body. Use fresh slides for each new question.
Use a blank slide to separate HW Part 2 from HW Part 1.
Provide a working link for your websurvey on a fresh slide titled "HW part 3".
It is advisable to run a pre-test. Perhaps take the survey a few times to ensure clarity, readability, working SKIP logic etc is in place.
Save the slide deck as session2HW_yourgroup.ppt and put it in the dropbox on LMS before the deadline.

HW submission deadline: midnight of 10-December-2014. That's it from me. Any Qs etc., let Atreyee or me know. Feel free to use the comments section to this post for any Q&A or discussions.

Sudhir

Tuesday, November 11, 2014

Session 4 Classwork files on LMS

Hi all,

Sorry about the delay in updating the blog and the LMS.

Pls find on LMS the R code, data and instructions for session 4 (text analysis).

That for session 5 (network analysis) will come in the next couple of days.

As CBA students, my expectation is that you will:

(i) diligently follow the instructions given,

(ii) read and understand the R code line-by-line before running it,

(iii) run the code and replicate the classwork examples,

(iv) discuss any issues etc that arise here on this blog by using the comments section,

(v) solve the group homeworks by tweaking and customizing the R code as required, and

(vi) provide constructive feedback where possible.

Instructions:

1. Unzip contents of the zip folder

2. Open Rstudio. File menu --> Open File --> textanalysis R code.R

3. The textanalysis R code.R file will open as an additional window (on the top left) in Rstudio.

4. To run any lines, select them and click the Run icon on the top right of the window. Ensure internet is connected.

5. Read the lines before running as some require input from your side (which files to read in etc)

6. The zip folder contents are self-contained and hopefully should run smoothly. However, if you encounter issues, pls let us know.

7. Pls email aashish_pandey@isb.edu with a copy to Atreyee in case of any R-related issues. Your group homework for this session will be up soon, in a few days. Pls ensure you are comfortable with this code before the homework arrives.

Thanks.

Sudhir

Thursday, November 6, 2014

Session 2 Updates November 2014

Hi all,

Session 2 got done today. We covered a serious lot of ground, even though it may not seem so at first glance.

Doing basic survey design principles, construct basics *and* questionnaire design all in one go in 110 minutes... each of those topics would merit at least a whole session all by itself ideally...

Update:

1. There will be a group homework assignment for this class. As you can guess it will involve your designing a survey around a construct of interest and programming it into any web survey software. This will be a group assignment. I will put up details later after these 5 rush-days of teaching are done...

Today's individual assignment is filling up this survey. You will have until tomorrow evening 10 pm to do so. But then, tomorrow, again, there will be more surveys to fill...

SNA survey for sec A

SNA survey for sec B

2. Recall the mini-caselet we discussed in class today. Some of you asked for more info.

The 'Moneyball' approach to hiring CEOs

It was the lesson of the best-selling book-turned-movie, Moneyball: Don’t throw money at big-name baseball players or judge future performance by purely physical attributes. Assess them, instead, by more relevant measurements, like their on-base percentage.
Wharton professor J. Scott Armstrong and Philippe Jacquart of EMLYON Business School in Écully, France, say the same principles can be applied to choosing corporate executives. In a recent paper, they challenge the popular belief that higher pay leads to selecting chief executive officers who will outperform their lower-compensated counterparts.
[...]Instead of throwing money at “superstars,” companies would be better served by using quantifiable measures to pick the right CEO, according to recent Wharton research.

Well, that should go some distance in answering whether Moneyball principles could be applied to hiring for more managerial positions. But reassuringly, the hires will all be human only. Machines still cannot hope to do a CEO's job. Yet.

3. Another article from a recent Economist issue and (ominously?) titled "The future of jobs" has this to say:

A new wave of technological progress may dramatically accelerate this automation of brain-work. Evidence is mounting that rapid technological progress, which accounted for the long era of rapid productivity growth from the 19th century to the 1970s, is back. The sort of advances that allow people to put in their pocket a computer that is not only more powerful than any in the world 20 years ago, but also has far better software and far greater access to useful data, as well as to other people and machines, have implications for all sorts of work.
[...] Ten years ago technologically minded economists pointed to driving cars in traffic as the sort of human accomplishment that computers were highly unlikely to master. Now Google cars are rolling round California driver-free no one doubts such mastery is possible, though the speed at which fully self-driving cars will come to market remains hard to guess.
Even after computers beat grandmasters at chess (once thought highly unlikely), nobody thought they could take on people at free-form games played in natural language. Then Watson, a pattern-recognising supercomputer developed by IBM, bested the best human competitors in America’s popular and syntactically tricksy general-knowledge quiz show “Jeopardy!” Versions of Watson are being marketed to firms across a range of industries to help with all sorts of pattern-recognition problems. Its acumen will grow, and its costs fall, as firms learn to harness its abilities.
The machines are not just cleverer, they also have access to far more data. The combination of big data and smart machines will take over some occupations wholesale; in others it will allow firms to do more with fewer workers. Text-mining programs will displace professional jobs in legal services. Biopsies will be analysed more efficiently by image-processing software than lab technicians. Accountants may follow travel agents and tellers into the unemployment line as tax software improves. Machines are already turning basic sports results and financial data into good-enough news stories.
Jobs that are not easily automated may still be transformed. New data-processing technology could break “cognitive” jobs down into smaller and smaller tasks.

Well, tech 'progress' cannot be stopped I guess. But it's the distribution of reward that had the Economist (and consequently me too) all worried. See below.

4. How do the economic spoils get split up in the coming years? Who gets what share of the prosperity pie? And why?

Yet some now fear that a new era of automation enabled by ever more powerful and capable computers could work out differently. They start from the observation that, across the rich world, all is far from well in the world of work. The essence of what they see as a work crisis is that in rich countries the wages of the typical worker, adjusted for cost of living, are stagnant. In America the real wage has hardly budged over the past four decades. Even in places like Britain and Germany, where employment is touching new highs, wages have been flat for a decade. Recent research suggests that this is because substituting capital for labour through automation is increasingly attractive; as a result owners of capital have captured ever more of the world’s income since the 1980s, while the share going to labour has fallen.
At the same time, even in relatively egalitarian places like Sweden, inequality among the employed has risen sharply, with the share going to the highest earners soaring.

So who might be the winners and losers in what is surely coming? Here's a clue

There will still be jobs. Even Mr Frey and Mr Osborne, whose research speaks of 47% of job categories being open to automation within two decades, accept that some jobs—especially those currently associated with high levels of education and high wages—will survive (see table). Tyler Cowen, an economist at George Mason University and a much-read blogger, writes in his most recent book, “Average is Over”, that rich economies seem to be bifurcating into a small group of workers with skills highly complementary with machine intelligence, for whom he has high hopes, and the rest, for whom not so much.

5. The good news? The future for bright Business analytics people who combine non-standardized inputs (such as those from exploratory and/or qualitative work on the demand side) with machine intelligence is bright. When all is done, chances are you will belong to that select group. Change is coming whether we like it or not. The best we can do is to be better prepared. And that we are already doing...

OK, this part stretched longer than I intended. Will update and complete this post (or maybe have a second post to continue).

For now, I'll sign off. See you in class today for session 3 (Qualitative Research and Experimentation basics).

Sudhir

Wednesday, November 5, 2014

Session 1 Updates

Hi all,

Session 1 got done. The post lunch session is always a tough one regarding engagement and response from the class.

This blog post is regarding the following points:

1. Surveys for data collection *today*:

Below are the links to two short online surveys which shouldn't take more than 15-20 minutes by my reckoning.

Big5 personality survey

Brand prefs survey

I need you to fill up these surveys today itself, by 8 am tomorrow latest. This gives me enough time to collect the data and use it in class for session 4 (text analytics) and session 5 (social network measurement and analysis).

By the way, completing the above 2 surveys has grade credit and will be considered as part of your class participation (CP) grade.

2. Links for some topics we covered in Session 1:

Here are some links for some of the concepts we studied in session 1. These are optional readings and you may cover them at leisure. They're basically to help understanding for those folks who may have felt the coverage in class was not detailed enough on certain topics.

This is a Wikipedia link to Quantitative psychology as a subject area. It provides a nice, concise and precise introduction to the area in general and has a good number of downstream links that you can pick up on as and when necessary.

This is the Wiki entry to Scaling techniques in general in the social sciences. As you can see the comparative versus noncomparative dichotomy comes in early on here. More links to detailed topics are also available.

This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

Recall that in reading 1 in session 1 we could vaguely discern elements of segmentation analysis (second paragraph) as well as elements of affinity analysis (last paragraph). A conceptual introduction to these terms can be found online as well - for instance, here for market segmentation, for cluster analysis and for affinity analysis in retail analytics.

And of course, there's always google available to produce reports and summaries at varying levels of detail on any subject under the sun.

3. Homework Group assignment for Session 1:

There will be a homework assignment (group submission) for session 1. The idea is to give you a few business articles to read and then formulate the business problem (D.P. and R.O.) for each of them.

I will give this in a separate post. In any case, the deadline for this will be between now and your return to campus for the second half of term 1.

4. Group Formation:

That reminds me, pls form teams of 4 people. One person (as team representative) should email the Academic Associate for the class, Mr. Krishna Pusuluri, the names and ISB IDs of the team members, with a Cc to the other members of the team.

If for any reason, you are unable to form a team or find team members, pls let me know and I will assign you to a team.

5. In case you feel addressing me as 'sir' or 'prof' is too stuffy and formal, then pls feel free to call me 'Sudhir'. It's perfectly OK by me. That's it for today. See you in class, soon.

Sudhir

Tuesday, November 4, 2014

Hi Class

This is a welcome message to the CBA batch taking Data Collection (DC) in November 2014.

DC will use some R. This blog can be a repository for related R code and assistance. Feedback, Q&A etc are always welcome via the comments sections.

Pls download and install both R and Rstudio from LMS, if you haven't already.

Looking forward to smooth sailing.

Sudhir Voleti
Assistant Professor of Marketing
ISB Hyderabad

Wednesday, February 26, 2014

Interesting links and aRticles

Hi all,

Pls use this post and its comments section as a general purpose place to share interesting links on articles that could benefit the class as a whole regarding the topics we covered in the DC course.

For a start, I would strongly urge you to register with r-bloggers.com and get onto their daily email subscription service. Daily email updates on what new packages are there, interesting examples with code for doing neat stuff on R etc.

For example, here's a good aRticle that I saw today:

Job Trends in the Analytics Market: Where does R stand?

Ensoi.

Sudhir

Sunday, February 23, 2014

Group Homework for Sessions 4 and 5 (Text and Social network analysis)

Hi all,

This is the HW for sessions 4 and 5. It is the last and final HW in this course. It will involve data collection, analysis and inference.

It will require you to go through and understand the R code I used in class. Pls *ensure* you can replicate the classwork examples and exercises before attempting the homework.

This is a group homework, so ensure you divide it amongst your group and co-ordinate. That way, too much burden won't fall on any individual.

If your group formation is not yet done, let Krishna Pusuluri know, he'll assign you to a group.

Any doubts, issues etc, let me know through the blog or via email.

### following Qs are for text analysis using R code from the class on your survey data ###

Q1a. One Q in your survey asks you to "List some brands (five or more) you are personally loyal to." Text analyze this component for the entire class by building a TDM and a wordcloud. Comment on which brands seem most popular and which categories they come from. Comment on why this may be the case.
Q1b. Now build a simple semantic network for the terms found above. Basically, which brands co-occur in documents. (See my R code for a function on how to build simple semantic networks from term-document matrices). Speculate on which brands seem to be preferred together by people.
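To show the idea behind a simple semantic network, here is a minimal co-occurrence sketch in plain Python. It is not the function from my R code; the name `cooccurrence_edges` and the toy responses are made up for illustration. Two terms get a tie whenever they appear in the same document, and the tie weight is the number of documents they share.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs_terms, min_count=1):
    """Count, for every pair of terms, the documents in which both appear.

    docs_terms : list of term lists, one per document (respondent)
    min_count  : drop pairs that co-occur in fewer documents than this
    """
    counts = Counter()
    for terms in docs_terms:
        # Each pair counts once per document, however often a term repeats.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Hypothetical responses from three people.
responses = [["apple", "nike", "sony"],
             ["apple", "nike"],
             ["sony", "titan"]]

edges = cooccurrence_edges(responses, min_count=2)
print(edges)  # only the pairs that at least two respondents listed together
```

Raising `min_count` is a quick way to thin out a crowded network plot so that only the strongest brand pairings remain.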

Q2. Text analyze the answers to the Q "List two places OUTSIDE India that you would like to visit. Explain / why in a few lines for each place." Build wordclouds under both TF and TFIDF. Comment on what can be inferred from the wordcloud.

Q3a. Text analyze the responses to the Q: "What are your career goals in the short, medium and long terms? / Explain in a few lines." Build a wordcloud under both TF and TFIDF. Comment on what can be inferred from the wordcloud.
Q3b. Build a semantic network connecting the terms for this Q. Which terms occur together the most in documents? What can be inferred?

### following Qs are for web extraction of data from amazon ###

Q4. Collect 100 odd reviews from Amazon for xbox 360. Analyze the wordcloud. What themes seem to emerge from the wordcloud?

Q5. Analyze the positive wordcloud. What are the xbox's seeming strengths? What can they position around?

Q6. Analyze the negative wordcloud. What are the xbox's seeming weaknesses? What can they prioritize and fix?
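The positive and negative wordclouds in Q5 and Q6 rest on scoring words against a sentiment lexicon. Here is a minimal sketch of that idea in plain Python (not the class R code). The tiny word lists and the toy reviews are hypothetical; a real analysis would use a full lexicon and proper tokenizing.

```python
def sentiment_score(text, positive, negative):
    """Number of positive words minus number of negative words in a text."""
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def rank_by_sentiment(docs, positive, negative):
    """Return (score, doc) pairs, most positive first."""
    scored = [(sentiment_score(d, positive, negative), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical mini-lexicon and reviews.
positive_words = {"great", "good"}
negative_words = {"poor", "noisy"}
reviews = ["great graphics and great games",
           "poor support and noisy fan",
           "good console but noisy"]

for score, review in rank_by_sentiment(reviews, positive_words, negative_words):
    print(score, review)
```

The words that matched the positive list feed the positive wordcloud, and likewise for the negative one. The reviews at the two ends of the ranking are the ones worth reading in full.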

Deadline is before the exam. Submission must be in the form of PPTs only. Write your group name, individual members' names and ISB IDs on the title slide and write your group name as file name. Dropbox will be made for this.

Any Qs etc, contact me.

Sudhir

Friday, February 14, 2014

Group Homework for Session 2 (Build Survey Qs and Constructs based on R.O.s)

Hi all,

Pls find below the group HW based on Session 2.

Pls read *any 2* of the articles from the business press below. The HW that follows is based on these articles.

(a) The appeal of 3D movies can lead to a survey estimating preference for -> willingness to pay for -> likely demand for movies in the 3D format among a particular target segment. Here's the management problem sourced from the Economist (July 2011) The appeal of 3D movies - Cinema's great hope

(b) Here's a desi innovation that might get a huge fillip in demand as demographics start to favor it. Demand soars for "House-call doctor services" for the elderly and the chronically infirm. Source is the Economic Times, 2012.

(c) This Economic Times article from a year ago talks about a recent phenomenon they call 'showrooming'. Consumers spot deals in stores, close them online; showrooming threatens to make life difficult for electronics retailers.

(d) Here's an interesting possibility that requires folks to look at reeeally new products - akin to forecasting email's effect on postal services in 1994 - the impact on small-scale manufacturing of 3D printing services. This too is sourced from the Economist (Dec 2011).

(e) Here's an interview with the boss of the Cafe Coffee Day chain, in which he describes some interesting-looking initiatives CCD is taking to leverage Facebook and other social media to provide speedy feedback on CCD Ops nationwide etc.

HW Part 1: Reducing a Business problem (B.P.) to a D.P. to an R.O.

For each article,

Q.1.1. write a short description of what a B.P. may look like.

Q.1.2. Write one D.P. corresponding to the B.P.

Q.1.3. Write an example or two of R.O.s that correspond to the D.P.

HW Part 2: Construct Analysis

Q.2.1. List a few major constructs you find (if any) in each of the two articles that are of MKTR interest.

Q.2.2. Pick any one construct you have listed in Q.2.1. and break it down into a few aspects.

Q.2.3. Make a table with 2 columns. In the first column, write the names of the aspects you came up with. In the second column, corresponding to each aspect, write a Likert statement that you might use in a Survey Questionnaire to measure that aspect.

HW Part 3: Web-Survey Programming

Q.3.1. Build a web survey using any free online websurvey tool of your choice. Alternately, you can try Qualtrics, the ISB subscribed survey software.

Session 2 HW submission format:

Use a plain white blank PPT.
On the title slide, write your name and PGID.
For slide headers, use format "HW Part 1: [Article name]" (and so on for the next article chosen)
Pls mention clearly the Question numbers you are solving in the slide body. Use fresh slides for each new article
Use a blank slide to separate HW Part 2 from HW Part 1.
Provide a working link for your websurvey on a fresh slide titled "HW part 3".
Save the slide deck as session2HW_yourname.ppt and put it in the dropbox on LMS before the deadline (start of the exam).

That's it from me. Any Qs etc., let me know.

Sudhir

Group Homework for Session 1 (Problem Formulation)

Hi all,

This group assignment is for session 1. Pls split the work among the group and put it together again, if that is preferable. I strongly recommend you try using project-specific Wikis as a tool for co-ordination within your group. It's kind of like an editable google document and associated chat space for group members. Anyway, how you do it is up to you.

Pls ensure group formation (of up to 5 people in a group; across-sections is also OK) is complete by 18th Feb, else we will randomly allot people into groups.

Problem background:

You are a consultant and your client, a multinational manufacturing behemoth, wants to know trends and impact of disruption in manufacturing technologies in the next decade with particular emphasis on 'additive manufacturing' (a.k.a. 3 dimensional printing) technologies.

Your D.P. is to find "Which industries and product categories will shift earliest to (or be most affected by) 3D printing tech and around what time line?".

An alternative D.P. says, "What are the most likely consumer uses of 3D printing and around what time line?"

Choose any one of the two D.P.s, build corresponding R.O.s and write a 3 (or fewer) page report (Times New Roman 12 font, 1.5 line spacing, standard margins) outlining your principal findings in solving that R.O. through secondary research alone.

Hint:

Google for 'economist.com 3D printing' (without the inverted commas). Scan through the links that appear on the first page. I have posted a few examples below.

How 3D printers work (7 Sept 2013)

3D printing Out of the box (6 Aug 2013)

3D printing scales up (7 Sept 2013)

Inventing HP in 3D (28 Nov 2013)

Pls ensure you have:

Written your name and ISB ID on the document
Clearly spelt out which D.P. you have chosen
Clearly spelt out your R.O.(s)
Clearly included citations of sources (URLs etc) either as footnotes or as a separate References section outside the page limit.

The deadline is before the start of the exam. Pls submit electronically to a dropbox that Krishna will make on LMS for this purpose.

Any queries etc., contact me.

Thanks.

Sudhir

Session 5 Updates

Hi all,

Session 5 has been over for over a week. That I'm writing today about it is testament to the immense research backlog I had first to clear before I could get here.

Session 5 dealt with qualitative techniques in general and in marketing (i.e. on the demand side) in particular. We looked at both observation techniques and communication techniques (e.g. the surveys from earlier sessions).

The unpredictability and persistence of habit patterns in consumer decision making is a big deal from the analytics point of view. The star reading - the target case - also made the case for a combination of qualitative marketing insights and analytics capabilities to make things happen.

Some new techniques are coming up that promise even greater insights than current ones into sub-conscious habits and consumption triggers. The biggest among them is the fMRI machine. Below, I link two good articles for beginners that talk about fMRI use in Marketing in more detail.

This Is Your Brain on Marketing Up close and personal with fMRI By Chip Bayers (Adweek)

Functional Magnetic Resonance Imaging (fMRI): A New Research Tool

I will have nothing more to add other than the 7 big take-aways from the course at the end of the session 5 slides.

The next few blog posts here will reflect group homework assignments. So watch this space.

Sudhir

Sunday, February 9, 2014

Session 4 Updates

Hi all,

Session 4 got done y'day.

We covered the basics - the why, how and what after of text analysis as a prelude to doing the 'web extraction of text data' piece which, technically, was the centerpiece from the DC course POV.

1. In hindsight, I should've anticipated the issues that arose in trying to Live-run the R code on an untested machine, especially in Section A. My Research Assistant Ankit Anand usually does this stuff - package installation, dry testing of the code etc - before the session begins (in the MBA courses where I have covered this) and I got used to that. Ankit's not in town this week and the usual checklist simply escaped me. So, sorry about the hiccups in running the R code in class, basically.

I'm still working on a version of the code that you can run without such trouble. Pls ensure you have the latest version of Java loaded on your machines before you start.

---------------------------------------

2. I've received student queries about additional sources of material for study. Well, there are two ways about it. If you are presently working on a problem on R and encounter roadblocks, then the best thing is to simply google your query. Chances are sites like Stackoverflow will have answers for it. It usually works very well for me.

On the other hand, if you are looking for a structured way to start, then there are any number of books you could consider getting and starting. Below I list some which can help the rank beginner get started:

A beginner's guide to R from Computerworld, a video introduction to R here from Google and here is a full fledged book from the Springer publishers' stable on how to get started in R.

Better still is this list of links for books on R: Link for list of books and downloads for R. More advanced users, especially after you are introduced to supervised machine learning as part of the CBA program, may want to consider the following books (some of which are free downloads):

Machine Learning with R, by Brett Lantz. The link takes you to the table of contents which you can browse and also through a sample chapter.

This short document from MIT's open courseware on Machine learning is a useful repository of the very basic datasets, algorithms and packages a beginner can use to get started on the machine learning part of R analytics.

---------------------------------------

3. Regarding text analytics in particular, here's a quick set of code that can get you started with the basic things we did with text analytics (in addition to the code I will send you).

In any case, you are advised to subscribe to the r-bloggers.com daily newsletter for quick daily overviews of what's new and hot on R. Here is a link and expert commentary on text mining in R from R-bloggers.com, for instance.

This is an example of Q&A at stackoverflow, which is among the pre-eminent sites for code level discussions on R (and other packages).

---------------------------------------

4. Whew! That's it from me for now. There'll be homework for this session - will involve you extracting, storing and processing web based text data, will also involve you processing text data from your class, processing it into semantic network analyses etc. But that's all for later.

See you in class soon.

Sudhir

Saturday, February 8, 2014

Session 3 Updates

Hi all,

Session 3 got done y'day.

It was a 'dry and technical' session, admittedly, owing to the nature of subject matter.... but, it cannot be denied that we (both you and me) did try to enliven things a bit here and there, didn't we?

1. We covered sampling basics - notational and definitional stuff included. Now, with sampling done, in theory, you are equipped enough to design a full fledged survey based primary data collection exercise... only.

-------------------------------------------------

2. Significantly, y'day also marked the first use of R in DC. The data and code are put up as .txt files on LMS. You are encouraged to pls try replicating my classroom results at home. Of course, replications won't be perfect because of the very nature of 'random' sampling but that's OK.
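
A quick base-R illustration of why replications of a random sample differ from run to run, and how fixing the seed makes a draw repeatable. The sampling frame of 1000 unit IDs, the sample size of 50 and the seed value 42 are arbitrary choices for this sketch, not taken from the classroom code.

```r
# Hypothetical sampling frame: 1000 unit IDs.
population <- 1:1000

# Two unseeded draws will, in general, pick different units.
s1 <- sample(population, 50)
s2 <- sample(population, 50)
print(mean(s1))
print(mean(s2))

# Fixing the seed before each draw reproduces exactly the same sample.
set.seed(42)
a <- sample(population, 50)
set.seed(42)
b <- sample(population, 50)
print(identical(a, b))
```

Note that even with the same seed, results can differ from mine if the R version or the sampling code differs, so small discrepancies are nothing to worry about.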

-------------------------------------------------

3. We also delved into the business experimentation space. This is a rapidly evolving space and one, I believe, that defines the frontiers in demand-side analytics.

If you were to ask me which is a promising area within analytics to build skill-sets in, I'd promptly say 'Experimentation analytics'.

-------------------------------------------------

4. We studied 'traditional' experimental design in Session 3. Traditionally, experiments were used to measure the *average* treatment effect across the sample. This was particularly true in the natural sciences. However, increasingly in business, we find that the average is misleading and not good enough.

In a whole host of modern businesses (both web based and brick-and-mortar), the treatment effect of interest is produced by exposing a micro-segment (in extreme cases, a segment of One) to a causal condition (in extreme cases, a product or service custom-defined for that micro segment) and measuring the outcome difference from the average for the market as a whole.

It is this combination of product design and micro-segmentation that gives the new age business experimentation its edge and its own distinctive flavour. It forms the subject of the last reading in session 3.

I believe I did not get to emphasise this point enough, particularly in Section A. I strongly encourage people to read the relevant 2007 HBR article on business experimentation provided in your course-pack.
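
A toy illustration of why the average can mislead, in base R. The data frame d, its two segments and the spend numbers are entirely made up; the helper effect() simply takes the difference between treated and control means.

```r
# Hypothetical experiment: outcome is spend, two micro-segments A and B.
d <- data.frame(
  segment = rep(c("A", "B"), each = 4),
  treated = rep(c(0, 0, 1, 1), times = 2),
  spend   = c(10, 12, 20, 22,    # segment A responds strongly
              10, 12,  8, 10)    # segment B responds mildly negatively
)

# Treatment effect = mean outcome of treated minus mean outcome of control.
effect <- function(df) {
  mean(df$spend[df$treated == 1]) - mean(df$spend[df$treated == 0])
}

# Average treatment effect over the whole sample.
ate <- effect(d)

# The same calculation within each segment.
by_segment <- sapply(split(d, d$segment), effect)

print(ate)
print(by_segment)
```

In this made-up data the overall average hides the fact that one segment responds strongly and the other responds negatively, which is the sort of pattern micro-segment level experimentation is meant to surface.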

-------------------------------------------------

5. By the way, the trend I refer to in point 4 above isn't necessarily restricted to the business sector either.... Politics is not immune. For instance, take this Washington Post article from June 2013 that gives a layman's introduction to how Sri Obama leveraged Big data and microtargeting techniques for his 2012 campaign.

Here is a short 5 minute video that makes a similar point. And here is a more detailed, longish article from the MIT technology review that goes into more detail. I have every reason to believe that at least some of these ideas will or have found their way into India's coming MahaBhArat - the general elections of 2014....

-------------------------------------------------

6. One last bit about traditional experiments before I sign off. Here's an interesting article on a piece of academic research that aimed to test (using traditional 'True' experimental design) whether social networks make us smarter or dumber...

Seeking to find out if social networks make us smarter a team of scientists investigated if networks help us imitate analytical thought processes from our peers.
To carry out their experiment the researchers tested university students with a series of brain-straining questions. 100 volunteers were separated into 5 social networks each with 20 individuals. Connections between the people in the networks were assigned randomly by a computer to fit 5 different network patterns. At one extreme all the people in the network were connected directly to all the others, and at the other extreme there were no connections at all. To test how these networks helped the people in them to learn, the scientists quizzed the volunteers with a 'cognitive reflection test', a series of questions which rely on analytical reasoning to overcome incorrect intuition.
To see if the social networks helped the people in them to improve their answers the volunteers were asked each of the questions 5 times. The first time the volunteers had to figure it out on their own, the next 4 times they were allowed to copy the answer from their neighbours in the network. The researchers found that in well connected networks ...

OK, so what did they find? Well, to find out I suggest you read the entire article only....

-------------------------------------------------

OK, that's it from me for now. See you in class today for an R joy ride into the Text analytics skies...

Sudhir

Friday, February 7, 2014

Session 2 Updates

Hi all,

Session 2 got done y'day. We covered a serious lot of ground, even though it may not seem so at first glance.

Doing basic survey design principles, construct basics *and* questionnaire design all in one go in 110 minutes... each of those topics would merit at least a whole session all by itself ideally...

Update:

1. There will be a homework assignment for this class. As you can guess it will involve your designing a survey around a construct of interest and programming it into any web survey software. This will be a group assignment. I will put up details later after these 5 rush-days of teaching are done...

2. Recall that chilling discussion on Reading 1 we had in class... well, here's what I got in my inbox y'day (I subscribe to the free Knowledge@Wharton newsletter):

The 'Moneyball' approach to hiring CEOs

Co-incidence? Maybe. Or maybe it's a sign... Anyway, humour apart, here's the gist of the article:

It was the lesson of the best-selling book-turned-movie, Moneyball: Don’t throw money at big-name baseball players or judge future performance by purely physical attributes. Assess them, instead, by more relevant measurements, like their on-base percentage.
Wharton professor J. Scott Armstrong and Philippe Jacquart of EMLYON Business School in Écully, France, say the same principles can be applied to choosing corporate executives. In a recent paper, they challenge the popular belief that higher pay leads to selecting chief executive officers who will outperform their lower-compensated counterparts.
[...]Instead of throwing money at “superstars,” companies would be better served by using quantifiable measures to pick the right CEO, according to recent Wharton research.

3. Another article from a recent Economist issue and (ominously?) titled "The future of jobs" has this to say:

A new wave of technological progress may dramatically accelerate this automation of brain-work. Evidence is mounting that rapid technological progress, which accounted for the long era of rapid productivity growth from the 19th century to the 1970s, is back. The sort of advances that allow people to put in their pocket a computer that is not only more powerful than any in the world 20 years ago, but also has far better software and far greater access to useful data, as well as to other people and machines, have implications for all sorts of work.
[...] Ten years ago technologically minded economists pointed to driving cars in traffic as the sort of human accomplishment that computers were highly unlikely to master. Now Google cars are rolling round California driver-free no one doubts such mastery is possible, though the speed at which fully self-driving cars will come to market remains hard to guess.
Even after computers beat grandmasters at chess (once thought highly unlikely), nobody thought they could take on people at free-form games played in natural language. Then Watson, a pattern-recognising supercomputer developed by IBM, bested the best human competitors in America’s popular and syntactically tricksy general-knowledge quiz show “Jeopardy!” Versions of Watson are being marketed to firms across a range of industries to help with all sorts of pattern-recognition problems. Its acumen will grow, and its costs fall, as firms learn to harness its abilities.
The machines are not just cleverer, they also have access to far more data. The combination of big data and smart machines will take over some occupations wholesale; in others it will allow firms to do more with fewer workers. Text-mining programs will displace professional jobs in legal services. Biopsies will be analysed more efficiently by image-processing software than lab technicians. Accountants may follow travel agents and tellers into the unemployment line as tax software improves. Machines are already turning basic sports results and financial data into good-enough news stories.
Jobs that are not easily automated may still be transformed. New data-processing technology could break “cognitive” jobs down into smaller and smaller tasks.

Well, tech 'progress' cannot be stopped I guess. But it's the distribution of reward that had the Economist (and consequently me too) all worried. Only. See below.

4. How do the economic spoils get split up in the coming years? Who gets what share of the prosperity pie? And why?

Yet some now fear that a new era of automation enabled by ever more powerful and capable computers could work out differently. They start from the observation that, across the rich world, all is far from well in the world of work. The essence of what they see as a work crisis is that in rich countries the wages of the typical worker, adjusted for cost of living, are stagnant. In America the real wage has hardly budged over the past four decades. Even in places like Britain and Germany, where employment is touching new highs, wages have been flat for a decade. Recent research suggests that this is because substituting capital for labour through automation is increasingly attractive; as a result owners of capital have captured ever more of the world’s income since the 1980s, while the share going to labour has fallen.
At the same time, even in relatively egalitarian places like Sweden, inequality among the employed has risen sharply, with the share going to the highest earners soaring.

So who might be the winners and losers in what is surely coming? Here's a clue

There will still be jobs. Even Mr Frey and Mr Osborne, whose research speaks of 47% of job categories being open to automation within two decades, accept that some jobs—especially those currently associated with high levels of education and high wages—will survive (see table). Tyler Cowen, an economist at George Mason University and a much-read blogger, writes in his most recent book, “Average is Over”, that rich economies seem to be bifurcating into a small group of workers with skills highly complementary with machine intelligence, for whom he has high hopes, and the rest, for whom not so much.

OK, this part stretched longer than I intended. Will update and complete this post (or maybe have a second post to continue).

For now, I'll sign off. See you in class today for session 3 (Sampling and Experimentation basics).

Sudhir

Thursday, February 6, 2014

Session 1 Updates

Hi all,

Session 1 got done yesterday. Was heartening to see a healthy dose of engagement and response from the class.

This blog post is regarding the following points:

1. Surveys for data collection *today*:

Below are the links to two short online surveys, which shouldn't take more than 15-20 minutes by my reckoning.

Social Network data collection for Section A students

Social network data collection for section B students

Text Data inputs for students from both sections

By the way, completing the above 2 surveys has grade credit and will be considered as part of your class participation (CP) grade.

2. Links for some topics we covered in Session 1:

This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

And of course, there's always google available to produce reports and summaries at varying levels of detail on any subject under the sun.

3. Homework Group assignment for Session 1:

There will be a homework assignment (group submission) for session 1. The idea is to give you a few business articles to read and then formulate the business problem (D.P. and R.O.) for each of them.

I will give this in a separate post. In any case, the deadline for this will be between now and your return to campus for the second half of term 1.

4. Group Formation:

If for any reason, you are unable to form a team or find team members, pls let me know and I will assign you to a team.

5. In case you feel addressing me as 'sir' or 'prof' is too stuffy and formal, then pls feel free to call me 'Sudhir'. It's perfectly OK by me. That's it for today. See you in class, soon.

Sudhir

Monday, February 3, 2014

Hi

This is a welcome message to the CBA batch joining in 2014.

The Data collection (DC) course will use some R. This blog can be a repository for related R code and assistance. Feedback, Q&A etc are always welcome via the comments sections.

Looking forward to smooth sailing.

Sudhir Voleti
Assistant Professor of Marketing
ISB Hyderabad