Analytics Yogi: November 2014

Monday, November 17, 2014

Homework for Session 5 - Batch 3 CBA

Hi all,

Please find here the individual homework for session 5.

Pls ensure you are able to replicate classwork examples with the R code sent before you try this one.

The idea is simple. I will require you to:

1. Pull your Facebook (FB) data, i.e. your friends' list. Pls use the Rfacebook package and the instructions from the slides.

2. Run the communities-detection algorithm on it.

3. Paste a screenshot of the network with communities on a slide. Identify the top few clearly identified groups that you can see (like I'd shown for my FB pull in the class slides).

4. Analyze the 5 largest communities you got in terms of (i) size, transitivity, density, centralities, and (ii) meaning (how does the community relate to the ego or focal person).
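For those unsure what the metrics in step 4 actually measure, here is a minimal sketch on a made-up five-person network. It is written in plain Python purely for illustration (the classwork code is in R), and the node names and edges are hypothetical. It computes size, density, transitivity and degree centrality by hand so the definitions are visible.

```python
from itertools import combinations

# Hypothetical toy friendship network (undirected); names are made up.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

# Build an adjacency map: node -> set of neighbours.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

n = len(adj)    # number of members
m = len(edges)  # number of ties

# Size: how many members the community has.
size = n

# Density: share of all possible ties that are actually present.
density = m / (n * (n - 1) / 2)

# Transitivity: closed triplets divided by all connected triplets.
# Each pair of a node's neighbours forms a triplet centred on that node;
# the triplet is closed if those two neighbours are also tied to each other.
triplets = 0
closed = 0
for node, nbrs in adj.items():
    for x, y in combinations(sorted(nbrs), 2):
        triplets += 1
        if y in adj[x]:
            closed += 1
transitivity = closed / triplets if triplets else 0.0

# Degree centrality: ties per node, scaled by the maximum possible (n - 1).
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

print(size, density, transitivity)
print(degree_centrality)
```

In this toy network, C has the most ties and so the highest degree centrality. Other centralities (betweenness, closeness and so on) follow the same idea of scoring each node's position but need more code than fits here.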

Submission format and deadline the same as in the past. Save your PPT as (your.full.name).pptx

Any queries etc, contact me.

Thanks.

Sudhir

Saturday, November 15, 2014

Mailbag

Hi all,

Received this in the mail today and responded to it. Am putting up the exchange here coz I think it merits further dissemination.

The email I got:

Hi Sudhir,
I am a CBA Batch -3 student from Section-A
I am facing issues relating to very fundamental meanings of terminology introduced in DCBA. It may be due to the reason that I am not a business guy who is well versed with business terminology.
eg I am not comfortable with the following keywords: construct, dichotomy, Costs of Capital, trickier proposition, business meta-process and so many keywords introduced on Slide 16 Problem Formulations Examples of R.O.s, and then in psychometric scaling
Considering the example of Baskin Robbins. I am not able to get how psychology is coming into the picture here?
I may sound as asking stupid questions but I know I need to do something about it so that I can get comfortable with this subject.
Please provide me some directions..
Thanks,
P

My response:

Hi P,
Let me try to systematically answer what I can.
1. Regarding what a 'construct' means in our context, pls refer to this blogpost (from the PGP class):
http://marketing-yogi.blogspot.in/2014/09/session-2-exposition-what-are.html
2. The definitions of 'dichotomy', 'business processes' and meta-processes, and 'cost of capital' can be had from a Google search. A dichotomy means a branching into two separate streams. Thus, data types exhibit a dichotomy: primary versus secondary data, etc.
3. 'trickier proposition' is an expression in speech that means "is more problematic" or "is more challenging".
4. Not every construct need have profound psychological drivers. Many are fairly routine and habit driven.
5. The Baskin Robbins example has nothing to do with psychology. It's merely meant to illustrate the primary-secondary dichotomy.
I hope that helps clarify things somewhat, at least. Thanks for reaching out. I might put up this entire exchange on the blog, in case other students are also facing the same problem.
Thanks,
Sudhir

***************************

Updates. Received two more email queries. My responses are also put up below.

Hi Professor,
I am a student of CBA batch 3. I just had a query around the R code for text analysis (filename: textanalysis R code.R), I have gone through the entire code and wanted to understand the last part i.e. Bayes Factor Model selection and thereon. Can you kindly guide me on this?
I am not able to conceptually grasp the concept of Factor Model and the output from that point onwards.
Look forward to hearing from you.
RT

My response:

Hi RT,
>> I have gone through the entire code and wanted to understand the last part i.e. Bayes Factor Model selection and thereon. Can you kindly guide me on this?
Your query concerns what we call 'model selection' in statistics. A model is a set of relations which we fit upon data to explain them and/or make predictions about them. However, there may be multiple models that fit the same data.
One way to sort through this multiplicity of models and select the *best* one is to first find how well each model 'fits' the data (e.g., which one has the least squared error). Accordingly, various 'goodness of fit' criteria have been developed and deployed. The Bayes Factor is one such, very important, metric for comparing models in Bayesian statistics.
For our purposes, just take the model results and use them to select the model with the optimal number of components (optimal, as decided by the log bayes factor). Going beyond that would be beyond the scope of the DC course. Wikipedia and other web resources are available however, in case you want to do a deep-dive into fit statistics in general and Bayes Factors in particular.
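As a rough illustration of that selection step, the sketch below picks the number of components with the best log Bayes factor. It is plain Python rather than the class R code, the numbers are made up, and it assumes that each log Bayes factor is measured against a common baseline model so that a higher value means a better-supported model.

```python
# Hypothetical log Bayes factors for models with K = 2..6 components.
# These values are invented for illustration; read the real ones off
# the model output.
log_bayes_factor = {2: 105.3, 3: 148.9, 4: 171.2, 5: 166.4, 6: 150.7}

# Assumption: higher log Bayes factor = better-supported model.
best_k = max(log_bayes_factor, key=log_bayes_factor.get)

print("Selected number of components:", best_k)
```

With these made-up values, the model with 4 components would be selected.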
>> I am not able to conceptually grasp the concept of Factor Model and the output from that point onwards.
When we 'factorize' something (say, a), we break it down into pieces (say, b, c and d) such that the product b*c*d yields a back.
In general, any whole number greater than 1 can be 'factorized' into a product of primes. Similarly, when we factorize a matrix, we break it down into 'factors' whose product yields the original matrix again.
We took the TDM and 'factorized' it (conceptually only; the LDA is more complex in its assumptions and its estimation) into 'factors', i.e. sets of terms that together can be interpreted as topics.
For our purposes, all we need to know is that using the latent topic factor model, we 'broke down' the corpus into distinct 'topics' or themes that can be interpreted and used for further analysis.
I hope that helps clarify.
Sudhir
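To make the factorization analogy in the reply above concrete, here is a small sketch in plain Python (not the class R code). It factorizes a number into primes, and then builds a tiny made-up term-document matrix as the product of a term-by-topic matrix and a topic-by-document matrix. All the numbers are hypothetical, and this is only the conceptual picture: LDA does not literally multiply integer matrices like this.

```python
def prime_factors(n):
    """Return the prime factors of n (with repeats), smallest first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# A number broken into factors whose product gives the number back.
print(prime_factors(60))

# Hypothetical factors: 4 terms x 2 topics, and 2 topics x 3 documents.
term_topic = [[2, 0], [1, 0], [0, 3], [0, 1]]
topic_doc = [[1, 2, 0], [0, 1, 2]]

# Their product is a 4 terms x 3 documents matrix, i.e. a toy TDM.
tdm = matmul(term_topic, topic_doc)
for row in tdm:
    print(row)
```

Reading it backwards gives the intuition: starting from the 4 x 3 TDM, a factor model looks for the two smaller matrices, and the term-by-topic one tells us which terms go together as a topic.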

Another one below:

Respected Professor,
I'm a CBA student from technology background. I need your help regarding data collection:
1. Is there any book that I can refer? I feel I'm lost with so much of info/topics. Also with no audio for first class, it seems I don't have way to revisit the fundamentals discussed.
2. It will be extremely helpful if you can please provide some practice papers and solutions. (hope that's possible)
3. Could you please also clarify whether Facebook assignment is group or individual H.W.?
SP

My response:

Hi SP,
>> 1. Is there any book that I can refer? I feel I'm lost with so much of info/topics. Also with no audio for first class, it seems I don't have way to revisit the fundamentals discussed.
I don't use any one textbook for DC. The material is collected and collated from multiple sources. However, Wikipedia is your friend in case you need more detail on particular topics. Also pls check the early blogposts for your batch on analytics-yogi.blogspot.in where some additional links and material were put up.
>> 2. It will be extremely helpful if you can please provide some practice papers and solutions. (hope that's possible)
The exam is open book-open notes. The questions are all short answer questions (no essay length stuff) for more grade-ability and objectivity. I can't make any promises regarding the practice exam at this point as I plan to modify the exams I have from previously for this batch as well.
>> 3. Could you please also clarify whether Facebook assignment is group or individual H.W.?
Individual. Because each of you has to pull your own FB data.
Hope that clarifies.
Sudhir

Ciao.

Friday, November 14, 2014

Make-up Assignment

Hi,

Make-up Assignment in lieu of survey filling:

Pls watch this ~ 20 minute video carefully. It features Scott McDonald of Condé Nast holding forth on where MKTR is headed.

“Social Technological and Economic forces affecting Marketing Research over the next decade”

Now, for your HW, pls answer a few simple Qs (True-False, fill in the blanks variety) about the above talk in the following survey:

Questions for Make-up Homework.

HW Notes:

(i) This is an individual-only HW. Since it involves no R, consulting peers is not permitted.

(ii) I found that using earphones works great in making out what the speaker is saying much more clearly than ordinary speakers. FYI.

(iii) Deadline: The HW should be completed and submitted latest by midnight 10-December.

Any Qs etc, pls feel free to email me or use the comments section below.

Sudhir Voleti

Thursday, November 13, 2014

Interesting links from different facets of the DC course

Hi class,

We've seen a wide range of topics in the DC course. Some of you asked for more sources and reading material. Pls find the same below (in no particular order); all of it is entirely optional:

1. Recall the google glass example we'd seen in class? Well, here's a Gigaom article on the Future of the wearables market.

2. Recall the first example in the network analytics class on world international call patterns? Well, here's the associated Atlantic article on a World mapped by phone calls. It nicely illustrates how much visualization of networks can tell us.

3. More from the Atlantic on how it's now technologically feasible to arrive at one's identity: Big Data Can Guess Who You Are Based on Your Zip Code

4. Recall the habit patterns class we'd covered? Here's an article from HBR blogs on How Customers Get Hooked on Products.

5. There's an undercurrent somewhere in the program that spells the words "data science". This link here offers a rounded perspective on what precisely is data science. This follow-on link here describes 8 concrete steps you must take to become a data scientist. Yes, R features there. Apt read for all CBA students, IMO.

-------------------------------------------------

The links below are more technical in nature, and are even more optional reading than the ones above. I'd suggest revisiting them after a couple more terms are done in the program.

6. This will be kinda boring to many perhaps. But here's an Academic journal paper on Behavior prediction using social networks

7. And here is an excellent set of slides for computing basic metrics in network data from r-bloggers.com. BTW, you should consider subscribing to their newsletter, if you are into R.

8. More R here. An excellent intro to general R and then some network basics along with code and examples workshop style.

That's it for now. Will update as more comes in.

Ciao.

Sudhir

Session 4 Homework for CBA Batch 3

Class,

Individual homework:

Fill up this survey below (on perceptions of what constitutes IT capabilities in a firm). If you have any issues with doing so, let me know and I will assign alternate individual homework.

IT capabilities survey

Group HW:

1. Pick up any well-known brand- product or service. E.g. Xbox360 or Jabong or iphone6 or Nike.

2. Collect 3 sets of data for it:

(a) 100+ consumer reviews from either flipkart or Amazon India

(b) 500+ tweets

3. Feel free to either use R or any other means you know of to collect the data (e.g. Python, chrome scraper etc.). But clearly mention the data collection tool used.

4. For each set of data, perform the following analyses:

(a) General wordcloud using both TF and TFIDF weighting schemes. Update stopwords list to filter out noisy or irrelevant terms.

(b) Sentiment analysis. Display wordclouds separately for the top 50 most positive and most negative words.

(c) Identify the top few most positive and most negative documents. Read them and speculate on why they are so positive or negative about it.
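For the TF versus TFIDF distinction in 4(a), the sketch below may help. It is plain Python on a made-up three-review corpus, not the class R code, and it uses one common TFIDF variant (term count times log of N over document frequency); the weighting in the class code may differ in details such as normalization. The stopword added here is likewise just an example.

```python
import math
from collections import Counter

# Hypothetical mini-corpus of three "reviews" (made-up text).
docs = [
    "great phone great camera",
    "battery poor phone slow",
    "great battery life",
]

# Example of updating the stopword list with a noisy, uninformative term.
stopwords = {"phone"}

tokenized = [[w for w in d.split() if w not in stopwords] for d in docs]

# TF: how often each term occurs across the whole corpus.
tf = Counter(w for doc in tokenized for w in doc)

# Document frequency: in how many documents each term appears.
df = Counter(w for doc in tokenized for w in set(doc))
n_docs = len(tokenized)

# TFIDF: term frequency down-weighted for terms that occur in many documents.
tfidf = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

for term, weight in sorted(tfidf.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} tf={tf[term]}  tfidf={weight:.3f}")
```

A TF wordcloud would size words by the `tf` counts, and a TFIDF wordcloud by the `tfidf` weights, which penalize terms that show up in most documents.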

5. Session 4 HW submission format:

Use a plain white blank PPT.
On the title slide, write your group name and the names + ISB student IDs of all group members.
Give your homework an informative title (include name of the product/brand you chose).
Have 3 sections in your PPT, one per data source, separated by separator slides.
On the separator slides, mention the source of the data. E.g., "Data source: Amazon Consumer reviews" or "Data source: Twitter" and so on.
For slide headers, use format "TF Wordcloud" or "Positive wordcloud" and so on.
Save the slide deck as session4HW_yourgroup.ppt.
Put all the raw data you collected, the code you used and your PPT in a zip folder (so that I can replicate your analysis if need arises). Save the folder as session4HW_yourgroup.zip and upload it in the dropbox on LMS before the deadline.

Any Qs etc., let Atreyee or me know. Feel free to use the comments section to this post for any Q&A or discussions.

Sudhir

Session 2 Group Homework for CBA Batch 3

Hi all,

This homework covers sessions 1,2 and 3, i.e. problem formulation, construct assessment through qualitative research, and questionnaire design for primary data collection.

Group HW:

Consider the following Business problem.

A firm is planning to build a smartphone app that offers location-based services.
The app will collect details about deals, discounts etc from stores on one side and inform subscribers about these deals when they are within one cell tower range (roughly a km) of the business establishments where these deals are being offered.
The firm is targeting people below age 35 in the middle and upper-middle class in metropolitan India. The firm however wants to know about the target segment's app usage habits in general. What types of apps do people use? Why? How many have used apps to transact business online (e.g., pay for orders placed) etc?
Your tasks will be to (1). conduct some exploratory/ qualitative research to find out what constructs underlie people's app based propensities and behaviors. Think of running a small focus group, or conducting a few in-depth interviews with knowledgeable people. (2). Formulate the problem in terms of a D.P. and a few R.O.s that correspond to it. (3). Design a questionnaire centered around measuring the constructs of interest you have identified. Read parts 1,2 and 3 in the HW below.

HW Part 1: Problem Formulation

Q.1.1. Write a decision problem (D.P.) to describe the business problem of interest.

Q.1.2. Write a few research objectives (R.O.s) to address this D.P. (pls use format specified for R.O. in class slides).

HW Part 2: Construct Analysis

Q.2.1. Conduct some exploratory/ qualitative research to find out what constructs underlie people's app based propensities and behaviors. For example, you could run a small focus group discussion, or conduct a few in-depth interviews with knowledgeable people.

Q.2.2. List a few major constructs you find from your data collected in Q.2.1. that are of business interest.

Q.2.3. Pick any one construct you have listed in Q.2.2. and break it down into a few aspects. Ask yourself what motives, means and opportunities drive the behavior associated with the construct.

Q.2.4. Make a table with 2 columns. In the first column, write the names of the aspects you came up with. In the second column, corresponding to each aspect, write a Likert statement that you might use in a Survey Questionnaire to measure that aspect.

HW Part 3: Web-Survey Programming

Q.3.1. Build a web survey using any free online websurvey tool of your choice. E.g., surveymonkey.com or zoomerang.com offer free websurvey services.

Alternatively, you can try Qualtrics, the ISB subscribed survey software. Instructions for how to set up a Qualtrics account using your ISB email have been uploaded on LMS.

Ensure your questionnaire is "complete" i.e. has an introduction, a section for the psychographic Likerts, a demographic section, and some gateway questions and SKIP logic.

Session 2 HW submission format:

Use a plain white blank PPT.
On the title slide, write your group name and the names + ISB student IDs of all group members.
Give your homework an informative title.
For slide headers, use format "HW Part 1: [Slide content description]" and so on.
Pls mention clearly the Question numbers you are solving in the slide body. Use fresh slides for each new question.
Use a blank slide to separate HW Part 2 from HW Part 1.
Provide a working link for your websurvey on a fresh slide titled "HW part 3".
It is advisable to run a pre-test. Perhaps take the survey a few times to ensure clarity, readability, working SKIP logic etc is in place.
Save the slide deck as session2HW_yourgroup.ppt and put it in the dropbox on LMS before the deadline.

HW submission deadline: midnight of 10-December-2014. That's it from me. Any Qs etc., let Atreyee or me know. Feel free to use the comments section to this post for any Q&A or discussions.

Sudhir

Tuesday, November 11, 2014

Session 4 Classwork files on LMS

Hi all,

Sorry about the delay in updating the blog and the LMS.

Pls find on LMS the R code, data and instructions for session 4 (text analysis).

That for session 5 (network analysis) will come in the next couple of days.

As CBA students, my expectation is that you will:

(i) diligently follow the instructions given,

(ii) read and understand the R code line-by-line before running it,

(iii) run the code and replicate the classwork examples,

(iv) discuss any issues etc that arise here on this blog by using the comments section,

(v) solve the group homeworks by tweaking and customizing the R code as required, and

(vi) provide constructive feedback where possible.

Instructions:

1. Unzip contents of the zip folder

2. Open RStudio. File menu --> Open File --> textanalysis R code.R

3. The textanalysis R code.R file will open as an additional window (on the top left) in RStudio.

4. To run any lines, select them and click the Run icon on the top right of the window. Ensure internet is connected.

5. Read the lines before running as some require input from your side (which files to read in etc)

6. The zip folder contents are self-contained and hopefully should run smoothly. However, if you encounter issues, pls let us know.

7. Pls email aashish_pandey@isb.edu with a copy to Atreyee in case of any R related issues. Your group homework for this session will be up soon, in a few days. Pls ensure you are comfortable with this code before the homework arrives.

Thanks.

Sudhir

Thursday, November 6, 2014

Session 2 Updates November 2014

Hi all,

Session 2 got done today. We covered a serious lot of ground, even though it may not seem so at first glance.

Doing basic survey design principles, construct basics *and* questionnaire design all in one go in 110 minutes... each of those topics would merit at least a whole session all by itself ideally...

Update:

1. There will be a group homework assignment for this class. As you can guess it will involve your designing a survey around a construct of interest and programming it into any web survey software. This will be a group assignment. I will put up details later after these 5 rush-days of teaching are done...

Today's individual assignment is filling up this survey. You will have until tomorrow evening 10 pm to do so. But then, tomorrow, again, there will be more surveys to fill...

SNA survey for sec A

SNA survey for sec B

2. Recall the mini-caselet we discussed in class today. Some of you asked for more info.

The 'Moneyball' approach to hiring CEOs

It was the lesson of the best-selling book-turned-movie, Moneyball: Don’t throw money at big-name baseball players or judge future performance by purely physical attributes. Assess them, instead, by more relevant measurements, like their on-base percentage.
Wharton professor J. Scott Armstrong and Philippe Jacquart of EMLYON Business School in Écully, France, say the same principles can be applied to choosing corporate executives. In a recent paper, they challenge the popular belief that higher pay leads to selecting chief executive officers who will outperform their lower-compensated counterparts.
[...]Instead of throwing money at “superstars,” companies would be better served by using quantifiable measures to pick the right CEO, according to recent Wharton research.

Well, that should go some distance in answering whether Moneyball principles could be applied to hiring for more managerial positions. But reassuringly, the hires will all be human only. Machines still cannot hope to do a CEO's job. Yet.

3. Another article from a recent Economist issue and (ominously?) titled "The future of jobs" has this to say:

A new wave of technological progress may dramatically accelerate this automation of brain-work. Evidence is mounting that rapid technological progress, which accounted for the long era of rapid productivity growth from the 19th century to the 1970s, is back. The sort of advances that allow people to put in their pocket a computer that is not only more powerful than any in the world 20 years ago, but also has far better software and far greater access to useful data, as well as to other people and machines, have implications for all sorts of work.
[...] Ten years ago technologically minded economists pointed to driving cars in traffic as the sort of human accomplishment that computers were highly unlikely to master. Now Google cars are rolling round California driver-free no one doubts such mastery is possible, though the speed at which fully self-driving cars will come to market remains hard to guess.
Even after computers beat grandmasters at chess (once thought highly unlikely), nobody thought they could take on people at free-form games played in natural language. Then Watson, a pattern-recognising supercomputer developed by IBM, bested the best human competitors in America’s popular and syntactically tricksy general-knowledge quiz show “Jeopardy!” Versions of Watson are being marketed to firms across a range of industries to help with all sorts of pattern-recognition problems. Its acumen will grow, and its costs fall, as firms learn to harness its abilities.
The machines are not just cleverer, they also have access to far more data. The combination of big data and smart machines will take over some occupations wholesale; in others it will allow firms to do more with fewer workers. Text-mining programs will displace professional jobs in legal services. Biopsies will be analysed more efficiently by image-processing software than lab technicians. Accountants may follow travel agents and tellers into the unemployment line as tax software improves. Machines are already turning basic sports results and financial data into good-enough news stories.
Jobs that are not easily automated may still be transformed. New data-processing technology could break “cognitive” jobs down into smaller and smaller tasks.

Well, tech 'progress' cannot be stopped I guess. But it's the distribution of reward that had the Economist (and consequently me too) all worried. See below.

4. How do the economic spoils get split up in the coming years? Who gets what share of the prosperity pie? And why?

Yet some now fear that a new era of automation enabled by ever more powerful and capable computers could work out differently. They start from the observation that, across the rich world, all is far from well in the world of work. The essence of what they see as a work crisis is that in rich countries the wages of the typical worker, adjusted for cost of living, are stagnant. In America the real wage has hardly budged over the past four decades. Even in places like Britain and Germany, where employment is touching new highs, wages have been flat for a decade. Recent research suggests that this is because substituting capital for labour through automation is increasingly attractive; as a result owners of capital have captured ever more of the world’s income since the 1980s, while the share going to labour has fallen.
At the same time, even in relatively egalitarian places like Sweden, inequality among the employed has risen sharply, with the share going to the highest earners soaring.

So who might be the winners and losers in what is surely coming? Here's a clue:

There will still be jobs. Even Mr Frey and Mr Osborne, whose research speaks of 47% of job categories being open to automation within two decades, accept that some jobs—especially those currently associated with high levels of education and high wages—will survive (see table). Tyler Cowen, an economist at George Mason University and a much-read blogger, writes in his most recent book, “Average is Over”, that rich economies seem to be bifurcating into a small group of workers with skills highly complementary with machine intelligence, for whom he has high hopes, and the rest, for whom not so much.

5. The good news? The future for bright Business analytics people who combine non-standardized inputs (such as those from exploratory and/or qualitative work on the demand side) with machine intelligence is bright. When all is done, chances are you will belong to that select group. Change is coming whether we like it or not. The best we can do is to be better prepared. And that we are already doing...

OK, this part stretched longer than I intended. Will update and complete this post (or maybe have a second post to continue).

For now, I'll sign off. See you in class today for session 3 (Qualitative Research and Experimentation basics).

Sudhir

Wednesday, November 5, 2014

Session 1 Updates

Hi all,

Session 1 got done. The post lunch session is always a tough one regarding engagement and response from the class.

This blog post is regarding the following points:

1. Surveys for data collection *today*:

Below are the links to two short online surveys which shouldn't take more than 15-20 minutes by my reckoning.

Big5 personality survey

Brand prefs survey

I need you to fill up these surveys today itself, by 8 am tomorrow latest. This gives me enough time to collect the data and use it in class for session 4 (text analytics) and session 5 (Social network measurement and analysis).

By the way, completing the above 2 surveys has grade credit and will be considered as part of your class participation (CP) grade.

2. Links for some topics we covered in Session 1:

Here are some links for some of the concepts we studied in session 1. These are optional readings and you may cover them at leisure. They're basically to help understanding for those folks who may have felt the coverage in class was not detailed enough on certain topics.

This is a Wikipedia link to Quantitative psychology as a subject area. It provides a nice, concise and precise introduction to the area in general and has a good number of downstream links that you can pick up on as and when necessary.

This is the Wiki entry to Scaling techniques in general in the social sciences. As you can see the comparative versus noncomparative dichotomy comes in early on here. More links to detailed topics are also available.

This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

Recall that in reading 1 in session 1 we could vaguely discern elements of segmentation analysis (second paragraph) as well as elements of affinity analysis (last paragraph). A conceptual introduction to these terms can be found online as well - for instance, here for market segmentation, for cluster analysis and for affinity analysis in retail analytics.

And of course, there's always google available to produce reports and summaries at varying levels of detail on any subject under the sun.

3. Homework Group assignment for Session 1:

There will be a homework assignment (group submission) for session 1. The idea is to give you a few business articles to read and then formulate the business problem (D.P. and R.O.) for each of them.

I will give this in a separate post. In any case, the deadline for this will be between now and your return to campus for the second half of term 1.

4. Group Formation:

That reminds me, pls form teams of 4 people, and have one person (as team representative) email the Academic Associate for the class, Mr. Krishna Pusuluri, the names and ISB IDs of the team members, with a Cc to the other members of the team.

If for any reason, you are unable to form a team or find team members, pls let me know and I will assign you to a team.

5. In case you feel addressing me as 'sir' or 'prof' is too stuffy and formal, then pls feel free to call me 'Sudhir'. It's perfectly OK by me. That's it for today. See you in class, soon.

Sudhir

Tuesday, November 4, 2014

Hi Class

This is a welcome message to the CBA batch taking Data Collection (DC) in November 2014.

DC will use some R. This blog can be a repository for related R code and assistance. Feedback, Q&A etc are always welcome via the comments sections.

Pls download and install both R and RStudio from LMS, if you haven't already.

Looking forward to smooth sailing.

Sudhir Voleti
Assistant Professor of Marketing
ISB Hyderabad