Wednesday, February 26, 2014

Interesting links and aRticles

Hi all,

Pls use this post and its comments section as a general purpose place to share interesting links on articles that could benefit the class as a whole regarding the topics we covered in the DC course.

For a start, I would strongly urge you to register with r-bloggers.com and get onto their daily email subscription service. You get daily email updates on new packages, interesting examples with code for doing neat stuff in R, etc.

For example, here's a good aRticle that I saw today:

Job Trends in the Analytics Market: Where does R stand?

Enjoy.

Sudhir

Sunday, February 23, 2014

Group Homework for Sessions 4 and 5 (Text and Social network analysis)

Hi all,

This is the HW for sessions 4 and 5. It is the final HW in this course. It will involve data collection, analysis and inference.

It will require you to go through and understand the R code I used in class. Pls *ensure* you can replicate the classwork examples and exercises before attempting the homework.

This is a group homework, so ensure you divide it amongst your group and co-ordinate. That way, too much burden won't fall on any individual.

If your group formation is not yet done, let Krishna Pusuluri know and he'll assign you to a group.

Any doubts, issues etc, let me know through the blog or via email.

### following Qs are for text analysis using R code from the class on your survey data ###

Q1a. One Q in your survey asks you to "List some brands (five or more) you are personally loyal to." Text analyze this component for the entire class by building a TDM (term-document matrix) and a wordcloud. Comment on which brands seem most popular and which categories they come from. Comment on why this may be the case.
Q1b. Now build a simple semantic network for the terms found above. Basically, which brands co-occur in documents? (See my R code for a function on how to build simple semantic networks from term-document matrices). Speculate on which brands seem to be preferred together by people.
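
For those who want to see the mechanics behind Q1a and Q1b, here is a minimal sketch of a term-document matrix and a document-level co-occurrence count. It is written in plain Python rather than the R code from class, and the three "documents" and brand names are made up for illustration; it is not the class function, only one simple way the idea could be implemented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical survey answers: each string is one respondent's brand list.
docs = [
    "apple nike sony",
    "apple samsung nike",
    "sony samsung",
]

# Vocabulary: every distinct term, sorted so the row order is stable.
terms = sorted({word for doc in docs for word in doc.split()})

# Term-document matrix: one row per term, one column per document,
# each cell holding how often the term appears in that document.
tdm = {term: [doc.split().count(term) for doc in docs] for term in terms}

# Co-occurrence: for each pair of terms, count the documents containing both.
# These counts are the edge weights of a simple semantic network.
cooccurrence = Counter()
for doc in docs:
    present = sorted(set(doc.split()))
    for pair in combinations(present, 2):
        cooccurrence[pair] += 1

for term in terms:
    print(term, tdm[term])
print(cooccurrence.most_common(3))
```

Pairs with the highest counts are the brands that tend to be named together by the same respondent.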

Q2. Text analyze the answers to the Q "List two places OUTSIDE India that you would like to visit. Explain / why in a few lines for each place." Build wordclouds under both TF and TFIDF. Comment on what can be inferred from the wordcloud.
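
To see why the TF and TFIDF wordclouds can look different, here is a small sketch in plain Python (not the R code from class). The sample answers are invented, and the weighting used, raw count times log(N / document frequency), is just one common TF-IDF variant; the class code may use a different one.

```python
import math

# Hypothetical answers, one per respondent.
docs = [
    "paris for the art and the food",
    "tokyo for the food and the culture",
    "new york for the skyline",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)


def tf(term, tokens):
    """Term frequency: raw count of the term in one document."""
    return tokens.count(term)


def idf(term):
    """Inverse document frequency: log(N / number of documents with the term).

    Only call this for terms that occur somewhere in the corpus.
    """
    df = sum(1 for tokens in tokenized if term in tokens)
    return math.log(n_docs / df)


def tfidf(term, tokens):
    """TF-IDF weight of a term in one document."""
    return tf(term, tokens) * idf(term)


first = tokenized[0]
for term in sorted(set(first)):
    print(term, tf(term, first), round(tfidf(term, first), 3))
```

A word like "the" is frequent under TF but appears in every document, so its TFIDF weight drops to zero, while a word unique to one answer keeps a positive weight.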

Q3a. Text analyze the responses to the Q: "What are your career goals in the short, medium and long terms? / Explain in a few lines." Build a wordcloud under both TF and TFIDF. Comment on what can be inferred from the wordcloud.
Q3b. Build a semantic network connecting the terms for this Q. Which terms occur together the most in documents? What can be inferred?

### following Qs are for web extraction of data from amazon ###

Q4. Collect 100-odd reviews from Amazon for the Xbox 360. Analyze the wordcloud. What themes seem to emerge from the wordcloud?

Q5. Analyze the positive wordcloud. What are the Xbox's seeming strengths? What can they position around?

Q6. Analyze the negative wordcloud. What are the Xbox's seeming weaknesses? What can they prioritize and fix?

Deadline is before the exam. Submission must be in the form of PPTs only. Write your group name, individual members' names and ISB IDs on the title slide and use your group name as the file name. A dropbox will be made for this.

Any Qs etc, contact me.

Sudhir

Friday, February 14, 2014

Group Homework for Session 2 (Build Survey Qs and Constructs based on R.O.s)

Hi all,

Pls find below the group HW based on Session 2.

Pls read *any 2* of the articles from the business press below. The following HW is based on these articles.

HW Part 1: Reducing a Business problem (B.P.) to a D.P. to an R.O.

For each article,

  • Q.1.1. Write a short description of what a B.P. may look like.
  • Q.1.2. Write one D.P. corresponding to the B.P.
  • Q.1.3. Write an example or two of R.O.s that correspond to the D.P.

HW Part 2: Construct Analysis

  • Q.2.1. List a few major constructs you find (if any) in each of the two articles that are of MKTR interest.
  • Q.2.2. Pick any one construct you have listed in Q.2.1. and break it down into a few aspects.
  • Q.2.3. Make a table with 2 columns. In the first column, write the names of the aspects you came up with. In the second column, corresponding to each aspect, write a Likert statement that you might use in a Survey Questionnaire to measure that aspect.

HW Part 3: Web-Survey Programming

  • Q.3.1. Build a web survey using any free online websurvey tool of your choice. Alternatively, you can try Qualtrics, the ISB-subscribed survey software.

Session 2 HW submission format:

  • Use a plain white blank PPT.
  • On the title slide, write your name and PGID.
  • For slide headers, use the format "HW Part 1: [Article name]" (and so on for the next article chosen).
  • Pls mention clearly the Question numbers you are solving in the slide body. Use fresh slides for each new article.
  • Use a blank slide to separate HW Part 2 from HW Part 1.
  • Provide a working link for your websurvey on a fresh slide titled "HW Part 3".
  • Save the slide deck as session2HW_yourname.ppt and put it in the dropbox on LMS before the deadline (start of the exam).

    That's it from me. Any Qs etc., let me know.

    Sudhir

    Group Homework for Session 1 (Problem Formulation)

    Hi all,

    This group assignment is for session 1. Pls split the work among the group and put it together again, if that is preferable. I strongly recommend you try using project-specific Wikis as a tool for co-ordination within your group. It's kind of like an editable google document and associated chat space for group members. Anyway, how you do it is up to you.

    Pls ensure group formation (of up to 5 people in a group; across-sections is also OK) is complete by 18th Feb, else we will randomly allot people into groups.

    Problem background:

    You are a consultant and your client, a multinational manufacturing behemoth, wants to know the trends and impact of disruption in manufacturing technologies in the next decade, with particular emphasis on 'additive manufacturing' (a.k.a. 3-dimensional printing) technologies.

    Your D.P. is to find "Which industries and product categories will shift earliest to (or be most affected by) 3D printing tech and around what time line?".

    An alternative D.P. says, "What are the most likely consumer uses of 3D printing and around what time line?"

    Choose any one of the two D.P.s, build corresponding R.O.s and write a 3 (or fewer) page report (Times New Roman 12 font, 1.5 line spacing, standard margins) outlining your principal findings in solving that R.O. through secondary research alone.

    Hint:

    Google for 'economist.com 3D printing' (without the inverted commas). Scan through the links that appear on the first page. I have posted a few examples below.

    How 3D printers work (7 Sept 2013)

    3D printing Out of the box (6 Aug 2013)

    3D printing scales up (7 Sept 2013)

    Inventing HP in 3D (28 Nov 2013)

    Pls ensure you have:

    • Written your name and ISB ID on the document
    • Clearly spelt out which D.P. you have chosen
    • Clearly spelt out your R.O.(s)
    • Clearly included citations of sources (URLs etc) either as footnotes or as a separate References section outside the page limit.

    The deadline is before the start of the exam. Pls submit electronically to a dropbox that Krishna will make on LMS for this purpose.

    Any queries etc., contact me.

    Thanks.

    Sudhir

    Session 5 Updates

    Hi all,

    Session 5 ended over a week ago. That I'm writing about it only today is testament to the immense research backlog I first had to clear before I could get here.

    Session 5 dealt with qualitative techniques in general and in marketing (i.e. on the demand side) in particular. We looked at both observation techniques and communication techniques (e.g. the surveys from earlier sessions).

    The unpredictability and persistence of habit patterns in consumer decision making is a big deal from the analytics point of view. The star reading - the target case - also made the case for a combination of qualitative marketing insights and analytics capabilities to make things happen.

    Some new techniques are coming up that promise even greater insights than current ones into sub-conscious habits and consumption triggers. The biggest among them is the fMRI machine. Below, I link two good articles for beginners that talk about fMRI use in Marketing in more detail.

    This Is Your Brain on Marketing Up close and personal with fMRI By Chip Bayers (Adweek)

    Functional Magnetic Resonance Imaging (fMRI): A New Research Tool

    I have nothing more to add beyond the 7 big take-aways from the course, which are at the end of the session 5 slides.

    The next few blog posts here will reflect group homework assignments. So watch this space.

    Sudhir

    Sunday, February 9, 2014

    Session 4 Updates

    Hi all,

    Session 4 got done y'day.

    We covered the basics - the why, how and what after of text analysis as a prelude to doing the 'web extraction of text data' piece which, technically, was the centerpiece from the DC course POV.

    1. In hindsight, I should've anticipated the issues that arose in trying to Live-run the R code on an untested machine, especially in Section A. My Research Assistant Ankit Anand usually does this stuff - package installation, dry testing of the code etc - before the session begins (in the MBA courses where I have covered this) and I got used to that. Ankit's not in town this week and the usual checklist simply escaped me. So, sorry about the hiccups in running the R code in class, basically.

    I'm still working on a version of the code that you can run without such trouble. Pls ensure you have the latest version of Java loaded on your machines before you start.

    ---------------------------------------

    2. I've received student queries about additional sources of material for study. Well, there are two ways about it. If you are presently working on a problem on R and encounter roadblocks, then the best thing is to simply google your query. Chances are sites like Stackoverflow will have answers for it. It usually works very well for me.

    On the other hand, if you are looking for a structured way to start, then there are any number of books you could consider getting and starting with. Below I list some which can help the rank beginner get started:

    A beginner's guide to R from Computerworld, a video introduction to R here from Google and here is a full fledged book from the Springer publishers' stable on how to get started in R.

    Better still is this list of links for books on R: Link for list of books and downloads for R. More advanced users, especially after you are introduced to supervised machine learning as part of the CBA program, may want to consider the following books (some of which are free downloads):

    Machine Learning with R, by Brett Lantz. The link takes you to the table of contents which you can browse and also through a sample chapter.

    This short document from MIT's open courseware on Machine learning is a useful repository of the very basic datasets, algorithms and packages a beginner can use to get started on the machine learning part of R analytics.

    ---------------------------------------

    3. Regarding text analytics in particular, here's a quick set of code that can get you started with the basic things we did with text analytics (in addition to the code I will send you).

    In any case, you are advised to subscribe to the r-bloggers.com daily newsletter for quick daily overviews of what's new and hot on R. Here is a link and expert commentary on text mining in R from R-bloggers.com, for instance.

    This is an example of Q&A at stackoverflow, which is among the pre-eminent sites for code level discussions on R (and other packages).

    ---------------------------------------

    4. Whew! That's it from me for now. There'll be homework for this session. It will involve extracting, storing and processing web-based text data, as well as processing text data from your class into semantic network analyses etc. But that's all for later.

    See you in class soon.

    Sudhir

    Saturday, February 8, 2014

    Session 3 Updates

    Hi all,

    Session 3 got done y'day.

    It was a 'dry and technical' session, admittedly, owing to the nature of subject matter.... but, it cannot be denied that we (both you and me) did try to enliven things a bit here and there, didn't we?

    1. We covered sampling basics - notational and definitional stuff included. Now, with sampling done, in theory, you are equipped enough to design a full-fledged, survey-based primary data collection exercise... only.

    -------------------------------------------------

    2. Significantly, y'day also marked the first use of R in DC. The data and code are put up as .txt files on LMS. You are encouraged to try replicating my classroom results at home. Of course, replications won't be perfect because of the very nature of 'random' sampling but that's OK.
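
A quick aside on why replications differ: each run draws a different random sample unless the random number generator is seeded. The sketch below shows this in plain Python (the class code is in R, where set.seed() plays the same role). The population, sample size and seed values are all made up for illustration.

```python
import random
import statistics

# Hypothetical population: the values 1 to 1000.
population = list(range(1, 1001))


def sample_mean(seed, n=50):
    """Draw a simple random sample of size n and return its mean."""
    rng = random.Random(seed)  # a private generator, so runs don't interfere
    return statistics.mean(rng.sample(population, n))


same_a = sample_mean(seed=42)
same_b = sample_mean(seed=42)
other = sample_mean(seed=7)

# The same seed reproduces the same sample; a different seed generally does not.
print(same_a, same_b, other)
print("population mean:", statistics.mean(population))
```

Both sample means will sit near the population mean without matching it exactly, which is what the imperfect replications look like.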

    -------------------------------------------------

    3. We also delved into the business experimentation space. This is a rapidly evolving space and one, I believe, that defines the frontiers in demand-side analytics.

    If you were to ask me which is a promising area within analytics to build skill-sets in, I'd promptly say 'Experimentation analytics'.

    -------------------------------------------------

    4. We studied 'traditional' experimental design in Session 3. Traditionally, experiments were used to measure the *average* treatment effect across the sample. This was particularly true in the natural sciences. However, increasingly in business, we find that the average is misleading and not good enough.

    In a whole host of modern businesses (both web based and brick-and-mortar), the treatment effect of interest is produced by exposing a micro-segment (in extreme cases, a segment of One) to a causal condition (in extreme cases, a product or service custom-defined for that micro segment) and measuring the outcome difference from the average for the market as a whole.

    It is this combination of product design and micro-segmentation that gives the new age business experimentation its edge and its own distinctive flavour. It forms the subject of the last reading in session 3.
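
To make the 'average can mislead' point concrete, here is a toy calculation in plain Python (not from the class R code). The segments, group labels and outcome numbers are entirely hypothetical, and the effect is computed as a simple difference in means between treatment and control.

```python
from statistics import mean

# Hypothetical experiment records: (segment, group, outcome).
records = [
    ("students", "treatment", 12.0), ("students", "control", 6.0),
    ("students", "treatment", 14.0), ("students", "control", 8.0),
    ("retirees", "treatment", 5.0), ("retirees", "control", 6.0),
    ("retirees", "treatment", 4.0), ("retirees", "control", 5.0),
]


def effect(rows):
    """Difference in mean outcome between treatment and control rows."""
    treated = [y for _, group, y in rows if group == "treatment"]
    control = [y for _, group, y in rows if group == "control"]
    return mean(treated) - mean(control)


# Average treatment effect across the whole sample.
average_effect = effect(records)

# The same effect computed separately within each segment.
segment_effects = {
    segment: effect([r for r in records if r[0] == segment])
    for segment in sorted({r[0] for r in records})
}

print("average effect:", average_effect)
print("by segment:", segment_effects)
```

In this made-up data the overall effect is positive, yet it is strongly positive for one segment and negative for the other, which is exactly what the average hides.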

    I believe I did not get to emphasise this point enough, particularly in Section A. I strongly encourage people to read the relevant 2007 HBR article on business experimentation provided in your course-pack.

    -------------------------------------------------

    5. By the way, this particular trend I refer to in point 4 above isn't necessarily restricted to the business sector either.... Politics is not immune. For instance, take this Washington Post article from June 2013 that gives a layman's introduction to how Sri Obama leveraged Big data and microtargeting techniques for his 2012 campaign.

    Here is a short 5-minute video that makes a similar point. And here is a longer article from the MIT Technology Review that goes into more detail. I have every reason to believe that at least some of these ideas will find or have found their way into India's coming MahaBhArat - the general elections of 2014....

    -------------------------------------------------

    6. One last bit about traditional experiments before I sign off. Here's an interesting article on a piece of academic research that aimed to test (using traditional 'True' experimental design) whether social networks make us smarter or dumber...

    Seeking to find out if social networks make us smarter a team of scientists investigated if networks help us imitate analytical thought processes from our peers.

    To carry out their experiment the researchers tested university students with a series of brain-straining questions. 100 volunteers were separated into 5 social networks each with 20 individuals. Connections between the people in the networks were assigned randomly by a computer to fit 5 different network patterns. At one extreme all the people in the network were connected directly to all the others, and at the other extreme there were no connections at all. To test how these networks helped the people in them to learn, the scientists quizzed the volunteers with a 'cognitive reflection test', a series of questions which rely on analytical reasoning to overcome incorrect intuition.

    To see if the social networks helped the people in them to improve their answers the volunteers were asked each of the questions 5 times. The first time the volunteers had to figure it out on their own, the next 5 times they were allowed to copy the answer from their neighbours in the network. The researchers found that in well connected networks ...

    OK, so what did they find? Well, to find out I suggest you read the entire article only....

    -------------------------------------------------

    OK, that's it from me for now. See you in class today for an R joy ride into the Text analytics skies...

    Sudhir

    Friday, February 7, 2014

    Session 2 Updates

    Hi all,

    Session 2 got done y'day. We covered a serious lot of ground, even though it may not seem so at first glance.

    Doing basic survey design principles, construct basics *and* questionnaire design all in one go in 110 minutes... each of those topics would merit at least a whole session all by itself ideally...

    Update:

    1. There will be a homework assignment for this class. As you can guess it will involve your designing a survey around a construct of interest and programming it into any web survey software. This will be a group assignment. I will put up details later after these 5 rush-days of teaching are done...

    2. Recall that chilling discussion on Reading 1 we had in class... well, here's what I got in my inbox y'day (I subscribe to the free Knowledge@Wharton newsletter):

    The 'Moneyball' approach to hiring CEOs

    Co-incidence? Maybe. Or maybe it's a sign... Anyway, humour apart, here's the gist of the article:

    It was the lesson of the best-selling book-turned-movie, Moneyball: Don’t throw money at big-name baseball players or judge future performance by purely physical attributes. Assess them, instead, by more relevant measurements, like their on-base percentage.

    Wharton professor J. Scott Armstrong and Philippe Jacquart of EMLYON Business School in Écully, France, say the same principles can be applied to choosing corporate executives. In a recent paper, they challenge the popular belief that higher pay leads to selecting chief executive officers who will outperform their lower-compensated counterparts.

    [...]Instead of throwing money at “superstars,” companies would be better served by using quantifiable measures to pick the right CEO, according to recent Wharton research.

    Well, that should go some distance in answering whether Moneyball principles could be applied to hiring for more managerial positions. But reassuringly, the hires will all be human only. Machines still cannot hope to do a CEO's job. Yet.

    3. Another article from a recent Economist issue and (ominously?) titled "The future of jobs" has this to say:

    A new wave of technological progress may dramatically accelerate this automation of brain-work. Evidence is mounting that rapid technological progress, which accounted for the long era of rapid productivity growth from the 19th century to the 1970s, is back. The sort of advances that allow people to put in their pocket a computer that is not only more powerful than any in the world 20 years ago, but also has far better software and far greater access to useful data, as well as to other people and machines, have implications for all sorts of work.

    [...] Ten years ago technologically minded economists pointed to driving cars in traffic as the sort of human accomplishment that computers were highly unlikely to master. Now Google cars are rolling round California driver-free no one doubts such mastery is possible, though the speed at which fully self-driving cars will come to market remains hard to guess.

    Even after computers beat grandmasters at chess (once thought highly unlikely), nobody thought they could take on people at free-form games played in natural language. Then Watson, a pattern-recognising supercomputer developed by IBM, bested the best human competitors in America’s popular and syntactically tricksy general-knowledge quiz show “Jeopardy!” Versions of Watson are being marketed to firms across a range of industries to help with all sorts of pattern-recognition problems. Its acumen will grow, and its costs fall, as firms learn to harness its abilities.

    The machines are not just cleverer, they also have access to far more data. The combination of big data and smart machines will take over some occupations wholesale; in others it will allow firms to do more with fewer workers. Text-mining programs will displace professional jobs in legal services. Biopsies will be analysed more efficiently by image-processing software than lab technicians. Accountants may follow travel agents and tellers into the unemployment line as tax software improves. Machines are already turning basic sports results and financial data into good-enough news stories.

    Jobs that are not easily automated may still be transformed. New data-processing technology could break “cognitive” jobs down into smaller and smaller tasks.

    Well, tech 'progress' cannot be stopped I guess. But it's the distribution of reward that had the Economist (and consequently me too) all worried. Only. See below.

    4. How do the economic spoils get split up in the coming years? Who gets what share of the prosperity pie? And why?

    Yet some now fear that a new era of automation enabled by ever more powerful and capable computers could work out differently. They start from the observation that, across the rich world, all is far from well in the world of work. The essence of what they see as a work crisis is that in rich countries the wages of the typical worker, adjusted for cost of living, are stagnant. In America the real wage has hardly budged over the past four decades. Even in places like Britain and Germany, where employment is touching new highs, wages have been flat for a decade. Recent research suggests that this is because substituting capital for labour through automation is increasingly attractive; as a result owners of capital have captured ever more of the world’s income since the 1980s, while the share going to labour has fallen.

    At the same time, even in relatively egalitarian places like Sweden, inequality among the employed has risen sharply, with the share going to the highest earners soaring.

    So who might be the winners and losers in what is surely coming? Here's a clue:

    There will still be jobs. Even Mr Frey and Mr Osborne, whose research speaks of 47% of job categories being open to automation within two decades, accept that some jobs—especially those currently associated with high levels of education and high wages—will survive (see table). Tyler Cowen, an economist at George Mason University and a much-read blogger, writes in his most recent book, “Average is Over”, that rich economies seem to be bifurcating into a small group of workers with skills highly complementary with machine intelligence, for whom he has high hopes, and the rest, for whom not so much.

    5. The good news? The future for bright Business analytics people who combine non-standardized inputs (such as those from exploratory and/or qualitative work on the demand side) with machine intelligence is bright. When all is done, chances are you will belong to that select group. Change is coming whether we like it or not. The best we can do is to be better prepared. And that we are already doing...

    OK, this part stretched longer than I intended. Will update and complete this post (or maybe have a second post to continue).

    For now, I'll sign off. See you in class today for session 3 (Sampling and Experimentation basics).

    Sudhir

    Thursday, February 6, 2014

    Session 1 Updates

    Hi all,

    Session 1 got done yesterday. Was heartening to see a healthy dose of engagement and response from the class.

    This blog post is regarding the following points:

    1. Surveys for data collection *today*:

    Below are the links to two short online surveys, which shouldn't take more than 15-20 minutes by my reckoning.

    Social Network data collection for Section A students

    Social network data collection for section B students

    Text Data inputs for students from both sections

    I need you to fill out these surveys today itself, by 8 am tomorrow at the latest. This gives me enough time to collect the data and use it in class for sessions 4 (text analytics) and 5 (social network measurement and analysis).

    By the way, completing the above 2 surveys has grade credit and will be considered as part of your class participation (CP) grade.

    2. Links for some topics we covered in Session 1:

    Here are some links for some of the concepts we studied in session 1. These are optional readings and you may cover them at leisure. They're basically to help understanding for those folks who may have felt the coverage in class was not detailed enough on certain topics.

    This is a Wikipedia link to Quantitative psychology as a subject area. It provides a nice, concise and precise introduction to the area in general and has a good number of downstream links that you can pick up on as and when necessary.

    This is the Wiki entry to Scaling techniques in general in the social sciences. As you can see the comparative versus noncomparative dichotomy comes in early on here. More links to detailed topics are also available.

    This is the wiki entry to psychometrics as a discipline. I thought it a tad too inclined towards educational testing but still, worth a read perhaps, for those interested.

    Recall that in reading 1 in session 1 we could vaguely discern elements of segmentation analysis (second paragraph) as well as elements of affinity analysis (last paragraph). A conceptual introduction to these terms can be found online as well - for instance, here for market segmentation, here for cluster analysis and here for affinity analysis in retail analytics.

    And of course, there's always google available to produce reports and summaries at varying levels of detail on any subject under the sun.

    3. Homework Group assignment for Session 1:

    There will be a homework assignment (group submission) for session 1. The idea is to give you a few business articles to read and then formulate the business problem (D.P. and R.O.) for each of them.

    I will give this in a separate post. In any case, the deadline for this will be between now and your return to campus for the second half of term 1.

    4. Group Formation:

    That reminds me, pls form teams of 4 people, and have one person (as team representative) email the Academic Associate for the class, Mr. Krishna Pusuluri, the names and ISB IDs of the team members with a Cc to the other members of the team.

    If for any reason, you are unable to form a team or find team members, pls let me know and I will assign you to a team.

    5. In case you feel addressing me as 'sir' or 'prof' is too stuffy and formal, pls feel free to call me 'Sudhir'. It's perfectly OK by me. That's it for today. See you in class soon.

    Sudhir

    Monday, February 3, 2014

    Hi

    This is a welcome message to the CBA batch joining in 2014.

    The Data collection (DC) course will use some R. This blog can be a repository for related R code and assistance. Feedback, Q&A etc are always welcome via the comments sections.

    Looking forward to smooth sailing.

    Sudhir Voleti
    Assistant Professor of Marketing
    ISB Hyderabad