This post collects some good practices and (optional) readings for DC. More such posts with readings may follow in the coming days.
Good practices:
1. If you haven't already, sign up for the daily email newsletter from r-bloggers (for creative uses of R code and info on new packages). Even though 80% of the posts in the daily email may not be of direct interest to me, the other 20% make it very worthwhile indeed.
For example, here's a post detailing job market trends globally in Data science jobs (with data and visualizations done in R).
2. Similarly, sign up for ProgrammableWeb (for its API directory and API-related news) as well.
3. www.kaggle.com hosts data science competitions and releases datasets and tutorials. You can sign in with your Facebook or Google login. We'll be using some Kaggle datasets in NLP. It is good practice to check what's new and hot on Kaggle from time to time.
4. Replicate *all* the classwork code line by line, especially if you're new to R. Look out for new functions that may come in (type ?function_name in the console to see a function's description), read the inline comment documentation carefully, etc.
Should you have trouble running any particular piece of code, search the web, ask peers, etc. The upcoming DC tutorial on 4 March, which will be conducted jointly by Sudha and Aashish Pandey, is another good opportunity to get clarifications.
Some (optional) readings:
1. This is the NYT article on 'Data Janitor' work we saw in Session 5.
2. This is the NYT article on 'the data driven life' about the quantified self movement.
3. The next couple of readings relate technology to data collection and data use (from the Economist): 'Getting to know you' is about the various ways in which data is collected about consumers online.
4. 'The world wild web' extrapolates some of what we are seeing into the future and asks 'Where are we going?'.
5. From India, here's an article from ET on how data brokers are syndicating and selling user data sans oversight: 'How data brokers are selling all your personal info for less than a rupee to whoever wants it'.
6. Recall the Xerox-Evolv caselet we did in class, the one where psychographic Likerts were combined with straightforward machine learning? We then discussed implications, pros and cons, etc., and speculated on when such practices may spread worldwide and into India. Well, speaking of India...
'Startups, and India Inc use psychometric tests to peek into potential recruits’ minds'.
Update: Some more readings of interest
7. Recall the 'Exponential learning curve' example in Session 5? Well, here's an article on that issue that I wrote a few months ago for the NASSCOM Sales and Marketing community.
8. Recall the GE example in 'Information Imperative' in Session 5? Here's the interview by McKinsey of GE's CEO Jeff Immelt on that topic from Oct 2015. (Might require you to register for free with McKinsey Quarterly.)
9. Recall what the GE example led to? A discussion on predictive analytics and maintenance on the human machine... Here's a timely piece from the Economist from 2 days ago on the data revolution in personal healthcare: 'A digital revolution in health care is speeding up'.
Well, that's it for now. Watch this space for more to come.
Sudhir