Sunday, October 20, 2019

Reg Text-An - quanteda, DFMs, LDA and Naive Bayes

This is a nice article on quanteda use in R.

Yup, yet another R text-an package. I go a long way back with these, starting with tm some 6 yrs ago. Have settled on tidytext for all BOW (bag-of-words) work nowadays.

Anyway, the interesting pieces of this article are [1] its dataset (UN speeches etc.) and [2] its references to papers that show how to do unsupervised learning (LDA) as well as supervised learning (classification) on text.
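For the supervised side, here is a minimal sketch of what "build a document-feature matrix, then fit Naive Bayes" boils down to, in plain Python rather than R/quanteda. It assumes whitespace tokenization, a multinomial model with add-one (Laplace) smoothing, and a made-up four-document corpus standing in for the UN speeches. The function names are hypothetical and not quanteda's API.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace: a stand-in for a real tokenizer."""
    return text.lower().split()


def build_dfm(docs):
    """Document-feature matrix as one Counter of term counts per document, plus the sorted vocabulary."""
    rows = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*rows))
    return vocab, rows


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab, rows = build_dfm(docs)
    classes = sorted(set(labels))
    log_prior, log_lik = {}, {}
    for c in classes:
        class_rows = [row for row, y in zip(rows, labels) if y == c]
        log_prior[c] = math.log(len(class_rows) / len(rows))
        totals = Counter()
        for row in class_rows:
            totals.update(row)  # pooled term counts for class c
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((totals[w] + alpha) / denom) for w in vocab}
    return classes, log_prior, log_lik


def predict_nb(model, text):
    """Return the class with the highest log posterior; words not in the vocabulary are skipped."""
    classes, log_prior, log_lik = model
    tokens = tokenize(text)
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in tokens if t in log_lik[c])
        for c in classes
    }
    return max(scores, key=scores.get)


# Made-up toy corpus (not the UN speeches dataset).
docs = [
    "peace security council resolution",
    "climate emissions warming planet",
    "security council sanctions peace",
    "climate adaptation planet finance",
]
labels = ["security", "climate", "security", "climate"]
model = train_nb(docs, labels)
print(predict_nb(model, "council peace talks"))  # security
print(predict_nb(model, "planet warming"))  # climate
```

Working in log space keeps the product of many small word probabilities from underflowing, and the smoothing stops a single unseen class-word pair from zeroing out a whole class.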

Will write more later, this is it for now.

Sunday, October 13, 2019

Association Rules in R

Here's a nice post on the use of association rules for market basket analysis (MBA) in R.

Yup, as suspected, it does use R's well-ish-known arules package and the datasets in that package.
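What arules reports for each rule boils down to three ratios: support, confidence and lift. A minimal Python sketch, assuming toy baskets and only one-item-to-one-item rules counted by brute force (arules' apriori() handles larger itemsets far more efficiently); `pair_rules` and its default thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.5):
    """Brute-force one-item => one-item rules with support, confidence and lift."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # count each item once per basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence relative to the base rate of rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["beer", "chips", "milk"],
]
for lhs, rhs, sup, conf, lift in pair_rules(baskets):
    print(f"{lhs} => {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Lift above 1 means the pair co-occurs more often than independence would predict; here beer and chips come out on top with a lift of 2.5.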

I have to wonder if there are any other datasets out there available for in-class demos.

The problem with MBA, typically, is that at the SKU level there are way too many items to handle, while at the category level it just isn't as interesting.

Some sorta hierarchical setup might help, eh?
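One possible hierarchical setup, sketched in Python under my own assumptions: keep the few most frequent SKUs at SKU level and roll every other SKU up to its category, then feed the rolled-up baskets to the usual rule miner. The SKU codes, the category map and the `keep_top` cutoff are all hypothetical.

```python
from collections import Counter


def roll_up(baskets, sku_to_category, keep_top=1):
    """Keep the keep_top most frequent SKUs as-is and replace every other SKU by its category."""
    # dict.fromkeys de-duplicates a basket while keeping order, so ties in most_common are deterministic
    sku_counts = Counter(sku for basket in baskets for sku in dict.fromkeys(basket))
    keep = {sku for sku, _ in sku_counts.most_common(keep_top)}
    return [
        sorted({sku if sku in keep else sku_to_category[sku] for sku in basket})
        for basket in baskets
    ]


# Hypothetical SKU codes and category map.
sku_to_category = {
    "milk_A_500ml": "dairy",
    "milk_B_1l": "dairy",
    "butter_A_100g": "dairy",
    "bread_A_400g": "bakery",
    "bread_B_400g": "bakery",
}
baskets = [
    ["milk_A_500ml", "bread_A_400g"],
    ["milk_A_500ml", "butter_A_100g"],
    ["milk_B_1l", "bread_B_400g"],
    ["milk_A_500ml", "bread_B_400g"],
]
for basket in roll_up(baskets, sku_to_category, keep_top=1):
    print(basket)
```

Here five SKUs collapse to three items (bakery, dairy, and the one SKU frequent enough to keep its own identity), which is the trade-off between SKU-level detail and category-level tractability in miniature.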

Wednesday, October 9, 2019

Restarting An-Yogi.


Or testing the waters, more like it.

This blog closed after an unfortunate run-in with Batch 8, CBA. I had then decided to stop using this blog for CBA course-related work.

But nothing stops me from restarting it to keep track of my own work. Which is what follows.

I read many posts online on tech and code. This blog seems like a good place to keep track of some of the more interesting or useful ones.

+++++++++++

First, here's a nice post on how to use LASSO (Least Absolute Shrinkage and Selection Operator) to do [political] micro-targeting and online ad retargeting, using FB 'Likes' and preference data on the one side and the Big-5 personality framework on the other.

Let me quote from the blog itself:

Basically, Microtargeting is the prediction of psychological profiles on the basis of social media activity and using that knowledge to address different personality types with customized ads. Microtargeting is not only used in the political arena but of course also in Marketing and Customer Relationship Management (CRM).

So, what's *LASSO*? An extension of classical linear regression wherein, given many IVs on the RHS, we fish for the 'best'/'most important' ones by *shrinking* all coefficients toward zero, with the less useful ones set exactly to zero, via an L1-penalized (constrained) optimization routine neatly packaged and delivered via the **glmnet** R package.
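To see the shrink-and-select step concretely, here is a minimal pure-Python sketch of LASSO by cyclic coordinate descent. It is neither the post's code nor glmnet's, though it minimizes the same form of objective, (1/2n)·RSS + λ·Σ|β_j|. It assumes a centred y and standardized X (so no intercept), and the two binary 'Likes' and the personality score are made-up toy data.

```python
def standardize(X):
    """Centre each column and scale it so that (1/n) * sum(x^2) = 1 (assumes no constant columns)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0): shrinks z toward zero, and to exactly zero if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes X is standardized as above and y is centred, so no intercept is fitted
    and the coefficients are on the standardized scale.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j's own fit
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync with beta
                beta[j] = new_bj
    return beta


# Made-up toy data: two binary "Likes" per user and a personality score.
likes = [[1, 1], [1, 0], [0, 1], [0, 0]]
score = [5.1, 4.9, 1.1, 0.9]
X = standardize(likes)
y_mean = sum(score) / len(score)
y = [v - y_mean for v in score]
print(lasso_cd(X, y, lam=0.5))  # first coefficient close to 1.5, second exactly 0.0
```

Raising `lam` pushes more coefficients to exactly zero, and at `lam=0` the update reduces to ordinary least squares. glmnet adds the practical parts on top of the same idea: an intercept, a whole path of λ values, and cross-validation via cv.glmnet.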

Another advantage of this post: it provides dummy data to test the code on, as well as decent interpretations of the shrunk yet non-zero beta coefficients in the output.

Even better, since it's all pure R, *shiny-fying* it for later use in a PGP classroom is easier.

Note: Machine Learning Basics for Marketers (MLBM) is on offer for the first time in Term 7 this acad year.

+++++++++++

Well, will close here for now. More posts will hopefully follow and this attempt won't fizzle out yet again.

Sudhir Voleti, Oct 2019.