One Shot Learning #9: Let's discuss data products for a bit.


                            July 30, 2019

Hi folks! Hope everyone has had a lovely summer so far. Let's get back to it.
Recapping where we left off: 

At work these days I am especially engaged in the 90% of typical data science work that does not involve applied science or analytics. Someone asked me about this recently, roughly as follows:
“So you’re just writing software and managing an integration?”
I try not to fall prey to the “affliction” Theo described earlier. Careers are long, and I’m lucky my past experience has positioned me to try and tackle some valuable non-data work right now. I do not know if that’s ideal, but at least it’s useful. 

I'll level with you: it has been difficult to write this newsletter in recent weeks. I struggle to write about a topic when my day-to-day activities take me away from that topic. My daily work at Devoted involves zero machine learning, and this newsletter is ostensibly about machine learning in industry! You hate to see it.
Our team does not want to let our dreams be dreams, though - in recent weeks we've discussed how our data team may deliver data products which extend beyond a reporting platform or a PowerPoint deck. Business intelligence and machine learning are dramatically different fields. Adopting this new competency will require different toolsets and perspectives.
This is a newsletter and not a GitHub repository, so tools are less appropriate here than some perspective. Over the next few weeks, I'll try to outline my take on data products. I'll draw from examples shared across some of our most successful tech stories, spanning music, fashion, social networks, and video. I'll note when I lean on my own experience building data products from recommender systems in music and education.
Today, let's answer a key question: what the hell are data products? Let's start by quoting a lot of people.
First, a definition from Jeremy Stanley, the former VP of Data Science at Instacart:

Data products use data science and engineering to improve product performance, typically in the form of better search results, recommendations and automated decisions. [...] Data products can create value and delight users through improved optimization, relevance, etc. 

DJ Patil, CTO at Devoted Health (disclosure: the folks who sign my checks), tells us more in a throwback O'Reilly article:

To start, for me, a good definition of a data product is a product that facilitates an end goal through the use of data. [...] The key is to start simple and stay simple for as long as possible. Ideas for data products tend to start simple and become complex; if they start complex, they become impossible.

Emily Glassberg Sands, head of data science at Coursera, describes the product development process for data products:

The lifecycle of a so-called “data product” mirrors standard product development: identifying the opportunity to solve a core user need, building an initial version, and then evaluating its impact and iterating. But the data component adds an extra layer of complexity. 

In his article on data products, Simon O'Regan helpfully notes Google Images, Spotify's Discover Weekly, and Netflix's recommendation system as examples of data products. On the last two, he notes that these systems, along with self-driving cars and automated drones, "outsource all of the intelligence within a given domain."
There are a few qualities shared across all of these descriptions and examples.
First, data products are applied to domain-specific problems that cannot be manually solved in any scalable way. Curating millions of playlists would require a small army of human editors. At the extreme, the best recommender system is your friend or partner: someone who completely understands your tastes and, in Vicki's case, absolutely does not work at Netflix.
Netflix could probably skip the collaborative filtering and just show me 5 rows of Frasier episodes and it would work about the same from my perspective.
— Vicki Boykis (@vboykis) June 22, 2019

Data size is not the only factor here! We should consider scale and complexity. In the case of Coursera, Emily writes:

Coursera's Skills Graph is one example. A series of algorithms that map a robust library of skills to content, careers, and learners, the graph powers a range of discovery-related applications on the site.

At Knewton we dynamically recommended course content to users based on several dimensions of their academic experience. We hired some fantastic educators to help us represent course content programmatically. This gets you through the early days, but scaling quality representations to hundreds of textbooks (and thousands or tens of thousands of other pieces of academic content) is unrealistic.
Therefore, our data products address these problems by approximating domain-specific expertise using data and concepts from machine learning and statistics. In his NYU class and excellent textbook, Mehryar Mohri describes on-line learning using the metaphor of "experts" training algorithms using "advice":

In this setting, at the t-th round, in addition to receiving data X, the algorithm also receives advice y_i from i = 1, ..., N experts. Following the general framework of online algorithms, it then makes a prediction, receives the true label, and incurs a loss. After T rounds, the algorithm has incurred a cumulative loss. The objective in this setting is to minimize the regret which compares the cumulative loss of the algorithm to that of the best expert in hindsight after T rounds.

If you permit me the thinly veiled excuse to talk about machine learning theory, I love this description. Imagine that the best expert here is not a weak learner, but someone who knows you best. In lieu of that person, we try to train expert models to solve problems for us at scale.
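For the curious, here is a minimal sketch of that setting in Python. It uses the exponentially weighted average forecaster, one standard algorithm for learning with expert advice; I am not claiming it is the exact algorithm the quote has in mind. The function name `hedge`, the learning rate `eta`, the absolute loss, and the toy advice and labels are all assumptions made up for illustration.

```python
import math


def hedge(expert_advice, labels, eta=0.5):
    """Exponentially weighted average forecaster (illustrative sketch).

    expert_advice[t][i] is expert i's prediction in [0, 1] at round t.
    labels[t] is the true label in [0, 1] revealed after predicting.
    Returns (cumulative loss of the algorithm, cumulative loss per expert).
    """
    n_experts = len(expert_advice[0])
    weights = [1.0] * n_experts          # start by trusting every expert equally
    algorithm_loss = 0.0
    expert_losses = [0.0] * n_experts

    for advice, label in zip(expert_advice, labels):
        # Predict with the weighted average of the experts' advice.
        total = sum(weights)
        prediction = sum(w * a for w, a in zip(weights, advice)) / total

        # Receive the true label and incur a loss (absolute loss, an assumption).
        algorithm_loss += abs(prediction - label)

        # Shrink the weight of each expert in proportion to its own loss.
        for i, a in enumerate(advice):
            loss = abs(a - label)
            expert_losses[i] += loss
            weights[i] *= math.exp(-eta * loss)

    return algorithm_loss, expert_losses


# Hypothetical toy data: three experts, twenty rounds. Expert 0 happens to be
# right every round, so it is the "best expert in hindsight".
expert_advice = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]] * 5
labels = [1, 0, 1, 1] * 5

algorithm_loss, expert_losses = hedge(expert_advice, labels)
regret = algorithm_loss - min(expert_losses)
print(f"algorithm loss: {algorithm_loss:.3f}, regret: {regret:.3f}")
```

The algorithm never learns who the best expert is ahead of time. It just keeps shifting weight toward whoever has been wrong least often, which is enough to keep the regret small relative to the number of rounds.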
At iHeartRadio we could not afford to hire a DJ for each of our millions of users, but we could record user activity, estimate preferences from that activity, and then develop a recommender system to build tens of thousands of artist stations on a regular basis. Same goes for educators at Knewton. Heuristics can take you pretty far, but iterating them in response to user feedback is very expensive. Instead, we leverage algorithms to learn some expertise which we can then apply to hard problems.
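To make "record activity, estimate preferences, build stations" a little more concrete, here is a toy sketch in Python. It is emphatically not the iHeartRadio system. It assumes play counts are the only signal, uses log-scaled counts as a stand-in for preference, and ranks artists by cosine similarity over shared listeners. The function names, the listeners, and the play counts are all hypothetical.

```python
import math
from collections import defaultdict


def artist_similarities(plays):
    """Cosine similarity between artists, based on who listens to them.

    plays maps user -> {artist: play count}. Counts are log-scaled so one
    heavy listener does not dominate (an assumption, not a recommendation).
    """
    # Represent each artist as a sparse vector over users.
    vectors = defaultdict(dict)
    for user, counts in plays.items():
        for artist, n in counts.items():
            vectors[artist][user] = math.log1p(n)

    norms = {
        artist: math.sqrt(sum(v * v for v in vec.values()))
        for artist, vec in vectors.items()
    }

    sims = defaultdict(dict)
    for a, vec_a in vectors.items():
        for b, vec_b in vectors.items():
            if a == b:
                continue
            # Dot product over the listeners the two artists share.
            dot = sum(w * vec_b[u] for u, w in vec_a.items() if u in vec_b)
            if dot > 0:
                sims[a][b] = dot / (norms[a] * norms[b])
    return sims


def build_station(seed_artist, sims, size=3):
    """Return the `size` artists most similar to the seed artist."""
    candidates = sims.get(seed_artist, {})
    # Sort by similarity (descending), breaking ties by name for stability.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked[:size]]


# Hypothetical listening activity for four users.
plays = {
    "u1": {"Miles Davis": 12, "John Coltrane": 8},
    "u2": {"Miles Davis": 5, "John Coltrane": 9, "Bill Evans": 2},
    "u3": {"Dolly Parton": 7, "Willie Nelson": 4},
    "u4": {"Dolly Parton": 3, "Willie Nelson": 6, "Bill Evans": 1},
}

sims = artist_similarities(plays)
station = build_station("Miles Davis", sims, size=2)
print(station)
```

Nobody hand-labeled which artists belong together here. The "expertise" falls out of the activity data, which is the whole point. The pairwise loop is quadratic in the number of artists, so anything beyond a toy would need a smarter approach.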
Finally, a data product relies on an engineering system to deliver this approximated expertise at scale. Leaving this note out is an act of negligence. Jeremy Stanley recently said:
"ML engineers spend less than X% of time on machine learning" is false

cleaning data?
infrastructure setup?
debating objective?
debugging pipeline?
monitoring results?
engineering integration?
persuading others?

Critical tasks that often should be done by the algorithm author
— Jeremy Stanley (@jeremystan) July 22, 2019

Whether these tasks can be classified as "machine learning" is another debate, but they are undeniable components of a successful data product. An RMarkdown file cannot deliver the goods in isolation. Addressing complex problems at scale means doing a thing at scale, a task for which engineering systems are well-suited. You could develop systems responsible for scaling those RMarkdown files. Netflix has done this with Jupyter! You would thus give your data people the right tools!! But your data product does not ship when you've constructed the payload - you need to deliver it too.
So what do you say, Reader, to this definition? Data products rely on concepts from software engineering, machine learning and statistics to apply domain-specific expertise to otherwise intractable problems. This seems like a good start! Next time, let's think about how to deliver valuable data products.
