8 Lessons of An Unconventional Machine Learning Engineer
I worked for more than a year and a half on Music Search and Discovery at Wynk Music. This was my first real-life encounter with Machine Learning, and I taught myself along the way, wrestling with these rigorous mathematical giants. I will tell you this: Machine Learning is not easy to learn by yourself alongside a full-time job. Wynk Music is a startup and the largest music streaming company in India (Seriously! Check it out: Wynk Music. And do explore music using the search bar at the top, a product I loved building during my stint.). This post is about the unconventional way in which I learnt Machine Learning. If you find yourself doing it this way, no problem, but go do it the right way. The right way is just the opposite of my way. Ha! I will dwell on my experience; it may or may not resonate with you, but I hope it helps a general audience. Here are the 8 lessons I learned from it.
Lesson 1: What’s machine learning?
My experience was more do-and-learn than learn-and-do. Do-and-learn helps in most real-life scenarios, but it does not help when you are creating a production-level machine learning product under the stringent timelines of a startup. My first encounter was with Linear Regression. And I am embarrassed to say that at the time I thought, “Wow! It’s so simple. I can learn Machine Learning. Just fit a line! It is not much different from the curve fitting I did back in undergrad, with an optimization problem solved on top of it.” Only a couple of weeks later did I realize that Linear Regression is a large onion that you peel and peel without ever reaching the end. Many months later I realized what is “linear” in linear regression: the model is linear in its coefficients, not in the data. (Surprise! Machine Learning is an ocean with pearls on its deepest bed. To get them, you need to learn how to dive.)
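To make the “linear in the coefficients” point concrete, here is a minimal sketch, not anything I used at Wynk. It fits a parabola with ordinary least squares in NumPy. The function name `fit_quadratic` and the toy data are made up for illustration. The features 1, x and x² are nonlinear in the data, yet the fit is still a linear regression because the prediction is a linear combination of the coefficients.

```python
import numpy as np

def fit_quadratic(x, y):
    """Fit y ~ w0 + w1*x + w2*x**2 by ordinary least squares."""
    # Design matrix with columns 1, x, x^2. The features are nonlinear in x,
    # but the prediction X @ w is linear in the coefficients w.
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Toy, noise-free data lying exactly on a parabola (illustrative only).
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

w = fit_quadratic(x, y)  # recovers the three coefficients of the parabola
```

The same trick works for any fixed set of features (logs, products, indicator columns): as long as the unknowns enter linearly, it is still “linear” regression.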
Lesson 2: What technologies and tools?
I had read about linear regression on some online blogs and looked at some GitHub projects. But to implement it myself, I first needed to know certain technologies. Like Python. And NumPy. And Pandas. And Matplotlib. And on and on. I went on to learn good old vectorized implementations in Python first, before any real machine learning. You may also need some Matrix Calculus to understand them better. Fortunately, I was an Electrical Engineering grad and therefore well equipped with the mathematics.
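As an illustration of what a “vectorized implementation” looks like, here is a minimal sketch of batch gradient descent for linear regression in NumPy. It is not the code I wrote back then; the function name, learning rate, step count and toy data are all assumptions chosen for the example. The matrix calculus shows up in one line: the gradient of the mean squared error with respect to `w` is `(2/n) * X.T @ (X @ w - y)`.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=2000):
    """Minimize mean squared error for y ~ X @ w with batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(steps):
        residual = X @ w - y                         # shape (n_samples,)
        grad = (2.0 / n_samples) * (X.T @ residual)  # gradient of the MSE
        w -= lr * grad                               # one step for all weights
    return w

# Toy data: an intercept column plus one feature, with known weights.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(-1.0, 1.0, size=200)])
true_w = np.array([3.0, -2.0])
y = X @ true_w

w = gradient_descent(X, y)
```

There is no Python loop over samples or features; every update is a pair of matrix products, which is what makes this style fast in NumPy.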
Lesson 3: Where is my data?
Cool! So I knew how to implement Linear Regression in Python. Time to work on real data. To do machine learning in real life, you first have to sanitize a humongous amount of unstructured data. So I went to get data for training from the music logs stored in AWS S3. “Alright! Alright! Alright!” The whole 3 TB mammoth was spread across many buckets and directories in the cloud. So you have to do machine learning in the cloud, not on your laptop. Even worse (or not, depending on your view), real-life data does not fit into your laptop’s RAM. Hence, you first have to learn MapReduce. And distributed systems. And Hadoop. And S3. And Apache Spark. There is no end to this list. However, Spark helped me out here. It integrates well with an EMR cluster in AWS that talks to S3 and HDFS. Moreover, it has good SQL-like DataFrame APIs. Not just that, it has efficient implementations of some basic machine learning algorithms. Well, it served my purpose. It had Linear Regression! I had my data and my algorithm and, soon enough, a machine learning model. (Not to forget the months spent cleaning and extracting data.)
Lesson 4: How do I evaluate my model?
For Wynk Music I was trying to solve the relevance re-ranking problem in Music Search. Even if I have a Machine Learning model, how do I validate and test it? I needed to learn about evaluation metrics. All I could find in numerous blogs and in the documentation of scikit-learn and spark.ml/mllib were classical metrics like Root Mean Square Error (RMSE). RMSE is simple and straightforward, and it is not a bad starting point. But how do I evaluate relevance ranking? This involved getting help from domain experts. From my experience, I will tell you that even if you choose a state-of-the-art machine learning model for your task, a domain expert will break it very easily. A domain expert is required to evaluate your model. Not just that, a domain expert is required to build your model efficiently, iteration after iteration. RMSE is good for training your model as an optimization problem, not for testing it in a real-life production scenario. (We did come up with a modified version of NDCG for testing relevance re-ranking in search, and used the consumption score of the search as an evaluation metric.) Learning about evaluation methods and metrics turned out to be another month of work.
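To show why RMSE alone can mislead for ranking, here is a small sketch in plain Python. It uses the textbook NDCG with linear gain, not the modified version we built at Wynk, and the relevance labels and scores are invented for the example. The second model has the lower RMSE, yet it swaps the two best results, which NDCG penalizes.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted scores and true labels."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def dcg(relevances, k):
    """Discounted cumulative gain of the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

def ranked_relevances(scores, relevances):
    """True relevance labels in the order the model would rank the items."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [relevances[i] for i in order]

# Made-up example: three results with graded relevance 3, 2, 1.
true_rel = [3, 2, 1]
scores_a = [3.5, 2.5, 1.5]  # off by 0.5 everywhere, but the order is right
scores_b = [2.4, 2.6, 1.0]  # closer on average, but the top two are swapped

rmse_a, rmse_b = rmse(scores_a, true_rel), rmse(scores_b, true_rel)
ndcg_a = ndcg(ranked_relevances(scores_a, true_rel), k=3)
ndcg_b = ndcg(ranked_relevances(scores_b, true_rel), k=3)
```

Here `rmse_b` is smaller than `rmse_a` while `ndcg_b` is smaller than `ndcg_a`: the model that looks better as a regressor is the worse ranker.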
Lesson 5: How do I use the model in Production?
I have a model. I have learnt how to create it. Mine the data for it. Wrangle the data to feed it. Evaluate it. How do I push it to production? Our production setup had Apache Solr serving Music Search. How do I plug the model into Solr? Well, Bloomberg had done tremendously good work with Solr and released the LTR plugin. This is a learning-to-rank plugin that supports a linear model or a regression-tree model such as LambdaMART (used in Bing Search). Using this plugin we got the model serving in production. All is good. (Check my previous blogs on setting up Solr: 1, 2, 3.)
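For a rough idea of what handing a linear model to the Solr LTR plugin involves, here is a sketch that builds the JSON model definition from a dictionary of learned weights. The class name `org.apache.solr.ltr.model.LinearModel` and the `features`/`params.weights` layout follow the Solr Learning to Rank documentation as I understand it; the model name, feature names and weights are hypothetical, not our production values, and the features must already be defined in the Solr feature store.

```python
import json

def linear_model_config(name, weights):
    """Build a Solr LTR LinearModel definition from feature weights."""
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        # Each feature listed here must already exist in the feature store.
        "features": [{"name": feature} for feature in weights],
        "params": {"weights": dict(weights)},
    }

# Hypothetical features and weights, for illustration only.
weights = {"titleMatch": 1.2, "playCount": 0.4, "originalScore": 0.8}
config = linear_model_config("musicSearchModel", weights)

# JSON payload that would be uploaded to the collection's model store.
payload = json.dumps(config, indent=2)
```

Once a model like this is uploaded, a query can ask Solr to re-rank its top results with it. Check the plugin documentation for the exact upload endpoint and re-rank query syntax in your Solr version.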
Lesson 6: Is my model the most optimal?
This is the do-and-learn approach in Computer Science. Most software developers take this approach. However, a data scientist is distinctly different from such software developers. Just because I found publicly available tools like Apache Spark and Solr LTR, and numerous blogs to help me, did not make me an expert in machine learning. They helped me implement machine learning solutions rather than understand them thoroughly. You know, there are tons of linear regression models. Why not use a Generalized Linear Model? And why regression, why not classification? Moreover, was my data suited to a particular type of model? A data scientist knows the answers to these questions. A structured approach to machine learning is a necessity.
Lesson 7: Do-and-learn or learn-and-do? MOOCs?
The do-and-learn approach teaches you a huge amount, but only if failures do not dishearten you. You fail a lot. And often. The learn-and-do approach is a little different. You fail a lot less. You probably learn a lot, but you learn what to do rather than what not to do. It is important to have both approaches in our lives and careers.
Let’s recap. The timeline for all this was well over 6 months. In 6 months you can push online a simple machine learning model bursting with your domain expertise. Could that be done in less time and more efficiently? Can the model be explained in a simpler way? Can I find a better justification for my model of choice? I believe so now. I would not have had an answer then. It was a good experience learning machine learning this way. But I needed a more structured approach. I needed to understand the intuition and the mathematical concepts behind the models. So I took Machine Learning courses through MOOCs. I even studied Deep Learning. These MOOCs are good but touch the concepts very lightly. More than the concepts themselves, I learnt the “structured way of learning”. It inspired me to yearn for more. I still needed more discipline and structure. It required vigorous study. Calling scikit-learn and Apache Spark alone is not going to help you understand machine learning. I needed the learn-and-do approach. So I believed in this “hypothesis” and went on to do a more structured “learn-and-do”.
Lesson 8: Go to grad school? Maybe or maybe not?
I have now been in school for over 6 months. At North Carolina State University, I study computer science, specializing in Machine Learning. I have taken courses in Machine Learning and learnt the concepts in a structured fashion. Let me list 5 things that can make or break your machine learning project. I knew of these 5 things at work and while studying through MOOCs, but I gained a comprehensive understanding of them only when I attended school.
- Baselines for adequate AI
- Cross-validation and hyperparameter tuning (grid search), evaluation metrics
- Overfitting and underfitting: the bias-variance tradeoff
- Curse of dimensionality, feature selection, feature reduction
- Data miners and multi-objective optimizers: the thinning line between them
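The first few items on this list fit in one small sketch: a baseline, k-fold cross-validation, and a grid search over a regularization strength. This is a minimal NumPy illustration on synthetic data, not course material or production code. The ridge regression model, the `alphas` grid and the data are assumptions made for the example, and the folds are taken in order without shuffling because the synthetic rows are already random.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: solve (X'X + alpha*I) w = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cross_val_rmse(X, y, alpha, k=5):
    """Average validation RMSE over k folds for one value of alpha."""
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], alpha)
        scores.append(rmse(y[val_idx], X[val_idx] @ w))
    return float(np.mean(scores))

def grid_search(X, y, alphas, k=5):
    """Try every alpha and keep the one with the lowest cross-validated RMSE."""
    scores = {alpha: cross_val_rmse(X, y, alpha, k) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data: 5 features, one of them irrelevant, plus Gaussian noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.7, 3.0]) + rng.normal(scale=0.5, size=200)

best_alpha, cv_scores = grid_search(X, y, alphas=[0.01, 0.1, 1.0, 10.0, 100.0])

# Baseline: always predict the mean of y. A model that cannot beat this
# is not worth shipping.
baseline_rmse = rmse(y, np.full_like(y, y.mean()))
```

A very large alpha shrinks the weights toward zero and underfits (high bias), while a tiny alpha leaves the model free to chase noise (high variance). Comparing `cv_scores` across the grid is one way to see that tradeoff.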
I believe I am well equipped to do Machine Learning projects now. But here is the catch. My previous exposure to machine learning helped me structure my course program here. Being in industry for 3 years helped me understand the bridge between machine learning and software engineering, in class and in practice. It helped me relate the concepts to the industry. I believe it is the combination of learn-and-do and do-and-learn that works best. Had I done it in the other order, first classes and then industry, it would have been even better. As in any other engineering field, higher studies first help you innovate better in industry later. I did the same things, a little unconventionally. Nevertheless, I plan to jump back to the do-and-learn approach, as I hope to go back to being a software developer after graduation. And again I will fail fast, fail often, and learn along the way. And most importantly, learn by doing!
— An unconventional machine learning engineer
Vivek Vivek | LinkedIn