For one of our clients, we built a random forest model that predicted the lifetime value of each user. And, as with most models, we knew that communication was key.
Communication from the start
Ok, so first of all, communication is never an afterthought. In reality, you want to be communicating about your model to stakeholders from the start.
Do I need a model?
Please, please, please make sure to do a cost-benefit analysis before building a model. I know that as data scientists, we are excited by the complexities and learning involved in executing a machine learning project. But realize that the gains from the model must be large enough to justify the long time it takes to build one. So, seek the simplest solution first: avoid machine learning and see if a much simpler approach would suffice. Only if the problem is important and can't be solved without machine learning should you embark on a machine learning project.
Communication barrier
As stakeholders listen to you, they have one thing on their mind: "how does this affect my business?" So, tailor your information accordingly. Also, business stakeholders don't have the technical background that you have, so here is what your detailed explanation of a random forest algorithm sounds like to them: "bla bla lalala bla".
So, does this mean not talking about the model at all and just stating its error? Well, no. The goal is to abstract away from the model to the point that it is understandable by anyone. Take gravity, for example: you can explain it by writing out all the physics equations, or by saying "it's a force that makes things fall when we drop them". Choose the easier one.
How we did it
We realized that stakeholders did not care all that much about the nitty-gritty of the algorithms we were using. So, we spared the details and talked about the model in general, abstract terms.
The error metric we used internally as a team was RMSE. However, RMSE was a bit complex and intangible for stakeholders. Also, while we built the model at the user level, we then rolled the predictions up into granular cohorts. So, we picked a different, more fitting metric for stakeholders: the percentage difference between rolled-up actual and predicted values, and we looked at the distribution of that error across cohorts. Internally, when tuning the model, we still minimized RMSE; but when we talked to stakeholders, we showed the much less intimidating, and more fitting, cohort-level percentage difference.
And because RMSE is a much more sensitive error measure, whatever we did to decrease RMSE would also decrease the downstream metric of cohort-level percentage difference.
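To make the distinction between the two metrics concrete, here is a minimal sketch assuming a pandas DataFrame of per-user predictions with hypothetical cohort, actual, and predicted columns. It is illustrative only, not our production code: it computes the user-level RMSE we tuned against internally, and the cohort-level percentage difference we reported to stakeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per user, with a cohort label,
# the actual lifetime value, and the model's prediction.
df = pd.DataFrame({
    "cohort":    ["2023-01", "2023-01", "2023-02", "2023-02", "2023-02"],
    "actual":    [120.0, 80.0, 45.0, 60.0, 95.0],
    "predicted": [110.0, 95.0, 50.0, 55.0, 90.0],
})

# Internal metric: user-level RMSE, the quantity the team minimized when tuning.
rmse = np.sqrt(np.mean((df["actual"] - df["predicted"]) ** 2))

# Stakeholder metric: roll actual and predicted values up to the cohort level,
# then take the percentage difference between the two totals for each cohort.
by_cohort = df.groupby("cohort")[["actual", "predicted"]].sum()
by_cohort["pct_diff"] = (
    (by_cohort["predicted"] - by_cohort["actual"]) / by_cohort["actual"] * 100
)

print(f"User-level RMSE: {rmse:.2f}")
print(by_cohort)                          # per-cohort totals and percentage difference
print(by_cohort["pct_diff"].describe())   # distribution of the error across cohorts
```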