When you start a project that has no precedent, the first question is which data points you can use to solve the particular business problem. What data should I pull to build a cross-sell model? Which data points will help me build an attrition model?
This is important because otherwise an organisation's systems can end up giving you 1000+ variables of various kinds. Then starts the pain of identifying what not to do and which variables not to use. Once you have reduced the data to perhaps 200 to 300 variables, you are in a position to start working on the model.
Which variables are truly useful and can be part of the final model? Sounds daunting? Yes, it is. And boring. And critical to the success of the project: what if you picked a variable that will never be present at the point where your project gets deployed? For example, in a lead optimization project, none of the variables that you collect at the customer onboarding point will be available at the lead generation point. Thus, the point of deployment of the project has a bearing on what type of information you can use for the project.
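One way to make this check concrete is to tag every candidate variable with the stage at which it is first captured, and then keep only those available at or before the deployment point. The sketch below is a minimal illustration, not a prescribed method: the stage names, the variable names and the `usable_at` helper are all hypothetical and would need to be replaced with your own organisation's catalogue.

```python
# Hypothetical ordering of stages in the customer journey (an assumption
# made for illustration; use the stages that apply to your business).
STAGE_ORDER = ["lead_generation", "customer_onboarding", "post_onboarding"]

# Hypothetical variable catalogue: variable name -> stage at which the
# variable is first captured.
CANDIDATE_VARIABLES = {
    "lead_source": "lead_generation",
    "campaign_channel": "lead_generation",
    "kyc_document_type": "customer_onboarding",
    "declared_income": "customer_onboarding",
    "first_month_spend": "post_onboarding",
}


def usable_at(deployment_stage, variables=CANDIDATE_VARIABLES):
    """Return the variables that already exist at the deployment stage."""
    cutoff = STAGE_ORDER.index(deployment_stage)
    return sorted(
        name
        for name, stage in variables.items()
        if STAGE_ORDER.index(stage) <= cutoff
    )


# A lead optimization model is deployed at lead generation, so anything
# collected at onboarding or later is dropped from the candidate list.
print(usable_at("lead_generation"))
```

Running the same check with "customer_onboarding" as the deployment stage would keep the onboarding variables as well, which shows how the usable set depends entirely on where the model will run.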
This is where knowledge of the industry, the purpose of the analytics or data science project, and the problem that you are seeking to solve with the project become exceedingly important. It is much more important to be able to decide which data the analytics should be done on, based on an understanding of the deployment. For a very young data scientist, the requirement of a mentor is not for the coding, and sometimes not even for identifying the statistical models to be used. The importance of the mentor is in guiding the team to create a project framework that yields not only a statistically valid model but also a deployable one.
Do you agree?
This poses a big problem for smaller firms, which may want to support a full-time team of Junior Analysts and Junior Data Scientists but may find that hiring a Senior resource makes the cost of Analytics exorbitant. It's easier, therefore, to hire a Shared Chief Data Scientist.
The concept of the Shared CFO has taken off in a big way, especially in the cities of Mumbai and Delhi, and I do believe that it is just a matter of time before the Shared Chief Data Scientist becomes a common phenomenon.
Final advice: don’t let your data sleep. Put your data to good use.