Most Data Scientists like to get their hands dirty with data as quickly as possible, but it is important to practice some delayed gratification and first dig into the details of the Data Science project before you start modeling. A Data Scientist who has the business in mind will attempt to determine what factors might get in the way of the business experiencing success with the project. Different phases call for different information, but once you have moved past the initial stage of understanding the business, a successful Data Scientist’s objective becomes diving into the details quickly and deeply.
1: Conduct a Resource Inventory
As a Data Scientist, it is important to know the ins and outs of the resources available to a Data Science project. This is not just about how much computing power you have to run your analysis. A professional Data Scientist needs to consider the people involved, such as business experts, data experts, technical support, and other data scientists. In addition, there are data sources to account for, such as fixed extracts, access to live data, warehoused data, and operational data. Of course, no one should forget the computing resources such as hardware and software. Any Data Scientist who takes on a project without seriously considering these areas is walking into a minefield, never knowing when something might explode.
2: Understand the Requirements, Assumptions, and Constraints
Most Data Scientists know they have to be better than average at predicting outcomes for whatever the business has selected as a target, but highly successful Data Scientists know that there is more to it than simply gaining a few more points in predictive accuracy. Take for example a Data Scientist who considers all the assumptions that are known about the project, both from a business perspective and an analytical perspective. These assumptions can take many forms – however, the ones that rear their ugly heads most often are about the data. Sometimes assumptions that relate to the business are not verifiable – these can be the riskiest. If at all possible, these risky assumptions should be prioritized at the top of the list because they could affect the validity of the results you aim to discover.
Data Scientists need to watch for traps. Make every constraint explicit, including the availability of resources and any technology limits. Think outside the box when it comes to limitations. For example, is the size of the data practical for modeling? This may seem obvious, but many Data Scientists overlook this important consideration.
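The data-size question above can be answered with some quick arithmetic before any modeling starts. The sketch below is one rough way to do it, not a definitive method: the function names, the 8 bytes per value, and the overhead factor of 3 (to allow for copies made during modeling) are all illustrative assumptions, and real memory use depends on the data types and tools involved.

```python
# Rough check: will a dataset of a given shape fit in memory for modeling?
# All defaults here are illustrative assumptions, not measured values.

BYTES_PER_GB = 1024 ** 3


def estimate_memory_gb(n_rows, n_cols, bytes_per_value=8):
    """Estimate the raw in-memory size of a dense numeric table, in GB.

    Assumes every value takes `bytes_per_value` bytes (8 for a 64-bit float).
    """
    return n_rows * n_cols * bytes_per_value / BYTES_PER_GB


def fits_in_memory(n_rows, n_cols, available_gb,
                   overhead_factor=3.0, bytes_per_value=8):
    """Return True if the data, padded by an assumed overhead factor,
    fits within the available memory."""
    needed_gb = estimate_memory_gb(n_rows, n_cols, bytes_per_value) * overhead_factor
    return needed_gb <= available_gb


if __name__ == "__main__":
    # Hypothetical project: 50 million rows, 40 columns, 16 GB machine.
    size = estimate_memory_gb(50_000_000, 40)
    print(f"Estimated raw size: {size:.1f} GB")
    print("Practical on this machine:", fits_in_memory(50_000_000, 40, available_gb=16))
```

If the answer is no, that is a constraint worth writing down early, along with the options it implies, such as sampling the data or requesting more hardware.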
3: Determine Risk and Contingencies
Have you ever started a data analysis project that ended up falling apart only because there were external delays to the project? It is a wise move to consider contingency plans up front. Many Data Scientists take a shortcut here and do not take seriously the insurance that this sort of preparation can provide when needed. It can be extremely helpful to have a backup plan or two in place in the event unknown risks try to derail your project’s success. Experience would say that something is always trying to cause you to fail, so plan for alternatives from the beginning.
4: Document Meaning
The question “What do you mean?” is a particularly important question to answer when working with interdisciplinary teams in a business environment. It should be obvious that we do not all speak the same language when it comes to our domains. Taking the time up front to develop a working glossary of relevant business terminology can keep you and others on track. Another good practice is to have Data Science terminology defined and illustrated with examples, but only work with the terms that directly relate to the business problem at hand. This does not need to be a 700-page document; rather, keep things cogent and useful to all parties involved. Keep in mind others want you to be the Data Scientist; only at the highest level do others want to know the underbelly of statistics and coding.
5: Calculate Cost and Benefits
It is good practice to demonstrate value in your Data Science projects. Remember that as a professional who supports the business it is important to ask and answer the question, “Is the Data Science project of value?” A simple comparison of the associated costs of the project against the potential benefits if successful will go a long way for both you and the business. Knowing this at the beginning of the project is clearly more beneficial to you and the organization than learning it at the close. In my judgment, failing to ask and answer this question is a career-limiting move, and the most successful Data Scientists seek to get it right straight out of the gate. Have the common sense to take on this activity yourself and not wait on your business counterparts or leaders to ask you to do it.
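The simple comparison described above can be as small as a few lines of arithmetic. The following is a minimal sketch under stated assumptions: the cost categories, the dollar figures, and the idea of discounting the benefit by a probability of success are all hypothetical choices made for illustration, not a prescribed method.

```python
# Back-of-the-envelope cost/benefit comparison for a Data Science project.
# Cost categories and figures are hypothetical placeholders.


def net_benefit(costs, expected_benefit, probability_of_success=1.0):
    """Expected benefit (discounted by the chance of success) minus total cost."""
    total_cost = sum(costs.values())
    return expected_benefit * probability_of_success - total_cost


def benefit_cost_ratio(costs, expected_benefit, probability_of_success=1.0):
    """Expected benefit divided by total cost; above 1.0 suggests the project pays off."""
    total_cost = sum(costs.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return expected_benefit * probability_of_success / total_cost


if __name__ == "__main__":
    # Hypothetical estimates gathered at project kickoff.
    costs = {"staff_time": 60_000, "computing": 15_000, "data_acquisition": 5_000}
    benefit_if_successful = 200_000
    chance_of_success = 0.5

    print("Net expected benefit:",
          net_benefit(costs, benefit_if_successful, chance_of_success))
    print("Benefit/cost ratio:",
          benefit_cost_ratio(costs, benefit_if_successful, chance_of_success))
```

Even estimates this rough give you and the business a shared starting point, and they can be revised as the project's costs and likely payoff become clearer.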
As Data Science matures in a business context, a Data Scientist needs to be more aware of assessing the situation, taking an inventory, learning about the risks and developing contingencies, and understanding the costs and benefits of a successful Data Science project. Not every Data Scientist will take these steps, but then again not every Data Scientist is highly successful. A solid Data Science methodology is to a business what water is in the desert. Do not leave your organization thirsty when it needs you most.