Creating Value for Business: 2 Data Science Questions You Must Ask from the Start

Decisions in Data Science

Business goals are no doubt important, but in an analytic project it makes sense to balance the organization’s goals with those of the Data Science department. Most individuals will recognize balance as a principle of art, but the notion of creating a sense of equilibrium between the business and the Data Scientist is just as foundational in today’s insight economy. To not cultivate this balance is to invite ruin into the organization.

Question 1: What are the Data Science Goals?

As a Data Scientist working in an organization, it is important to understand how the intended outputs of the Data Science project enable the achievement of the business objectives. Imagine a situation where a business has a set of defined goals, but the analytics team has a different target in mind, or vice versa. The result is extra cost, delays, and missed business opportunities. Unfortunately, these sorts of mismatches are more common than you would imagine in everyday business, in organizations big and small. As a Data Scientist serving a business, it is prudent to define your goals in tandem with the business objectives and obtain buy-in for your interpretation. This can be done by explicitly documenting what you expect the output to look like and confirming its usefulness to the business unit you are supporting.

Question 2: What are the Data Science success criteria?

Businesses should work with Data Scientists who know how to precisely define a correct outcome in technical terms. In truth, it may prove necessary to describe these outcomes in subjective terms; if so, the person in charge of making these subjective judgments needs to be identified. Neither the business nor the Data Science department will succeed with a moving target. Transparency and visibility are always good things in business. They allow individuals to manage towards a known expectation.

Organizations working with Data Scientists who have only technical know-how are missing out on significant value within their analytic projects. Organizations should seek professionals who know how to translate business concepts into analytic outcomes. This skill should take priority over knowing the most advanced techniques and methods for analyzing data. Unfortunately, most organizations are still on a discovery mission with regard to what they need from Data Science. Many remain beholden to the idea that if they hire a Ph.D. in some highly analytical field, then success is just around the corner. This is rarely the case. In fact, most Ph.D.s need significant time to warm up to the corporate culture and learn the language of business before they can be fully effective.

It may seem obvious to the organization, but it is paramount that your analytic superhero can quickly judge the type of Data Science problem you are asking them to solve. Typically, specifying whether the target is a classification, description, prediction, or clustering problem works well for all involved and starts to build context across disciplines in the organization. This becomes especially important as a Data Science department grows and less experienced Data Scientists learn to see problems the way senior Data Scientists do; this can only happen with intentionality and purpose.

Organizations should come to expect that a good Data Scientist will often demonstrate his or her ability by reframing or redefining the problem put before them by the company. The first few times this may seem off-putting, but organizations that learn to embrace this sort of transformation of the business problem will be better able to compete in the future. Practically speaking, this may look like shifting to “medical device retention” rather than “patient retention” when targeting patient retention delivers results too late to affect the outcome.

As a business concerned with the ROI from your Data Science investment, you will undoubtedly want to see the Data Scientist specify criteria for model assessment. These typically take the form of model accuracy or performance, and model complexity. In many cases, it is indispensable to see that a Data Scientist has defined benchmarks for the evaluation criteria. Even in the case of subjective assessment, defining the criteria is important. At times it can be difficult to meet a company’s Data Science goal of model explainability (that is, the data insights provided by the model) if the Data Scientist has not done a good job of uncovering this as a business need. So, the adage “to begin with the end in mind” should prompt the Data Scientist to ask an appropriate series of questions of the business to ensure value creation.
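As a rough illustration of writing assessment criteria down before modeling begins, the sketch below records a benchmark and a minimum target in code and checks a model's predictions against both. Everything in it is hypothetical: the SuccessCriteria fields, the thresholds, and the toy churn labels are assumptions made for illustration, not values from any real project.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical criteria agreed with the business before modeling starts."""
    min_accuracy: float            # absolute floor the model must reach
    min_lift_over_baseline: float  # required improvement over the naive benchmark


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def majority_baseline(actual):
    """Benchmark: always predict the most common label."""
    most_common_label, _ = Counter(actual).most_common(1)[0]
    return [most_common_label] * len(actual)


def meets_criteria(actual, predicted, criteria):
    """Return (passed, model_accuracy, baseline_accuracy)."""
    model_acc = accuracy(actual, predicted)
    base_acc = accuracy(actual, majority_baseline(actual))
    passed = (model_acc >= criteria.min_accuracy
              and model_acc - base_acc >= criteria.min_lift_over_baseline)
    return passed, model_acc, base_acc


# Toy, made-up labels: 1 = customer churned, 0 = customer stayed.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

criteria = SuccessCriteria(min_accuracy=0.75, min_lift_over_baseline=0.10)
passed, model_acc, base_acc = meets_criteria(actual, predicted, criteria)
print(f"model={model_acc:.2f} baseline={base_acc:.2f} passed={passed}")
```

Comparing against a naive benchmark, rather than reporting accuracy alone, is one way to keep the target from moving: the business can see how much the model adds over doing nothing clever. Accuracy is used here only because it is simple; a real project might agree on a different metric.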

Summary

Remember that the Data Science project success criteria are without a doubt different from the business success criteria. Any Data Scientist with experience will say that it is always best to plan with deployment in mind from the beginning of a project. If a Data Scientist does not follow this best practice, the organization should expect spotty results and a bit of frustration from business counterparts. As an organization, it is vital to push your Data Scientist to work hard and be assertive within the project, and to use their mind and imagination. This should give him or her the permission to shape the future your company desires.

7 Questions Every Data Scientist Should Be Answering for Businesses

 

Business professionals of all levels have asked me over the years what it is that they should know that their Data Science departments may not be telling them. To be candid, many Data Scientists operate in fear wondering what they should be doing as it relates to the business. In my judgment, the questions below address both parties with the common goal of a win-win for the organization: Data Scientists support their organization as they should while business professionals become more informed with each analysis.

What problem are we trying to solve?

 

It is important to be able to answer this question in the form of a sentence. Remember that the business end-user most likely does not use terms that are common in Data Science, like CV, logistic regression, or error-based learning, in their everyday business routine. It does not help anyone when a Data Scientist hides behind fancy terms instead of providing actionable insight that moves the organization along. I can assure you that translating Data Science jargon into something digestible for the business professional will create many allies. After all, a Data Scientist should have the primary skill of being able to transform complex ideas and make them readily understood.

Does the approach make sense?

 

In truth, this may be the single best question that benefits the Data Scientist even though it is asked primarily of the business professional. Learning to write out an effective analytic plan can have profound meaning. Writing is a discipline that should be embraced by the Data Scientist. It allows the Data Scientist to synthesize his or her thoughts. Although we live in a day and time where technology is at the center of everything we do, we should remember that technology, Data Science, and statistical computing are not replacements for critical thinking.

Does the answer make sense?

 

Can you make sense of what you have found? Do you know how to explain the answer you have received? Your organization is counting on you to be the translator between the computer output and their business needs. Remember: computers simply do what they are told. As Data Scientists, we need to be sure we directed them to do the right thing. Validate that the instructions you gave were the ones you intended. Be scientific in your approach, document your assumptions, and be sure you have not introduced bias into your work.

Is it a finding or a mistake?

 

Not everything is a Eureka! moment. So, make skepticism a discipline as a Data Scientist. One should always be skeptical of surprise findings. Experience should tell you that if it seems wrong, then it probably is wrong. Do not blindly accept the conclusion your data presents to you. Again, there is no substitute for critical thinking. Make absolutely sure you understand, and can clearly explain, why things are the way they are – whether a finding or a mistake.

Does the analysis address the original intent?

 

Unless you are surrounded by other Data Scientists in your organization, this question requires accountability to one’s self. You should be honest with yourself, always ensuring that you are not bending the outcome to fit the expectations of the organization. It may be obvious to note, but it is critical to speak the truth of the data, realizing that sometimes the outcome does not align with the question the business is seeking to answer. However, if your analysis is essentially something unflattering to the organization, be sure you are 100% confident in your findings. In this situation, more analysis is better than less. An analysis that does not reflect well on the business, and that is not well substantiated, may very well be your last.

Is the story complete?

 

We would agree that the best speakers, writers, and leaders are all good storytellers; it is no different for the Data Scientist. While storytelling is not the only way to engage people with your ideas, it is certainly a critical part of the Data Science recipe. Do your best to tell an actionable story. Resist the urge to rely on your business audience to stitch the pieces of your data story together. After all, your analysis is too important to leave up to wild interpretations. Take time to identify potential holes in your story and fill them appropriately to avoid surprises. Grammar, spelling, and graphics matter; your audience will lose confidence in your analysis if your results look sloppy.

Where would we head next?

 

As Data Scientists, we should realize that no analysis is ever truly finished; we simply run out of resources. It is worth the effort for a Data Scientist to understand and be able to explain what additional measures could be taken if the business were able to provide additional resources. In simple terms, the business professionals you work with will, at the very least, need that information so they can decide whether it makes sense to move forward with the supplemental analysis.

Summary

It is key to remember that Data Science techniques are tools that we can use to help make better decisions for an organization and that the predictive models are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address – and then ensure that it does just that. These seven questions begin to form the bond of a stronger partnership between the data science department and the organization.

How to Become a Data Scientist

How does one become a data scientist?

Well, in truth, the path is most certainly clear. However, the work it takes to travel down the road is not for everyone. Before reading this, you may want to have an understanding of where you are with your current analytic skills (e.g., MS Excel only, maybe a little bit of SQL, Crystal Reports, etc.). Use the rest of this article as a measuring stick for where you are and where you would like to go. In fact, it is best to begin with the end in mind and work backwards to the most basic skill you will need and start building from there…

Recently DataCamp posted an infographic which described 8 easy steps to become a data scientist.

Figure: How to become a data scientist. A portion of the infographic posted on the DataCamp blog.

What is a Data Scientist

It’s important to understand what this infographic is based on:

  1. Drew Conway’s data science Venn diagram, which combines hacking skills, math and statistics knowledge, and substantive expertise.
  2. A graph showing the survey results on the question of education level, not unlike the graph in O’Reilly’s Analyzing the Analyzers.
  3. Josh Wills’ quote on what a data scientist is.

Become a Data Scientist

Using the infographic, the 8 steps to becoming a data scientist are:

  1. You need to know (there is a spectrum here) stats and machine learning. The fix – take online courses for free.
  2. Learn to code (not everything, but very specific things). Get a book or take a class (online or offline). Popular languages are Python and R in the data science space.
  3. You should understand databases. This is important because for the most part this is where the data lives.
  4. Critical skills are data munging (data clean-up and transformations), visualization, and reporting.
  5. You will need to Biggie-Size your skills. Learn to use tools like Hadoop, MapReduce, and Spark.
  6. This part is extremely important – get experience. You should be meeting with other data scientists in meetups or talking with people in your office about what you are learning and accomplishing with your enhanced skills. Do yourself a favor: obtain a data set online and start exploring it with your newfound techniques. I recommend Kaggle and CrowdANALYTIX for interesting data sets.
  7. Get yourself one of these: internship, bootcamp or a job. You can’t beat real experience.
  8. Know who the players are in this space and why. Follow them and engage with them, and be a part of and engage with the data science community.

My thoughts…

In my judgment, you should look at the data and the algorithms first, then get busy with the math and programming. However, I do agree with the idea of moving through steps 1-5 to gain familiarity with the discipline. I would categorize steps 6-7 as working the problem, and the final step as plugging into a community.

It may be important to go another step forward. 

It is more intuitive to condense steps 1-5 into one (this could be a crash course in the terms and themes relevant to data science). My preference (it’s what has worked for me) is to jump in with the data and the tools of the trade as soon as possible. More people need to develop just-in-time learning mechanisms, rather than trying to learn the entire universe of a topic. Approaching data science in this way allows an individual to build on a combination of theory and practical experience. This is done by encountering problem sets over and over again.

Learn the art of relevance…what makes sense for my situation right now. Obtain a solid data set and get learning. This sort of action works to build context for the tools you are using.

The fastest way to become a data scientist is to recognize where you are with your current skills, grab a data set, pick a language (R, Python, Julia, C++, MATLAB, etc.), and start working through a problem end-to-end.
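To make “working through a problem end-to-end” a little more concrete, here is a minimal sketch in Python that uses only the standard library. The data set is invented for illustration (hours studied versus exam score), and the steps shown (load, clean, summarize, fit, predict) are just one reasonable ordering, not a prescribed method.

```python
import csv
import io
from statistics import mean

# Made-up data for illustration only; one row has a missing score.
RAW_CSV = """hours,score
1,52
2,55
3,
4,65
5,70
"""


def load_rows(text):
    """Load: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows with missing values and convert the rest to floats."""
    return [(float(r["hours"]), float(r["score"]))
            for r in rows if r["hours"] and r["score"]]


def fit_line(points):
    """Model: ordinary least squares for score = intercept + slope * hours."""
    xs, ys = zip(*points)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


points = clean(load_rows(RAW_CSV))
slope, intercept = fit_line(points)

# Summarize and report in plain terms.
print(f"rows kept: {len(points)}")
print(f"mean score: {mean(y for _, y in points):.1f}")
print(f"score ~ {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score for 6 hours: {intercept + slope * 6:.1f}")
```

Swapping the inline text for a real file from a site such as Kaggle is the natural next step; the point of the sketch is that even a tiny problem touches every stage, which is where the just-in-time learning happens.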

What do you think it takes to be a data scientist?

 

The Death of the Data Scientist?

There has been a lot of chatter recently around the notion that data scientists are soon to be replaced by a $30/hr specialist from places like oDesk, Freelancer, and Elance. Before we ask whether a data scientist can be replaced, let us take some time to home in on exactly what a data scientist does. Being candid, there is a plethora of answers to this question. If we mean a person who pulls together a data summary or completes a modeling task that has been well-defined before they even encounter the problem, then I think it is absolutely possible to come in at a $30/hr price. In truth, I see that type of data scientist being replaced by automated software, without having to deal with a freelancer at all. Look at how similar scenarios have played out in fields such as online marketing or site development.

But we need to focus on the concept “the data problem was previously well-defined”.

Data scientists who achieve higher salaries tend to fall into one of two distinct camps:

1) The Engineer:

This individual knows how to choose the proper tools and infrastructure to solve a specific, technology-laden data problem. These individuals usually work on the leading edge of a problem, and at times there may be very few examples of the problem being worked on in the global community. This is markedly different from the well-defined problem of the freelancer situation we described earlier.

2) The Communicator:

This individual knows the technical side of what data science is and how to get at solutions, but their strength is in the storytelling. Many times business leadership is unaware of what is possible with data science, and for that they need a translator of sorts. These types of individuals encounter organizations that know they have a problem to solve but do not necessarily know how to frame the question so that it can be answered by the data. These businesses look for someone who is personable, and not thousands of miles away, to guide them through what they feel is incredibly difficult and important.

While it is certainly true that there may be segments of data science which are automated, there will always be a place for problem solvers – think physicians, attorneys, developers, consultants, etc. Like the roles just mentioned, data scientist is not a simple role.

Not all data scientists are performing rote tasks.

There will always be a place for individuals skilled at leveraging technology to solve complex business problems, and we will have to invest more than $30/hr to garner their expertise.