3 Effortless Tactics to Be a Data Science Success in Business

Damian Mingle - Business Decision

“Move out of the way – I am ready to model.” That is the typical sentiment of a Data Science team when given a business problem. However, in the context of a dynamic business, things are not that simple; instead, business needs require that the Data Science team be detailed in the communication of their process. The last thing a Data Science team wants to do is produce a project plan they feel is a pedestrian artifact aimed to pacify their business counterparts. They tend to prefer a more fluid and creative style as opposed to one that is stiff and inflexible. Data Scientists may be tempted to promote the idea that they cannot let anything get in the way of creativity and brilliance or it will be to the detriment of the business. However, in many cases, Data Scientists may be allowing their human fear of transparency and accountability to dictate how they approach what the business needs – maximum visibility. Don’t fall into the trap of believing that these templated documents merely exist to check the proverbial box in order to placate the MBAs and Project Managers in the room. Data Science teams designed for success will most certainly deliver a Data Science project plan and use it throughout their analytics project.

Producing a Data Science Project Plan 

You might ask what the intended purpose behind such a fancy business document really is at its core. The Data Science project plan is incredibly straightforward: its sole purpose is to be the battle plan for achieving the Data Science goals which in turn achieve the business goals. Successful Data Science teams will know that there is immense value in not only being able to achieve the Data Science goals, but in being able to relate them back to the business on a constant basis. It’s the burden of the Data Scientist to be sure that clear communication exists between the two groups. The challenge for a Data Scientist is translating Data Science into business terms. This is the kind of thing that is built through experience and through learning what the business expects in a traditional project plan. If a business had a choice between a model with higher predictive accuracy by a Data Scientist without a project plan and a model with lower predictive accuracy by a Data Scientist with a project plan, they most certainly would choose to work with a Data Scientist who could communicate in terms of business, translate Data Science ideas, and understand the power of leveraging other individuals in the organization to contribute to the overall outcome.

Project Plan in Action

The nuts and bolts of a Data Science project plan will be different for each team and each organization, but there are core elements you will see in almost all effective Data Science project plans – sort of a Tao of Data Science Project Plans.

Three Effortless Tactics:

1. List the stages in the project

The business should not have to make assumptions about the stages you may take them through as a Data Scientist. Display your expectation to everyone and let them know how much time each stage may take. Also, do the obvious things like listing the resources required as well as the types of inputs and outputs your team expects. Lastly, list dependencies. After all, you will want your counterparts to be aware that you cannot move forward until “x” event happens; for example, the Data Scientist may be waiting to receive a data feed from IT. This is precisely the kind of thing to call out in the Data Science project plan.

2. Define the large iterations in the project 

Most business users will not be intimately involved in how a Data Science team works or why its approach may change when it encounters a classification problem versus a regression problem. So in an effort to be clear and meaningful, share which stages are more iterative, such as the modeling or evaluation stages, along with their expected durations. The best Data Scientists know how to appropriately manage expectations from the business through communication with the broader organization.

3. Point out scheduling and risks

Virtually all working individuals know that it’s unrealistic to think everything happens only in ideal scenarios. Data Scientists should take the necessary time to consider scheduling resources and the inherent risk they could encounter in the project. Give the business the comfort that only a trusted advisor can provide them. Think through what could happen and what you would recommend to them if they encounter turbulence – because turbulence is inevitable. Taking this extra step is the hallmark of a Data Science professional.

Summary

Do not view the Data Science project plan as training wheels for a junior Data Scientist who is new to working with business, but rather what a skilled Data Scientist will review each time his or her team begins a new task within the Data Science project. Crafting a Data Science project plan to pacify the business – and never utilizing it for team guidance – is a grave mistake that one day could end in ruin for the Data Science team, the business, or both. An effective Data Scientist will work from the perspective that a goal without a plan is simply a wish and nothing more. Or, said differently, an effective Data Science team works a plan at all times.

[Originally posted on LinkedIn]

A Discriminative Feature Space for Detecting and Recognizing Pathologies of the Vertebral Column

ABSTRACT:

Each year it has become more and more difficult for healthcare providers to determine if a patient has a pathology related to the vertebral column. There is great potential to become more efficient and effective in terms of quality of care provided to patients through the use of automated systems. However, in many cases automated systems can allow for misclassification and force providers to have to review more cases than necessary. In this study, we analyzed methods to increase the True Positives and lower the False Positives while comparing them against state-of-the-art techniques in the biomedical community. We found that by applying the studied techniques of a data-driven model, the benefits to healthcare providers are significant and align with the methodologies and techniques utilized in the current research community.

Research Article:

Mingle D (2015) A Discriminative Feature Space for Detecting and Recognizing Pathologies of the Vertebral Column. Biomedical Data Mining 4: 114. doi: 10.4172/2090-4924.100114

Creating Value for Business: 2 Data Science Questions You Must Ask from the Start

Decisions in Data Science

Business goals are no doubt important, but in an analytic project it makes sense to balance the organization’s goals with those of the Data Science department. Most individuals will recognize balance as a principle of art, but the notion of creating a sense of equilibrium between the business and the Data Scientist is just as foundational in today’s insight economy. To not cultivate this balance is to invite ruin into the organization.

Question 1: What are the Data Science Goals?

As a Data Scientist working in an organization, it is important to understand how the intended outputs of the Data Science project enable the achievement of the business objectives. Imagine a situation where a business has a set of defined goals, but the analytics team had a different target in mind or vice versa. The result is extra cost, time delay, and missed business opportunities. Unfortunately, these sorts of happenings are more common than you would imagine in everyday business – and with organizations big and small. As a Data Scientist serving a business, it is prudent to define your goals in tandem with the business objectives and obtain buy-in for your interpretation. This can be done by explicitly documenting what you expect the output to look like and confirming its usefulness to the business unit you are supporting.

Question 2: What are the Data Science success criteria?

Businesses should work with Data Scientists who know how to precisely define a correct outcome in technical terms. In truth, it could prove important to describe these outcomes in subjective terms; however, if this ends up being the case, the person in charge of making these subjective judgments needs to be identified. Neither the business nor the Data Science department will succeed with a moving target. Transparency and visibility are always good things in business. This allows individuals to manage towards a known expectation.

Organizations working with Data Scientists who simply have technical know-how are missing out on significant value within their analytic projects. Organizations should seek to find professionals who know how to translate business concepts into analytic outcomes. This skill should be considered primary over knowing the most advanced techniques and methods when analyzing data. Unfortunately, most organizations are still on a discovery mission with regard to what they need from Data Science. Organizations still remain beholden to the idea that if they hire a Ph.D. in some highly analytical field then success is just around the corner for their organization. This is rarely the case. In fact, most Ph.D.s need significant time to warm up to the corporate culture and learn the language of business before they can be fully effective.

It may seem obvious to the organization, but having your analytic superhero be able to quickly judge the type of Data Science problem that you are looking for them to contribute to is paramount to pulling it off. Typically, being able to specify whether the target is a classification, description, prediction, or clustering problem works well for all involved and starts to build context across disciplines in the organization. This becomes especially important when a Data Science department begins to grow and less experienced Data Scientists can learn to see more like senior Data Scientists; this can only happen with intentionality and purpose.

Organizations should come to expect that one way a good Data Scientist will often demonstrate his or her ability is by reframing or redefining the problem put before them by the company. The first few times this may seem off-putting, but organizations who learn to embrace this sort of transformation of the business problem will be able to compete for the future. Practically speaking this may look like shifting to “medical device retention” rather than “patient retention” when targeting patient retention delivers results too late to affect the outcome.

As a business concerned with the ROI from your Data Science investment, you will undoubtedly want to see activities of the Data Scientist which specify criteria for model assessment. These typically present themselves as model accuracy or performance and complexity. In many cases, it is indispensable to see that a Data Scientist has defined benchmarks for evaluation criteria. Even in the case of subjective assessment, criteria definition becomes important. At times it can be difficult to meet a company’s Data Science goal of model explainability – or data insights provided by the model – if the Data Scientist has not done a good job of uncovering this as a business need. So, the adage “to begin with the end in mind” should prompt the Data Scientist to ask an appropriate series of questions of the business to ensure value creation.

Summary

Remember that the Data Science project success criteria are without a doubt different from the business success criteria. Any Data Scientist with experience will say that it is always best to plan for deployment from the beginning of a project. If the organization experiences a Data Scientist not following this best practice, expect spotty results and a bit of frustration from business counterparts. As an organization, it is vital to push your Data Scientist to work hard and be assertive within the project – as well as to use their mind and imagination. This should give him or her the permission to shape the future your company desires.

5 Unbelievable Ways You Can Be a Better Data Scientist in Business

 

Most Data Scientists like to get their hands dirty with data just as quickly as possible, but it is important to practice some delayed gratification and first dig into the details of the Data Science project before you start modeling. A Data Scientist who has the business in mind will attempt to determine what factors might get in the way of the business experiencing success with the project. At different phases there are differing needs for information, but once you have moved past the initial stage of understanding the business, a successful Data Scientist’s objective becomes diving into the details quickly and deeply.

1: Conduct a Resource Inventory

 

As a Data Scientist, it is important to know the ins and outs of the available resources of a Data Science project. This is not just about how much computing power you have to run your analysis. A professional Data Scientist needs to consider many things like the business experts, data experts, technical support, and other data scientists. In addition, there are important data sources to account for, such as fixed extracts, access to live data, warehoused data, and operational data. However, no one should forget the computing resources such as hardware and software. Any Data Scientist who takes on a project without seriously considering these areas is walking into a minefield, never knowing when something might explode.

2: Understand the Requirements, Assumptions, and Constraints

Most Data Scientists know they have to be better than average at predicting outcomes for whatever the business has selected as a target, but highly successful Data Scientists know that there is more to it than simply gaining a few more points in predictive accuracy. Take for example a Data Scientist who considers all the assumptions that are known about the project both from a business perspective and an analytical perspective. These assumptions can take many forms – however, the ones that rear their ugly heads most often are about the data. Sometimes assumptions are not verifiable as they relate to the business – these can be the riskiest. If at all possible these risky assumptions should be prioritized at the top of the list because they could affect the validity of the results you aim to discover.

Data Scientists need to watch for traps. Consider making explicit any and all availability of resources, even technology constraints. Think outside the box when it comes to limitations. For example, is the size of the data practical for modeling? This may seem obvious, but many Data Scientists overlook this important consideration.

3: Determine Risk and Contingencies

Have you ever started a data analysis project that ended up falling apart only because there were external delays to the project? It is a wise move to consider contingency plans up front. Many Data Scientists take a shortcut here and do not take seriously the insurance that this sort of preparation can provide when needed. It can be extremely helpful to have a backup plan or two in place in the event unknown risks try to derail your project’s success. Experience would say that something is always trying to cause you to fail, so plan for alternatives from the beginning.

4: Document Meaning

The question “What do you mean?” is a particularly important question to answer when working with inter-disciplinary teams in a business environment. It should be obvious that we all do not speak the same language when it comes to our domains. Taking the time up front to develop a working glossary of relevant business terminology can keep you and others on track. Another good practice is to have Data Science terminology defined and illustrated with examples, but only work with the terms that directly relate to the business problem at hand. This does not need to be a 700-page document; rather, keep things cogent and useful to all parties involved. Keep in mind others want you to be the Data Scientist; only at the highest level do others want to know the underbelly of statistics and coding.

5: Calculate Cost and Benefits

It is good practice to demonstrate value in your Data Science projects. Remember that as a professional who supports the business it is important to ask and answer the question, “Is the Data Science project of value?” A simple comparison of the associated costs of the project against the potential benefits if successful will go a long way for both you and the business. Knowing this at the beginning of the project is clearly more beneficial to you and the organization than at the close. In my judgment, to not ask and answer this question is a career-limiting move; the most successful Data Scientists seek to get this right straight out of the gate. Have the common sense to take on this activity yourself and not wait on your business counterparts or leaders to ask you to do it.
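
As a minimal sketch of the cost and benefit comparison described above, the R snippet below totals a set of project costs and compares them with the benefit expected if the project succeeds. Every figure, the cost categories, and the `p_success` probability are hypothetical placeholders chosen for illustration; a real comparison would use estimates agreed with the business.

```r
# All figures below are hypothetical placeholders, not real project data.
costs <- c(staff = 60000, computing = 8000, data_access = 5000)

benefit_if_successful <- 200000  # assumed benefit if the project succeeds
p_success <- 0.6                 # assumed chance of success (an illustrative guess)

total_cost <- sum(costs)

# Weight the benefit by the assumed chance of success before comparing.
expected_benefit <- p_success * benefit_if_successful
net_value <- expected_benefit - total_cost
benefit_cost_ratio <- expected_benefit / total_cost

cat("Total cost:        ", total_cost, "\n")
cat("Expected benefit:  ", expected_benefit, "\n")
cat("Net expected value:", net_value, "\n")
cat("Benefit/cost ratio:", round(benefit_cost_ratio, 2), "\n")
```

A ratio above 1 (or a positive net value) suggests the project may be worth pursuing under these assumptions; the point of the exercise is to make the assumptions visible to the business early.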

Summary

As Data Science matures in a business context, a Data Scientist needs to be more aware of assessing the situation, taking an inventory, learning about the risks and developing contingencies, and understanding the costs and benefits of a successful Data Science project. Not every Data Scientist will take these steps, but then again not every Data Scientist is highly successful. A solid Data Science methodology is to a business what water is in the desert. Do not leave your organization thirsty when it needs you most.

How Is Knowing the Business Important to Data Science?

Businesses around the world are involved in a multitude of projects at any given time. As Data Scientists come into the business fold, it becomes more important with each passing day to have both parties – “the business” and “the Data Scientist” – begin to define successful strategies of working together. Businesses are having to become aware of the techniques and methods of a Data Scientist in order to maximize their analytic investments; and, simultaneously, Data Scientists are having to learn how to be relevant to an organization that is in a constant state of change. From a business perspective, knowing what to expect of a Data Scientist and having that Data Scientist develop a reasonable Data Science workflow can create huge competitive advantage over other companies who are lost at the “Data Science Sea.”

Our Business Conditions, Today

Performing a bit of journalistic investigation into the organization’s business situation will help provide a Data Scientist with the necessary context for their Data Science project right off the top. Getting background facts on the business will help the Data Scientist know what he or she is getting involved in – in the truest sense. This may not be obvious to the Data Scientist at first, but learning background facts about the business helps to uncover details that will round out one’s understanding of what the business has determined it needs as it relates to the Data Science project. Through this process, information on identifying resources most certainly bubbles to the surface. The takeaway: even if a Data Scientist has worked at the organization for years, this critical step should not be skipped. The business background is a dynamic concept that speaks to the circumstances or situation prevailing at a particular time – it should not be looked at as part of a one-and-done process. Data Scientists should be careful not to fall into the trap of believing that nothing has changed since the last Data Science project.

It Doesn’t Matter What the Business Wants – I Can Model Anyway!

Many Data Scientists forget the essential step of learning about the business from the business’s perspective. Since the business is the customer of the Data Scientist, this can be easily boiled down to “What does the customer truly want to accomplish?” This simple, straightforward question may seem frivolous to an inexperienced Data Scientist, but getting at what the business objectives are for any Data Science project will create a necessary roadmap for moving forward. The fact of the matter is that most businesses have many competing objectives and constraints that have to be properly balanced in order to be successful on a day-to-day basis. As the Data Scientist, one of your primary aims in ensuring a successful Data Science project is uncovering important, possibly derailing factors that can impact outcomes. Data Scientists should not advance the project workflow on the basis of their analytic talent alone, but rather take the time and necessary steps to learn the business objectives; otherwise, a Data Scientist runs the risk of being seen as a rogue employee with irrelevant results. At the end of a Data Science project, everybody can see clearly when a Data Scientist has come up with the right answer to the wrong problem. A Data Scientist with half the analytic skill can be more effective to an organization than a Data Scientist who squeezes every last bit of information gain from a dataset, but does not know how to frame the business problem.

What Do You Mean I Missed The Target?

As a Data Scientist who operates in business, you should want to know what it takes for your Data Science project to be successful. However, this cannot be only about the evaluation of predictive models or how a Data Scientist designs experiments; it must also address how the business will judge success. Learning how to frame up the business success criteria in the form of a question – and whether the criteria will be judged subjectively or objectively – will help a Data Scientist pinpoint the true target. An example of a business criterion that might be specific and measurable objectively would be “reduction of patient readmissions to below 19%.” An example of a business success criterion that is more subjective would be something like “gives actionable insights into the relationship of the data we have.” However, in this latter case, it only makes sense for the Data Scientist to ask who is making the call on what is useful and how “useful” is defined. Bottom line: if Data Scientists do not know what the business success criteria are for a Data Science project, they have already failed before the project has begun.
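
To illustrate what an objectively measurable criterion looks like in practice, here is a small R sketch that checks the “below 19%” readmission target from the example above. The discharge and readmission counts are invented for illustration, and the variable names are assumptions rather than anything prescribed by a particular business.

```r
# Hypothetical counts for one reporting period (not real data).
discharges   <- 1200
readmissions <- 210

# Business success criterion from the example: readmissions below 19%.
target_rate <- 0.19

readmission_rate <- readmissions / discharges
criterion_met <- readmission_rate < target_rate

readmission_rate  # 0.175
criterion_met     # TRUE
```

A subjective criterion cannot be reduced to a comparison like this, which is why the person who judges it needs to be named up front.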

Summary

Having a solid business understanding about a Data Science project will prove to be valuable for both the Data Scientist and the business. Real-world Data Scientists should not operate as an island. In reality they need to learn to speak many languages beyond Python, R, and Julia; they should also learn to speak “business.” The better a Data Scientist can understand the business milieu, the business objectives, and how to measure the success of a Data Science project in the eyes of the business, the more effective that Data Scientist will be for an organization.

7 Questions Every Data Scientist Should Be Answering for Businesses

 

Business professionals of all levels have asked me over the years what it is that they should know that their Data Science departments may not be telling them. To be candid, many Data Scientists operate in fear wondering what they should be doing as it relates to the business. In my judgment, the questions below address both parties with the common goal of a win-win for the organization: Data Scientists support their organization as they should while business professionals become more informed with each analysis.

What problem are we trying to solve?

 

It is important to be able to answer this question in the form of a sentence. Remember that the business end-user most likely does not use common terms like CV, logistic regression, or error-based learning in their everyday business routine. It does not help anyone when a Data Scientist hides behind fancy terms instead of providing actionable insight that moves the organization along. I can assure you that translating the Data Science jargon into something digestible for the business professional will create many allies. After all, a Data Scientist should have the primary skill of being able to transform complex ideas and make them readily understood.

Does the approach make sense?

 

In truth, this may be the single best question that benefits the Data Scientist even though it is asked primarily of the business professional. Learning to write out an effective analytic plan can have profound meaning. Writing is a discipline that should be embraced by the Data Scientist. It allows the Data Scientist to synthesize his or her thoughts. Although we live in a day and time where technology is at the center of everything we do, we should remember that technology, Data Science, and statistical computing are not replacements for critical thinking.

Does the answer make sense?

 

Can you make sense out of what you have found? Do you know how to explain the answer you have received? Your organization is counting on you to be the translation piece between the computer output and their business needs. Remember: computers simply do what they are told. As Data Scientists, we need to be sure we directed them to do the right thing. Validate that the instructions you gave were the ones you intended. Be scientific in your approach, document your assumptions, and be sure you have not introduced bias into your work.

Is it a finding or a mistake?

 

Not everything is a Eureka! moment. So, make skepticism a discipline as a Data Scientist. One should always be skeptical of surprise findings. Experience should tell you that if it seems wrong, then it probably is wrong. Do not blindly accept the conclusion your data presents to you. Again, there is no substitute for critical thinking. Make absolutely sure you understand, and can clearly explain, why things are the way they are – whether a finding or a mistake.

Does the analysis address the original intent?

 

Unless you are surrounded by other Data Scientists in your organization, this question requires accountability to one’s self. You should be honest with yourself, always ensuring that you are not aligning the outcome with the expectations of the organization. It may be obvious to note, but it is critical to speak the truth of the data, realizing sometimes that the outcome does not align with the question the business is seeking to answer. However, if your analysis is essentially something unflattering to the organization, be sure you are 100% confident in your findings. In this situation, additional analysis is more important than less. Giving an analysis that does not reflect well on the business – and that is not well substantiated – may very well be your last.

Is the story complete?

 

We would agree that the best speakers, writers, and leaders are all good storytellers; it is no different for the Data Scientist. While storytelling is not the only way to engage people with your ideas, it is certainly a critical part of the Data Science recipe. Do your best to tell an actionable story. Resist the urge to rely on your business audience to stitch the pieces of your data story together. After all, your analysis is too important to leave up to wild interpretations. Take time to identify potential holes in your story and fill them appropriately to avoid surprises. Grammar, spelling, and graphics matter; your audience will lose confidence in your analysis if your results look sloppy.

Where would we head next?

 

As Data Scientists we should realize that no analysis is truly ever finished – we simply run out of resources. It is worth the effort for a Data Scientist to understand and be able to explain what additional measures could be taken if the business was able to provide additional resources. In simple terms, the business professionals you work with, at the very least, will need to have that information so they can decide if it makes sense to move forward with the supplemental analysis.

Summary

It is key to remember that Data Science techniques are tools that we can use to help make better decisions for an organization and that the predictive models are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address – and then ensure that it does just that. These seven questions begin to form the bond of a stronger partnership between the data science department and the organization.

Math for Machine Learning

Stochastic Processes (Random Planar Maps)

Once you decide to get started with Data Science (in a serious way), the first few months (if not the first year) can seem pretty difficult. At times it may even seem hopeless, especially if you do not have the necessary academic background or it has been a while since you were in an academic setting. In my judgment, you should not let that stop you from your pursuit. There are many resources, online and offline, to help you start filling in your gaps.

Below are some of the key areas that you should have mastery over for you to go further with data science/machine learning:

Calculus

  • Functions
  • Continuity
  • Differentiability
  • Integration (single and multi-variables)
  • Optimization
  • Convexity/Concavity

Linear Algebra

  • Vectors
  • Matrices
  • Eigenvalues and Eigenvectors
  • Singular Value Decomposition
  • Least Squares Estimation and Matrix Algebra

Statistics/Probability

  • Basic probability
  • Sample spaces
  • Conditional probabilities and independence
  • Random variables
  • Moments
  • Distributions
  • Chi-Squared
  • F-Test
  • T-Test
  • Bayes’ Theorem
  • Marginalization
  • Bayesian Inference
  • Likelihood
  • Estimation
  • Regression
  • Analysis of Variance

Stochastic Processes and Dynamical Systems

  • Dirichlet Processes
  • Gaussian Processes for Machine Learning

What else do you think is necessary?

Introduction to Inference and Learning

Many of my subscribers have asked for some resources to help them get on a path toward a better understanding of inference and learning. Because individuals have various learning styles, there are both reading and video resources (I would recommend both).

  • Book: Murphy — Chapter 1 — Introduction
  • Book: Bishop — Chapter 1 — Introduction

Books mentioned above:

Machine Learning: A Probabilistic Perspective Kevin P. Murphy, MIT Press, 2012.

Pattern Recognition and Machine Learning Christopher M. Bishop, Springer, 2006. An excellent and affordable book on machine learning, with a Bayesian focus. It covers fewer topics than the Murphy book, but goes into more depth on the topics it covers.

If you have resources that you think that I missed, please let me know. If there is a resource that you particularly enjoyed I would like to hear from you as well.

R | Data Selection and Manipulation

The functions below aim to give a bit of background on data selection and manipulation in R.

  • which.max(x) returns the index of the greatest element of x
  • which.min(x) returns the index of the smallest element of x
  • rev(x) reverses the elements of x
  • sort(x) sorts the elements of x in increasing order; to sort in decreasing order: rev(sort(x))
  • cut(x,breaks) divides x into intervals (factors); breaks is the number of cut intervals or a vector of cut points
  • match(x, y) returns a vector of the same length as x giving, for each element of x, the position of its first match in y (NA if there is no match)
  • which(x == a) returns the indices of x for which the comparison is TRUE, in this example the values of i for which x[i] == a (the argument of this function must be of mode logical)
  • choose(n, k) computes the combinations of k events among n repetitions = n!/[(n−k)!k!]
  • na.omit(x) suppresses the observations with missing data (NA) (suppresses the corresponding line if x is a matrix or a data frame)
  • na.fail(x) returns an error message if x contains at least one NA
  • unique(x) if x is a vector or a data frame, returns a similar object but with the duplicate elements suppressed
  • table(x) returns a table with the counts of the different values of x (typically for integers or factors)
  • subset(x, …) returns a selection of x with respect to criteria (…, typically comparisons: x$V1 < 10); if x is a data frame, the option select gives the variables to be kept or dropped using a minus sign
  • sample(x, size) randomly samples size elements from the vector x without replacement; the option replace = TRUE samples with replacement
  • prop.table(x,margin=) table entries as fraction of marginal table
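
The short R session below is a sketch of how several of the functions listed above behave on a small vector. The vector `x`, the data frame `df`, and the chosen break points are made-up example data, not part of the original list.

```r
# Hypothetical example data used only for illustration.
x <- c(12, 7, 3, 7, 25, 18)

idx_max <- which.max(x)         # 5: position of the largest value (25)
idx_min <- which.min(x)         # 3: position of the smallest value (3)
decreasing <- rev(sort(x))      # 25 18 12 7 7 3

pos <- match(c(7, 99), x)       # 2 NA: first position of 7 in x; 99 is absent
sevens <- which(x == 7)         # 2 4: every position where x equals 7
distinct <- unique(x)           # 12 7 3 25 18

# Bin the values into the intervals (0,10], (10,20], (20,30] and count them.
bins <- cut(x, breaks = c(0, 10, 20, 30))
counts <- table(bins)           # 3, 2, and 1 values per interval
shares <- prop.table(counts)    # the same counts as fractions of the total

# Keep rows with V1 < 10 and drop column V2 using a minus sign.
df <- data.frame(V1 = x, V2 = letters[1:6])
small <- subset(df, V1 < 10, select = -V2)

n_pairs <- choose(5, 2)         # 10 ways to choose 2 items from 5

set.seed(42)                    # make the random draw reproducible
drawn <- sample(x, size = 3)    # 3 elements drawn without replacement
```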

 

Functions for Manipulating Character Variables
  • nchar(x) returns a vector of the lengths of each value in x
  • paste(a,b,sep="_") concatenates character values, using sep between them
  • substr(x,start,stop) extracts the characters from positions start to stop of x
  • strsplit(x,split) splits each value of x into a list of strings using split as the delimiter
  • grep(pattern,x) returns the indices of the elements of x that contain pattern (use value=TRUE to return the elements themselves)
  • grepl(pattern,x) returns a logical vector indicating whether each element of x contains pattern
  • regexpr(pattern,x) returns the integer position of the first occurrence of pattern in each element of x (-1 if there is no match)
  • gsub(pattern,replacement,x) replaces each occurrence of pattern with replacement
  • tolower(x) converts x to all lower case
  • toupper(x) converts x to all upper case
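
As a brief sketch of the character functions above, the following R lines apply them to a small vector of strings. The vector `s` is invented example data.

```r
# Hypothetical example strings used only for illustration.
s <- c("Data Science", "business plan", "Data feed")

lens <- nchar(s)                            # 12 13 9
joined <- paste("data", "science", sep = "_")  # "data_science"
first4 <- substr(s, 1, 4)                   # "Data" "busi" "Data"
parts <- strsplit(s, " ")                   # a list with one character vector per string

hits_idx <- grep("Data", s)                 # 1 3: indices of matching elements
hits_val <- grep("Data", s, value = TRUE)   # the matching elements themselves
hits_lgl <- grepl("Data", s)                # TRUE FALSE TRUE

first_a <- as.integer(regexpr("a", s))      # 2 12 2: position of the first "a"
swapped <- gsub("a", "A", s)                # every "a" replaced with "A"

upper <- toupper(s)
lower <- tolower(s)
```

Note that grep() and grepl() treat the pattern as a regular expression by default; passing fixed = TRUE matches it literally.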

 

Logical Operators
  • == is equal to
  • != is not equal to
  • > greater than
  • >= greater than or equal to
  • < less than
  • <= less than or equal to
  • %in% is in the list
  • ! not (reverses TRUE and FALSE)
  • & and
  • | or
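
The operators above work element by element on vectors. The sketch below uses a made-up vector `x` that includes a missing value to show how NA propagates through comparisons.

```r
# Hypothetical example vector with one missing value.
x <- c(3, 8, 15, NA)

eq8 <- x == 8                  # FALSE TRUE FALSE NA
ne8 <- x != 8                  # TRUE FALSE TRUE NA
between <- x > 5 & x < 10      # FALSE TRUE FALSE NA
outside <- x < 5 | x > 10      # TRUE FALSE TRUE NA

member <- x %in% c(3, 15)      # TRUE FALSE TRUE FALSE (never NA)
not_member <- !member          # FALSE TRUE FALSE TRUE

# Combine conditions to filter, excluding the missing value explicitly.
kept <- x[!is.na(x) & x >= 8]  # 8 15
```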

 

R | Variable Information

If you want to know a little bit more about the variables you are working with try out these R commands.

  • is.na(x), is.null(x), is.array(x), is.data.frame(x), is.numeric(x), is.complex(x), is.character(x), … test for type; for a complete list, use methods(is)
  • length(x) number of elements in x
  • dim(x) Retrieve or set the dimension of an object; dim(x) <- c(3,2)
  • dimnames(x) Retrieve or set the dimension names of an object
  • nrow(x) number of rows; NROW(x) is the same but treats a vector as a one-column matrix
  • ncol(x) and NCOL(x) the same for columns
  • class(x) get or set the class of x; class(x) <- "myclass"
  • unclass(x) remove the class attribute of x
  • attr(x,which) get or set the attribute which of x
  • attributes(obj) get or set the list of attributes of obj
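
A short sketch of these commands in use follows. The objects `m` and `obj`, the dimension names, and the "source" attribute are hypothetical examples chosen only to show each call.

```r
# Start with a plain integer vector (hypothetical example data).
m <- 1:6

numeric_ok <- is.numeric(m)    # TRUE
n_elem <- length(m)            # 6
rows_vec <- NROW(m)            # 6: a vector is treated as a one-column matrix
cols_vec <- NCOL(m)            # 1
no_dim <- is.null(nrow(m))     # TRUE: nrow() returns NULL for a plain vector

# Setting the dimension turns the vector into a 3 x 2 matrix.
dim(m) <- c(3, 2)
dimnames(m) <- list(c("a", "b", "c"), c("u", "v"))
n_rows <- nrow(m)              # 3
n_cols <- ncol(m)              # 2

# Set and remove a class attribute on a list.
obj <- list(id = 1)
class(obj) <- "myclass"
obj_class <- class(obj)        # "myclass"
plain <- unclass(obj)          # back to an ordinary list

# Attach a custom attribute and inspect all attributes.
attr(m, "source") <- "hypothetical"
attr_names <- names(attributes(m))
```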