How to Become a Data Scientist

How does one become a data scientist?

Well, in truth, the path is most certainly clear. However, the work it takes to travel down the road is not for everyone. Before reading this, you may want to take stock of your current analytic skills (e.g. MS Excel only, maybe a little bit of SQL, Crystal Reports, etc.). Use the rest of this article as a measuring stick for where you are and where you would like to go. In fact, it is best to begin with the end in mind, work backwards to the most basic skill you will need, and start building from there…

Recently DataCamp posted an infographic which described 8 easy steps to become a data scientist.

Figure: A portion of the infographic posted on the DataCamp blog

What is a Data Scientist

It’s important to understand what this infographic is based on:

  1. Drew Conway’s data science Venn diagram, which combines hacking skills, math and statistics knowledge, and substantive expertise.
  2. A graph showing the survey results on the question of education level, not unlike the graph in O’Reilly’s Analyzing the Analyzers.
  3. Josh Wills’ quote on what a data scientist is.

Become a Data Scientist

Using the infographic, the 8 steps to becoming a data scientist are:

  1. You need to know (there is a spectrum here) stats and machine learning. The fix – take online courses for free.
  2. Learn to code (not everything, but very specific things). Get a book or take a class (online or offline). Popular languages are Python and R in the data science space.
  3. You should understand databases. This is important because for the most part this is where the data lives.
  4. Critical skills are data munging (data clean-up and transformations), visualization, and reporting.
  5. You will need to Biggie-Size your skills. Learn to use tools like Hadoop, MapReduce, and Spark.
  6. This part is extremely important: get experience. You should be meeting with other data scientists in meetups or talking with people in your office about what you are learning and accomplishing with your enhanced skills. Do yourself a favor: obtain a data set online and start exploring it with your newfound techniques. I recommend Kaggle and CrowdAnalytx for interesting data sets.
  7. Get yourself one of these: an internship, a bootcamp, or a job. You can’t beat real experience.
  8. Know who the players are in this space and why. Follow them and engage with them, and be a part of and engage with the data science community.
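
Steps 3 and 4 can be tried out in a few lines. What follows is only a minimal sketch, assuming Python with nothing beyond its standard library. The `sales` table, its columns, and the sample rows are made up for illustration and do not come from the infographic.

```python
import sqlite3

# Hypothetical raw records: inconsistent casing, stray spaces, a missing amount.
RAW_ROWS = [
    (" north ", "120.5"),
    ("South", "80"),
    ("NORTH", None),
    ("south ", "19.5"),
]


def clean(rows):
    """Data munging: normalize region names and drop rows with no amount."""
    cleaned = []
    for region, amount in rows:
        if amount is None:
            continue  # skip records that cannot be used
        cleaned.append((region.strip().lower(), float(amount)))
    return cleaned


def totals_by_region(rows):
    """Load cleaned rows into an in-memory database and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        query = (
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_by_region(clean(RAW_ROWS)))
```

The clean-up happens in plain Python and the aggregation in SQL, which mirrors the split between munging and database work in the list above. A real project would likely swap in a proper database and a data frame library.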

My thoughts…

In my judgement, you should look at the data and the algorithms first, then get busy with the math and programming. However, I do agree with the idea of moving through steps 1-5 for the sake of familiarity with the discipline. Steps 6-7 I would categorize as working the problem, and the final step would be plugging into a community.

It may be important to go another step forward. 

It is more intuitive to condense steps 1-5 into one (this could be a crash course in the terms and themes relevant to data science). My preference (it's what has worked for me) is to jump in with the data and the tools of the trade as soon as possible. More people need to develop just-in-time learning mechanisms, rather than learning the entire universe of a topic. Approaching data science in this way allows an individual to build on a combination of theory and practical experience. This is done by encountering problem sets over and over again.

Learn the art of relevance…what makes sense for my situation right now. Obtain a solid data set and get learning. This sort of action works to build context for the tools you are using.

The fastest way to become a data scientist is to recognize where you are with your current skills, grab a data set, pick a language (R, Python, Julia, C++, Matlab, etc.) and start working through a problem end-to-end.
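
To make "end-to-end" concrete, here is one very small sketch, assuming Python and its standard library only. The numbers are invented for illustration, and the helper names `fit_line` and `predict` are hypothetical. The model is an ordinary least-squares line, chosen because it is simple, not because it is the right tool for every problem.

```python
from statistics import mean

# Made-up observations: hours of practice vs. score on some task.
HOURS = [1.0, 2.0, 3.0, 4.0, 5.0]
SCORES = [52.0, 54.0, 61.0, 64.0, 69.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Use the fitted line to estimate y for a new x."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line(HOURS, SCORES)
    print(f"score is roughly {slope:.2f} * hours + {intercept:.2f}")
    print(f"estimate for 6 hours: {predict(slope, intercept, 6.0):.1f}")
```

The value is in the loop itself: get data, ask a question, fit something simple, and check whether the answer makes sense before reaching for heavier tools.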

What do you think it takes to be a data scientist?



The notion of a function is that of something which provides a distinct output for a given input.


Think about two sets, D and R, along with a rule which assigns a unique element of R to each and every element of D. This rule is termed a function and it is represented by a letter such as f. Given an x ∈ D, f(x) is the name of the element of R which results from applying f to x. D is called the domain of f. To indicate that D is the domain of f, the notation D(f) may be used. The set R is sometimes described as the range of f. Nowadays it is known as the codomain. The set of all elements of R which are of the form f(x) for some x ∈ D is therefore a subset of R. This is sometimes referred to as the image of f. When this set equals R, the function f is said to be onto, also surjective. If whenever x ≠ y it follows that f(x) ≠ f(y), the function is called one-to-one, also injective.

It is typical to write f : D → R to denote the situation just described in this definition, where f is a function defined on a domain D which has values in a codomain R.
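
For finite sets, these definitions can be checked directly. The following is a small sketch in Python, assuming D and R are given as finite sets. The helper names `image`, `is_surjective`, and `is_injective` are made up for this illustration.

```python
def image(f, domain):
    """Return the image of f: the set of all f(x) for x in the domain."""
    return {f(x) for x in domain}


def is_surjective(f, domain, codomain):
    """f is onto when its image equals the whole codomain."""
    return image(f, domain) == set(codomain)


def is_injective(f, domain):
    """f is one-to-one when distinct inputs always give distinct outputs.

    On a finite domain this holds exactly when the image has as many
    elements as the domain.
    """
    domain = set(domain)
    return len(image(f, domain)) == len(domain)


def square(x):
    return x * x


def double(x):
    return 2 * x


if __name__ == "__main__":
    D1, R1 = {-2, -1, 0, 1, 2}, {0, 1, 4}
    # square : D1 -> R1 hits every element of R1, but -2 and 2 collide.
    print(is_surjective(square, D1, R1), is_injective(square, D1))

    D2, R2 = {0, 1, 2}, {0, 1, 2, 3, 4}
    # double : D2 -> R2 never collides, but misses 1 and 3.
    print(is_surjective(double, D2, R2), is_injective(double, D2))
```

Here `square` on D1 is onto but not one-to-one, while `double` on D2 is one-to-one but not onto, so the two properties really are independent of each other.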