Getting Started with Machine Learning

In truth, I am an advocate for jumping in head first and applying what you learn in real time. Practically speaking this means learn less about all the theory and heavy math behind what it is you are using with the attitude that you will move towards understanding.

Do you know how to program in a specific language? If so, find out whether that language has a library you can lean on in your machine learning journey.

If you do not know how to program, that is okay too. Survey a few languages (R and Python are popular among data scientists), see whether one is more understandable to you, and then go down the same path: seek out a machine learning library.

Shhh, it’s a Library

No Programming Necessary
  • WEKA – you can do virtually everything with this workbench: pre-processing the data, visualizing the data, building classifiers, and making predictions.
  • BigML – Like WEKA, BigML does not require you to program. You can explore model building in a browser. If you are not certain about machine learning (or data science for that matter), this would be a great place to start.
R (Statistical Computing)
  • If you really enjoy math and have not picked a language yet, then this may be for you. There are a lot of packages here, developed by pioneers in the field, which you can use without having to rewrite any code. All packages come with instructions, giving you some of the theory and example cases to see in action. In my judgment, learning this language allows you to explore and prototype quickly, which will most certainly prove valuable.
Python
  • scikit-learn – If you enjoy Python, then this library is for you. It is known for its documentation, which helps you quickly apply most standard machine learning algorithms.
Octave
  • Octave is an open-source alternative to MATLAB that is largely compatible with it (some functions are not present). Like MATLAB, Octave is known for solving linear and non-linear problems. If you have an engineering background, this might be the place for you. Practically speaking, though, many organizations do not use Octave/MATLAB because they see it as primarily academic software.

No matter what you pick, decide to use it and stick with it for a while. In fact, I would commit to it for the next 12 months. Actually use the language/library you choose; do not just read about it.

Learning Online

If you are really a beginner, you may want to steer clear of some of what you see online. Many people I talk to like the idea of data science and machine learning and decide to sign up for an online course. The problem they encounter is that, in many cases, the course assumes they already know how to program (to some degree) and know linear algebra and probability theory.

Linear Algebra Example
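
As a rough sketch of the kind of linear algebra an online course tends to assume, the snippet below computes a linear model's predictions as a matrix-vector product using only plain Python lists. The data, the weights, and the helper names `dot` and `mat_vec` are invented for illustration; they do not come from any particular course.

```python
# Illustrative only: the numbers and helper names are invented for this sketch.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


# Each row is one example with two features (hypothetical data).
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
weights = [0.5, -1.0]  # hypothetical model weights

# A linear model's predictions are the product of X and the weight vector.
predictions = mat_vec(X, weights)
print(predictions)
```

If expressions like "the product of X and the weight vector" already feel comfortable, the math in most introductory courses should be within reach.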

Probability Theory Example
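
As a rough sketch of the probability theory such courses lean on, the snippet below applies Bayes' theorem to a toy spam-filter question: given that a message contains a certain word, how likely is it to be spam? The probabilities and the function name `bayes_posterior` are assumptions made up for this example.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) for a yes/no hypothesis H using Bayes' theorem.

    prior                -- P(H)
    likelihood           -- P(E | H)
    likelihood_given_not -- P(E | not H)
    """
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1 in 5 messages is spam, the word appears in
# 3 of 5 spam messages and in 1 of 20 legitimate messages.
p_spam = Fraction(1, 5)
p_word_given_spam = Fraction(3, 5)
p_word_given_ham = Fraction(1, 20)

posterior = bayes_posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(posterior)  # 3/4
```

Using `Fraction` keeps the arithmetic exact, so the result can be checked by hand: (3/25) / (3/25 + 1/25) = 3/4.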

If you do decide to watch classes online, then you should absolutely take notes (even if you toss them later). The key is to participate – which may sound obvious, but when you are at home in your pajamas learning about data science it is not quite so obvious.

That being said, there are some really good (and free) online lectures (do not be overwhelmed):

Research Papers

This may not be your thing either; not everybody likes to pick up a research paper. Many people complain that the writing is too academic and does not really convey insight to the reader (which is the opposite of what a paper intends). To be candid, some are written better than others, and in many cases that has to do with the topic or the period in which the paper was written. However, there are a few seminal papers you should be acquainted with. They give you context for machine learning and data science, which should prove invaluable in your journey. My encouragement to you is to find these papers. If you are not ready to read them because you are busy building skills in other areas, simply hold on to them and try reading them again every three months. See how far you get without getting lost, see whether having read the paper gives you a deeper understanding of what you are doing when you code a solution, and, best of all, read the reference list to find out who influenced the paper.

Machine Learning Books for those Just Starting

Let’s face it: there are not a lot of books out there aimed at those just starting out in machine learning. As before, the expectation is that you will have some linear algebra or probability theory down pat. Unless you come from the hard sciences (mathematics, engineering, biostatistics, etc.), you will probably have to do some skill building even before reading most of the books on the market. However, there are a few that start closer to where most beginners actually are and encourage those of you willing to try on your own.

Curious to know your thoughts on the above. Have you used any of these resources? Do you have any that you would recommend?

6 thoughts on “Getting Started with Machine Learning”

  1. I am struggling a bit with the topics of “Exploratory Data Analysis” and “Feature Engineering”. In my Kaggle ventures, this has been the hardest part for me so far, yet it seems like one of the most important aspects in Kaggle competitions (and from that I dare to extrapolate to the “real world”).

    Do you know any good resources or strategies to build up knowledge on these topics? I am not even sure what the right search terms would be. Maybe that is because much of feature engineering depends on actual domain knowledge of the problem at hand and cannot be generalized?

  2. I do agree with all the ideas you have presented for your post. They are really convincing and will definitely work. Still, the posts are too quick for newbies. Please consider lengthening them a little next time?

    Thanks for the post.

  3. I am glad you mentioned WEKA. A lot of machine learning folks can be a bit snobbish about their tools and leave out WEKA.

  4. How much time would I have to spend with Machine Learning to get really good at it? It seems like there is so much to actually learn.

  5. “Practically speaking this means learn less about all the theory and heavy math behind what it is you are using with the attitude that you will move towards understanding.”

    This type of quote gives hope to us mere mortals!