R: Getting Help

Most R functions have online documentation.

 

  • help(topic) documentation on topic
  • ?topic shorthand for help(topic)
  • help.search("topic") search the help system
  • apropos("topic") the names of all objects in the search list matching the regular expression "topic"
  • help.start() start the HTML version of help
  • str(a) display the internal *str*ucture of an R object
  • summary(a) gives a “summary” of a, usually a statistical summary; it is generic, meaning it behaves differently for different classes of a
  • ls() show objects in the current environment; specify pattern="pat" to search on a pattern
  • ls.str() str() for each variable in the current environment
  • dir() show files in the current directory
  • methods(a) shows S3 methods of a
  • methods(class=class(a)) lists all the methods available for objects of the class of a
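As a rough illustration of a few of the helpers listed above, here is a minimal sketch. The vector `scores` and the name `hits` are made-up examples, not part of the original list.

```r
# Hypothetical example object, used only for illustration
scores <- c(72, 85, 90, 66)

str(scores)       # compact view of the object's internal structure
summary(scores)   # for a numeric vector: minimum, quartiles, mean, and maximum

# apropos() takes a regular expression and returns matching object names
hits <- apropos("^read\\.csv")
print(hits)

# ls() with a pattern lists matching objects in the current environment
print(ls(pattern = "^sco"))
```

Running `?read.csv` or `help(read.csv)` in an interactive session would then open the documentation for one of the names found by apropos().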

Getting Started with Machine Learning

In truth, I am an advocate for jumping in head first and using what you learn in real time. Practically speaking, this means spending less time up front on the theory and heavy math behind the tools you are using, with the attitude that understanding will come as you go.

Do you know how to program in a specific language? If so, then determine if that language has a library which can be leveraged to aid you in your machine learning journey.

If you do not know how to program, that is okay too. Survey a few languages (R and Python are popular among data scientists), see whether one of them is more understandable to you, and then go down the same path: seek out a machine learning library.

Shhh, it’s a Library

No Programming Necessary
  • WEKA – you can do virtually everything with this workbench: pre-processing the data, visualizing the data, building classifiers, and making predictions.
  • BigML – Like WEKA, BigML does not require you to program. You can explore model building in a browser. If you are not certain about machine learning (or data science, for that matter), this would be a great place to start.
R (Statistical Computing)
  • If you really enjoy math and have not picked a language yet, then this may be for you. There are a lot of packages here, developed by pioneers in the field, which you can use without having to rewrite any code. Packages come with documentation that gives you some of the theory along with example cases you can see in action. In my judgment, learning this language allows you to explore and prototype quickly, which will most certainly prove valuable.
Python
  • scikit-learn – If you enjoy Python, then this library is for you. It is known for its documentation, which helps you rapidly put virtually any machine learning algorithm to use.
Octave
  • Octave is an open-source alternative to MATLAB (some functions are not present). Like MATLAB, Octave is known for solving linear and non-linear problems. If you have an engineering background, then this might be the place for you. Practically speaking, though, many organizations do not use Octave/MATLAB, as it is seen as primarily academic software.

No matter what you pick, decide to use it and stick with it for a while. In fact, I would commit to it for the next 12 months. Actually use the language/library you choose; do not just read about it.

Learning Online

If you are really a beginner, you may want to steer clear of some of what you see online. Many people I talk to like the idea of data science and machine learning and decide to sign up for an online course. The problem they encounter is that, in many cases, they are expected to already know how to program (to some degree) as well as linear algebra and probability theory.

Linear Algebra Example

Probability Theory Example

If you do decide to watch classes online, then you should absolutely take notes (even if you toss them later). The key is to participate – which may sound obvious, but when you are at home in your pajamas learning about data science it is not quite so obvious.

That being said there are some really good (and free) online lectures (do not be overwhelmed):

Research Papers

This may not be your thing either; not everybody likes to pick up a research paper to read. Many people complain that the writing is a bit too academic and does not really convey insight to the reader (which is the opposite of the paper’s intent). To be candid, some are written better than others, and in many cases that has to do with the topic or the period in which the paper was written. However, there are a few seminal papers you should be acquainted with; they will give you context for machine learning and data science, which should prove invaluable in your journey. My encouragement to you is to find these papers. If you are not ready to read them because you are busy building skills in other areas, simply hold on to them and test-read them every 3 months. See how far you get without getting lost, see whether having read the paper helps you understand at a deeper level what you are doing when you code a solution, and, best of all, read the reference page to find out who influenced the paper you read.

Machine Learning Books for those Just Starting

Let’s face it, there are not a lot of books out there that aim to help those just starting out in machine learning. As before, the expectation is that you will have some linear algebra or probability theory down pat. Unless you come from the hard sciences (mathematics, engineering, biostatistics, etc.), you will probably have to do some skill building here even before reading most of the books on the market. However, there are a few that start from where most beginners actually are and that encourage those of you willing to try on your own.

Curious to know your thoughts on the above. Have you used any of these resources? Do you have any that you would recommend?

R Objects: Data Types

  • OBJECTS
  • R has five basic classes of objects:
  1. Character
  2. Numeric
  3. Integer
  4. Complex
  5. Logical

However, the most basic object is a vector.

  • There are two things which you should remember when dealing with vectors.
    1. A vector can only contain objects of the same class.
    2. The exception is a list: a list looks like a vector, but its elements can be of different classes.
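A small sketch of the difference between the two rules above, using made-up values (the names `v` and `l` are only for illustration):

```r
# c() coerces mixed inputs to a single common class (here, character)
v <- c(1, "a", TRUE)
class(v)            # "character"

# list() keeps each element's own class
l <- list(1, "a", TRUE)
sapply(l, class)    # "numeric" "character" "logical"
```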

To create an empty vector use the following function:

vector()

Vectors can be created or reordered by using the following:
c() # concatenates individual values into a vector
: # creates a sequence, such as 1:10
seq() # creates more complex sequences
rep() # replicates values
sort() # returns the elements of a vector in sorted order
order() # returns the indices that would sort a vector
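The following sketch shows vector() with and without arguments, and how sort() and order() differ. The values in `x` are made up for illustration.

```r
empty <- vector()               # with no arguments: an empty logical vector
zeros <- vector("numeric", 3)   # a numeric vector of three zeros

x <- c(30, 10, 20)              # made-up values
sorted <- sort(x)               # 10 20 30
idx <- order(x)                 # 2 3 1: positions of the smallest, middle, and largest value
x[idx]                          # indexing by order() gives the same result as sort(x)
```

order() is the one to reach for when a second vector needs to be rearranged to match the sorted first one.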

 

An example of using rep()

rep(5,2) #a vector of two fives
[1] 5 5
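rep() can also repeat a whole vector or each element in turn. A short sketch, with the names `a` and `b` chosen only for illustration:

```r
a <- rep(c(1, 2), times = 3)   # repeat the whole vector: 1 2 1 2 1 2
b <- rep(c(1, 2), each = 2)    # repeat each element: 1 1 2 2
print(a)
print(b)
```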

 

An example of using c()

c(3,2,1) # vector of three numeric elements in that order
[1] 3 2 1

 

An example of using seq()

seq(4, 20, by = 2)
[1]  4  6  8 10 12 14 16 18 20
seq(1, by = 4, length.out = 20)
 [1]  1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77
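A few more variations on seq(), sketched with made-up endpoints (the variable names are illustrative only):

```r
grid <- seq(0, 1, length.out = 5)   # five evenly spaced values: 0.00 0.25 0.50 0.75 1.00
first_four <- seq_len(4)            # 1 2 3 4
down <- seq(10, 1, by = -3)         # a negative step counts down: 10 7 4 1
print(grid)
print(first_four)
print(down)
```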

R: Numbers

In general, numbers in R are treated as numeric objects.

For example,

 3 # numeric object
[1] 3
 3L # explicitly gives an integer
[1] 3
 Inf # a special number which represents infinity
[1] Inf
 1/0
[1] Inf
 1/Inf # can be used in calculations
[1] 0
 0/0 # NaN ("not a number"); an undefined result, which R also treats as missing
[1] NaN
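These special values can be tested for directly. A minimal sketch using base R predicates (the variable names are illustrative only):

```r
nan_value <- 0/0
neg_inf <- -1/0            # -Inf

is.nan(nan_value)          # TRUE
is.na(nan_value)           # TRUE: NaN also counts as missing
is.nan(NA)                 # FALSE: NA is missing but is not NaN
is.finite(1/0)             # FALSE
is.infinite(neg_inf)       # TRUE
```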

 

Decimal values are also numeric in R. This is the default: if you assign a decimal value to x, x will be of the numeric type.

 x = 8.3 # create x with a decimal value
 x # print the value of x
[1] 8.3
 class(x) # what is the class of x?
[1] "numeric"

 

Even when you assign a whole number to a variable such as N, it is still stored as a numeric value.

 N = 43
 N #print the value of N
[1] 43
 class(N) # what is the class of N?
[1] "numeric"

 

You can further confirm that N is not an integer by using the is.integer function.

 is.integer(N) # is N an integer?
[1] FALSE
 is.numeric(N) # is N numeric?
[1] TRUE
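If an integer is what you actually want, it has to be requested explicitly. A short sketch, where `M` is a hypothetical name introduced here for the converted value:

```r
N <- 43
M <- as.integer(N)    # convert the numeric value to an integer
class(M)              # "integer"
is.integer(43L)       # TRUE: the L suffix creates an integer directly
typeof(N)             # "double": numeric values are stored as doubles
as.integer(8.3)       # 8: the fractional part is dropped
```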