R | Data Selection and Manipulation

This functions below aim to give a bit of background on data and data manipulation in R.

  • which.max(x) returns the index of the greatest element of x
  • which.min(x) returns the index of the smallest element of x
  • rev(x) reverses the elements of x
  • sort(x) sorts the elements of x in increasing order; to sort in decreasing order: rev(sort(x))
  • cut(x,breaks) divides x into intervals (factors); breaks is the number of cut intervals or a vector of cut points
  • match(x, y) returns a vector of the same length than x with the elements of x which are in y (NA otherwise)
  • which(x == a) returns a vector of the indices of x if the comparison operation is true (TRUE), in this example the values of i for which x[i] == a (the argument of this function must be a variable of mode logical)
  • choose(n, k) computes the combinations of k events among n repetitions = n!/[(n−k)!k!]
  • na.omit(x) suppresses the observations with missing data (NA) (suppresses the corresponding line if x is a matrix or a data frame)
  • na.fail(x) returns an error message if x contains at least one NA
  • unique(x) if x is a vector or a data frame, returns a similar object but with the duplicate elements suppressed
  • table(x) returns a table with the numbers of the differents values of x (typically for integers or factors)
  • subset(x, …) returns a selection of x with respect to criteria (…,
  • typically comparisons: x$V1 < 10); if x is a data frame, the option
  • select gives the variables to be kept or dropped using a minus sign
  • sample(x, size) resample randomly and without replacement size elements in the vector x, the option replace = TRUE allows to resample with replacement
  • prop.table(x,margin=) table entries as fraction of marginal table


Functions for Manipulating Character Variables
nchar(x) a vector fo the lengths of each value in x
paste(a,b,sep=”_”) concatenates character values, using sep between them
substr(x,start,stop) extract characters from positions start to stop from x
strsplit(x,split) split each value of x into a list of strings using split as the delimiter
grep(pattern,x) return a vector of the elements of x that included pattern
grepl(pattern,x) returns a logical vector indicating whether each element of x contained pattern
regexpr(pattern,x) returns the integer positions of the first occurrence of pattern in each element of x
gsub(pattern,replacement,x) replaces each occurrence of pattern with occurrence
tolower(x) converts x to all lower case
toupper(x) converts x to all upper case


Logical Operators
== is equal to
!= is not equal to
> greater than
>= greater than or equal to
< less than
<= less than or equal to
%in% is in the list
! not (reverses T & F
& and
| or


Please note: I reserve the right to delete comments that are offensive or off-topic.

Leave a Reply

Your email address will not be published. Required fields are marked *

3 thoughts on “R | Data Selection and Manipulation

  1. This is really helpful. Thanks for sharing this information. Special thanks to the people who came up with these cheatsheets.