Introduction to using R with org-babel, Part 1

May 23, 2010

In my opinion, the description of orgmode by its creator as a tool “for keeping notes, maintaining ToDo lists, doing project planning, and authoring with a fast and effective plain-text system” is a little like describing Emacs as a “text editor”. While technically accurate, one not acquainted with orgmode might not be immediately persuaded to learn it based on its pedestrian description. Needless to say, I think it is worthwhile.

While there are plenty of tutorials and a great introduction to orgmode on its site, there exists a relatively new orgmode extension called org-babel whose documentation, although complete and accurate, might benefit from a high-level overview showing how it can be used to write an R program in an orgmode buffer.

What can orgmode do with source code?

Even without org-babel, you can create a source code block in an orgmode buffer. Why would you want to do this? Mostly so that when you export the orgmode buffer to HTML, the source code looks like source code. It will be displayed in a monospace font and colored just as it is in your Emacs buffer.

To accomplish this, you would put something like the following in your org-mode document. I will use the programming language R as an example. To define a source code block, simply use the #+BEGIN_SRC syntax along with the major mode of the language, in this case ‘R’.

#+BEGIN_SRC R
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC

Then, when you export the orgmode document to, say, HTML, your R code will look just as it does in Emacs with syntax highlighting, and will be offset from the text surrounding it, like this:

x <- 1:5

square <- function(x) {
  x^2
}

square(x)

A note about syntax highlighting in Emacs

Ideally, when point moved into a code block, Emacs would recognize this, and the syntax highlighting within the code block would reflect whatever language you had specified as you typed it. Unfortunately, it is difficult to get Emacs to behave this way, at least with orgmode. I have investigated several options for doing this, but have run into problems with all of them. Orgmode’s solution is to create an indirect buffer containing the source code you’re editing when you type C-c ' (i.e., Control-C, then a single quote) in a source code block. You don’t have to do this, but it is nice to get your program in a temporary buffer that is set to the right mode. For R, this also means you can use all the usual ESS shortcut keys.

What does org-babel add to what orgmode can already do?

Org-babel lets you execute source code blocks in an orgmode buffer. Well, what does that mean? At its simplest, org-babel lets you submit a source code block, like the one above, to an R process, and places the results in the actual orgmode buffer.

You activate the additional features that org-babel provides by giving the #+BEGIN_SRC line in an orgmode buffer special arguments, some of which I describe below. All the available options are described in the org-babel documentation. I will show some basic examples of how to add arguments to a #+BEGIN_SRC line.

However, since submitting a code block to a new R process and placing the results in the orgmode buffer is the default behavior of org-babel, you don’t need to supply any arguments to the source code block for this simple operation. You simply carry out this process by typing C-c C-c in a source code block.

#+BEGIN_SRC R
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC


In your org-mode buffer, with point in the code block, type C-c C-c. An R process is started, and the following is placed in the orgmode buffer.

#+results:
|  1 |
|  4 |
|  9 |
| 16 |
| 25 |

Since we used the default value for the results argument (i.e., we didn’t specify anything), org-babel collected the value of the last expression and put it in an org-table, as shown above. If you just want the output as it would appear in your ESS R process buffer, you can use the value ‘output’ for the results argument, as shown below. You specify an argument by typing a ‘:’ followed by one of the valid parameter names, then a space, and then the value of the parameter. So, in the example below, results is the name of the parameter, and we’re setting its value to ‘output’. The default value for the results parameter is ‘value’, which gives you the same results as above, i.e., a table of the last value returned by the code block.

#+BEGIN_SRC R :results output
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC

#+results:
: [1]  1  4  9 16 25

Since we set results to ‘output’, we see the familiar R notation for printing a vector. The default value, where an org-table is created, would be useful for passing the resulting table to perhaps a different function within the orgmode buffer, even one programmed in another language. That is a feature that org-babel supports and encourages, but I will not describe that further here.
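The difference between the two settings is easiest to see in a block that both prints something and returns a value. The following is a small sketch of my own, not taken from the org-babel documentation; the comments describe what I would expect each setting to capture.

```r
x <- 1:5

# Printed along the way: shows up under ':results output',
# but is not part of the value of the block.
print(range(x))

# Last expression: this is what ':results value' (the default) collects.
# It is also auto-printed, so ':results output' shows it as well.
x^2
```

In other words, ‘value’ keeps only what the block returns, while ‘output’ keeps everything R would have printed.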

Personally, I set results to ‘output’ when writing the entries for this blog, since it shows readers what the actual R output will look like. This is nice, because there is no cutting and pasting of the results! I can just write my R code in a source code block, and then add a special argument so that both the code and its results are included when I export to HTML from orgmode. To accomplish that, my code block would look like this. Notice the new argument, exports, and its value ‘both’, as in both the code and its results.

#+BEGIN_SRC R :results output :exports both 
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC

Putting the above code in an orgmode buffer and exporting to HTML will show the following:

x <- 1:5

square <- function(x) {
  x^2
}

square(x)
[1]  1  4  9 16 25

Indeed, using org-babel in this fashion is how I now generate content for this site. Compare this approach to what I used to do with Sweave. The two approaches are similar, but I now have a completely self-contained orgmode approach, whereas before I had to preprocess the orgmode buffer with an Sweave function before export. You can see the difference not only in how I specify code blocks, but also in how the appearance of the output has changed.

Note that the one thing I still have to do in an Elisp function is insert inline CSS code to generate the background color and outline of the code and results blocks. As far as I know, there is no way to do this in orgmode yet, since the HTML it exports keeps the CSS style information in the HTML header.

Why do things this way?

I think org-babel is interesting for several reasons. First, since R was one of the first languages it supported, and I use R as my main programming language at my job, I was naturally interested in it. Org-babel is also very useful for me because I am used to working in orgmode, so I can add links, tags, TODO items, notes, and more right in my code. It’s like having a hyper-commenting system that can be crosslinked with everything. Orgmode’s intuitive visibility cycling system gives me a powerful code-folding feature. Orgmode properties allow me to collect meta-information about my code, such as what objects a code block creates.

The export feature of orgmode is very useful for producing this blog, and producing readable documentation for my source code. Also, if I am writing a program to be run in a script, a very common task, org-babel can handle that through a feature called tangling, which I will cover in a future article. The tangling feature turns orgmode into a tool to perform literate programming.

What is left to cover?

In addition to tangling, mentioned above, there are several important features that I have not even touched on in this short introduction. In the subsequent article, I want to show how I use R with org-babel to include inline images created with R, and how to generate LaTeX output that can be previewed inline in orgmode documents. Using org-babel in this manner closely mirrors a ‘notebook’ style interactive session such as Mathematica provides. The other main feature that org-babel provides is its use as a meta-programming language to, say, call R functions using data generated from a shell script or Python program. Org-babel is a very interesting project that is definitely worth your time to check out, especially if you’re already an Emacs or orgmode user. If you’ve read through this post, you can get started by reading the official org-babel introduction, which describes what to include in your .emacs file to set up org-babel.

R Function of the Day: sample

May 23, 2010

The R Function of the Day series will describe in plain language how certain R functions work, using simple examples
that you can apply to gain insight into your own data.

Today, I will discuss the sample function.

Random Permutations

In its simplest form, the sample function can be used to return a
random permutation of a vector. To illustrate this, let’s create a
vector of the integers from 1 to 10 and assign it to a variable x.

x <- 1:10

Now, use sample to create a random permutation of the vector x.

sample(x) 
 [1]  3  2  1 10  7  9  4  8  6  5

Note that if you give sample a vector of length 1 (e.g., just the
number 10), it will do the exact same thing as above, that is,
create a random permutation of the integers from 1 to 10.

sample(10) 
 [1] 10  7  4  8  2  6  1  9  5  3

Warning!

This can be a source of confusion if you’re not careful. Consider the
following example from the sample help file.

sample(x[x > 8])
[1] 10  9

sample(x[x > 9])
 [1]  9  3  4  8  1 10  7  5  2  6

Notice how the first output is of length 2, since only two numbers are
greater than eight in our vector. But because only one number (that
is, 10) is greater than nine, sample receives a vector of length 1,
treats it as a request to sample from the integers 1 to 10, and
therefore returns a vector of length 10.
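One way to sidestep this, sketched here along the lines of the examples in the sample help file, is to sample positions instead of values. The helper name resample is just a label for this post, not a function that base R provides.

```r
# Hypothetical helper: sample positions, then index into the vector,
# so a length-one numeric vector is never reinterpreted as 1:x.
resample <- function(x, ...) {
  x[sample.int(length(x), ...)]
}

x <- 1:10
resample(x[x > 8])   # a permutation of 9 and 10
resample(x[x > 9])   # always 10
resample(x[x > 10])  # integer(0): nothing to sample from
```

Because sample.int only ever sees the length of the vector, the result always has the same length as the input, whatever values it contains.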

The replace argument

Often, it is useful to not simply take a random permutation of a
vector, but rather sample independent draws of the same vector. For
instance, we can simulate a Bernoulli trial, the result of the flip of
a fair coin. First, using our previous vector, note that we can tell
sample the size of the sample we want, using the size argument.

sample(x, size = 5)
[1]  2 10  5  1  6

Now, let’s perform our coin-flipping experiment just once.

coin <- c("Heads", "Tails")
sample(coin, size = 1)
[1] "Tails"

And now, let’s try it 100 times.

sample(coin, size = 100)
Error in sample(coin, size = 100) : 
  cannot take a sample larger than the population when 'replace = FALSE'

Oops, we can’t take a sample of size 100 from a vector of size 2,
unless we set the replace argument to TRUE.

table(sample(coin, size = 100, replace = TRUE))

Heads Tails 
   53    47
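
The counts above will differ every time you run the code. If you want repeatable draws, say for a blog post, one option is to set the random seed first. This is a minimal sketch; the seed value 42 is arbitrary, and the variable names are only for illustration.

```r
coin <- c("Heads", "Tails")

# The seed value is arbitrary; any integer works.
set.seed(42)
flips_a <- sample(coin, size = 100, replace = TRUE)

# Resetting the same seed reproduces exactly the same draws.
set.seed(42)
flips_b <- sample(coin, size = 100, replace = TRUE)

identical(flips_a, flips_b)  # TRUE
table(flips_a)
```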

Simple bootstrap example

The sample function can be used to perform a simple bootstrap.
Let’s use it to estimate the 95% confidence interval for the mean of a
population. First, generate a random sample from a normal
distribution.

rn <- rnorm(1000, 10)

Then, use sample multiple times using the replicate function to
get our bootstrap resamples. The defining feature of this technique is
that replace = TRUE. We then take the mean of each new sample, gather them, and finally compute the relevant quantiles.

quantile(replicate(1000, mean(sample(rn, replace = TRUE))),
         probs = c(0.025, 0.975))
     2.5%     97.5% 
 9.936387 10.062525

Compare this to the standard parametric technique.

t.test(rn)$conf.int
[1]  9.938805 10.061325
attr(,"conf.level")
[1] 0.95
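
The bootstrap one-liner above packs several steps into a single expression. Unrolled, it might look like the following sketch. The names boot_mean, boot_means, n_boot, and ci are mine, and the seed is there only so the run is repeatable; your numbers will not match the ones printed earlier.

```r
set.seed(1)                    # arbitrary seed, only for repeatability
rn <- rnorm(1000, mean = 10)   # same setup as above: 1000 draws, mean 10, sd 1

n_boot <- 1000                 # number of bootstrap resamples

# One bootstrap replicate: resample with replacement
# (same size as rn by default) and take the mean.
boot_mean <- function(v) mean(sample(v, replace = TRUE))

# Gather the means of all the resamples.
boot_means <- replicate(n_boot, boot_mean(rn))

# Percentile interval: the middle 95% of the bootstrap means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```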