Introduction to using R with org-babel, Part 1

May 23, 2010

In my opinion, the description of orgmode by its creator as a tool “for keeping notes, maintaining ToDo lists, doing project planning, and authoring with a fast and effective plain-text system” is a little like describing Emacs as a “text editor”. While technically accurate, one not acquainted with orgmode might not be immediately persuaded to learn it based on its pedestrian description. Needless to say, I think it is worthwhile.

While there are plenty of tutorials and a great introduction to orgmode on its site, there exists a relatively new orgmode extension called org-babel whose documentation, although complete and accurate, might benefit from a high-level overview showing how it can be used to write an R program in an orgmode buffer.

What can orgmode do with source code?

Even without org-babel, you can create a source code block in an orgmode buffer. Why would you want to do this? Mostly so that when you export the orgmode buffer to HTML, the source code looks like source code. Source code will be displayed in a monospace font, and colored to look just like it does in your Emacs buffer.

To accomplish this, you would put something like the following in your org-mode document. I will use the programming language R as an example. To define a source code block, simply use the #+BEGIN_SRC syntax along with the major mode of the language, in this case ‘R’.

#+BEGIN_SRC R
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC

Then, when you export the orgmode document to, say, HTML, your R code will look just as it does in Emacs with syntax highlighting, and will be offset from the text surrounding it, like this:

x <- 1:5

square <- function(x) {
  x^2
}

square(x)

A note about syntax highlighting in Emacs

Ideally, when you entered a code block, Emacs would recognize this, and the syntax highlighting within the code block would reflect whatever language you had specified as you typed it. Unfortunately, it is difficult to get Emacs to behave this way, at least with orgmode. I have investigated several options for doing this, but have run into problems with all of them. Orgmode’s solution to this is to create an indirect buffer containing the source code you’re editing when you type C-c ' (i.e., Control-C, then a single quote) in a source code block. You don’t have to do this, but it is nice to get your program in a temporary buffer that is set to the right mode. In R, this also means you can use all the usual ESS shortcut keys.

What does org-babel add to what orgmode can already do?

Org-babel lets you execute source code blocks in an orgmode buffer. Well, what does that mean? At its simplest, org-babel lets you submit a source code block, like the one above, to an R process, and places the results in the actual orgmode buffer.

You activate the additional features that org-babel provides by giving the #+BEGIN_SRC line in an orgmode buffer special arguments, some of which I describe below. All the available options are described in the org-babel documentation. I will show some basic examples of how to add arguments to a #+BEGIN_SRC line.

However, since submitting a code block to a new R process and placing the results in the orgmode buffer is the default behavior of org-babel, you don’t need to supply any arguments to the source code block for this simple operation. You simply carry out this process by typing C-c C-c in a source code block.

#+BEGIN_SRC R
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC


In your org-mode buffer, with point in the code block, type C-c C-c. An R process is started, and the following is placed in the orgmode buffer.

#+results:
|  1 |
|  4 |
|  9 |
| 16 |
| 25 |

Since we used the default value for the results argument (i.e., we didn’t specify anything), org-babel collected the output and put it in an org-table, as shown above. If you just want the output like it would appear in your ESS R process buffer, you can use the value ‘output’ to the results argument, as shown below. You specify an argument by typing a ‘:’, and then using one of the valid parameter names, typing a space, and then giving the value of the parameter. So, in the example below, results is the name of the parameter, and we’re setting its value to ‘output’. The default value for the results parameter is ‘value’, and gives you the same results as above, i.e., a table of the last results returned by the code block.

#+BEGIN_SRC R :results output
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC

#+results:
: [1]  1  4  9 16 25

Since we set results to ‘output’, we see the familiar R notation for printing a vector. The default value, where an org-table is created, would be useful for passing the resulting table to perhaps a different function within the orgmode buffer, even one programmed in another language. That is a feature that org-babel supports and encourages, but I will not describe that further here.

Personally, I set results to ‘output’ to write the entries for this blog, since it shows users what the actual R output will look like. This is nice, because there is no cutting and pasting of the results! I can just write my R code in a source code block, and then add a special argument to export the code and results to an HTML file upon exporting in orgmode. My code block would look like this to accomplish that. Notice the new argument, exports, and its value ‘both’, as in both the code and its results.

#+BEGIN_SRC R :results output :exports both 
x <- 1:5

square <- function(x) {
  x^2
}

square(x)
#+END_SRC

Putting the above code in an orgmode buffer and exporting to HTML will show the following:

x <- 1:5

square <- function(x) {
  x^2
}

square(x)
[1]  1  4  9 16 25

Indeed, using org-babel in this fashion is how I now generate content for this site. Compare this approach to what I used to be doing with Sweave. The two approaches are similar, but I now have a completely contained orgmode approach, whereas before I had to preprocess the orgmode buffer with an Sweave function before export. You can see the difference not only in how I specify code blocks, but also in how the appearance of the output has changed.

Note that the last thing I still have to do in an Elisp function is to insert inline CSS code to generate the background color and outline of the code and results blocks. As far as I know, there is no way to do this in orgmode yet, as the HTML output it produces keeps the CSS style information in the HTML header.
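If most R blocks in a file use the same arguments, typing `:results output :exports both` on every #+BEGIN_SRC line can become repetitive. The following is a minimal sketch of one way to make those values the default, assuming a recent Org release in which Babel is bundled and R support is provided by the ob-R library. It is not necessarily how this site is configured.

```elisp
;; Sketch only: assumes a recent Org release where R support lives in ob-R.
;; Once ob-R is loaded, make every R block default to printing raw output
;; and exporting both the code and its results.
(with-eval-after-load 'ob-R
  (setq org-babel-default-header-args:R
        '((:results . "output")
          (:exports . "both"))))
```

Arguments given on an individual #+BEGIN_SRC line still take precedence, so a single block can go back to the table behavior with `:results value`.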

Why do things this way?

I think org-babel is interesting for several reasons. First, as R was one of the first languages supported by it, and I use R as my main programming language at my job, I was naturally interested in it. Org-babel is very useful for me because I am used to working in orgmode, so I can add links, tags, TODO items, create notes, and more right in my code. It’s like having a hyper-commenting system that can be crosslinked with everything. Using orgmode’s intuitive visibility cycling system results in a powerful code-folding feature. Orgmode properties allow me to collect meta-information about my code, such as what objects a code block creates.

The export feature of orgmode is very useful for producing this blog, and producing readable documentation for my source code. Also, if I am writing a program to be run in a script, a very common task, org-babel can handle that through a feature called tangling, which I will cover in a future article. The tangling feature turns orgmode into a tool to perform literate programming.

What is left to cover?

In addition to the tangling mentioned above, there are several important features that I have not even mentioned in this short introduction. In the subsequent article, I want to show how I use R with org-babel to include inline images created with R, and how to generate \LaTeX output that can be previewed inline in orgmode documents. Using org-babel in this manner closely mirrors a ‘notebook’ style interactive session such as Mathematica provides. The other main feature that org-babel provides is using it as a meta-programming language to, say, call R functions using data generated from a shell script or Python program. Org-babel is a very interesting project that is definitely worth your time to check out, especially if you’re already an Emacs or orgmode user. If you’ve read through this post, you can get started by reading the official org-babel introduction, which will describe what to include in your .emacs file to set up org-babel.
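For orientation, here is a hedged sketch of what that .emacs setup can look like on a recent Org release, where Babel ships as part of Org itself. This is an assumption about newer versions; the setup described in the official introduction at the time of this post may differ, so treat the introduction as the authority.

```elisp
;; Sketch only: assumes a recent Org release with Babel built in.
(require 'org)

;; Tell Babel which languages it may execute with C-c C-c.
(org-babel-do-load-languages
 'org-babel-load-languages
 '((emacs-lisp . t)
   (R . t)))
```

Executing R blocks also needs a working R installation and ESS, as in the rest of this post.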

Making Emacs Buffer Names Unique Using the Uniquify Package

October 5, 2009

If you ever work in Emacs with different files that all have the same name, this tip may be very useful for you. In my experience, this most often happens when working with Makefiles from multiple projects or directories. Assuming you have a buffer visiting a file called Makefile, by default Emacs will call the second buffer Makefile<2>. This is not very useful for figuring out which one is which.

The uniquify package

Enter the uniquify package. It offers several options for making buffer names unique. In my .emacs, I have the following two lines of code.

(require 'uniquify)
(setq uniquify-buffer-name-style 'forward)

You can set the value to any of the following options, nil being the default. These examples are taken from the help file within Emacs. They use the information from the directories the files are located in to make the names unique. The first buffer called “name” is visiting a file in the directory bar/mumble/, and the second buffer is visiting a file called “name” in directory quux/mumble.

value                          first buffer called “name”   second buffer called “name”
forward                        bar/mumble/name              quux/mumble/name
reverse                        name\mumble\bar              name\mumble\quux
post-forward                   name|bar/mumble              name|quux/mumble
post-forward-angle-brackets    name<bar/mumble>             name<quux/mumble>
nil                            name                         name<2>
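Beyond the naming style, uniquify has a few other customizable variables. The sketch below shows one possible combination rather than the configuration used above. The particular values (the ":" separator and the regular expression) are illustrative assumptions.

```elisp
;; Sketch only: an alternative uniquify setup with illustrative values.
(require 'uniquify)

(setq uniquify-buffer-name-style 'post-forward ; name first, then directories
      uniquify-separator ":"                   ; e.g. name:bar/mumble
      uniquify-after-kill-buffer-p t           ; rename remaining buffers after a kill
      uniquify-ignore-buffers-re "^\\*")       ; leave special *buffers* alone
```

With these settings, the two buffers from the table would appear as name:bar/mumble and name:quux/mumble, and killing one of them lets the other fall back to plain name.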

Using doc-view with auto-revert to view LaTeX PDF output in Emacs

October 3, 2009

When I am authoring a \LaTeX document in Emacs, such as a report or my CV, it is useful for me to compile the \LaTeX source file periodically to see what the resulting PDF file looks like. I used to run a separate PDF viewer to look at the output, but I now have a complete Emacs solution.

The editing environment

When writing a \LaTeX document, I usually want the output to be a PDF file. Accordingly, I put the following in my .emacs file.

(setq TeX-PDF-mode t)

I then split my Emacs frame into two side-by-side windows, using C-x 3 (see screencast below). After compiling my \LaTeX file with C-c C-c, I visit the resulting PDF file in the other Emacs window. The Emacs doc-view package will display the PDF file.

Including auto-revert functionality

The final piece to the puzzle is to set files visited in doc-view-mode to auto-revert when changed on disk. That way, when I update my \LaTeX file and recompile with C-c C-c, the PDF in the other window will automatically update.

This is achieved by placing the following line in my .emacs.

(add-hook 'doc-view-mode-hook 'auto-revert-mode)

Screencast Example

Here is a screencast of this process in action.

This is a simple setup that I use to author reports, edit them, and see immediate updates to my output file without leaving Emacs.


R Object Tooltips in ESS

October 1, 2009

Whether at work or for personal projects, I use ESS a lot to perform interactive data analyses. The ability to write, edit, and submit R commands to an interactive R process is simply something I cannot imagine analyzing data without.

An example

One thing that I end up having to do a lot is inspect an object that I have just assigned to a variable in R. To fix ideas, let us create a data.frame called df for this example.


> df <- data.frame(patient = 1:100,
                   age = rnorm(100, mean = 10, sd = 6),
                   sex = sample(gl(2, 50, labels =
                     c("Male", "Female"))))


I just created the data.frame df, and I want to know if I did it correctly. For instance, does it look like I expect it to? Does it have 100 observations like I want? Do the variables have the right names? Is the sex variable a factor with two levels? In short, I want to call the str function using the object df as an argument.

Here is the output I am interested in seeing:


> str(df)
'data.frame':   100 obs. of  3 variables:
 $ patient: int  1 2 3 4 5 6 7 8 9 10 ...
 $ age    : num  11.06 7.73 17.61 3.11 6.76 ...
 $ sex    : Factor w/ 2 levels "Male","Female": 2 1 1 2 2 1 1 2 2 2 ... 

Inspecting objects the old way

So, how can I quickly see the structure as shown above? One idea is to switch over to my interactive R buffer in Emacs, type the command at the prompt, and then switch back to my code buffer to edit the data.frame command or continue programming. I dislike having to switch back and forth between the buffers for a one-off command though.

Alternatively, I could type str(df) in my code buffer, evaluate it, and decide to keep it or delete the line. Since this is more of a quick check, without permanent results, I usually will not want to keep lines like this around, since they clutter up my program. Typically, I am writing the program to be later run in BATCH mode, so I also do not want functions like that in my code since some can be excessively time-consuming depending on the size of the data.frame.

Another option is to use the ESS function ess-execute-in-tb, bound by default to C-c C-t, which will prompt me for an expression to evaluate. This is nice because I do not have to clutter my buffer with extraneous function calls. However, after using this method for a while, I noticed that I had many patterns with my objects. For data.frames, I would almost always use summary or str on them after assignment. For factors, I would want to table the values after I created them, to be sure they looked right. For numeric vectors, I would want to summarize them. I also wanted to summarize model fits (e.g., lm). I wanted to take advantage of my usage patterns so that I did not have to type so much after assigning an object to a variable.

Inspecting objects the new way

I therefore wrote an Emacs Lisp function that, when called via a key chord in Emacs, inspects the object at point, determines the class of that object, and based on the class, calls an R function on that object, showing the results in a tooltip. For the df example above, I would just put point on “df”, anywhere in the source code, and type C-c C-g (my default binding). A tooltip is then shown with the output of str(df).

An example similar to this, along with several others, is shown in this screencast. I think this is the best way to show how my Lisp function interacts with R to show object information in tooltips.

Pretty nice! One thing to note is that the tooltips are displaying in a proportional font, not a monospace one. I know at some point I had found a customizable variable to specify which font tooltips display in, but I apparently did not save it. If I find that variable, I will update this post to reflect how to do that.

The Emacs Lisp function and keybinding

Here is the code you will need for this behavior. It depends on having tooltip-show-at-point defined, which is found only in ESS 5.4 (the current version as of this post) or later. I contributed tooltip-show-at-point to the ESS project a few months ago. It is used to show argument tooltips when you type an opening parenthesis. Perhaps my object tooltip function will find its way into a future version of ESS. Here is the code.

;; ess-R-object-tooltip.el
;; 
;; I have defined a function, ess-R-object-tooltip, that when 
;; invoked, will return a tooltip with some information about
;; the object at point.  The information returned is 
;; determined by which R function is called.  This is controlled
;; by an alist, called ess-R-object-tooltip-alist.  The default is
;; given below.  The keys are the classes of R object that will
;; use the associated function.  For example, when the function
;; is called while point is on a factor object, a table of that
;; factor will be shown in the tooltip.  The objects must of course
;; exist in the associated inferior R process for this to work.
;; The special key "other" in the alist defines which function
;; to call when the class is not matched in the alist.  By default,
;; the str function is called, which is actually a fairly useful
;; default for data.frame and function objects. 
;; 
;; The last line of this file shows my default keybinding. 
;; I simply save this file in a directory in my load-path
;; and then place (require 'ess-R-object-tooltip) in my .emacs 

;; the alist
(setq ess-R-object-tooltip-alist
      '((numeric    . "summary")
        (factor     . "table")
        (integer    . "summary")
        (lm         . "summary")
        (other      . "str")))


(defun ess-R-object-tooltip ()
  "Get info for object at point, and display it in a tooltip."
  (interactive)
  (let ((objname (current-word))
        (curbuf (current-buffer))
        (tmpbuf (get-buffer-create "**ess-R-object-tooltip**")))
    (if objname
        (progn
          (ess-command (concat "class(" objname ")\n")  tmpbuf )   
          (set-buffer tmpbuf)
          (let ((bs (buffer-string)))
            (if (not(string-match "\\(object .* not found\\)\\|unexpected" bs))
                (let* ((objcls (buffer-substring 
                                (+ 2 (string-match "\".*\"" bs)) 
                                (- (point-max) 2)))
                       (myfun (cdr(assoc-string objcls 
                                                ess-R-object-tooltip-alist))))
                  (progn
                    (if (eq myfun nil)
                        (setq myfun 
                              (cdr(assoc-string "other" 
                                                ess-R-object-tooltip-alist))))
                    (ess-command (concat myfun "(" objname ")\n") tmpbuf)
                    (let ((bs (buffer-string)))
                      (progn
                        (set-buffer curbuf)
                        (tooltip-show-at-point bs 0 30)))))))))
    (kill-buffer tmpbuf)))

;; my default key map
(define-key ess-mode-map "\C-c\C-g" 'ess-R-object-tooltip)

(provide 'ess-R-object-tooltip)

Notice that you can add your own object classes and functions fairly easily at the top of the program. There is a special “other” class which will be called for classes not defined otherwise.
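As a sketch of that kind of customization, the lines below add two more classes after the file has been loaded, rather than editing the alist in place. The class-to-function pairings are illustrative assumptions, and the snippet assumes ess-R-object-tooltip.el is on the load-path as described in the file's header comment.

```elisp
;; Sketch only: extend the alist from your .emacs after loading the file.
(require 'ess-R-object-tooltip)

;; Show summary() instead of the default str() for data frames.
(add-to-list 'ess-R-object-tooltip-alist '(data.frame . "summary"))

;; Tabulate TRUE/FALSE counts for logical vectors.
(add-to-list 'ess-R-object-tooltip-alist '(logical . "table"))
```

Because add-to-list puts new entries at the front of the alist and the lookup returns the first match, an entry added this way also overrides an existing entry for the same class.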

Further meta-data features in ESS?

If you can think of any more examples of types of objects that this would be useful for, feel free to post them in the comments. I think this is a very useful feature when interactively examining datasets, fitting models, and analyzing data. In general, I think there are many more interesting ways to have meta-data on objects available quickly within the ESS and R system. I will be sure to share them as I explore ways to more efficiently do statistical analysis within the R environment.


Emacs Key Binding for eval-defun in lisp-mode

September 22, 2009

When I use R in Emacs through the ESS package, C-c C-c in a .R buffer will send a “block” of code to the inferior R process for evaluation. This was added just a few years ago, but my fingers are now trained to use that key combination for evaluating any block of code. Since I have been learning Emacs Lisp, I decided that a good idea would be to make C-c C-c a binding to eval-defun. I really like how it is working out as I have to redefine my Lisp functions many times! 🙂

Just put the following in your .emacs file to get this behavior. However, please note the following from the eval-defun help string, “If the current defun is actually a call to `defvar’ or `defcustom’, evaluating it this way resets the variable using its initial value expression even if the variable already has some other value. (Normally `defvar’ and `defcustom’ do not alter the value if there already is one.)”

(define-key lisp-mode-shared-map "\C-c\C-c" 'eval-defun) 
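To make the caveat from the help string concrete, here is a small illustration. The variable name my-demo-counter is hypothetical and exists only for this example.

```elisp
;; my-demo-counter is a hypothetical variable used only for illustration.
(defvar my-demo-counter 1
  "Counter whose initial value expression is 1.")

;; Change the value, as might happen during an interactive session.
(setq my-demo-counter 5)

;; With point inside the defvar form above:
;;   - eval-defun (C-M-x, or C-c C-c with the binding shown earlier)
;;     re-runs the initial value expression, so the value becomes 1 again.
;;   - Loading the file or running eval-buffer evaluates defvar normally,
;;     which leaves the existing value 5 untouched.
```

This reset is usually what you want while developing, but it is worth remembering if the variable holds state you have built up during a session.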

Welcome to Blogistic Reflections! (A blog created entirely in Emacs org-mode)

September 20, 2009

John Tukey’s preface to Exploratory Data Analysis begins with a useful rule, “It is important to understand what you can do before you learn to measure how well you seem to have done it.” When I decided I wanted to start a blog concentrating on statistics, R, and Emacs, I thought I had better learn first what I can do to make the process of generating content easier. Here, as a meta first post, I present how I used Emacs, R, and other technologies to produce the output you are reading here.

Emacs, ESS, and org-mode

I use Emacs for everything I can. I started learning it as a graduate student in Statistics in order to use the ESS package. ESS lets you run R within Emacs, which is really nice for developing programs and performing data analysis interactively. I have discovered many Emacs packages through the years, but lately I have been learning the excellent org-mode package. Org-mode is designed so that you can get started with very minimal knowledge of its capabilities. You can ignore the more complicated aspects of it if you do not need them, and still have a great system for general organizational tasks and note taking. I think it is worth learning Emacs just so you can use org-mode!

Since I am so familiar with Emacs, and work in it daily, I really wanted to use it to write posts. Ideally, I would like to write my blog entries in org-mode, and be able to use in-line R code. The text and R code should be output to HTML and published to my blog. This HTML generation is possible because org-mode has a fantastic feature that exports to various formats, including HTML. This is beneficial since then all of my headlines, lists, tables, and source code markup in org-mode will be carried over to the HTML automatically.

Using the tools already available for Emacs, creating the HTML turned out to be much easier than I initially thought. The vast majority of the time preparing this was spent learning about those tools, and figuring out how to combine them in the right way to accomplish what I wanted to do.

To summarize, my goals were to:

  1. write blog content in Emacs
  2. use org-mode to manage content
  3. be able to include R commands, output, and graphics easily
  4. automatically syntax highlight the R commands and output
  5. automatically syntax highlight other source code (e.g., elisp)
  6. generate HTML automatically directly from Emacs

R, Sweave, and the ascii package

I really enjoy using R. Since I plan to write about statistics, and especially statistical computing, I imagine many of my posts will contain R code. R comes with an interesting function called Sweave. Sweave allows you to incorporate R commands, wrapped in a special syntax, within a document you are writing (see some demos). So you might be writing some \LaTeX or HTML and interweaving R code. After you run the Sweave function using the source (e.g., HTML) document as the input, a new file is created that replaces the R code with the results of the commands. This might sound simple, but it leads to a very powerful model for report generation and reproducible research. My goal was to somehow get publishable HTML from a source org-mode file after running Sweave on it. For example, say I want to generate 100 samples from a normal distribution with mean 10, and summarize the results.


> x <- rnorm(100, mean = 10, sd = 5)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -3.077   6.844  10.100   9.819  13.690  22.300  
> sd(x)
[1] 5.356113 

I do not want to have to actually type any of the summary output, of course, but I also don’t have to even copy and paste it. What I can type is the R code to produce the output, and wrap it in Sweave syntax, like this:

 <<>>=
 x <- rnorm(100, mean = 10, sd = 5)
 summary(x)
 sd(x)
 @

When I prepare this post for publishing, Sweave will run the R code and insert the output in place of the Sweave commands. I have wrapped the Sweave syntax in org-mode’s special BEGIN_SRC construct, so that when I export to HTML, org-mode will properly syntax highlight the output using the htmlize package, as inspired by this post on using the htmlize package with Erlang. The only elisp I had to write was the following.

;; export an org file into WordPress-ready HTML

(defun run-ascii-sweave-on-buffer ()
  (ess-command "library(ascii)\n")
  (ess-command (concat "Sweave(\"" (buffer-file-name) 
                       "\", RweaveAsciidoc)\n")))

(defun sweave-and-htmlize-blog-entry ()
   "Run Sweave on current file and produce HTML 
ready for pasting to WordPress. Copies text to kill-ring for 
pasting. "
   (interactive)
   (save-excursion
     (run-ascii-sweave-on-buffer)
     (let* ((name (buffer-file-name))
            (txt-filename (concat name ".txt"))
            (txt-buf (find-file txt-filename)))
       (message "Preparing HTML for '%s' ..." name)
       (switch-to-buffer txt-buf)
       (revert-buffer t t t)
       (goto-char (point-min))
       (while (re-search-forward "BEGIN_SRC R" nil t)
         (replace-match "BEGIN_SRC R-transcript" t nil))
       (goto-char (point-min))
       (while (re-search-forward "^----" nil t)
         (replace-match "" nil nil))
       (basic-save-buffer)
       (switch-to-buffer (org-export-region-as-html 
                          (point-min) (point-max) t "*HTML*"))
       (kill-buffer txt-buf)
       (while (re-search-forward "<pre " nil t)
         (replace-match 
          "<pre style=\"background-color:#FFFFE5; font-size:8pt\" " 
          t nil))
       (kill-ring-save (point-min) (point-max))
       (message "Finished Converting to WordPress HTML" ))))


(define-key global-map (kbd "<f5>") 'sweave-and-htmlize-blog-entry) 
         



All the first function is doing is running an input file (an org-mode file) through Sweave with the RweaveAsciidoc driver found in the R ascii package. It basically just puts in the R output as plain text, as opposed to \LaTeX code or HTML.

The rest of the elisp function manipulates the output in some trivial ways. The explanation for the replacement of the four dashes is that the RweaveAsciidoc Sweave driver produces the dashed string both before and after the R output as a visual offset, since asciidoc uses that as a markup indicator. I did not need that, since my output will not be fed through asciidoc, but rather the org-mode HTML exporter, so I simply replace the dashes found at the beginning of a line with the empty string. The last trick I had to do was to replace the R syntax highlighting with R-transcript syntax highlighting, since the results of Sweave are essentially a transcript of the R commands entered, not the actual R code. Finally, the resulting HTML produced by the org exporter has its pre tags modified with custom colors and font size.

What is left to do?

There are a few loose ends that I didn’t have time to clean up yet. I would like to modify my Lisp function to work on regions if mark is set and transient-mark-mode is enabled. As of now, it just works on the whole buffer. The idea would be to have an org-mode file, say blog.org, that would have a top-level headline for each entry. You could then highlight that entry and publish it. I realize this is fairly trivial to implement, but it is not done yet.
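One possible starting point for the region support is a small helper that decides which part of the buffer to process. This is only a sketch of the idea, not the finished feature. The function name my-blog-entry-bounds is hypothetical.

```elisp
;; Sketch only: my-blog-entry-bounds is a hypothetical helper name.
(defun my-blog-entry-bounds ()
  "Return (BEG . END) for the active region, or for the whole buffer.
`use-region-p' is non-nil only when the mark is active and
`transient-mark-mode' is enabled."
  (if (use-region-p)
      (cons (region-beginning) (region-end))
    (cons (point-min) (point-max))))
```

The export step could then use the car and cdr of the returned pair in place of (point-min) and (point-max), although Sweave itself would still need to be pointed at a file containing just that entry.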

I also want to investigate the publishing feature of org-mode, and see if there is any value add to using weblogger mode in Emacs. Using that, I could have a full Emacs solution to posting new entries, without even having to go into WordPress to paste HTML as I have done now. As an aside, I found longlines-mode in Emacs very useful for writing this post in org-mode, so that there are not newlines in random spots when the HTML is produced.

I have not tested how plotting from R works in this system. It would be great to be able to generate in-line graphics in my posts using R commands to construct plots.

Finally, it looks like there is a really interesting project on worg called org-babel that will allow not only weaving of R, but many other languages including Ruby, elisp, and shell scripts, turning org-mode into a platform for literate programming. I also just saw blorgit, a blogging engine in org-mode that makes use of the git version control system.

My function provided above should work if you have a recent version of org-mode, R, ESS, and the ascii package installed on your system. I have bound the function to F5 on my keyboard, so just hitting F5 in my org-mode buffer will create an HTML buffer and copy its contents to the kill-ring for pasting into WordPress.

Conclusion

When I started this process, I assumed I would have to write at least a few elisp functions, and possibly some extensions to org-mode, perhaps even an Sweave driver. After thoroughly examining the available tools, I ended up only having to write, in effect, one lisp function, and that is only a utility function to automate the combination of solutions I found. The moral is that while often times you will have to write your own functions to get the exact behavior you are after, it does pay to really research what is out there already. I found a solution to my problem, org-mode, that allows a much more flexible framework for extensions and has been thoroughly tested. I now get to enjoy the benefits of whatever future enhancements org-mode comes up with, including extensions by other org-mode users. In particular, the org-babel functionality looks very promising to replace or augment some of my work here. So before you write your own packages, research what others have already done. At the very least, you’ll know what value you are adding by doing it your own way, which is the lesson I took away from Tukey’s rule.