R Function of the Day: cut

September 23, 2009

The R Function of the Day series describes in plain language how certain R functions work, using simple examples that you can apply to gain insight into your own data.

Today, I will discuss the cut function.

What situation is cut useful in?

In many data analysis settings, it might be useful to break up a continuous variable such as age into a categorical variable. Or, you might want to classify a categorical variable like year into a larger bin, such as 1990-2000. There are many reasons not to do this when performing regression analysis, but for simple displays of demographic data in tables, it could make sense. The cut function in R makes this task simple!

How do I use cut?

First, we will simulate some data from a hypothetical clinical trial that includes variables for patient ID, age, and year of enrollment.


> ## generate data for clinical trial example
> clinical.trial <-
    data.frame(patient = 1:100,              
               age = rnorm(100, mean = 60, sd = 8),
               year.enroll = sample(paste("19", 85:99, sep = ""),
                 100, replace = TRUE))
> summary(clinical.trial)
    patient            age         year.enroll
 Min.   :  1.00   Min.   :41.18   1991   :12  
 1st Qu.: 25.75   1st Qu.:52.99   1988   :11  
 Median : 50.50   Median :60.08   1985   : 9  
 Mean   : 50.50   Mean   :59.67   1993   : 7  
 3rd Qu.: 75.25   3rd Qu.:65.67   1995   : 7  
 Max.   :100.00   Max.   :76.40   1997   : 7  
                                  (Other):47   

Now, we will use the cut function to make age a factor, which is what R calls a categorical variable. Our first example calls cut with the breaks argument set to a single number. This method will cause cut to break up age into 4 intervals. The default labels use standard mathematical notation for open and closed intervals.


> ## basic usage of cut with a numeric variable
> c1 <- cut(clinical.trial$age, breaks = 4)
> table(c1)
c1
  (41.1,50]   (50,58.8] (58.8,67.6] (67.6,76.4] 
          9          34          41          16  
> ## year.enroll is a factor, so must convert to numeric first!
> c2 <- cut(as.numeric(as.character(clinical.trial$year.enroll)),
            breaks = 3)
> table(c2)
c2
(1985,1990] (1990,1994] (1994,1999] 
         36          34          30  
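
The interval notation in those labels can be checked on a tiny vector. The following is only a small sketch with made-up values (the names x, right.closed, and left.closed are illustrative, not part of the clinical trial data). It places values exactly on the break points, first with the default right = TRUE and then with right = FALSE.

```r
## three hypothetical values that sit exactly on break points
x <- c(40, 50, 60)
brks <- c(30, 40, 50, 60, 70)

## default (right = TRUE): intervals are open on the left, closed on the right
right.closed <- cut(x, breaks = brks)

## right = FALSE: intervals are closed on the left, open on the right
left.closed <- cut(x, breaks = brks, right = FALSE)

as.character(right.closed)  # "(30,40]" "(40,50]" "(50,60]"
as.character(left.closed)   # "[40,50)" "[50,60)" "[60,70)"
```

A round bracket means the endpoint is excluded and a square bracket means it is included, so the value 50 lands in (40,50] by default but in [50,60) when right = FALSE.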

Well, the intervals that cut chose by default are not the nicest looking with the age example, although they are fine with the year example, since it was already discrete. Luckily, we can specify the exact intervals we want for age. Our next example shows how.


> ## specify break points explicitly using seq function
> 
> ## look what seq does  
> seq(30, 80, by = 10)
[1] 30 40 50 60 70 80 
> ## cut the age variable using the seq defined above
> c1 <- cut(clinical.trial$age, breaks = seq(30, 80, by = 10))
> ## table of the resulting factor           
> table(c1)
c1
(30,40] (40,50] (50,60] (60,70] (70,80] 
      0       9      40      42       9  

That looks pretty good. There is no reason that the breaks argument has to be equally spaced as I have done above. It could be any grouping that you want.
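
As a quick illustration of unequally spaced breaks, here is a minimal sketch. The ages and the break points are invented for the example and have no clinical meaning.

```r
## hypothetical ages, not the simulated trial data
ages <- c(34, 47, 52, 58, 63, 66, 71, 79)

## break points of unequal width: 20, 15, 5, and 10 years
age.groups <- cut(ages, breaks = c(30, 50, 65, 70, 80))

table(age.groups)
## (30,50] (50,65] (65,70] (70,80]
##       2       3       1       2
```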

Finally, I am going to show you an example of a custom R function to categorize ages. It uses cut inside of it, but does some preprocessing and uses the labels argument to cut to make the output look nice.

age.cat <- function(x, lower = 0, upper, by = 10,
                   sep = "-", above.char = "+") {

 labs <- c(paste(seq(lower, upper - by, by = by),
                 seq(lower + by - 1, upper - 1, by = by),
                 sep = sep),
           paste(upper, above.char, sep = ""))

 cut(floor(x), breaks = c(seq(lower, upper, by = by), Inf),
     right = FALSE, labels = labs)
}

This function categorizes age in a fairly flexible way. The first assignment to labs inside the function creates a vector of labels. Then, the cut function is called to do the work, with the custom labels as an argument. Here are some examples using our simulated data from above. I am no longer going to save the results of the function calls to a variable and call table on them, but rather just nest the call to age.cat in a call to table. I previously did a post on the table function.


> ## only specifying an upper bound, uses 0 as lower bound, and
> ## breaks up categories by 10
> table(age.cat(clinical.trial$age, upper = 70))
  0-9 10-19 20-29 30-39 40-49 50-59 60-69   70+ 
    0     0     0     0     9    40    42     9  
> ## now specifying a lower bound
> table(age.cat(clinical.trial$age, lower = 30, upper = 70))
30-39 40-49 50-59 60-69   70+ 
    0     9    40    42     9  
> ## now specifying a lower bound AND the "by" argument 
> table(age.cat(clinical.trial$age, lower = 30, upper = 70, by = 5))
30-34 35-39 40-44 45-49 50-54 55-59 60-64 65-69   70+ 
    0     0     3     6    22    18    22    20     9  
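
To see how the pieces inside age.cat fit together, the sketch below repeats its label and break construction outside the function for lower = 30, upper = 70, and by = 10. The intermediate names left.ends, right.ends, and brks are only for illustration, and the three test ages are made up.

```r
lower <- 30
upper <- 70
by <- 10

## left and right ends of each labelled category
left.ends <- seq(lower, upper - by, by = by)           # 30 40 50 60
right.ends <- seq(lower + by - 1, upper - 1, by = by)  # 39 49 59 69

## "30-39", ..., "60-69", plus an open-ended "70+" category
labs <- c(paste(left.ends, right.ends, sep = "-"),
          paste(upper, "+", sep = ""))

## one more break than labels; Inf closes the last category
brks <- c(seq(lower, upper, by = by), Inf)

## hypothetical ages: below the lower bound, inside a bin, at the upper bound
res <- cut(floor(c(25.3, 39.9, 70)), breaks = brks,
           right = FALSE, labels = labs)
as.character(res)  # NA "30-39" "70+"
```

Note that anything below the lower bound becomes NA rather than being placed in a category, so the lower bound should be chosen at or below the smallest age in the data.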

Summary of cut

The cut function is useful for turning continuous variables into factors. You saw how to specify the number of intervals, how to specify the exact cutpoints, and how a function built around cut can simplify categorizing an age variable and giving it appropriate labels.


Emacs Key Binding for eval-defun in lisp-mode

September 22, 2009

When I use R in Emacs through the ESS package, C-c C-c in a .R buffer will send a “block” of code to the inferior R process for evaluation. This was added just a few years ago, but my fingers are now trained to use that key combination for evaluating any block of code. Since I have been learning Emacs Lisp, I decided that a good idea would be to make C-c C-c a binding to eval-defun. I really like how it is working out as I have to redefine my Lisp functions many times! 🙂

Just put the following in your .emacs file to get this behavior. However, please note the following from the eval-defun help string, “If the current defun is actually a call to `defvar’ or `defcustom’, evaluating it this way resets the variable using its initial value expression even if the variable already has some other value. (Normally `defvar’ and `defcustom’ do not alter the value if there already is one.)”

(define-key lisp-mode-shared-map "\C-c\C-c" 'eval-defun) 

R Function of the Day: rle

September 22, 2009

The R Function of the Day series describes in plain language how certain R functions work, using simple examples that you can apply to gain insight into your own data.

Today, I will discuss the rle function.

What situation is rle useful in?

The rle function is named for the acronym of “run length encoding”. What does the term “run length” mean? Imagine you flip a coin 10 times and record the outcome as “H” if the coin lands showing heads, and “T” if the coin lands showing tails. You want to know what the longest streak of heads is. You also want to know the longest streak of tails. A run is a stretch of consecutive flips with the same outcome, and the run length is the number of flips in it. If the outcome of our experiment was “H T T H H H H H T H”, the longest run length of heads would be 5, since there are 5 consecutive heads starting at position 4, and the longest run length for tails would be 2, since there are two consecutive tails starting at position 2. If you just have 10 flips, it is pretty easy to simply eyeball the answer. But if you had 100 flips, or 100,000, it would not be easy at all. However, it is very easy with the rle function in R! That function will encode the entire result into its run lengths. Using the example above, we start with 1 H, then 2 Ts, 5 Hs, 1 T, and finally 1 H. That is exactly what the rle function computes, as you will see below in the example.

How do I use rle?

First, we will simulate the results of the coin flipping experiment. This is trivial in R using the sample function. We simulate flipping a coin 1000 times.


> ## generate data for coin flipping example 
> coin <- sample(c("H", "T"), 1000, replace = TRUE)
> table(coin) 
coin
  H   T 
501 499  
> head(coin, n = 20)
 [1] "T" "H" "T" "T" "T" "H" "T" "H" "T" "T" "H" "T" "H" "T"
[15] "T" "T" "H" "H" "H" "H" 

We can see the results of the first 20 tosses by using the head (as in “beginning”, nothing to do with coin tosses) function on our coin vector.

So, our question is, what is the longest run of heads, and longest run of tails? First, what does the output of the rle function look like?


> ## use the rle function on our SMALL EXAMPLE above
> ## note results MATCH what I described above... 
> rle(c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H"))
Run Length Encoding
  lengths: int [1:5] 1 2 5 1 1
  values : chr [1:5] "H" "T" "H" "T" "H" 
> ## use the rle function on our SIMULATED data
> coin.rle <- rle(coin)
> ## what is the structure of the returned result? 
> str(coin.rle)
List of 2
 $ lengths: int [1:500] 1 1 3 1 1 1 2 1 1 1 ...
 $ values : chr [1:500] "T" "H" "T" "H" ...
 - attr(*, "class")= chr "rle" 
> ## sort the data, this shows the longest run of
> ## ANY type (heads OR tails)
> sort(coin.rle$lengths, decreasing = TRUE)
  [1] 9 8 7 7 7 7 7 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5
 [28] 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [55] 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [82] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[109] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2
[136] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[163] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[190] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[217] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[244] 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[271] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[298] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[325] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[352] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[379] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[406] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[433] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[460] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[487] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
> ## use the tapply function to break up
> ## into 2 groups, and then find the maximum
> ## within each group
> 
> tapply(coin.rle$lengths, coin.rle$values, max)
H T 
9 8  

So in this case the longest run of heads is 9 and the longest run of tails is 8. The tapply function was discussed in a previous R Function of the Day article.
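
The output of rle gives the lengths of the runs but not where they start. A running total of the lengths recovers the positions. The sketch below does this for the small 10-flip example from earlier. The names run.end, run.start, runs, and longest.heads are just for illustration.

```r
## the small example: H T T H H H H H T H
flips <- c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H")
flips.rle <- rle(flips)

## each run ends at the cumulative sum of the lengths so far
run.end <- cumsum(flips.rle$lengths)
run.start <- run.end - flips.rle$lengths + 1

runs <- data.frame(value = flips.rle$values,
                   length = flips.rle$lengths,
                   start = run.start,
                   stringsAsFactors = FALSE)

## keep the heads runs and pick the longest one
heads <- runs[runs$value == "H", ]
longest.heads <- heads[which.max(heads$length), ]
longest.heads$length  # 5
longest.heads$start   # 4
```

This matches the description above: the longest run of heads has length 5 and starts at position 4. If several runs tie for the longest, which.max returns only the first of them.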

Summary of rle

The rle function performs run length encoding. Although it is not used terribly often when programming in R, there are certain situations, such as time series and longitudinal data analysis, where knowing how it works can save a lot of time and give you insight into your data.
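
As one possible illustration of the time series case, suppose we have daily rainfall totals and want the longest wet and dry spells. The rainfall values below are invented for the example. The idea is simply to turn the series into a logical vector and apply rle to it.

```r
## hypothetical daily rainfall in mm for 10 consecutive days
rainfall <- c(0, 2.1, 3.0, 0, 0, 1.2, 4.0, 5.5, 0.3, 0)

## TRUE on wet days, FALSE on dry days
wet <- rainfall > 0
wet.rle <- rle(wet)

## longest spell of each kind, same tapply idiom as with the coin flips
spells <- tapply(wet.rle$lengths, wet.rle$values, max)
spells
## FALSE  TRUE
##     2     4
```

Here the longest dry spell lasts 2 days and the longest wet spell lasts 4 days.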


R Function of the Day: table

September 21, 2009

The R Function of the Day series describes in plain language how certain R functions work, using simple examples that you can apply to gain insight into your own data.

Today, I will discuss the table function.

What situation is table useful in?

The table function is a very basic, but essential, function to master while performing interactive data analyses. It simply creates tabular results of categorical variables. However, when combined with the powers of logical expressions in R, you can gain even more insights into your data, including identifying potential problems.

Example 1

We want to know how many subjects are enrolled at each center in a clinical trial.

Example 2

We want to know how many subjects are under the age of 60 in a clinical trial.

Example 3

Which center has the most subjects with a missing value for age in the clinical trial?

How do I use table?

The table function simply needs an object that can be interpreted as a categorical variable (called a “factor” in R).


> ## generate data for medical example 
> clinical.trial <-
    data.frame(patient = 1:100,
               age = rnorm(100, mean = 60, sd = 6),
               treatment = gl(2, 50,
                 labels = c("Treatment", "Control")),
               center = sample(paste("Center", LETTERS[1:5]), 100, replace = TRUE)) 
> ## set some ages to NA (missing) 
> is.na(clinical.trial$age) <- sample(1:100, 20)
> summary(clinical.trial)
    patient            age            treatment       center  
 Min.   :  1.00   Min.   :46.61   Treatment:50   Center A:22  
 1st Qu.: 25.75   1st Qu.:56.19   Control  :50   Center B:10  
 Median : 50.50   Median :60.59                  Center C:28  
 Mean   : 50.50   Mean   :60.57                  Center D:23  
 3rd Qu.: 75.25   3rd Qu.:64.84                  Center E:17  
 Max.   :100.00   Max.   :77.83                               
                  NA's   :20.00                                

Example 1 is the most trivial. We want to know how many subjects are enrolled at each center, so simply pass in the variable “center” to the table function.


> ## a simple example of a table call 
> table(clinical.trial$center)
Center A Center B Center C Center D Center E 
      22       10       28       23       17  

For example 2, we need to create a logical vector indicating whether or not a patient is under 60. We can then pass that into the table function. Also, since there are missing ages, we might be interested in seeing those in the table as well. The table is shown both ways, without and with the “useNA” argument.


> ## a logical vector is created and passed into table
> table(clinical.trial$age < 60)
FALSE  TRUE 
   41    39  
> ## the useNA argument shows the missing values, too
> table(clinical.trial$age < 60, useNA = "always")
FALSE  TRUE  <NA> 
   41    39    20  
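
The useNA argument accepts "no" (the default), "ifany", and "always". The difference between the last two only shows up when there are no missing values, as this small sketch with made-up vectors (under.60 and complete are hypothetical names) illustrates.

```r
## a logical vector with two missing values
under.60 <- c(TRUE, FALSE, NA, TRUE, NA)
table(under.60)                     # NAs silently dropped
with.na <- table(under.60, useNA = "ifany")
as.vector(with.na)                  # 1 2 2  (FALSE, TRUE, NA)

## a vector with no missing values
complete <- c(TRUE, FALSE, TRUE)
if.any <- table(complete, useNA = "ifany")
always <- table(complete, useNA = "always")
length(if.any)      # 2: no NA column is added
as.vector(always)   # 1 2 0: NA column shown with a zero count
```

Using "always" is a reasonable choice when you want a column layout that does not change from one dataset to the next.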

Example 3, finding the center that has the most missing values for age, sounds the trickiest, but is once again an extremely simple task with the table function. You just need to know that the is.na function returns a logical vector that indicates whether an observation is missing or not.


> ## the table of missing age by center 
> table(clinical.trial$center, is.na(clinical.trial$age))
           FALSE TRUE
  Center A    16    6
  Center B     8    2
  Center C    23    5
  Center D    20    3
  Center E    13    4 
> ## centers with most missing ages listed in order 
> ## highest to lowest
> sort(table(clinical.trial$center, is.na(clinical.trial$age))[, 2],      
       decreasing = TRUE)
Center A Center C Center E Center D Center B 
       6        5        4        3        2  
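
The same counts can also be obtained without building the full two-way table, by summing the logical vector within each center. This is only an alternative sketch on a small invented dataset (the center and age vectors below are hypothetical), using the tapply function.

```r
## hypothetical data: six patients at three centers
center <- c("A", "A", "B", "B", "B", "C")
age <- c(61, NA, NA, 58, NA, 70)

## summing a logical vector counts its TRUE values
missing.by.center <- tapply(is.na(age), center, sum)
most.missing <- sort(missing.by.center, decreasing = TRUE)
most.missing
## B A C
## 2 1 0

## the table-based version from above gives the same counts
from.table <- table(center, is.na(age))[, "TRUE"]
```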

Summary of table

Although table is an extremely simple function, it should not be overlooked when exploring a dataset. These examples have shown you how to use table on variables in a dataset, and on variables created from a logical expression in R. The “useNA” argument was also introduced.


R Function of the Day: tapply

September 20, 2009

The R Function of the Day series describes in plain language how certain R functions work, using simple examples that you can apply to gain insight into your own data.

Today, I will discuss the tapply function.

What situation is tapply useful in?

In statistics, one of the most basic activities we do is computing summaries of variables. These summaries might be as simple as an average, or more complex. Let’s look at some simple examples.

When you read the results of a medical trial, you will see things such as “The average age of subjects in this trial was 55 years in the treatment group, and 54 years in the control group.”

As another example, let’s look at one from the world of baseball.

Batting Leaders per Team

Team               Player           Batting Average
Minnesota Twins    Joe Mauer        .374
Seattle Mariners   Ichiro Suzuki    .355
Boston Red Sox     Kevin Youkilis   .309

These two examples have a lot in common, even if they don’t appear to when first reading. In the first example, we have a dataset from a medical trial. We want to break up the dataset into two groups, treatment and control, and then compute the sample average for age within each group.

In the second example, we want to break up the dataset into 30 groups, one for each MLB team, and then compute the maximum batting average within each group.

So what is in common?

In each case:

  1. We have a dataset that can be broken up into groups
  2. We want to break it up into groups
  3. Within each group, we want to apply a function

The following table summarizes the situation.

Example            Group Variable   Summary Variable   Function
Medical Example    Treatment        age                mean
Baseball Example   Team             batting average    max

The tapply function can solve both of these problems for us!

How do I use tapply?

The tapply function is simple to use. First, we will generate some data.


> ## generate data for medical example
> medical.example <-
    data.frame(patient = 1:100,
               age = rnorm(100, mean = 60, sd = 12),
               treatment = gl(2, 50,
                 labels = c("Treatment", "Control")))
> summary(medical.example)
    patient            age             treatment 
 Min.   :  1.00   Min.   : 29.40   Treatment:50  
 1st Qu.: 25.75   1st Qu.: 54.31   Control  :50  
 Median : 50.50   Median : 61.24                 
 Mean   : 50.50   Mean   : 61.29                 
 3rd Qu.: 75.25   3rd Qu.: 66.22                 
 Max.   :100.00   Max.   :102.47                  
> ## generate data for baseball example
> ## 5 teams with 5 players per team
> 
> baseball.example <-
    data.frame(team = gl(5, 5,
                 labels = paste("Team", LETTERS[1:5])),
               player = sample(letters, 25),
               batting.average = runif(25, .200, .400))
> summary(baseball.example)
     team       player   batting.average 
 Team A:5   a      : 1   Min.   :0.2172  
 Team B:5   c      : 1   1st Qu.:0.2553  
 Team C:5   d      : 1   Median :0.2854  
 Team D:5   e      : 1   Mean   :0.2887  
 Team E:5   f      : 1   3rd Qu.:0.3013  
            g      : 1   Max.   :0.3859  
            (Other):19                    

Now we have some sample data. Using tapply is now straightforward. In general, the call to the function will look like the example in the first comment. Then, actual calls to the function using the data we defined above are shown.


> ## Generic Example
> ## tapply(Summary Variable, Group Variable, Function)
> 
> ## Medical Example
> tapply(medical.example$age, medical.example$treatment, mean)
Treatment   Control 
 62.26883  60.30371  
> ## Baseball Example
> tapply(baseball.example$batting.average, baseball.example$team,
         max)
   Team A    Team B    Team C    Team D    Team E 
0.3784396 0.3012680 0.3488655 0.2962828 0.3858841  
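
The three steps listed earlier (a dataset with groups, breaking it up, applying a function within each group) can also be spelled out by hand with split and sapply. The sketch below, on a tiny made-up dataset, is meant only to show what tapply is doing conceptually, not how it is implemented internally.

```r
## hypothetical ages for five patients
age <- c(50, 60, 70, 40, 44)
treatment <- factor(c("Treatment", "Treatment", "Treatment",
                      "Control", "Control"),
                    levels = c("Treatment", "Control"))

## steps 1 and 2: break the ages up into one vector per group
groups <- split(age, treatment)

## step 3: apply the function within each group
by.hand <- sapply(groups, mean)

## tapply does all of this in a single call
with.tapply <- tapply(age, treatment, mean)

by.hand
## Treatment   Control
##        60        42
```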


Summary of tapply

The tapply function is useful when we need to break up a vector into groups defined by some classifying factor, compute a function on the subsets, and return the results in a convenient form. You can even specify multiple factors as the grouping variable, for example treatment and sex, or team and handedness.
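
To group by more than one factor, pass a list of factors as the second argument. Here is a minimal sketch with an invented dataset. The trial data frame and its sex column are hypothetical and not part of the simulated data above.

```r
## hypothetical data: eight patients, two grouping variables
trial <- data.frame(
  age = c(50, 60, 70, 80, 40, 44, 66, 70),
  treatment = c("Treatment", "Treatment", "Treatment", "Treatment",
                "Control", "Control", "Control", "Control"),
  sex = c("F", "F", "M", "M", "F", "M", "M", "M"))

## a list of two factors gives a two-way table of group means
mean.age <- tapply(trial$age, list(trial$treatment, trial$sex), mean)
mean.age
##            F  M
## Control   40 60
## Treatment 55 75
```

The result is a matrix with one row per treatment and one column per sex. A combination with no observations would show up as NA.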


Welcome to Blogistic Reflections! (A blog created entirely in Emacs org-mode)

September 20, 2009

John Tukey’s preface to Exploratory Data Analysis begins with a useful rule, “It is important to understand what you can do before you learn to measure how well you seem to have done it.” When I decided I wanted to start a blog concentrating on statistics, R, and Emacs, I thought I had better learn first what I can do to make the process of generating content easier. Here, as a meta first post, I present how I used Emacs, R, and other technologies to produce the output you are reading here.

Emacs, ESS, and org-mode

I use Emacs for everything I can. I started learning it as a graduate student in Statistics in order to use the ESS package. ESS lets you run R within Emacs, which is really nice for developing programs and performing data analysis interactively. I have discovered many Emacs packages through the years, but lately I have been learning the excellent org-mode package. Org-mode is designed so that you can get started with very minimal knowledge of its capabilities. You can ignore the more complicated aspects of it if you do not need them, and still have a great system for general organizational tasks and note taking. I think it is worth learning Emacs just so you can use org-mode!

Since I am so familiar with Emacs, and work in it daily, I really wanted to use it to write posts. Ideally, I would like to write my blog entries in org-mode, and be able to use in-line R code. The text and R code should be output to HTML and published to my blog. This HTML generation is possible because org-mode has a fantastic feature that exports to various formats, including HTML. This is beneficial since then all of my headlines, lists, tables, and source code markup in org-mode will be carried over to the HTML automatically.

Using the tools already available for Emacs, creating the HTML turned out to be much easier than I initially thought. The vast majority of the time preparing this was spent learning about those tools, and figuring out how to combine them in the right way to accomplish what I wanted to do.

To summarize, my goals were to:

  1. write blog content in Emacs
  2. use org-mode to manage content
  3. be able to include R commands, output, and graphics easily
  4. automatically syntax highlight the R commands and output
  5. automatically syntax highlight other source code (e.g., elisp)
  6. generate HTML automatically directly from Emacs

R, Sweave, and the ascii package

I really enjoy using R. Since I plan to write about statistics, and especially statistical computing, I imagine many of my posts will contain R code. R comes with an interesting function called Sweave. Sweave allows you to incorporate R commands, wrapped in a special syntax, within a document you are writing; see some demos. So you might be writing some LaTeX or HTML and interweaving R code. After you run the Sweave function using the source (e.g., HTML) document as the input, a new file is created that replaces the R code with the results of the commands. This might sound simple, but it leads to a very powerful model for report generation and reproducible research. My goal was to somehow get publishable HTML from a source org-mode file after running Sweave on it. For example, say I want to generate a sample of 100 values from a normal distribution with mean 10, and summarize the results.


> x <- rnorm(100, mean = 10, sd = 5)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -3.077   6.844  10.100   9.819  13.690  22.300  
> sd(x)
[1] 5.356113 

I do not want to have to actually type any of the summary output, of course, but I also don’t have to even copy and paste it. What I can type is the R code to produce the output, and wrap it in Sweave syntax, like this:

 <<>>=
 x <- rnorm(100, mean = 10, sd = 5)
 summary(x)
 sd(x)
 @
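
To try the weaving step on its own, outside of Emacs, something like the following sketch should work in a plain R session. It is an illustration under stated assumptions and not the setup used for this blog: it uses the default LaTeX driver that ships with R instead of the RweaveAsciidoc driver, and the file name demo.Rnw and the temporary directory are made up for the example.

```r
## write a tiny Sweave source file into a temporary directory
work.dir <- tempfile("sweave-demo")
dir.create(work.dir)
src <- file.path(work.dir, "demo.Rnw")
writeLines(c("Some text before the chunk.",
             "<<>>=",
             "x <- c(2, 4, 6)",
             "mean(x)",
             "@",
             "Some text after the chunk."),
           src)

## Sweave writes its output into the working directory,
## so switch there for the duration of the call
old.dir <- setwd(work.dir)
utils::Sweave(src, quiet = TRUE)
setwd(old.dir)

## the woven file holds the text plus the commands and their output
woven <- readLines(file.path(work.dir, "demo.tex"))
writeLines(woven)
```

In the woven file, the chunk between the two markers is replaced by the echoed commands and the printed result, while the surrounding text is passed through unchanged.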

When I prepare this post for publishing, Sweave will run the R code and insert the output in place of the Sweave commands. I have wrapped the Sweave syntax in org-mode’s special BEGIN_SRC construct, so that when I export to HTML, org-mode will properly syntax highlight the output using the htmlize package, as inspired by this post on using the htmlize package with Erlang. The only elisp I had to write was the following.

;; export an org file into WordPress-ready HTML

(defun run-ascii-sweave-on-buffer ()
  (ess-command "library(ascii)\n")
  (ess-command (concat "Sweave(\"" (buffer-file-name) 
                       "\", RweaveAsciidoc)\n")))

(defun sweave-and-htmlize-blog-entry ()
   "Run Sweave on current file and produce HTML 
ready for pasting to WordPress. Copies text to kill-ring for 
pasting. "
   (interactive)
   (save-excursion
     (run-ascii-sweave-on-buffer)
     (let* ((name (buffer-file-name))
            (txt-filename (concat name ".txt"))
            (txt-buf (find-file txt-filename)))
       (message "Preparing HTML for '%s' ..." name)
       (switch-to-buffer txt-buf)
       (revert-buffer t t t)
       (goto-char (point-min))
       (while (re-search-forward "BEGIN_SRC R" nil t)
         (replace-match "BEGIN_SRC R-transcript" t nil))
       (goto-char (point-min))
       (while (re-search-forward "^----" nil t)
         (replace-match "" nil nil))
       (basic-save-buffer)
       (switch-to-buffer (org-export-region-as-html 
                          (point-min) (point-max) t "*HTML*"))
       (kill-buffer txt-buf)
       ;; make sure the search below starts at the top of the HTML buffer
       (goto-char (point-min))
       (while (re-search-forward "<pre " nil t)
         (replace-match 
          "<pre style=\"background-color:#FFFFE5; font-size:8pt\" " 
          t nil))
       (kill-ring-save (point-min) (point-max))
       (message "Finished Converting to WordPress HTML" ))))


(define-key global-map (kbd "<f5>") 'sweave-and-htmlize-blog-entry) 
         



All the first function is doing is running an input file (an org-mode file) through Sweave with the RweaveAsciidoc driver found in the R ascii package. It basically just puts in the R output as plain text, as opposed to LaTeX code or HTML.

The rest of the elisp function manipulates the output in some trivial ways. The explanation for the replacement of the four dashes is that the RweaveAsciidoc Sweave driver produces the dashed string both before and after the R output as a visual offset, since asciidoc uses that as a markup indicator. I did not need that, since my output will not be fed through asciidoc, but rather the org-mode HTML exporter, so I simply replace the dashes found at the beginning of a line with the empty string. The last trick I had to do was to replace the R syntax highlighting with R-transcript syntax highlighting, since the results of Sweave are essentially a transcript of the R commands entered, not the actual R code. Finally, the resulting HTML produced by the org exporter has its pre tags modified with custom colors and font size.

What is left to do?

There are a few loose ends that I didn’t have time to clean up yet. I would like to modify my Lisp function to work on regions if mark is set and transient-mark-mode is enabled. As of now, it just works on the whole buffer. The idea would be to have an org-mode file, say blog.org, that would have a top-level headline for each entry. You could then highlight that entry and publish it. I realize this is fairly trivial to implement, but it is not done yet.

I also want to investigate the publishing feature of org-mode, and see if there is any added value in using weblogger mode in Emacs. Using that, I could have a full Emacs solution to posting new entries, without even having to go into WordPress to paste HTML as I have done now. As an aside, I found longlines-mode in Emacs very useful for writing this post in org-mode, so that there are not newlines in random spots when the HTML is produced.

I have not tested how plotting from R works in this system. It would be great to be able to generate in-line graphics in my posts using R commands to construct plots.

Finally, it looks like there is a really interesting project on worg called org-babel that will allow not only weaving of R, but many other languages including Ruby, elisp, and shell scripts, turning org-mode into a platform for literate programming. I also just saw blorgit, a blogging engine in org-mode that makes use of the git version control system.

My function provided above should work if you have a recent version of org-mode, R, ESS, and the ascii package installed on your system. I have bound the function to F5 on my keyboard, so just hitting F5 in my org-mode buffer will create an HTML buffer and copy its contents to the kill-ring for pasting into WordPress.

Conclusion

When I started this process, I assumed I would have to write at least a few elisp functions, and possibly some extensions to org-mode, perhaps even a Sweave driver. After thoroughly examining the available tools, I ended up only having to write, in effect, one Lisp function, and that is only a utility function to automate the combination of solutions I found. The moral is that while you will often have to write your own functions to get the exact behavior you are after, it does pay to really research what is out there already. I found a solution to my problem, org-mode, that allows a much more flexible framework for extensions and has been thoroughly tested. I now get to enjoy the benefits of whatever future enhancements org-mode comes up with, including extensions by other org-mode users. In particular, the org-babel functionality looks very promising to replace or augment some of my work here. So before you write your own packages, research what others have already done. At the very least, you’ll know what value you are adding by doing it your own way, which is the lesson I took away from Tukey’s rule.