Making Emacs Buffer Names Unique Using the Uniquify Package

October 5, 2009

If you ever work in Emacs with different files that all have the same name, this tip may be very useful for you. In my experience, this most often happens when working with Makefiles from multiple projects or directories. If you already have a buffer visiting a file called Makefile and then open a second file with the same name, Emacs by default calls the second buffer Makefile<2>. This is not very useful for figuring out which one is which.

The uniquify package

Enter the uniquify package. It offers several options for making buffer names unique. In my .emacs, I have the following two lines of code.

(require 'uniquify)
(setq uniquify-buffer-name-style 'forward)

You can set the value to any of the following options, nil being the default. These examples are taken from the help file within Emacs. Each style uses the directories the files are located in to make the names unique. The first buffer called “name” is visiting a file in the directory bar/mumble/, and the second buffer is visiting a file called “name” in the directory quux/mumble/.

value                          first buffer called “name”   second buffer called “name”
forward                        bar/mumble/name              quux/mumble/name
reverse                        name\mumble\bar              name\mumble\quux
post-forward                   name|bar/mumble              name|quux/mumble
post-forward-angle-brackets    name<bar/mumble>             name<quux/mumble>
nil                            name                         name<2>
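
A few related options can be set alongside the naming style. The fragment below is only a sketch and is not part of my own two-line setup. It assumes the variables uniquify-separator, uniquify-after-kill-buffer-p, and uniquify-ignore-buffers-re as documented in uniquify.el, and the values are purely illustrative.

```elisp
;; Illustrative settings only; not part of the two-line setup above.
(require 'uniquify)
;; Show the file name first, then the distinguishing directories.
(setq uniquify-buffer-name-style 'post-forward)
;; With post-forward, this yields name:bar/mumble instead of name|bar/mumble.
(setq uniquify-separator ":")
;; Re-simplify the remaining buffer names after a conflicting buffer is killed.
(setq uniquify-after-kill-buffer-p t)
;; Leave special buffers such as *scratch* alone.
(setq uniquify-ignore-buffers-re "^\\*")
```

With these values, the two example buffers from the table would be named name:bar/mumble and name:quux/mumble.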

Using R to Analyze Baseball Games in “Real Time”

October 4, 2009

In order to honor the last day of the 2009 MLB regular season (excepting the Twins/Tigers tiebreaker Tuesday night), I was reading a book that combines a few of my favorite things: statistics, R, and baseball. The book, by Joseph Adler, is called Baseball Hacks, and I highly recommend it if you are interested in analyzing baseball data. Joseph uses Excel for some tips, R for others, and shows you how to download historical and current baseball data for further analysis. One tip that the book offered was a way to download “real time” baseball data from MLB’s site in XML format. I decided to try to write some R functions to retrieve, summarize, and analyze what was available.

Where are the data?

Joseph shows how, at least at the time of the writing of his book and this post, you can download a wealth of XML data from past and current seasons from MLB's site (the base URL appears in the boxscore function below). If you drill down far enough into the directories, you can find a file called miniscoreboard.xml, which is the one I use for this analysis.

The R functions

Here are the R functions I wrote. You can copy and paste them into your R session so that they are available to you. The next section will describe how to use them. Writing these was fairly straightforward, and simply a matter of XML manipulation. I admit that there may be far better ways to do this manipulation using the XML package, but this worked for now.

################################################################################
#   Program Name:     xml-mlb-gameday.R
#   Author:           Erik
#   Created:          10/04/2009
#
#   Last saved
#    Time-stamp:      <2009-10-04 17:23:02 erik>
#
#   Purpose:          show current scoreboard in R 
#
#   ** Generated by auto-insert on 10/04/2009 at 13:25:58**
################################################################################

## need XML package, may need to install w/ install.packages()
library(XML)

## create a boxscore object from an XML description of a game 
createBoxScore <- function(x) {
  status <- if(x$.attrs["status"] != "In Progress")
    "Final" else if(x$.attrs["top_inning"] == "Y")
      "Top" else "Bot"
  
  bs <- list(status = status, 
             inning = as.numeric(x$.attrs["inning"]),
             away.team = x$.attrs["away_name_abbrev"],
             away.runs = as.numeric(x$.attrs["away_team_runs"]),
             away.hits = as.numeric(x$.attrs["away_team_hits"]), 
             away.errors = as.numeric(x$.attrs["away_team_errors"]),
             home.team = x$.attrs["home_name_abbrev"],
             home.runs = as.numeric(x$.attrs["home_team_runs"]), 
             home.hits = as.numeric(x$.attrs["home_team_hits"]), 
             home.errors = as.numeric(x$.attrs["home_team_errors"]))
  class(bs) <- "boxscore"
  bs
}

## print the boxscore object in traditional format
print.boxscore <- function(x, ...) {
  cat("     ", "R   ", "H  ", "E (",
      x$status, " ",
      x$inning, ")\n",
      format(x$away.team, width = 3), " ", 
      format(x$away.runs, width = 2), "  ", 
      format(x$away.hits, width = 2), "  ", 
      x$away.errors, "\n",
      format(x$home.team, width = 3), " ", 
      format(x$home.runs, width = 2), "  ", 
      format(x$home.hits, width = 2), "  ", 
      x$home.errors, "\n\n", sep = "")
}

## utility function ... 
as.data.frame.boxscore <- function(x, row.names, optional, ...) {
  class(x) <- "list"
  as.data.frame(x)
}

## This is the "user accessible" public function you should be calling!
## downloads the XML data, and prints out boxscores for games on "date"
boxscore <- function(date = Sys.Date()) {
  if(date > Sys.Date())
    stop("Cannot retrieve scores from the future.")
         
  year  <- paste("year_", format(date, "%Y"), "/", sep = "")
  month <- paste("month_", format(date, "%m"), "/", sep = "")
  day   <- paste("day_", format(date, "%d"), "/", sep = "")
         
  xmlFile <-
    paste("http://gd2.mlb.com/components/game/mlb/",
          year, month, day, "miniscoreboard.xml", sep = "")
  xmlTree <- xmlTreeParse(xmlFile, useInternalNodes = TRUE)
  xp <- xpathApply(xmlTree, "//game")
  xmlList <- lapply(xp, xmlToList)

  bs.list <- lapply(xmlList, createBoxScore)
  names(bs.list) <-
    paste(sapply(bs.list, "[[", "away.team"),
                 "@",
          sapply(bs.list, "[[", "home.team"))
  bs.list
}

Examples of summarizing real-time baseball data

Here is how to run some simple analyses on baseball games happening right now. This is the real value add for the idea of downloading data through R. Obviously you could just go to your favorite sports site to find scores if you wanted to know how your team was doing, but pulling the data into R lets you further analyze the data, and even combine it with other data sources (e.g., weather).

> ## print boxscores for games happening NOW!
> boxscore()
$`CWS @ DET`
     R   H  E (Final 9)
CWS  3   7  0
DET  5  12  0


$`HOU @ NYM`
     R   H  E (Final 9)
HOU  0   4  1
NYM  4   9  0


$`PIT @ CIN`
     R   H  E (Final 9)
PIT  0  10  0
CIN  6  11  0


$`WSH @ ATL`
     R   H  E (Final 15)
WSH  2  13  0
ATL  1  13  0


$`CLE @ BOS`
     R   H  E (Final 9)
CLE  7   8  0
BOS 12  11  0


$`FLA @ PHI`
     R   H  E (Final 10)
FLA  6  11  1
PHI  7  12  0


$`TOR @ BAL`
     R   H  E (Final 11)
TOR  4   9  2
BAL  5   8  0


$`NYY @ TB`
     R   H  E (Final 9)
NYY 10  12  0
TB   2   7  2


$`KC @ MIN`
     R   H  E (Final 9)
KC   4  12  0
MIN 13  11  0


$`MIL @ STL`
     R   H  E (Final 10)
MIL  9  15  2
STL  7   7  0


$`ARI @ CHC`
     R   H  E (Final 9)
ARI  5   8  0
CHC  2   6  0


$`LAA @ OAK`
     R   H  E (Final 9)
LAA  5   9  1
OAK  3  12  1


$`SF @ SD`
     R   H  E (Bot 9)
SF   3  11  1
SD   3   4  0


$`COL @ LAD`
     R   H  E (Top 8)
COL  1   4  1
LAD  5  12  0


$`TEX @ SEA`
     R   H  E (Final 9)
TEX  3   4  0
SEA  4   8  1 
> ## print boxscores for a different day's games
> boxscore(date = as.Date("2009-10-01"))
$`STL @ CIN`
     R   H  E (Final 9)
STL 13  15  1
CIN  0   5  0


$`MIN @ DET`
     R   H  E (Final 9)
MIN  8  13  4
DET  3   7  1


$`MIL @ COL`
     R   H  E (Final 9)
MIL  2   6  0
COL  9  14  1


$`ARI @ SF`
     R   H  E (Final 9)
ARI  3   6  1
SF   7  11  0


$`TEX @ LAA`
     R   H  E (Final 9)
TEX 11  15  1
LAA  3   7  2


$`WSH @ ATL`
     R   H  E (Final 9)
WSH  2   7  0
ATL  1   6  0


$`HOU @ PHI`
     R   H  E (Final 9)
HOU  5  10  0
PHI  3  13  1


$`BAL @ TB`
     R   H  E (Final 9)
BAL  3   7  0
TB   2   5  1


$`CLE @ BOS`
     R   H  E (Final 9)
CLE  0   3  0
BOS  3  12  0


$`PIT @ CHC`
     R   H  E (Final NA)
PIT NA  NA  NA
CHC NA  NA  NA


$`OAK @ SEA`
     R   H  E (Final 9)
OAK  2   7  1
SEA  4   8  0 
> ## save the boxscores for further analysis
> bs <- boxscore()
> ## convert to a more useful form, a data.frame
> ## with one game per row 
> bs.df <- do.call(rbind, lapply(bs, as.data.frame))
> ## status of today's games
> table(bs.df$status)
Final   Bot   Top 
   13     1     1  
> ## how many innings have been played today? 
> sum(bs.df$inning, na.rm = TRUE)
[1] 144 
> ## how many runs have been scored by the home teams today?
> sum(bs.df$home.runs, na.rm = TRUE)
[1] 79 
> ## how many runs have been scored by the away teams today?
> sum(bs.df$away.runs, na.rm = TRUE)
[1] 62 

Conclusion

These functions are far from robust, and I think they only work for the current year (2009); dates from 2008 were not working right. The format looks like it has changed over time, which is not surprising. I only use a very small subset of the available data; even the miniscoreboard.xml file contains far more information than I summarize here. This is really the first time I have dealt with XML data, so I am sure there is a lot more that can be done, but for a one-day project, I think the results are pretty interesting. I will definitely provide the updates I make to these functions, and may even start a baseball R package if they grow extensive enough. I suppose this is a project I can work on in the off season!


Using doc-view with auto-revert to view LaTeX PDF output in Emacs

October 3, 2009

When I am authoring a LaTeX document in Emacs, such as a report or my CV, it is useful for me to compile the LaTeX source file periodically to see what the resulting PDF file looks like. I used to run a separate PDF viewer to look at the output, but I now have a complete Emacs solution.

The editing environment

When writing a LaTeX document, I usually want the output to be a PDF file. Accordingly, I put the following in my .emacs file.

(setq TeX-PDF-mode t)

I then split my Emacs frame into two side-by-side windows using C-x 3 (see screencast below). After compiling my LaTeX file with C-c C-c, I visit the resulting PDF file in the other Emacs window. The Emacs doc-view package will display the PDF file.

Including auto-revert functionality

The final piece of the puzzle is to set files visited in doc-view-mode to auto-revert when changed on disk. That way, when I update my LaTeX file and recompile with C-c C-c, the PDF in the other window will automatically update.

This is achieved by placing the following line in my .emacs.

(add-hook 'doc-view-mode-hook 'auto-revert-mode)
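
If the refresh feels slow or noisy, a few standard variables can be tuned alongside the hook. This is only a sketch and not part of my own setup. It assumes the auto-revert-interval and auto-revert-verbose options from autorevert.el and the doc-view-resolution option from doc-view.el, and the values shown are illustrative guesses.

```elisp
;; Illustrative tuning only; these values are assumptions, adjust to taste.
(add-hook 'doc-view-mode-hook 'auto-revert-mode)
;; Check for changed files every 2 seconds (the default is 5).
(setq auto-revert-interval 2)
;; Do not print a message in the echo area on every revert.
(setq auto-revert-verbose nil)
;; Render pages at 150 dots per inch.
(setq doc-view-resolution 150)
```

Setting auto-revert-interval in .emacs, before any buffer turns on auto-revert-mode, avoids having to restart the revert timer by hand.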

Screencast Example

Here is a screencast of this process in action.

This is a simple setup that I use to author reports, edit them, and see immediate updates to my output file without leaving Emacs.


R Object Tooltips in ESS

October 1, 2009

Whether at work or for personal projects, I use ESS a lot to perform interactive data analyses. The ability to write, edit, and submit R commands to an interactive R process is simply something I cannot imagine analyzing data without.

An example

One thing that I end up having to do a lot is inspect an object that I have just assigned to a variable in R. To fix ideas, let us create a data.frame called df for this example.


> df <- data.frame(patient = 1:100,
                   age = rnorm(100, mean = 10, sd = 6),
                   sex = sample(gl(2, 50, labels =
                     c("Male", "Female"))))


I just created the data.frame df, and I want to know if I did it correctly. For instance, does it look like I expect it to? Does it have 100 observations like I want? Do the variables have the right names? Is the sex variable a factor with two levels? In short, I want to call the str function using the object df as an argument.

Here is the output I am interested in seeing:


> str(df)
'data.frame':   100 obs. of  3 variables:
 $ patient: int  1 2 3 4 5 6 7 8 9 10 ...
 $ age    : num  11.06 7.73 17.61 3.11 6.76 ...
 $ sex    : Factor w/ 2 levels "Male","Female": 2 1 1 2 2 1 1 2 2 2 ... 

Inspecting objects the old way

So, how can I quickly see the structure as shown above? One idea is to switch over to my interactive R buffer in Emacs, type the command at the prompt, and then switch back to my code buffer to edit the data.frame command or continue programming. I dislike having to switch back and forth between the buffers for a one-off command though.

Alternatively, I could type str(df) in my code buffer, evaluate it, and decide to keep it or delete the line. Since this is more of a quick check, without permanent results, I usually will not want to keep lines like this around, since they clutter up my program. Typically, I am writing the program to be later run in BATCH mode, so I also do not want functions like that in my code since some can be excessively time-consuming depending on the size of the data.frame.

Another option is to use the ESS function ess-execute-in-tb (bound to C-c C-t by default), which will prompt me for an expression to evaluate. This is nice because I do not have to clutter my buffer with extraneous function calls. However, after using this method for a while, I noticed that I had many patterns with my objects. For data.frames, I would almost always use summary or str on them after assignment. For factors, I would want to table the values after I created them, to be sure they looked right. For numeric vectors, I would want to summarize them. I also wanted to summarize model fits (e.g., lm). I wanted to take advantage of my usage patterns so that I did not have to type so much after assigning an object to a variable.

Inspecting objects the new way

I therefore wrote an Emacs Lisp function that, when called via a key chord in Emacs, inspects the object at point, determines the class of that object, and based on the class, calls an R function on that object, showing the results in a tooltip. For the df example above, I would just put point on “df”, anywhere in the source code, and type C-c C-g (my default binding). A tooltip is then shown with the output of str(df).

An example similar to this, along with several others, is shown in this screencast. I think this is the best way to show how my Lisp function interacts with R to show object information in tooltips.

Pretty nice! One thing to note is that the tooltips are displaying in a proportional font, not a monospace one. I know at some point I had found a customizable variable to specify which font tooltips display in, but I apparently did not save it. If I find that variable, I will update this post to reflect how to do that.

The Emacs Lisp function and keybinding

Here is the code you will need for this behavior. It depends on having tooltip-show-at-point defined, which is found only in ESS 5.4 (the current version as of this post) or later. I contributed tooltip-show-at-point to the ESS project a few months ago. It is used to show argument tooltips when you type an opening parenthesis. Perhaps my object tooltip function will find its way into a future version of ESS.

;; ess-R-object-tooltip.el
;; 
;; I have defined a function, ess-R-object-tooltip, that when 
;; invoked, will return a tooltip with some information about
;; the object at point.  The information returned is 
;; determined by which R function is called.  This is controlled
;; by an alist, called ess-R-object-tooltip-alist.  The default is
;; given below.  The keys are the classes of R object that will
;; use the associated function.  For example, when the function
;; is called while point is on a factor object, a table of that
;; factor will be shown in the tooltip.  The objects must of course
;; exist in the associated inferior R process for this to work.
;; The special key "other" in the alist defines which function
;; to call when the class is not matched in the alist.  By default,
;; the str function is called, which is actually a fairly useful
;; default for data.frame and function objects. 
;; 
;; The last line of this file shows my default keybinding. 
;; I simply save this file in a directory in my load-path
;; and then place (require 'ess-R-object-tooltip) in my .emacs 

;; the alist
(setq ess-R-object-tooltip-alist
      '((numeric    . "summary")
        (factor     . "table")
        (integer    . "summary")
        (lm         . "summary")
        (other      . "str")))


(defun ess-R-object-tooltip ()
  "Get info for object at point, and display it in a tooltip."
  (interactive)
  (let ((objname (current-word))
        (curbuf (current-buffer))
        (tmpbuf (get-buffer-create "**ess-R-object-tooltip**")))
    (if objname
        (progn
          (ess-command (concat "class(" objname ")\n")  tmpbuf )   
          (set-buffer tmpbuf)
          (let ((bs (buffer-string)))
            (if (not(string-match "\\(object .* not found\\)\\|unexpected" bs))
                (let* ((objcls (buffer-substring 
                                (+ 2 (string-match "\".*\"" bs)) 
                                (- (point-max) 2)))
                       (myfun (cdr(assoc-string objcls 
                                                ess-R-object-tooltip-alist))))
                  (progn
                    (if (eq myfun nil)
                        (setq myfun 
                              (cdr(assoc-string "other" 
                                                ess-R-object-tooltip-alist))))
                    (ess-command (concat myfun "(" objname ")\n") tmpbuf)
                    (let ((bs (buffer-string)))
                      (progn
                        (set-buffer curbuf)
                        (tooltip-show-at-point bs 0 30)))))))))
    (kill-buffer tmpbuf)))

;; my default key map
(define-key ess-mode-map "\C-c\C-g" 'ess-R-object-tooltip)

(provide 'ess-R-object-tooltip)

Notice that you can add your own object classes and functions fairly easily at the top of the program. There is a special “other” class which will be called for classes not defined otherwise.
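
As a sketch of what such an addition might look like, the fragment below adds three hypothetical entries to the alist. The class-to-function pairings are illustrative choices, not ones I have settled on. The keys are written as strings here; assoc-string, which the lookup uses, accepts both strings and symbols as keys.

```elisp
;; Hypothetical extra entries; evaluate these after the setq that
;; defines ess-R-object-tooltip-alist.
(add-to-list 'ess-R-object-tooltip-alist '("data.frame" . "summary"))
(add-to-list 'ess-R-object-tooltip-alist '("logical"    . "table"))
(add-to-list 'ess-R-object-tooltip-alist '("character"  . "head"))
```

One caveat, as far as I can tell from the parsing code above: the class is extracted as everything between the first and last quote of the output of class(), so an object with more than one class (a glm fit reports both "glm" and "lm") appears not to match any single key and falls through to “other”.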

Further meta-data features in ESS?

If you can think of any more examples of object types that this would be useful for, feel free to post them in the comments. I think this is a very useful feature when interactively examining datasets, fitting models, and analyzing data. In general, I think there are many more interesting ways to have meta-data on objects available quickly within the ESS and R system. I will be sure to share them as I explore ways to more efficiently do statistical analysis within the R environment.