Skip to content

Documenting packages

hadley edited this page Jan 21, 2013 · 4 revisions

Package level documentation

Package level documentation exists to help the user understand how to use the package as a whole. Function level documentation is useful once you know exactly what you want to do, and you've discovered the appropriate function: once you've done all that, function-level documentation helps you use the function appropriately. Package-level documentation gets you to point where you can do that: it explains what the package does as a whole, breaks the functions down into useful categories, and shows you how to combine the functions (and maybe functions from other packages) to solve problems.

There are XXX components to package level documentation. None of them are compulsory, but you more you make, the easier it will be for others to use your package.

  • README, NEWS, CITATION are specially formatted plain-text documentation that provide a brief overview of your package, what's changed recently, and how to cite your package.

  • Vignettes provide long form pdf documentation

  • Demos are pure R files that provide case studies linking together multiple components of the package

  • It's also a good idea to include "function" documentation for your package - that's described in the following chapter.

README

The README file lives in the package directory. It should be fairly short (3-4 paragraphs) and answer the following questions:

  • Why should someone use your package?
  • How does it compare to other existing solutions?
  • What are the main functions?

If you're using github, this will appear on the package home page. I also recommend using it when you announce a new version of your package.

Some examples from our packages follow. Note that most of these use markdown, http://daringfireball.net/projects/markdown/, a plain text formatting language to add headings, basic text formatting and bullets. A brief introduction to markdown is included in the appendix. If you use markdown, you should call you readme file README.md.

plyr

plyr is a set of tools for a common set of problems: you need to __split__
up a big data structure into homogeneous pieces, __apply__ a function to
each piece and then __combine__ all the results back together. For
example, you might want to:

  * fit the same model each patient subsets of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising

It's already possible to do this with base R functions (like split and the
apply family of functions), but plyr makes it all a bit easier with:

  * totally consistent names, arguments and outputs
  * convenient parallelisation through the foreach package
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery, and informative error messages
  * labels that are maintained across all transformations

Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in equivalents.

A detailed introduction to plyr has been published in JSS: "The
Split-Apply-Combine Strategy for Data Analysis",
http://www.jstatsoft.org/v40/i01/. You can find out more at
http://had.co.nz/plyr/, or track development at
http://github.com/hadley/plyr. You can ask questions about plyr (and data
manipulation in general) on the plyr mailing list. Sign up at
http://groups.google.com/group/manipulatr.

stringr

Strings are not glamorous, high-profile components of R, but they do play
a big role in many data cleaning and preparations tasks. R provides a
solid set of string operations, but because they have grown organically
over time, they can be inconsistent and a little hard to learn.
Additionally, they lag behind the string operations in other programming
languages, so that some things that are easy to do in languages like Ruby
or Python are rather hard to do in R. The `stringr` package aims to remedy
these problems by providing a clean, modern interface to common string
operations.

More concretely, `stringr`:

 * Processes factors and characters in the same way.

 * Gives functions consistent names and arguments.

 * Simplifies string operations by eliminating options that you don't need
   95% of the time.

 * Produces outputs than can easily be used as inputs. This includes
   ensuring that missing inputs result in missing outputs, and zero length
   inputs result in zero length outputs.

 * Completes R's string handling functions with useful functions from
   other programming languages.

NEWS

The NEWS file should list all changes that have occurred since the last release of the package.

The following sample shows the NEWS file from the stringr package.

stringr 0.5
===========

* new `str_wrap` function which gives `strwrap` output in a more
  convenient format

* new `word` function extract words from a string given user defined
  separator (thanks to suggestion by David Cooper)

* `str_locate` now returns consistent type when matching empty string
  (thanks to Stavros Macrakis)

* new `str_count` counts number of matches in a string.

* `str_pad` and `str_trim` receive performance tweaks - for large vectors
  this should give at least a two order of magnitude speed up

* str_length returns NA for invalid multibyte strings

* fix small bug in internal `recyclable` function

NEWS has a special format, but it's not well documented. The basics are:

  • The information for each version should start with the name of the package and its version number, followed by a line of =s.

  • Each change should be listed with a bullet. If a bullet continues over multiple lines, the second and subsequent lines need to be indented by at least two spaces. (I usually add a blank line between each bullet to make it easier to read.)

  • If you have many changes, you can use subheadings to divide them into sections. A subheading should be all upper case and flush left.

  • I use markdown formatting inside the bullets. This doesn't help the formatting in R, but is useful if you want to publish the NEWS file elsewhere.

You can use devtools::show_news() to display the NEWS using R's built-in parser and check that it appears correctly. show_news() defaults to showing just the news for the most recent version of the package. You can override this by using argument latest = FALSE.

CITATION

The CITATION file lives in the inst directory and is intimately connected to the citation() function which tells you how to cite R and R packages. Calling citation() without any arguments tells you how to cite base R:

To cite R in publications use:

  R Core Team (2012). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  ISBN 3-900051-07-0, URL http://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2012},
    note = {{ISBN} 3-900051-07-0},
    url = {http://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also ‘citation("pkgname")’ for
citing R packages.

This is generated from a CITATION file that looks like this:

bibentry("Manual",
   title = "R: A Language and Environment for Statistical Computing",
   author = person("R Core Team"),
   organization = "R Foundation for Statistical Computing",
   address      = "Vienna, Austria",
   year   = version$year,
   note   = "{ISBN} 3-900051-07-0",
   url    = "http://www.R-project.org/",

   mheader = "To cite R in publications use:",

   mfooter = 
     paste("We have invested a lot of time and effort in creating R,",
      "please cite it when using it for data analysis.",
      "See also", sQuote("citation(\"pkgname\")"),
      "for citing R packages.", sep = " ")
)

As you can see, it's pretty simple: you only need to learn one new function, bibentry(). The most important arguments, are bibtype (the first argument, which can "Article", "Book", "PhDThesis" and so on), and then the standard bibliographic information like title,, author, year, publisher, journal, volume, issue, pages and so on (they are all described in detail in ?bibEntry). The header (mheader) and footer (mfooter) are optional, and are useful places for additional exhortations.

Vignettes

Vignettes are long-form pdf guides to your package. They are sweave documents that live in the vignettes/ directory. To write a vignette you must be familiar with both Sweave and latex. Describing these is outside the scope of this book, but some useful resources are:

If you write a nice vignette, you might want to consider submitting it to the Journal of Statistical Software or the R Journal. Both are electronic only journals and peer-reviewing can be very helpful for improving the quality of your vignette and the related software.

Demos

A demo is very much like a function example, but is longer, and shows how to use multiple functions together. Demos are .R files that live in the demo/ package directory, and are access with the demo() function.

(NOT YET IMPLEMENTED) The demos directory also needs an index. The easiest way to generate that index is to a roxygen comment with @demoTitle tag:

#' @demoTitle my title

The roxygen process that turns this comment into an index is described in the next chapter.

Clone this wiki locally