R workshop

Mike Hammond
U. of Arizona
  1. General
    1. q() — quit the program.
    2. control-c — stop R if it's doing something you don't want.
    3. apropos('some topic') — list the objects whose names contain some string (must be quoted).
    4. help(command) — find out about some command.
    5. ?command — same as above.
  2. Importing Data
    1. data.frame() — create a statistical object by typing it in at the prompt.
      age <- c(25,30,56,22,17,9)
      gender <- c("m","f","m","m","f","f")
      weight <- c(160,110,220,150,90,100)
      major <- c('ling','comm','comm','ling','ling','ling')
      g <- data.frame(age,gender,weight,major)
      
    2. fix() — alter a statistical object by typing it in with the R editor.
      g <- data.frame(age=numeric(0), gender=character(0),
         weight=numeric(0), major=character(0))
      fix(g)
      
    3. read.table() — read in a text file you have created.
      g <- read.table('gender.txt',header=TRUE)
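
A minimal sketch of the read.table() round trip, assuming a tab-delimited text file with a header row. The temporary file stands in for gender.txt; the names tmp and g2 are illustrative only and not part of the workshop materials.

```r
# Six-person example data, typed in at the prompt.
g <- data.frame(
  age    = c(25, 30, 56, 22, 17, 9),
  gender = c("m", "f", "m", "m", "f", "f"),
  weight = c(160, 110, 220, 150, 90, 100),
  major  = c("ling", "comm", "comm", "ling", "ling", "ling")
)

# Write it out as tab-delimited text (tmp is a stand-in for 'gender.txt').
tmp <- tempfile(fileext = ".txt")
write.table(g, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back; header=TRUE takes the column names from the first line.
g2 <- read.table(tmp, header = TRUE, sep = "\t")
str(g2)
```

Stating sep="\t" explicitly is a cautious choice here: with the default separator, any run of whitespace splits fields, which can misalign columns when a field is empty.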
      
  3. Objects
    1. ls() — list currently loaded objects.
    2. rm(object) — delete a currently loaded object.
    3. summary(object) — find out about some object.
  4. Accessing a Dataframe
    1. g$gender — access a column by name.
    2. g[[1]] — access a column by position (returns a vector).
    3. g[,1] — access a column.
    4. g[1,] — access a row.
    5. g[4,2] — access a cell.
    6. g[3:6,1:2] — access a block.
    7. g[g$gender=='f',] — subsetting rows.
      g[g$gender=='f',]
      g[g$weight < 160,]
      g[g$gender=='m' & g$weight < 160,]
      
    8. attach(object) — make the parts of some currently loaded object directly available, e.g. g$gender is available as gender.
    9. detach(object) — remove the attached object.
    10. search() — examine what objects are attached.
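
The access forms above differ mainly in what they return. The following sketch, which rebuilds a reduced version of the example data so that it runs on its own, shows the difference and uses with() as one possible alternative to attach()/detach().

```r
g <- data.frame(
  age    = c(25, 30, 56, 22, 17, 9),
  gender = c("m", "f", "m", "m", "f", "f"),
  weight = c(160, 110, 220, 150, 90, 100)
)

col.vec <- g[[1]]   # the column itself, as a plain numeric vector
col.df  <- g[1]     # a one-column data frame (still a list)
class(col.vec)
class(col.df)

# Row subsetting: a logical vector before the comma selects rows.
light.f <- g[g$gender == "f" & g$weight < 160, ]
nrow(light.f)

# with() evaluates an expression inside g without attaching it.
with(g, mean(weight[gender == "f"]))
```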
  5. Basic Statistics
    1. mean() — mean.
    2. sd() — standard deviation.
    3. var() — variance.
    4. In general, all mathematical functions are available for vectorized calculations.
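
As a small illustration of vectorized calculation, the sketch below recomputes the sample variance and standard deviation by hand and compares them with var() and sd(). The variable names n, dev, and v are illustrative only.

```r
weight <- c(160, 110, 220, 150, 90, 100)

n   <- length(weight)
dev <- weight - mean(weight)   # subtraction is applied element by element
v   <- sum(dev^2) / (n - 1)    # sample variance, computed by hand

all.equal(v, var(weight))      # same as var()
all.equal(sqrt(v), sd(weight)) # sd() is the square root of the variance
```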
  6. Manipulating Parts of Dataframes
    1. tapply() — apply some function to a vector, grouped by one or more factors.
      tapply(weight,gender,mean)
      tapply(weight,gender,length)
      
    2. t() — transpose a dataframe.
    3. transform() — alter a dataframe in some way.
      transform(g,age=age+weight)
      
    4. aggregate() — aggregate within some dataframe.
      aggregate(g,list(gender),length)
      
    5. as.factor() — convert a list/vector of numbers into a factor.
    6. is.factor() — test if something is a factor.
    7. cut(vector,n) — break a vector of numbers into a factor with n levels.
      tapply(age,cut(weight,2),mean)
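
A sketch of how tapply(), aggregate(), and cut() fit together, using a self-contained copy of the example data. The break points (0, 120, 250), the labels "light" and "heavy", and the column name wclass are assumptions chosen for illustration.

```r
g <- data.frame(
  age    = c(25, 30, 56, 22, 17, 9),
  gender = c("m", "f", "m", "m", "f", "f"),
  weight = c(160, 110, 220, 150, 90, 100)
)

# tapply() returns a named vector: one mean per gender.
by.gender <- tapply(g$weight, g$gender, mean)
by.gender

# aggregate() with a formula returns the same means as a data frame.
aggregate(weight ~ gender, data = g, FUN = mean)

# cut() with explicit breaks and labels instead of a number of levels.
g$wclass <- cut(g$weight, breaks = c(0, 120, 250), labels = c("light", "heavy"))
table(g$wclass)
tapply(g$age, g$wclass, mean)
```

Giving explicit breaks makes the grouping reproducible across data sets, whereas cut(weight,2) picks the cut point from the range of the data at hand.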
      
  7. Plotting
    1. plot() — generic plotting function; lots and lots of options.
      plot(age ~ factor(gender))
      plot(age,weight) 
      plot(age ~ cut(weight,3))
      
    2. barplot() — makes a barplot, without whiskers, lots and lots of options.
      barplot(tapply(age,gender,mean),xlab="Gender", 
         col=c('sienna','violet'),names.arg=c('female','male'))
      
    3. colors() — lists all available named colors.
    4. hist() — makes a histogram from a vector of numbers.
    5. interaction.plot() — creates an interaction plot for two factors.
      interaction.plot(gender,major,age)
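
hist() has no example above, so here is a minimal sketch. The break points are an assumption chosen to suit the example weights, and pdf(NULL) is used only so the code runs without a display; at an interactive prompt the pdf(NULL) and dev.off() lines can be omitted.

```r
weight <- c(160, 110, 220, 150, 90, 100)

pdf(NULL)   # null graphics device: nothing is written to disk
h <- hist(weight, breaks = c(50, 100, 150, 200, 250),
          main = "Weight", xlab = "Weight")
dev.off()

# hist() also returns the bin counts it plotted.
h$counts
```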
      
  8. Anova
    1. aov() — creates an anova object which can be examined for the usual information with summary(). The tricky part is the anova formula.
    2. aov(x ~ f1 * f2 + Error(sub/f1)), where x is a vector representing a response, f1 is a within-subjects factor, f2 is a within-items factor, and sub is the subjects factor. The * includes interactions.
    3. aov(x ~ f1 * f2 + Error(itm/f2)) — here everything is the same except that itm is the items factor.
      sn <- read.table('twosn.txt',header=TRUE)
      summary(sn) 
      attach(sn) 
      tapply(response,list(sonority,size),mean) 
      summary(aov(response ~ sonority * size + 
         Error(subject/(sonority * size)))) 
      summary(aov(response ~ sonority * size + 
         Error(item))) 
      interaction.plot(sonority,size,response) 
      interaction.plot(sonority,size,response,ylim=c(1,7)) 
      with(sn[sn$size=='three',],summary(aov(response ~
         sonority + Error(subject/sonority))))
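
Since twosn.txt is not reproduced here, the following sketch runs the by-subjects formula on simulated data. Everything about the data is an assumption for illustration: 8 subjects, the level names "rise"/"fall" and "two"/"three", and random responses. It is not the workshop data set.

```r
set.seed(1)

# Hypothetical balanced design: each subject is seen in all 2 x 2 cells.
d <- expand.grid(subject  = factor(1:8),
                 sonority = c("rise", "fall"),
                 size     = c("two", "three"))

# Simulated responses with a small built-in sonority effect.
d$response <- rnorm(nrow(d), mean = 4) + ifelse(d$sonority == "rise", 1, 0)

# Both factors vary within subjects, so both appear inside Error().
fit <- aov(response ~ sonority * size + Error(subject / (sonority * size)),
           data = d)
summary(fit)

# The result has one stratum per error term.
names(fit)
```

Note that subject is built with factor(); if subjects are coded as plain numbers, aov() treats them as a numeric covariate and the error strata come out wrong.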
      
  9. Regression
    1. lm() — does a linear regression. It also takes a formula, but a simpler one than aov().
    2. lm(x ~ y + z + ...), where x, y, z, etc. are vectors of numbers (not factors).
      attach(g) 
      summary(lm(age ~ weight))
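
A short sketch of working with the fitted model object rather than only printing its summary. It passes the data frame through the data argument instead of attach(), which is one possible style; the names fit and the new weight of 130 are illustrative.

```r
g <- data.frame(
  age    = c(25, 30, 56, 22, 17, 9),
  weight = c(160, 110, 220, 150, 90, 100)
)

fit <- lm(age ~ weight, data = g)

coef(fit)                 # intercept and slope
summary(fit)$r.squared    # proportion of variance explained

# Predicted age for a hypothetical new weight.
predict(fit, newdata = data.frame(weight = 130))
```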
      
  10. Gotchas
    1. Input file format: text file, tabs, missing fields, column issues.
    2. Dataframe: columns/rows, vector vs. factor.
    3. Anova: within-items factors vs. within-subjects factors.
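
Two of these gotchas can be shown in a few lines. The sketch below is illustrative only: the factor f and the inline text txt are made-up stand-ins for a real column and a real input file.

```r
# Vector vs. factor: as.numeric() on a factor gives level codes, not values.
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10

# Missing fields: with an explicit tab separator, an empty numeric
# field is read as NA instead of shifting the columns.
txt <- "age\tgender\tweight\n25\tm\t160\n30\tf\t\n"
d <- read.table(text = txt, header = TRUE, sep = "\t")
is.na(d$weight)
```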
  11. Scripting
    1. A simple function:
      sjj <- function (x) {
         paste("Hey, ", x, ", this is tone sandhi.", sep="")
      }
      
    2. A more complex function:
      ip.to.ps <- function(filename,f1,f2,r) {
         postscript(filename)
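         # interaction.plot() puts its first factor on the x-axis and
         # draws one trace per level of its second factor
         interaction.plot(f1,f2,r,
            trace.label = deparse(substitute(f2)),
            xlab = deparse(substitute(f1)),
            ylab = paste("mean of",deparse(substitute(r))))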
         dev.off()
      }
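
A brief usage sketch for the functions above. sjj is repeated so the block runs on its own; arg.name is a hypothetical helper, added only to show what deparse(substitute()) does in the more complex function.

```r
sjj <- function(x) {
  paste("Hey, ", x, ", this is tone sandhi.", sep = "")
}

sjj("Mike")   # "Hey, Mike, this is tone sandhi."

# paste() is vectorized, so a vector of names gives one string per name.
greetings <- sjj(c("Ann", "Bo"))
length(greetings)

# deparse(substitute(x)) recovers the expression the caller typed,
# which is how a function can label a plot with the argument's name.
arg.name <- function(x) deparse(substitute(x))
arg.name(weight)   # "weight"
```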
      
  12. Links
    1. http://www.r-project.org Main R site. R can be downloaded from here, lots and lots of documentation, tutorials, etc.
    2. http://www.statmethods.net One useful R reference site.
    3. http://pidgin.ucsd.edu/mailman/listinfo/r-lang R for linguists home page.
    4. http://www.pallier.org/ressources/tpr/tpR.html General R tutorial by Christophe Pallier (in French).
    5. http://www.pallier.org/ressources/stats_with_R/stats_with_R.pdf Christophe Pallier's tutorial on doing Anova with R (in French).
    6. http://dingo.sbs.arizona.edu/~hammond/Rwkshp/ URL for workshop materials.