x=c("alpha","beta","gamma")
letters

x=month.name
as.numeric(x)

Lists

Vectors and matrices in R are two ways to work with a collection of objects. Lists provide a third. Unlike a vector or a matrix, a list can hold different kinds of objects: one entry may be a number, the next a matrix, and a third a character string (like "Hello R!"). Lists are useful for storing different pieces of information about a common entity. The following list, for example, stores details about a student.

x = list(name="Chang", nationality="Chinese", height=5.5, grades=c(95,45,80))

We can now extract the different fields of x as follows.

names(x)
x$name
x$hei #abbreviations are OK if unambiguous
x$grades
x$g[2]
x$na #oops! NULL, since "na" matches both name and nationality
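
The abbreviation rule used by $ can be checked directly. The sketch below rebuilds the same list x and contrasts $ with the [[ ]] operator, which matches names exactly unless told otherwise. It only illustrates the matching behaviour; nothing later depends on it.

```r
# Same student list as above
x = list(name="Chang", nationality="Chinese", height=5.5, grades=c(95,45,80))

x$hei                      # "hei" matches only height, so this works
x$na                       # NULL: "na" matches both name and nationality
x$nam                      # "nam" is long enough to pick out name
x[["hei"]]                 # NULL: [[ ]] wants the exact name by default
x[["hei", exact=FALSE]]    # partial matching switched on explicitly
x[["height"]]              # the full name always works
```

Typing the full name is the safest habit, since an abbreviation that works today can become ambiguous when a new field is added.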

In the coming tutorials we shall never need to make a list ourselves. But the statistical functions of R usually return their results as lists, so we must know how to unpack a list using the $ symbol as above.

To see the online help about symbols like $, type

?"$"

Notice the quotes surrounding the symbol.

Let us see an example of this. Suppose we want to write a function that finds the length, total and mean of a vector. Since the function returns three different pieces of information, we should use a list as follows.

f = function(x) list(len=length(x),total=sum(x),mean=mean(x))

Now we can use it like this:

dat = 1:10
result = f(dat)
names(result)
result$len
result$tot
result$mean
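
Built-in statistical functions hand back their results in the same way. As a sketch, t.test (chosen here only as a convenient example, not because the tutorial relies on it) returns a list whose fields can be unpacked with $:

```r
dat = 1:10
res = t.test(dat)      # one-sample t test of the mean of dat
is.list(res)           # TRUE: the result is a list underneath
names(res)             # the available fields
res$estimate           # the sample mean
res$conf.int           # confidence interval for the mean
res$p.value            # p-value of the test (null hypothesis: mean = 0)
```

Calling names() on an unfamiliar result is usually the quickest way to find out what can be extracted from it.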

x = data.frame(list(name=c("Chang_1","Chang_2","Chang_3"), nationality="Chinese", height=c(5.2,5.5,5.1), grades=c(95,45,80)))

Doing statistics with R

Now that we know R to some extent, it is time to put our knowledge to use and do some statistics with R. There are basically three ways to do this.

    Doing elementary statistical summarization or plotting of data.
    Using R as a calculator to compute some formula obtained from some statistics text.
    Using the sophisticated statistical tools built into R.

In this first tutorial we shall content ourselves with the first of these three. But first we need to get our data set inside R.

Loading a data set into R

There are various ways to load the data set. One is to use

LMC = read.table("hsb2.txt", header=TRUE)

Here header=TRUE tells R that the first line of the data file gives the names of the columns. The file name is given without a directory, so R looks for the file in the current working directory. If you give a path instead, use forward slashes (/) even if you are working in Windows. In Unix an absolute path starts with a forward slash (/).

dim(LMC)
names(LMC)
LMC
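
To try read.table without the real data file, the following sketch writes a tiny made-up file to a temporary location and reads it back. The column names mirror the ones used in this tutorial, but the values and the object name small are invented purely for illustration.

```r
# Hypothetical data: three made-up rows, not real measurements
tmp = tempfile(fileext=".txt")
writeLines(c("Method Dist Err",
             "A 18.5 0.10",
             "B 18.7 0.20",
             "C 18.4 0.05"), tmp)

small = read.table(tmp, header=TRUE)  # first line supplies the column names
dim(small)      # 3 rows, 3 columns
names(small)    # "Method" "Dist" "Err"
small

# Without header=TRUE the names line is read as an ordinary data row
dim(read.table(tmp))   # 4 rows, 3 columns
unlink(tmp)            # remove the temporary file
```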

This object LMC is like a matrix (more precisely, it is called a data frame). Each column stores the values of one variable, and each row stores a case. Its main difference from a matrix is that different columns can hold different types of data (for example, the Method column stores character strings, while the other two columns hold numbers). Otherwise, a data frame behaves much like a matrix. We can find the mean of the Dist variable like this:

mean(LMC[,2])
mean(LMC[,"Dist"])
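
The two commands above pick out the same column, and there are other equivalent spellings. A small sketch with a made-up data frame d (the values are invented; only the column names follow the tutorial):

```r
# Hypothetical stand-in for LMC
d = data.frame(Method=c("A","B","C"),
               Dist=c(18.5,18.7,18.4),
               Err=c(0.10,0.20,0.05))

sapply(d, class)     # Method is character (factor in R before 4.0); the rest numeric
mean(d[,2])          # column by position
mean(d[,"Dist"])     # column by name
mean(d$Dist)         # list-style access, since a data frame is also a list
mean(d[["Dist"]])    # same, with exact name matching
```

Selecting by name is generally more robust than selecting by position, because it keeps working if the columns are reordered.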

Note that each column of the LMC data frame is a variable, so it is tempting to write

mean(Dist)

but this will not work, since Dist is inside LMC. We can "bring it out" with the command

attach(LMC)

Now the command

mean(Dist)

works perfectly. All the values of the Dist variable are different measurements of the same distance. So it is only natural to use the average as an estimate of the true distance. But the Err variable tells us that not all the measurements are equally reliable. So a better estimate might be a weighted mean, where the weights are inversely proportional to the errors. We can use R as a calculator to directly implement this formula:

sum(Dist/Err)/sum(1/Err)

or you may want to be a bit more explicit

wt = 1/Err
sum(Dist*wt)/sum(wt)

Actually there is a smarter way than both of these.

weighted.mean(Dist, 1/Err)
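
All three versions compute the same quantity, sum(w*Dist)/sum(w) with w = 1/Err. The sketch below checks this on made-up numbers (invented for illustration, not the tutorial's data) and uses with() so that it runs without attach:

```r
# Hypothetical measurements and their errors
d = data.frame(Dist=c(18.5,18.7,18.4), Err=c(0.10,0.20,0.05))

a = with(d, sum(Dist/Err)/sum(1/Err))       # formula typed directly
wt = 1/d$Err                                # explicit weights
b = sum(d$Dist*wt)/sum(wt)
cc = with(d, weighted.mean(Dist, 1/Err))    # built-in function

all.equal(a, b)    # TRUE
all.equal(a, cc)   # TRUE
mean(d$Dist) - a   # the plain mean differs once the weights are unequal
```

with() evaluates an expression inside the data frame without changing the search path. If attach(LMC) has been used instead, detach(LMC) undoes it.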
