Summary (or descriptive) statistics are usually the first analyses performed on a numeric dataset. They form the foundation for more complex computations, analyses, and data visualizations. This guide covers the following R functions:
1. mean
2. sd
3. min
4. max
5. range
6. quantile
7. summary
Mean
In R, the mean of a single variable can be calculated with the mean(VAR) command, where VAR is the name of the variable whose mean you wish to compute.
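A minimal sketch of mean() in use. The vector `age` and its values are hypothetical, made up for illustration; with real data, VAR would usually be a column of a data frame (for example, a hypothetical `mydata$age`). The sketch also shows the `na.rm` argument, which controls whether missing values are dropped before averaging.

```r
# Hypothetical example data: ages of five respondents (made up for illustration)
age <- c(23, 29, 30, 31, 37)

# Mean of a single variable
mean(age)
# [1] 30

# One missing value makes the result NA unless na.rm = TRUE drops it first
age_missing <- c(23, 29, NA, 31, 37)
mean(age_missing)
# [1] NA
mean(age_missing, na.rm = TRUE)
# [1] 30
```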
Standard Deviation
Within R, standard deviations are calculated in the same way as means. The standard deviation of a single variable is computed with the sd(VAR) command, where VAR is the name of the variable whose standard deviation you wish to retrieve. Note that sd() returns the sample standard deviation, which divides by n - 1 rather than n.
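A short sketch using a hypothetical `age` vector (made-up data). It checks sd() against the sample formula, which divides the sum of squared deviations by n - 1, and then rescales the result to the population standard deviation (divisor n) for cases where that is the quantity wanted.

```r
# Hypothetical example data (made up for illustration)
age <- c(23, 29, 30, 31, 37)

# Sample standard deviation
sd(age)
# [1] 5

# The same value from the sample formula: divide by n - 1, not n
n <- length(age)
sqrt(sum((age - mean(age))^2) / (n - 1))
# [1] 5

# Population standard deviation (divisor n), if that is what is wanted
sd(age) * sqrt((n - 1) / n)
# [1] 4.472136
```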
Minimum and Maximum
Following the same pattern, the minimum of a single variable is computed with the min(VAR) command, and the maximum with max(VAR).
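A sketch of min() and max() on a hypothetical `age` vector (made-up data). Like mean(), both return NA if any value is missing, unless `na.rm = TRUE` is supplied.

```r
# Hypothetical example data (made up for illustration)
age <- c(23, 29, 30, 31, 37)

min(age)
# [1] 23
max(age)
# [1] 37

# With a missing value, drop it explicitly via na.rm = TRUE
age_missing <- c(23, 29, NA, 31, 37)
min(age_missing, na.rm = TRUE)
# [1] 23
max(age_missing, na.rm = TRUE)
# [1] 37
```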
Range
The range(VAR) command returns the minimum and maximum of a variable as a two-element vector, with the minimum first. It does not return their difference; for that single number, use diff(range(VAR)).
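A sketch of range() on a hypothetical `age` vector (made-up data), showing both the two endpoints and the single-number spread between them.

```r
# Hypothetical example data (made up for illustration)
age <- c(23, 29, 30, 31, 37)

# Minimum and maximum, returned together (minimum first)
range(age)
# [1] 23 37

# Distance between them: maximum minus minimum
diff(range(age))
# [1] 14
```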
Percentiles
To find the data value that corresponds to a desired percentile, use the quantile(VAR, c(PROB1, PROB2,…)) command, where VAR is the variable name and PROB1, PROB2, etc. are the probabilities for the desired percentiles.
Each probability must lie between 0 and 1, so it is simply the desired percentile written as a decimal (e.g. the 50th percentile is 0.5). The following example shows how this function can be used to find the data value that corresponds to a desired percentile.
> #calculate desired percentile values using quantile(VAR, c(PROB1, PROB2,…))
> #what are the 25th and 75th percentiles for age in the sample?
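A self-contained sketch of the same kind of query, run on a hypothetical five-value `age` vector (made-up data). The last call asks for the 10th percentile, which falls between two data values; by default (type = 7) quantile() interpolates linearly between the adjacent sorted values, and `names = FALSE` drops the percentage label.

```r
# Hypothetical example data (made up for illustration)
age <- c(23, 29, 30, 31, 37)

# 25th and 75th percentiles
quantile(age, c(0.25, 0.75))
# 25% 75%
#  29  31

# A percentile between two data values is interpolated:
# here between the 1st (23) and 2nd (29) sorted values
quantile(age, 0.1, names = FALSE)
# [1] 25.4
```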
The quantile(VAR) command can also be used on its own. When no probabilities are specified, the function defaults to computing the 0th, 25th, 50th, 75th, and 100th percentiles (that is, the minimum, first quartile, median, third quartile, and maximum).
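A sketch of the default behaviour on a hypothetical `age` vector (made-up data), confirming that omitting the probabilities gives the same result as passing c(0, 0.25, 0.5, 0.75, 1) explicitly.

```r
# Hypothetical example data (made up for illustration)
age <- c(23, 29, 30, 31, 37)

# Default: 0th, 25th, 50th, 75th and 100th percentiles
quantile(age)
#   0%  25%  50%  75% 100%
#   23   29   30   31   37

# Equivalent explicit call
quantile(age, c(0, 0.25, 0.5, 0.75, 1))
```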
Summary
A very useful multipurpose function in R is summary(X), where X can be any of a number of objects, including datasets (data frames), individual variables, and linear models. It returns summary information suited to the kind of object it is given, so its output differs by object type. For a numeric variable, for example, it reports the minimum, first quartile, median, mean, third quartile, and maximum (plus a count of NA values, if any are present), covering most of the statistics above in a single call.
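A sketch of summary() applied to three kinds of objects. The data frame `survey`, its columns, and the model `income ~ age` are all hypothetical, made up for illustration; printed output for the last two calls is not shown.

```r
# Hypothetical data frame (made up for illustration)
survey <- data.frame(
  age    = c(23, 29, 30, 31, 37),
  income = c(41000, 52000, 48000, 61000, 58000),
  group  = factor(c("a", "b", "a", "b", "a"))
)

# Numeric variable: Min. 23, 1st Qu. 29, Median 30, Mean 30, 3rd Qu. 31, Max. 37
summary(survey$age)

# Whole data frame: one summary per column (the factor is summarized as counts)
summary(survey)

# Fitted linear model: coefficients, standard errors, R-squared, etc.
fit <- lm(income ~ age, data = survey)
summary(fit)
```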
McGill Libraries • Questions? Ask us!