The lab exercises in this book use a number of data sets (mostly data frames, along with some time series, vectors, and arrays). This appendix lists basic information about each of them.


Passenger Miles on Commercial US Airlines, 1937-1960. The revenue passenger miles flown by commercial airlines in the United States for each year from 1937 to 1960 (from the FAA Statistical Handbook of Aviation). This is an annual time series of length 24.


New York Air Quality Measurements. Daily air quality measurements in New York, May to September 1973. This is a data frame with 153 observations on 6 variables:

Table 1: Air Quality
Name Type Description
Ozone int Ozone (ppb)
Solar.R int Solar radiation (langleys)
Wind numeric Wind speed (mph)
Temp numeric Temperature (degrees F)
Month numeric Month (1-12)
Day numeric Day of month (1-31)


The Joyner-Boore Attenuation Data. These data give peak accelerations measured at various observation stations for 23 earthquakes in California. The data have been used by various workers to estimate the attenuating effect of distance on ground acceleration. This is a data frame with 182 observations on 5 variables:

Table 2: Joyner-Boore Attenuation Data
Name Type Description
event numeric Event number
mag numeric Moment magnitude
station fac Station number (117 levels)
dist numeric Station-hypocenter distance (km)
accel numeric Peak acceleration


The Chatterjee-Price Attitude Data. From a survey of the clerical employees of a large financial organization, the data are aggregated from the questionnaires of the approximately 35 employees for each of 30 (randomly selected) departments. The numbers give the percent proportion of favourable responses to seven questions in each department. This is a data frame with 30 observations on 7 variables:

Table 3: Chatterjee-Price Attitude Data
Name Type Description
rating numeric Overall rating
complaints numeric Handling of employee complaints
privileges numeric Does not allow special privileges
learning numeric Opportunity to learn
raises numeric Raises based on performance
critical numeric Too critical
advance numeric Advancement


Body Temperature Series of Two Beavers. Reynolds (1994) describes a small part of a study of the long-term temperature dynamics of beaver Castor canadensis in north-central Wisconsin. Body temperature was measured by telemetry every 10 minutes for four females, but data from one period of less than a day for each of two animals is used there. There are two data frames, both holding body temperature measurements at 10-minute intervals: beaver1 has 114 rows and 4 columns, and beaver2 has 100 rows and 4 columns.

Table 4: Body Temperature Series of Two Beavers
Name Type Description
day numeric Day of observation (days since 1990)
time numeric Time of observation (0330 for 3:30 am)
temp numeric Measured body temperature (Celsius)
activ numeric Activity outside the retreat (0, 1)


Cafe Data. These are simulated data. Customers of the Main Street Cafe completed surveys over a one-week period. This is a data frame with observations on 13 variables:

Table 5: Cafe Data
Name Type Description
sex fac Sex (3 levels: male, female, other)
age int Age
day fac Day (7 levels: Monday, Tuesday, etc.)
meal fac Meal (4 levels: breakfast, lunch, dinner, other)
length int Length of meal (minutes)
miles int Miles driven to cafe
pref fac Seating preference (2 levels: booth, table)
ptysize int Number of people in party
food int Rating for food (ord levels: 1-5 ‘stars’)
svc int Rating for service (ord levels: 1-5 ‘stars’)
recmd fac Would recommend to a friend (2 levels: yes, no)
bill numeric Bill (dollars and cents)
tip int Amount of tip (whole dollars)


Speed and Stopping Distances of Cars. The data give the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s. This is a data frame with 50 observations on 2 variables:

Table 6: Speed and Stopping Distances
Name Type Description
speed numeric Speed (mph)
dist numeric Stopping distance (ft)


Weight Versus Age of Chicks on Different Diets. The ChickWeight data frame was generated from an experiment on the effect of diet on early growth of chicks. This is a data frame with 578 observations on 4 variables:

Table 7: Chick Weights
Name Type Description
weight numeric Weight (grams)
Time numeric Time (days since birth)
Chick ord A unique identifier for each chick
Diet fac A factor of 1-4 indicating the diet


Chicken Weights by Feed Type. An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. This is a data frame with 71 observations on 2 variables:

Table 8: Chicken Weights
Name Type Description
weight numeric Weight (units unknown)
feed fac Feed type


Carbon Dioxide Uptake in Grass Plants. This data set is from an experiment on the cold tolerance of the grass species Echinochloa crusgalli. This is a data frame with 84 observations on 5 variables:

Table 9: Carbon Dioxide Uptake in Grass Plants
Name Type Description
Plant ordered Unique identifier of each plant (12 levels)
Type fac Origin of the plant (Quebec, Mississippi)
Treatment fac Type of treatment (nonchilled, chilled)
conc numeric Ambient CO2 concentration (mL/L)
uptake numeric CO2 uptake rate (umol/sq meter/sec)


Smoking, Alcohol and (O)esophageal Cancer. Data from a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France. This is a data frame with 88 observations on 5 variables:

Table 10: Smoking, Alcohol and (O)esophageal Cancer
Name Type Description
agegp ordered Age group (6 levels)
alcgp ordered Alcohol consumption (4 levels)
tobgp ordered Tobacco consumption (4 levels)
ncases numeric Number of cases
ncontrols numeric Number of controls


Old Faithful Geyser Data. Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. This is a data frame with 272 observations on 2 variables:

Table 11: Old Faithful Geyser Data
Name Type Description
eruptions numeric Eruption time (minutes)
waiting numeric Waiting time to next eruption (minutes)


Infertility Study. This is a matched case-control study. This is a data frame with 248 observations on 8 variables:

Table 12: Infertility Study
Name Type Description
education factor Education (0=0-5yrs, 1=6-11yrs, 2=12+yrs)
age numeric Age in years
parity numeric Parity
induced numeric Number of prior induced abortions (0=0, 1=1, 2=2+)
case numeric Status (0=control, 1=case)
spontaneous numeric Number of prior spontaneous abortions (0=0, 1=1, 2=2+)
stratum integer Matched Set Number 1-83
pooled.stratum numeric Stratum Number 1-63


Edgar Anderson’s Iris Data. This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica. This is a data frame with 150 observations on 5 variables:

Table 13: Edgar Anderson’s Iris Data
Name Type Description
Sepal.Length numeric Sepal length (cm)
Sepal.Width numeric Sepal width (cm)
Petal.Length numeric Petal length (cm)
Petal.Width numeric Petal width (cm)
Species factor Species (3 levels)


Level of Lake Huron 1875-1972. Annual measurements of the level, in feet, of Lake Huron 1875-1972. This is a time series of length 98.


Monthly Deaths from Lung Diseases in the UK. Monthly deaths recorded from bronchitis, emphysema and asthma in the UK. This is a time series of length 72, 1974-1979.


Longley’s Economic Regression Data. A macroeconomic data set that provides a well-known example of a highly collinear regression. This is a data frame with 7 variables observed annually from 1947 to 1962 (n = 16):

Table 14: Longley’s Economic Regression Data
Name Type Description
GNP.deflator numeric GNP implicit price deflator (1954=100)
GNP numeric Gross National Product
Unemployed numeric Number unemployed
Armed.Forces numeric Number in armed forces
Population numeric Population > 14 years of age
Year numeric Year
Employed numeric Number employed


Annual numbers of lynx trappings for 1821–1934 in Canada. This is a time series of length 114, 1821-1934.


Michelson Speed of Light Data. Classical data of Michelson (but not this one with Morley) on measurements done in 1879 on the speed of light. The data consist of five experiments, each consisting of 20 consecutive “runs”. The response is the speed of light measurement, suitably coded (km/sec, with 299,000 subtracted). This is a data frame with 100 observations on 3 variables:

Table 15: Speed of Light
Name Type Description
Expt int The experiment number, from 1 to 5
Run int The run number within each experiment
Speed int Speed-of-light measurement


Motor Trend Car Road Tests. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). This is a data frame with 32 observations on 11 variables:

Table 16: Motor Trend Car Road Tests
Name Type Description
mpg numeric Miles/(US) gallon
cyl numeric Number of cylinders
disp numeric Displacement (cu. in.)
hp numeric Gross horsepower
drat numeric Rear axle ratio
wt numeric Weight (1000 lbs)
qsec numeric Quarter-mile time
vs numeric V-shaped (0) or straight (1) engine
am numeric Automatic (0) or manual (1) transmission
gear numeric Number of forward gears
carb numeric Number of carburetors


Classical N, P, K Factorial Experiment. A classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks. Each half of a fractional factorial design confounding the NPK interaction was used on 3 of the plots. This is a data frame with 24 observations on 5 variables:

Table 17: N, P, K Experiment
Name Type Description
block factor Which block (6 levels)
N factor Indicator for nitrogen (No, Yes)
P factor Indicator for phosphate (No, Yes)
K factor Indicator for potassium (No, Yes)
yield numeric Yield of peas (pounds/plot)


Results from an Experiment on Plant Growth. Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. This is a data frame with 30 observations on 2 variables:

Table 18: Plant Growth
Name Type Description
weight numeric Dried weight of plants
group factor Treatment group (3 levels)


Quarterly Approval Ratings of US Presidents. The (approximate) quarterly approval rating for the President of the United States from the first quarter of 1945 to the last quarter of 1974. This is a time series of length 120.


Lengths of Major North American Rivers. This data set gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey. This is a vector with 141 observations.


Measurements on Petroleum Rock Samples. This data set contains measurements on 48 rock samples from a petroleum reservoir. This is a data frame with 48 observations on 4 variables:

Table 19: Petroleum Rock Samples
Name Type Description
area int Area of pores (pixels in 256 X 256)
peri numeric Perimeter in pixels
shape numeric perimeter/square root of area
perm numeric Permeability in milli-Darcies


Student’s Sleep Data. Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients. This is a data frame with 20 observations on 3 variables:

Table 20: Sleep Data
Name Type Description
extra numeric Increase in hours of sleep
group fac Drug given (2 levels)
ID fac Patient ID (10 levels)


Brownlee’s Stack Loss Plant Data. Operational data of a plant for the oxidation of ammonia to nitric acid. This is a data frame with 21 observations on 4 variables:

Table 21: Stack Loss Data
Name Type Description
Air.Flow numeric Flow of cooling air
Water.Temp numeric Cooling water inlet temperature
Acid.Conc. numeric Concentration of acid [per 1000, minus 500]
stack.loss numeric Stack loss


US State Facts and Figures. Data sets related to the 50 states of the United States of America. This is a factor giving the state region (Northeast, South, North Central, West).


Monthly Sunspot Numbers, 1749-1983. Monthly mean sunspot numbers from 1749 to 1983. Collected at Swiss Federal Observatory, Zurich until 1960, then Tokyo Astronomical Observatory. This is a time series of length 2820.


Swiss Fertility and Socioeconomic Indicators (1888) Data. Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888. This is a data frame with 47 observations on 6 variables, each of which is in percent (0-100):

Table 22: Swiss Socioeconomic Indicators (1888)
Name Type Description
Fertility numeric Common fertility measure
Agriculture numeric % males involved in agriculture
Examination int % receiving high mark on army exam
Education int % education beyond primary school
Catholic numeric % Catholic (as opposed to Protestant)
Infant.Mortality numeric Live births who live less than 1 year


Survival of passengers on the Titanic. This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner “Titanic,” summarized according to economic status (class), sex, age and survival. This is a 4-dimensional array resulting from cross-tabulating 2201 observations on 4 variables. The variables and their levels are as follows:

Table 23: Titanic
Num Name Levels
1 Class 1st, 2nd, 3rd, Crew
2 Sex Male, Female
3 Age Child, Adult
4 Survived No, Yes


Girth, Height and Volume for Black Cherry Trees. This data set provides measurements of the girth, height and volume of timber in 31 felled black cherry trees. Note that girth is the diameter of the tree (in inches) measured at 4 ft 6 in above the ground. This is a data frame with 31 observations on 3 variables:

Table 24: Black Cherry Tree Data
Name Type Description
Girth numeric Tree diameter (inches)
Height numeric Tree height (feet)
Volume numeric Volume of timber (cubic ft)


Student Admissions at UC Berkeley. Aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973 classified by admission and sex. This is a 3-dimensional array resulting from cross-tabulating 4526 observations on 3 variables. The variables and their levels are as follows:

Table 25: Admissions at UC Berkeley
Num Name Levels
1 Admit Admitted, Rejected
2 Gender Male, Female
3 Dept A, B, C, D, E, F


UK Quarterly Gas Consumption. Quarterly UK gas consumption from 1960Q1 to 1986Q4, in millions of therms. This is a time series of length 108.


Accidental Deaths in the US 1973-1978. The monthly totals of accidental deaths in the USA. This is a time series of length 72.


Violent Crime Rates by US State. This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas. This is a data frame with 50 observations on 4 variables:

Table 26: US Arrests
Name Type Description
Murder numeric Murder arrests per 100,000
Assault integer Assault arrests per 100,000
UrbanPop integer Percent urban population
Rape numeric Rape arrests per 100,000


Lawyers’ Ratings of State Judges in the US Superior Court. Each row holds the lawyers’ ratings of one judge. This is a data frame with 43 observations on 12 variables:

Table 27: Judge Ratings
Name Type Description
CONT numeric Number of contacts of lawyer with judge
INTG numeric Judicial integrity
DMNR numeric Demeanor
DILG numeric Diligence
CFMG numeric Case flow managing
DECI numeric Prompt decisions
PREP numeric Preparation for trial
FAMI numeric Familiarity with law
ORAL numeric Sound oral rulings
WRIT numeric Sound written rulings
PHYS numeric Physical ability
RTEN numeric Worthy of retention


The Number of Breaks in Yarn During Weaving. This data set gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. This is a data frame with 54 observations on 3 variables:

Table 28: Breaks in Yarn During Weaving
Name Type Description
breaks numeric Number of breaks
wool fac Type of wool (levels A, B)
tension fac Tension on wool (levels L, M, H)