Data#

behavior#

Observational data on the activities of 45 adolescent and adult males in a community of indigenous Nicaraguan horticulturalists collected over a twelve-month period in 2004–2005.

Data were obtained from the third electronic supplement to 10.1007/s00265-017-2363-8.

Models#

chimpanzees#

Prosocial behavior (or lack thereof) among seven chimpanzees described on page 325 of Statistical Rethinking; the primary source is 10.1038/nature04243. In this suite of experiments, a focal chimpanzee (actor) is presented with two levers that can be pulled. One lever will deliver an item of food to the actor only. The other lever will also deliver an item of food to the other side of the table which may or may not be occupied by another chimpanzee (recipient). This setup was repeated in six blocks (corresponding to different days) for all pairs of seven chimpanzees (both as actor and recipient). Each experiment with a recipient was interleaved with an experiment without a recipient. The dataset thus comprises [6 blocks] * [7 actors] * ([6 recipients] + [6 empty seats]) = 504 experiments. The motivation for the experiments was to assess whether the chimpanzees would change their behavior if a recipient is present, e.g., choosing the prosocial option that delivers food to a fellow chimpanzee.
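The trial count above can be checked by enumerating the design. This is a minimal sketch, not the authors' procedure: the integer identifiers and the use of `None` for an empty seat are assumptions made for illustration and need not match the coding in the dataset.

```python
from itertools import product

# Hypothetical identifiers: chimpanzees 1..7 and blocks 1..6.
chimps = range(1, 8)
blocks = range(1, 7)

trials = []
for block, actor in product(blocks, chimps):
    for recipient in chimps:
        if recipient == actor:
            continue  # an actor is never paired with itself
        trials.append((block, actor, recipient))  # recipient present
        trials.append((block, actor, None))       # interleaved trial, empty seat

print(len(trials))
```

Each actor meets six possible recipients per block, and every such trial is matched by one without a recipient, which gives the factor of 6 + 6 in the count.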

The data were obtained from here, and this schema is based on this documentation. In this dataset, the recipient identifier has been re-coded by subtracting 1 as suggested by rmcelreath/rethinking#340.

Models#

detergents#

Liquid detergent purchasing decisions in the two-year period spanning the first week of July 1986 to July 16th of 1988 in Sioux Falls, South Dakota. The dataset includes the top six national brands in terms of volume, accounting for 81% of the market share for national brands. The primary source is 10.2307/1392011. The data were obtained from the bayesm package and converted to CSV for interoperability.

Models#

reference_models/misc/detergents

election88#

Outcomes of CBS News polls from the 10 days immediately preceding the 1988 US presidential election described on pages 4–5 of Data Analysis Using Regression and Multilevel/Hierarchical Models. Each respondent was asked if they preferred George Bush (y = 1) or Michael Dukakis (y = 0). Demographic information includes four age and education categories, gender of the respondent, and whether they are Black. The respondents’ residential state and corresponding region are also available.

States have the same indices as in presidential but states 2 (AK) and 12 (HI) have no data. State 9 (DC) is not present in presidential.
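The indices quoted above are consistent with numbering the states alphabetically by name, starting at 1, with the District of Columbia included. A minimal sketch of that assumption, listing only the names up to Hawaii (the variable names are illustrative):

```python
# Names up to and including Hawaii; the rest of the alphabet follows the same rule.
names = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "District of Columbia", "Florida",
    "Georgia", "Hawaii",
]

# 1-based index in alphabetical order of the full name.
index = {name: i for i, name in enumerate(sorted(names), start=1)}

for name in ("Alaska", "District of Columbia", "Hawaii"):
    print(name, index[name])
```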

The data were obtained from here and converted to CSV for interoperability.

Models#

latin_square#

Yield of plots of millet in grams arranged in a 5 by 5 Latin square experiment discussed on page 292 of Data Analysis Using Regression and Multilevel/Hierarchical Models. Data were extracted from figure 13.11. The primary source is table 14.10.2 on page 270 of “Statistical Methods” by Snedecor and Cochran (1989). Treatments correspond to different spacings of plants (treatment A corresponds to 2 inch spacing, B to 4 inches, etc.). Rows and columns correspond to physical space, and their order is important because “there is often a gradient in fertility running parallel to one side of the field and sometimes gradients running parallel to both sides.” (Snedecor and Cochran, p. 268).
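In a Latin square, every treatment appears exactly once in each row and each column, which is what lets the design absorb fertility gradients in both directions. The sketch below checks that property on a cyclic 5 by 5 layout. The layout is a hypothetical example, not the arrangement used in the millet experiment, and `is_latin_square` is an illustrative helper.

```python
def is_latin_square(layout):
    """Return True if every symbol occurs exactly once per row and per column."""
    n = len(layout)
    symbols = set(layout[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in layout)
    cols_ok = rows_ok and all(set(col) == symbols for col in zip(*layout))
    return rows_ok and cols_ok


treatments = "ABCDE"
# Hypothetical cyclic layout: each row shifts the treatments by one position.
cyclic = [[treatments[(r + c) % 5] for c in range(5)] for r in range(5)]

for row in cyclic:
    print(" ".join(row))
print(is_latin_square(cyclic))
```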

Models#

pilots#

Psychological experiment with pilots on flight simulators, with 40 data points corresponding to 5 treatment conditions and 8 different airports, described on page 289 of Data Analysis Using Regression and Multilevel/Hierarchical Models; the primary source is “New Airline Pilots May Not Receive Sufficient Training to Cope With Airplane Upsets”. The data were obtained from here and converted to CSV for interoperability.

Models#

pres_vote_historical#

Data on U.S. Presidential elections used by Andrew Gelman to forecast US presidential elections in 2020, 2024, and 2028 for the New York Times (see here for the article). He notes that the model is “‘dumb’ in that it uses nothing more than past vote totals and a forecast of the 2016 vote” (in contrast to presidential, which also includes covariates such as economic performance). State groups are extracted from Andrew’s blog post. DC is included in the dataset but not used in the definition of regions. Data were obtained from here and converted to CSV for interoperability.

Models#

presidential#

Data on U.S. Presidential elections used in Section 15.2 and described in Table 15.1 of Bayesian Data Analysis. The data were obtained from here and converted to CSV for interoperability. This schema is based on the embedded data description and table 15.2. Regions in the metadata below use state indices in alphabetical order of the name of the state (rather than its two-letter code), starting at 1. Compared with the region assignment of the US Census Bureau, the following states in the South have been moved to the North East: MD, DE, WV (presumably to balance the sizes of regions).
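Ordering by state name and ordering by two-letter code do not give the same indices. A minimal sketch using the four states whose names start with "A" (the dictionary and variable names are illustrative, not part of the dataset):

```python
# Two-letter code -> full name for the states beginning with "A".
states = {"AK": "Alaska", "AL": "Alabama", "AR": "Arkansas", "AZ": "Arizona"}

by_name = sorted(states, key=lambda code: states[code])  # convention used here
by_code = sorted(states)                                 # the alternative

print(by_name)
print(by_code)
```

Under the name-based convention Alabama is state 1 and Alaska is state 2, whereas sorting by code would swap them.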

The value of the regional variable r1 (‘South’) is 0 in non-Southern states. In Southern states, r1 = 1 if the Democratic candidate for President is a Southerner, and r1 = -1 if the Republican candidate for President is a Southerner. We set r1 = 1 in 1964, 1976, 1980, 1992 and r1 = -1 in 1964. For the purposes of this variable (and also the variable r2, ‘South in 1964’), Southern states are AL, AR, FL, GA, LA, MS, NC, SC, TN, TX, VA (but not KY or OK).

Models#

radon#

Measurements of radon levels in houses in 85 counties in Minnesota as described on p. 3 and p. 254 of Data Analysis Using Regression and Multilevel/Hierarchical Models.

Models#

reedfrogs#

Tadpole mortality in different tanks described on page 401 of Statistical Rethinking; the primary source is 10.1890/04-0535. The data were obtained from here, and the schema is based on this documentation.

Models#

schools#

Data on the effect of coaching programs on the Scholastic Aptitude Test (Verbal) in each of eight high schools described in Section 5.5 of Bayesian Data Analysis. In each school, the estimated coaching effect and its standard error were obtained by an analysis of covariance adjustment (that is, a linear regression was performed of SAT-V on treatment group, using PSAT-M and PSAT-V as control variables) appropriate for a completely randomized experiment. A separate regression was estimated for each school. Data were obtained from table 1 of 10.2307/1164617.

Models#

trolley#

Questions of morality using the classic example of an actor being able to divert a trolley with two undesirable outcomes as described on page 381 of Statistical Rethinking. The primary source is 10.1111/j.1467-9280.2006.01834.x. The data were obtained from here, and the schema is based on this documentation.