Abstract #4310, Poster #148:

Scheduled for Friday, June 22, 2012 07:00 PM-09:00 PM: Session 22 (Gardenia) Poster Presentation


RELIABILITY ASSESSMENT OF BEHAVIORAL OBSERVATION DATA EMPLOYING KRIPPENDORFF’S ALPHA: AN EXAMPLE FROM A CAPTIVE RHESUS MACAQUE (MACACA MULATTA) SOCIAL NETWORK STUDY

D. L. Hannibal, M. E. Jackson, A. C. Nathman, T. Boussina and B. McCowan
University of California Davis, McCowan Animal Behavior Laboratory for Welfare and Conservation, Brain Mind and Behavior Unit, California National Primate Research Center, Davis, CA 95616, USA
     Most behavioral studies reporting observer reliability use the index of concordance (percentage agreement), Pearson’s correlation, or Cohen’s kappa, with a minimum agreement of .70-.85 establishing reliability. The index of concordance is considered a relaxed criterion, whereas the other methods are considered restrictive. Krippendorff’s alpha provides restrictive coefficients of agreement for any metric, along with confidence intervals and probabilities of achieving minimum levels of agreement. Zero values for behaviors judged as not occurring during the sampling period present a problem: large datasets with many cases in which all observers record zero values can inflate agreement coefficients. Using reliability data collected by three observers sampling conflict events and affiliation scans of captive rhesus macaques (Macaca mulatta), we compare Krippendorff’s alpha for the complete dataset and for a reduced dataset from which cases where all observers recorded 0 were deleted. We also compare Krippendorff’s alpha with the other methods, using a minimum agreement of .80. The results show that the difference in agreement coefficients between the complete dataset (.82) and the reduced dataset (.69) is great enough that observers could be erroneously judged reliable. The index of concordance (.70-.85) clearly allows observers to be declared reliable who would be judged unreliable by Cohen’s kappa (.61-.74), Spearman rank correlation (.41-.70), and Krippendorff’s alpha (nominal = .61-.74; ordinal = .41-.70). Krippendorff’s alpha is easily computed in common statistical packages and is equivalent to other appropriate tests, but it provides additional useful information (confidence intervals and probabilities of achieving minimum agreement), and we recommend it for assessing observer reliability.
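
     The zero-deletion comparison described above is straightforward to reproduce. The following is a minimal sketch in Python using the third-party `krippendorff`, `scipy`, and `scikit-learn` packages; the observer scores are invented toy data standing in for per-case records, not the macaque data reported here.

```python
# pip install numpy scipy scikit-learn krippendorff
import numpy as np
import krippendorff
from itertools import combinations
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Rows = observers, columns = cases (e.g., conflict events or affiliation
# scans). Hypothetical toy scores: many cases are 0 for all observers
# because the behavior was judged not to occur during sampling.
scores = np.array([
    [0, 0, 2, 0, 1, 0, 0, 3, 0, 0],  # observer 1
    [0, 0, 2, 0, 1, 0, 0, 2, 0, 0],  # observer 2
    [0, 0, 1, 0, 1, 0, 0, 3, 0, 0],  # observer 3
])

# Alpha on the complete dataset: cases where every observer recorded 0
# count as perfect agreement and can inflate the coefficient.
alpha_full = krippendorff.alpha(reliability_data=scores,
                                level_of_measurement="ordinal")

# Reduced dataset: delete every case where all observers recorded 0.
keep = ~np.all(scores == 0, axis=0)
alpha_reduced = krippendorff.alpha(reliability_data=scores[:, keep],
                                   level_of_measurement="ordinal")
print(f"alpha complete = {alpha_full:.2f}, reduced = {alpha_reduced:.2f}")

# Pairwise coefficients for comparison with alpha (Cohen's kappa and
# Spearman's rho are two-rater measures, so compute per observer pair).
for i, j in combinations(range(scores.shape[0]), 2):
    kappa = cohen_kappa_score(scores[i], scores[j])
    rho, _ = spearmanr(scores[i], scores[j])
    print(f"observers {i + 1} & {j + 1}: kappa = {kappa:.2f}, rho = {rho:.2f}")
```

     Note that the `krippendorff` package returns only the point estimate of alpha; the confidence intervals and minimum-agreement probabilities discussed above are typically obtained by bootstrapping the coefficient.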