Contact
AAWE
Economics Department
New York University
19 W. 4th Street, 6FL
New York, NY 10012, U.S.A.
Tel: (212) 992-8083
Fax: (212) 995-4186
E-Mail: karl.storchmann@nyu.edu
The author distinguishes between the clinical and statistical meaning of varying levels of inter-taster reliability for the 11 judges who evaluated 10 Chardonnays (6 American and 4 French) in the heralded 1976 Paris wine competition. Four wines showed weighted kappa (Kw) values below 0.40, a level considered poor by established biostatistical criteria. These values ranged from 0.10 for the French Beaune Clos des Mouches 1973 Chardonnay to 0.33 for the U.S. Veedercrest 1972 Chardonnay. However, when the statistical significance of the Kw values was assessed, only the Clos des Mouches failed to reach significance at the .05 level. The other three wines (the U.S. Chateau Montelena 1973, with a Kw of 0.20; the U.S. David Bruce 1973 regular, with a Kw of 0.27; and the U.S. Veedercrest, with a Kw of 0.33) reached statistical significance at p < .05, p < .001, and p < .0001, respectively. These findings are not specific to weighted kappa. They show that when sample sizes are large enough, even the most trivial results will be statistically significant while often being devoid of practical or clinical meaning. A level of Kw that is clinically meaningful will most likely be statistically significant, but a high level of statistical significance is no guarantee of clinical significance. Methods for resolving this “big N phenomenon” are presented and discussed.
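The "big N phenomenon" can be illustrated with a minimal sketch. It does not reproduce the author's weighted kappa analysis; it assumes a different and simpler statistic, a Pearson correlation fixed at a hypothetical value of r = 0.10, and uses the standard Fisher z approximation to show how the p value shrinks as the sample size grows while the effect size stays trivially small. The function name and the sample sizes are illustrative choices only.

```python
import math


def two_sided_p_for_correlation(r: float, n: int) -> float:
    """Approximate two-sided p value for H0: rho = 0.

    Uses Fisher's z-transform: atanh(r) * sqrt(n - 3) is treated as a
    standard normal deviate. This is an approximation and an assumption
    of this sketch, not the test used in the paper.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    r = 0.10  # hypothetical, deliberately small effect size
    for n in (30, 100, 400, 1000, 10000):
        p = two_sided_p_for_correlation(r, n)
        print(f"n={n:>6}  r={r:.2f}  p={p:.4g}")
```

Under these assumptions the same small effect is far from significant with 30 observations and clearly significant with 1,000, which is why a p value alone says little about practical or clinical importance.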