Journal of Wine Economics Volume 3 | 2008 | No. 2

On Rating Wines with Unequal Judges

By: Robert T. Hodgson
Affiliation: Professor Emeritus, Humboldt State University
E-Mail: bob@fieldbrookwinery.com

In the recent article by Goldstein et al. (2008), covering over 6,000 blind wine tastings, raters were asked to score wines on a scale of "Bad," "O.K.," "Good," or "Great." The ratings were subsequently coded on a numerical scale of 1 to 4 and averaged to rate the wines. Unknown to the raters, some wines were duplicates, which were used to evaluate the raters. Thus, for a flight of 10 wines with 6 raters, if one rater was found to be quite inconsistent on the replicated wine, that rater's scores on all wines would be given less weight. In other words, each wine's final score would not be an equally weighted average of all six raters. The exact weighting scheme was not discussed, but it brought to my attention a problem I used to present to first-year statistics students.

Suppose you are measuring a physical quantity, like the chlorine concentration in water. You are presented with three measurements: two from "chlorine meters" with a precision of 10 ppm, and one from a meter with a precision of 50 ppm. Do you average the two measurements taken with the more precise meters and discard the third? Do you average all three? If you toss out the third measurement, you are discarding information, even though that information is not very precise.

It is well known that the mean of n measurements made with an instrument having a standard deviation of σ has a standard error of σ/√n. If one simply averages the first two measurements described above, the standard error is 10/√2 ≈ 7.07 ppm. Is it possible to weight all three measurements in such a way as to improve the precision beyond this?
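As a quick check of the √n reduction, the sketch below (plain Python; the function name is mine) reproduces the 10/√2 figure:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n independent measurements,
    each with standard deviation sigma: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Averaging the two 10 ppm meters:
print(standard_error(10, 2))  # ≈ 7.07 ppm
```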

The answer is yes, and the proof follows from theorems on linear combinations of random variables. Let σp be the standard deviation of the poorer instrument and σi that of the better instrument, and relate the two by σp = kσi, where k = 5 in the example. Minimizing the standard error of the weighted average (a short exercise in differential calculus) yields weights proportional to the inverse variances: relative to a good measurement, the poorer measurement receives a weight of 1/k². After normalization, each good measurement has weight k²/(2k² + 1) and the poor one 1/(2k² + 1).

In the above example, where k = 5, the weighting factor of each good measurement is slightly less than 0.5 (25/51 ≈ 0.49) and that of the poorer measurement is 1/51 ≈ 0.02. For practical purposes, you might as well throw the third measurement out.
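The standard recipe here is inverse-variance weighting. The sketch below (Python; the function names are mine) computes the normalized weights and the combined standard error for the chlorine example:

```python
import math

def inverse_variance_weights(sigmas):
    """Minimum-variance weights for combining independent, unbiased
    measurements: each weight is proportional to 1 / sigma**2."""
    inv = [1.0 / s**2 for s in sigmas]
    total = sum(inv)
    return [w / total for w in inv]

def combined_se(sigmas):
    """Standard error of the optimally weighted average."""
    return 1.0 / math.sqrt(sum(1.0 / s**2 for s in sigmas))

sigmas = [10.0, 10.0, 50.0]              # two good meters, one poor (k = 5)
print(inverse_variance_weights(sigmas))  # ≈ [0.490, 0.490, 0.020]
print(combined_se(sigmas))               # ≈ 7.00 ppm
```

The combined standard error, about 7.00 ppm, is only marginally better than the 7.07 ppm obtained by discarding the poor measurement, which is the point: the third measurement is worth keeping in principle, but barely.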

How would this apply to the Goldstein article? Since the wines were rated 1 to 4, the maximum inconsistency would be a 3-point spread. Estimating a rater's standard deviation by the range, a poor rater might have his scores weighted by 1/9 relative to a consistent rater's. If raters were to use the more common 20-point scale, the weighting factors could be even more extreme.
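Goldstein et al. did not disclose their weighting scheme, but one hypothetical version of the idea (Python sketch; the function, its parameters, and the spread-as-sigma shortcut are my assumptions, not theirs) weights each rater by the inverse square of his spread on the duplicate wine:

```python
def rater_weights(spreads, floor=0.5):
    """Hypothetical scheme: treat each rater's spread on the duplicate
    wine as a stand-in for his standard deviation and weight by
    1 / spread**2. The floor keeps a perfectly consistent rater
    (spread 0) from receiving infinite weight."""
    sigmas = [max(s, floor) for s in spreads]
    inv = [1.0 / s**2 for s in sigmas]
    total = sum(inv)
    return [w / total for w in inv]

# Six raters; the last scored the duplicate wine 3 points apart
# (the maximum possible spread on a 1-to-4 scale).
print(rater_weights([1, 1, 1, 1, 1, 3]))
```

The inconsistent rater's weight comes out at 1/9 of a consistent rater's, matching the rough estimate above.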

Robert T. Hodgson
Professor Emeritus, Humboldt State University
bob@fieldbrookwinery.com

References

Goldstein, Robin, Johan Almenberg, Anna Dreber, John W. Emerson, Alexis Herschkowitsch and Jacob Katz (2008). Do more expensive wines taste better? Journal of Wine Economics, 3(1), 1–9.
