Start Date

11-12-2016 12:00 AM

Who’s A Good Decision Maker? Data-Driven Expert Worker Ranking under Unobservable Quality

Evaluating expert workers by the quality of their decisions has substantial practical value, yet having other experts evaluate decision quality is costly and often infeasible. In this work, we frame the Ranking of Expert workers according to their unobserved decision Quality (REQ), without resorting to evaluation by other experts, as a new data science problem. This problem is challenging because the correct decisions are commonly unobservable, and substantial parts of the information available to the decision maker are not available for retrospective decision evaluation. We propose a new machine learning approach to address this problem. We evaluate our method on one dataset of real expert decisions and two public datasets, and find that our approach produces highly accurate rankings. Moreover, our approach's advantage over the baseline becomes more pronounced as the evaluation settings become more challenging.
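The abstract does not say how ranking accuracy is measured. As an illustrative sketch only, assuming an evaluation setting (for example, simulated or benchmark data) in which each expert's true decision quality is known, one common way to score a ranking is Kendall's rank correlation between true quality and the estimated scores. The function name `kendall_tau` and the example values are hypothetical and are not taken from the paper; this is not the authors' method or metric.

```python
from itertools import combinations


def kendall_tau(true_quality, estimated_score):
    """Kendall's tau-a between true expert quality and estimated scores.

    Hypothetical helper for illustration: pairs tied in either list count
    as neither concordant nor discordant.
    """
    if len(true_quality) != len(estimated_score):
        raise ValueError("score lists must have equal length")
    n = len(true_quality)
    if n < 2:
        raise ValueError("need at least two experts to rank")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both lists order experts i and j the same way.
        sign = (true_quality[i] - true_quality[j]) * (
            estimated_score[i] - estimated_score[j]
        )
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical example: four experts with known (simulated) accuracy.
true_acc = [0.92, 0.85, 0.78, 0.70]
estimated = [0.88, 0.86, 0.71, 0.74]  # swaps the bottom two experts
print(round(kendall_tau(true_acc, estimated), 3))  # 0.667
```

A value of 1 means the estimated scores order every pair of experts as their true quality does, and -1 means the order is fully reversed; here one of six pairs is swapped, giving (5 - 1) / 6.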