Abstract

The field of natural language processing (NLP) is evolving rapidly, and recent advances have produced models that approach general natural language understanding, increasing the need for benchmarks that assess this capability. This paper presents a comparison of the GLUE and SuperGLUE benchmarks, an assessment of how well they measure the general language understanding of NLP models, and a proposal for a universal framework for comparing NLP benchmarks. The framework is applied to GLUE and SuperGLUE in order to assess and compare their capacity to evaluate and differentiate state-of-the-art NLP models.
