Test Results
Here you can find the results of the completed cycles of the contest, as well as information on the evaluation procedure. If you have any questions, please email us at ai@upgreat.one.
WINNERS
cycle one
No team has yet achieved a result matching the level of human teachers.
WINNERS AND AWARDEES IN NOMINATIONS
cycle one
Nomination: Grammar
1st place: Raketa (MIPT, MSU)
2nd place: Antiplagiat (JSC Antiplagiat)
3rd place: Chemist (MIPT)

Nomination: Grammar.ENG
1st place: DeepPavlov (DeepPavlov)
2nd place: Antiplagiat (JSC Antiplagiat)
3rd place: Nanosemantics (Nanosemantics Lab)

PERFORMANCE EVALUATION OF THE ARTIFICIAL INTELLIGENCE

Against a large set of criteria, the work of the artificial intelligence (AI) is compared with the work of two independent experts; this determines the accuracy of the participants' software. Below is a simplified outline of the algorithm for evaluating AI performance. You can read more about the stages of evaluation, the criteria, and the formulas in the Technical Guidelines.

Stage 1

Selection of essays for tests
To evaluate the work of the participants' AI systems (AI assistants), 1,000 previously unpublished essays on various topics are collected.

Stage 2

Verification of texts by experts and AI
Two experts of the USE (the Unified State Exam in Russia) check each essay to ensure the objectivity of the assessment. Within a limited time, the experts and the AI systems evaluate the texts in four aspects:
Logic
The narrative is not broken, arguments follow from the statements, etc.
Facts
Real facts and historical events are described correctly (dates, names, descriptions of events, etc.)
Grammar
No mistakes in the spelling of words or the construction of sentences
Stylistics
Appropriate use of words with different connotations and stylistic registers, as well as metaphors and comparisons
The experts and the AI systems create a special markup of the text, detecting errors and highlighting the blocks that are significant for evaluation. When necessary, an explanation of the reasons for marking an error can be requested.
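The contest's actual markup format is defined in the Technical Guidelines; purely as an illustration, here is a minimal sketch of what such an error markup could look like (all names below are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical names for illustration only; the real markup format is
# defined in the contest's Technical Guidelines.

class Aspect(Enum):
    LOGIC = "logic"
    FACTS = "facts"
    GRAMMAR = "grammar"
    STYLISTICS = "stylistics"

@dataclass
class ErrorSpan:
    start: int        # character offset where the marked block begins
    end: int          # character offset where the marked block ends
    aspect: Aspect    # which of the four aspects the error relates to
    explanation: str  # reason for marking the error, shown on request

# A reviewer's (an expert's or an AI system's) markup of one essay is
# then simply a list of such spans:
Markup = list[ErrorSpan]

essay = "In 1810 the State Council was created with advisory functions."
ai_markup: Markup = [
    ErrorSpan(0, 7, Aspect.FACTS, "Check the date of the State Council's creation"),
]
```

In a structure like this, the pairwise comparison in Stage 3 can work directly on the highlighted spans and their error tags.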

Stage 3

Determining the accuracy of the AI
The expert and AI markups are compared with each other in pairs against a number of criteria, each of which has its own assigned weight (importance) in evaluating the accuracy of the work.
Example of a text markup for an essay on history. All three reviewers mark up the same passage:

"The reason for reforms proposed by Speransky was the need to improve the power system. The formation of a parliamentary-type body was one of the steps to transform the autocracy into a constitutional monarchy. In 1810 the State Council was created with advisory functions."

Artificial intelligence: 30 sec. per essay
Expert 1: 15 min. per essay
Expert 2: 15 min. per essay

[Figure: each reviewer highlights different blocks of the passage and tags them with error labels, e.g. cause-and-effect and factual errors.]
Does the AI assistant mark the text block correctly?
On average, the participant's AI marks up the essay slightly worse than the exam experts.
Let’s take a closer look:
SENTENCE 1
In the first sentence, the experts give opposite evaluations, and the AI coincides with one of them. In this sentence, the system's markup is at the expert level.
SENTENCE 2
The experts do not consider the second sentence significant for evaluating the essay, while the AI highlights it. The AI's markup is wrong, as it is not helpful for the evaluation.
SENTENCE 3
In analyzing the third sentence, the AI agrees with one of the experts but marks the factual error incorrectly. On average, the AI marks up this text block slightly worse than the experts.
In practice, AI accuracy is calculated using special formulas.
They take into account the evaluations that the system and the experts give to each sentence, each text block, and the text as a whole.
The participant's AI system is considered sufficiently accurate if its markup differs from the experts' markups less than the experts' markups differ from each other (the RAAM coefficient is greater than or equal to 100%). The higher the coefficient, the more accurately the AI works.
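The official formulas and criterion weights are specified in the Technical Guidelines; the sketch below only conveys the idea of such a coefficient, using a deliberately simplified agreement measure (plain span overlap) and hypothetical names:

```python
# A deliberately simplified illustration of the RAAM idea, not the
# contest's official formula. Markups are reduced to sets of
# (start, end) character spans, and agreement is plain span overlap
# (Jaccard); the real criteria, weights, and formulas are specified in
# the Technical Guidelines.

def covered(markup):
    """Character positions covered by a markup's error spans."""
    positions = set()
    for start, end in markup:
        positions.update(range(start, end))
    return positions

def agreement(a, b):
    """Jaccard overlap between two markups (1.0 = identical)."""
    ca, cb = covered(a), covered(b)
    if not ca and not cb:
        return 1.0
    return len(ca & cb) / len(ca | cb)

def raam_like(ai, expert1, expert2):
    """Ratio of the AI's average agreement with the experts to the
    experts' agreement with each other. A value >= 1.0 (reported as
    >= 100%) means the AI's markup differs from the experts' no more
    than the experts' markups differ from each other."""
    ai_vs_experts = (agreement(ai, expert1) + agreement(ai, expert2)) / 2
    return ai_vs_experts / agreement(expert1, expert2)

# Toy markups: the AI agrees with expert 1 on both blocks and marks one
# extra block that neither expert considers significant.
expert1 = [(0, 7), (40, 55)]
expert2 = [(0, 7)]
ai = [(0, 7), (40, 55), (80, 90)]
print(f"RAAM-like coefficient: {raam_like(ai, expert1, expert2):.0%}")
```

Because the denominator is the experts' mutual agreement, an AI can score above 100% even when the experts themselves disagree, which is exactly the property the contest's threshold relies on.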
THE TEAM WITH THE HIGHEST RAAM COEFFICIENT EQUAL TO OR EXCEEDING 100% WINS THE CONTEST

ANNOUNCEMENT OF THE RESULTS

On December 19, 2020, we summed up the results of the first test cycle of the Up Great technology contest READ//ABLE and awarded the winners of the Grammar and Grammar.ENG nominations. The event took place as part of the online conference Data-Елка 2020, a reporting event of the Russian Open Data Science community.

WHAT NEXT?

The contest is held in repeated cycles until a solution is found, but no later than the end of December 2022.

Each cycle consists of registration, qualification, and test stages for each language (English and Russian). If the technological barrier is not overcome in the current cycle, the next cycle is launched.

No team was able to solve the task in the 1st cycle, so the contest continues: the 2nd cycle will be launched in spring 2021. Registration is open.