Test Results
Here you can find the results of the completed cycles of the contest, as well as information on the evaluation procedure. If you have any questions, please email us at ai@upgreat.one.
WINNERS
cycle one
No team has yet achieved a result matching the level of human teachers.
WINNERS AND AWARDEES IN NOMINATIONS
cycle one
Nomination: Grammar
1st place: Raketa (MIPT, MSU)
2nd place: Antiplagiat (JSC Antiplagiat)
3rd place: Chemist (MIPT)

Nomination: Grammar.ENG
1st place: DeepPavlov (DeepPavlov)
2nd place: Antiplagiat (JSC Antiplagiat)
3rd place: Nanosemantics (Nanosemantics Lab)

PERFORMANCE EVALUATION OF THE ARTIFICIAL INTELLIGENCE

Against a large set of criteria, the work of the artificial intelligence (AI) is compared with the work of two independent experts; this determines the accuracy of the participants' software. Below is a simplified outline of the algorithm for evaluating AI performance. You can read more about the stages of evaluation, the criteria, and the formulas in the Technical Guidelines.

Stage 1

Selection of essays for tests
To evaluate the work of the participants' AI systems (AI assistants), 1,000 previously unpublished essays on various topics are collected.

Stage 2

Verification of texts by experts and AI
Two experts of the USE (the Unified State Exam in Russia) check each essay to ensure the objectivity of the assessment. Within a limited time, the experts and the AI systems evaluate the texts in four aspects:
Logic
The narrative is not broken, arguments follow from the statements, etc.
Facts
Real facts and historical events are described correctly (dates, names, descriptions of events, etc.)
Grammar
No mistakes in the spelling of words or the construction of sentences
Stylistics
Appropriate use of words with different connotations and stylistic registers, as well as metaphors and comparisons
The experts and the AI systems create a special markup of the text, detecting errors and highlighting the blocks that are significant for evaluation. When necessary, an explanation of the reasons for marking an error can be requested.
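The contest's actual markup format is defined in the Technical Guidelines; purely as an illustration, here is a minimal sketch of what such an error markup could look like (all names below are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical names for illustration only; the real markup format is
# defined in the contest's Technical Guidelines.

class Aspect(Enum):
    LOGIC = "logic"
    FACTS = "facts"
    GRAMMAR = "grammar"
    STYLISTICS = "stylistics"

@dataclass
class ErrorSpan:
    start: int        # character offset where the marked block begins
    end: int          # character offset where the marked block ends
    aspect: Aspect    # which of the four aspects the error relates to
    explanation: str  # reason for marking the error, shown on request

# A reviewer's (an expert's or an AI system's) markup of one essay is
# then simply a list of such spans:
Markup = list[ErrorSpan]

essay = "In 1810 the State Council was created with advisory functions."
ai_markup: Markup = [
    ErrorSpan(0, 7, Aspect.FACTS, "Check the date of the State Council's creation"),
]
```

In a structure like this, the pairwise comparison in Stage 3 can work directly on the highlighted spans and their error tags.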

Stage 3

Determining the accuracy of the AI
The expert and AI markups are compared with each other in pairs against a number of criteria, each of which has its own assigned weight (importance) in evaluating the accuracy of the work.
Example of a text markup for an essay on history. All three reviewers mark up the same passage:

"The reason for reforms proposed by Speransky was the need to improve the power system. The formation of a parliamentary-type body was one of the steps to transform the autocracy into a constitutional monarchy. In 1810 the State Council was created with advisory functions."

Artificial intelligence: 30 sec. per essay
Expert 1: 15 min. per essay
Expert 2: 15 min. per essay

[Figure: each reviewer highlights different blocks of the passage and tags them with error labels, e.g. cause-and-effect and factual errors.]
Does the AI assistant mark the text block correctly?
On average, the participant's AI marks up the essay slightly worse than the exam experts.
Let’s take a closer look:
SENTENCE 1
In the first sentence, the experts give opposite evaluations, and the AI coincides with one of them. In this sentence, the system's markup is at the expert level.
SENTENCE 2
The experts do not consider the second sentence significant for evaluating the essay, while the AI highlights it. The AI's markup is wrong, as it is not helpful for the evaluation.
SENTENCE 3
In analyzing the third sentence, the AI agrees with one of the experts but marks the factual error incorrectly. On average, the AI marks up this text block slightly worse than the experts.
In practice, AI accuracy is calculated using special formulas.
They take into account the evaluations that the system and the experts give to each sentence, each text block, and the text as a whole.
The participant's AI system is considered sufficiently accurate if its markup differs from the experts' markups less than the experts' markups differ from each other (the RAAM coefficient is greater than or equal to 100%). The higher the coefficient, the more accurately the AI works.
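The official formulas and criterion weights are specified in the Technical Guidelines; the sketch below only conveys the idea of such a coefficient, using a deliberately simplified agreement measure (plain span overlap) and hypothetical names:

```python
# A deliberately simplified illustration of the RAAM idea, not the
# contest's official formula. Markups are reduced to sets of
# (start, end) character spans, and agreement is plain span overlap
# (Jaccard); the real criteria, weights, and formulas are specified in
# the Technical Guidelines.

def covered(markup):
    """Character positions covered by a markup's error spans."""
    positions = set()
    for start, end in markup:
        positions.update(range(start, end))
    return positions

def agreement(a, b):
    """Jaccard overlap between two markups (1.0 = identical)."""
    ca, cb = covered(a), covered(b)
    if not ca and not cb:
        return 1.0
    return len(ca & cb) / len(ca | cb)

def raam_like(ai, expert1, expert2):
    """Ratio of the AI's average agreement with the experts to the
    experts' agreement with each other. A value >= 1.0 (reported as
    >= 100%) means the AI's markup differs from the experts' no more
    than the experts' markups differ from each other."""
    ai_vs_experts = (agreement(ai, expert1) + agreement(ai, expert2)) / 2
    return ai_vs_experts / agreement(expert1, expert2)

# Toy markups: the AI agrees with expert 1 on both blocks and marks one
# extra block that neither expert considers significant.
expert1 = [(0, 7), (40, 55)]
expert2 = [(0, 7)]
ai = [(0, 7), (40, 55), (80, 90)]
print(f"RAAM-like coefficient: {raam_like(ai, expert1, expert2):.0%}")
```

Because the denominator is the experts' mutual agreement, an AI can score above 100% even when the experts themselves disagree, which is exactly the property the contest's threshold relies on.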
THE TEAM WITH THE HIGHEST RAAM COEFFICIENT EQUAL TO OR EXCEEDING 100% WINS THE CONTEST

ANNOUNCEMENT OF THE RESULTS

On December 19, 2020, we summed up the results of the first test cycle of the Up Great technology contest READ//ABLE and awarded the winners of the Grammar and Grammar.ENG nominations. The event took place as part of the online conference Data-Елка 2020, a reporting event of the Russian Open Data Science community.

WHAT NEXT?

The contest is held in repeated cycles until a solution is found, but no later than the end of December 2022.

Each cycle consists of registration, qualification, and test stages for each language (English and Russian). If the technological barrier is not overcome in the current cycle, the next cycle is launched.

No team was able to solve the task in the 1st cycle, so the contest continues: the 2nd cycle will be launched in spring 2021. Registration is open.