Test Results
Here you can find the results of the completed cycles of the contest as well as the information on the evaluation procedure. In case of any questions, please email us at ai@upgreat.one.
WINNERS
1st place: DeepPavlov (MIPT), English language
2nd place: Nanosemantics (Nanosemantics Lab), English language
WINNERS AND AWARDEES IN NOMINATIONS
Nomination "Structure":
1st place: Nanosemantics (Nanosemantics Lab)
2nd place: Antiplagiat (JSC Antiplagiat)
3rd place: MUCTR AI (MUCTR)

Nomination "Logic":
1st place: Antiplagiat (JSC Antiplagiat)
2nd place: MUCTR AI (MUCTR)
3rd place: FirstTry

PERFORMANCE EVALUATION OF THE ARTIFICIAL INTELLIGENCE

The output of each participant's artificial intelligence (AI) system is compared with the work of two independent experts against a large number of criteria, and the accuracy of the participants' software is determined as a result. Below is a simplified algorithm for evaluating AI performance. You can read more about the stages of evaluation, the criteria and the formulas in the Technical Guidelines.

Stage 1

Selection of essays for tests
Essays on various topics are collected to evaluate the work of the AI systems of the participants (AI assistants). These essays have not been published before.
1,000 essays

Stage 2

Verification of texts by experts and AI
Two experts of the USE (the Unified State Exam in Russia) check the essays to ensure the objectivity of the assessment. Within a limited time, the experts and the AI systems evaluate the texts in four aspects:
Logic
The narrative is not broken, arguments follow from the statements, etc.
Facts
Real facts and historical events are described correctly (dates, names, descriptions of events, etc.)
Grammar
No spelling or grammatical mistakes in words and sentences
Stylistics
Appropriate use of words with different connotations and stylistic registers, as well as metaphors and comparisons
The experts and the AI systems create a special markup of the text, detecting errors and highlighting the blocks that are significant for the evaluation. When necessary, an explanation of the reasons for marking an error can be requested.
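The markup described above can be sketched as a set of labeled text spans. Below is a minimal illustration in Python; the class and field names are hypothetical, and the contest's actual data format is defined in the Technical Guidelines.

```python
from dataclasses import dataclass, field

@dataclass
class ErrorMark:
    """One marked block of text (hypothetical structure)."""
    start: int             # character offset where the block begins
    end: int               # character offset where the block ends
    aspect: str            # "Logic", "Facts", "Grammar", or "Stylistics"
    label: str             # error tag assigned by the expert or the AI
    explanation: str = ""  # optional reason for marking the error

@dataclass
class EssayMarkup:
    """All marks one rater (an expert or an AI system) placed on an essay."""
    rater: str
    marks: list[ErrorMark] = field(default_factory=list)

# An AI system flags a factual error in one block of the essay:
markup = EssayMarkup(rater="AI")
markup.marks.append(ErrorMark(21, 59, "Facts", "H.Fact",
                              "The date of the event is incorrect."))
```

Storing explicit character offsets lets the markups of different raters be aligned block by block, which is what the pairwise comparison in Stage 3 relies on.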

Stage 3

DETERMINING THE ACCURACY OF THE AI
The expert and AI markups are compared with each other in pairs against a number of criteria, each of which has its own weight (importance) in evaluating the accuracy of the work.
Example of a text markup for an essay on history (interactive figure). All three raters mark the same passage: "The reason for reforms proposed by Speransky was the need to improve the power system. The formation of a parliamentary-type body was one of the steps to transform the autocracy into a constitutional monarchy. In 1810 the State Council was created with advisory functions."
Artificial intelligence, 30 sec. per essay: marks labeled Evaluation, Effect Role, H.Fact, Epp
Expert 1, 15 min. per essay: marks labeled H.Cause, Epp
Expert 2, 15 min. per essay: mark labeled Cause
Does the AI assistant mark the text block correctly?
On average, the participant's AI marks the essay slightly worse than the exam experts.
Let’s take a closer look:
SENTENCE 1
In the first sentence, the experts give opposite evaluations, and the AI coincides with one of the experts. In this sentence the system’s markup is at the expert level.
SENTENCE 2
The experts do not consider the second sentence to be significant for evaluating the essay, while the AI highlights it. The AI markup is wrong as it is not helpful for evaluation.
SENTENCE 3
In the third sentence, the AI agrees with one of the experts but marks the factual error incorrectly. On average, the AI marks this text block slightly worse than the experts.
In practice, AI accuracy is calculated using special formulas.
They take into account the evaluations given by the system and the experts to each sentence, each text block and the text as a whole.
The participant's AI system is considered sufficiently accurate if its markup differs from the experts' markups less than the experts' markups differ from each other (the RAAM coefficient is greater than or equal to 100%). The higher the coefficient, the more accurately the AI works.
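The actual formulas, weights and criteria are given in the Technical Guidelines. The sketch below only illustrates the idea behind a RAAM-style coefficient, using made-up per-block labels and a plain, unweighted disagreement measure in place of the real weighted criteria.

```python
def disagreement(marks_a, marks_b):
    """Fraction of text blocks the two markups label differently
    (a stand-in for the contest's weighted criteria)."""
    diffs = sum(1 for a, b in zip(marks_a, marks_b) if a != b)
    return diffs / len(marks_a)

def raam(ai_marks, expert1_marks, expert2_marks):
    """Illustrative RAAM-style coefficient, in percent.

    >= 100 means the AI's markup differs from the experts' markups
    no more than the experts' markups differ from each other.
    """
    expert_gap = disagreement(expert1_marks, expert2_marks)
    ai_gap = (disagreement(ai_marks, expert1_marks)
              + disagreement(ai_marks, expert2_marks)) / 2
    if ai_gap == 0:
        return float("inf")  # AI matches both experts exactly
    return 100.0 * expert_gap / ai_gap

# Per-block labels for one essay (hypothetical data; None = not marked):
e1 = ["H.Cause", None,   "Epp"]
e2 = ["Cause",   None,   "Epp"]
ai = ["H.Cause", "Eval", "Epp"]
print(round(raam(ai, e1, e2)))  # → 67
```

Here the coefficient comes out below 100%: the AI differs from the experts (on the second block) more than the experts differ from each other, matching the "slightly worse than the experts" outcome in the example above.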
THE TEAM WITH THE HIGHEST RAAM COEFFICIENT EQUAL TO OR EXCEEDING 100% WINS THE CONTEST

ANNOUNCEMENT OF THE RESULTS

On December 19, 2020, we summed up the results of the first test cycle of the Up Great technology contest READ//ABLE and awarded the winners of the Grammar and Grammar.Eng nominations. The event took place as part of the online conference Data-Елка 2020, a reporting event of the Russian Open Data Science community.

WHAT NEXT?

The contest is held in repeated cycles until a solution is found, but no later than the end of December 2022.

Each cycle consists of registration, qualification and test stages for each language – English or Russian. If the technological barrier is not overcome in the current cycle, the next one is launched.

No team managed to solve the task in the 1st cycle, so the contest continues: the 2nd cycle will be launched in spring 2021. Registration is open.