I have a database of 87,000 sentences (about 4.5 million characters): usage examples for the 5,000 most common words in the English language.
The sentences were translated from English to Polish with an automated tool.
Because the sentences are relatively simple and were translated separately, the translation is quite good (I estimate that about 70-80% of the translations are correct).
I am commissioning the following work:
- filtering the list: removing incorrect sentences (some examples for a given word do not contain that word)
- checking the translation and introducing corrections
- tagging the grammatical form used in each sentence (PS - present simple, PC - present continuous, PsP - past perfect, PV - passive voice, etc.)
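
The filtering step can be partly pre-screened automatically. The following is only a minimal sketch, assuming the data is available as (word, sentence) pairs; the function names and the sample sentences are hypothetical and not taken from the actual database. It flags sentences in which the headword does not appear as a whole word.

```python
import re


def contains_word(sentence: str, word: str) -> bool:
    """Return True if `word` occurs in `sentence` as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def filter_examples(examples):
    """Split (word, sentence) pairs into kept pairs and pairs flagged for review."""
    kept, flagged = [], []
    for word, sentence in examples:
        if contains_word(sentence, word):
            kept.append((word, sentence))
        else:
            flagged.append((word, sentence))
    return kept, flagged


# Hypothetical sample data, for illustration only.
sample = [
    ("house", "They bought a house near the lake."),
    ("run", "She walked to the station."),
    ("cat", "Please check the catalogue."),
]

kept, flagged = filter_examples(sample)
print(len(kept), "kept;", len(flagged), "flagged for review")
```

An exact whole-word match does not recognise inflected forms (for example "ran" for "run"), so flagged sentences should be reviewed by a person rather than deleted automatically.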
This is an interesting opportunity for long-term cooperation.
In the second phase, building on the basic examples, I will commission an expansion of the sentence database with further examples for other tenses and grammatical structures, and with transformations of sentences into questions and negations.
The same work will also be done for German and eventually for other languages.