Removing duplicate content before sending it from OmegaT (or a similar CAT tool) to a machine translation API

Closed job

maksioo

Employer

12 deals

Job category:

Desktop/web applications

Expected budget:

200.00 PLN

Preferable skills:

xml

Published: 2021-05-10

Valid until: 2021-06-09

Job description

Dear experts,

We have e-commerce files containing SKUs with similar names that differ only in their type numbers.

For example:

ELECTRONIC BOARD PF34534500-2

ELECTRONIC BOARD OPF435400

ELECTRONIC BOARD KKLFI 24/36/48

ELECTRONIC BOARD S442000

The problem is that if we machine-translate 10,000 such paragraphs, only about 10% of the text is unique; the rest is repetition.
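One way to exploit this repetition is to replace each type number with a placeholder before deduplicating, so that names such as "ELECTRONIC BOARD PF34534500-2" and "ELECTRONIC BOARD S442000" collapse into one string. The sketch below assumes a hypothetical heuristic (any token containing a digit is a type number); `mask_codes` and `unmask_codes` are illustrative names, not part of OmegaT.

```python
import re

# Hypothetical heuristic: treat any whitespace-delimited token that contains
# at least one digit as a type number (e.g. PF34534500-2, 24/36/48, S442000).
CODE_TOKEN = re.compile(r"\S*\d\S*")
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def mask_codes(text):
    """Replace type numbers with placeholders {0}, {1}, ...

    Returns the masked template and the list of codes that were removed.
    """
    codes = []

    def repl(match):
        codes.append(match.group(0))
        return "{%d}" % (len(codes) - 1)

    return CODE_TOKEN.sub(repl, text), codes


def unmask_codes(template, codes):
    """Put the original codes back into a (translated) template in one pass."""
    return PLACEHOLDER.sub(lambda m: codes[int(m.group(1))], template)


if __name__ == "__main__":
    names = [
        "ELECTRONIC BOARD PF34534500-2",
        "ELECTRONIC BOARD OPF435400",
        "ELECTRONIC BOARD KKLFI 24/36/48",
        "ELECTRONIC BOARD S442000",
    ]
    templates = {mask_codes(n)[0] for n in names}
    print(sorted(templates))
    # ['ELECTRONIC BOARD KKLFI {0}', 'ELECTRONIC BOARD {0}']
```

Two caveats: whether a given MT engine leaves `{0}` untouched has to be checked for the chosen API, and "KKLFI" contains no digit, so it stays in the template. The heuristic would need tuning against the real files.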

Yet every character sent to the machine translation API is counted as translated, repeats included. With this much repeated text, it is important to avoid translating the same text over and over again, so as to reduce API consumption.
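A minimal sketch of the saving, assuming the MT client can be wrapped as a function that takes a list of strings: send each distinct segment once, then rebuild the full list in its original order. `translate_batch` is a hypothetical stand-in for the real API client, and "POWER SUPPLY" is made-up sample data.

```python
def translate_deduplicated(segments, translate_batch):
    """Translate each distinct segment once and rebuild the full list.

    `translate_batch` is a placeholder for the real MT client: it takes a list
    of unique strings and returns their translations in the same order.
    """
    unique = list(dict.fromkeys(segments))  # distinct segments, first-seen order
    translations = translate_batch(unique)
    lookup = dict(zip(unique, translations))
    return [lookup[s] for s in segments]     # restore original order and repeats


def billable_chars(segments):
    """Characters an MT API would count if these segments were sent as-is."""
    return sum(len(s) for s in segments)


if __name__ == "__main__":
    segments = ["ELECTRONIC BOARD"] * 9 + ["POWER SUPPLY"]
    print(billable_chars(segments), billable_chars(set(segments)))  # 156 28
```

Because `dict.fromkeys` keeps first-seen order, the batch sent to the API is deterministic, and the final list comprehension restores every repeat from the lookup table.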

My question is how to configure OmegaT to consult a database before asking the MT API to translate any text, so that it first checks whether a translation of that text already exists. If it does, the existing translation is displayed. If it does not, the text is sent to the MT API and the result is saved in the database. Can this be done with plugins, a particular process, or custom work? After the MT results are received, the file should be restored to its original form. Thank you in advance.
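The lookup-then-translate flow described above can be sketched independently of OmegaT, for example as a pre-processing script or as the backend of a custom plugin. This sketch assumes SQLite as the database (any key-value store would do); `TranslationCache`, the `tm` table schema and the `call_mt` callable are hypothetical, with `call_mt` standing in for the real MT API client.

```python
import sqlite3


class TranslationCache:
    """Persistent source -> target store in SQLite (hypothetical design)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tm ("
            " src_lang TEXT, tgt_lang TEXT, source TEXT, target TEXT,"
            " PRIMARY KEY (src_lang, tgt_lang, source))"
        )

    def get(self, src_lang, tgt_lang, source):
        row = self.conn.execute(
            "SELECT target FROM tm WHERE src_lang = ? AND tgt_lang = ? AND source = ?",
            (src_lang, tgt_lang, source),
        ).fetchone()
        return row[0] if row else None

    def put(self, src_lang, tgt_lang, source, target):
        self.conn.execute(
            "INSERT OR REPLACE INTO tm VALUES (?, ?, ?, ?)",
            (src_lang, tgt_lang, source, target),
        )
        self.conn.commit()


def cached_translate(cache, src_lang, tgt_lang, text, call_mt):
    """Return a stored translation if one exists; otherwise call MT and store it."""
    hit = cache.get(src_lang, tgt_lang, text)
    if hit is not None:
        return hit              # "yes": reuse the existing translation, no MT call
    result = call_mt(text)      # "no": send the text to the MT API
    cache.put(src_lang, tgt_lang, text, result)
    return result
```

Keying on the language pair as well as the source text keeps translations into different target languages from overwriting each other; passing a file path instead of `":memory:"` makes the cache survive between jobs.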

Information on OmegaT:

https://omegat.org/resources

http://185.13.37.79:8003/index.php/p/omegat-core/source/tree/3.1.1/docs_devel/OmegaT-plugin-demo/Creating%20an%20OmegaT%20Plugin.odt

https://github.com/omegat-org/plugin-skeleton
