Job description
I am commissioning a script that extracts all relevant data, such as contact details and other contact data, from directories of PDF files.
The PDF files have a complex structure: each file covers multiple companies, with more than 100 elements per company, and all of these need to be extracted from a single PDF file.
The result we are interested in is fully structured data in a .csv, .xls, or .xlsx file.
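To illustrate what "fully structured data" could look like, here is a minimal sketch using only the Python standard library. It assumes the text has already been pulled out of a PDF (for example with PyMuPDF) and is not the expected solution. The `Poz.` entry marker, the 10-digit `KRS` pattern, the column names, and the sample text are all assumptions made for illustration and would need to be checked against the real files.

```python
import csv
import io
import re

# Hypothetical column set; the real job needs 100+ fields per company.
COLUMNS = ["entry_no", "krs", "email", "raw_text"]

# Assumed patterns, to be verified against the real files.
ENTRY_START = re.compile(r"^Poz\.\s*(\d+)\.", re.MULTILINE)
KRS_NUMBER = re.compile(r"\bKRS\s*(\d{10})\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def split_entries(text):
    """Yield (entry_no, body) for each block that starts with an entry marker."""
    matches = list(ENTRY_START.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield m.group(1), text[m.end():end].strip()


def parse_entry(entry_no, body):
    """Map one company block to a row with a fixed set of columns."""
    krs = KRS_NUMBER.search(body)
    email = EMAIL.search(body)
    return {
        "entry_no": entry_no,
        "krs": krs.group(1) if krs else "",
        "email": email.group(0) if email else "",
        "raw_text": " ".join(body.split()),  # collapse line breaks and runs of spaces
    }


def text_to_csv(text):
    """Return CSV text with one row per company entry found in `text`."""
    out = io.StringIO()
    # QUOTE_ALL keeps commas inside a value from spilling into other columns.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for entry_no, body in split_entries(text):
        writer.writerow(parse_entry(entry_no, body))
    return out.getvalue()


# Made-up sample text, not taken from the real files.
SAMPLE = """Poz. 101. EXAMPLE ONE SP. Z O.O. in Warsaw.
KRS 0000123456. Contact: office@example.com
Poz. 102. EXAMPLE TWO S.A. in Krakow,
KRS 0000654321.
"""

if __name__ == "__main__":
    print(text_to_csv(SAMPLE))
```

Every row carries the same fixed columns, with an empty string where a field is missing, so the file stays rectangular even when a company entry lacks some data.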
If you have dealt with data like CIDG / KRS / Court Monitor,
or with #NLP #python #PyPDF #PyMuPDF ..., then you can handle this job.
Links to 2 sample .pdf files:
https://wyszukiwarka-msig.ms.gov.pl/api/Monitor/Download?id=1943&fileId=true
https://wyszukiwarka-msig.ms.gov.pl/api/Monitor/Download?id=6969&fileId=true
Process:
1) we sign an assignment contract and an NDA
2) you receive a test set of 30 files
once you confirm that you can transform them:
3) you receive a directory with a sample of 6000 files
4) you send the output of the script
we check that the data is correct and that values are not split across the wrong columns
if everything is OK:
5) you receive payment by transfer
6) you send the script with instructions for installation and operation
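
The column check in step 4 could be approximated before delivery with a sketch like the one below. It only flags rows whose number of fields differs from the header, which is the typical symptom of a value splitting across columns, and it assumes a comma-delimited .csv file. The function name and the sample data are hypothetical.

```python
import csv
import io


def find_misaligned_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to check
    return [
        (reader.line_num, len(row))
        for row in reader
        if row and len(row) != len(header)  # blank lines are skipped
    ]


# Made-up examples: the second one has an unquoted comma inside the company name.
GOOD = 'name,city,krs\n"ACME, Ltd.",Warsaw,0000123456\n'
BAD = 'name,city,krs\nACME, Ltd.,Warsaw,0000123456\n'

if __name__ == "__main__":
    print(find_misaligned_rows(GOOD))
    print(find_misaligned_rows(BAD))
```

A check like this catches only structural problems. Whether each value landed in the right column still has to be verified against the source PDF.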