
One of the most laborious tasks in Machine Learning is collecting and preparing data.
There is a meteorological observatory in my city. You can see its main meteorological indicators in real time through its website, and it shares historical data too, but only as PDF files.
I have talked with them about sharing all the data in CSV so that people can use it easily, but it seems that this is not possible 🙁
Still, I want this data, so I need to convert these PDF files into a workable dataset. I have been searching for a good way to convert tables in PDF files to CSV, and the solution is called Canvas.
Once you have the data in CSV, you can use it in many ways: you can open it in Excel, LibreOffice, Google Sheets, etc., because CSV files are easy to import into spreadsheets, or you can process it with Python and its libraries.
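To give an idea of how little code it takes to work with CSV in Python, here is a minimal sketch using only the standard `csv` module. The sample text, its column names (`date`, `max_temp_c`, `min_temp_c`, `rain_mm`) and its values are invented for illustration; they are not the observatory's real data or format.

```python
import csv
import io

# Hypothetical sample: the columns and values are invented for illustration.
# In practice this text would come from a CSV file converted from the PDFs.
SAMPLE_CSV = """date,max_temp_c,min_temp_c,rain_mm
2020-01-01,14.2,5.1,0.0
2020-01-02,12.8,6.3,3.4
2020-01-03,11.5,4.9,1.2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def total_rain(rows):
    """Sum the hypothetical rain_mm column across all rows."""
    return sum(float(row["rain_mm"]) for row in rows)


rows = load_rows(SAMPLE_CSV)
print(len(rows))                   # number of data rows (header excluded)
print(round(total_rain(rows), 1))  # total rainfall in the sample
```

To read a real file instead, you could pass an open file object (`open("data.csv", newline="")`, with a placeholder filename) straight to `csv.DictReader`; libraries such as pandas offer the same with more analysis tools on top.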
As I want an automated process, I will work with a Python script, and this is where I introduce Tabula.