OpenRefine is an exciting open-source tool for understanding, transforming, and cleaning messy data. Once the data is squeaky clean, OpenRefine can reconcile it into a form that is ready to use.
The project is maintained by Code for Science & Society. It’s an open-source, Java-based power tool that runs in your web browser and has some functionality we love!
OpenRefine can take data in commonly used formats such as CSV, TSV, Excel, JSON, and XML. The data can be imported from a local file, a web address, or a database.
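To make the idea of format-agnostic import concrete, here is a minimal Python sketch that reads CSV, TSV, and JSON text into the same list-of-rows shape. It is not how OpenRefine itself imports data; the `read_rows` helper and the sample records are made up for illustration, and the JSON branch assumes a list of flat objects.

```python
import csv
import io
import json


def read_rows(text, fmt):
    """Parse CSV, TSV, or JSON text into a list of dict rows (hypothetical helper)."""
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]
    if fmt == "json":
        # Assumption: the JSON document is a list of flat objects.
        return [dict(record) for record in json.loads(text)]
    raise ValueError(f"unsupported format: {fmt}")


# The same record expressed in three formats ends up in one common shape.
csv_rows = read_rows("name,city\nAda,London\n", "csv")
tsv_rows = read_rows("name\tcity\nAda\tLondon\n", "tsv")
json_rows = read_rows('[{"name": "Ada", "city": "London"}]', "json")
```

Once every source is reduced to the same row structure, the exploring and cleaning steps no longer need to care where the data came from.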
OpenRefine is a web platform with a host of features to aid in exploring data. Once your data is loaded, it helps you spot inconsistencies and anomalies and zoom in on the rows you care about.
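One common way to explore a column is to count how often each distinct value appears, which is roughly what a text facet shows you. The Python sketch below illustrates that idea on a made-up `city` column; the `text_facet` helper is hypothetical and is not part of OpenRefine.

```python
from collections import Counter


def text_facet(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)


# Made-up sample data with a few inconsistent spellings.
rows = [
    {"city": "London"},
    {"city": "london"},
    {"city": "Paris"},
    {"city": "London"},
    {"city": "Lndon"},
]

facet = text_facet(rows, "city")

# Values that occur only once are often worth a closer look.
rare = sorted(value for value, count in facet.items() if count == 1)
```

Here `rare` includes both genuine one-off values and likely typos, so a person still has to decide which is which. That judgment call is the part an interactive tool makes easy.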
Another feature we love is how it helps with cleaning datasets: converting them into different formats and fixing or removing unwanted or erroneous data points. The platform can handle datasets with millions of rows. It will work as fast as your computer’s memory allows!
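A well-known technique for this kind of cleanup is key-collision clustering: normalize each value into a "fingerprint" and group the values whose fingerprints match. The Python sketch below is a simplified approximation of that idea under our own assumptions, not OpenRefine's actual code; the function names and sample values are illustrative.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalize a string so that near-duplicate spellings share a key."""
    # Strip accents, surrounding whitespace, case, and punctuation.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# "Caf\u00e9" is "Cafe" with an acute accent on the final letter.
names = ["Acme Inc.", "acme inc", "Inc. Acme", "Caf\u00e9 Noir", "Cafe noir", "Globex"]
clusters = cluster(names)
```

Each resulting cluster is a candidate for merging into a single canonical spelling. The merge itself is best left to a human, since colliding fingerprints do not always mean the values are truly the same thing.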
One aspect of OpenRefine that really caught our eye was the ability to reconcile data. The solution can link and extend your dataset with various web services. This is no small task, and there is some serious work going on behind the scenes here.
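To give a feel for what reconciliation means, the Python sketch below matches messy values against a small reference table using fuzzy string matching from the standard library. A real reconciliation service is far more sophisticated; the `REFERENCE` table, the `reconcile` helper, and the `cutoff` value are all assumptions made for illustration.

```python
import difflib

# Hypothetical reference data standing in for an external service.
REFERENCE = {
    "United Kingdom": "GB",
    "United States": "US",
    "Germany": "DE",
}


def reconcile(value, reference=REFERENCE, cutoff=0.8):
    """Return (matched_name, identifier), or None if nothing is close enough."""
    matches = difflib.get_close_matches(value, list(reference), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return best, reference[best]


# One exact match, one typo, and one value with no counterpart.
results = {value: reconcile(value) for value in ["United Kingdom", "Germny", "Atlantis"]}
```

Matched values pick up a stable identifier that can then be used to pull in extra columns from the reference source, while unmatched values come back as `None` and are left for manual review.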
For example, you might have a dataset with multiple languages in the input text. OpenRefine has integrated Google Translate to detect which language is being used in the input text and then map each detected language to a language code.
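The detect-then-map flow can be sketched in a few lines of Python. This toy version swaps the translation service for a naive stopword count, so it is only an illustration of the two steps and not how OpenRefine or Google Translate detect languages; the word lists and function names are made up.

```python
# Tiny, hand-picked stopword lists; purely illustrative.
STOPWORDS = {
    "English": {"the", "and", "is", "of"},
    "French": {"le", "la", "et", "est"},
    "Spanish": {"el", "y", "es", "los"},
}

# ISO 639-1 codes for the languages above.
LANGUAGE_CODES = {"English": "en", "French": "fr", "Spanish": "es"}


def detect_language(text):
    """Pick the language whose stopwords overlap most with the text, if any."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def language_code(text):
    """Map the detected language to its code, or "und" when undetermined."""
    return LANGUAGE_CODES.get(detect_language(text), "und")


codes = [
    language_code(text)
    for text in [
        "the cat is on the table",
        "le chat est sur la table",
        "el gato es negro y blanco",
        "12345",
    ]
]
```

Keeping detection and code mapping as separate steps means the detector can be replaced with a real service without touching the mapping table.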
OpenRefine is open-sourced under the BSD license and loved by the community, with more than 7,500 GitHub stars.