Data cleaning, preparation, and enrichment take up an enormous amount of time, yet they are a crucial stage of the data science methodology. The tools available for data transformation haven’t fully caught up with the new data science landscape. Spreadsheets offer an entry-level interface to the data but are time-consuming and don’t scale, while programming environments such as R or IPython offer flexibility but have a steep learning curve for non-technical users.
Domain experts need powerful yet easy-to-use interfaces to explore new data sets, normalize them, and process them through innovative services that are often available only via an API. OpenRefine helps you reduce the time spent on data preparation so you can spend more of it building a model. It offers the best of both worlds: a self-service, agile, and iterative interface for data discovery and preparation, and an easy-to-learn scripting language.
Formerly known as Freebase Gridworks and then Google Refine, OpenRefine has gained traction with a range of domain experts, including librarians, researchers, data journalists, open data enthusiasts, and semantic web professionals.
This course covers the foundations of OpenRefine and its scripting language, GREL. You will learn how to:
- use the facet and filter features to explore the data and discover patterns;
- leverage OpenRefine’s point-and-click transformations and fuzzy matching functions for quick but powerful data cleaning;
- write complex transformations in GREL, OpenRefine’s scripting language;
- call an API and parse the results in OpenRefine.
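As a rough illustration of the fuzzy matching idea mentioned in the list above, the sketch below groups near-duplicate spellings by reducing each value to a normalized key and collecting values whose keys collide. It is written in Python rather than GREL, the function names `fingerprint` and `cluster` and the sample values are hypothetical, and it is not OpenRefine's actual implementation, only a minimal sketch of the general key-collision approach.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-duplicate strings collide."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # Sort the unique tokens so that word order does not matter.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint but differ in spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    # Keep only the groups that contain more than one distinct spelling.
    return [g for g in groups.values() if len(set(g)) > 1]


# Hypothetical sample data: three spellings of one name, plus an unrelated one.
names = [
    "Université de Montréal",
    "universite de montreal",
    "Montreal, Universite de",
    "McGill University",
]

for group in cluster(names):
    print(group)
```

Each printed group is a set of candidate duplicates that a person would then review and merge, which mirrors the interactive, point-and-click workflow the course describes.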
This introductory course is aimed at less technical users, business analysts, and consultants interested in learning data science. While learning OpenRefine, you will become familiar with what a data model and an API are.
More technical people will discover OpenRefine’s capabilities and see how it can speed up data munging.
- Teacher: Martin Magdinier