Data Cleaning with OpenRefine for Health Sciences: Glossary

Key Points

Introduction
  • OpenRefine is a powerful, free, and open-source tool for data cleaning.

  • OpenRefine will automatically track any steps you take in working with your data.

Working with OpenRefine
  • Faceting and clustering approaches can identify errors or outliers in data.
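As a rough illustration of how faceting and clustering surface inconsistent entries, the Python sketch below counts each distinct value (similar to what a text facet displays) and groups values whose normalized keys collide. The `fingerprint` function approximates the "fingerprint" key-collision method described in OpenRefine's documentation (trim, lowercase, remove punctuation, strip accents, sort unique tokens); it is an assumption-laden sketch, not OpenRefine's own code, and the function names and sample drug names are hypothetical.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def text_facet(values):
    """Count each distinct value, similar to what a text facet displays."""
    return dict(Counter(values))


def fingerprint(value):
    """Approximate OpenRefine's 'fingerprint' key (a sketch, not its source code)."""
    key = value.strip().lower()  # trim surrounding whitespace and lowercase
    key = key.translate(str.maketrans("", "", string.punctuation))  # drop ASCII punctuation
    key = unicodedata.normalize("NFKD", key)  # split accented letters into base + mark
    key = "".join(ch for ch in key if not unicodedata.combining(ch))  # drop accent marks
    return " ".join(sorted(set(key.split())))  # unique tokens, sorted, rejoined


def key_collision_clusters(values):
    """Group distinct values whose fingerprints are identical."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only groups containing more than one distinct spelling need review.
    return [sorted(members) for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Hypothetical column of drug names with inconsistent spelling.
    drugs = ["Aspirin", "aspirin ", "ASPIRIN", "Ibuprofen", "ibuprofen.", "Paracetamol"]
    print(text_facet(drugs))
    print(key_collision_clusters(drugs))
```

In OpenRefine itself you would review each proposed cluster and choose which spelling to merge to; the sketch only finds the candidates.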

Filtering and Sorting with OpenRefine
  • OpenRefine provides a way to sort and filter data without affecting the raw data.
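The idea that sorting and filtering change only what you see, not the data itself, can be sketched as follows. This is a hypothetical illustration of the concept, not how OpenRefine is implemented: the filter and the sort produce a list of row indices, and the underlying rows are never modified. The function name and the patient records are made up for the example.

```python
def filtered_sorted_view(rows, keep, sort_key):
    """Return indices of rows that pass `keep`, ordered by `sort_key`.

    The rows themselves are never modified; only a list of indices is built.
    """
    matching = [i for i, row in enumerate(rows) if keep(row)]
    return sorted(matching, key=lambda i: sort_key(rows[i]))


if __name__ == "__main__":
    # Hypothetical patient records.
    patients = [
        {"id": "P1", "age": 54},
        {"id": "P2", "age": 31},
        {"id": "P3", "age": 67},
        {"id": "P4", "age": 12},
    ]
    view = filtered_sorted_view(patients, lambda r: r["age"] >= 18, lambda r: r["age"])
    print([patients[i]["id"] for i in view])  # adults only, youngest first
    print([p["id"] for p in patients])        # original order and contents untouched
```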

Examining Numbers in OpenRefine
  • OpenRefine also provides ways to get an overview of numerical data, such as numeric facets.
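A numeric overview typically separates numeric, non-numeric, and blank cells and shows how the numbers are distributed. The Python sketch below mimics that kind of summary under stated assumptions: it parses text into numbers itself, whereas in OpenRefine cells generally need to be converted to numbers first (for example with Edit cells > Common transforms > To number) before a numeric facet counts them as numeric. The function name, the `bin_width` parameter, and the sample ages are hypothetical.

```python
import math
from collections import Counter


def numeric_overview(values, bin_width):
    """Summarise a column roughly the way a numeric facet does (a sketch)."""
    numbers, non_numeric, blank = [], 0, 0
    for value in values:
        text = "" if value is None else str(value).strip()
        if not text:
            blank += 1
            continue
        try:
            number = float(text)
        except ValueError:
            non_numeric += 1
            continue
        if math.isfinite(number):
            numbers.append(number)
        else:
            non_numeric += 1  # treat text such as "nan" or "inf" as non-numeric here
    # Group numbers into equal-width bins labelled by their lower bound.
    bins = Counter(math.floor(n / bin_width) * bin_width for n in numbers)
    return {
        "numeric": len(numbers),
        "non_numeric": non_numeric,
        "blank": blank,
        "min": min(numbers, default=None),
        "max": max(numbers, default=None),
        "bins": dict(sorted(bins.items())),
    }


if __name__ == "__main__":
    # Hypothetical age column with a blank, a missing value, and a text entry.
    ages = ["23", "45.5", "", "unknown", "61", None, "38"]
    print(numeric_overview(ages, 10))
```

Counting the non-numeric and blank cells separately is what makes entries like "unknown" stand out before any statistics are computed.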

Scripts from OpenRefine
  • OpenRefine tracks all changes you make, and this history can be extracted as a script to reuse in future analyses or to reproduce an analysis.

  • Scripts can (and should) be published together with the dataset as part of the digital appendix of the research output.
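In OpenRefine, the recorded history can be extracted as JSON from the Undo / Redo tab (via Extract...) and applied to another project (via Apply...). The Python sketch below reads such a file and lists the steps in order, which can help when documenting a published script. The JSON shown is a trimmed-down, hypothetical example: it assumes each operation is an object with `op` and `description` keys, and real exports contain additional fields.

```python
import json

# A trimmed-down, hypothetical example of an extracted operation history.
SAMPLE_HISTORY = """
[
  {
    "op": "core/text-transform",
    "columnName": "drug",
    "expression": "grel:value.trim()",
    "description": "Text transform on cells in column drug using expression grel:value.trim()"
  },
  {
    "op": "core/mass-edit",
    "columnName": "drug",
    "description": "Mass edit cells in column drug"
  }
]
"""


def summarize_operations(history_json):
    """Return (operation id, description) pairs from an extracted history."""
    operations = json.loads(history_json)
    # Fall back to placeholders if a key is missing (the assumed shape may differ).
    return [(op.get("op", "?"), op.get("description", "")) for op in operations]


if __name__ == "__main__":
    for step, (op_id, description) in enumerate(summarize_operations(SAMPLE_HISTORY), start=1):
        print(f"{step}. {op_id}: {description}")
```

Keeping the extracted JSON file next to the cleaned dataset lets readers both inspect the steps and re-apply them.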

Exporting and Saving Data from OpenRefine
  • Cleaned data or entire projects can be exported from OpenRefine.

  • Projects can be shared with collaborators, enabling them to see, reproduce and check all data cleaning steps you performed.

Other Resources in OpenRefine
  • Other examples and resources available online are useful for learning more about OpenRefine.

Glossary