An overview of fuzzy string matching problems and solutions. After reading this article, you will know in which situations fuzzy string matching can be helpful, and you will be familiar with variations such as fuzzy deduplication and record linkage.
Which R packages help us with fuzzy matching? We are going to explore stringdist, tidystringdist, fuzzyjoin, inexact, refinr, fuzzywuzzyR, and lingmatch.
Whenever you have text data that was entered manually, there is a chance that it contains errors: typos, abbreviations, or different spellings can all be challenges for your analysis. Fuzzy matching is a way to find inexact matches that mean the same thing, such as mcdonalds, McDonalds and McDonald's Company.
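To make the idea concrete, here is a minimal sketch in Python's standard library (not one of the R packages named above). It normalizes each name and then scores how similar two names are. The helper names `normalize` and `similarity` are illustrative, and the similarity measure (`difflib.SequenceMatcher`) is just one of many possible choices.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a score between 0 (nothing in common) and 1 (identical after normalizing)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


names = ["mcdonalds", "McDonalds", "McDonald's Company"]
for other in names[1:]:
    # Compare every variant against the first spelling.
    print(f"{names[0]!r} vs {other!r}: {similarity(names[0], other):.2f}")
```

After normalization the first two spellings are identical, while the third only partially overlaps, so in practice you would pick a threshold above which two names count as the same entity.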
Animations can help to show how events unfold over time. I found data from the RKI about daily COVID cases in Germany and want to describe the process of creating the animation. It involves fuzzy matching, because the county (Landkreis) names in the RKI data are not identical to those in the shapefile I used.
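The name-matching step can be sketched roughly as follows, again in plain Python rather than the tools used in the article. The function `match_names` and the example county names are made up for illustration and are not taken from the actual RKI data or shapefile. Each name from one source is mapped to its closest counterpart in the other, or to `None` if nothing is similar enough.

```python
from difflib import get_close_matches


def match_names(left, right, cutoff=0.6):
    """Map each name in `left` to its most similar name in `right`, or None."""
    # Compare in lowercase, but return the original spelling from `right`.
    lookup = {name.lower(): name for name in right}
    result = {}
    for name in left:
        hits = get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
        result[name] = lookup[hits[0]] if hits else None
    return result


# Illustrative names only: two sources that label the same counties differently.
data_names = ["LK Göttingen", "SK München", "LK Rostock"]
shape_names = ["Göttingen", "München", "Landkreis Rostock"]

for source, target in match_names(data_names, shape_names).items():
    print(f"{source} -> {target}")
```

The `cutoff` value is an assumption that needs tuning: too low and unrelated counties get paired, too high and genuine matches are missed, so it is worth checking the unmatched and borderline cases by hand.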