main page — COMP 526 Efficient Algorithms
Unit 4: String Matching – What’s behind Ctrl+F?
This is an archived version of this module from Spring 2022.
Click here for the current iteration.
This unit covers algorithms for exact string matching:
- terminology for strings
- string matching with determinstic finite automata
- Knuth-Morris-Pratt (KMP) algorithm
- Boyer-Moore (BM) algorithm
- Rabin-Karp (RK) algorithm
Learning outcomes
- Know and use typical notions for strings (substring, prefix, suffix, etc.).
- Understand principles and implementation of the KMP, BM, and RK algorithms.
- Know the performance characteristics of the KMP, BM, and RK algorithms.
- Be able to solve simple stringology problems using the KMP failure function.
Material
- slides
- Video 4-1 (2022-02-28):
§4.1 Introduction
- Video 4-2 (2022-02-28):
§4.2 Brute force string matching
- Video 4-3 (2022-02-28):
§4.3 String matching with finite automata
- Video 4-4 (2022-02-28):
§4.4 Constructing string matching DFAs
- Video 4-5 (2022-03-03):
§4.5 The Knuth-Morris-Pratt algorithm
- Video 4-6 (2022-03-03):
§4.6 The Boyer-Moore algorithm
- Video 4-7 (2022-03-03):
§4.7 The Rabin-Karp algorithm
Further reading and sources
The naive string matching algorithm, string matching with automata and the Knuth-Morris-Pratt are discussed, e.g., in
The description of the Boyer-Moore algorithm follows our own algorithms textbook (in German):
The presentation of Rabin-Karp and the failure-link automaton is inspired by material prepared for the module