main page  —  COMP 526 Applied Algorithmics

Unit 4: String Matching – What’s behind Ctrl+F?

This is an archived version of this module from Spring 2020.
Click here for the current iteration.

This unit covers algorithms for exact string matching:

  • terminology for strings
  • string matching with determinstic finite automata
  • Knuth-Morris-Pratt (KMP) algorithm
  • Boyer-Moore (BM) algorithm
  • Rabin-Karp (RK) algorithm

Learning outcomes

  1. Know and use typical notions for strings (substring, prefix, suffix, etc.).
  2. Understand principles and implementation of the KMP, BM, and RK algorithms.
  3. Know the performance characteristics of the KMP, BM, and RK algorithms.
  4. Be able to solve simple stringology problems using the KMP failure function.

Material

Lecture notes and video will be added once available.

  • lecture notes
  • slides
  • tutorial problems
  • Lecture 2020-02-25: Intro & Brute Force

  • Lecture 2020-02-25: String Matching with Automata

  • Lecture 2020-03-02: String Matching with Automata (part 2)

  • Lecture 2020-03-02: Knuth-Morris-Pratt

  • Lecture 2020-03-02: Knuth-Morris-Pratt prefix function

  • Lecture 2020-03-03: Boyer-Moore

  • Lecture 2020-03-03: Rabin-Karp

Further reading and sources

The naive string matching algorithm, string matching with automata and the Knuth-Morris-Pratt are discussed, e.g., in

The description of the Boyer-Moore algorithm follows my algorithms textbook (in German):

The presentation of Rabin-Karp and the failure-link automaton is inspired by material prepared for the module