main page  —  COMP 526 Applied Algorithmics

Unit 2: Fundamental Data Structures

This is an archived version of this module from Spring 2020.
Click here for the current iteration.

This unit covers some fundamental data structures: stacks, queues, resizable arrays, binary heaps, and binary search trees.

Learning outcomes

  1. Understand the difference between abstract data type (ADT) and its implementation
  2. Know the ADTs stack, queue, priority queue and dictionary/symbol table
  3. Know array-based implementations of stack and queue
  4. Understand linked lists and the corresponding implementations of stack and queue
  5. Know binary heaps and their performance characteristics
  6. Understand binary search trees and their performance characteristics

Material

  • tutorial problems
  • notes from Q&A sessions
  • Q&A Session 2020-02-04 on ADTs, Stacks, Resizable Arrays

  • Q&A Session 2020-02-04 on Binary Heaps

  • Q&A Session 2020-02-17 on Binary Search Trees (Search, Insert, Delete)

  • Q&A Session 2020-02-18 on Binary Search Trees (Rank, Select, Analysis)

Lecture videos

This unit covers some prerequisites for the remainder of the module and is intended to bring all of you to the same level before moving on.

If you previously had an algorithms and data structures module in a computer science undergraduate program, this unit will mostly be recap for you, and you might be able to go over this unit at a faster pace.

To accommodate the varying levels of prior exposure with this material, this unit will be taught using videos. We will use the lecture videos from the online course Algorithms by Robert Sedgewick and Kevin Wayne with are available on CUvids.io.

How to watch the videos

Every video comes with a list of guidance questions resp. short tasks.

Try to answer all guiding questions while watching the videos! Take notes and bring these to class.

(I encourage you to handwrite these: The Pen Is Mightier Than the Keyboard (pdf).

Important: Speed up or slow down the videos (in the player) so that you are neither bored nor lost. CUvids.io also has a (searchable!) transcript of all videos, as well as subtitles.

1 – Stacks

Content: intro to ADTs, stacks and their implementations

Guiding questions:

  1. What is an abstract data type (according to Sedgewick)? What is a data structure?
  2. What operations does a stack support?
  3. What does FIFO and LIFO mean?
  4. What type of linked list is the presented implementation of a stack based on?
  5. How much space does the stack implementation use?
  6. What is the second implementation of stacks? Does it fully satisfy the specification of stack?
  7. What is loitering?

2 – Resizable Arrays

Content: doubling arrays, amortized costs

Guiding questions:

  1. First, a simplistic method is presented for obtaining arrays of variable sizes. How does it work? Why is it (usually) not useful?
  2. What are amortized costs? Describe the concept using push as an example.
  3. When do we shrink the array again? Why do we do that?
  4. Do resizing-array stacks fulfill all API requirements?
  5. When should we use resizing arrays resp. linked lists to implement stacks according to Sedgewick?

3 – Queues

Content: queues and their implementations

Guiding questions:

  1. What operations does a queue support?
  2. How does the linked-list implementation of queues differ from that for stacks?
  3. How does the resizing-array implementation of queues differ from that for stacks?

4 – Java Generics

Content: Java generics for parametric types

Guiding questions:

  1. What options did programmers have in the original Java (before version 1.5)?
  2. What is the advantage of generics as introduced in Java 1.5?
  3. What caveats does Sedgewick mention in connection with Java generics?

5 – Iterators

Content: (Java’s) iterator concept, Bag ADT

Guiding questions:

  1. What operations does an iterator support?
  2. How can we implement an iterator for linked lists?
  3. How can we implement an iterator for arrays?
  4. For both cases, think about what happens when we add elements to the underlying stack while iterating over it. What happens to elements that are deleted?

[Optional] Stack and Queue Applications

Content: some interesting applications of stacks and queues; not vital for following COMP 526, but worthwhile knowing.

Guiding questions:

  1. What operations does Java’s List interface support?
  2. Briefly describe the running-time complexity of these methods in ArrayList and LinkedList.
  3. What application for stacks does Sedgewick present?
  4. Which problem does Dijkstra’s two-stack algorithm solve? Which assumption does Sedgewick make for the input? (hint: parentheses!)
  5. How does Sedgewick justify the correctness of Dijkstra’s two-stack algorithm?

6 – Priority Queues

Content: The priority queue ADT, its applications and elementary implementations

Guiding questions:

  1. What operations do MaxPQ resp. MinPQ offer?
  2. Why does the topM test client use a MinPQ to find the M largest transactions – is that a mistake?
  3. Which elementary implementations are presented? What are their running-time complexities?

7 – Binary Heaps

Content: binary-heap implementation (without change key)

Guiding questions:

  1. What is a binary tree (according to Sedgewick)? Be precise and give examples.
  2. What is a complete binary tree? How many (topologically, i.e., ignoring node labels) different complete binary trees with N nodes are there?
  3. What does it mean that a binary tree is heap-ordered? What does heap order imply for the key at the root?
  4. How does Sedgewick encode a complete binary tree in an array? How is that possible without explicitly storing where the child nodes of a node are stored?
  5. What does the swim operation do? Which preconditions must be fulfilled to run swim, and which postconditions does swim guarantee upon termination?
  6. What does sink do? What are pre- and postconditions?
  7. How does insert work for a binary-heap MaxPQ?
  8. How does delMax work for a binary-heap MaxPQ?
  9. What is the running-time complexity of these two operations?
  10. What does immutability mean? What could go wrong when keys of a PQ are not immutable?

8 – Symbol Table API

Content: symbol table ADT, requirements for keys, applications of symbol tables, and test clients.

Guiding questions:

  1. What operations does every symbol table (ST) support?
  2. What are similarities and differences between ST and mathematical functions?
  3. How should a typical implementation of Java’s equals method for a data type with several fields look like? What requirements have to be met?
  4. Explain how the frequency counting test client uses STs.

9 – Elementary Symbol-Table Implementations

Content: functional, but inefficient ST implementations based on lists.

Guiding questions:

  1. What are the two implementations that are presented?
  2. How does get and put work for these two implementations?
  3. What is the running-time complexity of get and put in these implementations?

10 – Ordered Operations

Content: Ordered ST ADT

Guiding questions:

  1. What are the (additional) operations in an ordered ST?

11 – Binary Search Trees

Content: (unbalanced) binary search trees (BSTs): definitions, search, insert, and analysis

Guiding questions:

  1. (again) What is a binary tree according to Sedgewick?
  2. What is a binary search tree (BST)? Draw a BST for the keys 10, 20, 30, 40, 50.
  3. How is a BST represented in Java? Draw a box-and-arrow diagram for your example BST.
  4. How does search in a BST work? Describe it on your example.
  5. How does insertion into a BST work? Again, give an example on your tree.
  6. How does the best-case tree w.r.t. search costs look like? (more precisely, the best case w.r.t. to searching each key once)
  7. What is the worst possible search cost? How does a tree look like where these occur?
  8. What statement does Sedgewick make about typical / average-case search costs in BSTs?

12 – Ordered Operations in BSTs

Content: augmented BST implementation of Ordered ST.

Guiding questions:

  1. How can we implement min and max in BSTs?
  2. How can we implement floor and ceiling?
  3. What do we have to do to maintain correct subtree sizes in all nodes?
  4. How does the rank operation work in BSTs?

Further reading and sources

The textbook corresponding to the lecture videos has some more details and many exercise problems. Summaries and code are available on the booksite.