Builds a deterministic finite automaton (DFA) for pattern matching over a `ByteString`.
The automaton encodes transitions from (state, input byte) → next state,
allowing efficient streaming search for the pattern within input data.
It produces a flattened transition table of size `((length pattern) + 1) * 256`,
where 256 corresponds to all possible byte values (0–255).
States correspond to pattern prefixes:
- State 0: no match (empty prefix)
- State i: matched the first i bytes of the pattern
- State (length pattern): full match
Transition behavior is derived from the KMP border table (`kmpBorders`),
ensuring correct fallback transitions and eliminating redundant backtracking.
Example: "ANPANMAN"
These following equation is used to determine the "flat" index to build the automaton:
flatindex = (state ∗ alphabetsize) + charcode
Where:
state : Range from 0 to length of the input pattern
alphabetsize : All possible input characters (in this case extended ASCII, 8-bit range from 0 to 255)
charcode : Characters are interpreted via its ASCII code ('A' = 65, 'M' = 77, 'N' = 78, 'P' = 80, and so on)
| Flat index | State | Char code | Char | Meaning |
| ---------- | ----- | --------- | ---- | ------------- |
| 65 | 0 | 65 | 'A' | δ(0, 'A') = 1 |
| 321 | 1 | 65 | 'A' | δ(1, 'A') = 1 |
| 334 | 1 | 78 | 'N' | δ(1, 'N') = 2 |
| 577 | 2 | 65 | 'A' | δ(2, 'A') = 1 |
| 592 | 2 | 80 | 'P' | δ(2, 'P') = 3 |
| 833 | 3 | 65 | 'A' | δ(3, 'A') = 4 |
| 1089 | 4 | 65 | 'A' | δ(4, 'A') = 1 |
| 1102 | 4 | 78 | 'N' | δ(4, 'N') = 5 |
| 1345 | 5 | 65 | 'A' | δ(5, 'A') = 1 |
| 1357 | 5 | 77 | 'M' | δ(5, 'M') = 6 |
| 1601 | 6 | 65 | 'A' | δ(6, 'A') = 7 |
| 1857 | 7 | 65 | 'A' | δ(7, 'A') = 1 |
| 1870 | 7 | 78 | 'N' | δ(7, 'N') = 8 |
| 2113 | 8 | 65 | 'A' | δ(8, 'A') = 1 |
Totality: total
Visibility: export Table of suffix-shifts
When a mismatch occurs at pattern position patpos, assumed to be not the
last position in the pattern, the suffix u of length (patend - patpos)
has been successfully matched.
Let c be the byte in the pattern at position patpos.
If the sub-pattern u also occurs in the pattern somewhere *not* preceded
by c, let upos be the position of the last byte in u for the last of
all such occurrences. Then there can be no match if the window is shifted
less than (patend - upos) places, because either the part of the string
which matched the suffix u is not aligned with an occurrence of u in the
pattern, or it is aligned with an occurrence of u which is preceded by
the same byte c as the originally matched suffix.
If the complete sub-pattern u does not occur again in the pattern, or all
of its occurrences are preceded by the byte c, then we can align the
pattern with the string so that a suffix v of u matches a prefix of the
pattern. If v is chosen maximal, no smaller shift can give a match, so
we can shift by at least (patlen - length v).
If a complete match is encountered, we can shift by at least the same
amount as if the first byte of the pattern was a mismatch, no complete
match is possible between these positions.
For non-periodic patterns, only very short suffixes will usually occur
again in the pattern, so if a longer suffix has been matched before a
mismatch, the window can then be shifted entirely past the partial
match, so that part of the string will not be re-compared.
For periodic patterns, the suffix shifts will be shorter in general,
leading to an O(strlen * patlen) worst-case performance.
To compute the suffix-shifts, we use an array containing the lengths of
the longest common suffixes of the entire pattern and its prefix ending
with position pos.
Example: "ANPANMAN"
| idx | suff[idx] | target = patEnd - suff[idx] | value = patEnd - idx | ar after write |
| --- | --------- | --------------------------- | -------------------- | ----------------- |
| 0 | 0 | 7 - 0 = 7 | 7 - 0 = 7 | [6,6,6,6,6,6,8,7] |
| 1 | 2 | 7 - 2 = 5 | 7 - 1 = 6 | [6,6,6,6,6,6,8,7] |
| 2 | 0 | 7 | 7 - 2 = 5 | [6,6,6,6,6,6,8,5] |
| 3 | 0 | 7 | 7 - 3 = 4 | [6,6,6,6,6,6,8,4] |
| 4 | 2 | 7 - 2 = 5 | 7 - 4 = 3 | [6,6,6,6,6,3,8,4] |
| 5 | 0 | 7 | 7 - 5 = 2 | [6,6,6,6,6,3,8,2] |
| 6 | 0 | 7 | 7 - 6 = 1 | [6,6,6,6,6,3,8,1] |
Totality: total
Visibility: export