0 | ||| Log-Structured Merge RRB Vector Types
24 | %language ElabReflection
26 | --------------------------------------------------------------------------------
27 | -- Rebuild Service Messages
28 | --------------------------------------------------------------------------------
30 | ||| Requests sent to the rebuild service.
31 | |||
32 | ||| Trigger:
33 | ||| - Indicates new buffered work exists.
34 | ||| - Marks rebuild work as pending.
35 | ||| - Does not initiate rebuild.
36 | |||
43 | --------------------------------------------------------------------------------
44 | -- Snapshot-Rebuilding Service
45 | --------------------------------------------------------------------------------
58 | where
63 | --------------------------------------------------------------------------------
64 | -- Generation
65 | --------------------------------------------------------------------------------
67 | ||| Snapshot generation identifier.
68 | |||
69 | ||| Represents the logical version of the currently published
70 | ||| immutable snapshot.
71 | |||
72 | ||| Properties:
73 | ||| - Monotonically increasing.
74 | ||| - Incremented only after successful publication.
75 | ||| - Readers may compare generations to detect snapshot changes.
76 | |||
77 | ||| Notes:
78 | ||| - Does not encode time.
79 | ||| - Exists purely for ordering and visibility.
80 | |||
85 | --------------------------------------------------------------------------------
86 | -- ThreadId
87 | --------------------------------------------------------------------------------
89 | ||| A wrapper over Int for thread ids.
90 | |||
95 | --------------------------------------------------------------------------------
96 | -- Buffered Operations
97 | --------------------------------------------------------------------------------
99 | ||| Operation represents a deferred vector mutation.
100 | |||
101 | ||| Rather than mutating the underlying RRBVector immediately, all user
102 | ||| modifications are first converted into Operations and appended into
103 | ||| thread-indexed shared state.
104 | |||
105 | ||| Variants:
106 | ||| - Prepend a
107 | ||| Insert a value at the logical beginning.
108 | ||| - Append a
109 | ||| Insert a value at the logical end.
110 | ||| - Insert Nat a
111 | ||| Insert a value at a specified index.
112 | ||| - Delete Nat
113 | ||| Remove a value at a specified index.
114 | ||| - Update Nat a
115 | ||| Replace a value at a specified index.
116 | |||
117 | ||| Role in LSM design:
118 | ||| - Forms the deferred mutation layer.
119 | ||| - Enables batching.
120 | ||| - Prevents writers from mutating snapshots.
121 | |||
122 | ||| Notes:
123 | ||| Indices are interpreted relative to the visible snapshot together with preceding replayed operations.
124 | |||
135 | --------------------------------------------------------------------------------
136 | -- Write Buffer Entries
137 | --------------------------------------------------------------------------------
139 | ||| Entry represents a deferred mutation event.
140 | |||
141 | ||| Every user write becomes an Entry before entering a thread buffer.
142 | |||
143 | ||| Fields:
144 | ||| - operation <-> Deferred mutation
145 | ||| - timestamp <-> Wall-clock ordering hint
146 | ||| - threadid <-> Originating thread
147 | ||| - sequence <-> Monotonic per-thread counter
148 | |||
149 | ||| Ordering:
150 | ||| - Entries are sorted using (timestamp, threadid, sequence).
151 | ||| - This guarantees deterministic replay even when timestamps collide.
152 | |||
153 | ||| Role in LSM design:
154 | ||| - Unit of deferred work.
155 | ||| - Supports deterministic rebuild.
156 | |||
190 | LT
192 | GT
196 | LT
198 | GT
202 | --------------------------------------------------------------------------------
203 | -- Single Buffer
204 | --------------------------------------------------------------------------------
206 | ||| Single append-efficient mutation log.
207 | |||
208 | ||| Represents one physical mutation buffer.
209 | |||
218 | --------------------------------------------------------------------------------
219 | -- Double Buffered Mutation State
220 | --------------------------------------------------------------------------------
222 | ||| Thread-local mutation state.
223 | |||
224 | ||| Writers append only to active.
225 | |||
226 | ||| During rebuild:
227 | ||| - Active buffers are extracted atomically.
228 | ||| - Ownership transfers to the rebuilder.
229 | ||| - Writers immediately continue on a fresh buffer.
230 | |||
231 | ||| Properties:
232 | ||| - O(1) amortized append.
233 | ||| - No stop-the-world pauses.
234 | ||| - No shared rebuild ownership.
235 | |||
243 | --------------------------------------------------------------------------------
244 | -- Thread Context
245 | --------------------------------------------------------------------------------
247 | ||| ThreadContext stores per-thread mutation state.
248 | |||
249 | ||| Ownership:
250 | ||| - One ThreadContext exists per registered thread.
251 | |||
252 | ||| Fields:
253 | ||| - threadid <-> Unique thread identifier
254 | ||| - sequence <-> Monotonically increasing local counter used for deterministic ordering
255 | ||| - buffers <-> Thread-owned double-buffer state
256 | |||
257 | ||| Properties:
258 | ||| - Per-thread logical ownership.
259 | ||| - Shared registry storage.
260 | ||| - Low-contention write path.
261 | |||
271 | --------------------------------------------------------------------------------
272 | -- Background Rebuild State
273 | --------------------------------------------------------------------------------
275 | ||| RebuildState represents rebuild thread progress.
276 | |||
277 | ||| Lifecycle:
278 | |||
279 | ||| Sleeping
280 | ||| ↓
281 | ||| RotatingBuffers
282 | ||| ↓
283 | ||| CollectingEntries
284 | ||| ↓
285 | ||| SortingEntries
286 | ||| ↓
287 | ||| PublishingSnapshot
288 | ||| ↓
289 | ||| Sleeping
290 | |||
301 | --------------------------------------------------------------------------------
302 | -- Rebuild Metrics
303 | --------------------------------------------------------------------------------
305 | ||| Runtime rebuild statistics.
306 | |||
307 | ||| Metrics are updated after successful rebuild cycles and provide lightweight visibility into rebuild behavior.
308 | |||
309 | ||| Fields:
310 | ||| - lastbatchsize <-> Number of entries processed during most recent rebuild.
311 | ||| - totalbatchsize <-> Total number of entries processed across all rebuilds.
312 | ||| - rebuildcount <-> Number of successful rebuild cycles.
313 | |||
314 | ||| Derived values:
315 | ||| - average batch size: totalbatchsize / rebuildcount
316 | |||
317 | ||| Properties:
318 | ||| - Updated only by the rebuild worker.
319 | ||| - Does not affect correctness.
320 | ||| - Intended for observability and adaptive tuning.
321 | |||
331 | --------------------------------------------------------------------------------
332 | -- Rebuild Service State
333 | --------------------------------------------------------------------------------
335 | ||| Controls execution of rebuild-cycle progress.
336 | |||
337 | ||| Fields:
338 | ||| - rebuildphase: Current phase of the rebuild pipeline.
339 | ||| - rebuildmetrics: Runtime rebuild statistics.
340 | |||
341 | ||| Properties:
342 | ||| - Local service execution state only.
343 | ||| - Mutable only by the rebuild worker.
344 | ||| - Represents execution progress rather than global vector state.
345 | |||
346 | ||| Notes:
347 | ||| - Pending rebuild work is tracked globally through CombinedSnapshotState.rebuildpending.
348 | ||| - Failure state is intentionally omitted.
349 | ||| - Current rebuild operations are total and crash on unrecoverable runtime failures rather than persisting structured errors.
350 | |||
359 | --------------------------------------------------------------------------------
360 | -- Reader State
361 | --------------------------------------------------------------------------------
363 | ||| Reader participation in generation tracking.
364 | |||
365 | ||| Readers announce the snapshot generation currently being observed.
366 | |||
367 | ||| Properties:
368 | ||| - Updated only when entering/leaving a snapshot read section.
369 | ||| - Used by reclamation to determine oldest active generation.
370 | ||| - One entry per participating thread.
371 | |||
377 | --------------------------------------------------------------------------------
378 | -- Snapshot State
379 | --------------------------------------------------------------------------------
381 | ||| Immutable published snapshot state.
382 | |||
383 | ||| Represents the currently visible version of the vector.
384 | |||
385 | ||| Fields:
386 | ||| - generation <-> Monotonic snapshot version identifier
387 | ||| - tree <-> Immutable published RRB snapshot
388 | |||
389 | ||| Publication properties:
390 | ||| - Tree and generation are published atomically.
391 | ||| - Readers always observe a consistent snapshot pair.
392 | ||| - Eliminates visibility races between tree updates and generation updates.
393 | |||
394 | ||| Lifecycle:
395 | |||
396 | ||| rebuild
397 | ||| ↓
398 | ||| new snapshot tree
399 | ||| ↓
400 | ||| increment generation
401 | ||| ↓
402 | ||| atomic publication
403 | ||| ↓
404 | ||| visible to readers
405 | |||
406 | ||| Notes:
407 | ||| - SnapshotState is immutable once constructed.
408 | ||| - Publication occurs by replacing the whole record.
409 | |||
416 | --------------------------------------------------------------------------------
417 | -- Retired Snapshot
418 | --------------------------------------------------------------------------------
420 | ||| Snapshot awaiting reclamation.
421 | |||
422 | ||| A snapshot becomes retired after publication of a newer snapshot.
423 | |||
424 | ||| Properties:
425 | ||| - Immutable after retirement.
426 | ||| - Safe to reclaim once no reader references its generation.
427 | |||
434 | --------------------------------------------------------------------------------
435 | -- Combined Snapshot State
436 | --------------------------------------------------------------------------------
438 | ||| Combined snapshot publication, reclamation, and batching state.
439 | |||
440 | ||| Fields:
441 | ||| - currentsnapshot: Current published immutable snapshot.
442 | ||| - retiredsnapshots: Older snapshots awaiting reclamation.
443 | ||| - readerstate: Active reader generation announcements.
444 | ||| - writepressure: Number of writes accumulated since the last rebuild cycle.
445 | ||| - rebuildpending: Indicates whether buffered work exists requiring rebuild.
446 | ||| - batchwindow: Current adaptive rebuild target controlling how many writes are accumulated before rebuild behavior expands or contracts.
447 | |||
448 | ||| Properties:
449 | ||| - Updated atomically through CAS.
450 | ||| - Shared between readers, writers, and rebuilder.
451 | ||| - Adaptive batching decisions observe a globally consistent state.
452 | |||
463 | --------------------------------------------------------------------------------
464 | -- Log Structured Merge RRB Vector
465 | --------------------------------------------------------------------------------
467 | ||| A concurrent log-structured sequence built on immutable RRBVector snapshots.
468 | |||
469 | ||| Architecture:
470 | ||| - Writers append deferred mutations into thread-local buffers.
471 | ||| - Readers access immutable published snapshots.
472 | ||| - A background rebuilder periodically merges buffered mutations into a new RRBVector snapshot.
473 | |||
474 | ||| Concurrency model:
475 | ||| - Readers never block writers.
476 | ||| - Writers never mutate published snapshots.
477 | ||| - Snapshot publication occurs atomically.
478 | ||| - Retired snapshots are reclaimed using generation tracking.
479 | |||
480 | ||| Internally combines:
481 | ||| - Thread-local mutation logs.
482 | ||| - Immutable RRBVector snapshots.
483 | ||| - Background rebuild publication.
484 | ||| - Generation-based reclamation.
485 | ||| - Adaptive rebuild batching.
486 | |||
487 | ||| Visibility:
488 | ||| - Writes become visible only after rebuild and publication.
489 | ||| - Readers observe a consistent immutable snapshot.
490 | |||
491 | ||| Notes:
492 | ||| - Optimized for high write concurrency.
493 | ||| - Read performance matches the underlying RRBVector snapshot.
494 | |||
502 | --------------------------------------------------------------------------------
503 | -- Configuration
504 | --------------------------------------------------------------------------------
506 | ||| Configuration controlling rebuild and adaptive batching behavior.
507 | |||
508 | ||| Fields:
509 | ||| - initialbatchwindow: Initial adaptive batching target used before runtime adjustments occur.
510 | |||
511 | ||| Properties:
512 | ||| - Does not affect correctness.
513 | ||| - Influences rebuild latency/throughput tradeoffs.
514 | ||| - Serves as the starting point for adaptive window adjustment.
515 | |||
516 | ||| Typical values:
517 | ||| - 16–64: Lower latency, more frequent rebuilds.
518 | ||| - 64–256: Balanced throughput and latency.
519 | ||| - 256+: Higher throughput under sustained write load.
520 | |||