LCOV - code coverage report
Current view: top level - pebble/internal/base - iterator.go (source / functions) Hit Total Coverage
Test: 2024-04-01 08:16Z 1c7bcd1c - tests only.lcov Lines: 40 40 100.0 %
Date: 2024-04-01 08:17:26 Functions: 0 0 -

          Line data    Source code
       1             : // Copyright 2019 The LevelDB-Go and Pebble Authors. All rights reserved. Use
       2             : // of this source code is governed by a BSD-style license that can be found in
       3             : // the LICENSE file.
       4             : 
       5             : package base
       6             : 
       7             : import (
       8             :         "context"
       9             :         "fmt"
      10             :         "time"
      11             : )
      12             : 
      13             : // InternalIterator iterates over a DB's key/value pairs in key order. Unlike
      14             : // the Iterator interface, the returned keys are InternalKeys composed of the
      15             : // user-key, a sequence number and a key kind. In forward iteration, key/value
      16             : // pairs for identical user-keys are returned in descending sequence order. In
      17             : // reverse iteration, key/value pairs for identical user-keys are returned in
      18             : // ascending sequence order.
      19             : //
      20             : // InternalIterators provide 5 absolute positioning methods and 2 relative
      21             : // positioning methods. The absolute positioning methods are:
      22             : //
      23             : // - SeekGE
      24             : // - SeekPrefixGE
      25             : // - SeekLT
      26             : // - First
      27             : // - Last
      28             : //
      29             : // The relative positioning methods are:
      30             : //
      31             : // - Next
      32             : // - Prev
      33             : //
      34             : // The relative positioning methods can be used in conjunction with any of the
      35             : // absolute positioning methods with one exception: SeekPrefixGE does not
      36             : // support reverse iteration via Prev. It is undefined to call relative
      37             : // positioning methods without ever calling an absolute positioning method.
      38             : //
      39             : // InternalIterators can optionally implement a prefix iteration mode. This
      40             : // mode is entered by calling SeekPrefixGE and exited by any other absolute
      41             : // positioning method (SeekGE, SeekLT, First, Last). When in prefix iteration
      42             : // mode, a call to Next will advance to the next key which has the same
      43             : // "prefix" as the one supplied to SeekPrefixGE. Note that "prefix" in this
      44             : // context is not a strict byte prefix, but defined by byte equality for the
      45             : // result of the Comparer.Split method. An InternalIterator is not required to
      46             : // support prefix iteration mode, and can implement SeekPrefixGE by forwarding
      47             : // to SeekGE. When the iteration prefix is exhausted, it is not valid to call
      48             : // Next on an internal iterator that's already returned (nil,nilv) or a key
      49             : // beyond the prefix.
      50             : //
      51             : // Bounds, [lower, upper), can be set on iterators, either using the SetBounds()
      52             : // function in the interface, or in implementation specific ways during iterator
      53             : // creation. The forward positioning routines (SeekGE, First, and Next) only
      54             : // check the upper bound. The reverse positioning routines (SeekLT, Last, and
      55             : // Prev) only check the lower bound. It is up to the caller to ensure that the
      56             : // forward positioning routines respect the lower bound and the reverse
      57             : // positioning routines respect the upper bound (i.e. calling SeekGE instead of
      58             : // First if there is a lower bound, and SeekLT instead of Last if there is an
      59             : // upper bound). This imposition is done in order to elevate that enforcement to
      60             : // the caller (generally pebble.Iterator or pebble.mergingIter) rather than
      61             : // having it duplicated in every InternalIterator implementation.
      62             : //
      63             : // Additionally, the caller needs to ensure that SeekGE/SeekPrefixGE are not
      64             : // called with a key > the upper bound, and SeekLT is not called with a key <
      65             : // the lower bound. InternalIterator implementations are required to respect
      66             : // the iterator bounds, never returning records outside of the bounds with one
      67             : // exception: an iterator may generate synthetic RANGEDEL marker records. See
      68             : // levelIter.syntheticBoundary for the sole existing example of this behavior.
      69             : // Specifically, levelIter can return synthetic keys whose user key is equal to
      70             : // the lower/upper bound.
      71             : //
      72             : // The bounds provided to an internal iterator must remain valid until a
      73             : // subsequent call to SetBounds has returned. This requirement exists so that
      74             : // iterator implementations may compare old and new bounds to apply low-level
      75             : // optimizations. The pebble.Iterator satisfies this requirement by maintaining
      76             : // two bound buffers and switching between them.
      77             : //
      78             : // An iterator must be closed after use, but it is not necessary to read an
      79             : // iterator until exhaustion.
      80             : //
      81             : // An iterator is not goroutine-safe, but it is safe to use multiple iterators
      82             : // concurrently, either in separate goroutines or switching between the
      83             : // iterators in a single goroutine.
      84             : //
      85             : // It is also safe to use an iterator concurrently with modifying its
      86             : // underlying DB, if that DB permits modification. However, the resultant
      87             : // key/value pairs are not guaranteed to be a consistent snapshot of that DB
      88             : // at a particular point in time.
      89             : //
      90             : // InternalIterators accumulate errors encountered during operation, exposing
      91             : // them through the Error method. All of the absolute positioning methods
      92             : // reset any accumulated error before positioning. Relative positioning
      93             : // methods return without advancing if the iterator has accumulated an error.
      94             : //
      95             : // nilv == shorthand for LazyValue{}, which represents a nil value.
      96             : type InternalIterator interface {
      97             :         // SeekGE moves the iterator to the first key/value pair whose key is greater
      98             :         // than or equal to the given key. Returns the key and value if the iterator
      99             :         // is pointing at a valid entry, and (nil, nilv) otherwise. Note that SeekGE
     100             :         // only checks the upper bound. It is up to the caller to ensure that key
     101             :         // is greater than or equal to the lower bound.
     102             :         SeekGE(key []byte, flags SeekGEFlags) (*InternalKey, LazyValue)
     103             : 
     104             :         // SeekPrefixGE moves the iterator to the first key/value pair whose key is
     105             :         // greater than or equal to the given key. Returns the key and value if the
     106             :         // iterator is pointing at a valid entry, and (nil, nilv) otherwise. Note that
     107             :         // SeekPrefixGE only checks the upper bound. It is up to the caller to ensure
     108             :         // that key is greater than or equal to the lower bound.
     109             :         //
     110             :         // The prefix argument is used by some InternalIterator implementations (e.g.
     111             :         // sstable.Reader) to avoid expensive operations. This operation is only
     112             :         // useful when a user-defined Split function is supplied to the Comparer for
     113             :         // the DB. The supplied prefix will be the prefix of the given key returned by
     114             :         // that Split function. If the iterator is able to determine that no key with
     115             :         // the prefix exists, it can return (nil,nilv). Unlike SeekGE, this is not an
     116             :         // indication that iteration is exhausted.
     117             :         //
     118             :         // Note that the iterator may return keys not matching the prefix. It is up
     119             :         // to the caller to check if the prefix matches.
     120             :         //
     121             :         // Calling SeekPrefixGE places the receiver into prefix iteration mode. Once
     122             :         // in this mode, reverse iteration may not be supported and will return an
     123             :         // error. Note that pebble/Iterator.SeekPrefixGE has this same restriction on
     124             :         // not supporting reverse iteration in prefix iteration mode until a
     125             :         // different positioning routine (SeekGE, SeekLT, First or Last) switches the
     126             :         // iterator out of prefix iteration.
     127             :         SeekPrefixGE(prefix, key []byte, flags SeekGEFlags) (*InternalKey, LazyValue)
     128             : 
     129             :         // SeekLT moves the iterator to the last key/value pair whose key is less
     130             :         // than the given key. Returns the key and value if the iterator is pointing
     131             :         // at a valid entry, and (nil, nilv) otherwise. Note that SeekLT only checks
     132             :         // the lower bound. It is up to the caller to ensure that key is less than
     133             :         // the upper bound.
     134             :         SeekLT(key []byte, flags SeekLTFlags) (*InternalKey, LazyValue)
     135             : 
     136             :         // First moves the iterator the the first key/value pair. Returns the key and
     137             :         // value if the iterator is pointing at a valid entry, and (nil, nilv)
     138             :         // otherwise. Note that First only checks the upper bound. It is up to the
     139             :         // caller to ensure that First() is not called when there is a lower bound,
     140             :         // and instead call SeekGE(lower).
     141             :         First() (*InternalKey, LazyValue)
     142             : 
     143             :         // Last moves the iterator the the last key/value pair. Returns the key and
     144             :         // value if the iterator is pointing at a valid entry, and (nil, nilv)
     145             :         // otherwise. Note that Last only checks the lower bound. It is up to the
     146             :         // caller to ensure that Last() is not called when there is an upper bound,
     147             :         // and instead call SeekLT(upper).
     148             :         Last() (*InternalKey, LazyValue)
     149             : 
     150             :         // Next moves the iterator to the next key/value pair. Returns the key and
     151             :         // value if the iterator is pointing at a valid entry, and (nil, nilv)
     152             :         // otherwise. Note that Next only checks the upper bound. It is up to the
     153             :         // caller to ensure that key is greater than or equal to the lower bound.
     154             :         //
     155             :         // It is valid to call Next when the iterator is positioned before the first
     156             :         // key/value pair due to either a prior call to SeekLT or Prev which returned
     157             :         // (nil, nilv). It is not allowed to call Next when the previous call to SeekGE,
     158             :         // SeekPrefixGE or Next returned (nil, nilv).
     159             :         Next() (*InternalKey, LazyValue)
     160             : 
     161             :         // NextPrefix moves the iterator to the next key/value pair with a different
     162             :         // prefix than the key at the current iterator position. Returns the key and
     163             :         // value if the iterator is pointing at a valid entry, and (nil, nil)
     164             :         // otherwise. Note that NextPrefix only checks the upper bound. It is up to
     165             :         // the caller to ensure that key is greater than or equal to the lower
     166             :         // bound.
     167             :         //
     168             :         // NextPrefix is passed the immediate successor to the current prefix key. A
     169             :         // valid implementation of NextPrefix is to call SeekGE with succKey.
     170             :         //
     171             :         // It is not allowed to call NextPrefix when the previous call was a reverse
     172             :         // positioning operation or a call to a forward positioning method that
     173             :         // returned (nil, nilv). It is also not allowed to call NextPrefix when the
     174             :         // iterator is in prefix iteration mode.
     175             :         NextPrefix(succKey []byte) (*InternalKey, LazyValue)
     176             : 
     177             :         // Prev moves the iterator to the previous key/value pair. Returns the key
     178             :         // and value if the iterator is pointing at a valid entry, and (nil, nilv)
     179             :         // otherwise. Note that Prev only checks the lower bound. It is up to the
     180             :         // caller to ensure that key is less than the upper bound.
     181             :         //
     182             :         // It is valid to call Prev when the iterator is positioned after the last
     183             :         // key/value pair due to either a prior call to SeekGE or Next which returned
     184             :         // (nil, nilv). It is not allowed to call Prev when the previous call to SeekLT
     185             :         // or Prev returned (nil, nilv).
     186             :         Prev() (*InternalKey, LazyValue)
     187             : 
     188             :         // Error returns any accumulated error. It may not include errors returned
     189             :         // to the client when calling LazyValue.Value().
     190             :         Error() error
     191             : 
     192             :         // Close closes the iterator and returns any accumulated error. Exhausting
     193             :         // all the key/value pairs in a table is not considered to be an error.
     194             :         //
     195             :         // Once Close is called, the iterator should not be used again. Specific
     196             :         // implementations may support multiple calls to Close (but no other calls
     197             :         // after the first Close).
     198             :         Close() error
     199             : 
     200             :         // SetBounds sets the lower and upper bounds for the iterator. Note that the
     201             :         // result of Next and Prev will be undefined until the iterator has been
     202             :         // repositioned with SeekGE, SeekPrefixGE, SeekLT, First, or Last.
     203             :         //
     204             :         // The bounds provided must remain valid until a subsequent call to
     205             :         // SetBounds has returned. This requirement exists so that iterator
     206             :         // implementations may compare old and new bounds to apply low-level
     207             :         // optimizations.
     208             :         SetBounds(lower, upper []byte)
     209             : 
     210             :         // SetContext replaces the context provided at iterator creation, or the
     211             :         // last one provided by SetContext.
     212             :         SetContext(ctx context.Context)
     213             : 
     214             :         fmt.Stringer
     215             : }
     216             : 
     217             : // TopLevelIterator extends InternalIterator to include an additional absolute
     218             : // positioning method, SeekPrefixGEStrict.
     219             : type TopLevelIterator interface {
     220             :         InternalIterator
     221             : 
     222             :         // SeekPrefixGEStrict extends InternalIterator.SeekPrefixGE with a guarantee
     223             :         // that the iterator only returns keys matching the prefix.
     224             :         SeekPrefixGEStrict(prefix, key []byte, flags SeekGEFlags) (*InternalKey, LazyValue)
     225             : }
     226             : 
     227             : // SeekGEFlags holds flags that may configure the behavior of a forward seek.
     228             : // Not all flags are relevant to all iterators.
     229             : type SeekGEFlags uint8
     230             : 
     231             : const (
     232             :         seekGEFlagTrySeekUsingNext uint8 = iota
     233             :         seekGEFlagRelativeSeek
     234             :         seekGEFlagBatchJustRefreshed
     235             : )
     236             : 
     237             : // SeekGEFlagsNone is the default value of SeekGEFlags, with all flags disabled.
     238             : const SeekGEFlagsNone = SeekGEFlags(0)
     239             : 
     240             : // TrySeekUsingNext indicates whether a performance optimization was enabled
     241             : // by a caller, indicating the caller has not done any action to move this
     242             : // iterator beyond the first key that would be found if this iterator were to
     243             : // honestly do the intended seek. For example, say the caller did a
     244             : // SeekGE(k1...), followed by SeekGE(k2...) where k1 <= k2, without any
     245             : // intermediate positioning calls. The caller can safely specify true for this
     246             : // parameter in the second call. As another example, say the caller did do one
     247             : // call to Next between the two Seek calls, and k1 < k2. Again, the caller can
     248             : // safely specify a true value for this parameter. Note that a false value is
     249             : // always safe. The callee is free to ignore the true value if its
     250             : // implementation does not permit this optimization.
     251             : //
     252             : // We make the caller do this determination since a string comparison of k1, k2
     253             : // is not necessarily cheap, and there may be many iterators in the iterator
     254             : // stack. Doing it once at the root of the iterator stack is cheaper.
     255             : //
     256             : // This optimization could also be applied to SeekLT (where it would be
     257             : // trySeekUsingPrev). We currently only do it for SeekPrefixGE and SeekGE
     258             : // because this is where this optimization helps the performance of CockroachDB.
     259             : // The SeekLT cases in CockroachDB are typically accompanied with bounds that
     260             : // change between seek calls, and is optimized inside certain iterator
     261             : // implementations, like singleLevelIterator, without any extra parameter
     262             : // passing (though the same amortization of string comparisons could be done to
     263             : // improve that optimization, by making the root of the iterator stack do it).
     264           1 : func (s SeekGEFlags) TrySeekUsingNext() bool { return (s & (1 << seekGEFlagTrySeekUsingNext)) != 0 }
     265             : 
     266             : // RelativeSeek is set when in the course of a forward positioning operation, a
     267             : // higher-level iterator seeks a lower-level iterator to a larger key than the
     268             : // one at the current iterator position.
     269             : //
     270             : // Concretely, this occurs when the merging iterator observes a range deletion
     271             : // covering the key at a level's current position, and the merging iterator
     272             : // seeks the level to the range deletion's end key. During lazy-combined
     273             : // iteration, this flag signals to the level iterator that the seek is NOT an
     274             : // absolute-positioning operation from the perspective of the pebble.Iterator,
     275             : // and the level iterator must look for range keys in tables between the current
     276             : // iterator position and the new seeked position.
     277           1 : func (s SeekGEFlags) RelativeSeek() bool { return (s & (1 << seekGEFlagRelativeSeek)) != 0 }
     278             : 
     279             : // BatchJustRefreshed is set by Seek[Prefix]GE when an iterator's view of an
     280             : // indexed batch was just refreshed. It serves as a signal to the batch iterator
     281             : // to ignore the TrySeekUsingNext optimization, because the external knowledge
     282             : // imparted by the TrySeekUsingNext flag does not apply to the batch iterator's
     283             : // position. See (pebble.Iterator).batchJustRefreshed.
     284           1 : func (s SeekGEFlags) BatchJustRefreshed() bool { return (s & (1 << seekGEFlagBatchJustRefreshed)) != 0 }
     285             : 
     286             : // EnableTrySeekUsingNext returns the provided flags with the
     287             : // try-seek-using-next optimization enabled. See TrySeekUsingNext for an
     288             : // explanation of this optimization.
     289           1 : func (s SeekGEFlags) EnableTrySeekUsingNext() SeekGEFlags {
     290           1 :         return s | (1 << seekGEFlagTrySeekUsingNext)
     291           1 : }
     292             : 
     293             : // DisableTrySeekUsingNext returns the provided flags with the
     294             : // try-seek-using-next optimization disabled.
     295           1 : func (s SeekGEFlags) DisableTrySeekUsingNext() SeekGEFlags {
     296           1 :         return s &^ (1 << seekGEFlagTrySeekUsingNext)
     297           1 : }
     298             : 
     299             : // EnableRelativeSeek returns the provided flags with the relative-seek flag
     300             : // enabled. See RelativeSeek for an explanation of this flag's use.
     301           1 : func (s SeekGEFlags) EnableRelativeSeek() SeekGEFlags {
     302           1 :         return s | (1 << seekGEFlagRelativeSeek)
     303           1 : }
     304             : 
     305             : // DisableRelativeSeek returns the provided flags with the relative-seek flag
     306             : // disabled.
     307           1 : func (s SeekGEFlags) DisableRelativeSeek() SeekGEFlags {
     308           1 :         return s &^ (1 << seekGEFlagRelativeSeek)
     309           1 : }
     310             : 
     311             : // EnableBatchJustRefreshed returns the provided flags with the
     312             : // batch-just-refreshed bit set. See BatchJustRefreshed for an explanation of
     313             : // this flag.
     314           1 : func (s SeekGEFlags) EnableBatchJustRefreshed() SeekGEFlags {
     315           1 :         return s | (1 << seekGEFlagBatchJustRefreshed)
     316           1 : }
     317             : 
     318             : // DisableBatchJustRefreshed returns the provided flags with the
     319             : // batch-just-refreshed bit unset.
     320           1 : func (s SeekGEFlags) DisableBatchJustRefreshed() SeekGEFlags {
     321           1 :         return s &^ (1 << seekGEFlagBatchJustRefreshed)
     322           1 : }
     323             : 
     324             : // SeekLTFlags holds flags that may configure the behavior of a reverse seek.
     325             : // Not all flags are relevant to all iterators.
     326             : type SeekLTFlags uint8
     327             : 
     328             : const (
     329             :         seekLTFlagRelativeSeek uint8 = iota
     330             : )
     331             : 
     332             : // SeekLTFlagsNone is the default value of SeekLTFlags, with all flags disabled.
     333             : const SeekLTFlagsNone = SeekLTFlags(0)
     334             : 
     335             : // RelativeSeek is set when in the course of a reverse positioning operation, a
     336             : // higher-level iterator seeks a lower-level iterator to a smaller key than the
     337             : // one at the current iterator position.
     338             : //
     339             : // Concretely, this occurs when the merging iterator observes a range deletion
     340             : // covering the key at a level's current position, and the merging iterator
     341             : // seeks the level to the range deletion's start key. During lazy-combined
     342             : // iteration, this flag signals to the level iterator that the seek is NOT an
     343             : // absolute-positioning operation from the perspective of the pebble.Iterator,
     344             : // and the level iterator must look for range keys in tables between the current
     345             : // iterator position and the new seeked position.
     346           1 : func (s SeekLTFlags) RelativeSeek() bool { return s&(1<<seekLTFlagRelativeSeek) != 0 }
     347             : 
     348             : // EnableRelativeSeek returns the provided flags with the relative-seek flag
     349             : // enabled. See RelativeSeek for an explanation of this flag's use.
     350           1 : func (s SeekLTFlags) EnableRelativeSeek() SeekLTFlags {
     351           1 :         return s | (1 << seekLTFlagRelativeSeek)
     352           1 : }
     353             : 
     354             : // DisableRelativeSeek returns the provided flags with the relative-seek flag
     355             : // disabled.
     356           1 : func (s SeekLTFlags) DisableRelativeSeek() SeekLTFlags {
     357           1 :         return s &^ (1 << seekLTFlagRelativeSeek)
     358           1 : }
     359             : 
     360             : // InternalIteratorStats contains miscellaneous stats produced by
     361             : // InternalIterators that are part of the InternalIterator tree. Not every
     362             : // field is relevant for an InternalIterator implementation. The field values
     363             : // are aggregated as one goes up the InternalIterator tree.
     364             : type InternalIteratorStats struct {
     365             :         // Bytes in the loaded blocks. If the block was compressed, this is the
     366             :         // compressed bytes. Currently, only the index blocks, data blocks
     367             :         // containing points, and filter blocks are included.
     368             :         BlockBytes uint64
     369             :         // Subset of BlockBytes that were in the block cache.
     370             :         BlockBytesInCache uint64
     371             :         // BlockReadDuration accumulates the duration spent fetching blocks
     372             :         // due to block cache misses.
     373             :         // TODO(sumeer): this currently excludes the time spent in Reader creation,
     374             :         // and in reading the rangedel and rangekey blocks. Fix that.
     375             :         BlockReadDuration time.Duration
     376             :         // The following can repeatedly count the same points if they are iterated
     377             :         // over multiple times. Additionally, they may count a point twice when
     378             :         // switching directions. The latter could be improved if needed.
     379             : 
     380             :         // Bytes in keys that were iterated over. Currently, only point keys are
     381             :         // included.
     382             :         KeyBytes uint64
     383             :         // Bytes in values that were iterated over. Currently, only point values are
     384             :         // included. For separated values, this is the size of the handle.
     385             :         ValueBytes uint64
     386             :         // The count of points iterated over.
     387             :         PointCount uint64
     388             :         // Points that were iterated over that were covered by range tombstones. It
     389             :         // can be useful for discovering instances of
     390             :         // https://github.com/cockroachdb/pebble/issues/1070.
     391             :         PointsCoveredByRangeTombstones uint64
     392             : 
     393             :         // Stats related to points in value blocks encountered during iteration.
     394             :         // These are useful to understand outliers, since typical user facing
     395             :         // iteration should tend to only look at the latest point, and hence have
     396             :         // the following stats close to 0.
     397             :         SeparatedPointValue struct {
     398             :                 // Count is a count of points that were in value blocks. This is not a
     399             :                 // subset of PointCount: PointCount is produced by mergingIter and if
     400             :                 // positioned once, and successful in returning a point, will have a
     401             :                 // PointCount of 1, regardless of how many sstables (and memtables etc.)
     402             :                 // in the heap got positioned. The count here includes every sstable
     403             :                 // iterator that got positioned in the heap.
     404             :                 Count uint64
     405             :                 // ValueBytes represent the total byte length of the values (in value
     406             :                 // blocks) of the points corresponding to Count.
     407             :                 ValueBytes uint64
     408             :                 // ValueBytesFetched is the total byte length of the values (in value
     409             :                 // blocks) that were retrieved.
     410             :                 ValueBytesFetched uint64
     411             :         }
     412             : }
     413             : 
     414             : // Merge merges the stats in from into the given stats.
     415           1 : func (s *InternalIteratorStats) Merge(from InternalIteratorStats) {
     416           1 :         s.BlockBytes += from.BlockBytes
     417           1 :         s.BlockBytesInCache += from.BlockBytesInCache
     418           1 :         s.BlockReadDuration += from.BlockReadDuration
     419           1 :         s.KeyBytes += from.KeyBytes
     420           1 :         s.ValueBytes += from.ValueBytes
     421           1 :         s.PointCount += from.PointCount
     422           1 :         s.PointsCoveredByRangeTombstones += from.PointsCoveredByRangeTombstones
     423           1 :         s.SeparatedPointValue.Count += from.SeparatedPointValue.Count
     424           1 :         s.SeparatedPointValue.ValueBytes += from.SeparatedPointValue.ValueBytes
     425           1 :         s.SeparatedPointValue.ValueBytesFetched += from.SeparatedPointValue.ValueBytesFetched
     426           1 : }

Generated by: LCOV version 1.14