LCOV - code coverage report
Current view: top level - pebble/internal/base - iterator.go (source / functions) Hit Total Coverage
Test: 2023-09-19 08:16Z 57e248f9 - meta test only.lcov Lines: 16 40 40.0 %
Date: 2023-09-19 08:17:47 Functions: 0 0 -

          Line data    Source code
       1             : // Copyright 2019 The LevelDB-Go and Pebble Authors. All rights reserved. Use
       2             : // of this source code is governed by a BSD-style license that can be found in
       3             : // the LICENSE file.
       4             : 
       5             : package base
       6             : 
       7             : import (
       8             :         "fmt"
       9             :         "time"
      10             : )
      11             : 
      12             : // InternalIterator iterates over a DB's key/value pairs in key order. Unlike
      13             : // the Iterator interface, the returned keys are InternalKeys composed of the
      14             : // user-key, a sequence number and a key kind. In forward iteration, key/value
      15             : // pairs for identical user-keys are returned in descending sequence order. In
      16             : // reverse iteration, key/value pairs for identical user-keys are returned in
      17             : // ascending sequence order.
      18             : //
      19             : // InternalIterators provide 5 absolute positioning methods and 2 relative
      20             : // positioning methods. The absolute positioning methods are:
      21             : //
      22             : // - SeekGE
      23             : // - SeekPrefixGE
      24             : // - SeekLT
      25             : // - First
      26             : // - Last
      27             : //
      28             : // The relative positioning methods are:
      29             : //
      30             : // - Next
      31             : // - Prev
      32             : //
      33             : // The relative positioning methods can be used in conjunction with any of the
      34             : // absolute positioning methods with one exception: SeekPrefixGE does not
      35             : // support reverse iteration via Prev. It is undefined to call relative
      36             : // positioning methods without ever calling an absolute positioning method.
      37             : //
      38             : // InternalIterators can optionally implement a prefix iteration mode. This
      39             : // mode is entered by calling SeekPrefixGE and exited by any other absolute
      40             : // positioning method (SeekGE, SeekLT, First, Last). When in prefix iteration
      41             : // mode, a call to Next will advance to the next key which has the same
      42             : // "prefix" as the one supplied to SeekPrefixGE. Note that "prefix" in this
      43             : // context is not a strict byte prefix, but defined by byte equality for the
      44             : // result of the Comparer.Split method. An InternalIterator is not required to
      45             : // support prefix iteration mode, and can implement SeekPrefixGE by forwarding
      46             : // to SeekGE. When the iteration prefix is exhausted, it is not valid to call
      47             : // Next on an internal iterator that's already returned (nil,nilv) or a key
      48             : // beyond the prefix.
      49             : //
      50             : // Bounds, [lower, upper), can be set on iterators, either using the SetBounds()
      51             : // function in the interface, or in implementation specific ways during iterator
      52             : // creation. The forward positioning routines (SeekGE, First, and Next) only
      53             : // check the upper bound. The reverse positioning routines (SeekLT, Last, and
      54             : // Prev) only check the lower bound. It is up to the caller to ensure that the
      55             : // forward positioning routines respect the lower bound and the reverse
      56             : // positioning routines respect the upper bound (i.e. calling SeekGE instead of
      57             : // First if there is a lower bound, and SeekLT instead of Last if there is an
      58             : // upper bound). This imposition is done in order to elevate that enforcement to
      59             : // the caller (generally pebble.Iterator or pebble.mergingIter) rather than
      60             : // having it duplicated in every InternalIterator implementation.
      61             : //
      62             : // Additionally, the caller needs to ensure that SeekGE/SeekPrefixGE are not
      63             : // called with a key > the upper bound, and SeekLT is not called with a key <
      64             : // the lower bound. InternalIterator implementations are required to respect
      65             : // the iterator bounds, never returning records outside of the bounds with one
      66             : // exception: an iterator may generate synthetic RANGEDEL marker records. See
      67             : // levelIter.syntheticBoundary for the sole existing example of this behavior.
      68             : // Specifically, levelIter can return synthetic keys whose user key is equal to
      69             : // the lower/upper bound.
      70             : //
      71             : // The bounds provided to an internal iterator must remain valid until a
      72             : // subsequent call to SetBounds has returned. This requirement exists so that
      73             : // iterator implementations may compare old and new bounds to apply low-level
      74             : // optimizations. The pebble.Iterator satisfies this requirement by maintaining
      75             : // two bound buffers and switching between them.
      76             : //
      77             : // An iterator must be closed after use, but it is not necessary to read an
      78             : // iterator until exhaustion.
      79             : //
      80             : // An iterator is not goroutine-safe, but it is safe to use multiple iterators
      81             : // concurrently, either in separate goroutines or switching between the
      82             : // iterators in a single goroutine.
      83             : //
      84             : // It is also safe to use an iterator concurrently with modifying its
      85             : // underlying DB, if that DB permits modification. However, the resultant
      86             : // key/value pairs are not guaranteed to be a consistent snapshot of that DB
      87             : // at a particular point in time.
      88             : //
      89             : // InternalIterators accumulate errors encountered during operation, exposing
      90             : // them through the Error method. All of the absolute positioning methods
      91             : // reset any accumulated error before positioning. Relative positioning
      92             : // methods return without advancing if the iterator has accumulated an error.
      93             : //
      94             : // nilv == shorthand for LazyValue{}, which represents a nil value.
      95             : type InternalIterator interface {
      96             :         // SeekGE moves the iterator to the first key/value pair whose key is greater
      97             :         // than or equal to the given key. Returns the key and value if the iterator
      98             :         // is pointing at a valid entry, and (nil, nilv) otherwise. Note that SeekGE
      99             :         // only checks the upper bound. It is up to the caller to ensure that key
     100             :         // is greater than or equal to the lower bound.
     101             :         SeekGE(key []byte, flags SeekGEFlags) (*InternalKey, LazyValue)
     102             : 
     103             :         // SeekPrefixGE moves the iterator to the first key/value pair whose key is
     104             :         // greater than or equal to the given key. Returns the key and value if the
     105             :         // iterator is pointing at a valid entry, and (nil, nilv) otherwise. Note that
     106             :         // SeekPrefixGE only checks the upper bound. It is up to the caller to ensure
     107             :         // that key is greater than or equal to the lower bound.
     108             :         //
     109             :         // The prefix argument is used by some InternalIterator implementations (e.g.
     110             :         // sstable.Reader) to avoid expensive operations. A user-defined Split
     111             :         // function must be supplied to the Comparer for the DB. The supplied prefix
     112             :         // will be the prefix of the given key returned by that Split function. If
     113             :         // the iterator is able to determine that no key with the prefix exists, it
     114             :         // can return (nil,nilv). Unlike SeekGE, this is not an indication that
     115             :         // iteration is exhausted.
     116             :         //
     117             :         // Note that the iterator may return keys not matching the prefix. It is up
     118             :         // to the caller to check if the prefix matches.
     119             :         //
     120             :         // Calling SeekPrefixGE places the receiver into prefix iteration mode. Once
     121             :         // in this mode, reverse iteration may not be supported and will return an
     122             :         // error. Note that pebble/Iterator.SeekPrefixGE has this same restriction on
     123             :         // not supporting reverse iteration in prefix iteration mode until a
     124             :         // different positioning routine (SeekGE, SeekLT, First or Last) switches the
     125             :         // iterator out of prefix iteration.
     126             :         SeekPrefixGE(prefix, key []byte, flags SeekGEFlags) (*InternalKey, LazyValue)
     127             : 
     128             :         // SeekLT moves the iterator to the last key/value pair whose key is less
     129             :         // than the given key. Returns the key and value if the iterator is pointing
     130             :         // at a valid entry, and (nil, nilv) otherwise. Note that SeekLT only checks
     131             :         // the lower bound. It is up to the caller to ensure that key is less than
     132             :         // the upper bound.
     133             :         SeekLT(key []byte, flags SeekLTFlags) (*InternalKey, LazyValue)
     134             : 
     135             :         // First moves the iterator the the first key/value pair. Returns the key and
     136             :         // value if the iterator is pointing at a valid entry, and (nil, nilv)
     137             :         // otherwise. Note that First only checks the upper bound. It is up to the
     138             :         // caller to ensure that First() is not called when there is a lower bound,
     139             :         // and instead call SeekGE(lower).
     140             :         First() (*InternalKey, LazyValue)
     141             : 
     142             :         // Last moves the iterator the the last key/value pair. Returns the key and
     143             :         // value if the iterator is pointing at a valid entry, and (nil, nilv)
     144             :         // otherwise. Note that Last only checks the lower bound. It is up to the
     145             :         // caller to ensure that Last() is not called when there is an upper bound,
     146             :         // and instead call SeekLT(upper).
     147             :         Last() (*InternalKey, LazyValue)
     148             : 
     149             :         // Next moves the iterator to the next key/value pair. Returns the key and
     150             :         // value if the iterator is pointing at a valid entry, and (nil, nilv)
     151             :         // otherwise. Note that Next only checks the upper bound. It is up to the
     152             :         // caller to ensure that key is greater than or equal to the lower bound.
     153             :         //
     154             :         // It is valid to call Next when the iterator is positioned before the first
     155             :         // key/value pair due to either a prior call to SeekLT or Prev which returned
     156             :         // (nil, nilv). It is not allowed to call Next when the previous call to SeekGE,
     157             :         // SeekPrefixGE or Next returned (nil, nilv).
     158             :         Next() (*InternalKey, LazyValue)
     159             : 
     160             :         // NextPrefix moves the iterator to the next key/value pair with a different
     161             :         // prefix than the key at the current iterator position. Returns the key and
     162             :         // value if the iterator is pointing at a valid entry, and (nil, nil)
     163             :         // otherwise. Note that NextPrefix only checks the upper bound. It is up to
     164             :         // the caller to ensure that key is greater than or equal to the lower
     165             :         // bound.
     166             :         //
     167             :         // NextPrefix is passed the immediate successor to the current prefix key. A
     168             :         // valid implementation of NextPrefix is to call SeekGE with succKey.
     169             :         //
     170             :         // It is not allowed to call NextPrefix when the previous call was a reverse
     171             :         // positioning operation or a call to a forward positioning method that
     172             :         // returned (nil, nilv). It is also not allowed to call NextPrefix when the
     173             :         // iterator is in prefix iteration mode.
     174             :         NextPrefix(succKey []byte) (*InternalKey, LazyValue)
     175             : 
     176             :         // Prev moves the iterator to the previous key/value pair. Returns the key
     177             :         // and value if the iterator is pointing at a valid entry, and (nil, nilv)
     178             :         // otherwise. Note that Prev only checks the lower bound. It is up to the
     179             :         // caller to ensure that key is less than the upper bound.
     180             :         //
     181             :         // It is valid to call Prev when the iterator is positioned after the last
     182             :         // key/value pair due to either a prior call to SeekGE or Next which returned
     183             :         // (nil, nilv). It is not allowed to call Prev when the previous call to SeekLT
     184             :         // or Prev returned (nil, nilv).
     185             :         Prev() (*InternalKey, LazyValue)
     186             : 
     187             :         // Error returns any accumulated error. It may not include errors returned
     188             :         // to the client when calling LazyValue.Value().
     189             :         Error() error
     190             : 
     191             :         // Close closes the iterator and returns any accumulated error. Exhausting
     192             :         // all the key/value pairs in a table is not considered to be an error.
     193             :         // It is valid to call Close multiple times. Other methods should not be
     194             :         // called after the iterator has been closed.
     195             :         Close() error
     196             : 
     197             :         // SetBounds sets the lower and upper bounds for the iterator. Note that the
     198             :         // result of Next and Prev will be undefined until the iterator has been
     199             :         // repositioned with SeekGE, SeekPrefixGE, SeekLT, First, or Last.
     200             :         //
     201             :         // The bounds provided must remain valid until a subsequent call to
     202             :         // SetBounds has returned. This requirement exists so that iterator
     203             :         // implementations may compare old and new bounds to apply low-level
     204             :         // optimizations.
     205             :         SetBounds(lower, upper []byte)
     206             : 
     207             :         fmt.Stringer
     208             : }
     209             : 
     210             : // SeekGEFlags holds flags that may configure the behavior of a forward seek.
     211             : // Not all flags are relevant to all iterators.
     212             : type SeekGEFlags uint8
     213             : 
     214             : const (
     215             :         seekGEFlagTrySeekUsingNext uint8 = iota
     216             :         seekGEFlagRelativeSeek
     217             :         seekGEFlagBatchJustRefreshed
     218             : )
     219             : 
     220             : // SeekGEFlagsNone is the default value of SeekGEFlags, with all flags disabled.
     221             : const SeekGEFlagsNone = SeekGEFlags(0)
     222             : 
     223             : // TrySeekUsingNext indicates whether a performance optimization was enabled
     224             : // by a caller, indicating the caller has not done any action to move this
     225             : // iterator beyond the first key that would be found if this iterator were to
     226             : // honestly do the intended seek. For example, say the caller did a
     227             : // SeekGE(k1...), followed by SeekGE(k2...) where k1 <= k2, without any
     228             : // intermediate positioning calls. The caller can safely specify true for this
     229             : // parameter in the second call. As another example, say the caller did do one
     230             : // call to Next between the two Seek calls, and k1 < k2. Again, the caller can
     231             : // safely specify a true value for this parameter. Note that a false value is
     232             : // always safe. The callee is free to ignore the true value if its
     233             : // implementation does not permit this optimization.
     234             : //
     235             : // We make the caller do this determination since a string comparison of k1, k2
     236             : // is not necessarily cheap, and there may be many iterators in the iterator
     237             : // stack. Doing it once at the root of the iterator stack is cheaper.
     238             : //
     239             : // This optimization could also be applied to SeekLT (where it would be
     240             : // trySeekUsingPrev). We currently only do it for SeekPrefixGE and SeekGE
     241             : // because this is where this optimization helps the performance of CockroachDB.
     242             : // The SeekLT cases in CockroachDB are typically accompanied with bounds that
     243             : // change between seek calls, and is optimized inside certain iterator
     244             : // implementations, like singleLevelIterator, without any extra parameter
     245             : // passing (though the same amortization of string comparisons could be done to
     246             : // improve that optimization, by making the root of the iterator stack do it).
     247           1 : func (s SeekGEFlags) TrySeekUsingNext() bool { return (s & (1 << seekGEFlagTrySeekUsingNext)) != 0 }
     248             : 
     249             : // RelativeSeek is set when in the course of a forward positioning operation, a
     250             : // higher-level iterator seeks a lower-level iterator to a larger key than the
     251             : // one at the current iterator position.
     252             : //
     253             : // Concretely, this occurs when the merging iterator observes a range deletion
     254             : // covering the key at a level's current position, and the merging iterator
     255             : // seeks the level to the range deletion's end key. During lazy-combined
     256             : // iteration, this flag signals to the level iterator that the seek is NOT an
     257             : // absolute-positioning operation from the perspective of the pebble.Iterator,
     258             : // and the level iterator must look for range keys in tables between the current
     259             : // iterator position and the new seeked position.
     260           1 : func (s SeekGEFlags) RelativeSeek() bool { return (s & (1 << seekGEFlagRelativeSeek)) != 0 }
     261             : 
     262             : // BatchJustRefreshed is set by Seek[Prefix]GE when an iterator's view of an
     263             : // indexed batch was just refreshed. It serves as a signal to the batch iterator
     264             : // to ignore the TrySeekUsingNext optimization, because the external knowledge
     265             : // imparted by the TrySeekUsingNext flag does not apply to the batch iterator's
     266             : // position. See (pebble.Iterator).batchJustRefreshed.
     267           1 : func (s SeekGEFlags) BatchJustRefreshed() bool { return (s & (1 << seekGEFlagBatchJustRefreshed)) != 0 }
     268             : 
     269             : // EnableTrySeekUsingNext returns the provided flags with the
     270             : // try-seek-using-next optimization enabled. See TrySeekUsingNext for an
     271             : // explanation of this optimization.
     272           1 : func (s SeekGEFlags) EnableTrySeekUsingNext() SeekGEFlags {
     273           1 :         return s | (1 << seekGEFlagTrySeekUsingNext)
     274           1 : }
     275             : 
     276             : // DisableTrySeekUsingNext returns the provided flags with the
     277             : // try-seek-using-next optimization disabled.
     278           1 : func (s SeekGEFlags) DisableTrySeekUsingNext() SeekGEFlags {
     279           1 :         return s &^ (1 << seekGEFlagTrySeekUsingNext)
     280           1 : }
     281             : 
     282             : // EnableRelativeSeek returns the provided flags with the relative-seek flag
     283             : // enabled. See RelativeSeek for an explanation of this flag's use.
     284           1 : func (s SeekGEFlags) EnableRelativeSeek() SeekGEFlags {
     285           1 :         return s | (1 << seekGEFlagRelativeSeek)
     286           1 : }
     287             : 
     288             : // DisableRelativeSeek returns the provided flags with the relative-seek flag
     289             : // disabled.
     290           0 : func (s SeekGEFlags) DisableRelativeSeek() SeekGEFlags {
     291           0 :         return s &^ (1 << seekGEFlagRelativeSeek)
     292           0 : }
     293             : 
     294             : // EnableBatchJustRefreshed returns the provided flags with the
     295             : // batch-just-refreshed bit set. See BatchJustRefreshed for an explanation of
     296             : // this flag.
     297           0 : func (s SeekGEFlags) EnableBatchJustRefreshed() SeekGEFlags {
     298           0 :         return s | (1 << seekGEFlagBatchJustRefreshed)
     299           0 : }
     300             : 
     301             : // DisableBatchJustRefreshed returns the provided flags with the
     302             : // batch-just-refreshed bit unset.
     303           0 : func (s SeekGEFlags) DisableBatchJustRefreshed() SeekGEFlags {
     304           0 :         return s &^ (1 << seekGEFlagBatchJustRefreshed)
     305           0 : }
     306             : 
     307             : // SeekLTFlags holds flags that may configure the behavior of a reverse seek.
     308             : // Not all flags are relevant to all iterators.
     309             : type SeekLTFlags uint8
     310             : 
     311             : const (
     312             :         seekLTFlagRelativeSeek uint8 = iota
     313             : )
     314             : 
     315             : // SeekLTFlagsNone is the default value of SeekLTFlags, with all flags disabled.
     316             : const SeekLTFlagsNone = SeekLTFlags(0)
     317             : 
     318             : // RelativeSeek is set when in the course of a reverse positioning operation, a
     319             : // higher-level iterator seeks a lower-level iterator to a smaller key than the
     320             : // one at the current iterator position.
     321             : //
     322             : // Concretely, this occurs when the merging iterator observes a range deletion
     323             : // covering the key at a level's current position, and the merging iterator
     324             : // seeks the level to the range deletion's start key. During lazy-combined
     325             : // iteration, this flag signals to the level iterator that the seek is NOT an
     326             : // absolute-positioning operation from the perspective of the pebble.Iterator,
     327             : // and the level iterator must look for range keys in tables between the current
     328             : // iterator position and the new seeked position.
     329           1 : func (s SeekLTFlags) RelativeSeek() bool { return s&(1<<seekLTFlagRelativeSeek) != 0 }
     330             : 
     331             : // EnableRelativeSeek returns the provided flags with the relative-seek flag
     332             : // enabled. See RelativeSeek for an explanation of this flag's use.
     333           1 : func (s SeekLTFlags) EnableRelativeSeek() SeekLTFlags {
     334           1 :         return s | (1 << seekLTFlagRelativeSeek)
     335           1 : }
     336             : 
     337             : // DisableRelativeSeek returns the provided flags with the relative-seek flag
     338             : // disabled.
     339           0 : func (s SeekLTFlags) DisableRelativeSeek() SeekLTFlags {
     340           0 :         return s &^ (1 << seekLTFlagRelativeSeek)
     341           0 : }
     342             : 
     343             : // InternalIteratorStats contains miscellaneous stats produced by
     344             : // InternalIterators that are part of the InternalIterator tree. Not every
     345             : // field is relevant for an InternalIterator implementation. The field values
     346             : // are aggregated as one goes up the InternalIterator tree.
     347             : type InternalIteratorStats struct {
     348             :         // Bytes in the loaded blocks. If the block was compressed, this is the
     349             :         // compressed bytes. Currently, only the index blocks, data blocks
     350             :         // containing points, and filter blocks are included.
     351             :         BlockBytes uint64
     352             :         // Subset of BlockBytes that were in the block cache.
     353             :         BlockBytesInCache uint64
     354             :         // BlockReadDuration accumulates the duration spent fetching blocks
     355             :         // due to block cache misses.
     356             :         // TODO(sumeer): this currently excludes the time spent in Reader creation,
     357             :         // and in reading the rangedel and rangekey blocks. Fix that.
     358             :         BlockReadDuration time.Duration
     359             :         // The following can repeatedly count the same points if they are iterated
     360             :         // over multiple times. Additionally, they may count a point twice when
     361             :         // switching directions. The latter could be improved if needed.
     362             : 
     363             :         // Bytes in keys that were iterated over. Currently, only point keys are
     364             :         // included.
     365             :         KeyBytes uint64
     366             :         // Bytes in values that were iterated over. Currently, only point values are
     367             :         // included. For separated values, this is the size of the handle.
     368             :         ValueBytes uint64
     369             :         // The count of points iterated over.
     370             :         PointCount uint64
     371             :         // Points that were iterated over that were covered by range tombstones. It
     372             :         // can be useful for discovering instances of
     373             :         // https://github.com/cockroachdb/pebble/issues/1070.
     374             :         PointsCoveredByRangeTombstones uint64
     375             : 
     376             :         // Stats related to points in value blocks encountered during iteration.
     377             :         // These are useful to understand outliers, since typical user facing
     378             :         // iteration should tend to only look at the latest point, and hence have
     379             :         // the following stats close to 0.
     380             :         SeparatedPointValue struct {
     381             :                 // Count is a count of points that were in value blocks. This is not a
     382             :                 // subset of PointCount: PointCount is produced by mergingIter and if
     383             :                 // positioned once, and successful in returning a point, will have a
     384             :                 // PointCount of 1, regardless of how many sstables (and memtables etc.)
     385             :                 // in the heap got positioned. The count here includes every sstable
     386             :                 // iterator that got positioned in the heap.
     387             :                 Count uint64
     388             :                 // ValueBytes represent the total byte length of the values (in value
     389             :                 // blocks) of the points corresponding to Count.
     390             :                 ValueBytes uint64
     391             :                 // ValueBytesFetched is the total byte length of the values (in value
     392             :                 // blocks) that were retrieved.
     393             :                 ValueBytesFetched uint64
     394             :         }
     395             : }
     396             : 
     397             : // Merge merges the stats in from into the given stats.
     398           0 : func (s *InternalIteratorStats) Merge(from InternalIteratorStats) {
     399           0 :         s.BlockBytes += from.BlockBytes
     400           0 :         s.BlockBytesInCache += from.BlockBytesInCache
     401           0 :         s.BlockReadDuration += from.BlockReadDuration
     402           0 :         s.KeyBytes += from.KeyBytes
     403           0 :         s.ValueBytes += from.ValueBytes
     404           0 :         s.PointCount += from.PointCount
     405           0 :         s.PointsCoveredByRangeTombstones += from.PointsCoveredByRangeTombstones
     406           0 :         s.SeparatedPointValue.Count += from.SeparatedPointValue.Count
     407           0 :         s.SeparatedPointValue.ValueBytes += from.SeparatedPointValue.ValueBytes
     408           0 :         s.SeparatedPointValue.ValueBytesFetched += from.SeparatedPointValue.ValueBytesFetched
     409           0 : }

Generated by: LCOV version 1.14