LCOV - code coverage report
Current view: top level - pebble/sstable - block_property.go (source / functions) Hit Total Coverage
Test: 2024-05-29 08:16Z 7cdb5eed - tests only.lcov Lines: 363 400 90.8 %
Date: 2024-05-29 08:16:55 Functions: 0 0 -

          Line data    Source code
       1             : // Copyright 2021 The LevelDB-Go and Pebble Authors. All rights reserved. Use
       2             : // of this source code is governed by a BSD-style license that can be found in
       3             : // the LICENSE file.
       4             : 
       5             : package sstable
       6             : 
       7             : import (
       8             :         "encoding/binary"
       9             :         "fmt"
      10             :         "math"
      11             :         "sync"
      12             :         "unsafe"
      13             : 
      14             :         "github.com/cockroachdb/pebble/internal/base"
      15             :         "github.com/cockroachdb/pebble/internal/invariants"
      16             :         "github.com/cockroachdb/pebble/internal/rangekey"
      17             : )
      18             : 
      19             : // Block properties are an optional user-facing feature that can be used to
      20             : // filter data blocks (and whole sstables) from an Iterator before they are
      21             : // loaded. They do not apply to range delete blocks. These are expected to
      22             : // very concisely represent a set of some attribute value contained within the
      23             : // key or value, such that the set includes all the attribute values in the
      24             : // block. This has some similarities with OLAP pruning approaches that
      25             : // maintain min-max attribute values for some column (which concisely
      26             : // represent a set), that is then used to prune at query time. In Pebble's
      27             : // case, data blocks are small, typically 25-50KB, so these properties should
      28             : // reduce their precision in order to be concise -- a good rule of thumb is to
      29             : // not consume more than 50-100 bytes across all properties maintained for a
      30             : // block, i.e., a 500x reduction compared to loading the data block.
      31             : //
      32             : // A block property must be assigned a unique name, which is encoded and
      33             : // stored in the sstable. This name must be unique among all user-properties
      34             : // encoded in an sstable.
      35             : //
      36             : // A property is represented as a []byte. A nil value or empty byte slice are
      37             : // considered semantically identical. The caller is free to choose the
      38             : // semantics of an empty byte slice e.g. they could use it to represent the
      39             : // empty set or the universal set, whichever they think is more common and
      40             : // therefore better to encode more concisely. The serialization of the
      41             : // property for the various Finish*() calls in a BlockPropertyCollector
      42             : // implementation should be identical, since the corresponding
      43             : // BlockPropertyFilter implementation is not told the context in which it is
      44             : // deserializing the property.
      45             : //
      46             : // Block properties are more general than table properties and should be
      47             : // preferred over using table properties. A BlockPropertyCollector can achieve
      48             : // identical behavior to table properties by returning the nil slice from
      49             : // FinishDataBlock and FinishIndexBlock, and interpret them as the universal
      50             : // set in BlockPropertyFilter, and return a non-universal set in FinishTable.
      51             : //
      52             : // Block property filtering is nondeterministic because the separation of keys
      53             : // into blocks is nondeterministic. Clients use block-property filters to
      54             : // implement efficient application of a filter F that applies to key-value pairs
      55             : // (abbreviated as kv-filter). Consider correctness defined as surfacing exactly
      56             : // the same key-value pairs that would be surfaced if one applied the filter F
      57             : // above normal iteration. With this correctness definition, block property
      58             : // filtering may introduce two kinds of errors:
      59             : //
      60             : //   a) Block property filtering that uses a kv-filter may produce additional
      61             : //      key-value pairs that don't satisfy the filter because of the separation
      62             : //      of keys into blocks. Clients may remove these extra key-value pairs by
      63             : //      re-applying the kv filter while reading results back from Pebble.
      64             : //
      65             : //   b) Block property filtering may surface deleted key-value pairs if the
      66             : //      kv filter is not a strict function of the key's user key. A block
      67             : //      containing k.DEL may be filtered, while a block containing the deleted
      68             : //      key k.SET may not be filtered, if the kv filter applies to one but not
      69             : //      the other.
      70             : //
      71             : //      This error may be avoided trivially by using a kv filter that is a pure
      72             : //      function of the user key. A filter that examines values or key kinds
      73             : //      requires care to ensure F(k.SET, <value>) = F(k.DEL) = F(k.SINGLEDEL).
      74             : //
      75             : // The combination of range deletions and filtering by table-level properties
      76             : // add another opportunity for deleted point keys to be surfaced. The pebble
      77             : // Iterator stack takes care to correctly apply filtered tables' range deletions
      78             : // to lower tables, preventing this form of nondeterministic error.
      79             : //
      80             : // In addition to the non-determinism discussed in (b), which limits the use
      81             : // of properties over values, we now have support for values that are not
      82             : // stored together with the key, and may not even be retrieved during
      83             : // compactions. If Pebble is configured with such value separation, block
      84             : // properties must only apply to the key, and will be provided a nil value.
      85             : 
      86             : // BlockPropertyCollector is used when writing a sstable.
      87             : //
      88             : //   - All calls to Add are included in the next FinishDataBlock, after which
      89             : //     the next data block is expected to start.
      90             : //
      91             : //   - The index entry generated for the data block, which contains the return
      92             : //     value from FinishDataBlock, is not immediately included in the current
      93             : //     index block. It is included when AddPrevDataBlockToIndexBlock is called.
      94             : //     An alternative would be to return an opaque handle from FinishDataBlock
      95             : //     and pass it to a new AddToIndexBlock method, which requires more
      96             : //     plumbing, and passing of an interface{} results in a undesirable heap
      97             : //     allocation. AddPrevDataBlockToIndexBlock must be called before keys are
      98             : //     added to the new data block.
      99             : type BlockPropertyCollector interface {
     100             :         // Name returns the name of the block property collector.
     101             :         Name() string
     102             : 
     103             :         // Add is called with each new entry added to a data block in the sstable.
     104             :         // The callee can assume that these are in sorted order.
     105             :         Add(key InternalKey, value []byte) error
     106             : 
     107             :         // AddCollectedWithSuffixReplacement adds previously collected property data
     108             :         // and updates it to reflect a change of suffix on all keys: the old property
     109             :         // data is assumed to be constructed from keys that all have the same
     110             :         // oldSuffix and is recalculated to reflect the same keys but with newSuffix.
     111             :         //
     112             :         // A collector which supports this method must be able to derive its updated
     113             :         // value from its old value and the change being made to the suffix, without
     114             :         // needing to be passed each updated K/V.
     115             :         //
     116             :         // For example, a collector that only inspects values can simply copy its
     117             :         // previously computed property as-is, since key-suffix replacement does not
     118             :         // change values, while a collector that depends only on key suffixes, like
     119             :         // one which collected mvcc-timestamp bounds from timestamp-suffixed keys, can
     120             :         // just set its new bounds from the new suffix, as it is common to all keys,
     121             :         // without needing to recompute it from every key.
     122             :         //
     123             :         // This method is optional (if it is not implemented, it always returns an
     124             :         // error). SupportsSuffixReplacement() can be used to check if this method is
     125             :         // implemented.
     126             :         AddCollectedWithSuffixReplacement(oldProp []byte, oldSuffix, newSuffix []byte) error
     127             : 
     128             :         // SupportsSuffixReplacement returns whether the collector supports the
     129             :         // AddCollectedWithSuffixReplacement method.
     130             :         SupportsSuffixReplacement() bool
     131             : 
     132             :         // FinishDataBlock is called when all the entries have been added to a
     133             :         // data block. Subsequent Add calls will be for the next data block. It
     134             :         // returns the property value for the finished block.
     135             :         FinishDataBlock(buf []byte) ([]byte, error)
     136             : 
     137             :         // AddPrevDataBlockToIndexBlock adds the entry corresponding to the
     138             :         // previous FinishDataBlock to the current index block.
     139             :         AddPrevDataBlockToIndexBlock()
     140             : 
     141             :         // FinishIndexBlock is called when an index block, containing all the
     142             :         // key-value pairs since the last FinishIndexBlock, will no longer see new
     143             :         // entries. It returns the property value for the index block.
     144             :         FinishIndexBlock(buf []byte) ([]byte, error)
     145             : 
     146             :         // FinishTable is called when the sstable is finished, and returns the
     147             :         // property value for the sstable.
     148             :         FinishTable(buf []byte) ([]byte, error)
     149             : }
     150             : 
     151             : // BlockPropertyFilter is used in an Iterator to filter sstables and blocks
     152             : // within the sstable. It should not maintain any per-sstable state, and must
     153             : // be thread-safe.
     154             : type BlockPropertyFilter = base.BlockPropertyFilter
     155             : 
     156             : // BoundLimitedBlockPropertyFilter implements the block-property filter but
     157             : // imposes an additional constraint on its usage, requiring that only blocks
     158             : // containing exclusively keys between its lower and upper bounds may be
     159             : // filtered. The bounds may be change during iteration, so the filter doesn't
     160             : // expose the bounds, instead implementing KeyIsWithin[Lower,Upper]Bound methods
     161             : // for performing bound comparisons.
     162             : //
     163             : // To be used, a BoundLimitedBlockPropertyFilter must be supplied directly
     164             : // through NewBlockPropertiesFilterer's dedicated parameter. If supplied through
     165             : // the ordinary slice of block property filters, this filter's bounds will be
     166             : // ignored.
     167             : //
     168             : // The current [lower,upper) bounds of the filter are unknown, because they may
     169             : // be changing. During forward iteration the lower bound is externally
     170             : // guaranteed, meaning Intersects only returns false if the sstable iterator is
     171             : // already known to be positioned at a key ≥ lower. The sstable iterator is then
     172             : // only responsible for ensuring filtered blocks also meet the upper bound, and
     173             : // should only allow a block to be filtered if all its keys are < upper. The
     174             : // sstable iterator may invoke KeyIsWithinUpperBound(key) to perform this check,
     175             : // where key is an inclusive upper bound on the block's keys.
     176             : //
     177             : // During backward iteration the upper bound is externally guaranteed, and
     178             : // Intersects only returns false if the sstable iterator is already known to be
     179             : // positioned at a key < upper. The sstable iterator is responsible for ensuring
     180             : // filtered blocks also meet the lower bound, enforcing that a block is only
     181             : // filtered if all its keys are ≥ lower. This check is made through passing the
     182             : // block's inclusive lower bound to KeyIsWithinLowerBound.
     183             : //
     184             : // Implementations may become active or inactive through implementing Intersects
     185             : // to return true whenever the filter is disabled.
     186             : //
     187             : // Usage of BoundLimitedBlockPropertyFilter is subtle, and Pebble consumers
     188             : // should not implement this interface directly. This interface is an internal
     189             : // detail in the implementation of block-property range-key masking.
     190             : type BoundLimitedBlockPropertyFilter interface {
     191             :         BlockPropertyFilter
     192             : 
     193             :         // KeyIsWithinLowerBound tests whether the provided internal key falls
     194             :         // within the current lower bound of the filter. A true return value
     195             :         // indicates that the filter may be used to filter blocks that exclusively
     196             :         // contain keys ≥ `key`, so long as the blocks' keys also satisfy the upper
     197             :         // bound.
     198             :         KeyIsWithinLowerBound(key []byte) bool
     199             :         // KeyIsWithinUpperBound tests whether the provided internal key falls
     200             :         // within the current upper bound of the filter. A true return value
     201             :         // indicates that the filter may be used to filter blocks that exclusively
     202             :         // contain keys ≤ `key`, so long as the blocks' keys also satisfy the lower
     203             :         // bound.
     204             :         KeyIsWithinUpperBound(key []byte) bool
     205             : }
     206             : 
     207             : // BlockIntervalCollector is a helper implementation of BlockPropertyCollector
     208             : // for users who want to represent a set of the form [lower,upper) where both
     209             : // lower and upper are uint64, and lower <= upper.
     210             : //
     211             : // The set is encoded as:
     212             : // - Two varint integers, (lower,upper-lower), when upper-lower > 0
     213             : // - Nil, when upper-lower=0
     214             : //
     215             : // Users must not expect this to preserve differences between empty sets --
     216             : // they will all get turned into the semantically equivalent [0,0).
     217             : //
     218             : // A BlockIntervalCollector that collects over point and range keys needs to
     219             : // have both the point and range DataBlockIntervalCollector specified, since
     220             : // point and range keys are fed to the BlockIntervalCollector in an interleaved
     221             : // fashion, independently of one another. This also implies that the
     222             : // DataBlockIntervalCollectors for point and range keys should be references to
     223             : // independent instances, rather than references to the same collector, as point
     224             : // and range keys are tracked independently.
     225             : type BlockIntervalCollector struct {
     226             :         name   string
     227             :         points DataBlockIntervalCollector
     228             :         ranges DataBlockIntervalCollector
     229             : 
     230             :         blockInterval interval
     231             :         indexInterval interval
     232             :         tableInterval interval
     233             : }
     234             : 
     235             : var _ BlockPropertyCollector = &BlockIntervalCollector{}
     236             : 
     237             : // DataBlockIntervalCollector is the interface used by BlockIntervalCollector
     238             : // that contains the actual logic pertaining to the property. It only
     239             : // maintains state for the current data block, and resets that state in
     240             : // FinishDataBlock. This interface can be used to reduce parsing costs.
     241             : type DataBlockIntervalCollector interface {
     242             :         // Add is called with each new entry added to a data block in the sstable.
     243             :         // The callee can assume that these are in sorted order.
     244             :         Add(key InternalKey, value []byte) error
     245             : 
     246             :         // AddCollectedWithSuffixReplacement extends the interval with a given
     247             :         // interval after updating it to reflect a change of suffix on all keys: the
     248             :         // [oldLower, oldUpper) interval is assumed to be constructed from keys that
     249             :         // all have the same oldSuffix and is recalculated to reflect the same keys but
     250             :         // with newSuffix.
     251             :         //
     252             :         // This method is optional (if it is not implemented, it always returns an
     253             :         // error). SupportsSuffixReplacement() can be used to check if this method is
     254             :         // implemented.
     255             :         //
     256             :         // See BlockPropertyCollector.AddCollectedWithSuffixReplacement for more
     257             :         // information.
     258             :         AddCollectedWithSuffixReplacement(oldLower, oldUpper uint64, oldSuffix, newSuffix []byte) error
     259             : 
     260             :         // SupportsSuffixReplacement returns whether the collector supports the
     261             :         // AddCollectedWithSuffixReplacement method.
     262             :         SupportsSuffixReplacement() bool
     263             : 
     264             :         // FinishDataBlock is called when all the entries have been added to a
     265             :         // data block. Subsequent Add calls will be for the next data block. It
     266             :         // returns the [lower, upper) for the finished block.
     267             :         FinishDataBlock() (lower uint64, upper uint64, err error)
     268             : }
     269             : 
     270             : // NewBlockIntervalCollector constructs a BlockIntervalCollector with the given
     271             : // name. The BlockIntervalCollector makes use of the given point and range key
     272             : // DataBlockIntervalCollectors when encountering point and range keys,
     273             : // respectively.
     274             : //
     275             : // The caller may pass a nil DataBlockIntervalCollector for one of the point or
     276             : // range key collectors, in which case keys of those types will be ignored. This
     277             : // allows for flexible construction of BlockIntervalCollectors that operate on
     278             : // just point keys, just range keys, or both point and range keys.
     279             : //
     280             : // If both point and range keys are to be tracked, two independent collectors
     281             : // should be provided, rather than the same collector passed in twice (see the
     282             : // comment on BlockIntervalCollector for more detail)
     283             : func NewBlockIntervalCollector(
     284             :         name string, pointCollector, rangeCollector DataBlockIntervalCollector,
     285           1 : ) BlockPropertyCollector {
     286           1 :         if pointCollector == nil && rangeCollector == nil {
     287           1 :                 panic("sstable: at least one interval collector must be provided")
     288             :         }
     289           1 :         return &BlockIntervalCollector{
     290           1 :                 name:   name,
     291           1 :                 points: pointCollector,
     292           1 :                 ranges: rangeCollector,
     293           1 :         }
     294             : }
     295             : 
     296             : // Name is part of the BlockPropertyCollector interface.
     297           1 : func (b *BlockIntervalCollector) Name() string {
     298           1 :         return b.name
     299           1 : }
     300             : 
     301             : // Add is part of the BlockPropertyCollector interface.
     302           1 : func (b *BlockIntervalCollector) Add(key InternalKey, value []byte) error {
     303           1 :         if rangekey.IsRangeKey(key.Kind()) {
     304           1 :                 if b.ranges != nil {
     305           1 :                         return b.ranges.Add(key, value)
     306           1 :                 }
     307           1 :         } else if b.points != nil {
     308           1 :                 return b.points.Add(key, value)
     309           1 :         }
     310           1 :         return nil
     311             : }
     312             : 
     313             : // AddCollectedWithSuffixReplacement is part of the BlockPropertyCollector interface.
     314             : func (b *BlockIntervalCollector) AddCollectedWithSuffixReplacement(
     315             :         oldProp []byte, oldSuffix, newSuffix []byte,
     316           1 : ) error {
     317           1 :         var i interval
     318           1 :         if err := i.decode(oldProp); err != nil {
     319           0 :                 return err
     320           0 :         }
     321           1 :         return b.points.AddCollectedWithSuffixReplacement(i.lower, i.upper, oldSuffix, newSuffix)
     322             : }
     323             : 
     324             : // SupportsSuffixReplacement is part of the BlockPropertyCollector interface.
     325           1 : func (b *BlockIntervalCollector) SupportsSuffixReplacement() bool {
     326           1 :         return b.points.SupportsSuffixReplacement()
     327           1 : }
     328             : 
     329             : // FinishDataBlock is part of the BlockPropertyCollector interface.
     330           1 : func (b *BlockIntervalCollector) FinishDataBlock(buf []byte) ([]byte, error) {
     331           1 :         if b.points == nil {
     332           1 :                 return buf, nil
     333           1 :         }
     334           1 :         var err error
     335           1 :         b.blockInterval.lower, b.blockInterval.upper, err = b.points.FinishDataBlock()
     336           1 :         if err != nil {
     337           0 :                 return buf, err
     338           0 :         }
     339           1 :         buf = b.blockInterval.encode(buf)
     340           1 :         b.tableInterval.union(b.blockInterval)
     341           1 :         return buf, nil
     342             : }
     343             : 
     344             : // AddPrevDataBlockToIndexBlock implements the BlockPropertyCollector
     345             : // interface.
     346           1 : func (b *BlockIntervalCollector) AddPrevDataBlockToIndexBlock() {
     347           1 :         b.indexInterval.union(b.blockInterval)
     348           1 :         b.blockInterval = interval{}
     349           1 : }
     350             : 
     351             : // FinishIndexBlock implements the BlockPropertyCollector interface.
     352           1 : func (b *BlockIntervalCollector) FinishIndexBlock(buf []byte) ([]byte, error) {
     353           1 :         buf = b.indexInterval.encode(buf)
     354           1 :         b.indexInterval = interval{}
     355           1 :         return buf, nil
     356           1 : }
     357             : 
     358             : // FinishTable implements the BlockPropertyCollector interface.
     359           1 : func (b *BlockIntervalCollector) FinishTable(buf []byte) ([]byte, error) {
     360           1 :         // If the collector is tracking range keys, the range key interval is union-ed
     361           1 :         // with the point key interval for the table.
     362           1 :         if b.ranges != nil {
     363           1 :                 var rangeInterval interval
     364           1 :                 var err error
     365           1 :                 rangeInterval.lower, rangeInterval.upper, err = b.ranges.FinishDataBlock()
     366           1 :                 if err != nil {
     367           0 :                         return buf, err
     368           0 :                 }
     369           1 :                 b.tableInterval.union(rangeInterval)
     370             :         }
     371           1 :         return b.tableInterval.encode(buf), nil
     372             : }
     373             : 
     374             : type interval struct {
     375             :         lower uint64
     376             :         upper uint64
     377             : }
     378             : 
     379           1 : func (i interval) encode(buf []byte) []byte {
     380           1 :         if i.lower < i.upper {
     381           1 :                 var encoded [binary.MaxVarintLen64 * 2]byte
     382           1 :                 n := binary.PutUvarint(encoded[:], i.lower)
     383           1 :                 n += binary.PutUvarint(encoded[n:], i.upper-i.lower)
     384           1 :                 buf = append(buf, encoded[:n]...)
     385           1 :         }
     386           1 :         return buf
     387             : }
     388             : 
     389           1 : func (i *interval) decode(buf []byte) error {
     390           1 :         if len(buf) == 0 {
     391           1 :                 *i = interval{}
     392           1 :                 return nil
     393           1 :         }
     394           1 :         var n int
     395           1 :         i.lower, n = binary.Uvarint(buf)
     396           1 :         if n <= 0 || n >= len(buf) {
     397           0 :                 return base.CorruptionErrorf("cannot decode interval from buf %x", buf)
     398           0 :         }
     399           1 :         pos := n
     400           1 :         i.upper, n = binary.Uvarint(buf[pos:])
     401           1 :         pos += n
     402           1 :         if pos != len(buf) || n <= 0 {
     403           0 :                 return base.CorruptionErrorf("cannot decode interval from buf %x", buf)
     404           0 :         }
     405             :         // Delta decode.
     406           1 :         i.upper += i.lower
     407           1 :         if i.upper < i.lower {
     408           0 :                 return base.CorruptionErrorf("unexpected overflow, upper %d < lower %d", i.upper, i.lower)
     409           0 :         }
     410           1 :         return nil
     411             : }
     412             : 
     413           1 : func (i *interval) union(x interval) {
     414           1 :         if x.lower >= x.upper {
     415           1 :                 // x is the empty set.
     416           1 :                 return
     417           1 :         }
     418           1 :         if i.lower >= i.upper {
     419           1 :                 // i is the empty set.
     420           1 :                 *i = x
     421           1 :                 return
     422           1 :         }
     423             :         // Both sets are non-empty.
     424           1 :         if x.lower < i.lower {
     425           1 :                 i.lower = x.lower
     426           1 :         }
     427           1 :         if x.upper > i.upper {
     428           1 :                 i.upper = x.upper
     429           1 :         }
     430             : }
     431             : 
     432           1 : func (i interval) intersects(x interval) bool {
     433           1 :         if i.lower >= i.upper || x.lower >= x.upper {
     434           1 :                 // At least one of the sets is empty.
     435           1 :                 return false
     436           1 :         }
     437             :         // Neither set is empty.
     438           1 :         return i.upper > x.lower && i.lower < x.upper
     439             : }
     440             : 
     441             : // BlockIntervalSyntheticReplacer provides methods to conduct just in time
     442             : // adjustments of a passed in block prop interval before filtering.
     443             : type BlockIntervalSyntheticReplacer interface {
     444             :         // AdjustIntervalWithSyntheticSuffix adjusts the [lower, upper) range for a data
     445             :         // block (specifically, the range returned by the corresponding
     446             :         // DataBlockIntervalCollector.FinishDataBlock) to what it would have been if
     447             :         // all the input keys had the given synthetic suffix.
     448             :         AdjustIntervalWithSyntheticSuffix(lower uint64, upper uint64, suffix []byte) (adjustedLower uint64, adjustedUpper uint64, err error)
     449             : }
     450             : 
     451             : // BlockIntervalFilter is an implementation of BlockPropertyFilter when the
     452             : // corresponding collector is a BlockIntervalCollector. That is, the set is of
     453             : // the form [lower, upper).
     454             : type BlockIntervalFilter struct {
     455             :         name              string
     456             :         filterInterval    interval
     457             :         syntheticReplacer BlockIntervalSyntheticReplacer
     458             : }
     459             : 
     460             : var _ BlockPropertyFilter = (*BlockIntervalFilter)(nil)
     461             : 
     462             : // NewBlockIntervalFilter constructs a BlockPropertyFilter that filters blocks
     463             : // based on an interval property collected by BlockIntervalCollector and the
     464             : // given [lower, upper) bounds. The given name specifies the
     465             : // BlockIntervalCollector's properties to read.
     466             : func NewBlockIntervalFilter(
     467             :         name string, lower uint64, upper uint64, syntheticReplacer BlockIntervalSyntheticReplacer,
     468           1 : ) *BlockIntervalFilter {
     469           1 :         b := new(BlockIntervalFilter)
     470           1 :         b.Init(name, lower, upper, syntheticReplacer)
     471           1 :         return b
     472           1 : }
     473             : 
     474             : // Init initializes (or re-initializes, clearing previous state) an existing
     475             : // BLockPropertyFilter to filter blocks based on an interval property collected
     476             : // by BlockIntervalCollector and the given [lower, upper) bounds. The given name
     477             : // specifies the BlockIntervalCollector's properties to read.
     478             : func (b *BlockIntervalFilter) Init(
     479             :         name string, lower, upper uint64, syntheticReplacer BlockIntervalSyntheticReplacer,
     480           1 : ) {
     481           1 :         *b = BlockIntervalFilter{
     482           1 :                 name:              name,
     483           1 :                 filterInterval:    interval{lower: lower, upper: upper},
     484           1 :                 syntheticReplacer: syntheticReplacer,
     485           1 :         }
     486           1 : }
     487             : 
     488             : // Name implements the BlockPropertyFilter interface.
     489           1 : func (b *BlockIntervalFilter) Name() string {
     490           1 :         return b.name
     491           1 : }
     492             : 
     493             : // Intersects implements the BlockPropertyFilter interface.
     494           1 : func (b *BlockIntervalFilter) Intersects(prop []byte) (bool, error) {
     495           1 :         var i interval
     496           1 :         if err := i.decode(prop); err != nil {
     497           0 :                 return false, err
     498           0 :         }
     499           1 :         return i.intersects(b.filterInterval), nil
     500             : }
     501             : 
     502             : // SyntheticSuffixIntersects implements the BlockPropertyFilter interface.
     503           1 : func (b *BlockIntervalFilter) SyntheticSuffixIntersects(prop []byte, suffix []byte) (bool, error) {
     504           1 :         if b.syntheticReplacer == nil {
     505           0 :                 return false, base.AssertionFailedf("missing SyntheticReplacer for SyntheticSuffixIntersects()")
     506           0 :         }
     507           1 :         var i interval
     508           1 :         if err := i.decode(prop); err != nil {
     509           0 :                 return false, err
     510           0 :         }
     511             : 
     512           1 :         newLower, newUpper, err := b.syntheticReplacer.AdjustIntervalWithSyntheticSuffix(i.lower, i.upper, suffix)
     513           1 :         if err != nil {
     514           0 :                 return false, err
     515           0 :         }
     516           1 :         newInterval := interval{lower: newLower, upper: newUpper}
     517           1 :         return newInterval.intersects(b.filterInterval), nil
     518             : }
     519             : 
     520             : // SetInterval adjusts the [lower, upper) bounds used by the filter. It is not
     521             : // generally safe to alter the filter while it's in use, except as part of the
     522             : // implementation of BlockPropertyFilterMask.SetSuffix used for range-key
     523             : // masking.
     524           1 : func (b *BlockIntervalFilter) SetInterval(lower, upper uint64) {
     525           1 :         b.filterInterval = interval{lower: lower, upper: upper}
     526           1 : }
     527             : 
     528             : // When encoding block properties for each block, we cannot afford to encode the
     529             : // name. Instead, the name is mapped to a shortID, in the scope of that sstable,
     530             : // and the shortID is encoded as a single byte (which imposes a limit of of 256
     531             : // block property collectors per sstable).
     532             : // Note that the in-memory type is int16 to avoid overflows (e.g. in loops) and
     533             : // to allow special values like -1 in code.
     534             : type shortID int16
     535             : 
     536             : const invalidShortID shortID = -1
     537             : const maxShortID shortID = math.MaxUint8
     538             : const maxPropertyCollectors = int(maxShortID) + 1
     539             : 
     540           1 : func (id shortID) IsValid() bool {
     541           1 :         return id >= 0 && id <= maxShortID
     542           1 : }
     543             : 
     544           1 : func (id shortID) ToByte() byte {
     545           1 :         if invariants.Enabled && !id.IsValid() {
     546           0 :                 panic(fmt.Sprintf("inavlid id %d", id))
     547             :         }
     548           1 :         return byte(id)
     549             : }
     550             : 
     551             : type blockPropertiesEncoder struct {
     552             :         propsBuf []byte
     553             :         scratch  []byte
     554             : }
     555             : 
     556           1 : func (e *blockPropertiesEncoder) getScratchForProp() []byte {
     557           1 :         return e.scratch[:0]
     558           1 : }
     559             : 
     560           1 : func (e *blockPropertiesEncoder) resetProps() {
     561           1 :         e.propsBuf = e.propsBuf[:0]
     562           1 : }
     563             : 
     564           1 : func (e *blockPropertiesEncoder) addProp(id shortID, scratch []byte) {
     565           1 :         if len(scratch) == 0 {
     566           1 :                 // We omit empty properties. The decoder will know that any missing IDs had
     567           1 :                 // empty values.
     568           1 :                 return
     569           1 :         }
     570           1 :         const lenID = 1
     571           1 :         lenProp := uvarintLen(uint32(len(scratch)))
     572           1 :         n := lenID + lenProp + len(scratch)
     573           1 :         if cap(e.propsBuf)-len(e.propsBuf) < n {
     574           1 :                 size := len(e.propsBuf) + 2*n
     575           1 :                 if size < 2*cap(e.propsBuf) {
     576           1 :                         size = 2 * cap(e.propsBuf)
     577           1 :                 }
     578           1 :                 buf := make([]byte, len(e.propsBuf), size)
     579           1 :                 copy(buf, e.propsBuf)
     580           1 :                 e.propsBuf = buf
     581             :         }
     582           1 :         pos := len(e.propsBuf)
     583           1 :         b := e.propsBuf[pos : pos+lenID]
     584           1 :         b[0] = id.ToByte()
     585           1 :         pos += lenID
     586           1 :         b = e.propsBuf[pos : pos+lenProp]
     587           1 :         n = binary.PutUvarint(b, uint64(len(scratch)))
     588           1 :         pos += n
     589           1 :         b = e.propsBuf[pos : pos+len(scratch)]
     590           1 :         pos += len(scratch)
     591           1 :         copy(b, scratch)
     592           1 :         e.propsBuf = e.propsBuf[0:pos]
     593           1 :         e.scratch = scratch
     594             : }
     595             : 
     596           1 : func (e *blockPropertiesEncoder) unsafeProps() []byte {
     597           1 :         return e.propsBuf
     598           1 : }
     599             : 
     600           1 : func (e *blockPropertiesEncoder) props() []byte {
     601           1 :         buf := make([]byte, len(e.propsBuf))
     602           1 :         copy(buf, e.propsBuf)
     603           1 :         return buf
     604           1 : }
     605             : 
     606             : type blockPropertiesDecoder struct {
     607             :         props []byte
     608             : 
     609             :         // numCollectedProps is the number of collectors that were used when writing
     610             :         // these properties. The encoded properties contain values for shortIDs 0
     611             :         // through numCollectedProps-1, in order (with empty properties omitted).
     612             :         numCollectedProps int
     613             :         nextID            shortID
     614             : }
     615             : 
     616           1 : func makeBlockPropertiesDecoder(numCollectedProps int, propsBuf []byte) blockPropertiesDecoder {
     617           1 :         return blockPropertiesDecoder{
     618           1 :                 props:             propsBuf,
     619           1 :                 numCollectedProps: numCollectedProps,
     620           1 :         }
     621           1 : }
     622             : 
     623           1 : func (d *blockPropertiesDecoder) Done() bool {
     624           1 :         return int(d.nextID) >= d.numCollectedProps
     625           1 : }
     626             : 
     627             : // Next returns the property for each shortID between 0 and numCollectedProps-1, in order.
     628             : // Note that some properties might be empty.
     629             : // REQUIRES: !Done()
     630           1 : func (d *blockPropertiesDecoder) Next() (id shortID, prop []byte, err error) {
     631           1 :         id = d.nextID
     632           1 :         d.nextID++
     633           1 : 
     634           1 :         if len(d.props) == 0 || shortID(d.props[0]) != id {
     635           1 :                 if invariants.Enabled && len(d.props) > 0 && shortID(d.props[0]) < id {
     636           0 :                         panic("shortIDs are not in order")
     637             :                 }
     638             :                 // This property was omitted because it was empty.
     639           1 :                 return id, nil, nil
     640             :         }
     641             : 
     642           1 :         const lenID = 1
     643           1 :         propLen, m := binary.Uvarint(d.props[lenID:])
     644           1 :         n := lenID + m
     645           1 :         if m <= 0 || propLen == 0 || (n+int(propLen)) > len(d.props) {
     646           0 :                 return 0, nil, base.CorruptionErrorf("corrupt block property length")
     647           0 :         }
     648           1 :         prop = d.props[n : n+int(propLen)]
     649           1 :         d.props = d.props[n+int(propLen):]
     650           1 :         return id, prop, nil
     651             : }
     652             : 
     653             : // BlockPropertiesFilterer provides filtering support when reading an sstable
     654             : // in the context of an iterator that has a slice of BlockPropertyFilters.
     655             : // After the call to NewBlockPropertiesFilterer, the caller must call
     656             : // IntersectsUserPropsAndFinishInit to check if the sstable intersects with
     657             : // the filters. If it does intersect, this function also finishes initializing
     658             : // the BlockPropertiesFilterer using the shortIDs for the relevant filters.
     659             : // Subsequent checks for relevance of a block should use the intersects
     660             : // method.
     661             : type BlockPropertiesFilterer struct {
     662             :         filters []BlockPropertyFilter
     663             :         // Maps shortID => index in filters. This can be sparse, and shortIDs for
     664             :         // which there is no filter are represented with an index of -1. The
     665             :         // length of this can be shorter than the shortIDs allocated in the
     666             :         // sstable. e.g. if the sstable used shortIDs 0, 1, 2, 3, and the iterator
     667             :         // has two filters, corresponding to shortIDs 2, 0, this would be:
     668             :         // len(shortIDToFiltersIndex)==3, 0=>1, 1=>-1, 2=>0.
     669             :         shortIDToFiltersIndex []int
     670             : 
     671             :         // boundLimitedFilter, if non-nil, holds a single block-property filter with
     672             :         // additional constraints on its filtering. A boundLimitedFilter may only
     673             :         // filter blocks that are wholly contained within its bounds. During forward
     674             :         // iteration the lower bound (and during backward iteration the upper bound)
     675             :         // must be externally guaranteed, with Intersects only returning false if
     676             :         // that bound is met. The opposite bound is verified during iteration by the
     677             :         // sstable iterator.
     678             :         //
     679             :         // boundLimitedFilter is permitted to be defined on a property (`Name()`)
     680             :         // for which another filter exists in filters. In this case both filters
     681             :         // will be consulted, and either filter may exclude block(s). Only a single
     682             :         // bound-limited block-property filter may be set.
     683             :         //
     684             :         // The boundLimitedShortID field contains the shortID of the filter's
     685             :         // property within the sstable. It's set to -1 if the property was not
     686             :         // collected when the table was built.
     687             :         boundLimitedFilter  BoundLimitedBlockPropertyFilter
     688             :         boundLimitedShortID int
     689             : 
     690             :         syntheticSuffix SyntheticSuffix
     691             : }
     692             : 
     693             : var blockPropertiesFiltererPool = sync.Pool{
     694           1 :         New: func() interface{} {
     695           1 :                 return &BlockPropertiesFilterer{}
     696           1 :         },
     697             : }
     698             : 
     699             : // newBlockPropertiesFilterer returns a partially initialized filterer. To complete
     700             : // initialization, call IntersectsUserPropsAndFinishInit.
     701             : func newBlockPropertiesFilterer(
     702             :         filters []BlockPropertyFilter,
     703             :         limited BoundLimitedBlockPropertyFilter,
     704             :         syntheticSuffix SyntheticSuffix,
     705           1 : ) *BlockPropertiesFilterer {
     706           1 :         filterer := blockPropertiesFiltererPool.Get().(*BlockPropertiesFilterer)
     707           1 :         *filterer = BlockPropertiesFilterer{
     708           1 :                 filters:               filters,
     709           1 :                 shortIDToFiltersIndex: filterer.shortIDToFiltersIndex[:0],
     710           1 :                 boundLimitedFilter:    limited,
     711           1 :                 boundLimitedShortID:   -1,
     712           1 :                 syntheticSuffix:       syntheticSuffix,
     713           1 :         }
     714           1 :         return filterer
     715           1 : }
     716             : 
     717           1 : func releaseBlockPropertiesFilterer(filterer *BlockPropertiesFilterer) {
     718           1 :         *filterer = BlockPropertiesFilterer{
     719           1 :                 shortIDToFiltersIndex: filterer.shortIDToFiltersIndex[:0],
     720           1 :         }
     721           1 :         blockPropertiesFiltererPool.Put(filterer)
     722           1 : }
     723             : 
     724             : // IntersectsTable evaluates the provided block-property filter against the
     725             : // provided set of table-level properties. If there is no intersection between
     726             : // the filters and the table or an error is encountered, IntersectsTable returns
     727             : // a nil filterer (and possibly an error). If there is an intersection,
     728             : // IntersectsTable returns a non-nil filterer that may be used by an iterator
     729             : // reading the table.
     730             : func IntersectsTable(
     731             :         filters []BlockPropertyFilter,
     732             :         limited BoundLimitedBlockPropertyFilter,
     733             :         userProperties map[string]string,
     734             :         syntheticSuffix SyntheticSuffix,
     735           1 : ) (*BlockPropertiesFilterer, error) {
     736           1 :         f := newBlockPropertiesFilterer(filters, limited, syntheticSuffix)
     737           1 :         ok, err := f.intersectsUserPropsAndFinishInit(userProperties)
     738           1 :         if !ok || err != nil {
     739           1 :                 releaseBlockPropertiesFilterer(f)
     740           1 :                 return nil, err
     741           1 :         }
     742           1 :         return f, nil
     743             : }
     744             : 
     745             : // intersectsUserPropsAndFinishInit is called with the user properties map for
     746             : // the sstable and returns whether the sstable intersects the filters. It
     747             : // additionally initializes the shortIDToFiltersIndex for the filters that are
     748             : // relevant to this sstable.
     749             : func (f *BlockPropertiesFilterer) intersectsUserPropsAndFinishInit(
     750             :         userProperties map[string]string,
     751           1 : ) (bool, error) {
     752           1 :         for i := range f.filters {
     753           1 :                 props, ok := userProperties[f.filters[i].Name()]
     754           1 :                 if !ok {
     755           1 :                         // Collector was not used when writing this file, so it is
     756           1 :                         // considered intersecting.
     757           1 :                         continue
     758             :                 }
     759           1 :                 if len(props) < 1 {
     760           0 :                         return false, base.CorruptionErrorf(
     761           0 :                                 "block properties for %s is corrupted", f.filters[i].Name())
     762           0 :                 }
     763           1 :                 shortID := shortID(props[0])
     764           1 :                 {
     765           1 :                         // Use an unsafe conversion to avoid allocating. Intersects() is not
     766           1 :                         // supposed to modify the given slice.
     767           1 :                         // Note that unsafe.StringData only works if the string is not empty
     768           1 :                         // (which we already checked).
     769           1 :                         byteProps := unsafe.Slice(unsafe.StringData(props), len(props))
     770           1 :                         var intersects bool
     771           1 :                         var err error
     772           1 :                         if len(f.syntheticSuffix) == 0 {
     773           1 :                                 intersects, err = f.filters[i].Intersects(byteProps[1:])
     774           1 :                         } else {
     775           1 :                                 intersects, err = f.filters[i].SyntheticSuffixIntersects(byteProps[1:], f.syntheticSuffix)
     776           1 :                         }
     777           1 :                         if err != nil || !intersects {
     778           1 :                                 return false, err
     779           1 :                         }
     780             :                 }
     781             :                 // Intersects the sstable, so need to use this filter when
     782             :                 // deciding whether to read blocks.
     783           1 :                 n := len(f.shortIDToFiltersIndex)
     784           1 :                 if n <= int(shortID) {
     785           1 :                         if cap(f.shortIDToFiltersIndex) <= int(shortID) {
     786           1 :                                 index := make([]int, shortID+1, 2*(shortID+1))
     787           1 :                                 copy(index, f.shortIDToFiltersIndex)
     788           1 :                                 f.shortIDToFiltersIndex = index
     789           1 :                         } else {
     790           1 :                                 f.shortIDToFiltersIndex = f.shortIDToFiltersIndex[:shortID+1]
     791           1 :                         }
     792           1 :                         for j := n; j < int(shortID); j++ {
     793           1 :                                 f.shortIDToFiltersIndex[j] = -1
     794           1 :                         }
     795             :                 }
     796           1 :                 f.shortIDToFiltersIndex[shortID] = i
     797             :         }
     798           1 :         if f.boundLimitedFilter == nil {
     799           1 :                 return true, nil
     800           1 :         }
     801             : 
     802             :         // There's a bound-limited filter. Find its shortID. It's possible that
     803             :         // there's an existing filter in f.filters on the same property. That's
     804             :         // okay. Both filters will be consulted whenever a relevant prop is decoded.
     805           1 :         props, ok := userProperties[f.boundLimitedFilter.Name()]
     806           1 :         if !ok {
     807           1 :                 // The collector was not used when writing this file, so it's
     808           1 :                 // intersecting. We leave f.boundLimitedShortID=-1, so the filter will
     809           1 :                 // be unused within this file.
     810           1 :                 return true, nil
     811           1 :         }
     812           1 :         if len(props) < 1 {
     813           0 :                 return false, base.CorruptionErrorf(
     814           0 :                         "block properties for %s is corrupted", f.boundLimitedFilter.Name())
     815           0 :         }
     816           1 :         f.boundLimitedShortID = int(props[0])
     817           1 : 
     818           1 :         // We don't check for table-level intersection for the bound-limited filter.
     819           1 :         // The bound-limited filter is treated as vacuously intersecting.
     820           1 :         //
     821           1 :         // NB: If a block-property filter needs to be toggled inactive/active, it
     822           1 :         // should be implemented within the Intersects implementation.
     823           1 :         //
     824           1 :         // TODO(jackson): We could filter at the table-level by threading the table
     825           1 :         // smallest and largest bounds here.
     826           1 : 
     827           1 :         // The bound-limited filter isn't included in shortIDToFiltersIndex.
     828           1 :         //
     829           1 :         // When determining intersection, we decode props only up to the shortID
     830           1 :         // len(shortIDToFiltersIndex). If f.limitedShortID is greater than any of
     831           1 :         // the existing filters' shortIDs, we need to grow shortIDToFiltersIndex.
     832           1 :         // Growing the index with -1s ensures we're able to consult the index
     833           1 :         // without length checks.
     834           1 :         if n := len(f.shortIDToFiltersIndex); n <= f.boundLimitedShortID {
     835           1 :                 if cap(f.shortIDToFiltersIndex) <= f.boundLimitedShortID {
     836           1 :                         index := make([]int, f.boundLimitedShortID+1)
     837           1 :                         copy(index, f.shortIDToFiltersIndex)
     838           1 :                         f.shortIDToFiltersIndex = index
     839           1 :                 } else {
     840           1 :                         f.shortIDToFiltersIndex = f.shortIDToFiltersIndex[:f.boundLimitedShortID+1]
     841           1 :                 }
     842           1 :                 for j := n; j <= f.boundLimitedShortID; j++ {
     843           1 :                         f.shortIDToFiltersIndex[j] = -1
     844           1 :                 }
     845             :         }
     846           1 :         return true, nil
     847             : }
     848             : 
     849             : type intersectsResult int8
     850             : 
     851             : const (
     852             :         blockIntersects intersectsResult = iota
     853             :         blockExcluded
     854             :         // blockMaybeExcluded is returned by BlockPropertiesFilterer.intersects when
     855             :         // no filters unconditionally exclude the block, but the bound-limited block
     856             :         // property filter will exclude it if the block's bounds fall within the
     857             :         // filter's current bounds. See the reader's
     858             :         // {single,two}LevelIterator.resolveMaybeExcluded methods.
     859             :         blockMaybeExcluded
     860             : )
     861             : 
     862           1 : func (f *BlockPropertiesFilterer) intersects(props []byte) (intersectsResult, error) {
     863           1 :         decoder := makeBlockPropertiesDecoder(len(f.shortIDToFiltersIndex), props)
     864           1 :         ret := blockIntersects
     865           1 :         for !decoder.Done() {
     866           1 :                 id, prop, err := decoder.Next()
     867           1 :                 if err != nil {
     868           0 :                         return ret, err
     869           0 :                 }
     870           1 :                 intersects, err := f.intersectsFilter(id, prop)
     871           1 :                 if err != nil {
     872           0 :                         return ret, err
     873           0 :                 }
     874           1 :                 if intersects == blockExcluded {
     875           1 :                         return blockExcluded, nil
     876           1 :                 }
     877           1 :                 if intersects == blockMaybeExcluded {
     878           1 :                         ret = blockMaybeExcluded
     879           1 :                 }
     880             :         }
     881             :         // ret is either blockIntersects or blockMaybeExcluded.
     882           1 :         return ret, nil
     883             : }
     884             : 
     885             : func (f *BlockPropertiesFilterer) intersectsFilter(
     886             :         id shortID, prop []byte,
     887           1 : ) (intersectsResult, error) {
     888           1 :         var intersects bool
     889           1 :         var err error
     890           1 :         if filterIdx := f.shortIDToFiltersIndex[id]; filterIdx >= 0 {
     891           1 :                 if !f.syntheticSuffix.IsSet() {
     892           1 :                         intersects, err = f.filters[filterIdx].Intersects(prop)
     893           1 :                 } else {
     894           1 :                         intersects, err = f.filters[filterIdx].SyntheticSuffixIntersects(prop, f.syntheticSuffix)
     895           1 :                 }
     896           1 :                 if err != nil {
     897           0 :                         return blockIntersects, err
     898           0 :                 }
     899           1 :                 if !intersects {
     900           1 :                         return blockExcluded, nil
     901           1 :                 }
     902             :         }
     903           1 :         if int(id) == f.boundLimitedShortID {
     904           1 :                 // The bound-limited filter uses this id.
     905           1 :                 //
     906           1 :                 // The bound-limited filter only applies within a keyspan interval. We
     907           1 :                 // expect the Intersects call to be cheaper than bounds checks. If
     908           1 :                 // Intersects determines that there is no intersection, we return
     909           1 :                 // `blockMaybeExcluded` if no other bpf unconditionally excludes the
     910           1 :                 // block.
     911           1 :                 if !f.syntheticSuffix.IsSet() {
     912           1 :                         intersects, err = f.boundLimitedFilter.Intersects(prop)
     913           1 :                 } else {
     914           1 :                         intersects, err = f.boundLimitedFilter.SyntheticSuffixIntersects(prop, f.syntheticSuffix)
     915           1 :                 }
     916           1 :                 if err != nil {
     917           0 :                         return blockIntersects, err
     918           1 :                 } else if !intersects {
     919           1 :                         return blockMaybeExcluded, nil
     920           1 :                 }
     921             :         }
     922           1 :         return blockIntersects, nil
     923             : }
     924             : 
     925           1 : func uvarintLen(v uint32) int {
     926           1 :         i := 0
     927           1 :         for v >= 0x80 {
     928           1 :                 v >>= 7
     929           1 :                 i++
     930           1 :         }
     931           1 :         return i + 1
     932             : }

Generated by: LCOV version 1.14