Coverage Report

Created: 2025-07-23 06:05

/rust/registry/src/index.crates.io-6f17d22bba15001f/regex-automata-0.4.9/src/util/iter.rs
Line
Count
Source (jump to first uncovered line)
1
/*!
2
Generic helpers for iteration of matches from a regex engine in a haystack.
3
4
The principle type in this module is a [`Searcher`]. A `Searcher` provides
5
its own lower level iterator-like API in addition to methods for constructing
6
types that implement `Iterator`. The documentation for `Searcher` explains a
7
bit more about why these different APIs exist.
8
9
Currently, this module supports iteration over any regex engine that works
10
with the [`HalfMatch`], [`Match`] or [`Captures`] types.
11
*/
12
13
#[cfg(feature = "alloc")]
14
use crate::util::captures::Captures;
15
use crate::util::search::{HalfMatch, Input, Match, MatchError};
16
17
/// A searcher for creating iterators and performing lower level iteration.
18
///
19
/// This searcher encapsulates the logic required for finding all successive
20
/// non-overlapping matches in a haystack. In theory, iteration would look
21
/// something like this:
22
///
23
/// 1. Setting the start position to `0`.
24
/// 2. Execute a regex search. If no match, end iteration.
25
/// 3. Report the match and set the start position to the end of the match.
26
/// 4. Go back to (2).
27
///
28
/// And if this were indeed the case, it's likely that `Searcher` wouldn't
29
/// exist. Unfortunately, because a regex may match the empty string, the above
30
/// logic won't work for all possible regexes. Namely, if an empty match is
31
/// found, then step (3) would set the start position of the search to the
32
/// position it was at. Thus, iteration would never end.
33
///
34
/// Instead, a `Searcher` knows how to detect these cases and forcefully
35
/// advance iteration in the case of an empty match that overlaps with a
36
/// previous match.
37
///
38
/// If you know that your regex cannot match any empty string, then the simple
39
/// algorithm described above will work correctly.
40
///
41
/// When possible, prefer the iterators defined on the regex engine you're
42
/// using. This tries to abstract over the regex engine and is thus a bit more
43
/// unwieldy to use.
44
///
45
/// In particular, a `Searcher` is not itself an iterator. Instead, it provides
46
/// `advance` routines that permit moving the search along explicitly. It also
47
/// provides various routines, like [`Searcher::into_matches_iter`], that
48
/// accept a closure (representing how a regex engine executes a search) and
49
/// returns a conventional iterator.
50
///
51
/// The lifetime parameters come from the [`Input`] type passed to
52
/// [`Searcher::new`]:
53
///
54
/// * `'h` is the lifetime of the underlying haystack.
55
///
56
/// # Searcher vs Iterator
57
///
58
/// Why does a search type with "advance" APIs exist at all when we also have
59
/// iterators? Unfortunately, the reasoning behind this split is a complex
60
/// combination of the following things:
61
///
62
/// 1. While many of the regex engines expose their own iterators, it is also
63
/// nice to expose this lower level iteration helper because it permits callers
64
/// to provide their own `Input` configuration. Moreover, a `Searcher` can work
65
/// with _any_ regex engine instead of only the ones defined in this crate.
66
/// This way, everyone benefits from a shared iteration implementation.
67
/// 2. There are many different regex engines that, while they have the same
68
/// match semantics, they have slightly different APIs. Iteration is just
69
/// complex enough to want to share code, and so we need a way of abstracting
70
/// over those different regex engines. While we could define a new trait that
71
/// describes any regex engine search API, it would wind up looking very close
72
/// to a closure. While there may still be reasons for the more generic trait
73
/// to exist, for now and for the purposes of iteration, we use a closure.
74
/// Closures also provide a lot of easy flexibility at the call site, in that
75
/// they permit the caller to borrow any kind of state they want for use during
76
/// each search call.
77
/// 3. As a result of using closures, and because closures are anonymous types
78
/// that cannot be named, it is difficult to encapsulate them without both
79
/// costs to speed and added complexity to the public API. For example, in
80
/// defining an iterator type like
81
/// [`dfa::regex::FindMatches`](crate::dfa::regex::FindMatches),
82
/// if we use a closure internally, it's not possible to name this type in the
83
/// return type of the iterator constructor. Thus, the only way around it is
84
/// to erase the type by boxing it and turning it into a `Box<dyn FnMut ...>`.
85
/// This boxed closure is unlikely to be inlined _and_ it infects the public
86
/// API in subtle ways. Namely, unless you declare the closure as implementing
87
/// `Send` and `Sync`, then the resulting iterator type won't implement it
88
/// either. But there are practical issues with requiring the closure to
89
/// implement `Send` and `Sync` that result in other API complexities that
90
/// are beyond the scope of this already long exposition.
91
/// 4. Some regex engines expose more complex match information than just
92
/// "which pattern matched" and "at what offsets." For example, the PikeVM
93
/// exposes match spans for each capturing group that participated in the
94
/// match. In such cases, it can be quite beneficial to reuse the capturing
95
/// group allocation on subsequent searches. A proper iterator doesn't permit
96
/// this API due to its interface, so it's useful to have something a bit lower
97
/// level that permits callers to amortize allocations while also reusing a
98
/// shared implementation of iteration. (See the documentation for
99
/// [`Searcher::advance`] for an example of using the "advance" API with the
100
/// PikeVM.)
101
///
102
/// What this boils down to is that there are "advance" APIs which require
103
/// handing a closure to it for every call, and there are also APIs to create
104
/// iterators from a closure. The former are useful for _implementing_
105
/// iterators or when you need more flexibility, while the latter are useful
106
/// for conveniently writing custom iterators on-the-fly.
107
///
108
/// # Example: iterating with captures
109
///
110
/// Several regex engines in this crate over convenient iterator APIs over
111
/// [`Captures`] values. To do so, this requires allocating a new `Captures`
112
/// value for each iteration step. This can perhaps be more costly than you
113
/// might want. Instead of implementing your own iterator to avoid that
114
/// cost (which can be a little subtle if you want to handle empty matches
115
/// correctly), you can use this `Searcher` to do it for you:
116
///
117
/// ```
118
/// use regex_automata::{
119
///     nfa::thompson::pikevm::PikeVM,
120
///     util::iter::Searcher,
121
///     Input, Span,
122
/// };
123
///
124
/// let re = PikeVM::new("foo(?P<numbers>[0-9]+)")?;
125
/// let haystack = "foo1 foo12 foo123";
126
///
127
/// let mut caps = re.create_captures();
128
/// let mut cache = re.create_cache();
129
/// let mut matches = vec![];
130
/// let mut searcher = Searcher::new(Input::new(haystack));
131
/// while let Some(_) = searcher.advance(|input| {
132
///     re.search(&mut cache, input, &mut caps);
133
///     Ok(caps.get_match())
134
/// }) {
135
///     // The unwrap is OK since 'numbers' matches if the pattern matches.
136
///     matches.push(caps.get_group_by_name("numbers").unwrap());
137
/// }
138
/// assert_eq!(matches, vec![
139
///     Span::from(3..4),
140
///     Span::from(8..10),
141
///     Span::from(14..17),
142
/// ]);
143
///
144
/// # Ok::<(), Box<dyn std::error::Error>>(())
145
/// ```
146
#[derive(Clone, Debug)]
147
pub struct Searcher<'h> {
148
    /// The input parameters to give to each regex engine call.
149
    ///
150
    /// The start position of the search is mutated during iteration.
151
    input: Input<'h>,
152
    /// Records the end offset of the most recent match. This is necessary to
153
    /// handle a corner case for preventing empty matches from overlapping with
154
    /// the ending bounds of a prior match.
155
    last_match_end: Option<usize>,
156
}
157
158
impl<'h> Searcher<'h> {
159
    /// Create a new fallible non-overlapping matches iterator.
160
    ///
161
    /// The given `input` provides the parameters (including the haystack),
162
    /// while the `finder` represents a closure that calls the underlying regex
163
    /// engine. The closure may borrow any additional state that is needed,
164
    /// such as a prefilter scanner.
165
0
    pub fn new(input: Input<'h>) -> Searcher<'h> {
166
0
        Searcher { input, last_match_end: None }
167
0
    }
168
169
    /// Returns the current `Input` used by this searcher.
170
    ///
171
    /// The `Input` returned is generally equivalent to the one given to
172
    /// [`Searcher::new`], but its start position may be different to reflect
173
    /// the start of the next search to be executed.
174
0
    pub fn input<'s>(&'s self) -> &'s Input<'h> {
175
0
        &self.input
176
0
    }
177
178
    /// Return the next half match for an infallible search if one exists, and
179
    /// advance to the next position.
180
    ///
181
    /// This is like `try_advance_half`, except errors are converted into
182
    /// panics.
183
    ///
184
    /// # Panics
185
    ///
186
    /// If the given closure returns an error, then this panics. This is useful
187
    /// when you know your underlying regex engine has been configured to not
188
    /// return an error.
189
    ///
190
    /// # Example
191
    ///
192
    /// This example shows how to use a `Searcher` to iterate over all matches
193
    /// when using a DFA, which only provides "half" matches.
194
    ///
195
    /// ```
196
    /// use regex_automata::{
197
    ///     hybrid::dfa::DFA,
198
    ///     util::iter::Searcher,
199
    ///     HalfMatch, Input,
200
    /// };
201
    ///
202
    /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
203
    /// let mut cache = re.create_cache();
204
    ///
205
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
206
    /// let mut it = Searcher::new(input);
207
    ///
208
    /// let expected = Some(HalfMatch::must(0, 10));
209
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
210
    /// assert_eq!(expected, got);
211
    ///
212
    /// let expected = Some(HalfMatch::must(0, 21));
213
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
214
    /// assert_eq!(expected, got);
215
    ///
216
    /// let expected = Some(HalfMatch::must(0, 32));
217
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
218
    /// assert_eq!(expected, got);
219
    ///
220
    /// let expected = None;
221
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
222
    /// assert_eq!(expected, got);
223
    ///
224
    /// # Ok::<(), Box<dyn std::error::Error>>(())
225
    /// ```
226
    ///
227
    /// This correctly moves iteration forward even when an empty match occurs:
228
    ///
229
    /// ```
230
    /// use regex_automata::{
231
    ///     hybrid::dfa::DFA,
232
    ///     util::iter::Searcher,
233
    ///     HalfMatch, Input,
234
    /// };
235
    ///
236
    /// let re = DFA::new(r"a|")?;
237
    /// let mut cache = re.create_cache();
238
    ///
239
    /// let input = Input::new("abba");
240
    /// let mut it = Searcher::new(input);
241
    ///
242
    /// let expected = Some(HalfMatch::must(0, 1));
243
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
244
    /// assert_eq!(expected, got);
245
    ///
246
    /// let expected = Some(HalfMatch::must(0, 2));
247
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
248
    /// assert_eq!(expected, got);
249
    ///
250
    /// let expected = Some(HalfMatch::must(0, 4));
251
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
252
    /// assert_eq!(expected, got);
253
    ///
254
    /// let expected = None;
255
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
256
    /// assert_eq!(expected, got);
257
    ///
258
    /// # Ok::<(), Box<dyn std::error::Error>>(())
259
    /// ```
260
    #[inline]
261
0
    pub fn advance_half<F>(&mut self, finder: F) -> Option<HalfMatch>
262
0
    where
263
0
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
264
0
    {
265
0
        match self.try_advance_half(finder) {
266
0
            Ok(m) => m,
267
0
            Err(err) => panic!(
268
0
                "unexpected regex half find error: {}\n\
269
0
                 to handle find errors, use 'try' or 'search' methods",
270
0
                err,
271
0
            ),
272
        }
273
0
    }
274
275
    /// Return the next match for an infallible search if one exists, and
276
    /// advance to the next position.
277
    ///
278
    /// The search is advanced even in the presence of empty matches by
279
    /// forbidding empty matches from overlapping with any other match.
280
    ///
281
    /// This is like `try_advance`, except errors are converted into panics.
282
    ///
283
    /// # Panics
284
    ///
285
    /// If the given closure returns an error, then this panics. This is useful
286
    /// when you know your underlying regex engine has been configured to not
287
    /// return an error.
288
    ///
289
    /// # Example
290
    ///
291
    /// This example shows how to use a `Searcher` to iterate over all matches
292
    /// when using a regex based on lazy DFAs:
293
    ///
294
    /// ```
295
    /// use regex_automata::{
296
    ///     hybrid::regex::Regex,
297
    ///     util::iter::Searcher,
298
    ///     Match, Input,
299
    /// };
300
    ///
301
    /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
302
    /// let mut cache = re.create_cache();
303
    ///
304
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
305
    /// let mut it = Searcher::new(input);
306
    ///
307
    /// let expected = Some(Match::must(0, 0..10));
308
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
309
    /// assert_eq!(expected, got);
310
    ///
311
    /// let expected = Some(Match::must(0, 11..21));
312
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
313
    /// assert_eq!(expected, got);
314
    ///
315
    /// let expected = Some(Match::must(0, 22..32));
316
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
317
    /// assert_eq!(expected, got);
318
    ///
319
    /// let expected = None;
320
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
321
    /// assert_eq!(expected, got);
322
    ///
323
    /// # Ok::<(), Box<dyn std::error::Error>>(())
324
    /// ```
325
    ///
326
    /// This example shows the same as above, but with the PikeVM. This example
327
    /// is useful because it shows how to use this API even when the regex
328
    /// engine doesn't directly return a `Match`.
329
    ///
330
    /// ```
331
    /// use regex_automata::{
332
    ///     nfa::thompson::pikevm::PikeVM,
333
    ///     util::iter::Searcher,
334
    ///     Match, Input,
335
    /// };
336
    ///
337
    /// let re = PikeVM::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
338
    /// let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
339
    ///
340
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
341
    /// let mut it = Searcher::new(input);
342
    ///
343
    /// let expected = Some(Match::must(0, 0..10));
344
    /// let got = it.advance(|input| {
345
    ///     re.search(&mut cache, input, &mut caps);
346
    ///     Ok(caps.get_match())
347
    /// });
348
    /// // Note that if we wanted to extract capturing group spans, we could
349
    /// // do that here with 'caps'.
350
    /// assert_eq!(expected, got);
351
    ///
352
    /// let expected = Some(Match::must(0, 11..21));
353
    /// let got = it.advance(|input| {
354
    ///     re.search(&mut cache, input, &mut caps);
355
    ///     Ok(caps.get_match())
356
    /// });
357
    /// assert_eq!(expected, got);
358
    ///
359
    /// let expected = Some(Match::must(0, 22..32));
360
    /// let got = it.advance(|input| {
361
    ///     re.search(&mut cache, input, &mut caps);
362
    ///     Ok(caps.get_match())
363
    /// });
364
    /// assert_eq!(expected, got);
365
    ///
366
    /// let expected = None;
367
    /// let got = it.advance(|input| {
368
    ///     re.search(&mut cache, input, &mut caps);
369
    ///     Ok(caps.get_match())
370
    /// });
371
    /// assert_eq!(expected, got);
372
    ///
373
    /// # Ok::<(), Box<dyn std::error::Error>>(())
374
    /// ```
375
    #[inline]
376
0
    pub fn advance<F>(&mut self, finder: F) -> Option<Match>
377
0
    where
378
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
379
0
    {
380
0
        match self.try_advance(finder) {
381
0
            Ok(m) => m,
382
0
            Err(err) => panic!(
383
0
                "unexpected regex find error: {}\n\
384
0
                 to handle find errors, use 'try' or 'search' methods",
385
0
                err,
386
0
            ),
387
        }
388
0
    }
389
390
    /// Return the next half match for a fallible search if one exists, and
391
    /// advance to the next position.
392
    ///
393
    /// This is like `advance_half`, except it permits callers to handle errors
394
    /// during iteration.
395
    #[inline]
396
0
    pub fn try_advance_half<F>(
397
0
        &mut self,
398
0
        mut finder: F,
399
0
    ) -> Result<Option<HalfMatch>, MatchError>
400
0
    where
401
0
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
402
0
    {
403
0
        let mut m = match finder(&self.input)? {
404
0
            None => return Ok(None),
405
0
            Some(m) => m,
406
0
        };
407
0
        if Some(m.offset()) == self.last_match_end {
408
0
            m = match self.handle_overlapping_empty_half_match(m, finder)? {
409
0
                None => return Ok(None),
410
0
                Some(m) => m,
411
            };
412
0
        }
413
0
        self.input.set_start(m.offset());
414
0
        self.last_match_end = Some(m.offset());
415
0
        Ok(Some(m))
416
0
    }
417
418
    /// Return the next match for a fallible search if one exists, and advance
419
    /// to the next position.
420
    ///
421
    /// This is like `advance`, except it permits callers to handle errors
422
    /// during iteration.
423
    #[inline]
424
0
    pub fn try_advance<F>(
425
0
        &mut self,
426
0
        mut finder: F,
427
0
    ) -> Result<Option<Match>, MatchError>
428
0
    where
429
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
430
0
    {
431
0
        let mut m = match finder(&self.input)? {
432
0
            None => return Ok(None),
433
0
            Some(m) => m,
434
0
        };
435
0
        if m.is_empty() && Some(m.end()) == self.last_match_end {
436
0
            m = match self.handle_overlapping_empty_match(m, finder)? {
437
0
                None => return Ok(None),
438
0
                Some(m) => m,
439
            };
440
0
        }
441
0
        self.input.set_start(m.end());
442
0
        self.last_match_end = Some(m.end());
443
0
        Ok(Some(m))
444
0
    }
445
446
    /// Given a closure that executes a single search, return an iterator over
447
    /// all successive non-overlapping half matches.
448
    ///
449
    /// The iterator returned yields result values. If the underlying regex
450
    /// engine is configured to never return an error, consider calling
451
    /// [`TryHalfMatchesIter::infallible`] to convert errors into panics.
452
    ///
453
    /// # Example
454
    ///
455
    /// This example shows how to use a `Searcher` to create a proper
456
    /// iterator over half matches.
457
    ///
458
    /// ```
459
    /// use regex_automata::{
460
    ///     hybrid::dfa::DFA,
461
    ///     util::iter::Searcher,
462
    ///     HalfMatch, Input,
463
    /// };
464
    ///
465
    /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
466
    /// let mut cache = re.create_cache();
467
    ///
468
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
469
    /// let mut it = Searcher::new(input).into_half_matches_iter(|input| {
470
    ///     re.try_search_fwd(&mut cache, input)
471
    /// });
472
    ///
473
    /// let expected = Some(Ok(HalfMatch::must(0, 10)));
474
    /// assert_eq!(expected, it.next());
475
    ///
476
    /// let expected = Some(Ok(HalfMatch::must(0, 21)));
477
    /// assert_eq!(expected, it.next());
478
    ///
479
    /// let expected = Some(Ok(HalfMatch::must(0, 32)));
480
    /// assert_eq!(expected, it.next());
481
    ///
482
    /// let expected = None;
483
    /// assert_eq!(expected, it.next());
484
    ///
485
    /// # Ok::<(), Box<dyn std::error::Error>>(())
486
    /// ```
487
    #[inline]
488
0
    pub fn into_half_matches_iter<F>(
489
0
        self,
490
0
        finder: F,
491
0
    ) -> TryHalfMatchesIter<'h, F>
492
0
    where
493
0
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
494
0
    {
495
0
        TryHalfMatchesIter { it: self, finder }
496
0
    }
497
498
    /// Given a closure that executes a single search, return an iterator over
499
    /// all successive non-overlapping matches.
500
    ///
501
    /// The iterator returned yields result values. If the underlying regex
502
    /// engine is configured to never return an error, consider calling
503
    /// [`TryMatchesIter::infallible`] to convert errors into panics.
504
    ///
505
    /// # Example
506
    ///
507
    /// This example shows how to use a `Searcher` to create a proper
508
    /// iterator over matches.
509
    ///
510
    /// ```
511
    /// use regex_automata::{
512
    ///     hybrid::regex::Regex,
513
    ///     util::iter::Searcher,
514
    ///     Match, Input,
515
    /// };
516
    ///
517
    /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
518
    /// let mut cache = re.create_cache();
519
    ///
520
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
521
    /// let mut it = Searcher::new(input).into_matches_iter(|input| {
522
    ///     re.try_search(&mut cache, input)
523
    /// });
524
    ///
525
    /// let expected = Some(Ok(Match::must(0, 0..10)));
526
    /// assert_eq!(expected, it.next());
527
    ///
528
    /// let expected = Some(Ok(Match::must(0, 11..21)));
529
    /// assert_eq!(expected, it.next());
530
    ///
531
    /// let expected = Some(Ok(Match::must(0, 22..32)));
532
    /// assert_eq!(expected, it.next());
533
    ///
534
    /// let expected = None;
535
    /// assert_eq!(expected, it.next());
536
    ///
537
    /// # Ok::<(), Box<dyn std::error::Error>>(())
538
    /// ```
539
    #[inline]
540
0
    pub fn into_matches_iter<F>(self, finder: F) -> TryMatchesIter<'h, F>
541
0
    where
542
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
543
0
    {
544
0
        TryMatchesIter { it: self, finder }
545
0
    }
546
547
    /// Given a closure that executes a single search, return an iterator over
548
    /// all successive non-overlapping `Captures` values.
549
    ///
550
    /// The iterator returned yields result values. If the underlying regex
551
    /// engine is configured to never return an error, consider calling
552
    /// [`TryCapturesIter::infallible`] to convert errors into panics.
553
    ///
554
    /// Unlike the other iterator constructors, this accepts an initial
555
    /// `Captures` value. This `Captures` value is reused for each search, and
556
    /// the iterator implementation clones it before returning it. The caller
557
    /// must provide this value because the iterator is purposely ignorant
558
    /// of the underlying regex engine and thus doesn't know how to create
559
    /// one itself. More to the point, a `Captures` value itself has a few
560
    /// different constructors, which change which kind of information is
561
    /// available to query in exchange for search performance.
562
    ///
563
    /// # Example
564
    ///
565
    /// This example shows how to use a `Searcher` to create a proper iterator
566
    /// over `Captures` values, which provides access to all capturing group
567
    /// spans for each match.
568
    ///
569
    /// ```
570
    /// use regex_automata::{
571
    ///     nfa::thompson::pikevm::PikeVM,
572
    ///     util::iter::Searcher,
573
    ///     Input,
574
    /// };
575
    ///
576
    /// let re = PikeVM::new(
577
    ///     r"(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})",
578
    /// )?;
579
    /// let (mut cache, caps) = (re.create_cache(), re.create_captures());
580
    ///
581
    /// let haystack = "2010-03-14 2016-10-08 2020-10-22";
582
    /// let input = Input::new(haystack);
583
    /// let mut it = Searcher::new(input)
584
    ///     .into_captures_iter(caps, |input, caps| {
585
    ///         re.search(&mut cache, input, caps);
586
    ///         Ok(())
587
    ///     });
588
    ///
589
    /// let got = it.next().expect("first date")?;
590
    /// let year = got.get_group_by_name("y").expect("must match");
591
    /// assert_eq!("2010", &haystack[year]);
592
    ///
593
    /// let got = it.next().expect("second date")?;
594
    /// let month = got.get_group_by_name("m").expect("must match");
595
    /// assert_eq!("10", &haystack[month]);
596
    ///
597
    /// let got = it.next().expect("third date")?;
598
    /// let day = got.get_group_by_name("d").expect("must match");
599
    /// assert_eq!("22", &haystack[day]);
600
    ///
601
    /// assert!(it.next().is_none());
602
    ///
603
    /// # Ok::<(), Box<dyn std::error::Error>>(())
604
    /// ```
605
    #[cfg(feature = "alloc")]
606
    #[inline]
607
0
    pub fn into_captures_iter<F>(
608
0
        self,
609
0
        caps: Captures,
610
0
        finder: F,
611
0
    ) -> TryCapturesIter<'h, F>
612
0
    where
613
0
        F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
614
0
    {
615
0
        TryCapturesIter { it: self, caps, finder }
616
0
    }
617
618
    /// Handles the special case of a match that begins where the previous
619
    /// match ended. Without this special handling, it'd be possible to get
620
    /// stuck where an empty match never results in forward progress. This
621
    /// also makes it more consistent with how presiding general purpose regex
622
    /// engines work.
623
    #[cold]
624
    #[inline(never)]
625
0
    fn handle_overlapping_empty_half_match<F>(
626
0
        &mut self,
627
0
        _: HalfMatch,
628
0
        mut finder: F,
629
0
    ) -> Result<Option<HalfMatch>, MatchError>
630
0
    where
631
0
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
632
0
    {
633
0
        // Since we are only here when 'm.offset()' matches the offset of the
634
0
        // last match, it follows that this must have been an empty match.
635
0
        // Since we both need to make progress *and* prevent overlapping
636
0
        // matches, we discard this match and advance the search by 1.
637
0
        //
638
0
        // Note that this may start a search in the middle of a codepoint. The
639
0
        // regex engines themselves are expected to deal with that and not
640
0
        // report any matches within a codepoint if they are configured in
641
0
        // UTF-8 mode.
642
0
        self.input.set_start(self.input.start().checked_add(1).unwrap());
643
0
        finder(&self.input)
644
0
    }
645
646
    /// Handles the special case of an empty match by ensuring that 1) the
647
    /// iterator always advances and 2) empty matches never overlap with other
648
    /// matches.
649
    ///
650
    /// (1) is necessary because we principally make progress by setting the
651
    /// starting location of the next search to the ending location of the last
652
    /// match. But if a match is empty, then this results in a search that does
653
    /// not advance and thus does not terminate.
654
    ///
655
    /// (2) is not strictly necessary, but makes intuitive sense and matches
656
    /// the presiding behavior of most general purpose regex engines. The
657
    /// "intuitive sense" here is that we want to report NON-overlapping
658
    /// matches. So for example, given the regex 'a|(?:)' against the haystack
659
    /// 'a', without the special handling, you'd get the matches [0, 1) and [1,
660
    /// 1), where the latter overlaps with the end bounds of the former.
661
    ///
662
    /// Note that we mark this cold and forcefully prevent inlining because
663
    /// handling empty matches like this is extremely rare and does require
664
    /// quite a bit of code, comparatively. Keeping this code out of the main
665
    /// iterator function keeps it smaller and more amenable to inlining
666
    /// itself.
667
    #[cold]
668
    #[inline(never)]
669
0
    fn handle_overlapping_empty_match<F>(
670
0
        &mut self,
671
0
        m: Match,
672
0
        mut finder: F,
673
0
    ) -> Result<Option<Match>, MatchError>
674
0
    where
675
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
676
0
    {
677
0
        assert!(m.is_empty());
678
0
        self.input.set_start(self.input.start().checked_add(1).unwrap());
679
0
        finder(&self.input)
680
0
    }
681
}
682
683
/// An iterator over all non-overlapping half matches for a fallible search.
684
///
685
/// The iterator yields a `Result<HalfMatch, MatchError>` value until no more
686
/// matches could be found.
687
///
688
/// The type parameters are as follows:
689
///
690
/// * `F` represents the type of a closure that executes the search.
691
///
692
/// The lifetime parameters come from the [`Input`] type:
693
///
694
/// * `'h` is the lifetime of the underlying haystack.
695
///
696
/// When possible, prefer the iterators defined on the regex engine you're
697
/// using. This tries to abstract over the regex engine and is thus a bit more
698
/// unwieldy to use.
699
///
700
/// This iterator is created by [`Searcher::into_half_matches_iter`].
701
pub struct TryHalfMatchesIter<'h, F> {
702
    it: Searcher<'h>,
703
    finder: F,
704
}
705
706
impl<'h, F> TryHalfMatchesIter<'h, F> {
707
    /// Return an infallible version of this iterator.
708
    ///
709
    /// Any item yielded that corresponds to an error results in a panic. This
710
    /// is useful if your underlying regex engine is configured in a way that
711
    /// it is guaranteed to never return an error.
712
0
    pub fn infallible(self) -> HalfMatchesIter<'h, F> {
713
0
        HalfMatchesIter(self)
714
0
    }
715
716
    /// Returns the current `Input` used by this iterator.
717
    ///
718
    /// The `Input` returned is generally equivalent to the one used to
719
    /// construct this iterator, but its start position may be different to
720
    /// reflect the start of the next search to be executed.
721
0
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
722
0
        self.it.input()
723
0
    }
724
}
725
726
impl<'h, F> Iterator for TryHalfMatchesIter<'h, F>
727
where
728
    F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
729
{
730
    type Item = Result<HalfMatch, MatchError>;
731
732
    #[inline]
733
0
    fn next(&mut self) -> Option<Result<HalfMatch, MatchError>> {
734
0
        self.it.try_advance_half(&mut self.finder).transpose()
735
0
    }
736
}
737
738
impl<'h, F> core::fmt::Debug for TryHalfMatchesIter<'h, F> {
739
0
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
740
0
        f.debug_struct("TryHalfMatchesIter")
741
0
            .field("it", &self.it)
742
0
            .field("finder", &"<closure>")
743
0
            .finish()
744
0
    }
745
}
746
747
/// An iterator over all non-overlapping half matches for an infallible search.
748
///
749
/// The iterator yields a [`HalfMatch`] value until no more matches could be
750
/// found.
751
///
752
/// The type parameters are as follows:
753
///
754
/// * `F` represents the type of a closure that executes the search.
755
///
756
/// The lifetime parameters come from the [`Input`] type:
757
///
758
/// * `'h` is the lifetime of the underlying haystack.
759
///
760
/// When possible, prefer the iterators defined on the regex engine you're
761
/// using. This tries to abstract over the regex engine and is thus a bit more
762
/// unwieldy to use.
763
///
764
/// This iterator is created by [`Searcher::into_half_matches_iter`] and
765
/// then calling [`TryHalfMatchesIter::infallible`].
766
#[derive(Debug)]
767
pub struct HalfMatchesIter<'h, F>(TryHalfMatchesIter<'h, F>);
768
769
impl<'h, F> HalfMatchesIter<'h, F> {
770
    /// Returns the current `Input` used by this iterator.
771
    ///
772
    /// The `Input` returned is generally equivalent to the one used to
773
    /// construct this iterator, but its start position may be different to
774
    /// reflect the start of the next search to be executed.
775
0
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
776
0
        self.0.it.input()
777
0
    }
778
}
779
780
impl<'h, F> Iterator for HalfMatchesIter<'h, F>
781
where
782
    F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
783
{
784
    type Item = HalfMatch;
785
786
    #[inline]
787
0
    fn next(&mut self) -> Option<HalfMatch> {
788
0
        match self.0.next()? {
789
0
            Ok(m) => Some(m),
790
0
            Err(err) => panic!(
791
0
                "unexpected regex half find error: {}\n\
792
0
                 to handle find errors, use 'try' or 'search' methods",
793
0
                err,
794
0
            ),
795
        }
796
0
    }
797
}
798
799
/// An iterator over all non-overlapping matches for a fallible search.
800
///
801
/// The iterator yields a `Result<Match, MatchError>` value until no more
802
/// matches could be found.
803
///
804
/// The type parameters are as follows:
805
///
806
/// * `F` represents the type of a closure that executes the search.
807
///
808
/// The lifetime parameters come from the [`Input`] type:
809
///
810
/// * `'h` is the lifetime of the underlying haystack.
811
///
812
/// When possible, prefer the iterators defined on the regex engine you're
813
/// using. This tries to abstract over the regex engine and is thus a bit more
814
/// unwieldy to use.
815
///
816
/// This iterator is created by [`Searcher::into_matches_iter`].
817
pub struct TryMatchesIter<'h, F> {
818
    it: Searcher<'h>,
819
    finder: F,
820
}
821
822
impl<'h, F> TryMatchesIter<'h, F> {
823
    /// Return an infallible version of this iterator.
824
    ///
825
    /// Any item yielded that corresponds to an error results in a panic. This
826
    /// is useful if your underlying regex engine is configured in a way that
827
    /// it is guaranteed to never return an error.
828
0
    pub fn infallible(self) -> MatchesIter<'h, F> {
829
0
        MatchesIter(self)
830
0
    }
831
832
    /// Returns the current `Input` used by this iterator.
833
    ///
834
    /// The `Input` returned is generally equivalent to the one used to
835
    /// construct this iterator, but its start position may be different to
836
    /// reflect the start of the next search to be executed.
837
0
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
838
0
        self.it.input()
839
0
    }
840
}
841
842
impl<'h, F> Iterator for TryMatchesIter<'h, F>
843
where
844
    F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
845
{
846
    type Item = Result<Match, MatchError>;
847
848
    #[inline]
849
0
    fn next(&mut self) -> Option<Result<Match, MatchError>> {
850
0
        self.it.try_advance(&mut self.finder).transpose()
851
0
    }
852
}
853
854
impl<'h, F> core::fmt::Debug for TryMatchesIter<'h, F> {
855
0
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
856
0
        f.debug_struct("TryMatchesIter")
857
0
            .field("it", &self.it)
858
0
            .field("finder", &"<closure>")
859
0
            .finish()
860
0
    }
861
}
862
863
/// An iterator over all non-overlapping matches for an infallible search.
864
///
865
/// The iterator yields a [`Match`] value until no more matches could be found.
866
///
867
/// The type parameters are as follows:
868
///
869
/// * `F` represents the type of a closure that executes the search.
870
///
871
/// The lifetime parameters come from the [`Input`] type:
872
///
873
/// * `'h` is the lifetime of the underlying haystack.
874
///
875
/// When possible, prefer the iterators defined on the regex engine you're
876
/// using. This tries to abstract over the regex engine and is thus a bit more
877
/// unwieldy to use.
878
///
879
/// This iterator is created by [`Searcher::into_matches_iter`] and
880
/// then calling [`TryMatchesIter::infallible`].
881
#[derive(Debug)]
882
pub struct MatchesIter<'h, F>(TryMatchesIter<'h, F>);
883
884
impl<'h, F> MatchesIter<'h, F> {
885
    /// Returns the current `Input` used by this iterator.
886
    ///
887
    /// The `Input` returned is generally equivalent to the one used to
888
    /// construct this iterator, but its start position may be different to
889
    /// reflect the start of the next search to be executed.
890
0
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
891
0
        self.0.it.input()
892
0
    }
893
}
894
895
impl<'h, F> Iterator for MatchesIter<'h, F>
896
where
897
    F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
898
{
899
    type Item = Match;
900
901
    #[inline]
902
0
    fn next(&mut self) -> Option<Match> {
903
0
        match self.0.next()? {
904
0
            Ok(m) => Some(m),
905
0
            Err(err) => panic!(
906
0
                "unexpected regex find error: {}\n\
907
0
                 to handle find errors, use 'try' or 'search' methods",
908
0
                err,
909
0
            ),
910
        }
911
0
    }
912
}
913
914
/// An iterator over all non-overlapping captures for a fallible search.
915
///
916
/// The iterator yields a `Result<Captures, MatchError>` value until no more
917
/// matches could be found.
918
///
919
/// The type parameters are as follows:
920
///
921
/// * `F` represents the type of a closure that executes the search.
922
///
923
/// The lifetime parameters come from the [`Input`] type:
924
///
925
/// * `'h` is the lifetime of the underlying haystack.
926
///
927
/// When possible, prefer the iterators defined on the regex engine you're
928
/// using. This tries to abstract over the regex engine and is thus a bit more
929
/// unwieldy to use.
930
///
931
/// This iterator is created by [`Searcher::into_captures_iter`].
932
#[cfg(feature = "alloc")]
933
pub struct TryCapturesIter<'h, F> {
934
    it: Searcher<'h>,
935
    caps: Captures,
936
    finder: F,
937
}
938
939
#[cfg(feature = "alloc")]
940
impl<'h, F> TryCapturesIter<'h, F> {
941
    /// Return an infallible version of this iterator.
942
    ///
943
    /// Any item yielded that corresponds to an error results in a panic. This
944
    /// is useful if your underlying regex engine is configured in a way that
945
    /// it is guaranteed to never return an error.
946
0
    pub fn infallible(self) -> CapturesIter<'h, F> {
947
0
        CapturesIter(self)
948
0
    }
949
}
950
951
#[cfg(feature = "alloc")]
952
impl<'h, F> Iterator for TryCapturesIter<'h, F>
953
where
954
    F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
955
{
956
    type Item = Result<Captures, MatchError>;
957
958
    #[inline]
959
0
    fn next(&mut self) -> Option<Result<Captures, MatchError>> {
960
0
        let TryCapturesIter { ref mut it, ref mut caps, ref mut finder } =
961
0
            *self;
962
0
        let result = it
963
0
            .try_advance(|input| {
964
0
                (finder)(input, caps)?;
965
0
                Ok(caps.get_match())
966
0
            })
967
0
            .transpose()?;
968
0
        match result {
969
0
            Ok(_) => Some(Ok(caps.clone())),
970
0
            Err(err) => Some(Err(err)),
971
        }
972
0
    }
973
}
974
975
#[cfg(feature = "alloc")]
976
impl<'h, F> core::fmt::Debug for TryCapturesIter<'h, F> {
977
0
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
978
0
        f.debug_struct("TryCapturesIter")
979
0
            .field("it", &self.it)
980
0
            .field("caps", &self.caps)
981
0
            .field("finder", &"<closure>")
982
0
            .finish()
983
0
    }
984
}
985
986
/// An iterator over all non-overlapping captures for an infallible search.
987
///
988
/// The iterator yields a [`Captures`] value until no more matches could be
989
/// found.
990
///
991
/// The type parameters are as follows:
992
///
993
/// * `F` represents the type of a closure that executes the search.
994
///
995
/// The lifetime parameters come from the [`Input`] type:
996
///
997
/// * `'h` is the lifetime of the underlying haystack.
998
///
999
/// When possible, prefer the iterators defined on the regex engine you're
1000
/// using. This tries to abstract over the regex engine and is thus a bit more
1001
/// unwieldy to use.
1002
///
1003
/// This iterator is created by [`Searcher::into_captures_iter`] and then
1004
/// calling [`TryCapturesIter::infallible`].
1005
#[cfg(feature = "alloc")]
1006
#[derive(Debug)]
1007
pub struct CapturesIter<'h, F>(TryCapturesIter<'h, F>);
1008
1009
#[cfg(feature = "alloc")]
1010
impl<'h, F> Iterator for CapturesIter<'h, F>
1011
where
1012
    F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
1013
{
1014
    type Item = Captures;
1015
1016
    #[inline]
1017
0
    fn next(&mut self) -> Option<Captures> {
1018
0
        match self.0.next()? {
1019
0
            Ok(m) => Some(m),
1020
0
            Err(err) => panic!(
1021
0
                "unexpected regex captures error: {}\n\
1022
0
                 to handle find errors, use 'try' or 'search' methods",
1023
0
                err,
1024
0
            ),
1025
        }
1026
0
    }
1027
}