Coverage Report

Created: 2025-11-16 06:23

next uncovered line (L), next uncovered region (R), next uncovered branch (B)
/src/regex/regex-automata/src/util/iter.rs
Line
Count
Source
1
/*!
2
Generic helpers for iteration of matches from a regex engine in a haystack.
3
4
The principle type in this module is a [`Searcher`]. A `Searcher` provides
5
its own lower level iterator-like API in addition to methods for constructing
6
types that implement `Iterator`. The documentation for `Searcher` explains a
7
bit more about why these different APIs exist.
8
9
Currently, this module supports iteration over any regex engine that works
10
with the [`HalfMatch`], [`Match`] or [`Captures`] types.
11
*/
12
13
#[cfg(feature = "alloc")]
14
use crate::util::captures::Captures;
15
use crate::util::search::{HalfMatch, Input, Match, MatchError};
16
17
/// A searcher for creating iterators and performing lower level iteration.
18
///
19
/// This searcher encapsulates the logic required for finding all successive
20
/// non-overlapping matches in a haystack. In theory, iteration would look
21
/// something like this:
22
///
23
/// 1. Setting the start position to `0`.
24
/// 2. Execute a regex search. If no match, end iteration.
25
/// 3. Report the match and set the start position to the end of the match.
26
/// 4. Go back to (2).
27
///
28
/// And if this were indeed the case, it's likely that `Searcher` wouldn't
29
/// exist. Unfortunately, because a regex may match the empty string, the above
30
/// logic won't work for all possible regexes. Namely, if an empty match is
31
/// found, then step (3) would set the start position of the search to the
32
/// position it was at. Thus, iteration would never end.
33
///
34
/// Instead, a `Searcher` knows how to detect these cases and forcefully
35
/// advance iteration in the case of an empty match that overlaps with a
36
/// previous match.
37
///
38
/// If you know that your regex cannot match any empty string, then the simple
39
/// algorithm described above will work correctly.
40
///
41
/// When possible, prefer the iterators defined on the regex engine you're
42
/// using. This tries to abstract over the regex engine and is thus a bit more
43
/// unwieldy to use.
44
///
45
/// In particular, a `Searcher` is not itself an iterator. Instead, it provides
46
/// `advance` routines that permit moving the search along explicitly. It also
47
/// provides various routines, like [`Searcher::into_matches_iter`], that
48
/// accept a closure (representing how a regex engine executes a search) and
49
/// returns a conventional iterator.
50
///
51
/// The lifetime parameters come from the [`Input`] type passed to
52
/// [`Searcher::new`]:
53
///
54
/// * `'h` is the lifetime of the underlying haystack.
55
///
56
/// # Searcher vs Iterator
57
///
58
/// Why does a search type with "advance" APIs exist at all when we also have
59
/// iterators? Unfortunately, the reasoning behind this split is a complex
60
/// combination of the following things:
61
///
62
/// 1. While many of the regex engines expose their own iterators, it is also
63
/// nice to expose this lower level iteration helper because it permits callers
64
/// to provide their own `Input` configuration. Moreover, a `Searcher` can work
65
/// with _any_ regex engine instead of only the ones defined in this crate.
66
/// This way, everyone benefits from a shared iteration implementation.
67
/// 2. There are many different regex engines that, while they have the same
68
/// match semantics, they have slightly different APIs. Iteration is just
69
/// complex enough to want to share code, and so we need a way of abstracting
70
/// over those different regex engines. While we could define a new trait that
71
/// describes any regex engine search API, it would wind up looking very close
72
/// to a closure. While there may still be reasons for the more generic trait
73
/// to exist, for now and for the purposes of iteration, we use a closure.
74
/// Closures also provide a lot of easy flexibility at the call site, in that
75
/// they permit the caller to borrow any kind of state they want for use during
76
/// each search call.
77
/// 3. As a result of using closures, and because closures are anonymous types
78
/// that cannot be named, it is difficult to encapsulate them without both
79
/// costs to speed and added complexity to the public API. For example, in
80
/// defining an iterator type like
81
/// [`dfa::regex::FindMatches`](crate::dfa::regex::FindMatches),
82
/// if we use a closure internally, it's not possible to name this type in the
83
/// return type of the iterator constructor. Thus, the only way around it is
84
/// to erase the type by boxing it and turning it into a `Box<dyn FnMut ...>`.
85
/// This boxed closure is unlikely to be inlined _and_ it infects the public
86
/// API in subtle ways. Namely, unless you declare the closure as implementing
87
/// `Send` and `Sync`, then the resulting iterator type won't implement it
88
/// either. But there are practical issues with requiring the closure to
89
/// implement `Send` and `Sync` that result in other API complexities that
90
/// are beyond the scope of this already long exposition.
91
/// 4. Some regex engines expose more complex match information than just
92
/// "which pattern matched" and "at what offsets." For example, the PikeVM
93
/// exposes match spans for each capturing group that participated in the
94
/// match. In such cases, it can be quite beneficial to reuse the capturing
95
/// group allocation on subsequent searches. A proper iterator doesn't permit
96
/// this API due to its interface, so it's useful to have something a bit lower
97
/// level that permits callers to amortize allocations while also reusing a
98
/// shared implementation of iteration. (See the documentation for
99
/// [`Searcher::advance`] for an example of using the "advance" API with the
100
/// PikeVM.)
101
///
102
/// What this boils down to is that there are "advance" APIs which require
103
/// handing a closure to it for every call, and there are also APIs to create
104
/// iterators from a closure. The former are useful for _implementing_
105
/// iterators or when you need more flexibility, while the latter are useful
106
/// for conveniently writing custom iterators on-the-fly.
107
///
108
/// # Example: iterating with captures
109
///
110
/// Several regex engines in this crate over convenient iterator APIs over
111
/// [`Captures`] values. To do so, this requires allocating a new `Captures`
112
/// value for each iteration step. This can perhaps be more costly than you
113
/// might want. Instead of implementing your own iterator to avoid that
114
/// cost (which can be a little subtle if you want to handle empty matches
115
/// correctly), you can use this `Searcher` to do it for you:
116
///
117
/// ```
118
/// use regex_automata::{
119
///     nfa::thompson::pikevm::PikeVM,
120
///     util::iter::Searcher,
121
///     Input, Span,
122
/// };
123
///
124
/// let re = PikeVM::new("foo(?P<numbers>[0-9]+)")?;
125
/// let haystack = "foo1 foo12 foo123";
126
///
127
/// let mut caps = re.create_captures();
128
/// let mut cache = re.create_cache();
129
/// let mut matches = vec![];
130
/// let mut searcher = Searcher::new(Input::new(haystack));
131
/// while let Some(_) = searcher.advance(|input| {
132
///     re.search(&mut cache, input, &mut caps);
133
///     Ok(caps.get_match())
134
/// }) {
135
///     // The unwrap is OK since 'numbers' matches if the pattern matches.
136
///     matches.push(caps.get_group_by_name("numbers").unwrap());
137
/// }
138
/// assert_eq!(matches, vec![
139
///     Span::from(3..4),
140
///     Span::from(8..10),
141
///     Span::from(14..17),
142
/// ]);
143
///
144
/// # Ok::<(), Box<dyn std::error::Error>>(())
145
/// ```
146
#[derive(Clone, Debug)]
147
pub struct Searcher<'h> {
148
    /// The input parameters to give to each regex engine call.
149
    ///
150
    /// The start position of the search is mutated during iteration.
151
    input: Input<'h>,
152
    /// Records the end offset of the most recent match. This is necessary to
153
    /// handle a corner case for preventing empty matches from overlapping with
154
    /// the ending bounds of a prior match.
155
    last_match_end: Option<usize>,
156
}
157
158
impl<'h> Searcher<'h> {
159
    /// Create a new fallible non-overlapping matches iterator.
160
    ///
161
    /// The given `input` provides the parameters (including the haystack),
162
    /// while the `finder` represents a closure that calls the underlying regex
163
    /// engine. The closure may borrow any additional state that is needed,
164
    /// such as a prefilter scanner.
165
0
    pub fn new(input: Input<'h>) -> Searcher<'h> {
166
0
        Searcher { input, last_match_end: None }
167
0
    }
168
169
    /// Returns the current `Input` used by this searcher.
170
    ///
171
    /// The `Input` returned is generally equivalent to the one given to
172
    /// [`Searcher::new`], but its start position may be different to reflect
173
    /// the start of the next search to be executed.
174
0
    pub fn input<'s>(&'s self) -> &'s Input<'h> {
175
0
        &self.input
176
0
    }
177
178
    /// Return the next half match for an infallible search if one exists, and
179
    /// advance to the next position.
180
    ///
181
    /// This is like `try_advance_half`, except errors are converted into
182
    /// panics.
183
    ///
184
    /// # Panics
185
    ///
186
    /// If the given closure returns an error, then this panics. This is useful
187
    /// when you know your underlying regex engine has been configured to not
188
    /// return an error.
189
    ///
190
    /// # Example
191
    ///
192
    /// This example shows how to use a `Searcher` to iterate over all matches
193
    /// when using a DFA, which only provides "half" matches.
194
    ///
195
    /// ```
196
    /// use regex_automata::{
197
    ///     hybrid::dfa::DFA,
198
    ///     util::iter::Searcher,
199
    ///     HalfMatch, Input,
200
    /// };
201
    ///
202
    /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
203
    /// let mut cache = re.create_cache();
204
    ///
205
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
206
    /// let mut it = Searcher::new(input);
207
    ///
208
    /// let expected = Some(HalfMatch::must(0, 10));
209
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
210
    /// assert_eq!(expected, got);
211
    ///
212
    /// let expected = Some(HalfMatch::must(0, 21));
213
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
214
    /// assert_eq!(expected, got);
215
    ///
216
    /// let expected = Some(HalfMatch::must(0, 32));
217
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
218
    /// assert_eq!(expected, got);
219
    ///
220
    /// let expected = None;
221
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
222
    /// assert_eq!(expected, got);
223
    ///
224
    /// # Ok::<(), Box<dyn std::error::Error>>(())
225
    /// ```
226
    ///
227
    /// This correctly moves iteration forward even when an empty match occurs:
228
    ///
229
    /// ```
230
    /// use regex_automata::{
231
    ///     hybrid::dfa::DFA,
232
    ///     util::iter::Searcher,
233
    ///     HalfMatch, Input,
234
    /// };
235
    ///
236
    /// let re = DFA::new(r"a|")?;
237
    /// let mut cache = re.create_cache();
238
    ///
239
    /// let input = Input::new("abba");
240
    /// let mut it = Searcher::new(input);
241
    ///
242
    /// let expected = Some(HalfMatch::must(0, 1));
243
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
244
    /// assert_eq!(expected, got);
245
    ///
246
    /// let expected = Some(HalfMatch::must(0, 2));
247
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
248
    /// assert_eq!(expected, got);
249
    ///
250
    /// let expected = Some(HalfMatch::must(0, 4));
251
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
252
    /// assert_eq!(expected, got);
253
    ///
254
    /// let expected = None;
255
    /// let got = it.advance_half(|input| re.try_search_fwd(&mut cache, input));
256
    /// assert_eq!(expected, got);
257
    ///
258
    /// # Ok::<(), Box<dyn std::error::Error>>(())
259
    /// ```
260
    #[inline]
261
    pub fn advance_half<F>(&mut self, finder: F) -> Option<HalfMatch>
262
    where
263
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
264
    {
265
        match self.try_advance_half(finder) {
266
            Ok(m) => m,
267
            Err(err) => panic!(
268
                "unexpected regex half find error: {err}\n\
269
                 to handle find errors, use 'try' or 'search' methods",
270
            ),
271
        }
272
    }
273
274
    /// Return the next match for an infallible search if one exists, and
275
    /// advance to the next position.
276
    ///
277
    /// The search is advanced even in the presence of empty matches by
278
    /// forbidding empty matches from overlapping with any other match.
279
    ///
280
    /// This is like `try_advance`, except errors are converted into panics.
281
    ///
282
    /// # Panics
283
    ///
284
    /// If the given closure returns an error, then this panics. This is useful
285
    /// when you know your underlying regex engine has been configured to not
286
    /// return an error.
287
    ///
288
    /// # Example
289
    ///
290
    /// This example shows how to use a `Searcher` to iterate over all matches
291
    /// when using a regex based on lazy DFAs:
292
    ///
293
    /// ```
294
    /// use regex_automata::{
295
    ///     hybrid::regex::Regex,
296
    ///     util::iter::Searcher,
297
    ///     Match, Input,
298
    /// };
299
    ///
300
    /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
301
    /// let mut cache = re.create_cache();
302
    ///
303
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
304
    /// let mut it = Searcher::new(input);
305
    ///
306
    /// let expected = Some(Match::must(0, 0..10));
307
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
308
    /// assert_eq!(expected, got);
309
    ///
310
    /// let expected = Some(Match::must(0, 11..21));
311
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
312
    /// assert_eq!(expected, got);
313
    ///
314
    /// let expected = Some(Match::must(0, 22..32));
315
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
316
    /// assert_eq!(expected, got);
317
    ///
318
    /// let expected = None;
319
    /// let got = it.advance(|input| re.try_search(&mut cache, input));
320
    /// assert_eq!(expected, got);
321
    ///
322
    /// # Ok::<(), Box<dyn std::error::Error>>(())
323
    /// ```
324
    ///
325
    /// This example shows the same as above, but with the PikeVM. This example
326
    /// is useful because it shows how to use this API even when the regex
327
    /// engine doesn't directly return a `Match`.
328
    ///
329
    /// ```
330
    /// use regex_automata::{
331
    ///     nfa::thompson::pikevm::PikeVM,
332
    ///     util::iter::Searcher,
333
    ///     Match, Input,
334
    /// };
335
    ///
336
    /// let re = PikeVM::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
337
    /// let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
338
    ///
339
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
340
    /// let mut it = Searcher::new(input);
341
    ///
342
    /// let expected = Some(Match::must(0, 0..10));
343
    /// let got = it.advance(|input| {
344
    ///     re.search(&mut cache, input, &mut caps);
345
    ///     Ok(caps.get_match())
346
    /// });
347
    /// // Note that if we wanted to extract capturing group spans, we could
348
    /// // do that here with 'caps'.
349
    /// assert_eq!(expected, got);
350
    ///
351
    /// let expected = Some(Match::must(0, 11..21));
352
    /// let got = it.advance(|input| {
353
    ///     re.search(&mut cache, input, &mut caps);
354
    ///     Ok(caps.get_match())
355
    /// });
356
    /// assert_eq!(expected, got);
357
    ///
358
    /// let expected = Some(Match::must(0, 22..32));
359
    /// let got = it.advance(|input| {
360
    ///     re.search(&mut cache, input, &mut caps);
361
    ///     Ok(caps.get_match())
362
    /// });
363
    /// assert_eq!(expected, got);
364
    ///
365
    /// let expected = None;
366
    /// let got = it.advance(|input| {
367
    ///     re.search(&mut cache, input, &mut caps);
368
    ///     Ok(caps.get_match())
369
    /// });
370
    /// assert_eq!(expected, got);
371
    ///
372
    /// # Ok::<(), Box<dyn std::error::Error>>(())
373
    /// ```
374
    #[inline]
375
0
    pub fn advance<F>(&mut self, finder: F) -> Option<Match>
376
0
    where
377
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
378
    {
379
0
        match self.try_advance(finder) {
380
0
            Ok(m) => m,
381
0
            Err(err) => panic!(
382
0
                "unexpected regex find error: {err}\n\
383
0
                 to handle find errors, use 'try' or 'search' methods",
384
            ),
385
        }
386
0
    }
387
388
    /// Return the next half match for a fallible search if one exists, and
389
    /// advance to the next position.
390
    ///
391
    /// This is like `advance_half`, except it permits callers to handle errors
392
    /// during iteration.
393
    #[inline]
394
    pub fn try_advance_half<F>(
395
        &mut self,
396
        mut finder: F,
397
    ) -> Result<Option<HalfMatch>, MatchError>
398
    where
399
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
400
    {
401
        let mut m = match finder(&self.input)? {
402
            None => return Ok(None),
403
            Some(m) => m,
404
        };
405
        if Some(m.offset()) == self.last_match_end {
406
            m = match self.handle_overlapping_empty_half_match(m, finder)? {
407
                None => return Ok(None),
408
                Some(m) => m,
409
            };
410
        }
411
        self.input.set_start(m.offset());
412
        self.last_match_end = Some(m.offset());
413
        Ok(Some(m))
414
    }
415
416
    /// Return the next match for a fallible search if one exists, and advance
417
    /// to the next position.
418
    ///
419
    /// This is like `advance`, except it permits callers to handle errors
420
    /// during iteration.
421
    #[inline]
422
0
    pub fn try_advance<F>(
423
0
        &mut self,
424
0
        mut finder: F,
425
0
    ) -> Result<Option<Match>, MatchError>
426
0
    where
427
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
428
    {
429
0
        let mut m = match finder(&self.input)? {
430
0
            None => return Ok(None),
431
0
            Some(m) => m,
432
        };
433
0
        if m.is_empty() && Some(m.end()) == self.last_match_end {
434
0
            m = match self.handle_overlapping_empty_match(m, finder)? {
435
0
                None => return Ok(None),
436
0
                Some(m) => m,
437
            };
438
0
        }
439
0
        self.input.set_start(m.end());
440
0
        self.last_match_end = Some(m.end());
441
0
        Ok(Some(m))
442
0
    }
443
444
    /// Given a closure that executes a single search, return an iterator over
445
    /// all successive non-overlapping half matches.
446
    ///
447
    /// The iterator returned yields result values. If the underlying regex
448
    /// engine is configured to never return an error, consider calling
449
    /// [`TryHalfMatchesIter::infallible`] to convert errors into panics.
450
    ///
451
    /// # Example
452
    ///
453
    /// This example shows how to use a `Searcher` to create a proper
454
    /// iterator over half matches.
455
    ///
456
    /// ```
457
    /// use regex_automata::{
458
    ///     hybrid::dfa::DFA,
459
    ///     util::iter::Searcher,
460
    ///     HalfMatch, Input,
461
    /// };
462
    ///
463
    /// let re = DFA::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
464
    /// let mut cache = re.create_cache();
465
    ///
466
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
467
    /// let mut it = Searcher::new(input).into_half_matches_iter(|input| {
468
    ///     re.try_search_fwd(&mut cache, input)
469
    /// });
470
    ///
471
    /// let expected = Some(Ok(HalfMatch::must(0, 10)));
472
    /// assert_eq!(expected, it.next());
473
    ///
474
    /// let expected = Some(Ok(HalfMatch::must(0, 21)));
475
    /// assert_eq!(expected, it.next());
476
    ///
477
    /// let expected = Some(Ok(HalfMatch::must(0, 32)));
478
    /// assert_eq!(expected, it.next());
479
    ///
480
    /// let expected = None;
481
    /// assert_eq!(expected, it.next());
482
    ///
483
    /// # Ok::<(), Box<dyn std::error::Error>>(())
484
    /// ```
485
    #[inline]
486
    pub fn into_half_matches_iter<F>(
487
        self,
488
        finder: F,
489
    ) -> TryHalfMatchesIter<'h, F>
490
    where
491
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
492
    {
493
        TryHalfMatchesIter { it: self, finder }
494
    }
495
496
    /// Given a closure that executes a single search, return an iterator over
497
    /// all successive non-overlapping matches.
498
    ///
499
    /// The iterator returned yields result values. If the underlying regex
500
    /// engine is configured to never return an error, consider calling
501
    /// [`TryMatchesIter::infallible`] to convert errors into panics.
502
    ///
503
    /// # Example
504
    ///
505
    /// This example shows how to use a `Searcher` to create a proper
506
    /// iterator over matches.
507
    ///
508
    /// ```
509
    /// use regex_automata::{
510
    ///     hybrid::regex::Regex,
511
    ///     util::iter::Searcher,
512
    ///     Match, Input,
513
    /// };
514
    ///
515
    /// let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
516
    /// let mut cache = re.create_cache();
517
    ///
518
    /// let input = Input::new("2010-03-14 2016-10-08 2020-10-22");
519
    /// let mut it = Searcher::new(input).into_matches_iter(|input| {
520
    ///     re.try_search(&mut cache, input)
521
    /// });
522
    ///
523
    /// let expected = Some(Ok(Match::must(0, 0..10)));
524
    /// assert_eq!(expected, it.next());
525
    ///
526
    /// let expected = Some(Ok(Match::must(0, 11..21)));
527
    /// assert_eq!(expected, it.next());
528
    ///
529
    /// let expected = Some(Ok(Match::must(0, 22..32)));
530
    /// assert_eq!(expected, it.next());
531
    ///
532
    /// let expected = None;
533
    /// assert_eq!(expected, it.next());
534
    ///
535
    /// # Ok::<(), Box<dyn std::error::Error>>(())
536
    /// ```
537
    #[inline]
538
    pub fn into_matches_iter<F>(self, finder: F) -> TryMatchesIter<'h, F>
539
    where
540
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
541
    {
542
        TryMatchesIter { it: self, finder }
543
    }
544
545
    /// Given a closure that executes a single search, return an iterator over
546
    /// all successive non-overlapping `Captures` values.
547
    ///
548
    /// The iterator returned yields result values. If the underlying regex
549
    /// engine is configured to never return an error, consider calling
550
    /// [`TryCapturesIter::infallible`] to convert errors into panics.
551
    ///
552
    /// Unlike the other iterator constructors, this accepts an initial
553
    /// `Captures` value. This `Captures` value is reused for each search, and
554
    /// the iterator implementation clones it before returning it. The caller
555
    /// must provide this value because the iterator is purposely ignorant
556
    /// of the underlying regex engine and thus doesn't know how to create
557
    /// one itself. More to the point, a `Captures` value itself has a few
558
    /// different constructors, which change which kind of information is
559
    /// available to query in exchange for search performance.
560
    ///
561
    /// # Example
562
    ///
563
    /// This example shows how to use a `Searcher` to create a proper iterator
564
    /// over `Captures` values, which provides access to all capturing group
565
    /// spans for each match.
566
    ///
567
    /// ```
568
    /// use regex_automata::{
569
    ///     nfa::thompson::pikevm::PikeVM,
570
    ///     util::iter::Searcher,
571
    ///     Input,
572
    /// };
573
    ///
574
    /// let re = PikeVM::new(
575
    ///     r"(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})",
576
    /// )?;
577
    /// let (mut cache, caps) = (re.create_cache(), re.create_captures());
578
    ///
579
    /// let haystack = "2010-03-14 2016-10-08 2020-10-22";
580
    /// let input = Input::new(haystack);
581
    /// let mut it = Searcher::new(input)
582
    ///     .into_captures_iter(caps, |input, caps| {
583
    ///         re.search(&mut cache, input, caps);
584
    ///         Ok(())
585
    ///     });
586
    ///
587
    /// let got = it.next().expect("first date")?;
588
    /// let year = got.get_group_by_name("y").expect("must match");
589
    /// assert_eq!("2010", &haystack[year]);
590
    ///
591
    /// let got = it.next().expect("second date")?;
592
    /// let month = got.get_group_by_name("m").expect("must match");
593
    /// assert_eq!("10", &haystack[month]);
594
    ///
595
    /// let got = it.next().expect("third date")?;
596
    /// let day = got.get_group_by_name("d").expect("must match");
597
    /// assert_eq!("22", &haystack[day]);
598
    ///
599
    /// assert!(it.next().is_none());
600
    ///
601
    /// # Ok::<(), Box<dyn std::error::Error>>(())
602
    /// ```
603
    #[cfg(feature = "alloc")]
604
    #[inline]
605
    pub fn into_captures_iter<F>(
606
        self,
607
        caps: Captures,
608
        finder: F,
609
    ) -> TryCapturesIter<'h, F>
610
    where
611
        F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
612
    {
613
        TryCapturesIter { it: self, caps, finder }
614
    }
615
616
    /// Handles the special case of a match that begins where the previous
617
    /// match ended. Without this special handling, it'd be possible to get
618
    /// stuck where an empty match never results in forward progress. This
619
    /// also makes it more consistent with how presiding general purpose regex
620
    /// engines work.
621
    #[cold]
622
    #[inline(never)]
623
    fn handle_overlapping_empty_half_match<F>(
624
        &mut self,
625
        _: HalfMatch,
626
        mut finder: F,
627
    ) -> Result<Option<HalfMatch>, MatchError>
628
    where
629
        F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
630
    {
631
        // Since we are only here when 'm.offset()' matches the offset of the
632
        // last match, it follows that this must have been an empty match.
633
        // Since we both need to make progress *and* prevent overlapping
634
        // matches, we discard this match and advance the search by 1.
635
        //
636
        // Note that this may start a search in the middle of a codepoint. The
637
        // regex engines themselves are expected to deal with that and not
638
        // report any matches within a codepoint if they are configured in
639
        // UTF-8 mode.
640
        self.input.set_start(self.input.start().checked_add(1).unwrap());
641
        finder(&self.input)
642
    }
643
644
    /// Handles the special case of an empty match by ensuring that 1) the
645
    /// iterator always advances and 2) empty matches never overlap with other
646
    /// matches.
647
    ///
648
    /// (1) is necessary because we principally make progress by setting the
649
    /// starting location of the next search to the ending location of the last
650
    /// match. But if a match is empty, then this results in a search that does
651
    /// not advance and thus does not terminate.
652
    ///
653
    /// (2) is not strictly necessary, but makes intuitive sense and matches
654
    /// the presiding behavior of most general purpose regex engines. The
655
    /// "intuitive sense" here is that we want to report NON-overlapping
656
    /// matches. So for example, given the regex 'a|(?:)' against the haystack
657
    /// 'a', without the special handling, you'd get the matches [0, 1) and [1,
658
    /// 1), where the latter overlaps with the end bounds of the former.
659
    ///
660
    /// Note that we mark this cold and forcefully prevent inlining because
661
    /// handling empty matches like this is extremely rare and does require
662
    /// quite a bit of code, comparatively. Keeping this code out of the main
663
    /// iterator function keeps it smaller and more amenable to inlining
664
    /// itself.
665
    #[cold]
666
    #[inline(never)]
667
0
    fn handle_overlapping_empty_match<F>(
668
0
        &mut self,
669
0
        m: Match,
670
0
        mut finder: F,
671
0
    ) -> Result<Option<Match>, MatchError>
672
0
    where
673
0
        F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
674
    {
675
0
        assert!(m.is_empty());
676
0
        self.input.set_start(self.input.start().checked_add(1).unwrap());
677
0
        finder(&self.input)
678
0
    }
679
}
680
681
/// An iterator over all non-overlapping half matches for a fallible search.
682
///
683
/// The iterator yields a `Result<HalfMatch, MatchError>` value until no more
684
/// matches could be found.
685
///
686
/// The type parameters are as follows:
687
///
688
/// * `F` represents the type of a closure that executes the search.
689
///
690
/// The lifetime parameters come from the [`Input`] type:
691
///
692
/// * `'h` is the lifetime of the underlying haystack.
693
///
694
/// When possible, prefer the iterators defined on the regex engine you're
695
/// using. This tries to abstract over the regex engine and is thus a bit more
696
/// unwieldy to use.
697
///
698
/// This iterator is created by [`Searcher::into_half_matches_iter`].
699
pub struct TryHalfMatchesIter<'h, F> {
700
    it: Searcher<'h>,
701
    finder: F,
702
}
703
704
impl<'h, F> TryHalfMatchesIter<'h, F> {
705
    /// Return an infallible version of this iterator.
706
    ///
707
    /// Any item yielded that corresponds to an error results in a panic. This
708
    /// is useful if your underlying regex engine is configured in a way that
709
    /// it is guaranteed to never return an error.
710
    pub fn infallible(self) -> HalfMatchesIter<'h, F> {
711
        HalfMatchesIter(self)
712
    }
713
714
    /// Returns the current `Input` used by this iterator.
715
    ///
716
    /// The `Input` returned is generally equivalent to the one used to
717
    /// construct this iterator, but its start position may be different to
718
    /// reflect the start of the next search to be executed.
719
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
720
        self.it.input()
721
    }
722
}
723
724
impl<'h, F> Iterator for TryHalfMatchesIter<'h, F>
725
where
726
    F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
727
{
728
    type Item = Result<HalfMatch, MatchError>;
729
730
    #[inline]
731
    fn next(&mut self) -> Option<Result<HalfMatch, MatchError>> {
732
        self.it.try_advance_half(&mut self.finder).transpose()
733
    }
734
}
735
736
impl<'h, F> core::fmt::Debug for TryHalfMatchesIter<'h, F> {
737
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
738
        f.debug_struct("TryHalfMatchesIter")
739
            .field("it", &self.it)
740
            .field("finder", &"<closure>")
741
            .finish()
742
    }
743
}
744
745
/// An iterator over all non-overlapping half matches for an infallible search.
746
///
747
/// The iterator yields a [`HalfMatch`] value until no more matches could be
748
/// found.
749
///
750
/// The type parameters are as follows:
751
///
752
/// * `F` represents the type of a closure that executes the search.
753
///
754
/// The lifetime parameters come from the [`Input`] type:
755
///
756
/// * `'h` is the lifetime of the underlying haystack.
757
///
758
/// When possible, prefer the iterators defined on the regex engine you're
759
/// using. This tries to abstract over the regex engine and is thus a bit more
760
/// unwieldy to use.
761
///
762
/// This iterator is created by [`Searcher::into_half_matches_iter`] and
763
/// then calling [`TryHalfMatchesIter::infallible`].
764
#[derive(Debug)]
765
pub struct HalfMatchesIter<'h, F>(TryHalfMatchesIter<'h, F>);
766
767
impl<'h, F> HalfMatchesIter<'h, F> {
768
    /// Returns the current `Input` used by this iterator.
769
    ///
770
    /// The `Input` returned is generally equivalent to the one used to
771
    /// construct this iterator, but its start position may be different to
772
    /// reflect the start of the next search to be executed.
773
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
774
        self.0.it.input()
775
    }
776
}
777
778
impl<'h, F> Iterator for HalfMatchesIter<'h, F>
779
where
780
    F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>,
781
{
782
    type Item = HalfMatch;
783
784
    #[inline]
785
    fn next(&mut self) -> Option<HalfMatch> {
786
        match self.0.next()? {
787
            Ok(m) => Some(m),
788
            Err(err) => panic!(
789
                "unexpected regex half find error: {err}\n\
790
                 to handle find errors, use 'try' or 'search' methods",
791
            ),
792
        }
793
    }
794
}
795
796
/// An iterator over all non-overlapping matches for a fallible search.
797
///
798
/// The iterator yields a `Result<Match, MatchError>` value until no more
799
/// matches could be found.
800
///
801
/// The type parameters are as follows:
802
///
803
/// * `F` represents the type of a closure that executes the search.
804
///
805
/// The lifetime parameters come from the [`Input`] type:
806
///
807
/// * `'h` is the lifetime of the underlying haystack.
808
///
809
/// When possible, prefer the iterators defined on the regex engine you're
810
/// using. This tries to abstract over the regex engine and is thus a bit more
811
/// unwieldy to use.
812
///
813
/// This iterator is created by [`Searcher::into_matches_iter`].
814
pub struct TryMatchesIter<'h, F> {
815
    it: Searcher<'h>,
816
    finder: F,
817
}
818
819
impl<'h, F> TryMatchesIter<'h, F> {
820
    /// Return an infallible version of this iterator.
821
    ///
822
    /// Any item yielded that corresponds to an error results in a panic. This
823
    /// is useful if your underlying regex engine is configured in a way that
824
    /// it is guaranteed to never return an error.
825
    pub fn infallible(self) -> MatchesIter<'h, F> {
826
        MatchesIter(self)
827
    }
828
829
    /// Returns the current `Input` used by this iterator.
830
    ///
831
    /// The `Input` returned is generally equivalent to the one used to
832
    /// construct this iterator, but its start position may be different to
833
    /// reflect the start of the next search to be executed.
834
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
835
        self.it.input()
836
    }
837
}
838
839
impl<'h, F> Iterator for TryMatchesIter<'h, F>
840
where
841
    F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
842
{
843
    type Item = Result<Match, MatchError>;
844
845
    #[inline]
846
    fn next(&mut self) -> Option<Result<Match, MatchError>> {
847
        self.it.try_advance(&mut self.finder).transpose()
848
    }
849
}
850
851
impl<'h, F> core::fmt::Debug for TryMatchesIter<'h, F> {
852
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
853
        f.debug_struct("TryMatchesIter")
854
            .field("it", &self.it)
855
            .field("finder", &"<closure>")
856
            .finish()
857
    }
858
}
859
860
/// An iterator over all non-overlapping matches for an infallible search.
861
///
862
/// The iterator yields a [`Match`] value until no more matches could be found.
863
///
864
/// The type parameters are as follows:
865
///
866
/// * `F` represents the type of a closure that executes the search.
867
///
868
/// The lifetime parameters come from the [`Input`] type:
869
///
870
/// * `'h` is the lifetime of the underlying haystack.
871
///
872
/// When possible, prefer the iterators defined on the regex engine you're
873
/// using. This tries to abstract over the regex engine and is thus a bit more
874
/// unwieldy to use.
875
///
876
/// This iterator is created by [`Searcher::into_matches_iter`] and
877
/// then calling [`TryMatchesIter::infallible`].
878
#[derive(Debug)]
879
pub struct MatchesIter<'h, F>(TryMatchesIter<'h, F>);
880
881
impl<'h, F> MatchesIter<'h, F> {
882
    /// Returns the current `Input` used by this iterator.
883
    ///
884
    /// The `Input` returned is generally equivalent to the one used to
885
    /// construct this iterator, but its start position may be different to
886
    /// reflect the start of the next search to be executed.
887
    pub fn input<'i>(&'i self) -> &'i Input<'h> {
888
        self.0.it.input()
889
    }
890
}
891
892
impl<'h, F> Iterator for MatchesIter<'h, F>
893
where
894
    F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>,
895
{
896
    type Item = Match;
897
898
    #[inline]
899
    fn next(&mut self) -> Option<Match> {
900
        match self.0.next()? {
901
            Ok(m) => Some(m),
902
            Err(err) => panic!(
903
                "unexpected regex find error: {err}\n\
904
                 to handle find errors, use 'try' or 'search' methods",
905
            ),
906
        }
907
    }
908
}
909
910
/// An iterator over all non-overlapping captures for a fallible search.
911
///
912
/// The iterator yields a `Result<Captures, MatchError>` value until no more
913
/// matches could be found.
914
///
915
/// The type parameters are as follows:
916
///
917
/// * `F` represents the type of a closure that executes the search.
918
///
919
/// The lifetime parameters come from the [`Input`] type:
920
///
921
/// * `'h` is the lifetime of the underlying haystack.
922
///
923
/// When possible, prefer the iterators defined on the regex engine you're
924
/// using. This tries to abstract over the regex engine and is thus a bit more
925
/// unwieldy to use.
926
///
927
/// This iterator is created by [`Searcher::into_captures_iter`].
928
#[cfg(feature = "alloc")]
929
pub struct TryCapturesIter<'h, F> {
930
    it: Searcher<'h>,
931
    caps: Captures,
932
    finder: F,
933
}
934
935
#[cfg(feature = "alloc")]
936
impl<'h, F> TryCapturesIter<'h, F> {
937
    /// Return an infallible version of this iterator.
938
    ///
939
    /// Any item yielded that corresponds to an error results in a panic. This
940
    /// is useful if your underlying regex engine is configured in a way that
941
    /// it is guaranteed to never return an error.
942
    pub fn infallible(self) -> CapturesIter<'h, F> {
943
        CapturesIter(self)
944
    }
945
}
946
947
#[cfg(feature = "alloc")]
948
impl<'h, F> Iterator for TryCapturesIter<'h, F>
949
where
950
    F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
951
{
952
    type Item = Result<Captures, MatchError>;
953
954
    #[inline]
955
    fn next(&mut self) -> Option<Result<Captures, MatchError>> {
956
        let TryCapturesIter { ref mut it, ref mut caps, ref mut finder } =
957
            *self;
958
        let result = it
959
            .try_advance(|input| {
960
                (finder)(input, caps)?;
961
                Ok(caps.get_match())
962
            })
963
            .transpose()?;
964
        match result {
965
            Ok(_) => Some(Ok(caps.clone())),
966
            Err(err) => Some(Err(err)),
967
        }
968
    }
969
}
970
971
#[cfg(feature = "alloc")]
972
impl<'h, F> core::fmt::Debug for TryCapturesIter<'h, F> {
973
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
974
        f.debug_struct("TryCapturesIter")
975
            .field("it", &self.it)
976
            .field("caps", &self.caps)
977
            .field("finder", &"<closure>")
978
            .finish()
979
    }
980
}
981
982
/// An iterator over all non-overlapping captures for an infallible search.
983
///
984
/// The iterator yields a [`Captures`] value until no more matches could be
985
/// found.
986
///
987
/// The type parameters are as follows:
988
///
989
/// * `F` represents the type of a closure that executes the search.
990
///
991
/// The lifetime parameters come from the [`Input`] type:
992
///
993
/// * `'h` is the lifetime of the underlying haystack.
994
///
995
/// When possible, prefer the iterators defined on the regex engine you're
996
/// using. This tries to abstract over the regex engine and is thus a bit more
997
/// unwieldy to use.
998
///
999
/// This iterator is created by [`Searcher::into_captures_iter`] and then
1000
/// calling [`TryCapturesIter::infallible`].
1001
#[cfg(feature = "alloc")]
1002
#[derive(Debug)]
1003
pub struct CapturesIter<'h, F>(TryCapturesIter<'h, F>);
1004
1005
#[cfg(feature = "alloc")]
1006
impl<'h, F> Iterator for CapturesIter<'h, F>
1007
where
1008
    F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>,
1009
{
1010
    type Item = Captures;
1011
1012
    #[inline]
1013
    fn next(&mut self) -> Option<Captures> {
1014
        match self.0.next()? {
1015
            Ok(m) => Some(m),
1016
            Err(err) => panic!(
1017
                "unexpected regex captures error: {err}\n\
1018
                 to handle find errors, use 'try' or 'search' methods",
1019
            ),
1020
        }
1021
    }
1022
}