xdev.patterns module

An encapsulation of regex and glob (and maybe other) patterns.

Note

This implementation is maintained in kwutil and xdev. These versions should be kept in sync.

See:

~/code/kwutil/kwutil/util_pattern.py
~/code/xdev/xdev/patterns.py

Todo

rectify with xdev / whatever package this goes in

class xdev.patterns.PatternBase[source]

Bases: object

Abstract class that defines the Pattern api

match(text)[source]
search(text)[source]
sub(repl, text)[source]
xdev.patterns._maybe_expandable_glob(pat)[source]

Determine if a string might be an expandable glob pattern by looking for special glob characters: *, ? and [].

Note

! is also special, but it only appears inside a [] bracket, so we don't need to check for it.

Returns:

False if the input is definitely not an expandable glob pattern (it could still be a glob pattern, but one that is equivalent to strict matching). True if the string contains special glob characters, although it is not guaranteed to be a valid glob pattern.

Return type:

bool
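
The check described above can be sketched as a plain character scan. This is an illustration of the documented behavior, not the module's source; the names maybe_expandable_glob and GLOB_SPECIAL_CHARS are hypothetical.

```python
# Characters that make a string a candidate for glob expansion.
GLOB_SPECIAL_CHARS = frozenset('*?[]')


def maybe_expandable_glob(pat: str) -> bool:
    """Return True if ``pat`` contains any special glob character."""
    # '!' is deliberately not checked: it is only special inside [...]
    return any(char in GLOB_SPECIAL_CHARS for char in pat)


print(maybe_expandable_glob('foo'))        # False
print(maybe_expandable_glob('foo*.txt'))   # True
print(maybe_expandable_glob('data[0-9'))   # True, although not a valid glob
```

The last call shows why True is only a hint: an unclosed bracket still contains a special character.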

class xdev.patterns.Pattern(pattern, backend)[source]

Bases: PatternBase, NiceRepr

Provides a common API to several common pattern matching syntaxes.

A general pattern class that can use any backend listed in BACKENDS.

Parameters:
  • pattern (str | object) – The pattern text or a precompiled backend pattern object

  • backend (str) – Code indicating which backend the pattern text should be interpreted with. See BACKENDS for available choices.

Notes

# BACKENDS

The glob backend uses the fnmatch module [fnmatch_docs]. The regex backend uses the Python re module. The strict backend uses “==” string equality testing. The parse backend uses the parse module.
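
To make the difference between the backends concrete, the sketch below calls the underlying standard-library facilities directly rather than going through Pattern. How Pattern itself invokes these modules is not shown here, and the third-party parse backend is left out.

```python
import fnmatch
import re

text = 'foobar'

# glob semantics: the whole string must match the wildcard pattern
glob_hit = fnmatch.fnmatchcase(text, 'foo*')

# regex semantics: re.match only anchors at the start of the string
regex_hit = re.match('foo.*', text) is not None

# strict semantics: plain equality, so '*' is an ordinary character
strict_hit = (text == 'foo*')

print(glob_hit, regex_hit, strict_hit)  # True True False

# In fnmatch syntax, [...] is a character class that matches ONE character,
# so '[foo|bar]' does not mean "foo or bar".
class_hit_word = fnmatch.fnmatchcase('foo', '[foo|bar]')
class_hit_char = fnmatch.fnmatchcase('f', '[foo|bar]')
print(class_hit_word, class_hit_char)  # False True
```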

References

Example

>>> # Test Regex backend
>>> repat = Pattern.coerce('foo.*', 'regex')
>>> assert repat.match('foobar')
>>> assert not repat.match('barfoo')
>>> # As in the re module, search scans the whole string while match only tests the start
>>> match = repat.search('baz-biz-foobar')
>>> match = repat.match('baz-biz-foobar')
>>> # Test Glob backend
>>> globpat = Pattern.coerce('foo*', 'glob')
>>> assert globpat.match('foobar')
>>> assert not globpat.match('barfoo')
>>> # In fnmatch syntax, [...] is a character class that matches one character
>>> globpat = Pattern.coerce('[foo|bar]', 'glob')
>>> globpat.match('foo')

Example

>>> # xdoctest: +REQUIRES(module:parse)
>>> # Test parse backend
>>> import ubelt as ub
>>> pattern1 = Pattern.coerce('A {adjective} pattern', 'parse')
>>> result1 = pattern1.match('A cool pattern')
>>> print(f'result1.named = {ub.urepr(result1.named, nl=1)}')
>>> pattern2 = pattern1.to_regex()
>>> result2 = pattern2.match('A cool pattern')
to_regex()[source]

Returns an equivalent pattern with the regular expression backend

Returns:

Pattern

Example

>>> globpat = Pattern.coerce('foo*', 'glob')
>>> strictpat = Pattern.coerce('foo*', 'strict')
>>> repat1 = strictpat.to_regex()
>>> repat2 = globpat.to_regex()
>>> print(f'repat1={repat1}')
>>> print(f'repat2={repat2}')
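
A rough sketch of how glob and strict patterns can be expressed as regular expressions with the standard library. It assumes fnmatch.translate and re.escape as the conversion tools; whether to_regex uses exactly these calls is an assumption.

```python
import fnmatch
import re

# A glob pattern can be translated into an equivalent regular expression.
glob_as_regex = re.compile(fnmatch.translate('foo*'))

# A strict pattern becomes a regex by escaping every special character.
strict_as_regex = re.compile(re.escape('foo*'))

print(bool(glob_as_regex.match('foobar')))        # True
print(bool(strict_as_regex.fullmatch('foo*')))    # True
print(bool(strict_as_regex.fullmatch('foobar')))  # False
```

The escaped form keeps strict semantics: the literal text 'foo*' matches, while 'foobar' does not.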
classmethod from_regex(data, flags=0, multiline=False, dotall=False, ignorecase=False)[source]

Create a Pattern object with a regex backend.

classmethod from_glob(data)[source]

Create a Pattern object with a glob backend.

classmethod coerce_backend(data, hint='auto')[source]

Example

>>> import re
>>> assert Pattern.coerce_backend('foo', hint='auto') == 'strict'
>>> assert Pattern.coerce_backend('foo*', hint='auto') == 'glob'
>>> assert Pattern.coerce_backend(re.compile('foo*'), hint='auto') == 'regex'
classmethod coerce(data, hint='auto')[source]

Attempt to automatically interpret the input data as the appropriate pattern. If the interpretation cannot be determined, fall back to the hint.

Parameters:
  • data (str | Pattern | PathLike)

  • hint (str) – can be ‘glob’, ‘regex’, ‘strict’ or ‘auto’. With ‘auto’, ‘glob’ is used if the input is a string that contains ‘*’; otherwise ‘strict’ is used. Pattern inputs keep their existing interpretation.
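
The ‘auto’ rule can be sketched as a small dispatch function. The helper guess_backend is hypothetical and only mirrors the rule as documented above, plus the assumption that a precompiled regular expression is always treated as ‘regex’.

```python
import re


def guess_backend(data, hint='auto'):
    """Hypothetical helper mirroring the documented 'auto' rule."""
    if isinstance(data, re.Pattern):
        # Assumption: precompiled regular expressions keep their interpretation
        return 'regex'
    if hint != 'auto':
        # An explicit hint decides how plain text is interpreted
        return hint
    # Documented rule: text containing '*' is a glob, anything else is strict
    return 'glob' if '*' in str(data) else 'strict'


print(guess_backend('foo'))               # strict
print(guess_backend('foo*'))              # glob
print(guess_backend(re.compile('foo*')))  # regex
print(guess_backend('foo*', 'regex'))     # regex
```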

Example

>>> import ubelt as ub
>>> pat = Pattern.coerce('foo*', 'glob')
>>> pat2 = Pattern.coerce(pat, 'regex')
>>> print('pat = {}'.format(ub.urepr(pat, nl=1)))
>>> print('pat2 = {}'.format(ub.urepr(pat2, nl=1)))
match(text)[source]
search(text)[source]
sub(repl, text, count=-1)[source]
Parameters:
  • repl (str) – text to insert in place of pattern

  • text (str) – text to be searched and modified

  • count (int) – if non-negative, the maximum number of replacements that will be made.
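
The count convention differs from re.sub, where 0 means "no limit". The wrapper below is a sketch that follows the description above literally; sub_with_count is a hypothetical name, and treating count=0 as "replace nothing" is an assumption rather than confirmed behavior of this method.

```python
import re


def sub_with_count(pattern, repl, text, count=-1):
    """Hypothetical wrapper: a negative count means 'replace everything'."""
    if count == 0:
        # Zero replacements requested; re.sub would read 0 as unlimited
        return text
    if count < 0:
        # re.sub uses count=0 to mean 'no limit'
        return re.sub(pattern, repl, text, count=0)
    return re.sub(pattern, repl, text, count=count)


print(sub_with_count('o', '0', 'foo boo'))     # f00 b00
print(sub_with_count('o', '0', 'foo boo', 1))  # f0o boo
print(sub_with_count('o', '0', 'foo boo', 0))  # foo boo
```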

paths(cwd=None, recursive=False)[source]

Find paths in the filesystem that match this pattern

Yields:

ub.Path
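
For glob patterns, the effect of cwd and recursive can be approximated with the standard glob module. The helper find_paths is hypothetical, and the sketch yields pathlib.Path objects instead of ub.Path.

```python
import glob
import os
import pathlib
import tempfile


def find_paths(pattern, cwd, recursive=False):
    """Hypothetical helper: yield paths under ``cwd`` that match a glob."""
    # Escape cwd so special characters in the directory name stay literal
    full_pattern = os.path.join(glob.escape(str(cwd)), pattern)
    for path in glob.iglob(full_pattern, recursive=recursive):
        yield pathlib.Path(path)


with tempfile.TemporaryDirectory() as dpath:
    root = pathlib.Path(dpath)
    (root / 'dir1').mkdir()
    (root / 'file0.txt').touch()
    (root / 'dir1' / 'file1.txt').touch()

    # '*' never crosses a directory separator
    flat = sorted(p.name for p in find_paths('*.txt', root))
    # With recursive=True, '**' matches zero or more directories
    deep = sorted(p.name for p in find_paths('**/*.txt', root, recursive=True))

print(flat)  # ['file0.txt']
print(deep)  # ['file0.txt', 'file1.txt']
```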

class xdev.patterns.MultiPattern(patterns, predicate)[source]

Bases: PatternBase, NiceRepr

Example

>>> import ubelt as ub
>>> dpath = ub.Path.appdir('xdev/tests/multipattern_paths').ensuredir().delete().ensuredir()
>>> (dpath / 'file0.txt').touch()
>>> (dpath / 'data0.dat').touch()
>>> (dpath / 'other0.txt').touch()
>>> ((dpath / 'dir1').ensuredir() / 'file1.txt').touch()
>>> ((dpath / 'dir2').ensuredir() / 'file2.txt').touch()
>>> ((dpath / 'dir2').ensuredir() / 'file3.txt').touch()
>>> ((dpath / 'dir1').ensuredir() / 'data.dat').touch()
>>> ((dpath / 'dir2').ensuredir() / 'data.dat').touch()
>>> ((dpath / 'dir2').ensuredir() / 'data.dat').touch()
>>> pat = MultiPattern.coerce(['*.txt'], 'glob')
>>> print(list(pat.paths(cwd=dpath)))
>>> pat = MultiPattern.coerce(['*0*', '**/*.txt'], 'glob')
>>> print(list(pat.paths(cwd=dpath, recursive=1)))
>>> pat = MultiPattern.coerce(['*.txt', '**/*.txt', '**/*.dat'], 'glob')
>>> print(list(pat.paths(cwd=dpath)))
match(text)[source]
paths(cwd=None, recursive=False)[source]
_squeeze()[source]
classmethod coerce(data, hint='auto', predicate='any')[source]
Parameters:
  • data (str | List | Pattern | PathLike | MultiPattern)

  • hint (str) – can be ‘glob’, ‘regex’, ‘strict’ or ‘auto’. With ‘auto’, ‘glob’ is used if the input is a string that contains ‘*’; otherwise ‘strict’ is used. Pattern inputs keep their existing interpretation.

Returns:

MultiPattern
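
The predicate argument is not described here. One plausible reading, assumed in the sketch below, is that it names the reducer that combines the individual pattern results, with ‘any’ corresponding to the builtin any. The helper multi_match is hypothetical and uses fnmatch directly.

```python
import fnmatch


def multi_match(text, patterns, predicate=any):
    """Hypothetical helper: combine several glob checks with any() or all()."""
    return predicate(fnmatch.fnmatchcase(text, pat) for pat in patterns)


print(multi_match('file0.txt', ['*.txt', '*.dat']))       # True
print(multi_match('file0.txt', ['*.txt', '*.dat'], all))  # False
print(multi_match('file0.txt', ['*.txt', '*0*'], all))    # True
```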

Example

>>> import ubelt as ub
>>> pat = MultiPattern.coerce('foo*', 'glob')
>>> pat2 = MultiPattern.coerce(pat, 'regex')
>>> pat3 = MultiPattern.coerce([pat, pat], 'regex')
>>> pat4 = MultiPattern.coerce([ub.Path('bar*'), pat], 'regex')
>>> print('pat = {}'.format(ub.urepr(pat, nl=1)))
>>> print('pat2 = {}'.format(ub.urepr(pat2, nl=1)))
>>> print('pat3 = {!r}'.format(pat3))
>>> print('pat4 = {!r}'.format(pat4))
>>> pat00 = MultiPattern.coerce('foo', 'glob')
>>> pat01 = MultiPattern.coerce('foo*', 'glob')
>>> pat02 = MultiPattern.coerce('foo*', 'regex')
>>> pat5 = MultiPattern.coerce(['foo', 'foo*', pat, pat00, pat01, pat02])
>>> print(f'pat5={pat5}')

Example

>>> # Test all acceptable input types
>>> import itertools as it
>>> import ubelt as ub
>>> str_pat = 'pattern*'
>>> scalar_inputs = {
>>>     'str': str_pat,
>>>     'path': ub.Path(str_pat),
>>>     'pat': Pattern.coerce(str_pat),
>>>     'mpat': MultiPattern.coerce(str_pat)
>>> }
>>> # Test scalar input types
>>> scalar_outputs = {}
>>> for k, v in scalar_inputs.items():
>>>     scalar_outputs[k] = MultiPattern.coerce(v)
>>> print('scalar_outputs = {}'.format(ub.urepr(scalar_outputs, nl=1)))
>>> #
>>> # Test iterable input types
>>> multi_outputs = []
>>> for v in it.combinations(scalar_inputs.values(), 2):
>>>     multi_outputs.append(MultiPattern.coerce(v))
>>> for v in it.combinations(scalar_inputs.values(), 3):
>>>     multi_outputs.append(MultiPattern.coerce(v))
>>> # Higher order nesting test
>>> higher_order_output = MultiPattern.coerce(multi_outputs)
>>> print('higher_order_output = {}'.format(ub.urepr(higher_order_output, nl=1)))