Releases: jonclayden/ore
Version 1.7.5
- There is now special handling of repeated zero-length matches to avoid
infinite loops. Specifically, a zero-length match is no longer allowed at the
same offset in the string as the last zero-length match; if there is no
non-empty match available, the starting offset is first advanced by one
character.
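
A pattern that can match the empty string is the typical case this affects. The following is a minimal sketch, assuming the `ore` package is installed; the pattern and text are made up for illustration, and no particular output is claimed.

```r
library(ore)

# "a*" can match the empty string, so a global search has to deal with
# zero-length matches at several offsets without looping forever
m <- ore_search("a*", "baaab", all = TRUE)

# Inspect the matched substrings (some of them may be empty strings)
matches(m)
```

The point of the example is only that the call returns; which empty matches are reported depends on the offset rule described above.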
Version 1.7.4
- Named groups would not be propagated to match matrices unless the regex was
pre-compiled using `ore()`. This has been corrected.
- A compiler warning about a `printf`-type format specification has been
resolved.
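
As a rough sketch of the named-group fix, the pattern below is passed as a plain string rather than through `ore()`. The pattern and text are hypothetical, and the shape of the group matrix is an assumption based on the note above.

```r
library(ore)

# An uncompiled pattern string containing two named groups
m <- ore_search("(?<year>\\d{4})-(?<month>\\d{2})", "Released 2017-06")

# The full match
matches(m)

# The group matrix, which should now carry the group names
groups(m)
```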
Version 1.7.3
- The package will now properly detect a plain locale like "UTF-8" on start-up.
- C prototype warnings have been resolved, and the problematic `sprintf()`
function is now avoided.
Version 1.7.2
- A handful of small memory leaks have been plugged.
- The `README` has been updated to detail the current divergence between CRAN
and mainline versions of the package.
Version 1.7.1
- The `binary` argument to `ore_file()` is now stored in an attribute in its
return value. The `ore_search()` function additionally uses this attribute
to determine whether or not to set the `text` element of its return value.
Treating a binary file's entire contents as a string is unwise, since it may
include embedded nuls and other problem bytes.
- A minor tweak has been made to the `es()` function, which appreciably
improves its performance in simple cases.
- A potential buffer overrun and protection bug have both been corrected.
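
The `binary` flag can be tried out with a small sketch along these lines. It assumes the `ore` package is installed; the temporary file and its contents are invented for the example, and the exact attributes and return structure are not guaranteed here.

```r
library(ore)

# Write a few bytes, including an embedded nul, to a temporary file
path <- tempfile()
writeBin(as.raw(c(0x41, 0x00, 0x42, 0x43)), path)

# Mark the file as binary; per the note above, the flag is kept as an attribute
f <- ore_file(path, binary = TRUE)
attributes(f)

# Search the file's bytes for the sequence "BC"
m <- ore_search("BC", f)
matches(m)
```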
Version 1.7.0
- The Onigmo library has been updated to version 6.2.0.
- R connections are now supported as text sources, allowing URLs, gzipped files
and pipes (amongst others; see `?connections`) to be straightforwardly
interfaced with the package. The package's C backend has been substantially
refactored to support files and connections in all the core functions.
- The new `ore_switch()` function selects between a number of possible outputs
by matching each element of its input against a series of regular
expressions. This can be a useful way to handle different possible forms of
the same information.
- The new `ore_repl()` function is a relative of the existing `ore_subst()`,
but differs in how it is vectorised. Unlike `ore_subst()`, it will replicate
the source text if necessary to ensure that all specified replacements are
used. The `es()` function now uses `ore_repl()`, and so can produce output
vectors longer than its input.
- `ore_subst()` gains a "start" argument, like `ore_search()`, and now accepts
multiple replacements, which will be applied in sequence to different
matches.
- `ore_match()` is a new alias of `ore_search()`.
- Substitution group references can now include "\0" for the whole match
string, and allow for group numbers higher than 9. Group numbers that are out
of range should now produce an error, rather than potentially leading to a
segfault. Named group references should also be more robust.
- A multilingual, multiscript dataset, "glass", is now included with the
package thanks to Frank da Cruz and many contributors.
- Names and `NA` values are now propagated from text arguments that are
character vectors.
- There is a new option (`ore.keepNA`) to propagate `NA`s in `ore_ismatch()`,
and the infix functions `%~%` and `%~~%`, rather than convert them
implicitly to `FALSE`. This is off by default for backwards compatibility.
- Printing has been improved, and in particular "orematches" objects now show
(just) the elements that matched.
- Underscore-separated names are now used in preference in the package
documentation, following the trend in other packages, but the
period-separated versions are still available.
- Detection of the platform's native encoding has been improved, and this will
be asserted in regular expressions created by `ore()`, rather than them being
marked as being in "unknown" encoding by default as before. Handling of
encodings has been refined elsewhere too.
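
Several of the 1.7.0 additions can be exercised together. The sketch below assumes the `ore` package is installed; all patterns and input strings are hypothetical, and the use of an unnamed final argument to `ore_switch()` as a fallback is an assumption about its interface rather than something stated in these notes.

```r
library(ore)

# ore_switch(): choose an output for each element by trying regexes in turn.
# The unnamed last argument is assumed to act as a fallback value.
kinds <- ore_switch(c("42", "3.14", "abc"),
                    "^\\d+$" = "integer",
                    "^\\d+\\.\\d+$" = "decimal",
                    "other")

# ore_subst(): "\\0" in the replacement refers to the whole match
wrapped <- ore_subst("\\d+", "<\\0>", "item 12 of 40", all = TRUE)

# ore_repl(): the source text is replicated so that every replacement is used
variants <- ore_repl("\\d+", c("one", "two"), "item 12")

# ore.keepNA: by default NA inputs become FALSE in ore_ismatch()
default_result <- ore_ismatch("a", c("abc", NA))

# With the option switched on, NA should be propagated instead
old <- options(ore.keepNA = TRUE)
kept_result <- ore_ismatch("a", c("abc", NA))
options(old)  # restore the previous setting
```

Restoring the option afterwards keeps the backwards-compatible default in place for any code that relies on `NA` being treated as `FALSE`.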