Hi All.
I've been working with Mike Haertel (the original author of GNU grep)
for a number of months now. He is writing a new regexp matcher for
use in gawk (and other places, as people desire).
The matcher is available on GitHub: https://github.com/mikehaertel/minrx.
I have created a branch in the gawk repo that uses it: feature/minrx.
MinRX is currently written in C++20. Mike will eventually rewrite it
in C for portability. For the moment, you'll need to use gcc / g++
to build the branch. I haven't tried to mess with clang / clang++.
The test suite passes completely.
The new matcher is the default, so that it will be exercised. The old
matchers are still available. To use them, set GAWK_GNU_MATCHERS in
the environment. I will NOT make a formal release with MinRX as long
as MinRX is still in C++.
For now, the only way to access the code is via Git:
git clone https://git.savannah.gnu.org/r/gawk.git
cd gawk
git checkout feature/minrx
./bootstrap.sh && ./configure && make -j && make check
If you use gawk, please try this branch out.
Questions, comments, and *bug reports* are welcome.
Thanks,
Arnold
On 25.07.2024 11:44, Aharon Robbins wrote:
> Hi All.
> I've been working with Mike Haertel (the original author of GNU grep)
> for a number of months now. He is writing a new regexp matcher for
> use in gawk (and other places, as people desire).
> [ clipped ]
My system complains about -std=c++20 so I cannot test it. (I think
I'll wait for a native C release.)
> Questions, comments, and *bug reports* are welcome.
Well, I skimmed through the txt file on Mike's git page to learn
about the algorithm; especially the algorithm and its complexity
are of interest to me. The document was not quite clear about that
(or at least left me in doubt) beyond the general and typical O(N*M)
characteristics.
Algorithmic simplicity is nice, but as I understand it, no
performance comparisons have been done yet?
Unless it was a deliberate offer to use GNU Awk as a test bed.
And "nearly-feature-complete implementation" (section Features)
is not quite a fruitful marketing concept.
I also wonder why BSD and GNU extensions are supported but not
the very useful abbreviations for {some,all} Perl RE shortcuts.
> Well, I skimmed through the txt file on Mike's git page to learn
> about the algorithm; especially the algorithm and its complexity
> are of interest to me. The document was not quite clear about that
> (or at least left me in doubt) beyond the general and typical O(N*M)
> characteristics. One thing that astonished me was why a
> non-deterministic automaton model is used (an NFSM can be transformed
> into a deterministic FSM); isn't the non-deterministic tree search
> (where every branch is traversed breadth-first) sub-optimal?
> My system complains about -std=c++20 so I cannot test it. (I think
> I'll wait for a native C release.)
That will be a while. It's not hard to build current GCC from scratch
on a Linux system.
In article <66a350e9$0$706$14726298@news.sunsite.dk>,
Aharon Robbins <arnold@skeeve.com> wrote:
...
>> My system complains about -std=c++20 so I cannot test it. (I think
>> I'll wait for a native C release.)
> That will be a while. It's not hard to build current GCC from scratch
> on a Linux system.
I doubt that. I wouldn't have the first clue about how to do it, and I'm certainly no Linux newbie.
Maybe it (getting/building GCC) should be part of your "bootstrap" script?
Also, is there an easy way to find out if your current GCC is "good enough"?
The system I am typing this on says it has GCC 9.4. Will that work?
from: https://stackoverflow.com/a/68545455/724039
C++20 features are available since GCC 8.
To enable C++20 support, add the command-line parameter
-std=c++20
For G++ 9 and earlier use
-std=c++2a
Or, to enable GNU extensions in addition to C++20 features, add
-std=gnu++20