Somehow, methods and tools to realize
efficient DCGs in Prolog are missing. Most
DCG attempts that one sees succumb
to some declarative nonsense, creating
exponentially many spurious choice points;
you rarely find somebody mastering the art.
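A minimal sketch, assuming code lists, of a grammar that avoids
the problem: the digit run commits once the next element has been
tested, so the parse leaves no spurious choice points behind.

digits([D|Ds]) --> [D], { D >= 0'0, D =< 0'9 }, digits_(Ds).

digits_([D|Ds]) --> [D], { D >= 0'0, D =< 0'9 }, !, digits_(Ds).
digits_([]) --> [].

% ?- phrase(digits(Ds), [0'1,0'2,0'3,0'a], Rest).
% parses the leading digits deterministically, Rest = [0'a].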
Hi,
WebPL is already outdated, I guess. It doesn't
show the versions of the other Prolog systems
it is using. I had these results for
the primes example in the WebPL playground:
/* Trealla Prolog WASM */
(23568.9ms)
When I run the example here:
https://php.energy/trealla.html
I get better results:
/* trealla-js 0.27.1 */
?- time(test).
% Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips
Bye
Hi,
Because GenX and later suffer from:
Somehow, methods and tools to realize
efficient DCGs in Prolog are missing. Most
DCG attempts that one sees succumb
to some declarative nonsense, creating
exponentially many spurious choice points;
you rarely find somebody mastering the art.
Modern programmers fancy nothing more than
throwing a set of foreign libraries at their
Prolog system project. This is best seen in WebPL:
LALRPOP (MIT/Apache-2.0): generates the parser
https://github.com/w-henderson/WebPL/blob/main/dissertation.pdf
So there is no aim at creating a self-hosting
Prolog system. There is a deep distrust in
DCGs. But why build a Prolog system that will
possibly, ultimately, have DCGs, when you distrust
DCGs? The second problem of GenX and later is
probably that they don't know how to bootstrap
a Prolog system B via another Prolog system A.
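As a minimal sketch only, not any actual system's bootstrap code:
the classic vanilla meta-interpreter is the usual seed when a host
Prolog A is made to run the code of a guest Prolog B.

solve(true) :- !.
solve((A, B)) :- !, solve(A), solve(B).
solve(Head) :- clause(Head, Body), solve(Body).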
Bye
P.S.: The results of using a Parser Tool are
often frustrating on the following levels:
- No operator table (see the sketch below)
- Directives are fixed
- Introducing DCGs needs a rebuild
Scryer Prolog has an operator table, but most likely
used a Parser Tool at some time in the project,
or programming templates borrowed from Parser Tools.
It is probably the worst recent example of building a
Prolog system, one that would have had a reference for
the parsing in Rust itself. So here we are in 2025,
and there is not a single self-hosting Prolog
yet, while all other programming languages such
as Java, golang, etc. are self-hosting.
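A minimal sketch of the operator-table point, my own illustration
and not taken from any of the named systems: op/3 extends the
reader at runtime, which a parser generated once at build time
cannot follow without a rebuild.

:- op(700, xfx, ===>).

rule((weather(rain) ===> carry(umbrella))).

% ?- rule(X).
% X = (weather(rain)===>carry(umbrella)).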
Hi,
I spent some time thinking about my primes.pl
test, and came to the conclusion that it
mainly tests the Prolog ALU: things like
integer successor or integer modulo. Then
I found that Java has Math.floorMod(), which
I wasn't using yet. And bang, the results are better:
/* Dogelog Player 2.1.2 for Java, today */
?- time(test).
% Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
true.
Maybe the Java backend picks a CPU instruction
for Math.floorMod() instead of executing the
longer code sequence that is needed to correct
rem/2 into mod/2. Who knows. I also reorganized
the code a little bit, and eliminated an extra
method call in all arithmetic functions, by
inlining the arithmetic function body in the
evaluable predicate definition code. Here is a comparison
with the old measurements and with measurements of some
other Prolog systems:
/* Dogelog Player 2.1.2 for Java, weeks ago */
?- time(test).
% Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
true.
/* SWI-Prolog 9.0.4 */
?- time(test).
% 7,506,639 inferences, 0.363 CPU in 0.362 seconds
(100% CPU, 20693560 Lips)
true.
/* Scryer Prolog 0.9.4-639 */
?- time(test).
% CPU time: 0.365s, 7_517_613 inferences
true.
/* Trealla Prolog 2.82.23-3 */
?- time(test).
% Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
true.
Bye
P.S.: The code uses the hated mathematical mod/2,
and not the cheaper rem/2 that CPUs usually have:
test :-
   len(L, 1000),
   primes(L, _).

primes([], 1).
primes([J|L], J) :-
   primes(L, I),
   K is I+1,
   search(L, K, J).

search(L, I, J) :-
   mem(X, L),
   I mod X =:= 0, !,
   K is I+1,
   search(L, K, J).
search(_, I, I).

mem(X, [X|_]).
mem(X, [_|Y]) :-
   mem(X, Y).

len([], 0) :- !.
len([_|L], N) :-
   N > 0,
   M is N-1,
   len(L, M).
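For illustration only, a sketch in plain Prolog (not the actual
Dogelog code) of the sign fix-up that turns the truncating rem/2
into the mathematical mod/2, i.e. the longer sequence that
Math.floorMod() presumably replaces with a single instruction:

floor_mod(X, Y, M) :-
   R is X rem Y,
   (  R =\= 0, R * Y < 0   % remainder and divisor differ in sign
   -> M is R + Y
   ;  M = R
   ).

% ?- floor_mod(-7, 3, M).   % M = 2, whereas -7 rem 3 is -1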
Hi,
The change from 378 ms to 286 ms, around 25-30%,
is insane. But I did both tests on a novel AI CPU,
to be precise on an AMD Ryzen AI 7 350.
Somehow I picked up rumors that AI CPUs now
might do neural network branch prediction. The
idea seems to have existed in hardware at least since 2012:
Machine learning and artificial intelligence are
the current hype (again). In their new Ryzen
processors, AMD advertises the Neural Net
Prediction. It turns out this was already
used in their older (2012) Piledriver architecture,
used for example in the AMD A10-4600M. It is also
present in recent Samsung processors such as the
one powering the Galaxy S7. What is it really?
https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/
It can be done with Convolutional Neural Networks (CNN):
BranchNet: A Convolutional Neural Network to
Predict Hard-To-Predict Branches
To this end, Tarsa et al. proposed using convolutional
neural networks (CNNs) that are trained at
compile time to accurately predict branches that
TAGE cannot. Given enough profiling coverage, CNNs
learn input-independent branch correlations.
https://microarch.org/micro53/papers/738300a118.pdf
Interestingly, the above showcases PGO-based
machine learning for branch predictors. No clue
how they construct the CPU so that they can feed
it with offline-constructed neural networks for
its own execution. Maybe an optimizer uses it?
But I guess a more modern solution would not only
use CNNs, but also an attention mechanism.
Bye
You Can't Buy This... Yet! The NVIDIA GB10 from Dell
The New Superchip that Terrifies the Cloud!
https://www.youtube.com/watch?v=x1qViw4xyVo
Hi,
It seems I am having problems keeping pace with
all the new fancy toys. I wasn't able to really
benchmark the NPU of my Desktop AI machine; I
picked the wrong driver and need to try again.
What worked was benchmarking Mobile AI machines.
I just grabbed Geekbench AI and some devices:
USA Fab, M4:
              sANN    hANN    qANN
iPad CPU      4848    7947    6353
iPad GPU      9752   11383   10051
iPad NPU      4873   36544  *51634*

China Fab, Snapdragon:
              sANN    hANN    qANN
Redmi CPU     1044     950    1723
Redmi GPU      480     905     737
Redmi NNAPI    205     205     469
Redmi QNN      226     226  *10221*
The speed-up via NPU is a factor of 10x. See the column
qANN, which means quantized artificial neural
networks, when NPU or QNN is picked.
The mobile AI NPUs are optimized for using
minimal amounts of energy and minimal amounts
of space, squeezing (distilling) everything
into INT8 and INT4.
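As an illustration only, and not what Geekbench or the NPU runtimes
actually do: the INT8 squeeze boils down to a scale-round-clamp
step per weight.

quantize_int8(X, Scale, Q) :-
   Q0 is round(X / Scale),
   Q is max(-128, min(127, Q0)).

% ?- quantize_int8(2.0, 0.05, Q).    % Q = 40
% ?- quantize_int8(10.0, 0.05, Q).   % Q = 127, clamped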
Bye
Hi,
But not only Mobile AI and Desktop AI are making
a broader imprint now. We might also experience
Workstation AI, with a 3'000.- USD price tag:
You Can't Buy This... Yet! The NVIDIA GB10 from Dell
The New Superchip that Terrifies the Cloud!
https://www.youtube.com/watch?v=x1qViw4xyVo
So what's going on? I asked Phind, which is
driven by a 70B model tailored towards developers:
Q: Is there an AI inflection point right now,
with NPUs in mobile, desktop and workstation?
A: Evidence of the Inflection Point
- Mobile Leadership
NPUs originated in smartphones
Now becoming ubiquitous across all device types
Enabling sophisticated AI features at consumer price points
- Desktop Revolution
Major manufacturers implementing NPUs across product lines
Apple's Neural Engine integrated into M-series chips
Qualcomm, Intel, and AMD incorporating AI accelerators
- Workstation Transformation
Professional-grade NPUs in mobile workstations
Demonstrated superior performance for AI-specific tasks
Enabling local processing of previously cloud-dependent workloads
https://www.phind.com/search/cmgs1s6jv00023h67g5z2aaa0
Bye
Hi,
Things are definitely accelerating. I really would
like to use an AI that knows about all the news of today.
This bloody cut-off date is so annoying.
A further indication that AI is accelerating:
*months, not years: Rushing GPT-6*
In August 2025, Sam Altman dropped a bombshell:
GPT-6 is already in development and coming sooner
than you think. Not in two years, but
potentially in months.
https://www.youtube.com/watch?v=44mJb5sKji0
Karpathy, who coined vibe coding, released in October 2025:
*nanochat: The best ChatGPT that $100 can buy*
This repo is a full-stack implementation of an
LLM like ChatGPT in a single, clean, minimal,
hackable, dependency-lite codebase. nanochat is
designed to run on a single 8XH100 node via
scripts like speedrun.sh, that run the
entire pipeline start to end.
https://github.com/karpathy/nanochat
Bye
100% Powered by AMD Ryzen™ AI NPU
https://www.youtube.com/watch?v=0t8ijUPg4A0
Hi,
Give Julio Di Egidio the bloody money. He is
craving 300 USD so that he can buy the
ISO Prolog core standard. Just imagine he wanted
to build a MiniMind. Let's put some more
perspective on the current costs:
This open-source project aims to train a super-small
language model MiniMind with only 3 RMB cost and
2 hours, starting completely from scratch. The
MiniMind series is extremely lightweight, with the
smallest version being 1/7000 the size of GPT-3,
making it possible to train quickly on even the
most ordinary personal GPUs.
https://github.com/jingyaogong/minimind/blob/master/README_en.md
ChatGPT tells me that most of the numbers
are correct when you rent a GPU by the hour.
But what about 100% ownership of a GPU for
a year? I find this might cost 12'000 USD.
One has to separate the platforms for execution from
the platforms for training:
GEX44: for AI inference
Nvidia RTX™ 4000, 184 EUR / month
GEX130: for AI training
NVIDIA RTX™ 6000, 813 EUR / month
https://www.hetzner.com/dedicated-rootserver/matrix-gpu/
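A quick sanity check of that yearly figure, my own arithmetic and
before exchange rate or setup fees:

?- Yearly is 813 * 12.
Yearly = 9756.

So roughly 9'756 EUR per year for the training-grade box, which is
in the same ballpark as the 12'000 USD guess.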
Bye
Hi,
Web Neural Network API (WebNN),
Candidate Recommendation Draft - 30 September 2025:
https://www.w3.org/TR/webnn
WebNN samples by Ningxin Hu, Intel, Shanghai:
https://github.com/webmachinelearning/webnn-samples
Bye