• WebPL and Scryer Prolog are bad examples (Was: WebPL is already outdated)

    From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Mon Oct 13 09:49:15 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Because GenX and later suffer from the following:

    Somehow the methods and tools to realize
    efficient DCGs in Prolog are missing. Most
    DCG attempts that one sees succumb

    to some declarative nonsense, creating
    exponentially many spurious choice points;
    you rarely find somebody mastering the art.
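
    To make the complaint concrete, here is a minimal sketch
    (illustrative rules of my own, nothing taken from WebPL or
    Scryer): the first digits//1 leaves a choice point behind for
    every digit it consumes, while digits2//1 commits after a
    one-token lookahead and parses deterministically.

    % naive: one choice point per consumed digit,
    % backtracking re-delivers every shorter prefix
    digits([D|Ds]) --> digit(D), digits(Ds).
    digits([D])    --> digit(D).

    % deterministic: peek at the next code, then commit
    digits2([D|Ds]) -->
        digit(D),
        ( peek_digit -> digits2(Ds) ; { Ds = [] } ).

    % peek without consuming, written directly on the difference list
    peek_digit([C|Cs], [C|Cs]) :- 0'0 =< C, C =< 0'9.

    digit(D) --> [C], { 0'0 =< C, C =< 0'9, D is C - 0'0 }.

    % ?- phrase(digits2(Ds), [0'4,0'2]).
    % Ds = [4, 2].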

    Modern programmers fancy nothing else than
    throwing a set of foreign libraries at their
    Prolog system project. This is best seen in WebPL:

    LALRPOP, MIT/Apache-2.0, generates the parser:
    https://github.com/w-henderson/WebPL/blob/main/dissertation.pdf

    So there is no aim at creating a self-hosting
    Prolog system. There is a deep distrust of
    DCGs. But why build a Prolog system that will

    possibly ultimately have DCGs, when you distrust
    DCGs? The second problem of GenX and later
    is probably that they don't know how to bootstrap

    a Prolog system B via another Prolog system A.

    Bye

    P.S.: The results of using a parser tool are
    often frustrating on the following levels:
    - No operator table
    - Directives are fixed
    - Introducing DCGs needs a rebuild

    Scryer Prolog has an operator table, but most likely
    a parser tool was used at some point in the project,
    or its programming templates borrow from parser tools.

    It is probably the worst recent example of building a
    Prolog system, which could have had a reference for
    doing the parsing in Rust itself. So we have 2025

    and there is not a single self-hosting Prolog
    yet, while other programming languages such
    as Java, Go, etc. are self-hosting.

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Mon Oct 13 15:09:44 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Maybe it is with Prolog like with the dinosaurs,
    when they went extinct after a meteor impact.
    All that survived were some small rodents,
    as the story goes. Their advantage:
    - Small Size
    - Burrowing Behavior
    - Omnivorous Diet
    - Reproductive Speed

    Now Scryer Prolog uses a shift-reduce parser under
    the hood, the small rodent. But might it possibly
    shove heavy tabled DCGs into its end-users'
    faces? So that this left recursion can be handled:

    expr --> expr, [+], factor.
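
    For what it is worth, a minimal sketch of the tabled-DCG idea,
    assuming SWI-Prolog style tabling of the translated predicate
    (expr//0 becomes expr/2); the int(_) token is made up:

    :- table expr/2.          % expr//0 translates to expr/2

    expr --> expr, [+], factor.
    expr --> factor.

    factor --> [int(_)].

    % ?- phrase(expr, [int(1), +, int(2), +, int(3)]).
    % true.

    Without the table directive the left-recursive rule loops;
    with it the parse terminates, at the price of keeping call
    and answer tables around.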

    "constraint programming" got already killed
    when ILOG was bought by IBM in 2008. ILOG's
    optimization solver, CPLEX, has its roots in
    the CHIP (Constraint Handling in Prolog)

    language, 1985 at the European Computer-Industry
    Research Centre (ECRC), initially using a Prolog
    language interface. So its even not a Fench product.
    By the time ILOG became a commercial powerhouse,

    Prolog had largely disappeared from their product
    codebases. There was a transition to C++ for
    performance and industry adoption. I have the
    gut feeling that tabled DCGs are similarly dead,

    especially in the light of large language models (LLMs).
    But I cannot quite point the finger yet
    at the issues. Currently exploring the sad problem
    domain of this most likely dead horse.

    A problem could be the overkill of "Logic Grammars",
    which do not tolerate incorrect text and which cannot
    be applied partially so easily. Most likely one has
    to scrutinize the assumptions behind tabled DCGs, and

    review again the possible options beyond the beaten paths.

    Bye

    Mild Shock schrieb:
    Hi,

    Because GenX and later suffers from:

    Somehow methods and tools to realize
    efficient DCGs in Prolog are missing. Most
    DCG attempts that one sees succumb

    to some declarative nonsense, creating
    exponentially many spurious choice points,
    you find rarely somebody mastering the Art.

    Modern programmers fancy nothing else than
    throwing a set of foreign library to their
    Prolog system project. This is best seen in WebPL:

    LALRPOP MIT/Apache-2.0 Generate the parser https://github.com/w-henderson/WebPL/blob/main/dissertation.pdf

    So there is no aim at creating a self hosting
    Prolog system. There is a deep distrust in
    DCGs. But why build a Prolog system that will

    possibly ultimately have DCG, when you distrust
    in DCGs? The second problem of GenX and later
    is probably they don't know how to bootstrap

    a Prolog system B via another Prolog system A.

    Bye

    P.S.: The result of using a Parser Tool are
    often frustrating on the following levels:
    - No operator table
    - Directives are fixed
    - Introducong DCGs need rebuid

    Scryer Prolog has Operator Table, but mostlikely
    used a Parser Tool some time in the project,
    or programming templates borrow from Parser Tools.

    Probably the worst recent example building a
    Prolog system, which would have a reference for
    the Parsing in Rust itself. So we have 2025

    and there is not a single self hosting Prolog
    yet, while all other programming languages such
    as Java, golang, etc.. are self hosting.

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Wed Oct 15 02:38:34 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    I spent some time thinking about my primes.pl
    test, and came to the conclusion that it
    mainly tests the Prolog ALU: things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod(), which
    I wasn't using yet. And bang, the results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
        len(L, 1000),
        primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
        primes(L, I),
        K is I+1,
        search(L, K, J).

    search(L, I, J) :-
        mem(X, L),
        I mod X =:= 0, !,
        K is I+1,
        search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
        mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
        N > 0,
        M is N-1,
        len(L, M).
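
    For reference, a sketch of the correction that turns rem/2 into
    the mathematical mod/2, i.e. what Math.floorMod() does in a
    single step; floor_mod/3 is just an illustrative name:

    % floor_mod(+X, +Y, -M): result takes the sign of Y, like X mod Y
    floor_mod(X, Y, M) :-
        R is X rem Y,
        (   R =\= 0, sign(R) =\= sign(Y)
        ->  M is R + Y
        ;   M = R
        ).

    % ?- floor_mod(-7, 3, M).     M = 2  (same as -7 mod 3)
    % ?- floor_mod( 7, -3, M).    M = -2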

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Wed Oct 15 04:33:12 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    The change from 378 ms to 286 ms, around 25-30%,
    is insane. But I did both tests on a novel AI CPU,
    to be precise on an AMD Ryzen AI 7 350.

    But somehow I picked up rumors that AI CPUs now
    might do neural network branch prediction. The
    idea seems to exist in hardware at least since 2012:

    Machine learning and artificial intelligence are
    the current hype (again). In their new Ryzen
    processors, AMD advertises the Neural Net
    Prediction. It turns out this was already
    used in their older (2012) Piledriver architecture,
    used for example in the AMD A10-4600M. It is also
    present in recent Samsung processors such as the
    one powering the Galaxy S7. What is it really?
    https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/

    It can be done with Convolutional Neural Networks (CNNs):

    BranchNet: A Convolutional Neural Network to
    Predict Hard-To-Predict Branches
    To this end, Tarsa et al. proposed using convolutional
    neural networks (CNNs) that are trained at
    compile time to accurately predict branches that
    TAGE cannot. Given enough profiling coverage, CNNs
    learn input-independent branch correlations.
    https://microarch.org/micro53/papers/738300a118.pdf

    Interestingly, the above showcases PGO-based
    machine learning for branch predictors. No clue
    how they construct the CPU so that they can feed

    it with offline-constructed neural networks for
    its own execution. Maybe an optimizer uses it?
    But I guess a more modern solution would not only

    use CNNs, but also an attention mechanism.

    Bye

    Mild Shock schrieb:
    Hi,

    I spent some time thinking about my primes.pl
    test. And came to the conclusion that it
    mainly tests the Prolog ALU. Things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod() which
    I wasn't using yet. And peng results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
       len(L, 1000),
       primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
       primes(L, I),
       K is I+1,
       search(L, K, J).

    search(L, I, J) :-
       mem(X, L),
       I mod X =:= 0, !,
       K is I+1,
       search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
       mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
       N > 0,
       M is N-1,
       len(L, M).

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Wed Oct 15 16:04:08 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    It seems I am having problems keeping pace with
    all the new fancy toys. I wasn't able to really
    benchmark the NPU of my desktop AI machine,

    I picked the wrong driver. Need to try again.
    What worked was benchmarking mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

                     sANN     hANN     qANN
    iPad CPU         4848     7947     6353
    iPad GPU         9752    11383    10051
    iPad NPU         4873    36544   *51634*

    China Fab, Snapdragon:

                     sANN     hANN     qANN
    Redmi CPU        1044      950     1723
    Redmi GPU         480      905      737
    Redmi NNAPI       205      205      469
    Redmi QNN         226      226   *10221*

    The speed-up via the NPU is a factor of 10x. See the
    column qANN, which means quantized artificial neural
    networks, when the NPU or QNN backend is picked.

    The mobile AI NPUs are optimized for using
    minimal amounts of energy and minimal amounts
    of space, squeezing (distilling) everything

    into INT8 and INT4.
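
    As a toy illustration of what that squeezing means (my own
    sketch, not how Geekbench or the NPU toolchains actually do it):
    symmetric INT8 quantization maps each weight to round(X/Scale)
    with Scale = max|X|/127.

    % symmetric INT8 quantization of a list of weights
    quantize_int8(Xs, Scale, Qs) :-
        max_abs(Xs, M),
        Scale is M / 127,
        maplist(quant(Scale), Xs, Qs).

    quant(Scale, X, Q) :-
        Q0 is round(X / Scale),
        Q is max(-128, min(127, Q0)).

    max_abs([], 0.0).
    max_abs([X|Xs], M) :-
        max_abs(Xs, M0),
        M is max(M0, abs(X)).

    % ?- quantize_int8([0.5, -1.27, 0.02], S, Qs).
    % S is about 0.01, Qs = [50, -127, 2].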

    Bye

    Mild Shock schrieb:
    Hi,

    The change from 378 ms to 286 ms is around 25-30%
    is insane. But I did both tests on a novel AI CPU.
    To be precise on a AMD Ryzen AI 7 350.

    But somehow I picked up rumors that AI CPUs now
    might do Neural Network Branch Prediction. The
    idea seems to exist in hardware at least since (2012):

    Machine learning and artificial intelligence are
    the current hype (again). In their new Ryzen
    processors, AMD advertises the Neural Net
    Prediction. It turns out this is was already
    used in their older (2012) Piledriver architecture
    used for example in the AMD A10-4600M. It is also
    present in recent Samsung processors such as the
    one powering the Galaxy S7. What is it really? https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/

    It can be done with Convoluted Neural Networks (CNN):

    BranchNet: A Convolutional Neural Network to
    Predict Hard-To-Predict Branches
    To this end, Tarsa et al. proposed using convolutional
    neural networks (CNNs) that are trained at
    compiletime to accurately predict branches that
    TAGE cannot. Given enough profiling coverage, CNNs
    learn input-independent branch correlations. https://microarch.org/micro53/papers/738300a118.pdf

    Interstingly the above shows cases a PGO based
    Machine Learning for Branch Predictors. No clue
    how they construct the CPU, that they can feed

    it with offline constructed neural neutworks for
    their own execution. Maybe an optimizer uses it?
    But I guess a more modern  solutions would not only

    use CNN, but also an Attention Mechanism.

    Bye

    Mild Shock schrieb:
    Hi,

    I spent some time thinking about my primes.pl
    test. And came to the conclusion that it
    mainly tests the Prolog ALU. Things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod() which
    I wasn't using yet. And peng results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
        len(L, 1000),
        primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
        primes(L, I),
        K is I+1,
        search(L, K, J).

    search(L, I, J) :-
        mem(X, L),
        I mod X =:= 0, !,
        K is I+1,
        search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
        mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
        N > 0,
        M is N-1,
        len(L, M).

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Wed Oct 15 16:10:42 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    But not only mobile AI and desktop AI are making
    a broader imprint now. We might also experience
    workstation AI, with a 3'000.- USD price tag:

    You Can't Buy This... Yet! The NVIDIA GB10 from Dell
    The New Superchip that Terrifies the Cloud!
    https://www.youtube.com/watch?v=x1qViw4xyVo

    So what's going on? I was asking Phind, which is
    driven by a 70B model tailored towards developers:

    Q: Is there an AI inflection point right now,
    with NPUs in mobile, desktop and workstation?

    A: Evidence of the Inflection Point

    - Mobile Leadership
    NPUs originated in smartphones
    Now becoming ubiquitous across all device types
    Enabling sophisticated AI features at consumer price points

    - Desktop Revolution
    Major manufacturers implementing NPUs across product lines
    Apple's Neural Engine integrated into M-series chips
    Qualcomm, Intel, and AMD incorporating AI accelerators

    - Workstation Transformation
    Professional-grade NPUs in mobile workstations
    Demonstrated superior performance for AI-specific tasks
    Enabling local processing of previously cloud-dependent workloads

    https://www.phind.com/search/cmgs1s6jv00023h67g5z2aaa0

    Bye

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

        sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

        sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye

    Mild Shock schrieb:
    Hi,

    The change from 378 ms to 286 ms is around 25-30%
    is insane. But I did both tests on a novel AI CPU.
    To be precise on a AMD Ryzen AI 7 350.

    But somehow I picked up rumors that AI CPUs now
    might do Neural Network Branch Prediction. The
    idea seems to exist in hardware at least since (2012):

    Machine learning and artificial intelligence are
    the current hype (again). In their new Ryzen
    processors, AMD advertises the Neural Net
    Prediction. It turns out this is was already
    used in their older (2012) Piledriver architecture
    used for example in the AMD A10-4600M. It is also
    present in recent Samsung processors such as the
    one powering the Galaxy S7. What is it really?
    https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/

    It can be done with Convoluted Neural Networks (CNN):

    BranchNet: A Convolutional Neural Network to
    Predict Hard-To-Predict Branches
    To this end, Tarsa et al. proposed using convolutional
    neural networks (CNNs) that are trained at
    compiletime to accurately predict branches that
    TAGE cannot. Given enough profiling coverage, CNNs
    learn input-independent branch correlations.
    https://microarch.org/micro53/papers/738300a118.pdf

    Interstingly the above shows cases a PGO based
    Machine Learning for Branch Predictors. No clue
    how they construct the CPU, that they can feed

    it with offline constructed neural neutworks for
    their own execution. Maybe an optimizer uses it?
    But I guess a more modern  solutions would not only

    use CNN, but also an Attention Mechanism.

    Bye

    Mild Shock schrieb:
    Hi,

    I spent some time thinking about my primes.pl
    test. And came to the conclusion that it
    mainly tests the Prolog ALU. Things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod() which
    I wasn't using yet. And peng results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
        len(L, 1000),
        primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
        primes(L, I),
        K is I+1,
        search(L, K, J).

    search(L, I, J) :-
        mem(X, L),
        I mod X =:= 0, !,
        K is I+1,
        search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
        mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
        N > 0,
        M is N-1,
        len(L, M).

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sat Oct 18 15:57:36 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Things are definitely accelerating. I really would
    like to use an AI that knows about all the news of today.
    This bloody knowledge cutoff date is so annoying.

    Further indications that AI is accelerating:

    *months, not years: Rushing GPT-6*
    In August 2025, Sam Altman dropped a bombshell:
    GPT-6 is already in development and coming sooner
    than you think. Not in two years, but
    potentially in months.
    https://www.youtube.com/watch?v=44mJb5sKji0

    Karpathy, who coined "vibe coding", released this in October 2025:

    *nanochat: The best ChatGPT that $100 can buy*
    This repo is a full-stack implementation of an
    LLM like ChatGPT in a single, clean, minimal,
    hackable, dependency-lite codebase. nanochat is
    designed to run on a single 8XH100 node via
    scripts like speedrun.sh, that run the
    entire pipeline start to end.
    https://github.com/karpathy/nanochat

    Bye

    Mild Shock schrieb:
    Hi,

    But not only Mobie AI and Desktop AI are making
    a broader imprint now. We might also experience
    Workstation AI, with a 3'000.- USD price tag:

    You Can't Buy This... Yet! The NVIDIA GB10 from Dell
    The New Superchip that Terrifies the Cloud! https://www.youtube.com/watch?v=x1qViw4xyVo

    So whats going on? I was asking Phind, which is
    driven by a 70B model tailored towards developers:

    Q: Is there an AI inflection point right now ,
       with NPUs in mobile, desktop and workstation

    A: Evidence of the Inflection Point

    - Mobile Leadership
      NPUs originated in smartphones
      Now becoming ubiquitous across all device types
      Enabling sophisticated AI features at consumer price points

    - Desktop Revolution
      Major manufacturers implementing NPUs across product lines
      Apple's Neural Engine integrated into M-series chips
      Qualcomm, Intel, and AMD incorporating AI accelerators

    - Workstation Transformation
      Professional-grade NPUs in mobile workstations
      Demonstrated superior performance for AI-specific tasks
      Enabling local processing of previously cloud-dependent workloads

    https://www.phind.com/search/cmgs1s6jv00023h67g5z2aaa0

    Bye

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

         sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

         sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye

    Mild Shock schrieb:
    Hi,

    The change from 378 ms to 286 ms is around 25-30%
    is insane. But I did both tests on a novel AI CPU.
    To be precise on a AMD Ryzen AI 7 350.

    But somehow I picked up rumors that AI CPUs now
    might do Neural Network Branch Prediction. The
    idea seems to exist in hardware at least since (2012):

    Machine learning and artificial intelligence are
    the current hype (again). In their new Ryzen
    processors, AMD advertises the Neural Net
    Prediction. It turns out this is was already
    used in their older (2012) Piledriver architecture
    used for example in the AMD A10-4600M. It is also
    present in recent Samsung processors such as the
    one powering the Galaxy S7. What is it really?
    https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/

    It can be done with Convoluted Neural Networks (CNN):

    BranchNet: A Convolutional Neural Network to
    Predict Hard-To-Predict Branches
    To this end, Tarsa et al. proposed using convolutional
    neural networks (CNNs) that are trained at
    compiletime to accurately predict branches that
    TAGE cannot. Given enough profiling coverage, CNNs
    learn input-independent branch correlations.
    https://microarch.org/micro53/papers/738300a118.pdf

    Interstingly the above shows cases a PGO based
    Machine Learning for Branch Predictors. No clue
    how they construct the CPU, that they can feed

    it with offline constructed neural neutworks for
    their own execution. Maybe an optimizer uses it?
    But I guess a more modern  solutions would not only

    use CNN, but also an Attention Mechanism.

    Bye

    Mild Shock schrieb:
    Hi,

    I spent some time thinking about my primes.pl
    test. And came to the conclusion that it
    mainly tests the Prolog ALU. Things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod() which
    I wasn't using yet. And peng results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
        len(L, 1000),
        primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
        primes(L, I),
        K is I+1,
        search(L, K, J).

    search(L, I, J) :-
        mem(X, L),
        I mod X =:= 0, !,
        K is I+1,
        search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
        mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
        N > 0,
        M is N-1,
        len(L, M).

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sat Oct 18 16:19:49 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Give Julio Di Egidio the bloody money. He is
    craving 300 USD so that he can buy the
    ISO Prolog core standard. Just imagine he wanted

    to build a MiniMind. Let's put some more
    perspective on the current costs:

    This open-source project aims to train a super-small
    language model MiniMind with only 3 RMB cost and
    2 hours, starting completely from scratch. The
    MiniMind series is extremely lightweight, with the
    smallest version being 1/7000 the size of GPT-3,
    making it possible to train quickly on even the
    most ordinary personal GPUs.
    https://github.com/jingyaogong/minimind/blob/master/README_en.md

    ChatGPT tells me that most of the numbers
    are correct when you rent a GPU by the hour.
    But what about 100% ownership of a GPU for

    a year? I find this might cost 12'000 USD,
    see the quick check after the price list below.
    One has to separate the platforms for inference
    from the platforms for training:

    GEX44: for AI inference
    Nvidia RTX™ 4000, 184 EUR / month

    GEX130: for AI training
    NVIDIA RTX™ 6000, 813 EUR / month
    https://www.hetzner.com/dedicated-rootserver/matrix-gpu/
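
    A quick plausibility check of that yearly figure against the
    GEX130 price (a sketch; the setup fee and the roughly 1.1 USD
    per EUR rate are my assumptions, not from the Hetzner page):

    ?- Eur is 813 * 12,            % EUR per year for the GEX130
       Usd is Eur * 11 // 10.      % assuming ~1.1 USD per EUR
    Eur = 9756,
    Usd = 10731.

    Add a possible one-time setup fee and some exchange-rate drift
    and the 12'000 USD estimate for a year of exclusive use is in
    the right ballpark.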

    Bye

    Mild Shock schrieb:
    Hi,

    Thinks are definitively accelerating. I really would
    like to use an AI that knows about all the News of today.
    This bloody cut date is so annoying.

    Further indicative that AI is accelerating:

    In August 2025, Sam Altman dropped a bombshell:

    *months, not years: Rushing GPT-6*
    In August 2025, Sam Altman dropped a bombshell:
    GPT-6 is already in development and coming sooner
    than you think. Not in two years, but
    potentially in months.
    https://www.youtube.com/watch?v=44mJb5sKji0

    Karpathy, coined vibe coding, released in October 2025:

    *nanochat: The best ChatGPT that $100 can buy*
    This repo is a full-stack implementation of an
    LLM like ChatGPT in a single, clean, minimal,
    hackable, dependency-lite codebase. nanochat is
    designed to run on a single 8XH100 node via
    scripts like speedrun.sh, that run the
    entire pipeline start to end.
    https://github.com/karpathy/nanochat

    Bye

    Mild Shock schrieb:
    Hi,

    But not only Mobie AI and Desktop AI are making
    a broader imprint now. We might also experience
    Workstation AI, with a 3'000.- USD price tag:

    You Can't Buy This... Yet! The NVIDIA GB10 from Dell
    The New Superchip that Terrifies the Cloud!
    https://www.youtube.com/watch?v=x1qViw4xyVo

    So whats going on? I was asking Phind, which is
    driven by a 70B model tailored towards developers:

    Q: Is there an AI inflection point right now ,
        with NPUs in mobile, desktop and workstation

    A: Evidence of the Inflection Point

    - Mobile Leadership
       NPUs originated in smartphones
       Now becoming ubiquitous across all device types
       Enabling sophisticated AI features at consumer price points

    - Desktop Revolution
       Major manufacturers implementing NPUs across product lines
       Apple's Neural Engine integrated into M-series chips
       Qualcomm, Intel, and AMD incorporating AI accelerators

    - Workstation Transformation
       Professional-grade NPUs in mobile workstations
       Demonstrated superior performance for AI-specific tasks
       Enabling local processing of previously cloud-dependent workloads

    https://www.phind.com/search/cmgs1s6jv00023h67g5z2aaa0

    Bye

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

         sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

         sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye

    Mild Shock schrieb:
    Hi,

    The change from 378 ms to 286 ms is around 25-30%
    is insane. But I did both tests on a novel AI CPU.
    To be precise on a AMD Ryzen AI 7 350.

    But somehow I picked up rumors that AI CPUs now
    might do Neural Network Branch Prediction. The
    idea seems to exist in hardware at least since (2012):

    Machine learning and artificial intelligence are
    the current hype (again). In their new Ryzen
    processors, AMD advertises the Neural Net
    Prediction. It turns out this is was already
    used in their older (2012) Piledriver architecture
    used for example in the AMD A10-4600M. It is also
    present in recent Samsung processors such as the
    one powering the Galaxy S7. What is it really?
    https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/

    It can be done with Convoluted Neural Networks (CNN):

    BranchNet: A Convolutional Neural Network to
    Predict Hard-To-Predict Branches
    To this end, Tarsa et al. proposed using convolutional
    neural networks (CNNs) that are trained at
    compiletime to accurately predict branches that
    TAGE cannot. Given enough profiling coverage, CNNs
    learn input-independent branch correlations.
    https://microarch.org/micro53/papers/738300a118.pdf

    Interstingly the above shows cases a PGO based
    Machine Learning for Branch Predictors. No clue
    how they construct the CPU, that they can feed

    it with offline constructed neural neutworks for
    their own execution. Maybe an optimizer uses it?
    But I guess a more modern  solutions would not only

    use CNN, but also an Attention Mechanism.

    Bye

    Mild Shock schrieb:
    Hi,

    I spent some time thinking about my primes.pl
    test. And came to the conclusion that it
    mainly tests the Prolog ALU. Things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod() which
    I wasn't using yet. And peng results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
        len(L, 1000),
        primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
        primes(L, I),
        K is I+1,
        search(L, K, J).

    search(L, I, J) :-
        mem(X, L),
        I mod X =:= 0, !,
        K is I+1,
        search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
        mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
        N > 0,
        M is N-1,
        len(L, M).

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye






    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sat Oct 18 18:59:11 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    This is probably what my AI laptop can do. The
    demo shows an AMD Ryzen AI 7 340 micro. I have an
    AMD Ryzen AI 7 350 laptop.

    100% Powered by AMD Ryzen™ AI NPU
    https://www.youtube.com/watch?v=0t8ijUPg4A0

    Only I am too stupid / too lazy to dig up the right
    drivers and install FastFlowLM. But one sees in
    the video how the NPU gets 30% - 60% occupied

    while it does the transcription of a YouTube video
    into text (via Whisper-large-v3-turbo from OpenAI).
    The demo then switches to summarize mode

    (via GPT-OSS-20B from OpenAI). And boom, the
    NPU goes to 100%!

    Bye

    P.S.: I didn't know about the OpenAI and AMD partnership;
    also, buying an AMD AI laptop was not motivated by
    this development. Not sure whether it's really a big thing:

    AMD and OpenAI announce partnership
    https://openai.com/index/openai-amd-strategic-partnership/

    It might be a good thing for end users like me if
    edge use cases like the above become more common.
    But they still depend on models trained not on the edge,

    but in a data center. The AMD Instinct product line directly
    competes with Nvidia's Tesla and Intel's Xeon Phi and data
    center GPU lines of machine learning and GPGPU cards.

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

        sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

        sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Tue Oct 21 00:32:55 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Vertex AI Training is more expensive:

    "Vertex AI provides a managed training
    service that enables you to operationalize
    large scale model training. You can use
    Vertex AI to run training applications
    based on any machine learning (ML) framework
    on Google Cloud infrastructure. Vertex AI
    also has integrated support that simplifies
    the preparation process for model training
    and serving for the PyTorch, TensorFlow,
    scikit-learn, and XGBoost frameworks."
    https://cloud.google.com/products/calculator

    For 10 jobs of 10 hours each, it wants 1000 USD
    from me. The accelerator is a TPU v3. Nevertheless
    it might be the cheaper option to use the RTX™ 6000

    for 720 hours. ChatGPT gives me this calculation:

    Scenario I: single TPU chip, the RTX 6000 wins:
    ~4.43×10^19 FLOPs versus ~2.36×10^20 FLOPs

    Scenario II: 8-chip TPU, the RTX 6000 loses:
    ~3.54×10^20 FLOPs versus ~2.36×10^20 FLOPs
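
    A minimal sketch of where such numbers can come from; the peak
    throughput figures (~123 TFLOPS bf16 per TPU v3 chip, ~91 TFLOPS
    FP32 for the RTX 6000) are my assumptions, not taken from the
    ChatGPT answer:

    ?- TPU1 is 123 * 10^12 * 10 * 10 * 3600,   % 10 jobs x 10 h, 1 chip
       TPU8 is TPU1 * 8,                       % Scenario II: 8 chips
       RTX  is  91 * 10^12 * 720 * 3600.       % 720 h on the RTX 6000
    TPU1 = 44280000000000000000,
    TPU8 = 354240000000000000000,
    RTX = 235872000000000000000.

    That is ~4.43×10^19, ~3.54×10^20 and ~2.36×10^20 FLOPs
    respectively, matching the two scenarios above.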

    Bye

    Mild Shock schrieb:
    Hi,

    Give Julio Di Egidio the bloody money. He is
    craving for 300 USD so that he can buy the
    ISO Prolog core standard. Just imagine he would want

    to build a MiniMind. Just lets put some more
    prespective on the current costs:

    This open-source project aims to train a super-small
    language model MiniMind with only 3 RMB cost and
    2 hours, starting completely from scratch. The
    MiniMind series is extremely lightweight, with the
    smallest version being 1/7000 the size of GPT-3,
    making it possible to train quickly on even the
    most ordinary personal GPUs. https://github.com/jingyaogong/minimind/blob/master/README_en.md

    ChatGPT tells me that most of the numbers
    are correct when you rent a GPU by the hour.
    But what about a 100% ownership of a GPU for

    a year. I find this might cost 12'000 USD.
    One has to separate platforms for execution from
    those platforms for training:

    GEX44: for AI inference
    Nvidia RTX™ 4000, 184 EUR / month

    GEX130: for AI training
    NVIDIA RTX™ 6000, 813 EUR / month https://www.hetzner.com/dedicated-rootserver/matrix-gpu/

    Bye

    Mild Shock schrieb:
    Hi,

    Thinks are definitively accelerating. I really would
    like to use an AI that knows about all the News of today.
    This bloody cut date is so annoying.

    Further indicative that AI is accelerating:

    In August 2025, Sam Altman dropped a bombshell:

    *months, not years: Rushing GPT-6*
    In August 2025, Sam Altman dropped a bombshell:
    GPT-6 is already in development and coming sooner
    than you think. Not in two years, but
    potentially in months.
    https://www.youtube.com/watch?v=44mJb5sKji0

    Karpathy, coined vibe coding, released in October 2025:

    *nanochat: The best ChatGPT that $100 can buy*
    This repo is a full-stack implementation of an
    LLM like ChatGPT in a single, clean, minimal,
    hackable, dependency-lite codebase. nanochat is
    designed to run on a single 8XH100 node via
    scripts like speedrun.sh, that run the
    entire pipeline start to end.
    https://github.com/karpathy/nanochat

    Bye

    Mild Shock schrieb:
    Hi,

    But not only Mobie AI and Desktop AI are making
    a broader imprint now. We might also experience
    Workstation AI, with a 3'000.- USD price tag:

    You Can't Buy This... Yet! The NVIDIA GB10 from Dell
    The New Superchip that Terrifies the Cloud!
    https://www.youtube.com/watch?v=x1qViw4xyVo

    So whats going on? I was asking Phind, which is
    driven by a 70B model tailored towards developers:

    Q: Is there an AI inflection point right now ,
        with NPUs in mobile, desktop and workstation

    A: Evidence of the Inflection Point

    - Mobile Leadership
       NPUs originated in smartphones
       Now becoming ubiquitous across all device types
       Enabling sophisticated AI features at consumer price points

    - Desktop Revolution
       Major manufacturers implementing NPUs across product lines
       Apple's Neural Engine integrated into M-series chips
       Qualcomm, Intel, and AMD incorporating AI accelerators

    - Workstation Transformation
       Professional-grade NPUs in mobile workstations
       Demonstrated superior performance for AI-specific tasks
       Enabling local processing of previously cloud-dependent workloads

    https://www.phind.com/search/cmgs1s6jv00023h67g5z2aaa0

    Bye

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

         sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

         sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye

    Mild Shock schrieb:
    Hi,

    The change from 378 ms to 286 ms is around 25-30%
    is insane. But I did both tests on a novel AI CPU.
    To be precise on a AMD Ryzen AI 7 350.

    But somehow I picked up rumors that AI CPUs now
    might do Neural Network Branch Prediction. The
    idea seems to exist in hardware at least since (2012):

    Machine learning and artificial intelligence are
    the current hype (again). In their new Ryzen
    processors, AMD advertises the Neural Net
    Prediction. It turns out this is was already
    used in their older (2012) Piledriver architecture
    used for example in the AMD A10-4600M. It is also
    present in recent Samsung processors such as the
    one powering the Galaxy S7. What is it really?
    https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/

    It can be done with Convoluted Neural Networks (CNN):

    BranchNet: A Convolutional Neural Network to
    Predict Hard-To-Predict Branches
    To this end, Tarsa et al. proposed using convolutional
    neural networks (CNNs) that are trained at
    compiletime to accurately predict branches that
    TAGE cannot. Given enough profiling coverage, CNNs
    learn input-independent branch correlations.
    https://microarch.org/micro53/papers/738300a118.pdf

    Interstingly the above shows cases a PGO based
    Machine Learning for Branch Predictors. No clue
    how they construct the CPU, that they can feed

    it with offline constructed neural neutworks for
    their own execution. Maybe an optimizer uses it?
    But I guess a more modern  solutions would not only

    use CNN, but also an Attention Mechanism.

    Bye

    Mild Shock schrieb:
    Hi,

    I spent some time thinking about my primes.pl
    test. And came to the conclusion that it
    mainly tests the Prolog ALU. Things like

    integer successor or integer modulo. Then
    I found that Java has Math.floorMod() which
    I wasn't using yet. And peng results are better:

    /* Dogelog Player 2.1.2 for Java, today */
    ?- time(test).
    % Zeit 286 ms, GC 1 ms, Lips 26302430, Uhr 15.10.2025 02:31
    true.

    Maybe the Java backend picks a CPU instruction
    for Math.floorMod() instead of executing the
    longer code sequence that is needed to correct

    rem/2 into mod/2. Who knows. I also reorganized
    the code a little bit, and eliminated an extra
    method call in all arithmetic functions, by

    inlining the arithmetic function body in the
    evaluable predicate definition code. Comparison
    to old measurements and some measurements of

    other Prolog systems:

    /* Dogelog Player 2.1.2 for Java, weeks ago */
    ?- time(test).
    % Zeit 378 ms, GC 1 ms, Lips 19900780, Uhr 28.08.2025 17:44
    true.

    /* SWI-Prolog 9.0.4 */
    ?- time(test).
    % 7,506,639 inferences, 0.363 CPU in 0.362 seconds
    (100% CPU, 20693560 Lips)
    true.

    /* Scryer Prolog 0.9.4-639 */
    ?- time(test).
    % CPU time: 0.365s, 7_517_613 inferences
    true.

    /* Trealla Prolog 2.82.23-3 */
    ?- time(test).
    % Time elapsed 0.868s, 11263917 Inferences, 12.983 MLips
    true.

    Bye

    P.S.: The code uses the hated mathematical mod/2,
    and not the cheaper rem/2 that CPUs usually have:

    test :-
        len(L, 1000),
        primes(L, _).

    primes([], 1).
    primes([J|L], J) :-
        primes(L, I),
        K is I+1,
        search(L, K, J).

    search(L, I, J) :-
        mem(X, L),
        I mod X =:= 0, !,
        K is I+1,
        search(L, K, J).
    search(_, I, I).

    mem(X, [X|_]).
    mem(X, [_|Y]) :-
        mem(X, Y).

    len([], 0) :- !.
    len([_|L], N) :-
        N > 0,
        M is N-1,
        len(L, M).

    Mild Shock schrieb:
    Hi,

    WebPL is already outdated I guess. It doesn't
    show the versions of the other Prolog systems
    it is using. While I had these results for

    the primes example in the WebPL playground:

    /* Trealla Prolog WASM */
    (23568.9ms)

    When I run the example here:

    https://php.energy/trealla.html

    I get better results:

    /* trealla-js 0.27.1 */

    ?- time(test).
    % Time elapsed 9.907s, 11263917 Inferences, 1.137 MLips

    Bye







    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sun Oct 26 08:39:09 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    WebNN (Web Neural Network API),
    Candidate Recommendation Draft - 30 September 2025
    https://www.w3.org/TR/webnn

    WebNN samples by Ningxin Hu, Intel, Shanghai
    https://github.com/webmachinelearning/webnn-samples

    Bye

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

        sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

        sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to comp.lang.prolog on Sun Oct 26 11:33:02 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Boris the Loris and Julio Di Egidio the Nazi Retard,
    are going for an afterwork beer. They are still
    highly confused by Fuzzy Testing:

    Star Trek - The 70's Disco Generation
    https://www.youtube.com/watch?v=505zvAvnreg

    The favorite hangout is Spock's Logic Dancefloor,
    which is known for its sharp unfuzzy wit. They
    have a chat with Data about Disco Math,

    the only Math which has no Fuzzy Logic in it.

    Bye

    Mild Shock schrieb:
    Hi,

    Candidate Recommendation Draft - 30 September 2025 https://www.w3.org/TR/webnn

    WebNN samples by Ningxin Hu, Intel, Shanghai https://github.com/webmachinelearning/webnn-samples

    Bye

    Mild Shock schrieb:
    Hi,

    It seems I am having problems pacing with
    all the new fancy toys. Wasn't able to really
    benchmark my NPU from a Desktop AI machine,

    picked the wrong driver. Need to try again.
    What worked was benchmarking Mobile AI machines.
    I just grabbed Geekbench AI and some devices:

    USA Fab, M4:

         sANN    hANN    qANN
    iPad CPU    4848    7947    6353
    iPad GPU    9752    11383    10051
    iPad NPU    4873    36544    *51634*

    China Fab, Snapdragon:

         sANN    hANN    qANN
    Redmi CPU    1044    950    1723
    Redmi GPU    480    905    737
    Redmi NNAPI    205    205    469
    Redmi QNN    226    226    *10221*

    Speed-Up via NPU is factor 10x. See the column
    qANN which means quantizised artificial neural
    networks, when NPU or QNN is picked.

    The mobile AI NPUs are optimized using
    mimimal amounts of energy, and minimal amounts
    of space squeezing (distilling) everything

    into INT8 and INT4.

    Bye

    --- Synchronet 3.21a-Linux NewsLink 1.2