• Re: More of my philosophy about tail latency and about technology and more of my thoughts..

    From Angel@vvvvvvvvvvvvvvvvvvvv11111@mail.ee to comp.programming on Thu Apr 6 10:17:24 2023
    From Newsgroup: comp.programming

    On Thursday, November 24, 2022 at 5:40:33 PM UTC+2, Amine Moulay Ramdane wrote:
    Hello,



    More of my philosophy about tail latency and about technology and more of my thoughts..

    I am a white arab, and i think i am smart since i have also
    invented many scalable algorithms and algorithms..


And now i will talk more about tail latency, so first i have to define what tail latency is, and here it is:

    Tail latency, also known as high-percentile latency, refers to high latencies that clients see fairly infrequently. ... There are many causes of tail latency in the world, including contention,
    garbage collection, packet loss, host failure, and weird stuff operating systems do in the background.
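As an illustration of what "high-percentile" means (a generic sketch, not from the article above), the tail is what you see when you sort latency samples and read near the top:

```python
# Illustrative sketch: computing tail (p99) latency from latency samples.
def percentile(samples, p):
    """Return the p-th percentile (0-100) using the nearest-rank method."""
    ordered = sorted(samples)
    # Nearest-rank: smallest index covering fraction p/100 of the samples.
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(len * p / 100)
    return ordered[rank - 1]

# Most requests are fast, a few are slow: the tail dominates p99.
latencies_ms = [1] * 980 + [100] * 20
print(percentile(latencies_ms, 50))  # 1: the median hides the tail
print(percentile(latencies_ms, 99))  # 100: p99 exposes the slow requests
```

This is why median latency can look excellent while a few percent of clients still see very slow responses.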

So i think that wait-free queues are better than lock-free queues for tail latency, and you can read the following article to notice it:

    Throughput vs Latency and Lock-Free vs Wait-Free

    http://concurrencyfreaks.blogspot.com/2016/08/throughput-vs-latency-and-lock-free-vs.html

But i think that lock-free queues are still interesting and useful, so i invite you to read about my open source software project of a Lock-free bounded LIFO stack, an almost Lock-free bounded FIFO queue and an almost Lock-free bounded FIFO priority queue:

    Lock-free bounded LIFO stack and an almost Lock-free bounded FIFO queue and an almost Lock-free bounded FIFO priority queue version 1.12

    Author:

    Amine Moulay Ramdane is the inventor of the Lock-free bounded LIFO stack algorithm and the inventor of the almost Lock-free bounded FIFO queue algorithm and of the almost Lock-free bounded priority FIFO queue algorithm.

    Description:

A Lock-free LIFO stack algorithm, an almost (very nearly) Lock-free FIFO queue and an almost (very nearly) Lock-free priority queue; they are bounded (and the Lock-free LIFO stack is based on the almost Lock-free FIFO queue), they don't have false sharing, and they retain the following advantages of Lock-free and Wait-free algorithms:

- Signal Immunity: The C and C++ Standards prohibit signals or
asynchronous interrupts from calling many system routines such
as malloc. If the interrupt calls malloc at the same time as
an interrupted thread, that could cause deadlock. With my
algorithms, there's no such problem anymore: threads can
freely interleave execution.

- Priority Inversion Immunity: Priority inversion occurs when a
low-priority thread holds a lock to a mutex needed by a high-
priority thread, and such tricky conflicts must be resolved by the
OS kernel; since my algorithms don't make threads hold locks in
this way, they are immune to it.

    - Pre-emption tolerant and they are good at convoy-avoidance.

    - Starvation-free.

- And for k threads in the system, my almost Lock-free FIFO
queue, my almost Lock-free FIFO priority queue and my almost
Lock-free LIFO stack have a system latency of O(q + s*sqrt(k))
and an individual latency of O(k(q + s*sqrt(k))), but my
algorithms are of the SCU(0,1) Class of Algorithms, so under
scheduling conditions which approximate those found in
commercial hardware architectures, their system latency is
O(sqrt(k)) and their individual latency is O(k*sqrt(k)),
read more below to understand more.

    I have invented this Lock-free LIFO stack algorithm that doesn't
    need ABA prevention and it doesn't need Hazard pointers and it is
    not complicated and it doesn't have false sharing, please look at its
    source code inside LockfreeStackBounded.pas inside the zip file.

    You can download it from my website here:

    https://sites.google.com/site/scalable68/lockfree-bounded-lifo-stack-and-fifo-queue

An unbounded queue can hold an infinite number of messages, while a bounded one holds up to some predefined limit; if the limit is reached, further enqueue operations fail. Note that array-based queues are always bounded.
At first sight unbounded queues are more attractive (because they allow you to not care). But they are not: they are dangerous. What will happen if your queue grows to 10^6 messages? 10^7? 10^8? 10^9? What?
It should not happen? Then why did you put an unbounded queue there in the first place? In 95% of cases you need a bounded queue, because it will enforce what you think should happen and will save you from bad things, and it is the same for stacks.

    And read the following paper:

    https://arxiv.org/pdf/1311.3200.pdf

This paper suggests a simple solution to this problem. We show that, for a large class of lock-free algorithms, under scheduling conditions which approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, programmers can keep on designing simple lock-free algorithms instead of complex wait-free ones, and in practice, they will get wait-free
progress. On the analysis of the class SCU(q, s), it says:

"Given an algorithm in SCU(q, s) on k correct processes under a uniform stochastic scheduler, the system latency is O(q + s*sqrt(k)), and the individual latency is O(k(q + s*sqrt(k)))."

My algorithms of an almost Lock-free bounded FIFO queue, of an almost Lock-free bounded priority FIFO queue and of a Lock-free bounded LIFO stack are of the SCU(q, s) Class of Algorithms, so they are powerful, they are starvation-free, and for k threads they have a system latency of O(q + s*sqrt(k)) and an individual latency of O(k(q + s*sqrt(k))).

The size of the queue and of the stack must be passed to the constructor, and it must be a power of 2.
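To illustrate why a power-of-2 capacity is required, here is a simple single-threaded bounded ring buffer in Python (an illustration of the bounded, power-of-2 idea only, not my lock-free algorithm): the capacity being a power of 2 lets "index mod capacity" become a cheap bit-mask.

```python
# Illustrative single-threaded bounded FIFO ring buffer.
# A power-of-2 capacity lets "index mod capacity" become "index & mask".
class BoundedQueue:
    def __init__(self, capacity):
        assert capacity > 0 and capacity & (capacity - 1) == 0, \
            "capacity must be a power of 2"
        self.buf = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # next slot to dequeue
        self.tail = 0  # next slot to enqueue

    def enqueue(self, item):
        if self.tail - self.head == len(self.buf):
            return False  # queue is full: a bounded queue fails, not grows
        self.buf[self.tail & self.mask] = item
        self.tail += 1
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None  # empty
        item = self.buf[self.head & self.mask]
        self.head += 1
        return item

q = BoundedQueue(4)
for i in range(5):
    print(q.enqueue(i))  # the fifth enqueue returns False (queue full)
print(q.dequeue())       # 0: FIFO order
```

Note how the failing enqueue is exactly the back-pressure behaviour argued for above: the producer is told to slow down instead of the queue growing without limit.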

Typically, polling a lock-free queue works best when the queue nearly always has entries, while a blocking queue works best when the queue is nearly always empty.

The downside of blocking queues is latency, typically of the order of 2-20 µs, due to kernel signaling. This can be mitigated by designing the system so that the work done by the consumer threads on each queued item takes much longer than this interval.

    The downside of non-blocking queues is the waste of CPU and memory bandwidth while polling an empty queue. This can be mitigated by designing the system so that the queue is rarely empty.
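One common compromise between these two downsides is a hybrid consumer that spins for a bounded number of polls before falling back to a blocking wait, so the kernel-signaling cost is paid only when the queue really is empty for a while (a generic sketch, not part of my queue implementations):

```python
import queue
import threading

# Hybrid wait: spin briefly (cheap when the queue is rarely empty),
# then fall back to a blocking get (cheap when it is usually empty).
def hybrid_get(q, spin_count=1000):
    for _ in range(spin_count):
        try:
            return q.get_nowait()  # non-blocking poll
        except queue.Empty:
            pass
    return q.get()  # block in the kernel only after spinning failed

q = queue.Queue()
threading.Thread(target=lambda: q.put("work")).start()
print(hybrid_get(q))  # prints "work"
```

The spin count would be tuned so that spinning covers the common short gaps between items, while long idle periods still put the consumer to sleep.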


    More of my philosophy about my inventions of SemaCondvar and SemaMonitor and about technology and more of my thoughts..



I think i am highly smart since I have passed two certified IQ tests and i have scored "above" 115 IQ, and my following new invention of SemaCondvar doesn't have spurious wakeups. A spurious wakeup happens when a thread wakes up from waiting on a condition variable that's been signaled, only to discover that the condition it was waiting for isn't satisfied; it's called spurious because the thread has seemingly been awakened for no reason. My following inventions, SemaMonitor and SemaCondvar, are fast-pathed when the count of my SemaMonitor or my SemaCondvar is greater than 0, so in this case the wait() method stays in user mode and doesn't switch from user mode to kernel mode, which costs around 1500 CPU cycles and is expensive. The signal() method is also fast-pathed when there is no item in the queue and the count is less than MaximumCount. And here are my inventions:


    Author: Amine Moulay Ramdane.

Description: SemaCondvar and SemaMonitor are new and portable synchronization objects. SemaCondvar combines some of the characteristics of a semaphore and all the characteristics of a condition variable, and if you want the signal(s) to not be lost, you can configure it by passing a parameter to the constructor. SemaMonitor combines some of the characteristics of a semaphore and all the characteristics of an eventcount, and if you want the signal(s) to not be lost, you can configure it by passing a parameter to the constructor. They only use an event object and a very fast, very efficient and portable lock.

If you don't want the signal to be lost when the threads are not waiting, just pass True to the state argument of the constructor; if you pass False to the state argument of the constructor, the signals will be lost if the threads are not waiting.

    You will find the SemaMonitor and SemaCondvar classes inside the SemaMonitor.pas and SemaCondvar.pas files inside the zip file.

When you set the first parameter of the constructor to true, the signal will not be lost if the threads are not waiting for the SemaCondvar or SemaMonitor objects, but when you set the first parameter of the constructor to false, if the threads are not waiting for the SemaCondvar or SemaMonitor, the signal will be lost..

Now you can pass the SemaCondvar's or SemaMonitor's InitialCount and MaximumCount to the constructor; it's like the Windows Semaphore's InitialCount and MaximumCount, and it is where the signal(s) will be recorded.

    Like this:

    t:=TSemaMonitor.create(true,ctMLock,0,4);

    If you set it with ctMLock, it will use my scalable node based lock called MLock, you can set it to ctMutex to use a Mutex or to ctCriticalSection to use the TCriticalSection.
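A rough Python analogue of the behaviour described above (a hypothetical sketch assuming only the semantics i described: an InitialCount and a MaximumCount where signals are recorded, a flag controlling whether a signal with no waiter is kept or lost, and a wait() that consumes a recorded signal without blocking) could look like:

```python
import threading

# Hypothetical sketch of a SemaMonitor-like object: signal() increments
# a bounded count; wait() takes a fast path when count > 0. If
# keep_signals is False, a signal arriving with no waiter is dropped.
class SemaMonitorLike:
    def __init__(self, keep_signals, initial_count=0, maximum_count=1):
        self.cond = threading.Condition()
        self.keep_signals = keep_signals
        self.count = initial_count
        self.maximum = maximum_count
        self.waiters = 0

    def signal(self):
        with self.cond:
            if self.waiters == 0 and not self.keep_signals:
                return  # lost signal: nobody is waiting
            if self.count < self.maximum:
                self.count += 1  # record the signal, up to MaximumCount
            self.cond.notify()

    def wait(self):
        with self.cond:
            if self.count > 0:       # fast path: a signal was recorded,
                self.count -= 1      # so no blocking is needed
                return
            self.waiters += 1
            while self.count == 0:   # re-check in a loop, so a spurious
                self.cond.wait()     # wakeup just goes back to waiting
            self.count -= 1
            self.waiters -= 1

s = SemaMonitorLike(keep_signals=True, initial_count=0, maximum_count=4)
s.signal()   # recorded even though no thread is waiting
s.wait()     # consumes the recorded signal without blocking
print("done")
```

Note that in Python the "fast path" still takes a lock; the point of the sketch is only the recorded-count semantics, while the real implementation avoids the user-to-kernel transition on that path.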


    And you can download them from my website here:

    https://sites.google.com/site/scalable68/light-weight-semacondvar-semamonitor

    And here:

    https://sites.google.com/site/scalable68/semacondvar-semamonitor

    And i invite you to read my following new thoughts about generics and higher-order functions etc. and notice that of course i am quickly thinking and writing my following thoughts:

As you have just noticed, i have just talked more about the Delphi
and Freepascal compilers, read it below, but of course modern Object Pascal of Delphi and Freepascal supports generics, and you have to know how to be smart by implementing something that looks like Lambdas by using generics. I invite you to look at how i have implemented
it with Freepascal, which doesn't support Lambdas, although you have to know that Freepascal will soon support Lambdas or Anonymous methods. So look at my following open source software project and look at how i have implemented something that looks like Lambdas by using generics:

    Delphi and Freepascal Libraries that implement higher-order functions like Map, Reduce and Filter

    https://sites.google.com/site/scalable68/delphi-library-that-implements-higher-order-functions-like-map-reduce-and-filter

    About scalable higher-order functions like Map, Reduce and Filter..

    MapReduce is a pattern introduced in 2004 in the paper “MapReduce: Simplified Data Processing on Large Clusters,” by Jeffrey Dean and Sanjay Ghemawat (https://research.google.com/archive/mapreduce-osdi04.pdf), and I will soon implement parallel and scalable higher-order functions like Map, Reduce and Filter using Delphi and Freepascal, so it will be a very powerful library.
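For readers unfamiliar with the pattern, here is how the three higher-order functions compose in a few lines of Python (a generic illustration of Map, Filter and Reduce, not my library's API):

```python
from functools import reduce

# Map transforms each element, filter keeps elements matching a
# predicate, and reduce folds what remains into a single value.
numbers = [1, 2, 3, 4, 5, 6]
squares = map(lambda x: x * x, numbers)           # 1, 4, 9, 16, 25, 36
evens = filter(lambda x: x % 2 == 0, squares)     # 4, 16, 36
total = reduce(lambda acc, x: acc + x, evens, 0)  # 4 + 16 + 36
print(total)  # 56
```

The parallel versions of these functions split the input across threads for Map and Filter and combine partial results for Reduce, which is exactly what makes the MapReduce pattern scale.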

    And my open source software powerful project called EasyList for Delphi and Freepascal was updated to version 1.6..

    I have now documented all the methods.

    You can download and read about my powerful EasyList version 1.6 from my website here:

    https://sites.google.com/site/scalable68/easylist-for-delphi-and-freepascal

    More of my philosophy about Delphi and Freepascal and more of my thoughts..

As you have just noticed, i have just talked about my PERT++ and
my JNI wrapper that i have written in modern Object Pascal
of the Delphi and Freepascal compilers, read it below, but why do you think i am also programming in Delphi and Freepascal?

Of course Delphi and Freepascal compilers support modern Object Pascal; it is not only Pascal but modern Object Pascal, i mean that modern Object Pascal of, for example, Delphi and Freepascal supports object-oriented programming and supports Anonymous methods or typed Lambdas, so i think that it is a decent programming language, even if i know that the new C++20 supports generic Lambdas and templated Lambdas, but i think that Delphi will soon also support generic Lambdas. In Delphi and Freepascal compilers there is no big runtime like in C# and such, so you get small native executables in Delphi and Freepascal, and inline assembler is supported by both Delphi and Freepascal. Lazarus, the IDE of Freepascal, and Delphi both come with one of the best GUI tools, and of course you can make .SO files, .DLL files, executables, etc. in both Delphi and Freepascal, and both compilers are cross-platform to Windows, Linux, Mac, Android etc. I think that modern Object Pascal of Delphi or Freepascal is more strongly typed than C++ but less strongly typed than the ADA programming language, and i think that modern Object Pascal of Delphi and Freepascal is not as strict as ADA and not as strict as Rust or the pure functional programming languages, so it can also be flexible and advantageous to not have that kind of strictness. The compilation times of Delphi are extremely fast, and of course Freepascal supports the Delphi mode so as to be compatible with Delphi, and i can go on and on, and it is why i am also programming in Delphi and Freepascal.

And you can read about the latest version 11.2 of Delphi here:

    https://www.embarcadero.com/products/delphi

    And you can read about Freepascal and Lazarus from here:

    https://www.freepascal.org/

    https://www.lazarus-ide.org/


    More of my philosophy about Asynchronous programming and about the futures and about the ActiveObject and about technology and more of my thoughts..

I think, from my new implementation of a
future below, you can notice that asynchronous programming is not a simple task, since it can get very complicated: you can
notice in my implementation below that if i move the starting of the thread of the future out of the constructor, and if i move the passing of the parameter as a pointer to the future out of the constructor, it
gets more complex to get right and safe the automaton of how to use
and call the methods. So i think that there is
still a problem with asynchronous programming, and it is that
when you have many asynchronous tasks or threads it can get
really complex, and i think that it is the weakness of asynchronous programming, and of course i am also speaking of the implementation
of a sophisticated ActiveObject or a future or of complex asynchronous programming.


    More of my philosophy about my new updated implementation of a future and about the ActiveObject and about technology and more of my thoughts..


So i have just updated my implementation
of a future, and now both the starting of the thread of the future and the passing of the parameter as a pointer to the future are made from the constructor, so as to make safe the system of the automaton of how to use and call the methods. And I have just added support for exceptions: you have to know that programming with futures is asynchronous programming, but to be robust the future implementation has to deal correctly with "exceptions", so in my implementation of a future, when an exception is raised inside the future you will receive the exception. So i have implemented two things: the HasException() method, so that you can detect an exception raised from inside the future, and the exception and its address are returned as a string in the ExceptionStr property. My implementation of a future does of course support passing parameters as a pointer to the future, and it works on Windows and Linux, and of course you can also use my following more sophisticated Threadpool engine with priorities as a sophisticated ActiveObject or such, and pass the methods or functions and their parameters to it, here it is:

    Threadpool engine with priorities

    https://sites.google.com/site/scalable68/threadpool-engine-with-priorities

    And stay tuned since i will enhance more my above Threadpool engine with priorities.

    So you can download my new updated portable and efficient implementation of a future in Delphi and FreePascal version 1.32 from my website here:

    https://sites.google.com/site/scalable68/a-portable-and-efficient-implementation-of-a-future-in-delphi-and-freepascal


    And here is a new example program of how to use my implementation of a future in Delphi and Freepascal and notice that the interface has changed a little bit:


---

program TestFuture;

uses
  system.SysUtils, system.Classes, Futures;

type

  TTestFuture1 = class(TFuture)
  public
    function Compute(ptr: pointer): Variant; override;
  end;

  TTestFuture2 = class(TFuture)
  public
    function Compute(ptr: pointer): Variant; override;
  end;

var
  obj1: TTestFuture1;
  obj2: TTestFuture2;
  a: variant;

function TTestFuture1.Compute(ptr: pointer): Variant;
begin
  raise Exception.Create('I raised an exception');
end;

function TTestFuture2.Compute(ptr: pointer): Variant;
begin
  writeln(nativeint(ptr));
  result := 'Hello world !';
end;

begin
  writeln;

  obj1 := TTestFuture1.create(pointer(12));

  if obj1.GetValue(a) then writeln(a)
  else if obj1.HasException then writeln(obj1.ExceptionStr);

  obj1.free;

  writeln;

  obj2 := TTestFuture2.create(pointer(12));

  if obj2.GetValue(a) then writeln(a);

  obj2.free;
end.

    ---
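For comparison, the same shape as the demo above (one future that raises an exception, one that returns a value) can be written with Python's standard concurrent.futures; this is a generic analogue of the pattern, not my Pascal interface:

```python
from concurrent.futures import ThreadPoolExecutor

# Generic analogue of the demo above: one future raises, one returns.
def compute_fail():
    raise ValueError('I raised an exception')

def compute_ok(ptr):
    return 'Hello world !'

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(compute_fail)
    f2 = pool.submit(compute_ok, 12)
    err = f1.exception()       # retrieve the exception without re-raising
    print(type(err).__name__, err)
    print(f2.result())         # blocks until the value is ready
```

The key point is the same in both versions: a robust future carries either a value or the exception raised inside it, and the caller can ask which one it got.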


    More of my philosophy about the 12 memory channels of
    the new AMD Epyc Genoa CPU and more of my thoughts..



So as i am saying below, i think that, so as to use in parallel the 12 memory channels that the new AMD Genoa CPU supports, the GMI-Wide mode must be enlarged more and connect each CCD with more GMI links, and i think that it is what AMD is doing in its new 4 CCDs configuration, even with the cost-optimized Epyc Genoa 9124, 16 cores with 64 MB of L3 cache and 4 Core Complex Dies (CCDs), that costs around $1000 (Look at it here: https://www.tomshardware.com/reviews/amd-4th-gen-epyc-genoa-9654-9554-and-9374f-review-96-cores-zen-4-and-5nm-disrupt-the-data-center ). And as i am explaining more below, the Core Complex Dies (CCDs) connect to memory, I/O, and each other through the I/O Die (IOD); each CCD connects to the IOD via a dedicated high-speed Global Memory Interconnect (GMI) link; the IOD also contains memory channels, PCIe Gen5 lanes, and Infinity Fabric links; and all dies, or chiplets, interconnect with each other via AMD’s Infinity Fabric Technology. And of course this will permit my new software project of a Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well to scale on the 12 memory channels; read my following thoughts to understand more about it:

    More of my philosophy about the new Zen 4 AMD Ryzen™ 9 7950X and more of my thoughts..


    So i have just looked at the new Zen 4 AMD Ryzen™ 9 7950X CPU, and i invite you to look at it here:

    https://www.amd.com/en/products/cpu/amd-ryzen-9-7950x

But notice carefully that the problem is with the number of supported memory channels: it supports just two memory channels, so it is not good, since for example my following open source software project of a Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well is scaling around 8X on my 16-core Intel Xeon with 2 NUMA nodes and 8 memory channels, but it will not scale correctly on the new Zen 4 AMD Ryzen™ 9 7950X CPU with just 2 memory channels, since it is also memory-bound. And here is my powerful open source software project of a Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, and i invite you to take a careful look at it:

    https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library

So i advise you to buy an AMD Epyc CPU or an Intel Xeon CPU that supports 8 memory channels.

    ---


And of course you can use the next twelve DDR5 memory channels of the Zen 4 AMD EPYC CPUs to scale my above algorithm even more, and read about it here:

    https://www.tomshardware.com/news/amd-confirms-12-ddr5-memory-channels-on-genoa


And here is the simulation program that uses the probabilistic mechanism that i have talked about, and that proves to you that the algorithm of my Parallel C++ Conjugate Gradient Linear System Solver Library is scalable:

If you look at my scalable parallel algorithm, it divides each array of the matrix into parts of 250 elements, and if you look carefully i am using two functions that consume the greater part of all the CPU time: atsub() and asub(). Inside those functions i am using a probabilistic mechanism so as to render my algorithm scalable on NUMA architecture, and it also makes it scale on the memory channels: what i am doing is scrambling the array parts using a probabilistic function, and what i have noticed is that this probabilistic mechanism is very efficient. To prove to you what i am saying, please look at the following simulation that i have done using a variable that contains the number of NUMA nodes, and what i have noticed is that my simulation gives almost perfect scalability on NUMA architecture. For example, let us give the "NUMA_nodes" variable a value of 4, and our array a size of 250: the simulation below will give a number of contention points of about a quarter of the array, so if i am using 16 cores, in the worst case it will scale to 4X throughput on NUMA architecture, because since i am using an array of 250 and there is a quarter of the array of contention points, from Amdahl's law this will give a scalability of almost 4X throughput on four NUMA nodes, and this will give almost perfect scalability on more and more NUMA nodes. So my parallel algorithm is scalable on NUMA architecture and it also scales well on the memory channels.

    Here is the simulation that i have done, please run it and you will notice yourself that my parallel algorithm is scalable on NUMA architecture.

    Here it is:

    ---
program test;

uses math;

var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, n2, tmp, j, numa_nodes: integer;

begin
  a := 250;
  numa_nodes := 4;
  j := 0;

  setlength(tab2, a);

  for i := 0 to a - 1 do
    tab2[i] := i mod numa_nodes;

  setlength(tab, a);

  randomize;

  for k := 0 to a - 1 do
    tab[k] := k;

  n2 := a - 1;

  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;

  setlength(tab1, a);

  randomize;

  for k := 0 to a - 1 do
    tab1[k] := k;

  n2 := a - 1;

  for k := 0 to a - 1 do
  begin
    n1 := random(n2);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;

  for i := 0 to a - 1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;

  writeln('Number of contention points: ', j);
  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.
    ---


    More of my philosophy about 4 CCDs configuration of AMD Epyc Genoa CPU and more of my thoughts..


    I have just read the following new paper about AMD 4th Gen EPYC 9004 Series, so i invite you to read it carefully:

    https://hothardware.com/reviews/amd-genoa-data-center-cpu-launch


    So read carefully the 4 CCDs configuration, so i am understanding
    the following from it:


    I/O DIE is what is connected to the memory channels externally, and it says that SKUs north of 4 CCDs (e.g. 32 cores) use the GMI3-Narrow configuration with a single GMI link per CCD. With 4 CCD and lower SKUs, AMD can implement GMI-Wide mode which joins each CCD to the IOD with two GMI links. In this case, one link of each CCD populates GMI0 to GMI3 while the other link of each CCD populates GMI8 to GMI11 as diagramed above. This helps these parts better balance against I/O demands.

So i think that AMD implemented in its new 4 CCDs configuration the GMI-Wide mode, which joins each CCD to the IOD with two GMI links, so as to be connected to the 8 memory channels externally and use them in parallel, so i think that the problem is solved, since i think that the cost-optimized Epyc Genoa 9124, 16 cores with 64 MB of L3 cache and 4 Core Complex Dies (CCDs), that costs around $1000 (Look at it here: https://www.tomshardware.com/reviews/amd-4th-gen-epyc-genoa-9654-9554-and-9374f-review-96-cores-zen-4-and-5nm-disrupt-the-data-center )
can fully use the 8 memory channels in parallel, so it is a good Epyc Genoa processor to buy.

    And of course i invite you to read the following:

    More of my philosophy about the new Epyc Genoa and about Core Complex Die (CCD) and Core-complex(CCX) and more of my thoughts..

    I have just looked at the following paper from AMD and i invite
    you to look at it:

    https://developer.amd.com/wp-content/resources/56827-1-0.pdf

And as you notice above, you have to look at how many
Core Complex Dies (CCDs) you have, since it tells you more
about how many Infinity Fabric connections you have, and it is
an important piece of information, so look at the following article
about the new AMD Epyc Genoa:

    https://wccftech.com/amd-epyc-genoa-cpu-lineup-specs-benchmarks-leak-up-to-2-6x-faster-than-intel-xeon/


    And you can read much more of my thoughts about technology in the following web links:


    https://groups.google.com/g/alt.culture.morocco/c/MosH5fY4g_Y

    And here:

    https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4


    More of my philosophy about my PERT++ and about my JNI Wrapper for Delphi and FreePascal and about the new Java SE Development Kit 19.0.1 and more of my thoughts..


    I have just downloaded and installed the new Java SE Development Kit 19.0.1

    Here it is:

    https://www.oracle.com/java/technologies/javase/jdk19-archive-downloads.html


    And i have just tested my open source JNI Wrapper for Delphi and FreePascal with the Java SE Development Kit 19.0.1, and it is working perfectly, so you can download my JNI Wrapper for Delphi and FreePascal from my website here:

    https://sites.google.com/site/scalable68/jni-wrapper-for-delphi-and-freepascal


    And I have also tested my other open source software project called PERT++ with the new Java SE Development Kit 19.0.1, and it is working perfectly, so you can download my PERT++ from my website here:

    https://sites.google.com/site/scalable68/pert-an-enhanced-edition-of-the-program-or-project-evaluation-and-review-technique-that-includes-statistical-pert-in-delphi-and-freepascal

And i have provided you in my PERT++ with two ways of estimating the critical path: first, by way of CPM (Critical Path Method), which shows all the arcs of the estimate of the critical path, and second, by way of the central limit theorem, using the inverse normal distribution function. And you have to provide my software project that is called PERT++ with three types of estimates, which are the following:

    Optimistic time - generally the shortest time in which the activity
    can be completed. It is common practice to specify optimistic times
    to be three standard deviations from the mean so that there is
    approximately a 1% chance that the activity will be completed within
    the optimistic time.

    Most likely time - the completion time having the highest
    probability. Note that this time is different from the expected time.

    Pessimistic time - the longest time that an activity might require. Three standard deviations from the mean is commonly used for the pessimistic time.
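These three estimates are conventionally combined with the classical PERT formulas, where the expected time is (O + 4M + P)/6 and the standard deviation is (P - O)/6; here is that standard textbook computation for one activity (an illustration only, not code from my PERT++):

```python
# Classical PERT three-point estimate for one activity.
def pert_estimate(optimistic, most_likely, pessimistic):
    # Weighted mean: the most likely time counts four times as much.
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    # O and P are taken as roughly 3 sigma either side of the mean.
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

mean, sd = pert_estimate(2, 5, 14)  # estimates in days
print(mean)  # 6.0
print(sd)    # 2.0
```

Summing the expected times along the critical path, and combining the variances, gives the mean and standard deviation that the central limit theorem then lets you treat as normal.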

    The central limit theorem states that the sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough.

    How large is "large enough"?

In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped. Others recommend a sample size of at least 40. But if the original population is distinctly not normal (e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers like the sample size to be even larger. So i invite you to read my following thoughts about my software
project that is called PERT++, and notice that PERT networks are referred to by some researchers as "probabilistic activity networks" (PAN), because the durations of some or all of the arcs are independent random variables with known probability distribution functions and have finite ranges. So PERT uses the central limit theorem (CLT) to find the expected project duration.

    So I have provided you in my PERT++ with the following functions:


    function NormalDistA (const Mean, StdDev, AVal, BVal: Extended): Single;

    function NormalDistP (const Mean, StdDev, AVal: Extended): Single;

    function InvNormalDist(const Mean, StdDev, PVal: Extended; const Less: Boolean): Extended;

    For NormalDistA() or NormalDistP(), you pass the best estimate of completion time to Mean, and you pass the critical path standard deviation to StdDev, and you will get the probability of the value Aval or the probability between the values of Aval and Bval.

    For InvNormalDist(), you pass the best estimate of completion time to Mean, and you pass the critical path standard deviation to StdDev, and you will get the length of the critical path of the probability PVal, and when Less is TRUE, you will obtain a cumulative distribution.
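The same kind of computation can be reproduced with Python's standard library for comparison (an illustration with made-up values; the variable names mirror my description above, not the actual PERT++ signatures):

```python
from statistics import NormalDist

# Mean = best estimate of completion time along the critical path,
# StdDev = critical-path standard deviation (illustrative values).
dist = NormalDist(mu=40.0, sigma=4.0)

# Analogue of NormalDistP: probability of finishing by day 44.
p_by_44 = dist.cdf(44.0)

# Analogue of InvNormalDist: deadline with a 95% chance of being met.
deadline_95 = dist.inv_cdf(0.95)

print(round(p_by_44, 4))      # 0.8413 (one sigma above the mean)
print(round(deadline_95, 2))  # 46.58
```

Reading these off the normal distribution is exactly what the central limit theorem justifies once the critical path contains enough independent activities.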

So as you are noticing from my above thoughts, since PERT networks are referred to by some researchers as "probabilistic activity networks" (PAN), because the durations of some or all of the arcs are independent random variables with known probability distribution functions and have finite ranges, PERT uses the central limit theorem (CLT) to find the expected project duration. So then you have to use my above functions, which are the normal distribution and inverse normal distribution functions; please look at my demo inside my zip file to understand better how i am doing it:

    You can download and read about my PERT++ from my website here:

    https://sites.google.com/site/scalable68/pert-an-enhanced-edition-of-the-program-or-project-evaluation-and-review-technique-that-includes-statistical-pert-in-delphi-and-freepascal



    Thank you,
    Amine Moulay Ramdane.
    --- Synchronet 3.20a-Linux NewsLink 1.114