Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Generally, I like your algorithm.
It was surprising to me that a queue can work better than a stack;
my intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" of size at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder whether the existence of this or a
similar limit can be proven theoretically.
I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.
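(To make the stack-versus-queue contrast above concrete, here is a minimal sketch of a queue-based, breadth-first 4-neighbor flood fill. It is not the bounded-FIFO algorithm discussed in this thread; the flood_fill_bfs name and the plain worst-case-sized queue are illustrative assumptions.)

#include <stdlib.h>

/* Minimal queue-based (breadth-first) 4-neighbor flood fill.
   Illustrative only: the queue is a worst-case-sized array, not the
   bounded FIFO discussed above.  Pixels are recolored when enqueued,
   so each pixel enters the queue at most once. */
static void flood_fill_bfs(unsigned char *image, int width, int height,
                           int x0, int y0, unsigned char new_color)
{
  unsigned char old_color = image[y0*width + x0];
  if (old_color == new_color) return;

  int *queue = malloc((size_t)width * height * sizeof *queue);
  if (!queue) return;
  size_t head = 0, tail = 0;

  image[y0*width + x0] = new_color;
  queue[tail++] = y0*width + x0;

  while (head < tail) {
    int p = queue[head++];
    int x = p % width, y = p / width;
    int nb[4][2] = { {x-1,y}, {x+1,y}, {x,y-1}, {x,y+1} };
    for (int i = 0; i < 4; i++) {
      int nx = nb[i][0], ny = nb[i][1];
      if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
      if (image[ny*width + nx] != old_color) continue;
      image[ny*width + nx] = new_color;      /* recolor on enqueue */
      queue[tail++] = ny*width + nx;
    }
  }
  free(queue);
}

(The depth-first, stack-based version is the same code with the FIFO replaced by a LIFO, i.e. popping from the tail instead of the head; that substitution is where the O(N**2) versus O(N) frontier difference comes from.)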
On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Generally, I like your algorithm.
It was surprising to me that a queue can work better than a stack;
my intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" of size at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder whether the existence of this or a
similar limit can be proven theoretically.
I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.
It seems that in the worst case the strict FIFO algorithm is the same
as the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of a shape for which I measured the memory
consumption for a 3840x2160 image to be almost exactly 4x that for
1920x1080.
[code to generate fractal tree pattern]
Michael S <already5chosen@yahoo.com> writes:
On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Generally, I like your algorithm.
It was surprising to me that a queue can work better than a stack;
my intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" of size at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder whether the existence of this or a
similar limit can be proven theoretically.
I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.
Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read
Index const jx = used*3 < open ? k : j+open/3 &m;
rather than what it was before. (The type may have changed, but that
is incidental; what matters is the value of the initializing
expression.) I don't know what I was thinking when I wrote the
previous version; it's just completely wrong.
It seems that in the worst case the strict FIFO algorithm is the same
as the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of a shape for which I measured the memory
consumption for a 3840x2160 image to be almost exactly 4x that for
1920x1080.
I agree, the empirical evidence here and in my own tests is quite
compelling.
That said, the constant factor for the FIFO algorithm is lower
than that of the stack-based algorithms, even taking into account
the difference in sizes between queue and stack elements. Moreover,
cases where FIFO algorithms are O( NxN ) are unusual and sparse,
whereas the stack-based algorithms tend to use a lot of memory
in many common and routine cases. On average, FIFO algorithms
typically use a lot less memory (or so I conjecture).
[code to generate fractal tree pattern]
Thank you for this. I incorporated it into my set of test
patterns more or less as soon as it was posted.
Now that I have taken some time to play around with different
algorithms and have been more systematic in doing speed
comparisons between different algorithms, on different patterns,
and with a good range of sizes, I have some general thoughts
to offer.
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
I've been playing around with a more elaborate, mostly FIFO
method, in hopes of getting something that offers the best
of both worlds. The results so far are encouraging, but a
fair amount of tuning has been necessary (and perhaps more
still is), and comparisons have been done on just the one
test server I have available. So I don't know how well it
would hold up on other hardware, including especially more
recent hardware. Under these circumstances I feel it is
premature to post actual code, especially since the code
is still in flux.
This topic has been more interesting than I was expecting, and
also more challenging.
I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read
Index const jx = used*3 < open ? k : j+open/3 &m;
I lost track, sorry. I cannot find your code that contains a line
similar to this. Can you point to the specific post?
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
It seems that in the worst case the strict FIFO algorithm is the same
as the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of a shape for which I measured the memory
consumption for a 3840x2160 image to be almost exactly 4x that for
1920x1080.
I agree, the empirical evidence here and in my own tests is quite
compelling.
BTW, I no longer agree with myself about "the rest of them".
By now I know at least one method that is O(W*log(H)). It is even
quite fast for the majority of my test shapes. Unfortunately, [in its
current form] it is abysmally slow (100x) for a minority of tests.
[In its current form] it has other disadvantages as well, like
consuming a non-trivial amount of memory when handling a small spot in
a big image. But that can be improved. I am less sure that the
worst-case speed can be improved enough to make it generally acceptable.
I think I have said enough for you to figure out the general principle
of this algorithm. I don't want to post code here before I try a few
improvements.
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware, with this sort of shape it is easily faster
than anything else. The difference vs the best of the rest is
especially big for 4K images on AMD Zen3-based hardware, but even on
Intel Skylake, which generally serves as an equalizer between
different algorithms, the speed advantage of the 2x2 stack is
significant.
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
This topic has been more interesting than I was expecting, and
also more challenging.
That's not the first time in my practice that a problem with a
simple formulation begets interesting challenges.
Didn't Donald Knuth write 300 or 400 pages about sorting and
still end up quite far from exhausting the topic?
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.
So why not break it down into smaller pieces?
Michael S <already5chosen@yahoo.com> writes:
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware, with this sort of shape it is easily faster
than anything else. The difference vs the best of the rest is
especially big for 4K images on AMD Zen3-based hardware, but even on
Intel Skylake, which generally serves as an equalizer between
different algorithms, the speed advantage of the 2x2 stack is
significant.
This comment makes me wonder if I should post my timing results.
Maybe I will (and include an appropriate disclaimer).
I do timings over these sizes:
25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768
with these patterns:
fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center
If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).
Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.
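(As an illustration of the kind of pattern-generating code being traded in this thread, here is a minimal sketch of a donut generator in the same style as the make_grid routine posted later. The make_donut name and the radius choices are assumptions, not code that was actually posted.)

/* Fill the image with background_c, then draw a filled ellipse of
   pen_c with a concentric elliptical hole ("donut").  Radii are
   fixed fractions of the image size; the nominal start point could
   be the image center (inside the hole) or any pixel on the ring,
   depending on which region the fill is meant to recolor. */
static void make_donut(
  unsigned char *image,
  int width, int height,
  unsigned char background_c,
  unsigned char pen_c)
{
  double cx = (width  - 1) * 0.5;
  double cy = (height - 1) * 0.5;
  double rx_out = width  * 0.45, ry_out = height * 0.45;  /* outer ellipse */
  double rx_in  = width  * 0.25, ry_in  = height * 0.25;  /* hole */
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      double dx = x - cx, dy = y - cy;
      double e_out = dx*dx/(rx_out*rx_out) + dy*dy/(ry_out*ry_out);
      double e_in  = dx*dx/(rx_in*rx_in)   + dy*dy/(ry_in*ry_in);
      image[y*width + x] = (e_out <= 1.0 && e_in > 1.0) ? pen_c : background_c;
    }
  }
}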
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware, with this sort of shape it is easily faster
than anything else. The difference vs the best of the rest is
especially big for 4K images on AMD Zen3-based hardware, but even on
Intel Skylake, which generally serves as an equalizer between
different algorithms, the speed advantage of the 2x2 stack is
significant.
On Thu, 11 Apr 2024 22:09:51 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
I do timings over these sizes:
25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768
I didn't go that far up (I stopped at 4K),
and I only test landscape sizes. Maybe I'll add a portrait option
to see anisotropic behavior.
with these patterns:
fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center
If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).
Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.
My suite is about the same, with the following exceptions:
1. I didn't add the donut yet
2. + 3 grids with cell sizes 2, 3 and 4
3. + fractal tree
4. + entire field starting from a corner
It seems neither of us tests cases in which the linear dimensions
of the shape are much smaller than those of the field.
#include <string.h>  /* for memset */

/* Draw a grid: every cell_sz-th row and column is pen_c, the rest is
   background_c.  The pen-colored lines form one connected lattice. */
static void make_grid(
  unsigned char *image,
  int width, int height,
  unsigned char background_c,
  unsigned char pen_c, int cell_sz)
{
  for (int y = 0; y < height; ++y) {
    unsigned char* p = &image[y*width];
    if (y % cell_sz == 0) {
      memset(p, pen_c, width);               /* grid row: all pen */
    } else {
      for (int x = 0; x < width; ++x)        /* grid columns only */
        p[x] = x % cell_sz ? background_c : pen_c;
    }
  }
}
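(A hedged usage sketch showing how such a pattern might be driven; the buffer handling, the colors, and the choice of start point are assumptions, not part of any test harness posted in the thread.)

#include <stdlib.h>

int main(void)
{
  enum { W = 1920, H = 1080 };
  unsigned char *image = malloc((size_t)W * H);
  if (!image) return 1;
  /* Grid with cell size 3; (0,0) lies on a grid line, so a fill
     started there must recolor the entire connected lattice of
     pen-colored pixels. */
  make_grid(image, W, H, 0 /* background */, 1 /* pen */, 3);
  /* ... run the flood-fill routine under test from (0,0), recoloring
     pen pixels to a third color, and record the time and the peak
     queue or stack depth ... */
  free(image);
  return 0;
}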
Michael S <already5chosen@yahoo.com> writes:
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware, with this sort of shape it is easily faster
than anything else. The difference vs the best of the rest is
especially big for 4K images on AMD Zen3-based hardware, but even on
Intel Skylake, which generally serves as an equalizer between
different algorithms, the speed advantage of the 2x2 stack is
significant.
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
This particular server is turned off right now.
Hopefully next Monday I will be able to test on it.
It would help if in the meantime you point me to the specific post
with the code.
Michael S <already5chosen@yahoo.com> writes:
On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
This particular server is turned off right now.
Hopefully next Monday I will be able to test on it.
It would help if in the meantime you point me to the specific post
with the code.
Does this help? Message-ID: <8634ru3ofo.fsf@linuxsc.com>
Michael S <already5chosen@yahoo.com> writes:
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware, with this sort of shape it is easily faster
than anything else. The difference vs the best of the rest is
especially big for 4K images on AMD Zen3-based hardware, but even on
Intel Skylake, which generally serves as an equalizer between
different algorithms, the speed advantage of the 2x2 stack is
significant.
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
Finally found the time for speed measurements. [...]
Michael S <already5chosen@yahoo.com> writes:
[...]
Finally found the time for speed measurements. [...]
I got these. Thank you.
The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.
Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.
On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Finally found the time for speed measurements. [...]
I got these. Thank you.
The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.
What are your absolute results?
Are they much faster, much slower, or similar to mine?
Also, it would help if you could find out the characteristics of your
test hardware.
Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.
I am still interested in a combination of speed that does
not suck with an O(N) worst-case memory footprint.
I already have a couple of variants of the latter, but so
far they are all unreasonably slow - ~5 times slower than
the best.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Finally found the time for speed measurements. [...]
I got these. Thank you.
The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.
What are your absolute results?
Are they much faster, much slower, or similar to mine?
Also, it would help if you could find out the characteristics of your
test hardware.
I think trying to look at those wouldn't tell me anything
helpful. Too many unknowns. And still no way to test or
measure any changes to the various algorithms.
Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.
I am still interested in a combination of speed that does
not suck with an O(N) worst-case memory footprint.
I already have a couple of variants of the latter,
Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?
I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N; otherwise I'm not
sure what O(N) refers to.
but so
far they are all unreasonably slow - ~5 times slower than
the best.
I'm no longer working on the problem but I'm interested to
hear what you come up with.
On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?
I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N; otherwise I'm not
sure what O(N) refers to.
No, I mean O(max(M,N)), plus possibly some logarithmic component that
loses significance as images grow bigger.
More than that, if the bounding rectangle of the shape is A x B then
I'd like the memory requirement to be O(max(A,B)), but so far that
does not appear to be possible, or at least not possible without
significant complications and further slowdown. So, as an intermediate
goal, I am willing to accept that the allocation is O(max(M,N)) but
the amount of touched memory is O(max(A,B)).
but so
far they are all unreasonably slow - ~5 times slower than
the best.
I'm no longer working on the problem but I'm interested to
hear what you come up with.
On Sat, 20 Apr 2024 21:10:23 +0300
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?
I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N; otherwise I'm not
sure what O(N) refers to.
No, I mean O(max(M,N)), plus possibly some logarithmic component that
loses significance as images grow bigger.
More than that, if the bounding rectangle of the shape is A x B then
I'd like the memory requirement to be O(max(A,B)), but so far that
does not appear to be possible, or at least not possible without
significant complications and further slowdown. So, as an intermediate
goal, I am willing to accept that the allocation is O(max(M,N)) but
the amount of touched memory is O(max(A,B)).
but so
far they are all unreasonably slow - ~5 times slower than
the best.
I'm no longer working on the problem but I'm interested to
hear what you come up with.
Here is what I had in mind.
I tried to optimize as little as I could, in order to keep it as
simple as I could. Unfortunately, I am not particularly good at that,
so the code still contains a few unnecessary "tricks" that make
understanding a little harder.
The code uses VLAs and recursion for the same purpose, i.e. to keep
it less tricky.
If desired, the memory footprint could easily be reduced by a factor
of 8 through the use of packed bit arrays instead of arrays of _Bool.
Even in this relatively crude form, for the majority of shapes this
code is blazingly fast.
Unfortunately, in the worst case (both 'slalom' shapes) the execution
time is O(max(A,B)**3), which makes it unfit as a general-purpose
routine. At the moment I don't see a solution for this problem.
Overall, it's probably a dead end.
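(To make the packed-bit-array remark above concrete, here is a minimal sketch of a one-bit-per-pixel map that could stand in for an array of _Bool. The bitmap type and helper names are assumptions, not taken from the posted code.)

#include <limits.h>
#include <stdlib.h>

/* One bit per pixel instead of one _Bool per pixel: roughly a factor
   of CHAR_BIT (typically 8) less memory for visited/todo maps. */
typedef struct {
  unsigned char *bits;
  size_t nbits;
} bitmap;

static int bitmap_init(bitmap *bm, size_t nbits)
{
  bm->nbits = nbits;
  bm->bits = calloc((nbits + CHAR_BIT - 1) / CHAR_BIT, 1);
  return bm->bits != NULL;
}

static void bitmap_set(bitmap *bm, size_t i)
{
  bm->bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

static int bitmap_get(const bitmap *bm, size_t i)
{
  return (bm->bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}

static void bitmap_free(bitmap *bm)
{
  free(bm->bits);
  bm->bits = NULL;
}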