• signed vs unsigned and gcc -Wsign-conversion

    From pozz@pozzugno@gmail.com to comp.lang.c on Mon Oct 20 17:03:58 2025
    From Newsgroup: comp.lang.c

    After many years programming in C language, I'm always unsure if it is
    safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values, signed
    int is the only solution. If you are manipulating single bits (&, |, ^,
    <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?
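
    For example, a typical warning comes from code like this (the names
    here are just illustrative):

        size_t len = strlen(buf);
        int i = get_index();    /* hypothetical helper */
        if (i < len)            /* conversion of 'i' to 'size_t' may
                                   change the sign [-Wsign-conversion] */
            process(buf[i]);
        if ((size_t)i < len)    /* the cast silences the warning, but
                                   is wrong if i can be negative */
            process(buf[i]);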


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Mon Oct 20 17:38:44 2025
    From Newsgroup: comp.lang.c

    On 20.10.2025 17:03, pozz wrote:

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. ...

    As long as this doesn't fix malfunctions it's purely aesthetic.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Oct 20 19:43:37 2025
    From Newsgroup: comp.lang.c

    On Mon, 20 Oct 2025 17:03:58 +0200
    pozz <pozzugno@gmail.com> wrote:

    After many years programming in C language, I'm always unsure if it
    is safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values,
    signed int is the only solution. If you are manipulating single bits
    (&, |, ^, <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?


    I'd just point out that small negative numbers are FAR more common than
    numbers in range [2**31..2**32-1].
    Now, make your own conclusion.


    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Mon Oct 20 19:07:07 2025
    From Newsgroup: comp.lang.c

    On 20.10.2025 18:43, Michael S wrote:
    On Mon, 20 Oct 2025 17:03:58 +0200
    pozz <pozzugno@gmail.com> wrote:

    After many years programming in C language, I'm always unsure if it
    is safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values,
    signed int is the only solution. If you are manipulating single bits
    (&, |, ^, <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?


    I'd just point out that small negative numbers are FAR more common
    than numbers in range [2**31..2**32-1].

    So use a short instead of an int for a loop counter to make the code
    run faster on a 68000 CPU? ;-)

    Now, make your own conclusion.


    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Oct 20 18:01:34 2025
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 20 Oct 2025 17:03:58 +0200
    pozz <pozzugno@gmail.com> wrote:

    After many years programming in C language, I'm always unsure if it
    is safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values,
    signed int is the only solution. If you are manipulating single bits
    (&, |, ^, <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?


    I'd just point out that small negative numbers are FAR more common
    than numbers in range [2**31..2**32-1].
    Now, make your own conclusion.

    One might also point out that negative loop indices are rare, and
    thus one's conclusion may be that, generally speaking, loop indexes
    should be unsigned.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Oct 20 20:03:34 2025
    From Newsgroup: comp.lang.c

    On 20/10/2025 17:03, pozz wrote:
    After many years programming in C language, I'm always unsure if it is
    safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values, signed
    int is the only solution. If you are manipulating single bits (&, |, ^,
    <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?



    Signed and unsigned types are equally safe. If you are sure you are
    within the ranges you know will work for the types you use, your
    code is safe. If you are not sure, you are unsafe. It doesn't
    matter if an overflow is undefined behaviour leading to a bug, or
    defined but unexpected behaviour leading to a bug. (Of course, if
    you are using the defined wrapping behaviour of unsigned types in a
    way that you know is correct for your program, then that is safe.
    All overflows for signed types are unsafe, as are all unexpected
    overflows of unsigned types.)

    Signed types can be more efficient in some circumstances, as they
    obey a number of useful mathematical rules that can be used for
    optimisation. Unsigned types - of the size of "unsigned int" or
    bigger - obey different mathematical identities that can
    occasionally be useful but often hinder optimisations.

    Beware assumptions about wrapping of unsigned types smaller than
    "unsigned int" - these promote to "int", and arithmetic is then
    done as "int" with UB on overflow, before possibly being converted
    back to the smaller unsigned integer type.

    If your target has bigger registers than "int", sometimes code can
    be surprisingly inefficient for "unsigned int". If you have a
    64-bit target and have an expression like "array[i++];" in a loop,
    it can be significantly less efficient if "i" is "unsigned int",
    because the compiler must assume that "i++" might wrap. If "i" is
    "int", or a 64-bit type (like "size_t" on such a target), there is
    no such issue. (It is not uncommon that "int_fast32_t" or
    "uint_fast32_t" will be faster than plain "int" or "unsigned int",
    because these will be 64-bit on 64-bit targets.)
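
    To make that concrete, a sketch of the kind of loop I mean
    (hypothetical functions):

        #include <stddef.h>

        /* With "i" as "unsigned int" on a 64-bit target, the compiler
           must implement the wrap at 2^32 when forming &array[i],
           which can block strength-reduction of the indexing: */
        void fill_u(int *array, long n)
        {
            unsigned int i = 0;
            for (long k = 0; k < n; k++)
                array[i++] = 0;
        }

        /* With "size_t", the index cannot wrap before the pointer
           arithmetic itself would be out of bounds, so the loop can be
           turned into a simple pointer walk: */
        void fill_z(int *array, long n)
        {
            size_t i = 0;
            for (long k = 0; k < n; k++)
                array[i++] = 0;
        }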

    So very often, the efficient choice of type is "int" or
    "int_fastN_t" for code that might be used on 64-bit platforms.
    Size-explicit types are the best choice if your code has to run on
    smaller platforms, so that you can be sure your types are big
    enough. But if the compiler can determine the range of an unsigned
    variable (such as in a "for" loop where the start and end cases are
    known at compile-time), then unsigned types will be just as
    efficient.

    Comparisons between signed and unsigned types do turn up regularly,
    and sometimes casts are necessary if you want to enable warnings
    about these and can't reasonably pick the signedness for the two
    sides of the comparison. It's a question of style and preference
    whether you want to enable such warnings - most people do not, I
    think.



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Oct 20 20:09:14 2025
    From Newsgroup: comp.lang.c

    On 2025-10-20, David Brown <david.brown@hesbynett.no> wrote:
    On 20/10/2025 17:03, pozz wrote:
    After many years programming in C language, I'm always unsure if it is
    safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values, signed
    int is the only solution. If you are manipulating single bits (&, |, ^,
    <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?



    Signed and unsigned types are equally safe. If you are sure you are
    within the ranges you know will work for the types you use, your
    code is safe. If you are not sure, you are unsafe.

    Safe generally means that the language somehow protects from harm, not
    that you protect yourself.

    Correct code operating on correct inputs, using unsafe constructs,
    is still called unsafe code.

    However using unsigned types due to them being safe is often poorly
    considered because if something goes wrong contrary to the programmer's
    intent, there likely will be undefined behavior somewhere.

    E.g. an array underflow using an unsigned index will not produce
    integer overflow undefined behavior, but the access will go out of
    bounds, which is undefined behavior.

    There are bugs which play out without any undefined behavior:
    the program calculates something contrary to its requirements,
    but stays within the confines of the defined language.

    The odds that by using unsigned numbers you will get only that type of
    bug are low, and even if so, it is not a big comfort.

    Signed numbers behave more like mathematical integers, in cases
    when there is no overflow.

    If a, b and c are small, non-negative quantities, you might be tempted
    to make them unsigned. But if you do so, then you can no longer make
    this derivation of inequalities:

        a + b > c

        a > c - b

    Under the unsigned types, we cannot add -b to both sides of the
    inequality, preserving its truth value, even if all the operands
    are tiny numbers that fit into a single decimal digit!

    If b happens to be greater than c, we get a huge value on the right
    side that is now larger than a, not smaller.

    Gratuitous use of unsigned types impairs our ability to use
    algebra to simplify code, due to the "cliff" at zero.
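
    A tiny sketch of that cliff, assuming 32-bit unsigned int:

        unsigned int a = 5, b = 4, c = 2;
        int t1 = (a + b > c);  /* 1, as expected */
        int t2 = (a > c - b);  /* 0: c - b wraps to 4294967294, so the
                                  algebraically equivalent test fails */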

    This is a nuanced topic where there isn't a one-type-fits-all answer,
    but I gravitate toward signed; use of unsigned has to be justified in
    some way.

    When sizes are being calculated and they come from functions or
    operators that produce size_t, then that tends to dictate unsigned.

    If the quantities are large and can possibly overflow, there are
    situations in which unsigned makes that simpler.

    For instance, if a and b are unsigned such that a + b can
    semantically overflow (i.e. the result of the natural addition of
    a + b doesn't fit into the type), it is simpler to detect: you can
    just do the addition, and then test:

        c = a + b;

    when there is no overflow, it must be that (c >= a && c >= b)
    so if either (c < a) or (c < b) is true, it overflowed.

    This is significantly less verbose than a correct overflow test
    for signed addition, which has to avoid doing the actual addition,
    and has to be split into three cases: a and b have opposite
    sign (always okay), a and b are both positive, and a and b are
    both negative.
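
    For comparison, a sketch of the two tests (with the signed check
    folded into two cases; ua/ub and sa/sb are assumed already set, and
    <limits.h> and <stdbool.h> are needed):

        /* unsigned: just add, then test - wraparound is defined: */
        unsigned int uc = ua + ub;
        bool u_overflowed = (uc < ua);

        /* signed: must test before adding, to avoid the UB addition: */
        bool s_would_overflow = (sa >= 0) ? (sb > INT_MAX - sa)
                                          : (sb < INT_MIN - sa);
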
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Oct 20 14:48:40 2025
    From Newsgroup: comp.lang.c

    pozz <pozzugno@gmail.com> writes:
    After many years programming in C language, I'm always unsure if it is
    safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values,
    signed int is the only solution. If you are manipulating single bits
    (&, |, ^, <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I usually use int (certainly for iterating over argc/argv), but
    sometimes size_t. size_t is typically the most correct type for
    representing sizes or counts of objects in memory, but int is a
    bit easier to work with.

    Both signed and unsigned types are (usually) used to model subranges
    of the unbounded mathematical integers. If none of your operations
    yields results outside the range of the type you're using, you're
    safe -- but ensuring you don't stray outside that range can be easy
    or difficult. If you're counting no more than a few thousand items,
    int is fine. If you're counting bytes in a file or pennies in the
    national debt, you have to think about just what range of values
    you need to handle.

    The thing about unsigned types is that they have a discontinuity at
    0, which is much easier to run into than signed int's
    discontinuities at INT_MIN and INT_MAX. Subtraction in particular
    can easily yield mathematically incorrect results for unsigned
    types (unless your problem domain actually calls for modular
    arithmetic).
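
    For instance:

        size_t have = 3, want = 5;
        size_t diff = have - want;  /* not -2: wraps to SIZE_MAX - 1 */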

    If you start with a value of type size_t, say from sizeof or
    strlen(), it's probably best to stick with size_t for any derived
    values. My vague impression is that most things that should use
    unsigned types should use size_t (there are of course plenty of
    exceptions).

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?

    Here's the description of -Wsign-conversion:

    ‘-Wsign-conversion’
    Warn for implicit conversions that may change the sign of an
    integer value, like assigning a signed integer expression to an
    unsigned integer variable. An explicit cast silences the warning.
    In C, this option is enabled also by ‘-Wconversion’.

    If you're converting between different types, it's often (but by no
    means always) best to pick one type and use it consistently. I'm
    suspicious of most casts; if I need a conversion, I find that C's
    implicit conversions usually do the right thing.
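
    For instance, with "s" a string and "use" a stand-in:

        /* Mixing int and size_t forces a sign-changing conversion in
           the comparison, which -Wsign-conversion flags: */
        for (int i = 0; i < strlen(s); i++)
            use(s[i]);

        /* Using size_t throughout needs no conversion at all: */
        for (size_t i = 0; i < strlen(s); i++)
            use(s[i]);

    (Calling strlen() in the loop condition is wasteful, but it keeps
    the example short.)
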
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Mon Oct 20 17:44:04 2025
    From Newsgroup: comp.lang.c

    On 10/20/2025 11:43 AM, Michael S wrote:
    On Mon, 20 Oct 2025 17:03:58 +0200
    pozz <pozzugno@gmail.com> wrote:

    After many years programming in C language, I'm always unsure if it
    is safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values,
    signed int is the only solution. If you are manipulating single bits
    (&, |, ^, <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?


    I'd just point out that small negative numbers are FAR more common
    than numbers in range [2**31..2**32-1].
    Now, make your own conclusion.


    Yeah, the distribution is lopsided, but I have usually noted that
    for numeric values of n bits, by the time n reaches 9 or 10, one
    becomes more likely to encounter a negative value than a positive
    value that needs more than n bits.

    Whereas, below this point, one is more likely to encounter a
    positive value needing more than n bits than to encounter a
    negative value.

    So:
      Positive values between 0 and 511: very common;
      Negative values:
        less common than values under +512,
        more common than values over 1024.

    There is typically a large cluster of small positive numbers near 0,
    with a very steep falloff as numbers get larger.
    So, for example:
    1 is most common;
    2 is less common than 1;
    3 is less common than 2;
    ...
    Like, where the probability of seeing N is seemingly 1/(N+1).

    Outside of this main cluster, which largely falls to "very little" by
    512, there are a few big spikes up near a few locations:
    n = 2^15 and 2^16 (Best covered by a 17-bit sign-extended value)
    n = 2^31 and 2^32 (Best covered by a 33-bit sign-extended value)
    n = 2^63

    If expressing values as fixed-width binary fields, there is often sort
    of a "no man's land" for values between 34 and 61 bits where one is
    unlikely to find a whole lot of anything.

    Contrast, between 18 and 30 bits, there are still a handful of values
    spread across this range, usually in small counts (so, this space isn't
    really as empty as the gap starting at 34 bits).


    So, say, it is not all that useful to be able to represent a value
    larger than 33 bits without going all the way to 64.

    And, at this upper end, most of what one encounters tends to be things
    like double-precision values and EIGHTCC style values.


    And, statistically speaking, int32 is likely to hold the vast majority
    of integer values one is likely to encounter.


    A lot is likely to depend on what one is looking at (this is mostly for
    a distribution of literal values in my compiler stats).


    Ironically, because of the distribution, having things like some
    CPU instructions with only 5 or 6 bit fields for integer immediate
    values isn't totally useless.



    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Oct 20 23:35:35 2025
    From Newsgroup: comp.lang.c

    On Mon, 20 Oct 2025 17:03:58 +0200, pozz wrote:

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I use unsigned integers if negative values are not involved, where the
    extra positive values might be useful. Here’s an example of the kind
    of loop I might write:

    unsigned int i;
    bool found;
    for (i = len(s);;)
      {
        if (i == 0)
          {
            found = false;
            break;
          } /*if*/
        if (matches(s[i]))
          {
            found = true;
            break;
          } /*if*/
        --i;
      } /*for*/
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Oct 20 23:36:40 2025
    From Newsgroup: comp.lang.c

    On Mon, 20 Oct 2025 19:43:37 +0300, Michael S wrote:

    I'd just point out that small negative numbers are FAR more common
    than numbers in range [2**31..2**32-1].

    Perhaps you mean *large* negative numbers?

    -1 is a larger number than -1000000.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Oct 20 23:38:08 2025
    From Newsgroup: comp.lang.c

    On Mon, 20 Oct 2025 23:35:35 -0000 (UTC), I wrote:

    for (i = len(s);;)
      {
        ...
      } /*for*/

    Make that

    for (i = len(s);;)
      {
        if (i == 0)
          {
            found = false;
            break;
          } /*if*/
        --i;
        if (matches(s[i]))
          {
            found = true;
            break;
          } /*if*/
      } /*for*/
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Oct 20 23:52:43 2025
    From Newsgroup: comp.lang.c

    On 2025-10-20, Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On Mon, 20 Oct 2025 19:43:37 +0300, Michael S wrote:

    I'd just point out that small negative numbers are FAR more common than
    numbers in range [2**31..2**32-1].

    Perhaps you mean *large* negative numbers?

    Oh look, twit is out of his depth again.

    There is definitely usage of "large" and "small" which implies
    magnitude: the more bits are required to encode the number, the larger
    it is.

    This semantics is particularly emphasized/clarified when followed by the
    word "negative". A "small negative" number is one closer to zero than a
    "large negative".

    If your bank account is -100,000.00, you have a large debt
    compared to someone with -100.

    There is little dispute that -100 is "greater than" -100,000.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Oct 20 16:58:52 2025
    From Newsgroup: comp.lang.c

    Lawrence D’Oliveiro <ldo@nz.invalid> writes:
    On Mon, 20 Oct 2025 19:43:37 +0300, Michael S wrote:

    I'd just point out that small negative numbers are FAR more common than
    numbers in range [2**31..2**32-1].

    Perhaps you mean *large* negative numbers?

    -1 is a larger number than -1000000.

    He clearly meant larger in magnitude. -1 is *greater* than -1000000,
    but smaller in magnitude -- and "-1" is clearly smaller/shorter
    than "-1000000".
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Oct 20 17:13:03 2025
    From Newsgroup: comp.lang.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    The thing about unsigned types is that they have a discontinuity at
    0, which is much easier to run into than signed int's
    discontinuities at INT_MIN and INT_MAX. Subtraction in particular
    can easily yield mathematically incorrect results for unsigned
    types (unless your problem domain actually calls for modular
    arithmetic).

    One specific footgun enabled by unsigned types involves loops that
    count down to zero. This:

        for (int i = N; i >= 0; i--) {
            // ...
        }

    is well behaved, but this:

        for (size_t i = N; i >= 0; i--) {
            // ...
        }

    is an infinite loop. A compiler might warn that `i >= 0` is always
    true. You can work around this by checking the condition inside
    the body of the loop, before the decrement that causes a wraparound:

        for (size_t i = N; /* i >= 0 */; i--) {
            // ...
            if (i == 0) break;
        }

    But if your loop counts up, this isn't an issue.

    "You too may be a big hero
    Once you've learned to count backwards to zero."
    -- Tom Lehrer, "Wernher Von Braun"
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From rbowman@bowman@montana.com to comp.lang.c on Tue Oct 21 01:43:16 2025
    From Newsgroup: comp.lang.c

    On Mon, 20 Oct 2025 20:09:14 -0000 (UTC), Kaz Kylheku wrote:

    This is a nuanced topic where there isn't a one-type-fits-all answer,
    but I gravitate toward signed; use of unsigned has to be justified in
    some way.

    It's more an illustration of legacy designs that didn't stand up
    well, but a short was originally used in our code for object
    numbers. Gotta save bytes. Who ever thought there would be more
    than 32k objects?

    Changing it to unsigned short bought us time. Going to an int would have
    had repercussions because of those bytes a diligent programmer saved back
    in the '90s.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Tue Oct 21 01:45:34 2025
    From Newsgroup: comp.lang.c

    On 2025-10-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    The thing about unsigned types is that they have a discontinuity at
    0, which is much easier to run into than signed int's
    discontinuities at INT_MIN and INT_MAX. Subtraction in particular
    can easily yield mathematically incorrect results for unsigned
    types (unless your problem domain actually calls for modular
    arithmetic).

    One specific footgun enabled by unsigned types involves loops that
    count down to zero. This:

        for (int i = N; i >= 0; i--) {
            // ...
        }

    is well behaved, but this:

        for (size_t i = N; i >= 0; i--) {
            // ...
        }


    We just have to translate the signed "i >= 0" test into unsigned.

    One way is to directly translate what the two's complement
    semantics is doing, pretending that the high bit of the value is a
    sign bit:

        // if the two's-complement-like "sign bit" is zero ...

        (i & ~(SIZE_MAX >> 1)) == 0

    In a downward counting loop, we can just stop when we wrap around
    to the highest value, so we get to use most of the range:

    for (size_t i = N; i != SIZE_MAX; --i) // or (size_t) -1

    (Note: I like to write --i when it's downward, just as a style; it
    comes from stacks: stack[i++] = push; pop = stack[--i].)

    The troublesome case is when N needs to start at SIZE_MAX!

    But that troublesome case exists when counting upward also,
    signed or unsigned.

    Signed:

        // We must break the loop before the undefined i++:

        for (int i = 0; i <= INT_MAX; i++)

    Unsigned:

        // Need a bottom-of-loop break on SIZE_MAX or else an infinite
        // loop:

        for (size_t i = 0; i <= SIZE_MAX; i++)

    This is where BartC will chime in with how languages benefit from
    built-in idioms for ranged loops that solve these problems under the
    hood. It's a valid argument.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Oct 21 04:27:11 2025
    From Newsgroup: comp.lang.c

    On 20.10.2025 20:01, Scott Lurndal wrote:
    Michael S <already5chosen@yahoo.com> writes:
    [...]

    One might also point out that negative loop indices are rare, and
    thus one's conclusion may be that, generally speaking, loop indexes
    should be unsigned.

    Just note that loop indices typically run over arrays. And while
    okay in a (more common) ascending traversal, it may become
    error-prone in a descending array-traversal loop:

        uint i; for (i = N-1; i >= 0; i--) a[i];

    Janis

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Oct 21 03:52:40 2025
    From Newsgroup: comp.lang.c

    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2025-10-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    The thing about unsigned types is that they have a discontinuity at
    0, which is much easier to run into than signed int's
    discontinuities at INT_MIN and INT_MAX. Subtraction in particular
    can easily yield mathematically incorrect results for unsigned
    types (unless your problem domain actually calls for modular
    arithmetic).

    One specific footgun enabled by unsigned types involves loops that
    count down to zero. This:

        for (int i = N; i >= 0; i--) {
            // ...
        }

    is well behaved, but this:

        for (size_t i = N; i >= 0; i--) {
            // ...
        }


    We just have to translate the signed "i >= 0" test into unsigned.

    One way is to directly translate what the two's complement
    semantics is doing, pretending that the high bit of the value is a
    sign bit:

        // if the two's-complement-like "sign bit" is zero ...

        (i & ~(SIZE_MAX >> 1)) == 0

    In a downward counting loop, we can just stop when we wrap around
    to the highest value, so we get to use most of the range:

    for (size_t i = N; i != SIZE_MAX; --i) // or (size_t) -1

    (Note: I like to write --i when it's downward, just as a style; it
    comes from stacks: stack[i++] = push; pop = stack[--i].)

    The troublesome case is when N needs to start at SIZE_MAX!

    But that troublesome case exists when counting upward also,
    signed or unsigned.

    Signed:

        // We must break the loop before the undefined i++:

        for (int i = 0; i <= INT_MAX; i++)

    Unsigned:

        // Need a bottom-of-loop break on SIZE_MAX or else an infinite
        // loop:

        for (size_t i = 0; i <= SIZE_MAX; i++)

    This is where BartC will chime in with how languages benefit from
    built-in idioms for ranged loops that solve these problems under the
    hood. It's a valid argument.

    If you have variable upper and lower bounds which may cover the
    whole range of the type, then AFAIK on normal machine architectures
    there is a significant loss of efficiency. C gives you loops
    which are always efficient, but do not cover corner cases.
    Other languages may give you low efficiency in cases where the
    programmer thinks that the loop is optimal. Now, most of the time
    it is better to aim at nicer semantics, possibly making code less
    efficient. But C was born to allow "hand optimization", that
    is, writing efficient programs even if this means that the
    programmer must be more careful and spend more work writing a
    program. I think that this is still an important feature of C.
    --
    Waldek Hebisch
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Oct 21 04:42:03 2025
    From Newsgroup: comp.lang.c

    pozz <pozzugno@gmail.com> wrote:
    After many years programming in C language, I'm always unsure if it is
    safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values, signed
    int is the only solution. If you are manipulating single bits (&, |, ^,
    <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?

    I oscillated between various uses, but for PC programming I now
    have a strong preference for signed, with unsigned used when there
    are special reasons. Basically, as long as you stay within range,
    signed agrees with the mathematical integers, which is normally the
    wanted semantics. Given the availability of relatively cheap 64-bit
    integers, cases where you need to worry about going out of range
    tend to be rather special. Similarly, cases where you want
    wraparound are also rather special.

    If you are going to "fix" warnings by adding casts agreeing with
    the default conversions, then IMO it makes little sense. Turning
    off the warning (possibly by using a pragma if the warning is
    imposed on you by build machinery) is equally effective. You may
    sometimes need casts which are different than the default
    conversions; those make sense. But IIUC that is basically when you
    want to convert an unsigned type to a bigger unsigned type. IME
    promoting to signed usually works fine, so casts of this sort
    should be rare.

    BTW: I tried this warning on a piece of code which intensively
    used unsigned types (for things like device registers, etc.).
    It produced a bunch of warnings about changed values, but all were
    false positives: the change was intended.

    I would expect that in well-written code the need for conversions
    different than the default promotions is relatively rare. It makes
    some sense to turn on the warnings, inspect all cases and fix the
    ones which are wrong. But having the warnings on and writing
    a lot of casts means that you lose the benefits of the warnings:
    if you make a mistake writing code with casts, the warnings will
    not help you find the mistake.
    --
    Waldek Hebisch
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Oct 21 09:13:38 2025
    From Newsgroup: comp.lang.c

    On 20/10/2025 20:01, Scott Lurndal wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 20 Oct 2025 17:03:58 +0200
    pozz <pozzugno@gmail.com> wrote:

    After many years programming in C language, I'm always unsure if it
    is safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values,
    signed int is the only solution. If you are manipulating single bits
    (&, |, ^, <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?


    I'd just point out that small negative numbers are FAR more common than
    numbers in range [2**31..2**32-1].
    Now, make your own conclusion.

    One might also point out that negative loop indicies are rare, and
    thus ones conclusion may be that generally speaking loop indexes should
    be unsigned.


    Loop indices greater than 2^31 are equally rare. (Loops of between
    2^15 and 2^16 - 1 iterations on 8-bit and 16-bit targets are less
    unrealistic.)

    Loops where you actually want the index counter to wrap are very rare,
    except perhaps when your loop is shifting the index each count (and then
    you are firmly in unsigned type territory).

    So in general, you are dealing with numbers that will fit comfortably
    within the ranges of both "int" and "unsigned int". If there is no
    other deciding factor, pick the one with the fewest unnecessary
    additional specifications, since that gives the compiler the most
    flexibility - "int".

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Oct 21 09:57:45 2025
    From Newsgroup: comp.lang.c

    On 21.10.2025 01:38, Lawrence D’Oliveiro wrote:
    On Mon, 20 Oct 2025 23:35:35 -0000 (UTC), I wrote:

    for (i = len(s);;)
      {
        ...
      } /*for*/

    Make that

    for (i = len(s);;)
      {
        if (i == 0)
          {
            found = false;
            break;
          } /*if*/
        --i;
        if (matches(s[i]))
          {
            found = true;
            break;
          } /*if*/
      } /*for*/

    Your coding style doesn't match any common style and is really sick.
    No wonder you chose C.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Oct 21 12:42:20 2025
    From Newsgroup: comp.lang.c

    On 20/10/2025 22:09, Kaz Kylheku wrote:
    On 2025-10-20, David Brown <david.brown@hesbynett.no> wrote:
    On 20/10/2025 17:03, pozz wrote:
    After many years programming in C language, I'm always unsure if it is
    safer to use signed int or unsigned int.

    Of course there are situations where signed or unsigned is clearly
    better. For example, if the values could assume negative values, signed
    int is the only solution. If you are manipulating single bits (&, |, ^,
    <<, >>), unsigned ints are your friends.

    What about other situations? For example, what do you use for the "i"
    loop variable?

    I recently activated gcc -Wsign-conversion option on a codebase and
    received a lot of warnings. I started to fix them, usually by
    adding explicit casts. Is that the way to go, or is it better to
    avoid the warnings from the beginning by choosing the right signed
    or unsigned type?



    Signed and unsigned types are equally safe. If you are sure you are
    within the ranges you know will work for the types you use, your code is
    safe. If you are not sure, you are unsafe.

    Safe generally means that the language somehow protects from harm, not
    that you protect yourself.

    No - "safe" means lower risk of harm, at least in /my/ book. It doesn't matter if it is something /you/ do, or something the language does, or something the tools do. (Ideally, of course, you want these all working together.)


    Correct code operating on correct inputs, using unsafe constructs,
    is still called unsafe code.

    It will be called "unsafe code" by Rust salesmen, but not by
    software developers who work on safe code. "Safe code" is code
    used safely; it is not an inherent property of code constructs,
    types, or languages. All code constructs are unsafe if used
    incorrectly, while clear and well-understood code constructs are
    safe if used correctly. (Of course some languages, tools, and
    programming practices make it easier to write safe code, or harder
    to write unsafe code, or easier to tell the difference.)


    However using unsigned types due to them being safe is often
    poorly considered because if something goes wrong contrary to the
    programmer's intent, there likely will be undefined behavior
    somewhere.

    Exactly. Unsigned types are not somehow "safer" than signed types,
    just because signed types have UB on overflow. Don't overflow your
    signed types, then you have no UB. And if you overflow your
    unsigned types without that being an intentional and understood
    part of your code, you will at the very least get unexpected
    behaviour - a bug - and just like UB, there are no limits to how
    bad that can get.


    E.g. an array underflow using an unsigned index will not produce
    integer overlow undefined behavior, but the access will go out of
    bounds, which is undefined behavior.


    Yes - bugs of all sorts often lead to UB sooner or later, even if
    the behaviour of the code is defined by the C language standards up
    to that point.

    There are bugs which play out without any undefined behavior:
    the program calculates something contrary to its requirements,
    but stays within the confines of the defined language.

    The odds that by using unsigned numbers you will get only that type of
    bug are low, and even if so, it is not a big comfort.

    Signed numbers behave more like mathematical integers, in cases
    when there is no overflow.

    If a, b and c are small, non-negative quantities, you might be tempted
    to make them unsigned. But if you do so, then you can no longer make
    this derivation of inequalities:

        a + b > c

        a > c - b

    Under the unsigned types, we cannot add -b to both sides of the
    inequality, preserving its truth value, even if all the operands
    are tiny numbers that fit into a single decimal digit!

    If b happens to be greater than c, we get a huge value on the right
    side that is now larger than a, not smaller.

    Gratuitous use of unsigned types impairs our ability to use
    algebra to simplify code, due to the "cliff" at zero.


    Yes.

    This is a nuanced topic where there isn't a one-type-fits-all answer,
    but I gravitate toward signed; use of unsigned has to be justified in
    some way.

    When sizes are being calculated and they come from functions or
    operators that produce size_t, then that tends to dictate unsigned.

    If the quantities are large and can possibly overflow, there are
    situations in which unsigned makes that simpler.


    But normally, use of a bigger integer type makes the code
    significantly simpler and easier to get correct - and often more
    efficient.

    For instance, if a and b are unsigned such that a + b can
    semantically overflow (i.e. the result of the natural addition of
    a + b doesn't fit into the type), it is simpler to detect: you can
    just do the addition, and then test:

    c = a + b;

    when there is no overflow, it must be that (c >= a && c >= b)
    so if either (c < a) or (c < b) is true, it overflowed.


    Or you use a bigger type and check simply and clearly for a result
    that is too big for your needs. Far too often, programmers go
    through reasoning like this and figure out what they see as
    "optimal" source code, then leave it in the source with no
    explanation as to what is going on. Aim to write code that does
    what it looks like it does - such as adding the two values
    correctly, giving the mathematically correct result, then checking
    the range. Otherwise, good luck to the maintainer that changes the
    expression to "c = a + b + 1;".
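
    A sketch of what I mean, for the common case of 32-bit "int"
    operands a and b with 64-bit "long long":

        long long sum = (long long)a + b;   /* mathematically exact */
        if (sum < INT_MIN || sum > INT_MAX) {
            /* handle the out-of-range result */
        } else {
            int c = (int)sum;
            /* use c */
        }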

    Or, with C23, use ckd_add(). (Many compilers have extensions with
    the same effect, like __builtin_add_overflow, if you are happy
    using them.)
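
    A minimal sketch of the C23 interface:

        #include <stdckdint.h>

        int c;
        if (ckd_add(&c, a, b)) {
            /* the mathematical sum did not fit in c */
        }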

    This is significantly less verbose than a correct overflow test
    for signed addition, which has to avoid doing the actual addition,
    and has to be split into three cases: a and b have opposite
    sign (always okay), a and b are both positive, and a and b are
    both negative.


    Sure. But it is still significantly worse than using "long long int"
    (or "int_least64_t" if you prefer), or using ckd_add().

    There are things in C23 that are somewhat controversial, but I
    think the checked integer operations are clearly a good
    standardisation of existing compiler-specific practice.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Tue Oct 21 14:44:46 2025
    From Newsgroup: comp.lang.c

    On 2025-10-21 06:42, David Brown wrote:
    On 20/10/2025 22:09, Kaz Kylheku wrote:
    On 2025-10-20, David Brown <david.brown@hesbynett.no> wrote:
    On 20/10/2025 17:03, pozz wrote:
    However using unsigned types due to them being safe is often poorly
    considered because if something goes wrong contrary to the programmer's
    intent, there likely will be undefined behavior somewhere.

    Exactly. Unsigned types are not somehow "safer" than signed types,
    just because signed types have UB on overflow. Don't overflow your
    signed types, then you have no UB. And if you overflow your
    unsigned types without that being an intentional and understood
    part of your code, you will at the very least get unexpected
    behaviour - a bug - and just like UB, there are no limits to how
    bad that can get.

    No, there are limits on unexpected behavior: being unexpected, you
    might not know what they are, but it is still the case that the
    behavior starts out as nothing more than an expression with an
    unexpected but valid value. That's pretty bad, and your code might
    make it worse, for example by promoting the unexpected value into
    undefined behavior. However, unless and until it actually does so,
    the behavior is somewhat more restricted than UB.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Oct 21 19:45:22 2025
    From Newsgroup: comp.lang.c

    On Tue, 21 Oct 2025 09:57:45 +0200, Bonita Montero wrote:

    On 21.10.2025 01:38, Lawrence D’Oliveiro wrote:

    for (i = len(s);;)
      {
        if (i == 0)
          {
            found = false;
            break;
          } /*if*/
        --i;
        if (matches(s[i]))
          {
            found = true;
            break;
          } /*if*/
      } /*for*/

    Your coding style doesn't match any common style ...

    You might want to write

        for (i = len(s) - 1; i >= 0; --i)

    wouldn’t you?

    It’s all about the correctness of the code.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Oct 21 22:56:58 2025
    From Newsgroup: comp.lang.c

    On 21/10/2025 20:44, James Kuyper wrote:
    On 2025-10-21 06:42, David Brown wrote:
    On 20/10/2025 22:09, Kaz Kylheku wrote:
    On 2025-10-20, David Brown <david.brown@hesbynett.no> wrote:
    On 20/10/2025 17:03, pozz wrote:
    However using unsigned types due to them being safe is often poorly
    considered because if something goes wrong contrary to the programmer's
    intent, there likely will be undefined behavior somewhere.

    Exactly. Unsigned types are not somehow "safer" than signed types, just
    because signed types have UB on overflow. Don't overflow your signed
    types, then you have no UB. And if you overflow your unsigned types
    without that being an intentional and understood part of your code, you
    will at the very least get unexpected behaviour - a bug - and just like
    UB, there are no limits to how bad that can get.

    No, there are limits on unexpected behavior: being unexpected, you
    might not know what they are, but it is still the case that the
    behavior starts out as nothing more than an expression with an
    unexpected but valid value. That's pretty bad, and your code might
    make it worse, for example by promoting the unexpected value into
    undefined behavior. However, unless and until it actually does so,
    the behavior is somewhat more restricted than UB.

    The effect of "unexpected behaviour" - something that has
    well-defined behaviour according to the C standard or the
    implementation, but was not what the programmer had intended or
    expected - is clear at the point it happens. Your unsigned
    arithmetic overflows in a defined and specified manner. But the
    knock-on effects are, in general, unpredictable - there are no
    specific limits for how bad things can get. It is not unlikely
    that you'll end up with "real" UB. In theory, real UB can lead to
    launching of nasal daemons, while unexpected behaviour, if it does
    not lead to real UB, cannot launch nasal daemons unless you have
    nasal daemon launch procedures in your program. In practice, real
    UB can more often lead to a quick crash and perhaps "nicer" bad
    behaviour (via OS memory protections and the like), while the
    unexpected behaviour can continue on, quietly causing future havoc
    and problems that are harder to find and debug. Either way, I
    think we can agree that bad things can happen!

    --- Synchronet 3.21a-Linux NewsLink 1.2