
I like the behavior of the compiler here. There is no guarantee that a and b are next to each other in memory. That's why the comparison fails; the alternative makes it runtime/compiler/optimization-level dependent, which would be a total mess.

As usual with those C bashing articles you won't run into trouble if you don't try very hard to write contrived code. I mean, the moment you see:

     int *q = &b + 1;
on your screen, alarm bells should go off. Doing pointer arithmetic on something that is not an array is asking for trouble. If the standard should be amended in any way, it should make pointer arithmetic on non-array objects undefined behavior outright.


> I like the behavior of the compiler here. There is no guarantee that a and b are next to each other in memory. That's why the comparison fails; the alternative makes it runtime/compiler/optimization-level dependent, which would be a total mess.

Yes, there is no guarantee that they are next to each other, but in this case they happen to be next to each other, and according to the spec as quoted in the article, two pointers are equal if:

> [...] one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space

Note that it says "happens to" immediately follow, not "is guaranteed to" immediately follow.

These are pointers to ints, not arrays of ints, but that should not matter because, as quoted in the article:

> For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

I don't see any way to read these as not requiring the pointers to compare as equal if the compiler happens to put a and b adjacent in memory in the right order and with no padding between them, other than resorting to something ridiculous like claiming that "follow" or "immediately follow" are not defined in the spec, and so one object occupying the very next available address after another does not necessarily "immediately follow". This seems to be what the gcc developers went with to declare that this is not a bug.

Also, note that if you move the "int a, b" to outside main, so a and b have static storage instead of being on the stack, then gcc does find that the pointers are equal.


> > For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

> I don't see any way to read these as not requiring the pointers to compare as equal if the compiler happens to put a and b adjacent in memory in the right order and with no padding between them, other than resorting to something ridiculous like claiming that "follow" or "immediately follow" are not defined in the spec, and so one object occupying the very next available address after another does not necessarily "immediately follow". This seems to be what the gcc developers went with to declare that this is not a bug.

"These operators" in the spec refer to the (additive) pointer arithmetic operators, not the equality operator. So incrementing a pointer to int behaves as if the pointee were in an array of length one, but that does not mean any subsequent equality operator should consider the objects as arrays.

It's totally fine for the equality comparison to return false when the objects in question are actually not arrays.


Then let's make them real arrays:

  #include <stdio.h>

  #define N 8

  int main(void) {
    int a[N], b[N];
    int *p = &a[0];
    int *q = &b[N];
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
    return 0;
  }
Results:

  $ gcc -std=c11 -O1 ar.c; ./a.out
  0x7ffd32d048c0 0x7ffd32d048c0 0
  $ gcc -std=c11 ar.c; ./a.out
  0x7ffe3f8ccd60 0x7ffe3f8ccd60 1


BTW, things work correctly when the objects live inside a real array:

  #include <stdio.h>

  int main(void) {
    int arr[16];
    int *p = &arr[0] + 1;
    int *q = &arr[1];
    printf("pointers: %p %p %d\n", (void *)p, (void *)q, p == q);
    return 0;
  }
The result is perfectly correct and predictable, as the Standard guarantees pointer comparisons in this case:

  pointers: 0x71b35ac790d4 0x71b35ac790d4 1
The original code in the article relies on UB, I think, so all bets are off.


> Note that it says "happens to" immediately follow, not "is guaranteed to" immediately follow.

I read that more as "happens to be placed" by the programmer (as a conscious decision), not including mere happenstance. In other words, this refers to arrays and struct fields.


> Also, note that if you move the "int a, b" to outside main, so a and b have static storage instead of being on the stack, then gcc does find that the pointers are equal.

This is also an utter accident, since it depends a great deal on the toolchain. I've worked on compilers and linkers that will put the variables in totally separate parts of the binary (e.g., based on name hash).


Agree on the alarm bells with the pointer arithmetic.

Disagree that gcc is doing the right thing here. The clang behavior (different comment in this thread) is much more sane: if the pointers happen to be the same, they compare as equal. If the pointers happen to not be the same, they compare as not equal.


Except clang's behaviour changes depending on the optimisation level. If you use -O1, then you get a different result.


Yes, because the pointers are different at -O.

As there are no guarantees about the placement of auto/stack variables relative to each other, that is perfectly fine.


If the behaviour differs at optimisation levels, would that not suggest that it is not a sane behaviour?


Not in this case, no, not at all.

Finding a different layout for auto variables in the stack frame at different optimization levels is perfectly fine.


That would require making a distinction between array pointers and non-array pointers. That's another can of worms. Probably not worth the additional complexity.


C is useful in part because it allows a degree of freedom unimaginable in other languages.


That freedom does exist in Ada and Modula-2, among others, the big difference being that one needs to be explicit about what is going on.



