Array arithmetic

Started by August 08, 2018 06:07 PM
5 comments, last by frob 6 years, 1 month ago

Reading through Stroustrup C++ Programming Language I arrived at some rather strange array arithmetic (without use of pointers). He says the following:

Quote

For every built-in array a and integer j within the range of a, we have:

a[j] == *(&a[0]+j) == *(a+j) == *(j+a) == j[a]

It usually surprises people to find that a[j]==j[a]. For example, 3["Texas"]=="Texas"[3]=='a'

This surprised me too. How are 1. a[j]==j[a] and 2. 3["Texas"]=="Texas"[3]=='a' possible?
Isn't a usually (read: always) implicitly converted to T*, so how could [a] point to something meaningful? Also, if the array has 10 elements and j==10, how could the address of 10 point to the same thing, or does it?
If somebody could explain this through assembly or memory, I would really appreciate it.

The mistake you are making is following your intuition, and/or the tutorial you once read. Both say that "a[j]" means "index j into array a". While this is what it effectively means, it is technically not true in the details of the semantics of the C++ language.

In a sense, p[q] doesn't exist at all. It is just a nice-looking abbreviation for *(p+q). This rewrite doesn't care which of p and q is the pointer (or array) and which is the integer. As a result, v[w] and w[v] mean the same thing, because *(v+w) and *(w+v) give the same answer, as you would expect from + (+ is a commutative operator).
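A minimal sketch of that rewrite for built-in arrays and pointers. The helper name `same_element` is made up for illustration; it compares addresses, so it checks that all the spellings from the quote name the very same element, not just equal values.

```cpp
#include <cstddef>

// Hypothetical helper: true when a[j], *(a + j), *(j + a) and j[a]
// all refer to the same element (compared by address).
template <typename T>
bool same_element(const T* a, std::ptrdiff_t j) {
    return &a[j] == a + j     // a[j] is *(a + j)
        && &a[j] == j + a     // + is commutative for pointer + integer
        && &a[j] == &j[a];    // j[a] is *(j + a)
}

// The example from the book: a string literal is an array of char,
// and the character at index 3 of "Texas" is 'a'.
static_assert(3["Texas"] == 'a', "integer on the left");
static_assert("Texas"[3] == 'a', "integer on the right");
```

Calling `same_element(arr, 2)` on something like `int arr[5]` should return true, since the array converts to a pointer to its first element at the call.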

In other words, the tutorial you read technically lied to you, and tricked you into believing that a[j] exists with the semantics you believe it has. In fact, the semantics of [] are not handled by the [] operator at all, but by the +, after the above rewrite of p[q] to *(p+q). (That is according to the language definition; an actual compiler may do otherwise and, e.g., recognize [] on its own, for example to give better error messages.)

The T* comes into play when computing the "+" result. When you add a pointer and an integer together, the integer is multiplied by the size of the pointer's element type.

a[j] -> *(a + j) -> *( ((char*)&a[0]) + sizeof(T)*j )

j[a] -> *(j + a) -> *( sizeof(T)*j + ((char*)&a[0]) )

Note that "((char*)&a[0])" is an awkward notation for "the address of the first element of the array, treated as a byte address so that the addition no longer does the sizeof multiplication for you".

"sizeof(T)*j" is the offset of the j-th value in the array.

Both are just numbers, and swapping them has no effect, both calculations give the same answer.
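The byte-level view can be written out as compilable code, as a rough sketch. The two function names are invented for this example, and the result is cast back to `const T*` so it can be compared with what the compiler computes for `&a[j]`.

```cpp
#include <cstddef>

// Hypothetical helper: address of element j, computed by hand in bytes.
template <typename T>
const T* element_address_by_hand(const T* a, std::size_t j) {
    const char* base = reinterpret_cast<const char*>(a); // first byte of the array
    const char* byte_addr = base + sizeof(T) * j;         // base + byte offset
    return reinterpret_cast<const T*>(byte_addr);
}

// The same computation with the operands of + swapped.
template <typename T>
const T* element_address_swapped(const T* a, std::size_t j) {
    const char* base = reinterpret_cast<const char*>(a);
    const char* byte_addr = sizeof(T) * j + base;         // byte offset + base
    return reinterpret_cast<const T*>(byte_addr);
}
```

For a `double a[4]`, both functions should return the same pointer as `&a[3]` when called with j == 3. This only illustrates the arithmetic; real code should just write a[j].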

16 hours ago, ryt said:

I arrived at some rather strange array arithmetic (without use of pointers).

Nope.  You stumbled onto pointer arithmetic and went into denial.

In C and C++, arrays behave like pointers in expressions like this one: an array converts implicitly to a pointer to its first element (the remaining differences are not relevant to the discussion). Pointers are really just integers (with some extra properties, not relevant to the discussion). In the algebra of integers, addition is a commutative operation. Indexing an array using its operator[] is just adding an integer index to an integer pointer. It makes perfect sense that you can also add an integer pointer to an integer index and get the same result.

Stephen M. Webb
Professional Free Software Developer

People can also do this to mess with other developers. Constructs like 0[ptr] or 0[myArray] can catch your attention, but when you remember that under the hood array access is effectively *(a+b), it doesn't matter in which order the two values are placed.

The language allows many quirky things like this.  But if I ran across 0[this] in a code review, they'd be getting some feedback.

13 hours ago, frob said:

But if I ran across 0[this] in a code review, they'd be getting some feedback.

Could be a way to gauge how carefully your coworkers do their reviews.

This is possibly a tangent but we're not in For Beginners, so going for it.

Yes, it can be a way to judge it.

Over the years I've had a few people who did gauge code review quality over time across the team. I've known co-workers who intentionally introduce subtle changes into the review that aren't in the actual code, just to see who finds the errors. On a few occasions they've been introduced on purpose by leadership as a science project to gather data about the code reviews. Sometimes they're revealed in good ways, other times in annoying ways, and sometimes they're never revealed at all.

It is a good thing for statistics in quality control. If you introduce m known defects and n of them are caught, you can estimate that reviews catch roughly n/m of that type of defect in general. If 80% of those little bugs are caught in code reviews, you can estimate that code reviews catch about 80% of the accidental issues. In that case your code reviews are effective. But if only 20% are caught, you can assume a similar rate for accidental issues and know the team needs to increase their scrutiny.
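The n/m estimate is simple enough to write down as a small sketch. The function name and the choice to return 0.0 when nothing was seeded are assumptions made for this example, not part of any real tool.

```cpp
// Hypothetical helper: fraction of deliberately seeded defects that reviewers caught.
// seeded is m (defects introduced on purpose), caught is n (how many were found).
// Returns 0.0 when nothing was seeded, because no estimate is possible then (assumption).
double estimated_catch_rate(int seeded, int caught) {
    if (seeded <= 0) {
        return 0.0;
    }
    return static_cast<double>(caught) / seeded;
}
```

With 10 seeded defects and 8 caught, this gives a rate of about 0.8, matching the 80% case above. With only a handful of seeded defects the estimate is very rough.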

It is annoying when you know people introduce them on purpose, both because you know some developers have a bit of a "gotcha" mentality with them and because people will shy away from someone who is intentionally tripping them up.  But it can work out as a good thing.

Regarding someone who did it on purpose: he was clear that he was trying to get better code reviews. He would always introduce at least one item into code reviews expressly to get caught. Everyone on the team knew he was looking for as many issues as you could spot so the code would be better, not because he wanted to be spiteful. It was still annoying because the reviews took more time, but we all knew the code was better for it.

