While most programming languages use 0-based indexing, some people are not convinced that this is a good thing rather than just a legacy of the language C. They argue that starting to index at one is more intuitive than starting at zero, and that there is no good reason to use 0-based indices besides pointer arithmetic, which should not be a concern in higher-level programming.

And to be fair, proponents of zero-based indexing have mostly failed to provide a convincing argument for their preference (Dijkstra's range argument aside). To remedy this, I am going to present five arguments for 0-based indexing and address the intuition argument for 1-based indexing.

0. Intuition

Sure, if you simply want to enumerate a written list, 1-based indexing makes more sense: it provides you with the correspondence of the n-th element with the index n. But we do not have to look far to find instances where this does not apply. One simple example is the coefficients of polynomials, or of power series more generally. Mathematicians generally write

$$a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$$

starting with the index zero. And power series are everywhere: sine, cosine, Fourier series, Taylor series, etc. The Taylor series brings us to the next class of things you want to index from zero: operations you might apply repeatedly. Having the function itself as the 0-th derivative of itself allows one to intuitively place it in the same enumeration as the rest of the derivatives. The same happens when you want to repeatedly apply some operator (e.g. the transition matrix of a Markov chain), or in any discretized physics simulation starting at (surprise, surprise) time zero.

And there are more places where you just have some initial thing left over which does not really belong in your enumeration. In linear regression, you might have n features, each with a coefficient, plus a bias. Using zero-based indexing, the bias naturally becomes the 0-th parameter while the coefficient at index i corresponds to the i-th feature:

$$y = a_0 + a_1 x_1 + … + a_n x_n$$

So while there are certainly cases where 1-based indexing might be more intuitive for enumeration, there are just as many use cases where 0-based indexing is more intuitive. And since this is not really an argument for 0-based indexing but rather a counterargument to the inherent intuitiveness of 1-based indexing, it is nice that I had an index left over before starting the actual arguments for 0-based indexing:

1. Pointers

The most well-known argument for 0-based indexing is pointer arithmetic. Computer memory is enumerated with integer addresses, and a “pointer” is just a variable storing the address of some piece of memory. If you want to store an array of size n in memory, it is enough to reserve that memory and keep the address p of its first element. If you want to retrieve the second element, you know that its position is right after the first element at address p. Therefore the second element is stored at address p+1. The i-th element is stored at address p+(i-1), and the first element is stored at p+0, because p already points to the first element. So pointers naturally give rise to zero-based indexing.
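A toy sketch of this in Python, modeling memory as a flat list and the pointer as an integer base address (the names `memory`, `base`, and `values` are made up for illustration, and elements are assumed to have unit size):

```python
# Pretend a Python list is raw memory and `base` is the address
# an allocator handed us for a 4-element array.
memory = [0] * 64
base = 16

values = [10, 20, 30, 40]
for i, v in enumerate(values):
    memory[base + i] = v      # the i-th element lives at address base + i

first = memory[base + 0]      # p already points at the first element
third = memory[base + 2]      # offset 2 reaches the third element
```

The offsets 0 through 3 fall straight out of the address arithmetic, with no +1 or -1 corrections anywhere.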

Now, I agree that this is not a good enough reason to use zero-based indexing. No one should work with bare pointers in this day and age, and programming languages could just abstract such implementation details away. But I made this point not only for completeness' sake; it is also a good introduction to

2. NDArrays

Memory is 1-dimensional: addresses are just a single incrementing number. A naive approach to storing a matrix in memory might be to store an array of arrays, i.e. an array of pointers, one per row, with every pointer pointing to the array holding that row. While this works, it is not very efficient. Not only do you need to store an entire array of pointers, you also have to retrieve the right pointer first to find the right row. This means access to an element requires two memory retrievals, and memory operations are notoriously expensive compared to arithmetic. It is thus preferable to simply flatten n-dimensional arrays. This means that rows are simply concatenated and stored one after the other.

So let's see how that would look for a 3x10 matrix in 1-based indexing

|       | col 1 | col 2 | col 3 | col 4 | col 5 | col 6 | col 7 | col 8 | col 9 | col 10 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|--------|
| row 1 | 01    | 02    | 03    | 04    | 05    | 06    | 07    | 08    | 09    | 10     |
| row 2 | 11    | 12    | 13    | 14    | 15    | 16    | 17    | 18    | 19    | 20     |
| row 3 | 21    | 22    | 23    | 24    | 25    | 26    | 27    | 28    | 29    | 30     |

and compare it to a 0-based layout:

|       | col 0 | col 1 | col 2 | col 3 | col 4 | col 5 | col 6 | col 7 | col 8 | col 9 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| row 0 | 00    | 01    | 02    | 03    | 04    | 05    | 06    | 07    | 08    | 09    |
| row 1 | 10    | 11    | 12    | 13    | 14    | 15    | 16    | 17    | 18    | 19    |
| row 2 | 20    | 21    | 22    | 23    | 24    | 25    | 26    | 27    | 28    | 29    |

You might have noticed that in the second case, the first digit always equals the row while the second digit always equals the column. So if we wanted to access the element a[i][j], we could simply access a[ij]. Now of course concatenation of numbers is not really a thing unless you switch to strings, so the actual operation would look like this: a[i*10 + j].

Now you might argue that this example only works because we are using a base-10 numbering system. This is half right. The concatenation? Sure. But that is only due to digits being written as

$$abcd = a \cdot 10^3 + b \cdot 10^2 + c \cdot 10^1 + d \cdot 10^0$$

If you have an arbitrary n x m matrix, you will still be able to access the element a[i][j] with a[i*m + j].
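A minimal Python sketch of this flattening; `flat_index` is a made-up helper name, not a standard function:

```python
def flat_index(i, j, m):
    """Row-major flat index of element (i, j) in a matrix with m columns."""
    return i * m + j

# 3x10 matrix stored as one flat list; entry (i, j) holds the value 10*i + j,
# so for m = 10 the flat index literally reads as the digits "ij".
m = 10
flat = [10 * i + j for i in range(3) for j in range(m)]

assert flat[flat_index(2, 7, m)] == 27   # "row 2, col 7" is flat index 27
```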

I challenge you to try and come up with the corresponding bijection for 1-based indexing without converting everything to 0-based indices and back. It is quite difficult to wrap your head around. Notice that the reverse direction is just as easy:

a[n] is converted to a[floor(n/m)][n%m]. Now for n-dimensional arrays you might still argue that this only works if all the dimensions have the same size. But this is not the case. While our numbering systems generally work with powers of some base (e.g. 10^0, ..., 10^n), this does not have to be and is not always the case. Just think of seconds, minutes, hours, days, months, years. To convert years into seconds, you have to multiply by the size of every previous dimension. The same works for n-dimensional arrays.
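The same carry-over logic can be sketched in Python for arbitrary dimension sizes; `unflatten` is a hypothetical helper (NumPy ships this operation as `numpy.unravel_index`):

```python
def unflatten(n, dims):
    """Convert a flat 0-based index into one index per dimension.

    dims lists the size of each dimension, fastest-varying last --
    exactly like hours, minutes, seconds with sizes 24, 60, 60.
    """
    out = []
    for size in reversed(dims):
        out.append(n % size)   # this digit wraps around at `size`
        n //= size             # and carries over into the next position
    return list(reversed(out))

# 3665 seconds = 1 hour, 1 minute, 5 seconds
assert unflatten(3665, [24, 60, 60]) == [1, 1, 5]
# identical logic recovers (row, column) in a 3x10 matrix
assert unflatten(27, [3, 10]) == [2, 7]
```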

Why do 0-based indices work better here?

Whether we are talking about number systems or flattened multidimensional arrays, the underlying principle is that we have positions/digits which wrap around (modulo algebra) and carry over into the next position/digit. And modulo algebra is generally done from 0 to n-1, not from 1 to n. We don't write 10:60; instead we wrap around from 10:59 to 11:00, carrying over from minutes to hours. Similarly, it is easier to wrap arrays after (n-1) and not after n.

A note on Column vs Row Major Storage

I have used row-major storage here. This means that when enumerating our matrix, we walk through an entire row before moving on to the next one.

Now keen eyes might notice that we never really specified what counts as a column and what counts as a row, so you could just pretend it is the other way around. But since we generally write row indices before column indices, “row major” means that we run through the right-most index first. And that is fixed by the language.

Column-major storage would also work, but that would mean we have to flip a[ij] around to a[j][i]. In a sense, our numbering system is row major: the things close together are on the right, not the left. 55 is closer to 56 than to 65. And access behavior in programming languages obeys this principle. If you retrieve object.property.name from a hierarchical struct, then name is stored next to object.property.description, not next to object.property2.name. Similarly, an array of arrays (which is sometimes necessary if the sub-arrays do not have the same size) will result in things closer together being on the right.

Lastly it is very easy to write down an array of arrays:

a = [
   [1,2,3,4],
   [3,4,5,6]
]

which is easily and sensibly translated to row-major storage. You can also easily obtain rows of a by writing a[i] and access the j-th entry of that row with a[i][j], which is the same element as a[ij]. With a column-major store, by contrast, a[j] would return the j-th column, and the i-th entry of that column, a[j][i], would correspond to the element a[ij].

So column major storage violently collides with every other access convention.

Column major storage also means that

$$x A^T$$

for a vector x and matrix A will be faster than

$$ Ax $$

The fact that this convention is used in most scientific computing languages, from Fortran, R, and MATLAB to now even Julia, is disturbing.

3. Divide and Conquer

While you might still argue that n-dimensional arrays can be abstracted away and do not matter, the fundamental principle that makes them work better with 0-based indices also makes “divide and conquer” algorithms easier with 0-based indices. Splitting n tasks into m equally sized portions is equivalent to distributing those n tasks evenly over the m rows of a matrix. In other words, converting a one-dimensional array of length n into an

$$ m\times\text{ceil}(n/m)$$

array. And divide and conquer algorithms are everywhere, from sorting algorithms to workload distribution over computing cores.
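As a small illustration (the function name `chunks` is my own, not a standard library call), splitting tasks with plain 0-based index arithmetic:

```python
from math import ceil

def chunks(tasks, m):
    """Split tasks into m contiguous, nearly equal portions."""
    size = ceil(len(tasks) / m)    # row length of the conceptual matrix
    return [tasks[k * size:(k + 1) * size] for k in range(m)]

parts = chunks(list(range(10)), 3)
# parts == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
# portion k starts at k * size, and task i lands in portion i // size --
# no +1/-1 corrections anywhere.
```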

4. Discretization

Another extremely common task is the discretization of a continuous interval. One might, for example, want to discretize the interval from 0 to 1 with points spaced 1/n apart: [0, 1/n, ..., 1]. The value at point i/n will correspond to index i of the value array. If you want to find the best approximation for a point between 0 and 1, you simply multiply it by n and round to the nearest integer: that will be your index.
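A quick Python sketch of this lookup (the helper name `nearest_index` is made up):

```python
n = 10
points = [i / n for i in range(n + 1)]   # [0.0, 0.1, ..., 1.0]

def nearest_index(x, n):
    """0-based index of the grid point closest to x in [0, 1]."""
    return round(x * n)

i = nearest_index(0.32, n)
# i == 3, and points[3] == 0.3, the closest grid point to 0.32
```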

I was once asked whether that was an artifact of the interval from 0 to 1. But again, that is not the case. If we took the interval from 1 to 2, you would have to subtract 1 from your location when looking for the index in 0-based indexing, sure, but you would also have to do that in 1-based indexing, plus you would still have to fix up the index at the end.

To get an intuition for why this is the case, consider n=10. Then our points are 0.0, 0.1, 0.2, ..., 1.0 or 1.0, 1.1, ..., 2.0; notice how the index is the i in 0.i or 1.i respectively. Shifting the interval does not help, because it shifts the wrong digit. And this also explains why counting from 0001 is weird: you start with every digit at 0 except for the 10^0 digit. So as soon as you shift the scale, everything breaks, because you do not start counting from 1111 but from 0001. Counting from 0 is natural; we do it for every digit but the ones.

Now you could shift the interval by your discretization size and start with 0.1, ..., 1.0. But then your interval/your starting point depends on the discretization. And in any situation where you are discretizing, you generally very much care about 0 (e.g. time in physics simulations/differential equations).

It is also often the case that you want to compare two different discretizations to estimate whether you already have convergence. If you consider another discretization with points spaced 1/(2n) apart, then you can easily find the values of the coarse discretization by looking up index a[i*2] in the array of the finer discretization.
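Sketching that comparison in Python, with a made-up sample function f(x) = x^2:

```python
n = 10

def f(x):
    return x ** 2

coarse = [f(i / n) for i in range(n + 1)]           # spacing 1/n
fine = [f(i / (2 * n)) for i in range(2 * n + 1)]   # spacing 1/(2n)

# with 0-based indices, coarse point i lines up exactly with fine point 2*i
match = all(coarse[i] == fine[2 * i] for i in range(n + 1))
```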

5. Ranges

Lastly, there is also the often-cited argument by Dijkstra about ranges. As you might have noticed in the discretization argument, the number of points in [0.0, 0.1, ..., 1.0] is not 10, it is 11. Similarly, there are (n-m+1) elements in [m,...,n]. This +1 is quite confusing and irritating when slicing your data. So the idea is that you want your ranges to include just as many elements as they are long. To achieve that, you need to drop either the first or the last element. In other words:

a[m:n] should either be [a[m+1],...,a[n]] or [a[m],...,a[n-1]].

The first variant is considered weird by most people (including proponents of 1-based indexing). The second variant does not work well with 1-based indices, as a[k:length(a)] would not include the last element. But it works extremely well with 0-based indexing, precisely because the last element of an array of size n is the element at index n-1.

But Python does even more nifty things with ranges using zero-based indexing. Instead of having to use something like a[length(a)-(k-1)] to obtain the k-th last element, you can simply wrap your array around and write a[-k]. In particular, the last element is just a[-1].
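A quick demonstration of Python's negative indices:

```python
a = [10, 20, 30, 40, 50]

last = a[-1]          # 50: the last element
second_last = a[-2]   # 40

# the 1-based-style computation from above, in 0-based form:
k = 2
assert a[len(a) - k] == a[-k]   # k-th last element sits at index len(a) - k
```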

Now you could of course do the same with 1-based indices and make a[0] the last element, but I would personally find that extremely irritating – modulo algebra is inherently 0-based.

Final Thoughts

When I started to think about this issue, I thought that you basically had to decide between ranges working intuitively (lengths corresponding to differences) and the intuition that the i-th element should be at index i, because you cannot seem to have both.

And at that point I would have already chosen ranges. Why? Because

  1. Slicing data into pieces is mentally much more challenging than retrieving a single index. So if one or the other should be made easier, it should be the more difficult and thus more bug-prone one.
  2. I cannot think of a single reason why you would want to access the i-th element in a programming language. You generally use arrays when all the elements have similar meaning and there is no reason to pick a particular element from the list. If they had special meaning, using maps (dictionaries) or structs seems much more appropriate, since you can then give your entries names. Slicing data, on the other hand? You want to see the top 10 tourist attractions? Sure, sort your dataframe and use df[:10]. Any divide-and-conquer algorithm is another example.

And since ranges are more challenging and used more often, I would have picked 0-based indexing on this merit alone. But it is not the only reason to make this decision; it is one of many.

The trap people fall into is that they think of handwritten lists when they think about indices. Sure, there it might be a bit more intuitive to use 1-based indices. But we are not handwriting lists here. We have to do much more complex things with indices, and 0-based indices just work much better when you actually have to do arithmetic with them.

And sure, you can abstract a lot of things away, but someone still has to write that code. And if you abstract it away, then neither convention has any advantage, so we might as well use 0-based indexing at that point, since you should not see indices anyway. Lastly, “use a library” should not be the answer to a design failure.