Vectors are not a list of numbers
I recently joined a Discord channel about Mechanistic Interpretability and, through it, a small reading group for the book Mathematics for Machine Learning, to brush up on my mathematical background.
For our first session, we read the first fifty pages, which cover the basics of vectors and vector spaces, solving linear equations, reduced row echelon form and Gaussian elimination. I was surprised by this because I had expected more emphasis on conceptual understanding and less on computation: knowing how to reduce a matrix to echelon form is not central to linear algebra or machine learning.
To help my fellow readers, I wrote some thoughts. This blogpost is those thoughts fleshed out.
The crux of the matter is that vectors are fundamentally not lists of numbers, even though that is mainly how the book has treated them so far. Instead, vectors are geometric objects, often drawn as arrows: adding two arrows means placing the start of one arrow at the end of the other, and scaling an arrow means stretching or squeezing its length.
To get from a vector to a list of numbers you need a basis, which you can think of as a coordinate system. Given a basis, you can associate the vector with a list of numbers: its coordinates in that basis. Importantly, a single vector can have different coordinates depending on the choice of basis: the same (non-zero) vector can be represented by [1,0,0], or [0,1,0], or [√2, TREE(3), -e^π].
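A small Python sketch of this, in two dimensions rather than three. One caveat: code can only handle a vector once it is stored in some reference basis, so here every vector is recorded in the standard basis of R^2 (an assumption of the sketch). The function `coords_in_basis` is a hypothetical helper written for this example; it finds the numbers c1, c2 with c1*b1 + c2*b2 = v using Cramer's rule.

```python
from fractions import Fraction


def coords_in_basis(v, b1, b2):
    """Return (c1, c2) with c1*b1 + c2*b2 == v, for vectors in the plane.

    v, b1 and b2 are stored in a fixed reference basis (the standard basis
    of R^2), with integer or Fraction entries. The system is solved exactly
    with Cramer's rule, so b1 and b2 must be linearly independent.
    """
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 do not form a basis")
    c1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    c2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return (c1, c2)


v = (3, 1)  # one fixed vector, recorded in the standard basis
print(coords_in_basis(v, (1, 0), (0, 1)))   # equals (3, 1)
print(coords_in_basis(v, (1, 1), (1, -1)))  # equals (2, 1)
print(coords_in_basis(v, (3, 1), (0, 1)))   # equals (1, 0)
print(coords_in_basis(v, (0, 1), (3, 1)))   # equals (0, 1)
```

The vector v never changes; only the basis does, and with it the list of numbers. Choosing a basis whose first element is v itself is what produces the coordinates [1, 0].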
Similarly, a matrix is not fundamentally a grid of numbers with obscure rules for multiplication; it is simply the coordinates of a linear map, once you have chosen a basis. (And once you know this, you can derive the rules for multiplication. They are no longer arbitrary steps to be memorized but a straightforward consequence of how linear maps compose.)
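A minimal Python sketch of that derivation, for maps on R^2 with the standard basis. The helper names (`matrix_of`, `matmul`, `compose`) and the two example maps are made up for illustration. The only fact used is that column j of a map's matrix is the image of the j-th basis vector e_j. If A is the matrix of g and B is the matrix of f, linearity gives (g∘f)(e_k) = g(Σ_j B[j][k] e_j) = Σ_j B[j][k] g(e_j), whose i-th coordinate is Σ_j A[i][j] B[j][k]: exactly the row-times-column rule.

```python
def matrix_of(linear_map, n):
    """Matrix of a linear map on R^n in the standard basis.

    Column j is the image of the j-th standard basis vector e_j.
    """
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(linear_map(e_j))
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def matmul(A, B):
    """Row-by-column product: entry (i, k) is sum_j A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


def compose(g, f):
    """The linear map 'first apply f, then apply g'."""
    return lambda v: g(f(v))


# Two example linear maps on R^2, chosen only for illustration.
def rotate90(v):  # counter-clockwise rotation by 90 degrees
    x, y = v
    return [-y, x]


def shear(v):  # horizontal shear
    x, y = v
    return [x + 2 * y, y]


R = matrix_of(rotate90, 2)
S = matrix_of(shear, 2)
# The matrix of "rotate, then shear" is the product S R.
print(matmul(S, R))  # [[2, -1], [1, 0]]
assert matrix_of(compose(shear, rotate90), 2) == matmul(S, R)
```

Nothing in `matmul` was memorized: it is the formula the composition argument above produces.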
As an analogy, the length of an object, say a rod, is not a number. If I said, “the length of the rod is 15”, you should rightfully ask, “15 what?”. Only once I have specified some units, say centimetres, does it then make sense to assign the number 15: the rod’s length is 15cm.
The question remains: what is a vector, if not a list of numbers? I do not have a great answer for this. One attempt is that a vector is a thing which, when given a basis or coordinate system, behaves like a list of numbers. Or, more geometrically (and hence better), a vector is something that 'behaves like an arrow in the canonical vector space of arrows'. To make this precise, you have to introduce the formal definition of a vector space, but without the underlying intuition, that definition is not enlightening.
To appreciate the difficulty of defining what a vector is, try defining ‘the length of a rod’. Once you choose some units, it is easy to say “the rod’s length is 15cm”. You could equally give the length in other units, e.g. “the length is about 6 inches” (15cm is roughly 5.9 inches). But the length is certainly not the number ‘15’ or ‘6’: it is the underlying physical thing, not the numbers used to describe it. Similarly, a vector is the underlying ‘abstract object’, not the list of numbers that represents it.
As another analogy, there is a difference between the word “chair” and an actual chair. In particular, if you start speaking a different language, you will use a different word for the chair, but that does not change the chair itself.
And another analogy: the same idea applies to numbers themselves! The number 15 is not the same as the literal symbols ‘15’. If you change base, or change notation altogether, you get a different representation, e.g. 1111 in binary, 23 in base six, or XV in Roman numerals. The strings ‘15’, ‘1111’, ‘23’ and ‘XV’ all represent the same underlying thing, and that thing is what the number 15 really is.
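A short Python sketch of the same point. The functions `to_base` and `to_roman` are hypothetical helpers written for this example: they turn one integer into several different strings. Python's built-in `int(string, base)` then parses each positional string back into the same number.

```python
def to_base(n, base):
    """Digits of a non-negative integer n in the given base (2 <= base <= 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)  # peel off the last digit
        digits.append(str(r))
    return "".join(reversed(digits))


# Standard Roman numerals, including the subtractive pairs such as IV and IX.
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Roman numeral for 1 <= n <= 3999, built greedily from the largest symbol."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


n = 15
print(to_base(n, 10), to_base(n, 2), to_base(n, 6), to_roman(n))  # 15 1111 23 XV
# Different strings, one number: parsing each representation recovers 15.
assert int("15", 10) == int("1111", 2) == int("23", 6) == n
```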
Why am I stressing the idea that vectors are not the same as lists of numbers? Partly it is because I like explaining things, but also because this distinction is crucial for mechanistic interpretability and for machine learning in general.
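For example, PCA (principal component analysis) is all about finding the ‘best’ basis to represent your dataset, in the precise sense that the first basis vector captures as much of the data's variance as possible, the second captures as much as possible while being orthogonal to the first, and so on. You do not change the dataset, just the coordinates used to represent it. For mechanistic interpretability, to quote Neel Nanda: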
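A minimal 2D sketch of PCA as a change of basis, in plain Python so that the top principal direction can be written in closed form. Assumptions: the data points are made up, the function names are hypothetical, and, as is conventional for PCA, coordinates are measured relative to the data's mean. It is not a substitute for a library implementation; it only shows that each point stays the same point and merely gets new coordinates.

```python
import math


def pca_basis_2d(points):
    """Principal axes of a 2D point cloud (closed-form, 2D only).

    Returns (mean, u1, u2): u1 is the unit direction of maximum variance,
    u2 is the unit direction perpendicular to it.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    # Variance along (cos t, sin t) is maximised at 2t = atan2(2b, a - c).
    t = 0.5 * math.atan2(2 * b, a - c)
    u1 = (math.cos(t), math.sin(t))
    u2 = (-math.sin(t), math.cos(t))
    return (mx, my), u1, u2


def coords(p, mean, u1, u2):
    """Coordinates of p (relative to the mean) in the orthonormal basis (u1, u2)."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (dx * u1[0] + dy * u1[1], dx * u2[0] + dy * u2[1])


def from_coords(c, mean, u1, u2):
    """Rebuild the original point from its PCA coordinates."""
    return (mean[0] + c[0] * u1[0] + c[1] * u2[0],
            mean[1] + c[0] * u1[1] + c[1] * u2[1])


# Made-up data lying roughly along the line y = x.
data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.0)]
mean, u1, u2 = pca_basis_2d(data)
for p in data:
    c = coords(p, mean, u1, u2)
    back = from_coords(c, mean, u1, u2)
    # Same point, new coordinates: the round trip recovers p up to rounding.
    assert all(abs(pi - bi) < 1e-9 for pi, bi in zip(p, back))
    print(p, "->", tuple(round(ci, 3) for ci in c))
```

Because u1 and u2 are orthonormal, a point's coordinates are just dot products with them; in the new basis almost all of the spread sits in the first coordinate.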
A key mental move in mechanistic interpretability is thinking about the internal activations of the model as living in some vector space, and switching between thinking about the vector as a geometric object in R^n, vs as a tuple of n coordinates in some specific basis, vs as a different tuple of n coordinates in some other basis.