Why are two random vectors near orthogonal in high dimensions?

math
random
vector
Author

Madiyar Aitbayev

Published

May 12, 2025


Introduction

High-dimensional embedding vectors are fundamental building blocks in Machine Learning, particularly in transformers or word2vec. Typically, two vectors that are semantically similar point in roughly the same direction; if they are entirely dissimilar, they point in opposite directions; and if they’re nearly orthogonal, they are unrelated.

We usually think in two or three dimensions, but some unintuitive properties only appear in higher dimensions. For example, two random vectors are expected to be nearly orthogonal in high dimensions. Intuitively, this makes sense for word2vec, since we expect most pairs of words to be unrelated.

In this post, I will explain why two random vectors are expected to be nearly orthogonal in high dimensions.


Two Dimensions

This problem is equivalent to selecting two random vectors u and v on a circle of radius 1 and computing their dot product.

As a reminder, the dot product of the unit vectors u and v is:

\[ \mathbf{u} \cdot \mathbf{v} = u_x v_x + u_y v_y = \cos (\alpha) \]

The dot product is zero when u and v are orthogonal, and near zero when they are nearly orthogonal.

The dot product is invariant under rotations, which means we can rotate both vectors together so that u becomes [0, 1]:

This simplifies the dot product to \(\mathbf{u} \cdot \mathbf{v} = \cos (\alpha) = v_y\). The probability of being nearly orthogonal is quite low in two dimensions, but this framing will help us in higher dimensions.

It turns out that the average value of \(\cos^2(\alpha)\) or \(v_y^2\) is exactly \(\frac{1}{2}\). There are numerous analytical proofs, but the simplest intuition is derived from \(v_x^2+v_y^2=1\). We have a total budget of 1, which we distribute between \(v_x^2\) and \(v_y^2\); thus, on average, \(v_y^2\) receives half of the budget.
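If you want to check this numerically, here is a minimal sketch (my own check, assuming NumPy is available) that samples a random angle on the circle and averages \(\cos^2(\alpha)\):

```python
import numpy as np

# Sample a random angle alpha uniformly on [0, 2*pi), which is equivalent to
# picking a random unit vector v on the circle, then average cos^2(alpha) = v_y^2.
rng = np.random.default_rng(0)
alpha = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
print(np.mean(np.cos(alpha) ** 2))  # roughly 0.5
```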

There are only two unit vectors orthogonal to u, namely [1, 0] and [-1, 0]:

Three Dimensions

In three dimensions, we take two random vectors u and v on a unit sphere and rotate them together so that the vector u becomes the north pole ([0, 0, 1]):

Tip: The spheres in this post are interactive.

The dot product of \(\mathbf{u}=[0, 0, 1]\) and \(\mathbf{v}=[v_x, v_y, v_z]\) is:

\[ \mathbf{u} \cdot \mathbf{v} = \cos (\alpha) = v_z \]

In 2D, there were only two vectors orthogonal to u. In 3D, there is an entire subspace of vectors orthogonal to u:

In other words, any vector on the red circle above is orthogonal to u. Moreover, this circle is the largest one that can be drawn on the sphere (a great circle). For example, compare it with a smaller circle, which spans non-orthogonal vectors:

The larger the circle from which we select the vector v, the closer v is to being orthogonal to u:

For example, we can consider all vectors within the thin red stripe to be nearly orthogonal.

The average value of \(\cos^2 (\alpha)\) or \(v_z^2\) is \(\dfrac{1}{3}\). Similar to the 2D case, this comes from the fact that \(v_x^2+v_y^2+v_z^2=1\). With a total budget of 1 split among \(v_x^2\), \(v_y^2\) and \(v_z^2\), each gets, on average, one-third of it.
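The same numerical check works in 3D. Here is a small sketch (again assuming NumPy); normalizing a 3D Gaussian sample is a standard way to draw a direction uniformly on the unit sphere, and the average of \(v_z^2\) comes out close to one-third:

```python
import numpy as np

# Sample v uniformly on the unit sphere by normalizing 3D Gaussian samples,
# then average v_z^2, which equals cos^2(alpha) when u is the north pole.
rng = np.random.default_rng(0)
v = rng.normal(size=(1_000_000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
print(np.mean(v[:, 2] ** 2))  # roughly 0.333
```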

N-dimensions

Visualizing an N-dimensional sphere is challenging. However, we have all the tools needed to develop an intuition.

Similar to the 2D and 3D cases, we can fix the first vector as \(\mathbf{u}=[0, 0, \dots, 1]\). The second vector \(\mathbf{v}=[v_1, v_2, \cdots, v_n]\) is randomly chosen from an N-dimensional unit sphere. Their dot product is then given by:

\[ \mathbf{u} \cdot \mathbf{v} = \cos (\alpha) = v_n \]

The average value of \(\cos^2 (\alpha)\) or \(v_n^2\) is \(\dfrac{1}{n}\). Since \(v_1^2+v_2^2 + \cdots + v_n^2=1\), the total budget of 1 is, on average, divided equally among the n components, giving \(v_n^2\) a share of \(\frac{1}{n}\). As n grows, the average value of \(\cos^2(\alpha)\) approaches zero.
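Here is a sketch of the same experiment for several values of n (assuming NumPy); the estimated \(\mathbb{E}[\cos^2(\alpha)]\) shrinks like \(\frac{1}{n}\):

```python
import numpy as np

# For several dimensions n, estimate E[cos^2(alpha)] = E[v_n^2] by sampling
# v uniformly on the n-dimensional unit sphere (normalized Gaussian samples).
rng = np.random.default_rng(0)
for n in (2, 3, 10, 100, 1000):
    v = rng.normal(size=(20_000, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    print(n, np.mean(v[:, -1] ** 2))  # roughly 1/n, shrinking towards zero
```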

In my opinion, the above explanation is enough to understand the intuition. However, let’s also present a formal proof of the expected value of \(v_n^2\).

We’re looking for \(\mathbb{E}\left[v_n^2\right]\). Since v is a unit vector, \(\sum_{i=1}^n v_i^2 = 1\), so we have:

\[ \begin{align} \mathbb{E}[\sum_{i=1}^n v_i^2] &=\sum_{i=1}^n \mathbb{E}[v_i^2] \\ &= n\mathbb{E}[v_n^2] \\ &= 1 \end{align} \]

Hence, \(\mathbb{E}[v_n^2]=\frac{1}{n}\). Here we used the linearity of expectation and the fact that, by symmetry, all components of v have the same expected squared value.
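The symmetry step is also easy to verify numerically. In the sketch below (assuming NumPy), every component of a random unit vector has roughly the same mean squared value, and the n values sum to 1:

```python
import numpy as np

# Every component of a uniformly random unit vector has the same E[v_i^2],
# and the n values sum to 1 because each sampled vector has unit norm.
rng = np.random.default_rng(0)
n = 5
v = rng.normal(size=(500_000, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)
print(np.mean(v ** 2, axis=0))        # each entry roughly 1/n = 0.2
print(np.mean(v ** 2, axis=0).sum())  # 1.0 up to floating-point error
```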