Why are two random vectors near orthogonal in high dimensions?

math
random
vector
Author

Madiyar Aitbayev

Published

May 12, 2025


Introduction

High-dimensional embedding vectors are fundamental building blocks in Machine Learning, particularly in transformers or word2vec. Typically, two vectors that are semantically similar point in roughly the same direction; if they are entirely dissimilar, they point in opposite directions; and if they’re nearly orthogonal, they are unrelated.

We usually think in two or three dimensions, but some unintuitive properties only appear in higher dimensions. For example, two random vectors are expected to be nearly orthogonal in high dimensions. Intuitively, this makes sense for word2vec, since we expect most pairs of words to be unrelated.

In this post, I will explain why two random vectors are expected to be nearly orthogonal in high dimensions.


Two Dimensions

This problem is equivalent to selecting two random vectors u and v on a circle of radius 1 and computing their dot product.

As a reminder, the dot product of the unit vectors u and v is:

\[ \mathbf{u} \cdot \mathbf{v} = u_x v_x + u_y v_y = \cos (\alpha) \]

The dot product is zero when u and v are orthogonal, and near zero when they are nearly orthogonal.

The dot product is invariant under rotations, which means we can rotate both vectors together so that u becomes [0, 1]:

This simplifies the dot product to \(\mathbf{u} \cdot \mathbf{v} = \cos (\alpha) = v_y\). The probability of being nearly orthogonal is quite low in two dimensions, but this framing will help us in higher dimensions.

It turns out that the average value of \(\cos^2(\alpha)\) or \(v_y^2\) is exactly \(\frac{1}{2}\). There are numerous analytical proofs, but the simplest intuition is derived from \(v_x^2+v_y^2=1\). We have a total budget of 1, which we distribute between \(v_x^2\) and \(v_y^2\); thus, on average, \(v_y^2\) receives half of the budget.
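If you want to check this numerically, here is a minimal sketch (my own check, assuming NumPy is available) that samples a random angle on the circle and averages \(\cos^2(\alpha)\):

```python
import numpy as np

# Sample a random angle alpha uniformly on [0, 2*pi), which is equivalent to
# picking a random unit vector v on the circle, then average cos^2(alpha) = v_y^2.
rng = np.random.default_rng(0)
alpha = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
print(np.mean(np.cos(alpha) ** 2))  # roughly 0.5
```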

There are only two unit vectors orthogonal to u, namely [1, 0] and [-1, 0]:

Three Dimensions

In three dimensions, we take two random vectors u and v on a unit sphere and rotate them together so that the vector u becomes the north pole ([0, 0, 1]):

Tip: The spheres in this post are interactive.

The dot product of \(\mathbf{u}=[0, 0, 1]\) and \(\mathbf{v}=[v_x, v_y, v_z]\) is:

\[ \mathbf{u} \cdot \mathbf{v} = \cos (\alpha) = v_z \]

In 2D, there were only two vectors orthogonal to u. In 3D, there is an entire subspace of vectors orthogonal to u:

In other words, any vector on the red circle above is orthogonal to u. Moreover, this circle is the largest one that can be drawn on the sphere (a great circle). For example, compare it with a smaller circle, which spans non-orthogonal vectors:

The larger the circle from which we select the vector v, the closer v is to being orthogonal to u:

For example, we can consider all vectors within the thin red stripe to be nearly orthogonal.

The average value of \(\cos^2 (\alpha)\) or \(v_z^2\) is \(\dfrac{1}{3}\). Similar to the 2D case, this comes from the fact that \(v_x^2+v_y^2+v_z^2=1\). With a total budget of 1 split among \(v_x^2\), \(v_y^2\) and \(v_z^2\), each gets, on average, one-third of it.
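The same numerical check works in 3D. Here is a small sketch (again assuming NumPy); normalizing a 3D Gaussian sample is a standard way to draw a direction uniformly on the unit sphere, and the average of \(v_z^2\) comes out close to one-third:

```python
import numpy as np

# Sample v uniformly on the unit sphere by normalizing 3D Gaussian samples,
# then average v_z^2, which equals cos^2(alpha) when u is the north pole.
rng = np.random.default_rng(0)
v = rng.normal(size=(1_000_000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
print(np.mean(v[:, 2] ** 2))  # roughly 0.333
```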

N-dimensions

Visualizing an N-dimensional sphere is challenging. However, we have all the tools needed to develop an intuition.

Similar to the 2D and 3D cases, we can fix the first vector as \(\mathbf{u}=[0, 0, \dots, 1]\). The second vector \(\mathbf{v}=[v_1, v_2, \cdots, v_n]\) is randomly chosen from an N-dimensional unit sphere. Their dot product is then given by:

\[ \mathbf{u} \cdot \mathbf{v} = \cos (\alpha) = v_n \]

The average value of \(\cos^2 (\alpha)\) or \(v_n^2\) is \(\dfrac{1}{n}\). Since \(v_1^2+v_2^2 + \cdots + v_n^2=1\), the total budget of 1 is, on average, divided equally among the n components, giving \(v_n^2\) a share of \(\frac{1}{n}\). As n grows, the average value of \(\cos^2(\alpha)\) approaches zero.
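Here is a sketch of the same experiment for several values of n (assuming NumPy); the estimated \(\mathbb{E}[\cos^2(\alpha)]\) shrinks like \(\frac{1}{n}\):

```python
import numpy as np

# For several dimensions n, estimate E[cos^2(alpha)] = E[v_n^2] by sampling
# v uniformly on the n-dimensional unit sphere (normalized Gaussian samples).
rng = np.random.default_rng(0)
for n in (2, 3, 10, 100, 1000):
    v = rng.normal(size=(20_000, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    print(n, np.mean(v[:, -1] ** 2))  # roughly 1/n, shrinking towards zero
```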

In my opinion, the above explanation is enough to understand the intuition. However, let’s also present a formal proof of the expected value of \(v_n^2\).

We’re looking for \(\mathbb{E}\left[v_n^2\right]\). Since v is a unit vector, \(\sum_{i=1}^n v_i^2 = 1\), so we have:

\[ \begin{align} \mathbb{E}[\sum_{i=1}^n v_i^2] &=\sum_{i=1}^n \mathbb{E}[v_i^2] \\ &= n\mathbb{E}[v_n^2] \\ &= 1 \end{align} \]

Hence, \(\mathbb{E}[v_n^2]=\frac{1}{n}\). Here we used the linearity of expectation and the fact that, by symmetry, all components of v have the same expected squared value.
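The symmetry step is also easy to verify numerically. In the sketch below (assuming NumPy), every component of a random unit vector has roughly the same mean squared value, and the n values sum to 1:

```python
import numpy as np

# Every component of a uniformly random unit vector has the same E[v_i^2],
# and the n values sum to 1 because each sampled vector has unit norm.
rng = np.random.default_rng(0)
n = 5
v = rng.normal(size=(500_000, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)
print(np.mean(v ** 2, axis=0))        # each entry roughly 1/n = 0.2
print(np.mean(v ** 2, axis=0).sum())  # 1.0 up to floating-point error
```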