Vision, before learning.
We give machines the geometry of sight, so they don't have to relearn the world from scratch.
The structure of vision is not a hyperparameter. It is a fact, and we encode it directly.
A robot that walks into a new room and already knows how light, edges, and motion are shaped.
Under noise, the gap shows up.
Accuracy under noise. A standard CNN drops to 69%. Same data, same optimizer.
Faster to 70% action-recognition accuracy on video.
79 frozen parameters in our representation, vs 784,080 for the discretized equivalent.
Cost in input dimension: linear, not polynomial. Previous approaches scale combinatorially.
Most networks waste capacity rediscovering geometry that mathematics has already proven.
Small natural image patches concentrate on a Klein bottle, a result proven by Gunnar Carlsson at Stanford. We don't train a network to rediscover that surface. We replace its first layer with a topological lift that already lives on it.
The same construction extends to video through the tangent bundle — the Klein bottle moving through frames. Cost scales linearly in input size, not polynomially. The downstream network has less to learn, and is stable under noise it has never seen.
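A minimal sketch of the image-patch case, assuming a PyTorch-style pipeline: the first layer is a fixed bank of small filters sampled from the Klein-bottle patch family (a quadratic profile evaluated along a direction), and it is frozen rather than trained. Names like klein_filter_bank and KleinLift are ours for illustration, not the production interface.

```python
# Sketch only: a frozen first conv layer whose filters are drawn from the
# Klein-bottle family of high-contrast patches (Carlsson et al.).
# klein_filter_bank and KleinLift are illustrative names, not a real API.
import numpy as np
import torch
import torch.nn as nn

def klein_filter_bank(n_dirs=8, size=3):
    """Sample small filters of the form q(cos(t)*x + sin(t)*y), q quadratic.

    Each filter is a one-variable linear or quadratic profile evaluated along a
    direction t on the patch grid, then mean-centered and L2-normalized,
    following the Klein-bottle patch model.
    """
    xs = np.linspace(-1.0, 1.0, size)
    X, Y = np.meshgrid(xs, xs)
    filters = []
    for t in np.linspace(0.0, np.pi, n_dirs, endpoint=False):
        u = np.cos(t) * X + np.sin(t) * Y      # projection onto direction t
        for q in (u, u ** 2):                  # linear and quadratic profiles
            f = q - q.mean()                   # zero mean (high-contrast patches)
            f = f / np.linalg.norm(f)          # unit norm
            filters.append(f)
    return np.stack(filters)                   # shape: (2*n_dirs, size, size)

class KleinLift(nn.Module):
    """Frozen topological first layer: lift grayscale input onto fixed filters."""
    def __init__(self, n_dirs=8, size=3):
        super().__init__()
        bank = torch.tensor(klein_filter_bank(n_dirs, size), dtype=torch.float32)
        self.conv = nn.Conv2d(1, bank.shape[0], size, padding=size // 2, bias=False)
        self.conv.weight.data = bank.unsqueeze(1)   # (out_channels, 1, size, size)
        self.conv.weight.requires_grad = False      # frozen: nothing to learn here

    def forward(self, x):
        return self.conv(x)

# Usage: prepend the lift to any downstream network.
lift = KleinLift()
x = torch.randn(4, 1, 32, 32)                       # batch of grayscale images
print(lift(x).shape)                                # torch.Size([4, 16, 32, 32])
```

The downstream network trains as usual; only the lift stays fixed, which is where the small frozen-parameter count comes from.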
The place this matters most is robotic perception: data is scarce, environments are noisy, and every new room is a distribution shift.
Built by mathematicians, grounded in topology.
If the structure is already in the data,
the model shouldn't have to relearn it.