mHC: Generalized Residual Connections in DeepSeek-V4
Residual connections force every layer to read from and write to one shared vector. Manifold-Constrained Hyper-Connections (mHC) replace this vector with a matrix of parallel residual streams and constrain the matrix that mixes those streams to be doubly stochastic: non-negative, with every row and every column summing to 1. This constraint preserves the identity-mapping property that makes residual connections stable.
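
To make the doubly stochastic constraint concrete, here is a minimal pure-Python sketch. It treats the rows of the residual matrix as separate streams, which is an interpretation assumed here. The mixing matrix is built with Sinkhorn normalization (alternating row and column rescaling), a standard way to obtain an approximately doubly stochastic matrix; whether DeepSeek-V4 uses this exact procedure is an assumption, and the names `sinkhorn`, `mix`, and `stream_mean` are hypothetical.

```python
import math
import random


def sinkhorn(logits, iters=50):
    """Rescale exp(logits) until rows and columns each sum to (about) 1."""
    # Exponentiate so every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Normalize each row to sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Normalize each column to sum to 1.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m


def mix(H, X):
    """Mix residual streams: row i of the result is sum_j H[i][j] * X[j]."""
    n, d = len(X), len(X[0])
    return [[sum(H[i][j] * X[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def stream_mean(X):
    """Average the streams feature by feature."""
    n, d = len(X), len(X[0])
    return [sum(X[i][k] for i in range(n)) / n for k in range(d)]


rng = random.Random(0)
n_streams, width = 4, 3  # illustrative sizes, not taken from the model

# Unconstrained parameters (assumed), projected to a doubly stochastic matrix.
logits = [[rng.uniform(-1.0, 1.0) for _ in range(n_streams)]
          for _ in range(n_streams)]
H = sinkhorn(logits)

# A toy residual state: one row per stream.
X = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n_streams)]
Y = mix(H, X)

# The identity matrix is itself doubly stochastic, so the plain
# "pass the state through unchanged" behaviour stays inside the constraint set.
identity = [[1.0 if i == j else 0.0 for j in range(n_streams)]
            for i in range(n_streams)]
Z = mix(identity, X)
```

Because every column of `H` sums to 1, mixing leaves the across-stream mean of `X` unchanged, and because every entry is non-negative with rows summing to 1, each output stream is a convex combination of the input streams. That is the sense in which the constraint keeps the signal from being amplified or attenuated as it passes through many layers.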
