1. Motivation: Faster Neural Network Development
Training deep neural networks is a time‑consuming process, especially during iterative experimentation where multiple models are tested and refined. Traditionally, each new architecture is trained from scratch, resulting in significant computational waste. The Net2Net approach addresses this inefficiency by enabling rapid knowledge transfer from an already trained network to a new, larger one, dramatically shortening training time while preserving performance.
2. Core Concept: Function-Preserving Transformations
Net2Net is based on the idea of function-preserving transformations, meaning that a new neural network is initialized to represent exactly the same function as the original model, despite having a different architecture. As a result, the new network starts training with the same accuracy as the teacher model and can immediately continue improving, rather than relearning from zero.
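This requirement can be stated compactly. Following the paper's notation, if the teacher network computes a function f with parameters θ, the transformation chooses parameters θ′ for the student architecture g so that both networks produce identical outputs on every input:

```latex
\forall x:\quad g(x;\,\theta') = f(x;\,\theta)
```

Any initialization satisfying this constraint guarantees the student starts from exactly the teacher's accuracy.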
3. Net2WiderNet: Making Networks Wider
Net2WiderNet allows a neural network layer to be expanded by increasing the number of units or channels. This is achieved by replicating existing neurons and dividing the outgoing weights of each replicated unit by its replication count, so that the overall computation remains unchanged. After widening, small amounts of noise can be added to break symmetry and allow the expanded model to fully exploit its increased capacity during further training.
Key benefit: the wider model converges faster than one trained from random initialization and reaches the same final accuracy.
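The widening step can be illustrated with a minimal NumPy sketch on a two-layer network (the layer sizes and random mapping here are illustrative, not taken from the paper's experiments): extra hidden units are copies of existing ones chosen by a random mapping, and each copied unit's outgoing weights are divided by its replication count, so the student's output matches the teacher's exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: two-layer net with a ReLU hidden layer (biases omitted for brevity)
n_in, n_h, n_out, q = 4, 3, 2, 5   # widen the hidden layer from 3 to 5 units
W1 = rng.standard_normal((n_in, n_h))
W2 = rng.standard_normal((n_h, n_out))

# Random mapping g: the first n_h units map to themselves, extras replicate
g = np.concatenate([np.arange(n_h), rng.integers(0, n_h, q - n_h)])
counts = np.bincount(g, minlength=n_h)   # how many copies each original unit has

# Widened weights: copy incoming columns; divide outgoing rows by replica count
W1_wide = W1[:, g]
W2_wide = W2[g, :] / counts[g][:, None]

x = rng.standard_normal((1, n_in))
teacher = np.maximum(x @ W1, 0) @ W2
student = np.maximum(x @ W1_wide, 0) @ W2_wide
assert np.allclose(teacher, student)     # function preserved
```

In practice a small amount of noise would be added to `W1_wide` and `W2_wide` after this check, to break the symmetry between replicated units before continuing training.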
4. Net2DeeperNet: Making Networks Deeper
Net2DeeperNet enables adding new layers to a network without altering its behavior. This is done by inserting identity mappings between layers, ensuring the output remains unchanged. This transformation is especially effective with ReLU-based architectures and convolutional networks, where identity filters can be used. The deeper student network can then be trained further, benefiting from its increased representational power without a slow warm‑up phase.
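A corresponding sketch for deepening (again with illustrative sizes, not the paper's setup): a new layer initialized to the identity matrix is inserted after a ReLU. Because ReLU activations are already non-negative, applying ReLU to the identity-mapped values leaves them unchanged, so the network's output is preserved.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_h, n_out = 4, 3, 2
W1 = rng.standard_normal((n_in, n_h))
W2 = rng.standard_normal((n_h, n_out))

W_new = np.eye(n_h)   # new layer initialized as the identity mapping

x = rng.standard_normal((1, n_in))
h = np.maximum(x @ W1, 0)                     # teacher's hidden activations
teacher = h @ W2
# Since h >= 0, ReLU(h @ I) == h, so the inserted layer changes nothing
student = np.maximum(h @ W_new, 0) @ W2
assert np.allclose(teacher, student)          # function preserved
```

This identity trick requires an activation φ with φ(φ(x)) = φ(x), which ReLU satisfies; for convolutional layers the analogous initialization is a filter that passes each channel through unchanged.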
5. Practical Impact and Experimental Results
Experiments conducted on Inception networks trained on ImageNet show that both Net2WiderNet and Net2DeeperNet significantly accelerate convergence compared to standard initialization techniques. Importantly, the final accuracy of models trained using Net2Net matches or exceeds that of traditionally trained networks, confirming that no performance is sacrificed for speed.
6. Broader Implications
Net2Net is particularly valuable in real‑world and lifelong learning scenarios, where model complexity needs to grow over time as more data becomes available. The method allows practitioners to scale models efficiently and safely, making continuous model evolution more practical and cost‑effective.

Resources:
T. Chen, I. Goodfellow, and J. Shlens, "Net2Net: Accelerating Learning via Knowledge Transfer," ICLR 2016.