Author: Niveditha Kumaran
Date of Publication: 9th August 2017
Abstract: Batch normalization is a boon to the training of deep neural networks. It addresses the problem of internal covariate shift and facilitates the use of higher learning rates. It also permits the use of saturating non-linearities and reduces the need for dropout as a regularizer. However, mini-batch normalization is not self-sufficient and comes with a few limitations, such as an inability to handle non-i.i.d. inputs and reduced effectiveness at a batch size of one. In this paper, we explore normalization, the need for its optimization, and evaluate the optimization technique proposed by researchers.
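To make the mechanism concrete, the following is a minimal sketch of the mini-batch transform described by Ioffe and Szegedy (2015): each feature is normalized using the mean and variance of the current mini-batch, then scaled and shifted by learned per-feature parameters. The function and variable names here are illustrative, not taken from any particular library.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: array of shape (batch_size, num_features).
    gamma, beta: learned per-feature scale and shift parameters.
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # ~zero mean, unit variance per feature
    return gamma * x_hat + beta            # restore representational capacity

# Illustrative usage: a batch of 32 examples, 4 features each.
x = np.random.randn(32, 4)
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

Note that with a batch size of one the per-batch variance is identically zero and the transform degenerates, which is the small-batch limitation the abstract refers to.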
References:
- Sutskever, Ilya, Martens, James, Dahl, George E., and Hinton, Geoffrey E. On the importance of initialization and momentum in deep learning. In ICML (3), volume 28 of JMLR Proceedings, pp. 1139–1147. JMLR.org, 2013.
- Shimodaira, Hidetoshi. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, October 2000.
- Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15(1):1929–1958, January 2014.
- LeCun, Y., Bottou, L., Orr, G., and Müller, K. Efficient backprop. In Orr, G. and Müller, K. (eds.), Neural Networks: Tricks of the Trade. Springer, 1998.
- S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 448–456, 2015.
- J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In Advances in Neural Information Processing Systems 17, 2004.
- C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.
- S. Ioffe. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. arXiv preprint arXiv:1702.03275, 2017.
- J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous SGD. arXiv preprint arXiv:1604.00981, 2016.
- https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b
- http://cs231n.github.io/optimization-1/
- https://www.semanticscholar.org/paper/ImageNet-pre-trained-models-with-batch-normalizati-Simon-Rodner/1d5fe82303712a70c1d231ead2ee03f042d8ad70