In this chapter you will learn about GoogLeNet (the winning architecture of ImageNet 2014) and its inception layers.
GoogLeNet has 22 layers and almost 12x fewer parameters than AlexNet, so it is faster and smaller while being much more accurate.
The authors' idea was to make a model that could also be used on a smartphone (keeping the computational budget around 1.5 billion multiply-adds at prediction time).
The idea of the inception layer is to cover a larger area of the image while keeping a fine resolution for small details. To do so, it convolves the input in parallel with filters of different sizes, from the finest (1x1) to a larger one (5x5).
The intuition is that, like a bank of Gabor filters of different sizes, the layer handles objects at multiple scales better, with the advantage that all filters in the inception layer are learnable.
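For the parallel branches to be combined later, each one has to produce a feature map of the same height and width. The sketch below is only an illustration of that point, using the standard convolution output-size formula. The 28x28 input size, the stride of 1, the choice of kernel sizes 1, 3 and 5, and the helper names `conv_output_size` and `same_padding` are assumptions made for this example.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size along one axis after a convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves the size for an odd kernel at stride 1."""
    return (kernel - 1) // 2


input_size = 28  # assumed feature-map height/width, for illustration only

# One entry per parallel branch: kernel size -> output size
branch_sizes = {
    k: conv_output_size(input_size, k, padding=same_padding(k))
    for k in (1, 3, 5)
}
print(branch_sizes)

# Without padding, the 5x5 branch would shrink the map and could not be
# stacked with the 1x1 branch.
print(conv_output_size(input_size, 5))
```

With this padding every branch keeps the 28x28 resolution, so the only thing that differs between branches is how large a neighbourhood each filter looks at.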
The most straightforward way to improve performance in deep learning is to use more layers and more data; GoogLeNet uses 9 inception modules. The problem is that more parameters also make the model more prone to overfitting. So, to avoid a parameter explosion in the inception layers, bottleneck techniques are used throughout.
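To get a feeling for how much a bottleneck saves, the following sketch counts the weights of a single 5x5 branch with and without a 1x1 reduction in front of it. The channel counts (192 in, 16 after the reduction, 32 out) are illustrative assumptions, and biases are ignored for simplicity.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return in_channels * out_channels * kernel * kernel


# Assumed, illustrative channel counts
in_ch, reduce_ch, out_ch = 192, 16, 32

# 5x5 convolution applied directly to the input
direct = conv_weights(in_ch, out_ch, 5)

# 1x1 bottleneck down to reduce_ch channels, then the 5x5 convolution
bottleneck = conv_weights(in_ch, reduce_ch, 1) + conv_weights(reduce_ch, out_ch, 5)

print(direct)       # 153600
print(bottleneck)   # 15872
print(round(direct / bottleneck, 1))  # 9.7
```

Under these assumptions the bottleneck version needs roughly a tenth of the weights, and it adds one extra non-linearity if an activation follows the 1x1 convolution.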
Using the bottleneck approach, we can rebuild the inception module with more non-linearities and fewer parameters. A max pooling layer is also added to summarize the content of the previous layer. The results of all branches are concatenated along the channel (depth) dimension and passed to the next layer.
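Because every branch keeps the same height and width, concatenation only changes the depth of the feature map: the output depth is the sum of the branch depths. The sketch below illustrates this for two modules in a row. The branch widths and the 28x28 resolution are assumptions chosen for illustration, `pool_proj` denotes an assumed 1x1 convolution applied after the max pooling branch, and `inception_output_channels` is a hypothetical helper.

```python
def inception_output_channels(branch_channels):
    """Depth after concatenating all branch outputs along the channel axis."""
    return sum(branch_channels.values())


# Assumed, illustrative branch widths for two consecutive modules
module_a = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
module_b = {"1x1": 128, "3x3": 192, "5x5": 96, "pool_proj": 64}

height = width = 28  # unchanged by the module when all branches preserve size

shape_a = (height, width, inception_output_channels(module_a))
shape_b = (height, width, inception_output_channels(module_b))

print(shape_a)  # (28, 28, 256)
print(shape_b)  # (28, 28, 480)
```

The depth produced by the first module (256 here) is the input depth seen by every branch of the second one, which is why the 1x1 reductions matter more and more as modules are stacked.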
Below we present two inception layers in cascade from the original GoogLeNet.