Beam Search: Effect of Parameters

28 July, 2019

In my posts on image captioning and visualising decoding algorithms, we used beam search as one of the decoding algorithms to generate captions. In both posts, we only saw the end result, i.e., the caption generated by the algorithm, but we did not discuss how the algorithm-specific parameters affect that result.

This page demonstrates the impact of the following parameters on captions generated using a beam search decoder:

  • Beam Width: How many words to keep track of at every step
  • Max Hypotheses: What is the maximum number of hypotheses after which the algorithm stops
  • Max Steps: What is the maximum number of steps after which the algorithm stops
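
To make the three parameters concrete, here is a minimal sketch of a beam search loop in Python. It is not the decoder behind this page: the function `beam_search`, the `next_log_probs` callback, and the tiny hand-written `TOY_MODEL` are all hypothetical stand-ins for a real captioning model, and the stopping rule (stop once `max_hypotheses` finished captions have been collected) is one assumed reading of the description above.

```python
import math
from typing import Callable, Dict, List, Tuple

START, END = "<s>", "</s>"

# Hypothetical toy "model": next-word probabilities given only the last word.
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_next_log_probs(tokens: List[str]) -> Dict[str, float]:
    """Return log-probabilities of the next word (assumed interface)."""
    return {w: math.log(p) for w, p in TOY_MODEL[tokens[-1]].items()}


def beam_search(
    next_log_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 3,
    max_hypotheses: int = 3,
    max_steps: int = 32,
) -> List[Tuple[float, List[str]]]:
    """Return (score, tokens) pairs sorted from best to worst."""
    beams = [(0.0, [START])]  # unfinished candidates: (log-prob, tokens)
    finished = []             # candidates that have produced END

    for _ in range(max_steps):  # Max Steps: hard cap on caption length
        # Extend every unfinished candidate by every possible next word.
        candidates = []
        for score, tokens in beams:
            for word, logp in next_log_probs(tokens).items():
                candidates.append((score + logp, tokens + [word]))

        # Beam Width: keep only the best few candidates at this step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == END:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))

        # Max Hypotheses: stop once enough finished captions exist.
        if len(finished) >= max_hypotheses or not beams:
            break

    # If nothing finished in time, fall back to the unfinished candidates.
    results = finished if finished else beams
    return sorted(results, key=lambda c: c[0], reverse=True)


greedy = beam_search(toy_next_log_probs, beam_width=1, max_hypotheses=1)
wider = beam_search(toy_next_log_probs, beam_width=2, max_hypotheses=2)
print(" ".join(greedy[0][1]))  # <s> a dog </s>
print(" ".join(wider[0][1]))   # <s> the dog </s>
```

In this toy setup a beam width of 1 commits to "a" at the first step and ends with probability 0.33, while a beam width of 2 also keeps "the" alive and finds the more likely caption at 0.36. Setting `max_steps` too low (2 here) returns captions that never reach the end token.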

[Interactive demo: select one of the 16 sample images, then adjust Beam Width, Max Hypotheses, and Max Steps to see the generated caption.]

Overall, I found that increasing the beam width and max hypotheses tends to generate better captions. This makes sense: as we increase the beam width, we keep track of more words at every step, which increases the chances of finding a better caption. For some images, however, a high beam width seems to generate unfinished captions.

Similarly, as we increase the max hypotheses, we relax the stopping criterion and let the model see more potential candidates (hypotheses). The max steps parameter has no impact on the caption as long as we keep it sufficiently large (e.g., 32 and above).