Katnoria.com

In my posts on image captioning and visualising decoding algorithms, we used beam search as one of the decoding algorithms to generate captions. In both posts, we only saw the end result, i.e., the caption generated by the algorithm, but we did not discuss how the algorithm-specific parameters affect that result. This page is my attempt to address that by showing the impact of the following parameters on captions generated using a beam search decoder:

Beam Width: How many words to keep track of at every step
Max Hypotheses: What is the maximum number of hypotheses after which the algorithm stops
Max Steps: What is the maximum number of steps after which the algorithm stops
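
To make the three parameters concrete, here is a minimal sketch of where each one enters a beam search loop. It is not the decoder from my earlier posts: `TOY_MODEL` and `beam_search` are hypothetical names, the "model" is a hand-written table of next-word probabilities rather than a captioning network, and I am assuming that max hypotheses counts captions that have reached the `<end>` token. Length normalisation is left out to keep it short.

```python
import math

# Hypothetical toy "model": next-word probabilities depend only on the
# previous word. A real captioning decoder would condition on the image
# and on the whole prefix.
TOY_MODEL = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def beam_search(model, beam_width=3, max_hypotheses=3, max_steps=32):
    """Return the best-scoring token list (assumes beam_width >= 1)."""
    beams = [(0.0, ["<start>"])]  # (log-probability, tokens so far)
    finished = []                 # hypotheses that reached <end>

    for _ in range(max_steps):    # Max Steps: hard limit on caption length
        # Extend every live beam by every possible next word.
        candidates = []
        for score, tokens in beams:
            for word, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [word]))

        # Beam Width: keep only the top-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == "<end>":
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))

        # Max Hypotheses: stop once enough captions are complete
        # (or when there is nothing left to extend).
        if len(finished) >= max_hypotheses or not beams:
            break

    # If nothing finished in time, fall back to an unfinished caption.
    pool = finished if finished else beams
    return max(pool, key=lambda c: c[0])[1]


print(beam_search(TOY_MODEL, beam_width=1))  # greedy: follows "a" first
print(beam_search(TOY_MODEL, beam_width=2))  # also keeps "the" alive
```

In this toy table, a beam width of 1 commits to "a" (probability 0.6) and ends with "a dog" at 0.33 overall, while a beam width of 2 also keeps "the" and finds "the dog" at 0.36. Setting `max_steps` too low returns a caption with no `<end>` token, which is one way an unfinished caption can come out of a loop like this.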

Overall, I found that increasing the beam width and max hypotheses tends to generate better captions. This makes sense: as we increase the beam width, we keep track of more words at every step, which increases the chances of finding a better caption. For some images, though, a high beam width seems to generate unfinished captions 🤔. Similarly, as we increase the max hypotheses, we relax the stopping criterion and let the model see more potential candidates (hypotheses). The max steps setting does not have any impact on the caption as long as we keep it sufficiently large (e.g., 32 and above).

Beam Search: Effect of Parameters

28 July, 2019