Word Tree: Transformers Edition

Desktop Version | 13 July, 2021

Trajectories are intriguing

Background

The first version of Word Tree visualized a model trained as part of the Neural Text Generation post. It showed words sampled from a set of small RNN-based models trained on various corpora such as Asimov stories, jokes, etc. The models were trained along the lines of Karpathy's must-read post, The Unreasonable Effectiveness of Recurrent Neural Networks.

The demo models used in the neural text generation post were quite limited in their generative capabilities, i.e., in consistently producing meaningful suggestions. Nonetheless, it was fascinating to see them uncover the structure of the language. The motivation behind the demo app was to try to create a specialized version of the model that could provide task- or domain-specific suggestions. For example, a model to assist in writing jokes, or to auto-complete science fiction stories in the style of the great Isaac Asimov.

The model architecture would be considered an infant compared to transformer-era models in terms of parameter count, model capacity, and training cost.
2017: Neural Text Generation Demo Web App (Click Here)

About The Video

The tree in the video visualizes a few sampled trajectories generated using a language model. We use nucleus sampling to sample from the model's predictions and generate the sentences. The previous version of Word Tree used beam search. In general, nucleus sampling tends to generate more diverse and higher-quality text.

By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.

Source: The Curious Case of Neural Text Degeneration (i.e. The Nucleus Sampling Paper)
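Concretely, nucleus (top-p) sampling keeps the smallest set of most-probable tokens whose cumulative probability reaches a threshold p, renormalizes over that set, and samples from it. A minimal NumPy sketch (the toy distribution and p value are illustrative, not from the actual demo):

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token index from the smallest set of tokens whose
    cumulative probability reaches p (the 'nucleus')."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]              # tokens, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return rng.choice(nucleus, p=nucleus_probs)

# A peaked toy distribution: with p=0.9 the long tail
# (tokens 3, 4, 5) is truncated away entirely.
probs = np.array([0.5, 0.3, 0.1, 0.05, 0.03, 0.02])
token = nucleus_sample(probs, p=0.9)
```

The "dynamic" part is that the nucleus grows and shrinks with the shape of each step's distribution: a confident prediction yields a tiny nucleus, a flat one a large nucleus.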

How does it work?

At a very high level, this is how it works:

  1. Split the text into individual sentences
  2. Split each sentence into words
  3. For every word in a sentence, concatenate with previous words and pass through the language model
  4. Decode the model predictions to generate a complete sentence
Word Tree Flow

TODO: Add animated version with an example
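The four steps above can be sketched as follows. The `complete` function here is a stub standing in for the language model plus decoding step; in the real pipeline it would sample a continuation with the model, and `min_context` is an illustrative name for the minimum prefix length:

```python
import re

def complete(prefix):
    """Stand-in for the language model + decoder: given a word prefix,
    return a generated full sentence. Here it just echoes the prefix."""
    return prefix + " ..."

def word_tree(text, min_context=3):
    """For each sentence, feed every growing word prefix (of at least
    min_context words) to the model and collect the completions."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())  # 1. split into sentences
    tree = {}
    for sentence in sentences:
        words = sentence.split()                          # 2. split into words
        for i in range(min_context, len(words) + 1):
            prefix = " ".join(words[:i])                  # 3. concatenate previous words
            # 4. decode the model predictions into a complete sentence
            tree.setdefault(sentence, []).append(complete(prefix))
    return tree

tree = word_tree("Why does the sun go on shining? Why does the sea rush to shore?")
```

Each sentence thus yields one completion per prefix, and those completions become the branches of the tree.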

Points worth mentioning:

  • ⇢ Why aren't we passing the first two words to the model? Well, technically, we could, but having more context helps the model provide better predictions.
  • ⇢ We can swap out the language model with any domain-centric or next state-of-the-art model
  • ⇢ We can swap the decoding algorithm as well
  • ⇢ What do you mean by trajectory? The trajectory here is the path taken by the sentence. Each trajectory is created by repeatedly sampling the next word until the stopping criterion is met.
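To make the trajectory idea concrete, here is a toy sketch: a hand-written bigram table stands in for the language model, and each trajectory grows by repeatedly sampling the next word until the stopping criterion (an end token, or a length cap) is met. The table, the `<end>` token, and the word choices are purely illustrative:

```python
import random

# Toy "model": next-word distributions keyed by the current word.
# <end> marks the stopping criterion.
BIGRAMS = {
    "you":   [("don't", 1.0)],
    "don't": [("love", 1.0)],
    "love":  [("me", 0.7), ("it", 0.3)],
    "me":    [("<end>", 0.6), ("more", 0.4)],
    "it":    [("<end>", 1.0)],
    "more":  [("<end>", 1.0)],
}

def sample_trajectory(start, max_len=10, rng=random):
    """Grow one trajectory: repeatedly sample the next word
    until <end> is drawn or max_len is reached."""
    words = [start]
    while len(words) < max_len:
        choices, weights = zip(*BIGRAMS[words[-1]])
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words)

# Sampling repeatedly traces out the different branches of the tree.
trajectories = {sample_trajectory("you") for _ in range(20)}
```

Running the loop many times is exactly what produces the branching plots below: identical prefixes, diverging suffixes.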

Plots

Why does the sun go on shining?
[Interactive plot: sampled trajectories converging to "Why does the sun go on shining?"]
Why does the sea rush to shore?
[Interactive plot: sampled trajectories converging to "Why does the sea rush to shore?"]
Don't they know it's the end of the world?
[Interactive plot: sampled trajectories converging to "Don't they know it's the end of the world?"]
'Cause you don't love me any more
[Interactive plot: sampled trajectories converging to "'Cause you don't love me any more"]

Interestingly, the trajectories in the last plot mostly converge to the same word: P(more | you don't love me any) is most likely higher than that of any other word in the vocabulary. We also see one of the most common issues with decoding algorithms, where they get stuck repeating themselves. And some of the sentences simply don't make sense.

The following video is another take on the same song, but this time we use an image generation model (CLIP + GAN) to generate the background images. Each image was generated from the sentence you see on screen.

Updated Video: Word Tree + Generative Image Model

Both videos were created by manually aligning the song to the tree animation. What would be cool? Automatically aligning the audio and the animation using word boundary detection.

Auto alignment

Or do something even cooler:

Automated Word Tree

But I am going to stop here. Hoping to see it in action some day.

[Interactive plot: sampled trajectories converging to "Hoping to see it in action some day"]

Trajectories are intriguing

Of the many choices available, only one ever gets chosen. The rest become could've, should've, or would've. Every step, every moment, every decision presents many possible paths, infinitely branching. Some more likely than others. But one must choose.

Trajectories are fascinating

No language models were used to write this one


Up Next

Back to the demo app that has been in the works for a few months now

Image Explorer