GPT2: Tokenization and Generation
PPLM.jl supports GPT2 tokenization and text generation. The examples below show how to use the package for tokenization and for normal (unperturbed) generation.
PPLM.jl allows users to load pre-trained GPT2 tokenizers based on BytePairEncoding.jl and Transformers.jl, which can then be used to tokenize/encode English text with a single line of code. The tokenizer is implemented with the following structure:
abstract type GPT2 <: PretrainedTokenizer end
struct GPT2Tokenizer <: GPT2
vocab::Dict{String, Any}
Example of Tokenization
Let's see how you can tokenize text with PPLM.
# Load Tokenizer
using PPLM
tokenizer = PPLM.load_pretrained_tokenizer(GPT2)
sentence = "This is an example of Tokenization"
Once you have loaded the tokenizer, you can use either of the following methods:
tokens = tokenizer(sentence)
# or
tokens = encode(tokenizer, sentence)
It will return the following output:
7-element Vector{Int64}:
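As a rough illustration of what encoding amounts to once the text has been split into BPE tokens, the sketch below looks each token up in a vocabulary dictionary and falls back to an unknown-token ID. The names toy_vocab and toy_encode are hypothetical, and the IDs are made up for the example; they are not GPT2's real IDs, and this is not PPLM.jl's implementation.

```julia
# Toy vocabulary: these IDs are invented for illustration, NOT GPT2's real IDs.
toy_vocab = Dict("This" => 1, "Ġis" => 2, "Ġan" => 3, "Ġexample" => 4, "<unk>" => 5)

# Hypothetical helper: map each BPE token to its ID,
# using the "<unk>" entry for tokens missing from the vocabulary.
toy_encode(vocab, tokens) = [get(vocab, t, vocab["<unk>"]) for t in tokens]

ids = toy_encode(toy_vocab, ["This", "Ġis", "Ġan", "Ġexample", "Ġmissing"])
println(ids)  # [1, 2, 3, 4, 5]
```

The last token is not in the toy vocabulary, so it maps to the unknown-token ID.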
Now you have your list of tokens. Suppose you want to get back your sentence. This can be done in two ways:
# First Method:
sentence = detokenize(tokenizer, tokens)
# Second Method:
decoded_tokens_list = decode(tokenizer, tokens)
# returns vector: ["This", "Ġis", "Ġan", "Ġexample", "Ġof", "ĠToken", "ization"]
sentence = detokenize(tokenizer, decoded_tokens_list)
You will get back your original sentence: "This is an example of Tokenization".
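The "Ġ" prefix in the decoded tokens is the marker that GPT2's byte-level BPE uses for a leading space. A minimal sketch of the final joining step, assuming only that convention, is shown here; toy_detokenize is a hypothetical helper, not PPLM.jl's detokenize, and it handles only the space marker, not the other remapped bytes.

```julia
# Hypothetical helper, not PPLM.jl's detokenize: concatenate the BPE tokens
# and turn the "Ġ" marker back into a space.
toy_detokenize(tokens::Vector{String}) = replace(join(tokens), "Ġ" => " ")

decoded = ["This", "Ġis", "Ġan", "Ġexample", "Ġof", "ĠToken", "ization"]
println(toy_detokenize(decoded))  # This is an example of Tokenization
```

Note that "ĠToken" and "ization" are joined without a space because only tokens that start a new word carry the marker.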
Generation: Normal Text
PPLM.jl can be used to generate normal (unperturbed) text with the GPT2 model, using either of two sampling methods: top_k and nucleus.
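To make the two sampling methods concrete, here is a minimal sketch in plain Julia of how top-k and nucleus (top-p) filtering reshape a next-token probability distribution before sampling. The function names top_k_filter, nucleus_filter, and sample_index are hypothetical and written for this illustration; how PPLM.jl implements these steps internally is not shown here.

```julia
# Top-k: keep the k most probable tokens, zero out the rest, renormalize.
function top_k_filter(probs::Vector{Float64}, k::Int)
    keep = partialsortperm(probs, 1:k; rev=true)
    filtered = zeros(length(probs))
    filtered[keep] = probs[keep]
    return filtered ./ sum(filtered)
end

# Nucleus: keep the smallest set of most probable tokens whose
# cumulative probability reaches p, zero out the rest, renormalize.
function nucleus_filter(probs::Vector{Float64}, p::Float64)
    order = sortperm(probs; rev=true)
    cutoff = findfirst(>=(p), cumsum(probs[order]))
    cutoff === nothing && (cutoff = length(order))  # guard against rounding
    keep = order[1:cutoff]
    filtered = zeros(length(probs))
    filtered[keep] = probs[keep]
    return filtered ./ sum(filtered)
end

# Draw one token index from a discrete distribution by inverting its CDF.
function sample_index(probs::Vector{Float64})
    r = rand()
    idx = findfirst(>(r), cumsum(probs))
    return idx === nothing ? length(probs) : idx
end

# Toy distribution over a 4-token vocabulary (values chosen for illustration).
probs = [0.5, 0.25, 0.125, 0.125]
next_top_k = sample_index(top_k_filter(probs, 2))
next_nucleus = sample_index(nucleus_filter(probs, 0.6))
```

With this toy distribution both filters keep only the first two tokens, so either call returns index 1 or 2. In general, top-k keeps a fixed number of candidates, while nucleus sampling keeps a variable number depending on how peaked the distribution is.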
To generate text, you can use the following code, where tokenizer and model are an already loaded GPT2 tokenizer and model:
sample_normal(;primer="Fruits are", tokenizer=tokenizer, model=model, method="top_k")
Here is sample text generated with GPT2 using the above code:
With Top_k sampling, k=50, prompt = "Fruits are"
"Fruits are the key ingredient in our diet; their vitamins, and proteins are essential to build our immune system.
What makes a good fruit one of them is simply as simple as your diet is. Fruit is one simple nutrient that is used
effectively as a defense against sickness and stress (which can be very life changing indeed). When the body has
just consumed enough fat for at least 40-50 days, the body also releases hormones known as the hormone estrogen in
order to prevent infection. A good diet makes life easier"
With Nucleus sampling, p=0.6, prompt = "Fruits are"
"Fruits are packed with the goodness of ancient Greek life, plants that protect and revive us from death. At every
stone in the garden, your fruit may reflect on the people who once carried you from town to town, those who would
still give you food to live, and the perfect pair of hand-gloved fingers you may wear in your golden bedroll."