Discriminator Model
In the Discriminator Model, we first train a linear discriminator on top of the large language model, called the ClassifierHead, to classify the desired vs. undesired class. This ClassifierHead is then used to compute gradients of the cross-entropy loss for p(a|x), where a is the desired attribute and x is the generated text.
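For intuition, p(a|x) is just a softmax over a single Dense layer applied to (here, mean-pooled) GPT-2 hidden states. Below is a minimal Flux sketch of that idea; the random hidden matrix stands in for real model activations, and the mean pooling is an illustrative choice, not necessarily the package's exact pooling:

using Flux
using Statistics: mean

hidden = randn(Float32, 768, 10)   # stand-in for (embed_size, seq_len) activations
head = Dense(768, 2)               # embed_size => class_size

# p(a|x): softmax over the linear layer applied to pooled hidden states
p_a_given_x = softmax(head(vec(mean(hidden; dims=2))))

# Cross-entropy loss against the desired class, differentiated w.r.t. the
# hidden states — this gradient is what drives the PPLM perturbation.
target = Flux.onehot(2, 1:2)
loss(h) = Flux.crossentropy(softmax(head(vec(mean(h; dims=2)))), target)
grads = gradient(loss, hidden)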
In PPLM.jl, the ClassifierHead is defined as a struct:
struct ClassifierHead
    linear_layer::Dense
    embed_size::Int
    class_size::Int
end
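Since ClassifierHead is a plain struct, you can also construct one directly from its fields via the default constructor (the sizes here are GPT-2's 768-dimensional embedding and a binary class head):

using Flux: Dense
head = PPLM.ClassifierHead(Dense(768, 2), 768, 2)   # linear_layer, embed_size, class_size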
You can load a ClassifierHead with any of the following methods:
#Method 1: load a pretrained model
classifier, config_metadata = PPLM.ClassifierHead(;load_from_pretrained=true, discrim="toxicity")

#Method 2: load a custom-trained model
classifier, config_metadata = PPLM.ClassifierHead(;load_from_pretrained=true, path="./pretrained/custom_model.bson")

#Method 3: initialize a random classifier layer
classifier, _ = PPLM.ClassifierHead(;load_from_pretrained=false)
Let's walk through an example of PPLM-based generation with the Discriminator Model.
First, let's load the package and model:
using PPLM
tokenizer, model = PPLM.get_gpt2();
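With the model loaded, you can also sanity-check a discriminator by scoring text with it directly. The mean pooling below is an assumption about how the head consumes hidden states, mirroring the sketch above:

using Flux: softmax
using Statistics: mean

input_ids = reshape(tokenizer("some text to score"), :, 1) |> PPLM.gpu
outputs = model(input_ids; output_attentions=false, output_hidden_states=true, use_cache=false);

pooled = vec(mean(outputs.hidden_states[end][:, :, 1]; dims=2))   # mean-pool over the sequence
class_probs = softmax(classifier.linear_layer(pooled))            # assumes head and activations on the same device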
Prompt: Do I look like I give a
Perturb Hidden State
Hidden-state perturbation can be done as in the following example:
args = PPLM.pplm(method="Discrim", perturb="hidden", discrim="toxicity", target_class_id=1, stepsize=0.008, fusion_kl_scale=0.05);
PPLM.sample_pplm(args; tokenizer=tokenizer, model=model, prompt="Do I look like I give a")
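Here target_class_id selects which discriminator class to steer towards, stepsize controls how far the activations are pushed along the discriminator's gradient at each step, and fusion_kl_scale weights the KL-divergence term that keeps the perturbed distribution close to the unmodified model's distribution.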
A more manual, lower-level way of generating is:
input_ = [tokenizer.eos_token_id; tokenizer("Do I look like I give a")]
args = PPLM.pplm(method="Discrim", perturb="hidden", discrim="toxicity", target_class_id=1, stepsize=0.008, fusion_kl_scale=0.05);
for i in 1:100
    input_ids = reshape(input_[:], :, 1) |> PPLM.gpu

    # Forward pass to get logits and hidden states for the current context
    outputs = model(input_ids; output_attentions=false,
                    output_hidden_states=true,
                    use_cache=false);
    original_logits = outputs.logits[:, end, 1]
    original_probs = PPLM.temp_softmax(original_logits; t=args.temperature)

    # Perturb the final hidden states along the discriminator's gradient
    hidden = outputs.hidden_states[end]
    modified_hidden = PPLM.perturb_hidden_discrim(hidden, model, tokenizer, args)
    pert_logits = model.lm_head(modified_hidden)[:, end, 1]
    pert_probs = PPLM.temp_softmax(pert_logits; t=args.temperature)

    # Fuse original and perturbed distributions (geometric mean), then sample
    gm_scale = args.fusion_gm_scale
    pert_probs = Float32.((original_probs .^ (1 - gm_scale)) .* (pert_probs .^ gm_scale)) |> PPLM.cpu
    new_token = PPLM.top_k_sample(pert_probs; k=args.top_k)[1]
    push!(input_, new_token)
end
text = detokenize(tokenizer, input_)
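The pert_probs line implements the geometric-mean fusion from the PPLM paper: the unperturbed and perturbed distributions are combined as p_orig^(1-γ) · p_pert^γ, with γ = fusion_gm_scale, so that generation never drifts too far from the base language model.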
Sample generation:
"Do I look like I give a damn? I want to be a nice person who treats my colleagues and even friends
like people.\n\nFor one thing, it takes time for me and others to really consider and think about
your value. In the past, I often felt uncomfortable working with people who thought my interests,
opinions and interests were different, and didn't have the emotional and spiritual value to interact
with them. I didn't feel like they wanted me to speak to their views. So I started getting involved
on many other topics"
Perturb Past Key Values
Perturbation of past key values can be done as in the following example:
args = PPLM.pplm(method="Discrim", perturb="past", discrim="toxicity", target_class_id=1, stepsize=0.004, fusion_kl_scale=0.05);
PPLM.sample_pplm(args; tokenizer=tokenizer, model=model, prompt="Do I look like I give a")
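Unlike the hidden-state variant, this mode perturbs the cached key/value activations (past_key_values) of the transformer layers by gradient ascent on the discriminator loss, which is the formulation used in the original PPLM paper; the last token is then re-run against the perturbed cache.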
Again, a more manual way of generating is:
input_ = [tokenizer.eos_token_id; tokenizer("Do I look like I give a")]
args = PPLM.pplm(method="Discrim", perturb="past", discrim="toxicity", target_class_id=1, stepsize=0.008, fusion_kl_scale=0.05);
for i in 1:100
    input_ids = reshape(input_[:], :, 1) |> PPLM.gpu
    inp  = input_ids[1:end-1, :]   # context, used to build the key/value cache
    prev = input_ids[end:end, :]   # last token, re-run against the perturbed cache

    outputs = model(inp; output_attentions=false,
                    output_hidden_states=true,
                    use_cache=true);
    past = outputs.past_key_values;
    original_logits = outputs.logits[:, end, 1]
    original_probs = PPLM.temp_softmax(original_logits; t=args.temperature)

    # Perturb the cached key/value activations along the discriminator's gradient
    new_past = PPLM.perturb_past_discrim(model, prev, past, original_probs, args)
    output_new = model(prev; past_key_values=new_past,
                       output_attentions=false,
                       output_hidden_states=true,
                       use_cache=true);
    pert_logits = output_new.logits[:, end, 1]
    pert_probs = PPLM.temp_softmax(pert_logits; t=args.temperature)

    # Fuse original and perturbed distributions (geometric mean), then sample
    gm_scale = args.fusion_gm_scale
    pert_probs = Float32.((original_probs .^ (1 - gm_scale)) .* (pert_probs .^ gm_scale)) |> PPLM.cpu
    new_token = PPLM.top_k_sample(pert_probs; k=args.top_k)[1]
    push!(input_, new_token)
end
text = detokenize(tokenizer, input_)
Sample generation:
"Do I look like I give a proper treatment to these people? We're seeing real examples in all the
things that they have done as well. There is going to be a discussion on there with the state of
what steps we should be taking to address all cases of people in the community, and then what we
are going to do going forward that has not a national interest interest. Is your experience with
similar issues from different different sides affected your work/responsibility of not doing that
things you find seem quite simple, at first glance?"
Load Custom Model
You can use your own custom-trained model (suppose it is saved at path) using the following:
args = PPLM.pplm(method="Discrim", discrim="custom", path=path, target_class_id=1, stepsize=0.008, fusion_kl_scale=0.05);
PPLM.sample_pplm(args; tokenizer=tokenizer, model=model, prompt="Do I look like I give a")
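If you have not trained a discriminator yet, the sketch below shows one way a compatible head might be trained and saved. It assumes Flux and BSON, uses random data in place of pooled GPT-2 hidden states, and the exact BSON keys that PPLM.ClassifierHead expects are not verified here — check the package's own training utilities for the precise format:

using Flux, BSON

# Illustrative data: one mean-pooled hidden vector per example, binary labels.
X = [randn(Float32, 768) for _ in 1:100]
Y = [Flux.onehot(rand(1:2), 1:2) for _ in 1:100]

head = Dense(768, 2)
opt = Flux.setup(Adam(1e-3), head)

for epoch in 1:5
    for (x, y) in zip(X, Y)
        grads = Flux.gradient(m -> Flux.logitcrossentropy(m(x), y), head)
        Flux.update!(opt, head, grads[1])
    end
end

BSON.@save "./pretrained/custom_model.bson" head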
Note: For a different discriminator, you may need to tune hyperparameters like stepsize, fusion_gm_scale, etc. to get really interesting results. Will add more details on this later. Also note that the first iteration usually takes longer, since the gradients are evaluated (and compiled) for the first time; consecutive passes are much faster.