The GDELT Project

Generative AI Experiments: Comparing Google's Bison, Unicorn & Gemini Pro Models For Summarization

The majority of our past LLM experimentation with GCP models has been with the PaLM Bison model, since it was the original model offered in Vertex AI's generative text collection. Now that Bison, Unicorn and Gemini Pro are all available, how do the three compare on a basic summarization task? We'll use an NBC News article from earlier this year about a vibriosis outbreak and ask each of the three models to summarize it under various temperature conditions.

The end result is that all three models, even Gemini Pro, plagiarize the opening words of the original source text across all temperature settings, with the sole exception of Unicorn at its maximal setting. Remarkably, the high divergence across outputs previously seen at maximal model temperature has all but vanished, with Bison, Unicorn and Gemini Pro all exhibiting strikingly little difference between runs at the deterministic, low and maximal temperature settings. Whereas in the past setting the temperature to 0.99 for Bison would result in massive differences between outputs, temperature is now almost a no-op across the three models, with creativity apparently relegated to the prompt rather than to parameter settings.

Surprisingly, Gemini Pro did not yield anecdotally better results than Unicorn or even Bison, with all three models producing relatively similar results. Bison actually incorporated more usable detail into its outputs than Gemini Pro, suggesting it may be a better fit for news summarization where detail preservation is important. However, no prompt experimentation was performed here to test whether Gemini Pro could be nudged to produce similarly detailed results; it may be that the models have simply been tuned to move "creativity" from being an API parameter setting to being more of a prompt setting.

The actual API submissions can be seen at the end of this post.

BISON

We'll begin with the standard Bison model, GCP's general-purpose LLM. Let's start with a temperature of 0.0 to test its deterministic response. We can see that the model response is truncated:

Let's try the formerly recommended temperature of 0.2. All three responses are extremely similar, which is the typical behavior for this temperature in past model iterations:

Now let's test its creativity with a temperature of 0.99. Strangely, despite the highest possible temperature setting, all three responses are extremely similar, whereas in the past they would show considerable divergence. This suggests that the Bison model has been tuned in recent iterations towards greater determinism even in its most "creative" responses:

UNICORN

Now let's test Unicorn's deterministic (temperature 0.0) response. Unlike Bison, it does not truncate its output:

Now we'll try a temperature of 0.2:

And finally we'll test its maximal creativity with temperature 0.99. As with Bison, the results are remarkably consistent and do not reflect the previous high divergence observed with maximal temperature settings:

GEMINI PRO

Finally, let's test GCP's newest model, Gemini Pro. Let's start with its deterministic response (temperature 0.0):

Now we'll test a temperature of 0.2. This yields unchanged text each time:

How about a temperature of 0.99? Once again, the results are unchanged across all three runs:

TECHNICAL DETAILS

Both Bison and Unicorn use the same API parameters and invocation workflow, differing only in the model name.

Bison

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/text-bison:predict -d \
$'{
  "instances": [
    { "prompt": "Summarize this news article. NEWS ARTICLE: [ARTICLE FULLTEXT] "}
  ],
  "parameters": {
    "temperature": 0.99,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}'
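The predict endpoint returns ordinary JSON rather than a stream, so no reassembly is needed. Assuming the standard PaLM text response shape, in which each element of the predictions array carries its generated text in a content field, the summary can be pulled out with a single jq call (the RESPONSE.json filename below is just a placeholder for wherever the response above was saved):

# Hypothetical post-processing step: extract just the generated summary text
# from a saved predict response (assumes the predictions[].content shape).
jq -r '.predictions[].content' RESPONSE.json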

Unicorn

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/text-unicorn:predict -d \
$'{
  "instances": [
    { "prompt": "Summarize this news article. NEWS ARTICLE: [ARTICLE FULLTEXT] "}
  ],
  "parameters": {
    "temperature": 0.99,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}'
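Since the two requests differ only in the model name in the URL, the full set of runs in this post can be scripted as a simple loop. The sketch below is illustrative rather than the exact commands used here: the MODEL and TEMP loop variables and the output filenames are hypothetical conveniences, and the article text placeholder is unchanged from above:

# Illustrative sketch: run both PaLM text models at the three temperatures used
# in this post, saving each raw JSON response to its own file.
for MODEL in text-bison text-unicorn; do
  for TEMP in 0.0 0.2 0.99; do
    curl -s \
    -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/${MODEL}:predict" -d \
    '{
      "instances": [
        { "prompt": "Summarize this news article. NEWS ARTICLE: [ARTICLE FULLTEXT] "}
      ],
      "parameters": {
        "temperature": '"$TEMP"',
        "maxOutputTokens": 256,
        "topK": 40,
        "topP": 0.95
      }
    }' > "$MODEL-$TEMP.json"
  done
done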

In contrast, Gemini Pro uses an entirely new API parameter set and returns streaming results that require an additional post-processing stage to reassemble for display. Here we write the results to a file on disk, pipe them through jq to parse out the streaming results, then pass them through tr to replace the newlines with spaces, reassembling everything into a single block of text:

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent -d \
$'{
  "contents": {
    "role": "user",
    "parts": { "text": "Summarize this news article. NEWS ARTICLE: [ARTICLE FULLTEXT]" },
  },
  "safety_settings": {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "threshold": "BLOCK_NONE"
  },
  "generation_config": {
    "temperature": 0.0,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}' > O; cat O | jq -r '.[].candidates[].content.parts[].text' | tr '\n' ' '
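
The streaming endpoint returns a JSON array of partial responses, each carrying a fragment of the output in candidates[].content.parts[].text, which is what the jq/tr stage is unpacking. Since the raw stream is saved to O above, the reassembly step can also be re-run on its own later (jq reads the file directly, so the cat isn't strictly needed):

# Re-run just the reassembly over the saved streaming response in O: pull the
# text fragment out of each chunk, then join everything onto a single line.
jq -r '.[].candidates[].content.parts[].text' O | tr '\n' ' '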