The GDELT Project

Experiments With Google's PaLM 2 LLMs For Embedding, Summarization, Event & Relation Extraction, NER & Q&A: Bison & Gecko

What does it look like to summarize a television evening news broadcast using Google's PaLM 2 large language model (LLM)? Let's explore both embedding and summarization tasks using Vertex AI's PaLM API models text-bison and textembedding-gecko.

We'll start with a CBS Evening News broadcast from July 5th that includes this synopsis of the story:

jericka: good evening to our viewers in the west, and thank you for joining us this wednesday night. i'm jericka duncan in for norah. we begin with the secret service leading an investigation into who brought cocaine into the white house. the surprising discovery of a bag of suspicious white powder initially caused a panic leading to the temporary closure of the building on sunday. the illegal drug was found near a visitors entrance for those taking a tour of the west wing. today lab tests came back positive for cocaine. president biden and the first family were at camp david at the time. cbs's weijia jiang starts us off from the white house. good evening. >> reporter: good evening, jericka. the white house says it's very important to president biden to get to the bottom of who brough% that illegal drug inside. but officials cautioned this could be a challenging investigation. tonight the leading theory is that someone on a white house tour may have left a small bag of cocaine in the west wing. according to senior law enforcement officials. the discovery sunday nights put the white house on temporary lockdown until the substance was deemed 'not a threat.' the bag was discovered on the ground floor in a common area near the entry to the west wing. not far from the situation room. it was close to a set of storage cubbies where visitors leave their cell p phones duringng a . the secret service is leading the investigation looking at visitor logs and camera surveillance. they are camera systems they will go back as a part of the investigation and research. not every camera is trained on every body at every minute. so there will always be a possibility that this cocould go unanswered. reporteter: white house visitorsrs undergo background checks and go through magnetometers on site. but the secret service does not have k-9 units that target illegal drugs. instead, the dog search for explosives, biohazardous materials, and weapons. 
>> this episode shines a light on the fact that you can bring in illegal substances to the white house, so what is preventing a visitor from bringing an anthrax or something that is not magnetic into the white house? i totally understand the question, but it is under investigation. we will let the secret service do their job. i'm not going to get ahead of if or whens or changes. reporter: senator tom cotton, a republican from arkansas sent a letter to the director of the secret service to say that congress needs to know if the white house complex is not secure. and he asked for a plan to correct any security flaws. jericka: weijia jiang at the white house, thank you.

First, let's explore how to construct an embedding of this text using textembedding-gecko:

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/textembedding-gecko:predict -d \
$'{
  "instances": [
    { "content": "jericka: good evening to our viewers in the west, and thank you for joining us this wednesday night. i\'m jericka duncan in for norah. we begin with the secret service leading an investigation into who brought cocaine into the white house.  the surprising discovery of a bag of suspicious white powder initially caused a panic leading to the temporary closure of the building on sunday. the illegal drug was found near a visitors entrance for those taking a tour of the west wing. today lab tests came back positive for cocaine. cbs\'s weijia jiang starts us off from the white house. good evening. >> reporter: good evening, jericka. the white house says it\'s very important to president biden to get to the bottom of who brough% that illegal drug inside. but officials cautioned this could be a challenging investigation. tonight the leading theory is that someone on a white house tour may have left a small bag of cocaine in the west wing. according to senior law enforcement officials. the discovery sunday nights put the white house on temporary lockdown until the substance was deemed \'not a threat.\'president biden and the first family were at camp david at the time. the bag was discovered on the ground floor in a common area near the entry to the west wing. not far from the situation room. it was close to a set of storage cubbies where visitors leave their cell p phones duringng a . the secret service is leading the investigation looking at visitor logs and camera surveillance. they are camera systems they will go back as a part of the investigation and research. not every camera is trained on every body at every minute. so there will always be a possibility that this cocould go unanswered. reporteter: white house visitorsrs undergo background checks and go through magnetometers on site. but the secret service does not have k-9 units that target illegal drugs. instead, the dog search for explosives, biohazardous materials, and weapons. 
>> this episode shines a light on the fact that you can bring in illegal substances to the white house, so what is preventing a visitor from bringing an anthrax or something that is not magnetic into the white house? i totally understand the question, but it is under investigation. we will let the secret service do their job. i\'m not going to get ahead of if or whens or changes. reporter: senator tom cotton, a republican from arkansas sent a letter to the director of the secret service to say that congress needs to know if the white house complex is not secure. and he asked for a plan to correct any security flaws. jericka: weijia jiang at the white house, thank you."}
  ]
}'

This yields a response with three parts: the number of processed tokens, whether the input was truncated to fit Gecko's input limit of 3,072 tokens, and the final 768-dimension embedding vector:

{
  "predictions": [
    {
      "embeddings": {
        "statistics": {
          "token_count": 620,
          "truncated": false
        },
        "values": [ -0.026316206902265549, -0.00928876455873251, -0.013688338920474052, 0.021982554346323013, 0.0097444616258144379, -0.015772707760334015, 0.004138801246881485, -0.0075818272307515144, 0.016171032562851906, 0.028827827423810959, 0.05103302001953125, 0.029937701299786568, 0.038068559020757675, 0.0061067845672369, -0.018965898081660271, -0.0256775114685297, -0.091803833842277527, -0.034031633287668228, -0.0060932785272598267, 0.014593350701034069, -0.066603764891624451, -0.0095631163567304611, 0.024024041369557381, -0.0079607218503952026, -0.005145183764398098, -0.092862524092197418, -0.0012368297902867198, 0.011701472103595734, -0.00325111486017704, 0.044185217469930649, -0.029273299500346184, 0.0520230270922184, -0.014012008905410767, -0.020314397290349007, 0.046527337282896042, 0.033383198082447052, 0.0048452704213559628, 0.022460045292973518, -0.0012309200828894973, 0.0677514299750328, 0.018747150897979736, -0.0035755180288106203, 0.037990573793649673, -0.014407480135560036, -0.0011932635679841042, 0.033241834491491318, -0.053499162197113037, 0.080047480762004852, -0.054175291210412979, -0.037399996072053909, -0.0080034369602799416, -0.030808413401246071, 0.0092976866289973259, 0.0448807030916214, 0.0043153264559805393, 0.017457028850913048, -0.050193365663290024, -0.022531535476446152, -0.013204140588641167, 0.01885344460606575, 0.00574616901576519, -0.045513268560171127, 0.0066943424753844738, -0.0712982714176178, -0.027345221489667892, 0.042248494923114777, 0.0040224445983767509, 0.013439870439469814, -0.015983225777745247, -0.03760816901922226, 0.05191253125667572, 0.0350683219730854, 0.0050177471712231636, -0.016289044171571732, 0.048790067434310913, 0.0066572064533829689, -0.0019017370650544763, 0.025467976927757263, 0.057025041431188583, -0.050553541630506516, -0.0070882458239793777, -0.053437814116477966, -0.046275746077299118, -0.13421690464019775, -0.020731303840875626, 0.086676493287086487, 0.015039863064885139, 0.014777365140616894, 
-0.03055473230779171, 0.050682112574577332, -0.035149849951267242, 0.025798056274652481, 0.012301063165068626, 0.0062756882980465889, -0.064990691840648651, 0.013834938406944275, -0.051995061337947845, -0.00079759990330785513, 0.0066644442267715931, 0.012001706287264824, -0.0079390192404389381, -0.063710562884807587, 0.044366545975208282, -0.016902312636375427, 0.0428246334195137, 0.018548643216490746, 0.01926993764936924, 0.048384230583906174, -0.0011376551119610667, -0.07110295444726944, -0.086533837020397186, 0.00812441110610962, -0.0458795428276062, 0.023786090314388275, -0.0039624534547328949, -0.035696256905794144, 0.0023529164027422667, -0.001919868984259665, -0.010612469166517258, 0.0032412649597972631, -0.051447264850139618, 0.0058120153844356537, 0.00781091395765543, -0.0017323935171589255, -0.049268078058958054, 0.024353422224521637, 0.028342736884951591, -0.034602772444486618, -0.034199949353933334, 0.015881473198533058, 0.035613395273685455, 0.004543862771242857, 0.07088296115398407, -0.039463784545660019, 0.046701215207576752, 0.01532874908298254, -0.022417116910219193, -0.0003880842705257237, -0.0066897408105432987, 0.019256442785263062, -0.068573862314224243, -0.00067667977418750525, -0.021355381235480309, -0.0098667414858937263, 0.00028396613197401166, -0.010659218765795231, 0.030645176768302917, -0.050831831991672516, -0.022481508553028107, -0.039617203176021576, 0.015442410483956337, -0.013325230218470097, 0.016390321776270866, 0.033645197749137878, 0.08762369304895401, 0.036494847387075424, 0.058092851191759109, 0.075144916772842407, -0.020085778087377548, -0.058286841958761215, 0.021364128217101097, -0.020230172201991081, -0.038337793201208115, -0.03770042210817337, 0.048687223345041275, 0.018671782687306404, 0.033665869385004044, 0.00977856945246458, -0.013594646938145161, 0.0052350247278809547, 0.05435759574174881, -0.15027342736721039, 0.037088572978973389, -0.034370757639408112, 0.094945646822452545, 0.023252522572875023, 
-0.0043162712827324867, 0.0068887979723513126, 0.018356040120124817, -0.013730751350522041, -0.035038098692893982, -0.10446758568286896, -0.005304933525621891, 0.0019772211089730263, -0.066111944615840912, -0.022060751914978027, 0.017511617392301559, -0.031518656760454178, -0.047460515052080154, 0.054819650948047638, -0.018583152443170547, 0.029078252613544464, 0.014650802128016949, -0.056383751332759857, -0.0043354369699954987, 0.011964770033955574, 0.04107024148106575, -0.13784731924533844, -0.027608023956418037, 0.033411502838134766, -0.044444672763347626, -0.044441241770982742, -0.019240120425820351, 0.044814370572566986, -0.045749519020318985, -0.010699816979467869, -0.039257779717445374, 0.025641884654760361, 0.010015186853706837, 0.0185274425894022, -0.0055958293378353119, -0.021514885127544403, 0.0082529652863740921, 0.039269499480724335, -0.0078581469133496284, -0.038671988993883133, -0.015580939128994942, -0.074905931949615479, -0.044650718569755554, 0.039278872311115265, -0.0047844992950558662, -0.0072679305449128151, 0.061034247279167175, 0.0791454166173935, 0.002983615268021822, 0.026874322444200516, 0.039715621620416641, -0.0105104586109519, 0.0093197207897901535, 0.024982800707221031, -0.012086464092135429, 0.00052393384976312518, -0.044072277843952179, -0.0088321603834629059, 0.00094110809732228518, 0.044567491859197617, -0.00079117628047242761, 0.074029959738254547, -0.071012064814567566, -0.0039413967169821262, 0.038064368069171906, 0.043769285082817078, -0.044775538146495819, 0.028004949912428856, -0.010611819103360176, -0.004892522469162941, 0.020269447937607765, 0.024363018572330475, 0.052664425224065781, -0.0069154165685176849, 0.066945686936378479, -0.071817919611930847, -0.017478758469223976, 0.018190000206232071, 0.03751499205827713, -0.0086314696818590164, 0.025379018858075142, 0.056897521018981934, -0.054873146116733551, 0.0077923806384205818, 0.0058024311438202858, -0.020971076563000679, 0.0391705147922039, 0.12995195388793945, 
-0.004607341717928648, -0.047103539109230042, -0.078285306692123413, -0.014779404737055302, 0.0449865348637104, -0.0495770089328289, -0.027704590931534767, 0.0059334421530365944, 0.060318492352962494, 0.05264587327837944, -0.018132075667381287, 0.0069052176550030708, -0.063973180949687958, 0.019628074020147324, 0.018511399626731873, 0.042460836470127106, 0.0277895238250494, -0.032415885478258133, -3.78041913791094e-05, 0.051172856241464615, -0.018370026722550392, 0.036760680377483368, -0.0230995062738657, 0.034480251371860504, -0.042489845305681229, -0.014464761130511761, 0.033649202436208725, -0.039281737059354782, -0.033192023634910583, 0.073097392916679382, -0.00023193511879071593, -0.003101282985880971, 0.035935819149017334, -0.019467530772089958, -0.031026003882288933, 0.045379795134067535, 0.0041903224773705006, -0.044047903269529343, -0.020402844995260239, -0.0504877008497715, -0.061819050461053848, -0.027720579877495766, -0.00500531168654561, -0.0069280141033232212, 0.05069950595498085, -0.021937036886811256, -1.7885344277601689e-05, -0.011793952435255051, 0.010473549365997314, 0.023273181170225143, -0.0694643035531044, 0.023775665089488029, -0.0084978053346276283, -0.0034868491347879171, -0.038604769855737686, 0.03774532675743103, -0.028512574732303619, -0.004587060771882534, 0.0929008424282074, 0.023256026208400726, -0.0063658906146883965, 0.0095173399895429611, -0.0080719469115138054, -0.0278488602489233, 0.057520207017660141, -0.029592080041766167, 0.0044944840483367443, -0.0040046311914920807, -0.051493551582098007, 0.017654145136475563, 0.0032976111397147179, 0.0026079099625349045, 0.013650619424879551, -0.0024660662747919559, -0.0023697640281170607, -0.06456109881401062, 0.024658707901835442, -0.021408997476100922, -0.0068025114014744759, -0.0066638542339205742, -0.036482363939285278, 0.019174516201019287, -0.0438554473221302, -0.021870041266083717, -0.04547392949461937, -0.038521196693181992, 0.00596650829538703, 0.030038686469197273, 
-0.035086750984191895, 0.04976818710565567, 0.0400061272084713, 0.0064890272915363312, -0.0063623036257922649, 0.01012021116912365, 0.035347927361726761, 0.030191557481884956, -0.011656205169856548, -0.013294038362801075, -0.015342887490987778, 0.034351147711277008, -0.0021816431544721127, 0.020229756832122803, -0.044474758207798004, 0.044439211487770081, -0.010409115813672543, -0.025947626680135727, -0.085024826228618622, -0.0020656429696828127, 0.00052231590962037444, 0.016292104497551918, -0.057914499193429947, -0.050391267985105515, -0.027483150362968445, -0.0097544686868786812, 0.0033668591640889645, 0.017049608752131462, -0.054627485573291779, 0.042636740952730179, 0.0075308582745492458, 0.020278513431549072, 0.0011057972442358732, -0.013009698130190372, 0.016123700886964798, 0.065272562205791473, -0.028781287372112274, -0.0041641448624432087, 0.044746361672878265, 0.020369136705994606, 0.037195291370153427, -0.0261179618537426, -0.015578169375658035, 0.043055444955825806, -0.025448441505432129, -0.021031482145190239, -0.040092967450618744, -0.025911903008818626, 0.048199713230133057, 0.014582139439880848, 0.0058958618901669979, -0.028741909191012383, 0.030405901372432709, 0.039635665714740753, -0.025071172043681145, 0.030906492844223976, 0.047843046486377716, -0.019424373283982277, 0.010585284791886806, -0.05459122359752655, -0.0087877418845891953, -0.026016067713499069, -0.0014721733750775456, -0.085492178797721863, 0.031729720532894135, -0.078703917562961578, -0.0027248507831245661, -0.0350271575152874, 0.051737476140260696, -0.050961028784513474, -0.10167919099330902, -0.0070392638444900513, 0.025627272203564644, 0.0074106105603277683, -0.065895773470401764, 0.011963761411607265, -0.042191382497549057, 0.02000121958553791, 0.013293313793838024, -0.022133151069283485, 0.026620637625455856, -0.0232720784842968, 0.010924641974270344, -0.040254242718219757, 0.071481935679912567, 0.066086731851100922, -0.0037087397649884224, -0.027018001303076744, 
-0.0081843966618180275, 0.015539648942649364, -0.0734042078256607, 0.020948104560375214, 0.0088705914095044136, -0.04675234854221344, 0.0064436499960720539, -0.0016939505003392696, -0.042131610214710236, 0.0021320984233170748, -0.048430327326059341, -0.019703345373272896, -0.023374607786536217, 0.0537690594792366, 0.033143993467092514, 0.048992723226547241, 0.021371668204665184, -0.014940503984689713, 0.056736309081315994, -0.016199393197894096, -0.029217114672064781, 0.014478300698101521, -0.02062884159386158, 0.018699198961257935, 0.012110746465623379, -0.01791837252676487, 0.018031449988484383, 0.046827990561723709, -0.033992249518632889, 0.0211626086384058, -0.01361218374222517, 0.082904554903507233, 0.0026834416203200817, -0.0081295585259795189, 0.06623174250125885, -0.0098228035494685173, 0.011983917094767094, 0.039773993194103241, 0.02003207616508007, -0.025936976075172424, -0.0665331557393074, 0.011741439811885357, -0.004901728592813015, -0.042070217430591583, 0.020219540223479271, 0.038307908922433853, -0.0044579682871699333, -0.020297309383749962, -0.0037286004517227411, 0.025151962414383888, 0.006233209278434515, -0.017356084659695625, 0.015886928886175156, -0.027023743838071823, 0.0353916771709919, -0.032550949603319168, 0.0054831248708069324, 0.008485875092446804, 0.02822386845946312, 0.00055772654013708234, 0.035904359072446823, 0.038642261177301407, 0.033935274928808212, -0.04867519810795784, -0.040141310542821884, -0.031909134238958359, -0.010828463360667229, -0.051429837942123413, -0.03953658789396286, -0.005867494735866785, -0.019369930028915405, 0.061740703880786896, -0.013841093517839909, -0.088703647255897522, -0.034924622625112534, -0.091603584587574, 0.016831735149025917, 0.034741189330816269, 0.014819703064858913, -0.014225361868739128, -0.0068876338191330433, -0.0198333989828825, 0.015365913510322571, 0.0083503536880016327, 0.056442447006702423, 0.056455954909324646, 0.038439355790615082, -0.029722638428211212, -0.063413433730602264, 
0.011833596043288708, 0.016077077016234398, 0.03391527384519577, -0.031383827328681946, 0.040686309337615967, -0.0045986874029040337, -0.042314108461141586, 0.033530868589878082, 0.00094054639339447021, 0.032176192849874496, 0.093753546476364136, -0.014439630322158337, -0.042523600161075592, 0.0047705862671136856, 0.0030266710091382265, 0.017641626298427582, 0.013294588774442673, 0.030782707035541534, 0.044322464615106583, 0.031199635937809944, 0.052474472671747208, 0.036384563893079758, -0.047080215066671371, -0.068959988653659821, 0.060448259115219116, 0.0362711101770401, -0.0079839704558253288, 7.6323623943608254e-05, -0.043060086667537689, -0.0284821055829525, -0.012486309744417667, 0.0022430496755987406, 0.017791595309972763, -0.017464542761445045, -0.0040545891970396042, 0.055264808237552643, -0.033159296959638596, 0.048600237816572189, 0.0071537047624588013, 0.016378089785575867, 0.042574159801006317, -0.027758155018091202, -0.056894510984420776, -0.023540997877717018, 0.0074647869914770126, 0.018669456243515015, 0.001149291405454278, -0.019083635881543159, 0.076512843370437622, 0.015710553154349327, 0.0582815445959568, -0.030300052836537361, 0.034225389361381531, 0.027270868420600891, 0.023879049345850945, -0.021174229681491852, 0.026707133278250694, -0.011528563685715199, -0.050993524491786957, 0.01228194497525692, -0.049799654632806778, 0.014272252097725868, -0.063386999070644379, 0.006799464114010334, -0.0068940017372369766, -0.036047190427780151, 0.010232350789010525, -0.030707294121384621, 0.015823215246200562, 0.030295798555016518, 0.039484459906816483, 0.000905710447113961, 0.0021220427006483078, 0.078907705843448639, -0.043667729943990707, 0.0090091153979301453, 0.0010319879511371255, 0.031346507370471954, 0.012438925914466381, 0.035531342029571533, 0.0032337333541363478, 0.010663202963769436, 0.064628124237060547, 0.012772770598530769, -0.015615382231771946, -0.016807988286018372, 0.022895120084285736, -0.019653532654047012, 0.019522856920957565, 
0.03982137143611908, -0.0033588006626814604, 0.031334321945905685, 0.0089674843475222588, 0.0017239715671166778, -0.06297273188829422, 0.034553438425064087, -0.010600830428302288, -0.01747843436896801, 0.0098965084180235863, -0.013531683944165707, -0.025220366194844246, -0.00737897539511323, -0.024626210331916809, -0.016799338161945343, -0.0053686108440160751, 0.017709797248244286, -0.020623026415705681, -0.038971975445747375, 0.012872497551143169, 0.02904120460152626, 0.014000239782035351, 0.0075879525393247604, 0.014748400077223778, -0.0084365503862500191, 0.0023773263674229383, 0.016774099320173264, -0.014384962618350983, 0.022928168997168541, -0.016738042235374451, -0.032256394624710083, 0.025818657130002975, -0.017969658598303795, -0.013997227884829044, 0.0029355555307120085, -0.0315256230533123, -0.026518212631344795, -0.0075670378282666206, -0.026912249624729156, -0.0027541194576770067, 0.0093572121113538742, 0.053237061947584152, 0.027376702055335045, 0.007124289870262146, 0.023848235607147217, 0.0362401157617569, -0.045349262654781342, 0.046671662479639053, -0.023667324334383011, 0.034829627722501755, 0.014288730919361115, 0.0010316941188648343, -0.05718626081943512, 0.00054579123388975859, -0.016503829509019852, -0.067113369703292847, 0.037305478006601334, -0.02717096358537674, 0.0060737114399671555, -0.023821115493774414, 0.027460051700472832, 0.028433756902813911, -0.018951106816530228, 0.025625767186284065, 0.017243430018424988, -0.0457310676574707, 0.018011104315519333, -0.071702934801578522, -0.016308693215250969, 0.0083586070686578751, -0.00789407454431057, -0.00409452011808753, 0.017870822921395302, 0.031845990568399429, 0.0016450724797323346, 0.011956182308495045, -0.029796648770570755, -0.079430453479290009, -0.01187825296074152, 0.037597853690385818, -0.013134190812706947, 0.021589752286672592, 0.042448859661817551, -0.0064895674586296082, 0.043483644723892212, 0.031352333724498749, 0.035559024661779404, -0.020559746772050858, 
-0.01215350441634655, 0.034788474440574646, 0.0032888427376747131, -0.011147325858473778, 0.0024599034804850817, -0.017628306522965431, 0.032043773680925369, 0.024470558390021324, 0.0004806019423995167, 0.021746734157204628, -0.046977408230304718, -0.016708966344594955, -0.0033737628255039454, 0.0019520255737006664, -0.014722972176969051, 0.031111471354961395, 0.021764641627669334, -0.0058486745692789555, -0.026641644537448883, 0.018864938989281654, 0.0082651916891336441, -0.0022936826571822166, -0.005159065593034029, 0.040802452713251114, -0.022272210568189621, -0.01438487134873867, 0.018310142681002617, 0.057113774120807648, 0.016066212207078934, -0.068376481533050537, 0.04297725111246109, -0.0067473971284925938, 0.067931763827800751, -0.049211379140615463, -0.00513458764180541, 0.051039986312389374, 0.011585981585085392, -0.043625619262456894, 0.024500312283635139, -0.012329598888754845, 0.0023764201905578375, -0.00697014294564724, -0.026742862537503242, -0.026267023757100105, -0.093272946774959564, 0.037495456635951996, -0.048684924840927124, 0.04810091108083725, 0.013280177488923073, 0.081437990069389343, 0.032810073345899582, 0.042868722230196, 0.003743504174053669, 0.10386362671852112, 0.017743712291121483, 0.038164127618074417, 0.013563415966928005, -0.0060403961688280106, -0.0039521381258964539, 0.041518833488225937, -0.022289315238595009, -0.018127968534827232 ]      }
    }
  ]
}
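
To make the response shape concrete, here is a minimal Python sketch that pulls the token count, truncation flag and vector out of a response like the one above, and compares two vectors with cosine similarity. The function names are our own for illustration, the sample response is a toy 3-dimension stand-in rather than real Gecko output, and cosine similarity is simply one common choice for comparing embeddings, not something the API prescribes.

```python
import json
import math

def parse_embedding(response_text):
    """Return (values, token_count, truncated) for the first prediction."""
    body = json.loads(response_text)
    embedding = body["predictions"][0]["embeddings"]
    stats = embedding["statistics"]
    return embedding["values"], stats["token_count"], stats["truncated"]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy stand-in with the same structure as the response above (not real output).
sample = (
    '{"predictions": [{"embeddings": {"statistics": '
    '{"token_count": 620, "truncated": false}, '
    '"values": [0.1, -0.2, 0.3]}}]}'
)
values, token_count, truncated = parse_embedding(sample)
if truncated:
    print("warning: input was cut to fit the model's token limit")
print(token_count, len(values), cosine_similarity(values, values))
```

Checking the truncated flag before storing a vector is worthwhile, since a truncated embedding silently represents only the start of the transcript.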

Now let's try Bison itself, using it to summarize the passage (parameter documentation):

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/text-bison:predict -d \
$'{
  "instances": [
    { "prompt": "Summarize this television news transcript: \\"jericka: good evening to our viewers in the west, and thank you for joining us this wednesday night. i\'m jericka duncan in for norah. we begin with the secret service leading an investigation into who brought cocaine into the white house.  the surprising discovery of a bag of suspicious white powder initially caused a panic leading to the temporary closure of the building on sunday. the illegal drug was found near a visitors entrance for those taking a tour of the west wing. today lab tests came back positive for cocaine. cbs\'s weijia jiang starts us off from the white house. good evening. >> reporter: good evening, jericka. the white house says it\'s very important to president biden to get to the bottom of who brough% that illegal drug inside. but officials cautioned this could be a challenging investigation. tonight the leading theory is that someone on a white house tour may have left a small bag of cocaine in the west wing. according to senior law enforcement officials. the discovery sunday nights put the white house on temporary lockdown until the substance was deemed \'not a threat.\'president biden and the first family were at camp david at the time. the bag was discovered on the ground floor in a common area near the entry to the west wing. not far from the situation room. it was close to a set of storage cubbies where visitors leave their cell p phones duringng a . the secret service is leading the investigation looking at visitor logs and camera surveillance. they are camera systems they will go back as a part of the investigation and research. not every camera is trained on every body at every minute. so there will always be a possibility that this cocould go unanswered. reporteter: white house visitorsrs undergo background checks and go through magnetometers on site. but the secret service does not have k-9 units that target illegal drugs. 
instead, the dog search for explosives, biohazardous materials, and weapons. >> this episode shines a light on the fact that you can bring in illegal substances to the white house, so what is preventing a visitor from bringing an anthrax or something that is not magnetic into the white house? i totally understand the question, but it is under investigation. we will let the secret service do their job. i\'m not going to get ahead of if or whens or changes. reporter: senator tom cotton, a republican from arkansas sent a letter to the director of the secret service to say that congress needs to know if the white house complex is not secure. and he asked for a plan to correct any security flaws. jericka: weijia jiang at the white house, thank you.\\"  "}
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}'
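
Hand-escaping the quotes in a long transcript for the shell is error-prone, so as an alternative here is a small Python sketch that builds the same request body and lets the JSON encoder handle the escaping. The helper name and its defaults are our own; the field names (instances, prompt, parameters, temperature, maxOutputTokens, topK, topP) are the ones used in the curl command above.

```python
import json

def build_summarize_request(
    transcript,
    instruction="Summarize this television news transcript:",
    temperature=0.2,
    max_output_tokens=256,
    top_k=40,
    top_p=0.95,
):
    """Build the request body for text-bison as a plain dict."""
    # Wrap the transcript in double quotes, as the curl example does.
    prompt = f'{instruction} "{transcript}"'
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "topK": top_k,
            "topP": top_p,
        },
    }

request = build_summarize_request("i'm jericka duncan in for norah.")
# json.dumps escapes quotes and line breaks, so the body is always valid JSON.
print(json.dumps(request, indent=2))
```

The resulting string could be written to a file and passed to curl with -d @file, which sidesteps shell quoting entirely.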

The complete output looks like the following:

{
  "predictions": [
    {
      "citationMetadata": {
        "citations": []
      },
      "safetyAttributes": {
        "blocked": false,
        "scores": [
          0.2,
          0.1,
          1,
          0.2,
          1,
          0.1,
          0.5,
          0.9,
          0.1,
          0.9,
          0.2,
          0.3,
          0.6
        ],
        "categories": [
          "Death, Harm & Tragedy",
          "Derogatory",
          "Firearms & Weapons",
          "Health",
          "Illicit Drugs",
          "Insult",
          "Legal",
          "Politics",
          "Profanity",
          "Public Safety",
          "Toxic",
          "Violent",
          "War & Conflict"
        ]
      },
      "content": "The Secret Service is investigating how cocaine was brought into the White House. The bag of suspicious white powder was found near a visitors entrance for those taking a tour of the west wing. The discovery on Sunday night put the White House on temporary lockdown until the substance was deemed \"not a threat.\" President Biden and the first family were at Camp David at the time. The bag was discovered on the ground floor in a common area near the entry to the west wing. It was close to a set of storage cubbies where visitors leave their cell phones during a tour. The Secret Service is leading the investigation looking at visitor logs and camera surveillance. However, not every camera is trained on every body at every minute, so there will always be a possibility that this could go unanswered. Senator Tom Cotton, a Republican from Arkansas, sent a letter to the director of the Secret Service to say that Congress needs to know if the White House complex is not secure. He asked for a plan to correct any security flaws."
    }
  ]
}

Here we can see that the response includes both the actual output of the LLM and a set of "sensitive topics" categories that help the application understand the overall sensitivity of the passage.
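
As a sketch of how an application might consume these attributes, the Python below pairs each category with its score and lists the categories above a cutoff. It assumes the two arrays are parallel (the first score belongs to the first category, and so on), and the 0.8 threshold and function names are arbitrary choices of ours, not API defaults.

```python
def safety_scores(prediction):
    """Map each safety category to its score, assuming parallel arrays."""
    attrs = prediction["safetyAttributes"]
    categories, scores = attrs["categories"], attrs["scores"]
    if len(categories) != len(scores):
        raise ValueError("categories and scores differ in length")
    return dict(zip(categories, scores))

def flagged_categories(prediction, threshold=0.8):
    """Categories whose score meets the (arbitrary) threshold, sorted by name."""
    pairs = safety_scores(prediction)
    return sorted(name for name, score in pairs.items() if score >= threshold)

# The safetyAttributes block from the response above.
prediction = {
    "safetyAttributes": {
        "blocked": False,
        "scores": [0.2, 0.1, 1, 0.2, 1, 0.1, 0.5, 0.9, 0.1, 0.9, 0.2, 0.3, 0.6],
        "categories": [
            "Death, Harm & Tragedy", "Derogatory", "Firearms & Weapons",
            "Health", "Illicit Drugs", "Insult", "Legal", "Politics",
            "Profanity", "Public Safety", "Toxic", "Violent", "War & Conflict",
        ],
    }
}
print(flagged_categories(prediction))
```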

Running twice, here are the example outputs:

Temperature is a particularly interesting parameter, as it allows us to adjust the "creativity" of the output. Let's try a few different values:

Here we see there wasn't much of a difference. What if we double our max output tokens to give more room for the model's expressiveness, set temperature to 0.0, topK to 1 (greedy) and topP to 0.0? This should offer as close as possible to a clinical deterministic output. Let's run it three times:

Indeed, all three runs yield identical results. At the same time, this causes the model to nearly plagiarize the original text, excerpting it as-is in large chunks. This is actually the expected behavior, since we are in effect asking the model to return the single most probable next token at each step.
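
To see why these settings behave deterministically, here is a toy Python sampler that applies topK, then topP, then temperature to a handful of candidate tokens. It is a generic illustration of these decoding knobs under our own assumptions, not PaLM's actual implementation, and the logits are invented numbers.

```python
import math
import random

def sample_token(logits, temperature, top_k, top_p, rng):
    """Pick one token from a {token: logit} dict using topK, topP and temperature."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    # Temperature 0 or topK 1 reduces to greedy decoding: always the top token.
    if temperature <= 0.0 or top_k <= 1:
        return ranked[0][0]
    ranked = ranked[:top_k]

    # Softmax over the surviving tokens (shifted by the max for stability).
    peak = ranked[0][1]
    weights = [math.exp(logit - peak) for _, logit in ranked]
    total = sum(weights)

    # topP: keep the most probable tokens until their mass reaches top_p.
    kept, mass = [], 0.0
    for (token, logit), weight in zip(ranked, weights):
        kept.append((token, logit))
        mass += weight / total
        if mass >= top_p:
            break

    # Temperature rescales the remaining logits before the final draw.
    final = [math.exp((logit - peak) / temperature) for _, logit in kept]
    return rng.choices([token for token, _ in kept], weights=final, k=1)[0]

logits = {"The": 2.0, "A": 1.5, "Secret": 1.0, "Cocaine": 0.5}  # invented values
rng = random.Random(0)
greedy = {sample_token(logits, 0.0, 1, 0.0, rng) for _ in range(20)}
creative = {sample_token(logits, 0.99, 40, 0.99, rng) for _ in range(200)}
print(greedy, creative)
```

With temperature 0.0, topK 1 and topP 0.0 only the single highest-scoring token can ever be chosen, which is why repeated runs match and why the output hews so closely to the source text.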

What if we attempt to maximize its creativity? We'll set temperature to 0.99, topK to 40 (its max) and topP to 0.99 and run three times:

This time the model generates different results each time, though there are still substantial similarities between them. Interestingly, every single summary generated to date, regardless of all parameter settings, has begun with "The Secret Service is investigating how cocaine was brought into the White House."

What if we use a slightly different prompt and ask it to "write an abstract for this television news transcript" using the same parameters:

And headlines ("Write a title for this television news transcript" using the same parameters):

And hashtags ("What are potential hashtags for this television news transcript" using the same parameters). Note that the second run did not output actual hashtags, while the hashtags in the third response are less descriptive than those in the first and lack its explanation:

Let's revert to our original parameters (temperature: 0.2, topK: 40, topP: 0.95) and test sentiment ("What is the sentiment of this television news transcript"):

And constrain the tone response to a categorical statement ("Classify as either negative or positive the tone of this television news transcript"):

Following this JSON sentiment example, let's try a machine-readable structured response ("For this television news transcript return a JSON object that has fields sentiment and explaination. Acceptable values for sentiment are Very Positive, Positive, Neutral, Negative or Very Negative. The explanation field contains text that explains the sentiment."):
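
Since the point of a machine-readable response is that code consumes it, here is a minimal Python sketch of a validator for this prompt. It checks that the reply parses as JSON, that sentiment is one of the five allowed values, and that an explanation string is present. The function name is ours, the sample replies in the usage lines are invented for illustration rather than actual model output, and accepting either spelling of the explanation key is a defensive choice because the prompt above spells it two ways.

```python
import json

ALLOWED_SENTIMENTS = {
    "Very Positive", "Positive", "Neutral", "Negative", "Very Negative",
}

def parse_sentiment(raw):
    """Return (sentiment, explanation), or None if the reply is unusable."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    # Compare keys case-insensitively, since the model may capitalize them.
    fields = {str(key).lower(): value for key, value in obj.items()}
    sentiment = fields.get("sentiment")
    explanation = fields.get("explanation", fields.get("explaination"))
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        return None
    if not isinstance(explanation, str):
        return None
    return sentiment, explanation

# Invented replies, used only to exercise the validator.
print(parse_sentiment('{"sentiment": "Negative", "explanation": "Drug found."}'))
print(parse_sentiment("The sentiment is negative."))
```

Returning None rather than raising lets a caller simply retry the request when the model drifts from the requested format.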

What about a machine-readable categorical example ("Return a JSON object with Category and Explanation. For Category classify the text as one of the following categories: Political, Crime, War, Fashion, Tourism. The Explanation field contains text that explains the category selection."):

Note how the more creative temperature leads to a greater variety of responses, some of which are less intuitive, such as War or Political. This shows the dangers of higher temperature settings for precision NLU tasks like categorization.

What about another categorical example, this time with three custom categories where we provide a specific definition that partially overlaps and rely on the LLM to decide the topical boundaries ("Return a JSON object with Category and Explanation. For Category classify the text as one of the following categories: Topic1 or Topic2 or Topic3. The Topic1 category is defined as relating to US domestic politics. The Topic2 category is defined as relating to the White House or the current presidential administration. The Topic3 category relates to any criminal activity in the US. The Explanation field contains text that explains the category selection."):

Here we can see that once again the default temperature setting leads to consistent and expected results. With a higher temperature setting it alternates between two equally-applicable labels.
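The two settings being compared differ only in the sampling parameters sent with each request. As a rough sketch, assuming the Vertex AI REST request body shape for text-bison (an `instances` list plus a `parameters` object), the two payloads could be built as follows. The helper name `build_request` and the `maxOutputTokens` value of 256 are assumptions, and 0.99 is borrowed from the Q&A comparison later in this post as a stand-in for "higher temperature."

```python
import json


def build_request(prompt, temperature=0.2, top_k=40, top_p=0.95, max_output_tokens=256):
    """Build a request body; defaults mirror the original parameters quoted above."""
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,  # assumed value, not stated in this section
            "topK": top_k,
            "topP": top_p,
        },
    }


prompt = "Return a JSON object with Category and Explanation. ..."  # prompt text abbreviated here
default_request = build_request(prompt)
creative_request = build_request(prompt, temperature=0.99)

# Only the temperature differs between the two request bodies.
print(json.dumps(default_request["parameters"]))
print(json.dumps(creative_request["parameters"]))
```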

How about a NER task ("What are the locations mentioned in the text?"):

Here we can see variability even at the default temperature setting, with White House and Situation Room alternately appearing or not. This is a fairly common limitation of current LLMs: despite their immense prowess at interpreting natural language and generatively summarizing it, they can struggle when constrained to simpler tasks like NER. Interestingly, the semantic modeling that underlies NER is intrinsic to an LLM's weights, yet current LLMs struggle to translate this knowledge into reliable and consistent entity extraction. Moreover, even when they successfully extract entities, they are unable to resolve them because they lack a knowledge store that updates in real time (which can't be replicated via two-tower ANN).
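To quantify this kind of run-to-run variability, one simple approach is to compare the entity sets returned by repeated runs. The sketch below is an illustration only: the function name `entity_stability` is hypothetical, and the three runs are made-up lists that mimic the pattern described above rather than actual model output.

```python
def entity_stability(runs):
    """Split entities into those present in every run and those present in only some runs.

    runs: a list of lists of entity strings, one list per model run.
    """
    # Normalize case and whitespace so "Situation room" and "Situation Room" match.
    normalized = [{entity.strip().lower() for entity in run} for run in runs]
    if not normalized:
        return set(), set()
    always = set.intersection(*normalized)
    ever = set.union(*normalized)
    return always, ever - always


# Hypothetical runs, made up for illustration.
runs = [
    ["White House", "West Wing", "Camp David"],
    ["West Wing", "Camp David", "Situation room"],
    ["White House", "West Wing", "Camp David", "Situation Room"],
]
stable, unstable = entity_stability(runs)
print("in every run:", sorted(stable))
print("in some runs:", sorted(unstable))
```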

What about structured entity extraction ("Extract all of the entities, their types and descriptions in the following text into a JSON object.")?

Given the free-form nature of the prompt and the lack of a precise definition of an "entity", the results here vary considerably, even at the default temperature setting. Strangely, at the default temperature setting, the LLM consistently failed to output in JSON format, and in one case it wrapped the output in triple backticks (a Markdown code fence).
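Downstream code that expects JSON has to tolerate this kind of wrapping. A minimal sketch, assuming the only wrapping is a single Markdown-style fence with an optional language tag (the helper name `parse_json_response` and the sample text are hypothetical):

```python
import json
import re

# Build the three-backtick marker programmatically so it can be matched literally.
TICKS = "`" * 3
FENCE = re.compile(
    r"^\s*" + TICKS + r"[A-Za-z0-9_-]*\s*\n(.*?)\n?\s*" + TICKS + r"\s*$",
    re.DOTALL,
)


def parse_json_response(text):
    """Parse JSON from a response, removing one surrounding code fence if present."""
    match = FENCE.match(text)
    if match:
        text = match.group(1)
    # json.loads raises json.JSONDecodeError if the remainder is still not JSON.
    return json.loads(text)


# Hypothetical fenced response, made up for illustration.
fenced = TICKS + "json\n" + '{"entities": []}' + "\n" + TICKS
print(parse_json_response(fenced))
print(parse_json_response('{"entities": []}'))
```

Responses that are not JSON at all still raise an error, so the caller can decide whether to retry or discard them.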

Note that it extracted Jericka Duncan's name only once and even then respelled it as "Jerrika", which suggests a bias in its training data toward specific name spellings.

Of the eight runs, it extracted the names of Jericka Duncan and Weijia Jiang just once.

What about a Q&A ("Where was the cocaine discovered?"):

Here the temperature setting did not impact the results.

Or another question ("Who brought the cocaine into the White House according to the following text?")?

Interestingly, here the default temperature actually yielded a longer response in one case than a temperature of 0.99.

What about event/relation extraction ("Return a JSON object containing all of the events described in the following text.")?

Strangely, every single time it was run the output was blocked due to a violation of Google's standards.

Changing the prompt from "events" to "relationships" fixes the issue ("Return a JSON object containing all of the relationships described in the following text."):

What if we give it a bit more instruction ("Return a JSON object containing all of the relationships described in the following text. Each relationship should have three fields: Actor 1, Relationship, Actor 2. Actor 1 is the first entity, Actor 2 is the second entity and Relationship is the relationship between them."):

What if we change the formulation a bit to get around the incorrect policy block ("Return a JSON object containing all of the events described in the following text. Each event should have three fields: Actor 1, Relationship, Actor 2. Actor 1 is the first entity, Actor 2 is the second entity and Relationship is the relationship between them.")?

Here we can see that just by the addition of a small amount of clarifying text we can guide the LLM a bit, though we can see entity extraction breaking down to simple clause extraction – a symptom we've seen in current tier 1 commercial LLMs.
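A post-processing step can check that each returned relationship carries the three requested fields and flag actors that look more like clauses than entities. This is a sketch under stated assumptions: the function names are hypothetical, the response is assumed to be either a JSON list or an object wrapping one list, the five-word threshold is an arbitrary heuristic, and the sample triples are made up rather than actual model output.

```python
import json

# Field names requested in the prompt above.
REQUIRED_FIELDS = ("Actor 1", "Relationship", "Actor 2")


def valid_triples(response_text):
    """Return only the items that have all three fields as non-empty strings."""
    data = json.loads(response_text)
    # Accept either a bare list or an object that wraps a single list.
    if isinstance(data, dict):
        lists = [value for value in data.values() if isinstance(value, list)]
        data = lists[0] if lists else []
    if not isinstance(data, list):
        return []
    return [
        item for item in data
        if isinstance(item, dict)
        and all(isinstance(item.get(f), str) and item[f].strip() for f in REQUIRED_FIELDS)
    ]


def looks_like_clause(actor, max_words=5):
    """Crude heuristic (an assumption): long actors are probably clauses, not entities."""
    return len(actor.split()) > max_words


# Hypothetical response, made up for illustration.
sample = json.dumps({"relationships": [
    {"Actor 1": "Secret Service", "Relationship": "leads", "Actor 2": "investigation"},
    {"Actor 1": "someone on a white house tour may have left a small bag",
     "Relationship": "left", "Actor 2": "cocaine"},
    {"Actor 1": "President Biden", "Relationship": "was at"},  # missing Actor 2
]})

triples = valid_triples(sample)
flagged = [t for t in triples
           if looks_like_clause(t["Actor 1"]) or looks_like_clause(t["Actor 2"])]
print(len(triples), "well-formed,", len(flagged), "flagged as clause-like")
```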

Overall, these examples give a general sense of how PaLM 2 can be applied via embeddings and generative text tasks. Each of the requests above took between 3 and 7 seconds, meaning they are not responsive enough for latency-sensitive applications, but are fast and scalable enough for background bulk processing.
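To put those latencies in bulk-processing terms, a back-of-the-envelope estimate helps. The sketch below assumes requests can be issued in parallel without hitting quotas or rate limits, which this post did not test. The batch size of 10,000 and the concurrency of 10 are arbitrary illustrative values.

```python
def bulk_hours(num_requests, seconds_per_request, concurrency=1):
    """Estimated wall-clock hours to process a batch at a fixed per-request latency."""
    return num_requests * seconds_per_request / concurrency / 3600


# Bracket the estimate with the 3 and 7 second latencies observed above.
for latency in (3, 7):
    serial = bulk_hours(10_000, latency)
    parallel = bulk_hours(10_000, latency, concurrency=10)
    print(f"{latency}s per request: {serial:.1f} h serial, {parallel:.1f} h with 10 in parallel")
```

At these latencies, 10,000 transcripts would take roughly 8 to 19 hours serially, or about 1 to 2 hours with ten requests in flight at once.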