Embedding Models: Mitigating Knowledge Cutoffs Through Replacement Terms

Embedding models represent a snapshot in time of world knowledge. Like knowledge graphs, LLMs and all other forms of machine learning models, the worldview they encode begins aging the moment they are trained: their understanding of the world remains frozen in time even as the world in which they operate continues to evolve. This "knowledge cutoff" becomes most visible when a model is confronted with new terms that originated, or acquired their current meaning, after its training data was collected. For example, Vertex's circa-2021 Gecko model accurately recognizes Covid mentions where the older USE models do not, but it in turn fails to recognize the newer term "mpox" that originated after it was trained. In practice, the context of a word often provides sufficient situating information for the model to look past the unknown word and still reasonably embed the passage through the surrounding text. Even if a model does not recognize the word "covid", it will likely be able to group it with coronavirus coverage because of the surrounding terms.

In applications where the surrounding context is insufficient for the model to correctly cluster a passage or surface it for LLM external memory applications, a method that we have found to work well is to replace the unknown term with a contextually similar synonym that the model does know. For example, when testing earlier generations of large neural models in the early months of the Covid-19 pandemic, all of the models we tested lacked the word "covid" in their internal knowledge representation. In some use cases the model was still able to produce reasonable results based on the surrounding context, but for many Q&A and extraction tasks these models produced incorrect results since they did not recognize Covid as both a respiratory illness and a coronavirus.

After extensive experimentation, we discovered that in all of the models we tested, the correct results could be obtained, other than for questions about treatment regimens, simply by replacing "Covid" with "pneumonia" or "coronavirus" depending on the model. In other words, simply by preprocessing each input text with a global search-and-replace of all mentions of "Covid" with "pneumonia" (or "coronavirus", depending on the model), we were able to instantly obtain the correct results from all of the models. Queries regarding treatment regimens were still incorrect, but for all other tasks the models now functioned as if Covid had been part of their training data.
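
The preprocessing step described above amounts to a global search-and-replace applied before the text reaches the model. A minimal sketch in Python follows; the `REPLACEMENTS` table, the regex pattern and the function name are assumptions made for illustration, and the right substitute would vary by model:

```python
import re

# Hypothetical table mapping an unknown term (as a regex) to a stand-in
# the model already knows. "pneumonia" is the substitute named in the text;
# the pattern covering "covid" and "covid-19" is an assumption.
REPLACEMENTS = {
    r"covid(?:-19)?": "pneumonia",
}

def replace_unknown_terms(text, replacements=REPLACEMENTS):
    """Swap each unknown term for its stand-in before embedding or prompting."""
    for pattern, substitute in replacements.items():
        # \b keeps matches on word boundaries; IGNORECASE covers covid/Covid/COVID.
        text = re.sub(rf"\b{pattern}\b", substitute, text, flags=re.IGNORECASE)
    return text

print(replace_unknown_terms("COVID-19 cases continue to soar as Covid spreads."))
```

The rewritten text, rather than the original, is then what gets passed to the embedding model or LLM.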

Ultimately, we discovered that we could automatically generate replacement terms for the majority of our knowledge cutoff situations through a very simple workflow:

  1. Perform a basic keyword search to identify all sentences containing the term of interest.
  2. Compute the embeddings of all of these sentences using any major embedding model with a knowledge cutoff similar to that of the LLM, large neural model or knowledge graph of interest, then cluster the results to generate a smaller subset of example sentences that covers the full range of contexts in which the term appeared. In other words, "Covid cases continue to rise" might be an extremely common sentence that would dominate our dataset, but by clustering the sentences we would select only one occurrence of this sentence form. For extremely large datasets where this was intractable, we would simply randomly sample the sentences and either skip the embedding step or perform it only on that sample.
  3. The subset of sentences from step #2 was then run through a thematic dictionary-based replacement model that iterated through every word in the dictionary that fell under the same topic and generated a set of replacement sentences in which the unknown word was replaced with the dictionary word. For example, for the sentence "Covid cases continue to soar" we used a dictionary containing the names of all major diseases and generated replacement sentences "Asthma cases continue to soar", "Bronchitis cases continue to soar", "Cancer cases continue to soar", "Coronavirus cases continue to soar", "Ebola cases continue to soar", "Pneumonia cases continue to soar", etc. This might generate a few thousand to a few tens of thousands of candidate sentences.
  4. Each of the candidate sentences was then converted to an embedding and the complete list scored by similarity to the original sentence, keeping the top X terms deemed most similar.
  5. The final lists of terms from step #4 were then compiled into a histogram of the terms that were the most similar across the most sentences. This list was then manually reviewed and the most semantically similar term selected. For example, at least one of the embedding models found that "Ebola" actually shared the most similar context to "Covid" sentences in the early days of the pandemic, suggesting that the model's training data likely contained a significant amount of material from the 2014 Ebola outbreak, which was similarly described as an extremely dangerous, highly infectious disease with globe-spanning, societal-scale impact.
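
The condensing in step #2 can be approximated in several ways. As one rough sketch, the code below swaps full clustering for a simpler greedy pass that keeps a sentence only if it is not too similar to one already kept. Everything named here is an assumption for illustration: `toy_embed` is a placeholder that only compares spelling (character trigrams) so that the example runs without a model, and the `threshold` of 0.9 is arbitrary. In practice `embed` would call a real embedding model.

```python
import math
from collections import Counter

def toy_embed(text):
    """Placeholder embedding: character-trigram counts (spelling only, no semantics)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_representatives(sentences, embed=toy_embed, threshold=0.9):
    """Greedy pass: keep a sentence only if nothing already kept is too similar."""
    kept = []  # (sentence, vector) pairs
    for sentence in sentences:
        vector = embed(sentence)
        if all(cosine(vector, other) < threshold for _, other in kept):
            kept.append((sentence, vector))
    return [sentence for sentence, _ in kept]

corpus = [
    "Covid cases continue to rise",
    "Covid cases continue to rise.",
    "Hospitals are preparing for a new Covid wave",
]
print(select_representatives(corpus))
```

A greedy pass like this depends on input order and on the threshold chosen, so a proper clustering algorithm may give a more balanced sample on large collections.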

In other words, we take all of the sentences containing the unknown word and condense them down to a set of representative sentences that capture the overall range of contexts in which the term is used. Then we use a dictionary of terms of a similar topic (disease, economics, war, etc.) and make a copy of each sentence with the unknown word replaced with a dictionary word, repeating this for all of the terms in the dictionary. Then we compute the embedding of the original sentence and of all of the replacement sentences, score the replacement sentences by similarity to the original and take the top few matches. We repeat that for all of the representative sentences and then make a list of the top handful of words that were scored as having the most similar context across all of the representative sentences. This final list of words is then manually reviewed and a replacement term selected from it.
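
The substitute, score and tally loop summarized above might be sketched as follows. This is an illustration under stated assumptions rather than the workflow's actual code: `rank_replacements`, `toy_embed` and the tiny `diseases` list are hypothetical, and `toy_embed` only compares spelling, so the ranking it prints carries no semantic meaning. With a real embedding model passed in as `embed`, the same loop would produce the histogram that is then reviewed by hand.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Placeholder embedding: character-trigram counts (spelling only, no semantics)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_replacements(sentences, unknown, dictionary, embed=toy_embed, top_k=3):
    """Tally how often each dictionary term is among the top_k closest substitutions."""
    pattern = re.compile(re.escape(unknown), flags=re.IGNORECASE)
    tally = Counter()
    for sentence in sentences:
        original = embed(sentence)
        scored = []
        for term in dictionary:
            # Copy of the sentence with the unknown word swapped for the dictionary word.
            candidate = pattern.sub(lambda _match: term, sentence)
            scored.append((cosine(original, embed(candidate)), term))
        scored.sort(reverse=True)  # most similar first
        tally.update(term for _, term in scored[:top_k])
    return tally.most_common()  # histogram for manual review

diseases = ["asthma", "bronchitis", "cancer", "coronavirus", "ebola", "pneumonia"]
representatives = ["Covid cases continue to soar", "Hospitals brace for a new Covid wave"]
for term, count in rank_replacements(representatives, "covid", diseases):
    print(f"{count}  {term}")
```

Counting appearances in the top few per sentence, rather than averaging raw scores, keeps a single unusual sentence from dominating the result.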

Let's see how this works in miniature by selecting a couple of disease-related terms like "mpox", "coronavirus", "asthma", etc. In real applications this list would contain thousands or tens of thousands of sentences, but here we'll just use a handful of examples:

sentences = [
    "covid cases continue to soar",
    "COVID cases continue to soar",
    "COVID-19 cases continue to soar",
    "mpox cases continue to soar",
    "coronavirus cases continue to soar ",
    "pneumonia cases continue to soar",
    "respiratory cases continue to soar",
    "respiratory ailments continue to soar",
    "asthma cases continue to soar",
    "bronchitis cases continue to soar",
    "asthma cases continue to soar",
]

As before, we'll use our embedding visualization template to cluster the generated passages using six models: the English-only USEv4, the larger English-only USEv5-Large, USEv3-Multilingual and the larger USEv3-Multilingual-Large (both supporting 16 languages: Arabic, Chinese-simplified, Chinese-traditional, English, French, German, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Spanish, Thai, Turkish, Russian), the 100-language LaBSEv2 model optimized for translation-pair scoring and the Vertex AI Embeddings for Text API.
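
The template itself is not reproduced here. As a rough sketch of how rankings in the format shown in the sections that follow could be produced, the code below embeds every sentence, compares each one against all the others by cosine similarity and prints the neighbors in descending order. `rank_neighbors` and `toy_embed` are hypothetical names, and `toy_embed` is a spelling-only placeholder, so its scores will not match any of the model outputs in this post.

```python
import math
from collections import Counter

def toy_embed(text):
    """Placeholder embedding: character-trigram counts (spelling only, no semantics)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_neighbors(sentences, embed=toy_embed):
    """For each sentence, list every sentence (itself included) by descending similarity."""
    vectors = [embed(s) for s in sentences]
    rankings = []
    for query in vectors:
        scores = [(cosine(query, other), idx) for idx, other in enumerate(vectors)]
        # Highest score first; ties fall back to the lower ID.
        rankings.append(sorted(scores, key=lambda pair: (-pair[0], pair[1])))
    return rankings

sentences = [
    "covid cases continue to soar",
    "mpox cases continue to soar",
    "coronavirus cases continue to soar",
]
for i, ranked in enumerate(rank_neighbors(sentences)):
    print("Sentence ", i, ":")
    for score, idx in ranked:
        print(f"   ({score:.3f}) (ID {idx}) (Len: {len(sentences[idx])}): {sentences[idx]}")
    print()
```

Each block lists the query sentence first with a score of 1.000, followed by the other sentences from most to least similar.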

Vertex AI

Given that Vertex was trained after Covid emerged, we can see that it strongly associates "coronavirus" as a near-synonym for "Covid". Interestingly, the unknown term "mpox" (which is not present in Vertex's knowledgestore) is clustered more strongly with it than any of the respiratory terms. Notably, bronchitis is more closely associated than pneumonia, asthma or the generic phrase "respiratory ailments", suggesting it would be the best replacement among this set of terms after "coronavirus" itself.

Sentence  0 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (0.983) (ID 1) (Len: 28): COVID cases continue to soar
   (0.968) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.955) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.857) (ID 3) (Len: 27): mpox cases continue to soar
   (0.841) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.827) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.813) (ID 8) (Len: 29): asthma cases continue to soar
   (0.813) (ID 10) (Len: 29): asthma cases continue to soar
   (0.809) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.760) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  1 :
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.983) (ID 0) (Len: 28): covid cases continue to soar
   (0.980) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.956) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.863) (ID 3) (Len: 27): mpox cases continue to soar
   (0.844) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.823) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.811) (ID 8) (Len: 29): asthma cases continue to soar
   (0.811) (ID 10) (Len: 29): asthma cases continue to soar
   (0.807) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.758) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  2 :
   (1.000) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.980) (ID 1) (Len: 28): COVID cases continue to soar
   (0.968) (ID 0) (Len: 28): covid cases continue to soar
   (0.955) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.860) (ID 3) (Len: 27): mpox cases continue to soar
   (0.842) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.825) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.809) (ID 8) (Len: 29): asthma cases continue to soar
   (0.809) (ID 10) (Len: 29): asthma cases continue to soar
   (0.809) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.758) (ID 7) (Len: 37): respiratory ailments continue to soar


Universal Sentence Encoder

Despite USE's knowledgestore having a pre-Covid cutoff, "coronavirus" is scored as having one of the most similar contexts (behind only "mpox", which USE does not know either), so it would make the best replacement term. The most likely reason for this is that our sample sentence uses the technical term "cases", which historically was more often paired with more technical medical terms like "coronavirus" than with more general terms like "pneumonia" that might be more likely to be paired with "infections" rather than "cases". This is where looking over a large set of reference sentences can help to normalize for the idiosyncrasies of a given term.

Sentence  0 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.926) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.916) (ID 3) (Len: 27): mpox cases continue to soar
   (0.894) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.752) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.686) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.682) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.643) (ID 8) (Len: 29): asthma cases continue to soar
   (0.643) (ID 10) (Len: 29): asthma cases continue to soar
   (0.592) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  1 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.926) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.916) (ID 3) (Len: 27): mpox cases continue to soar
   (0.894) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.752) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.686) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.682) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.643) (ID 8) (Len: 29): asthma cases continue to soar
   (0.643) (ID 10) (Len: 29): asthma cases continue to soar
   (0.592) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  2 :
   (1.000) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.926) (ID 0) (Len: 28): covid cases continue to soar
   (0.926) (ID 1) (Len: 28): COVID cases continue to soar
   (0.918) (ID 3) (Len: 27): mpox cases continue to soar
   (0.875) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.732) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.692) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.655) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.614) (ID 8) (Len: 29): asthma cases continue to soar
   (0.614) (ID 10) (Len: 29): asthma cases continue to soar
   (0.571) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  3 :
   (1.000) (ID 3) (Len: 27): mpox cases continue to soar
   (0.918) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.916) (ID 0) (Len: 28): covid cases continue to soar
   (0.916) (ID 1) (Len: 28): COVID cases continue to soar
   (0.862) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.703) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.675) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.662) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.618) (ID 8) (Len: 29): asthma cases continue to soar
   (0.618) (ID 10) (Len: 29): asthma cases continue to soar
   (0.551) (ID 7) (Len: 37): respiratory ailments continue to soar

Universal Sentence Encoder Large

Here the results are fairly similar:

Sentence  0 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.848) (ID 3) (Len: 27): mpox cases continue to soar
   (0.826) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.807) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.712) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.695) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.628) (ID 8) (Len: 29): asthma cases continue to soar
   (0.628) (ID 10) (Len: 29): asthma cases continue to soar
   (0.605) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.535) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  1 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.848) (ID 3) (Len: 27): mpox cases continue to soar
   (0.826) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.807) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.712) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.695) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.628) (ID 8) (Len: 29): asthma cases continue to soar
   (0.628) (ID 10) (Len: 29): asthma cases continue to soar
   (0.605) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.535) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  2 :
   (1.000) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.819) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.815) (ID 3) (Len: 27): mpox cases continue to soar
   (0.807) (ID 0) (Len: 28): covid cases continue to soar
   (0.807) (ID 1) (Len: 28): COVID cases continue to soar
   (0.728) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.717) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.650) (ID 10) (Len: 29): asthma cases continue to soar
   (0.650) (ID 8) (Len: 29): asthma cases continue to soar
   (0.649) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.551) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  3 :
   (1.000) (ID 3) (Len: 27): mpox cases continue to soar
   (0.848) (ID 0) (Len: 28): covid cases continue to soar
   (0.848) (ID 1) (Len: 28): COVID cases continue to soar
   (0.837) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.815) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.732) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.724) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.648) (ID 10) (Len: 29): asthma cases continue to soar
   (0.648) (ID 8) (Len: 29): asthma cases continue to soar
   (0.616) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.582) (ID 7) (Len: 37): respiratory ailments continue to soar

Universal Sentence Encoder Multilingual

USE Multilingual is also fairly similar:

Sentence  0 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (0.735) (ID 1) (Len: 28): COVID cases continue to soar
   (0.632) (ID 3) (Len: 27): mpox cases continue to soar
   (0.565) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.534) (ID 8) (Len: 29): asthma cases continue to soar
   (0.534) (ID 10) (Len: 29): asthma cases continue to soar
   (0.509) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.501) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.495) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.479) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.349) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  1 :
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.735) (ID 0) (Len: 28): covid cases continue to soar
   (0.690) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.690) (ID 3) (Len: 27): mpox cases continue to soar
   (0.598) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.543) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.541) (ID 10) (Len: 29): asthma cases continue to soar
   (0.541) (ID 8) (Len: 29): asthma cases continue to soar
   (0.500) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.492) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.434) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  2 :
   (1.000) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.690) (ID 1) (Len: 28): COVID cases continue to soar
   (0.514) (ID 3) (Len: 27): mpox cases continue to soar
   (0.495) (ID 0) (Len: 28): covid cases continue to soar
   (0.451) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.378) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.361) (ID 8) (Len: 29): asthma cases continue to soar
   (0.361) (ID 10) (Len: 29): asthma cases continue to soar
   (0.358) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.342) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.298) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  3 :
   (1.000) (ID 3) (Len: 27): mpox cases continue to soar
   (0.690) (ID 1) (Len: 28): COVID cases continue to soar
   (0.687) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.669) (ID 8) (Len: 29): asthma cases continue to soar
   (0.669) (ID 10) (Len: 29): asthma cases continue to soar
   (0.632) (ID 0) (Len: 28): covid cases continue to soar
   (0.628) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.624) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.596) (ID 7) (Len: 37): respiratory ailments continue to soar
   (0.596) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.514) (ID 2) (Len: 31): COVID-19 cases continue to soar

Universal Sentence Encoder Multilingual Large

The same is true of its Large version, though here bronchitis is ranked higher as a secondary option:

Sentence  0 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (0.877) (ID 3) (Len: 27): mpox cases continue to soar
   (0.780) (ID 1) (Len: 28): COVID cases continue to soar
   (0.752) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.750) (ID 10) (Len: 29): asthma cases continue to soar
   (0.750) (ID 8) (Len: 29): asthma cases continue to soar
   (0.742) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.737) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.734) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.661) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.550) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  1 :
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.830) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.780) (ID 0) (Len: 28): covid cases continue to soar
   (0.726) (ID 3) (Len: 27): mpox cases continue to soar
   (0.676) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.586) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.580) (ID 8) (Len: 29): asthma cases continue to soar
   (0.580) (ID 10) (Len: 29): asthma cases continue to soar
   (0.575) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.563) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.400) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  2 :
   (1.000) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.830) (ID 1) (Len: 28): COVID cases continue to soar
   (0.661) (ID 0) (Len: 28): covid cases continue to soar
   (0.621) (ID 3) (Len: 27): mpox cases continue to soar
   (0.566) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.502) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.492) (ID 8) (Len: 29): asthma cases continue to soar
   (0.492) (ID 10) (Len: 29): asthma cases continue to soar
   (0.486) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.479) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.330) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  3 :
   (1.000) (ID 3) (Len: 27): mpox cases continue to soar
   (0.877) (ID 0) (Len: 28): covid cases continue to soar
   (0.749) (ID 10) (Len: 29): asthma cases continue to soar
   (0.749) (ID 8) (Len: 29): asthma cases continue to soar
   (0.735) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.729) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.726) (ID 1) (Len: 28): COVID cases continue to soar
   (0.722) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.717) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.621) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.561) (ID 7) (Len: 37): respiratory ailments continue to soar

LaBSE

For LaBSE, pneumonia scores highest among the remaining terms, behind only coronavirus:

Sentence  0 :
   (1.000) (ID 0) (Len: 28): covid cases continue to soar
   (0.906) (ID 1) (Len: 28): COVID cases continue to soar
   (0.834) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.821) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.772) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.738) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.730) (ID 3) (Len: 27): mpox cases continue to soar
   (0.726) (ID 8) (Len: 29): asthma cases continue to soar
   (0.726) (ID 10) (Len: 29): asthma cases continue to soar
   (0.724) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.629) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  1 :
   (1.000) (ID 1) (Len: 28): COVID cases continue to soar
   (0.909) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.906) (ID 0) (Len: 28): covid cases continue to soar
   (0.842) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.726) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.698) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.671) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.659) (ID 3) (Len: 27): mpox cases continue to soar
   (0.657) (ID 8) (Len: 29): asthma cases continue to soar
   (0.657) (ID 10) (Len: 29): asthma cases continue to soar
   (0.590) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  2 :
   (1.000) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.909) (ID 1) (Len: 28): COVID cases continue to soar
   (0.821) (ID 0) (Len: 28): covid cases continue to soar
   (0.792) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.719) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.652) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.634) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.607) (ID 3) (Len: 27): mpox cases continue to soar
   (0.606) (ID 8) (Len: 29): asthma cases continue to soar
   (0.606) (ID 10) (Len: 29): asthma cases continue to soar
   (0.566) (ID 7) (Len: 37): respiratory ailments continue to soar


Sentence  3 :
   (1.000) (ID 3) (Len: 27): mpox cases continue to soar
   (0.749) (ID 5) (Len: 32): pneumonia cases continue to soar
   (0.735) (ID 8) (Len: 29): asthma cases continue to soar
   (0.735) (ID 10) (Len: 29): asthma cases continue to soar
   (0.730) (ID 0) (Len: 28): covid cases continue to soar
   (0.713) (ID 6) (Len: 34): respiratory cases continue to soar
   (0.701) (ID 9) (Len: 33): bronchitis cases continue to soar
   (0.659) (ID 1) (Len: 28): COVID cases continue to soar
   (0.632) (ID 4) (Len: 35): coronavirus cases continue to soar 
   (0.607) (ID 2) (Len: 31): COVID-19 cases continue to soar
   (0.593) (ID 7) (Len: 37): respiratory ailments continue to soar