In collaboration with the Internet Archive's TV News Archive and the Media-Data Research Consortium, we are tremendously excited today to unveil a powerful new Visual Explorer Lens metaphor: deep linking television by automatically identifying mentions of legislation in CSPAN transcripts and translating them into inline hyperlinks that link directly to the full record of the underlying legislation on the US Congress' website. As of this morning, when viewing any broadcast from CSPAN, CSPAN2 and CSPAN3, the Visual Explorer automatically scans the transcript and converts any mention of House or Senate legislation into an inline hyperlink that offers one-click access to the complete record of that legislation, converting a mention of "s. res. 198" to a link to "A resolution designating the week of April 23 through April 29, 2023, as 'National Water Week'", making it instantly clear what is being debated, negotiated and discussed.
Historically, there has been a disconnect between the finished legislation available on Congress.gov and the democracy-in-action legislative process that produced it, as captured by CSPAN: not merely the spoken words of the back-and-forth debate and their tenor, but also the unspoken body language and symbolic gestures of speakers and the myriad behind-the-scenes discussions across the floor that CSPAN's cameras capture. In short, to truly understand democracy, we must understand the democratic process that creates the laws that define and govern our nation. Today we connect those two worlds: legislation and the legislative process, the text of Congress.gov with the video of CSPAN.
For example, take this exchange from the Senate floor invoking three resolutions: "… bloc consideration of the following resolutions, s. 181, s. 198, s. 199, s. res. 181, s. res. 198 and 199. the presiding officer…" What precisely are the three resolutions being considered? The surrounding discussion provides no clues: the motion to adopt the resolutions occurs in isolation, sandwiched between other business. Until now, an interested viewer had no choice but to open a new browser window, copy-paste each reference into a Google search and hope that the first link was the correct one. For the current Congress, that often works, but for past Congresses it tends to return the wrong result, because bill numbers are recycled with each Congress and search engines prioritize results from the current Congress over previous ones. Instead, the new CSPAN legislative deep linking lens displays this transcript as: "…bloc consideration of the following resolutions, s. 181, s. 198, s. 199, s. res. 181, s. res. 198 and 199. the presiding officer…", with each reference now a link that opens a new browser tab directly to the full record of that legislation on the US Congressional website. In this case, the lens also helps novice congressional watchers understand the difference between "s.181" and "s.res.181" simply by clicking the two different links, which clarifies why Sen. Schumer corrected himself.
Currently, only select types of legislation, such as resolutions and bills, are automatically linked, and the lens may miss some mentions (such as "s.res.199" above) as we continue to refine its capabilities.
You can try it out live today:
This inaugural CSPAN Legislative Linker runs entirely in your browser, scanning CSPAN transcripts as you play a given clip and translating select legislation mentions into hyperlinks to the official legislation record on Congress.gov. Performing the deep linking process entirely in the browser dynamically allows us to rapidly iterate the underlying algorithm to accommodate a growing selection of legislative actions and to improve its robustness to transcription error, disfluencies and grammatical structures. Eventually our goal is to provide a JSON metadata file for each CSPAN broadcast that contains a list of all legislative mentions, their normalized forms (see below), their precise timestamps of mention and the URL of the original legislation records. Such a database would make it possible to perform normalized search and, most powerfully, perform bidirectional deep linking in which the final legislative record could be annotated to provide links back to the complete debate history of every discussion of that bill recorded by CSPAN's cameras. Eventually one could even envision deep annotation of the entire legislative record, with every line of every piece of legislation connected back to all of the moments it was discussed and how it came to be in that final form.
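To make the planned metadata file more concrete, here is a minimal sketch of what one record might look like. The schema is entirely hypothetical: the field names, the `LegislationMention` class, the `broadcast_metadata` and `index_by_legislation` helpers and the sample values are illustrative assumptions, not a published format.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class LegislationMention:
    """One legislative mention in a broadcast (hypothetical schema)."""
    spoken_text: str       # the mention as it appears in the transcript
    normalized: str        # normalized form, e.g. "s.res.198"
    congress: int          # Congress in session on the broadcast date
    offset_seconds: float  # position of the mention within the broadcast
    url: str               # link to the legislation record


def broadcast_metadata(broadcast_id: str, mentions: list) -> str:
    """Serialize all mentions found in one broadcast to a JSON document."""
    return json.dumps(
        {"broadcast": broadcast_id, "mentions": [asdict(m) for m in mentions]},
        indent=2,
    )


def index_by_legislation(documents: list) -> dict:
    """Invert parsed per-broadcast documents into a lookup keyed by
    (congress, normalized form), the shape a bidirectional link would need."""
    index = defaultdict(list)
    for doc in documents:
        for m in doc["mentions"]:
            key = (m["congress"], m["normalized"])
            index[key].append((doc["broadcast"], m["offset_seconds"]))
    return dict(index)


# Placeholder values only: the broadcast ID, offset and URL are made up.
example = LegislationMention(
    "s. res. 198", "s.res.198", 118, 12.5, "https://www.congress.gov/"
)
print(broadcast_metadata("EXAMPLE_BROADCAST", [example]))
```

Keying the inverted index on the Congress number as well as the normalized form matters because the same number refers to different legislation in different Congresses.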
We explored many different approaches in the creation of this tool, from Large Language Models (LLMs) to the regular expression engine we ultimately adopted. While LLMs have the theoretical benefit of being able to look past transcription error and complex speech structures, in their current iterations their hallucination rates are simply too high when asked to perform even the simple task of recognizing legislation mentions. In contrast, the structured nature of both House and Senate lawmaking, with well-defined and relatively rigid formalized mechanisms for referencing legislation, makes it an ideal candidate for the use of the fixed grammars of regular expressions.
One immediate challenge is that the US Congress does not use monotonically increasing unique identifying numbers to refer to legislation: numbers simply reset with each Congress. Thus, a debate in today's Congress (the 118th) over Senate Resolution 198 refers to National Water Week, while a CSPAN broadcast from two years ago (the 117th Congress) capturing a debate over Senate Resolution 198 would refer to "A resolution recognizing the roles and contributions of the teachers of the United States in building and enhancing the civic, cultural, and economic well-being of the United States" and a 2019 CSPAN broadcast recording the Senate Resolution 198 debate would refer to "A resolution condemning Brunei's dramatic human rights backsliding." To address this, the Visual Explorer Legislative Linker computes which Congress was in session based on the date of the CSPAN broadcast. This can yield an error when a historical legislative clip is rebroadcast, such as when a clip of the 2019 debate airs on CSPAN today, since the tool won't know that the clip is from the 116th Congress, but such cases are rare. A more common case is when a lawmaker refers in the present to a bill they sponsored or endorsed in a past Congress without additional context, such as this 2023 reference to a 2021 bill. Overall, this means that a mention in any CSPAN broadcast from the Internet Archive's decade-long archive will, in the majority of cases, be connected back to the correct bill from the correct Congress, though occasional errors remain.
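The date-to-Congress step can be sketched in a few lines. This is an illustration rather than the tool's actual code: the function name `congress_for_date` is hypothetical, and it assumes the modern calendar in which each two-year Congress begins on January 3 of an odd-numbered year, which holds for the entire period covered by the archive.

```python
from datetime import date

FIRST_CONGRESS_YEAR = 1789  # the 1st Congress began in 1789


def congress_for_date(broadcast_date: date) -> int:
    """Return the number of the Congress in session on broadcast_date.

    Assumes each Congress starts on January 3 of an odd-numbered year,
    so this is only valid for dates from 1935 onward.
    """
    year = broadcast_date.year
    # January 1 and 2 still belong to the term that was already under way.
    if (broadcast_date.month, broadcast_date.day) < (1, 3):
        year -= 1
    return (year - FIRST_CONGRESS_YEAR) // 2 + 1


print(congress_for_date(date(2023, 5, 16)))  # 118
print(congress_for_date(date(2019, 5, 16)))  # 116
```

Because the only input is the broadcast date, a rebroadcast clip or a lawmaker's reference to a bill from an earlier Congress resolves to the Congress in session on the air date, which is exactly the failure mode described above.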
A second challenge lies in the limitations of the official Congress.gov search engine. For example, only numeric bill numbers are accepted: a search for "h.r.1" will return the "Lower Energy Costs Act", while a search for "h.r. one" will yield no matches. This means that when translating each mention into a link to the original legislative text, the engine must translate spelled-out bill numbers into their numeric equivalents. A greater challenge, however, is that the search engine only matches the acronym form of legislation mentions, rather than the spelled-out versions that are common in Congressional discourse. For example, a search for "s.res. 198" will correctly return National Water Week, but a search for "Senate Resolution 198" will return zero results. This necessitates a set of rewriter rules that translate spelled-out mentions into the forms needed for the Congress.gov search engine.
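The two rewrites described above, number words to digits and spelled-out types to acronyms, might look roughly like the following. This is a simplified sketch under stated assumptions: the lookup tables, the function names `words_to_number` and `to_search_form`, and the unspaced output format are all illustrative choices, not the tool's actual rules.

```python
import re

# Hypothetical lookup tables covering numbers below one thousand.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

# Spelled-out legislation types and their acronym forms (a partial list).
SPELLED_TYPES = {
    "senate resolution": "s.res.",
    "house resolution": "h.res.",
    "senate bill": "s.",
    "house bill": "h.r.",
}


def words_to_number(text: str) -> int:
    """Convert digits or number words such as 'one hundred ninety eight'.

    Does not handle digit-by-digit readings such as 'one ninety eight'.
    """
    text = text.strip().lower()
    if text.isdigit():
        return int(text)
    total = 0
    for word in re.split(r"[\s-]+", text):
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        elif word == "hundred":
            total *= 100
        elif word != "and":
            raise ValueError(f"unrecognized number word: {word!r}")
    return total


def to_search_form(mention: str) -> str:
    """Rewrite one mention into acronym-plus-digits form."""
    mention = mention.strip().lower()
    for spelled, acronym in SPELLED_TYPES.items():
        if mention.startswith(spelled + " "):
            return f"{acronym}{words_to_number(mention[len(spelled):])}"
    # Already an acronym such as "h.r. one": only the number needs rewriting.
    match = re.fullmatch(r"((?:[a-z]+\.)+)\s*(.+)", mention)
    if match is None:
        raise ValueError(f"unrecognized mention: {mention!r}")
    return f"{match.group(1)}{words_to_number(match.group(2))}"


print(to_search_form("Senate Resolution 198"))  # s.res.198
print(to_search_form("h.r. one"))               # h.r.1
```

A production rule set would need many more number forms and legislation types, but the shape stays the same: normalize the type, normalize the number, then join them.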
A third challenge is the range of legislative types and ways in which they are mentioned in practice. Terms from "hr" to "h.r." to "house bill" to "house resolution" to "house res." to "h.j.res" to "house joint resolution" to "house joint res." to "h.con.res." to "house concurrent resolution" to "h.amdt." to "house amendment" and their variants and typographical errors make for a diverse array of mentions that have to be recognized, normalized and translated by the engine.
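As a rough illustration of how a fixed grammar can absorb this variety, the sketch below builds one regular expression that recognizes several of the spellings listed above and maps each to a canonical acronym. The pattern, the `KINDS` table and the `find_mentions` function are assumptions made for illustration; the lens's actual expressions are not reproduced here, and typographical errors are not handled.

```python
import re

# (group suffix, canonical suffix, abbreviated pattern, spelled-out pattern)
KINDS = [
    ("jres", "j.res.", r"j\.?\s*res", r"joint\s+res(?:olution)?"),
    ("conres", "con.res.", r"con\.?\s*res", r"concurrent\s+res(?:olution)?"),
    ("res", "res.", r"res", r"res(?:olution)?"),
    ("amdt", "amdt.", r"amdt", r"amendment"),
]

CANONICAL = {"hr": "h.r.", "s": "s."}
ALTERNATIVES = [
    r"(?P<hr>h\.?\s*r|house\s+bill)",
    # A bare "s" must keep its dot so that stray letters are not matched.
    r"(?P<s>s\.|senate\s+bill)",
]
for letter, word in (("h", "house"), ("s", "senate")):
    for suffix, canonical, abbreviated, spelled in KINDS:
        name = letter + suffix
        CANONICAL[name] = f"{letter}.{canonical}"
        ALTERNATIVES.append(
            rf"(?P<{name}>{letter}\.?\s*{abbreviated}|{word}\s+{spelled})"
        )

# The lookbehind stops matches that begin inside a word or an abbreviation.
MENTION = re.compile(
    r"(?<![\w'.])(?:" + "|".join(ALTERNATIVES) + r")\.?\s*(?P<number>\d+)\b",
    re.IGNORECASE,
)


def find_mentions(transcript: str) -> list:
    """Return (canonical type, number) pairs in order of appearance."""
    found = []
    for match in MENTION.finditer(transcript):
        kind = next(
            name for name, value in match.groupdict().items()
            if name != "number" and value is not None
        )
        found.append((CANONICAL[kind], int(match.group("number"))))
    return found


text = ("bloc consideration of the following resolutions, s. 181, s. 198, "
        "s. 199, s. res. 181, s. res. 198 and 199. the presiding officer")
print(find_mentions(text))
# [('s.', 181), ('s.', 198), ('s.', 199), ('s.res.', 181), ('s.res.', 198)]
```

Note that the trailing "and 199" is not matched, because it carries no type prefix of its own. Resolving that kind of elided reference requires carrying context forward from the previous mention, which a single pattern like this one does not do.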
Here are just a few example clips to get you started, showing different samples of legislation mentions and how this new tool connects them to their underlying legislative text:
- May 19, 2023.
- May 16, 2023. (Example 2, Example 3, Example 4, Example 5, Example 6)
- May 4, 2023.
- March 30, 2023.
This is just a first step in our journey towards creating new multimodal metaphors linking the online and offline worlds, television, text, imagery and audio through Visual Explorer Lenses. Future iterations will include additional enrichments and annotations, translating a broader array of entities into links to their underlying records and chronologies and providing metadata lookups to enable bidirectional connectivity: allowing for the first time deep-linking from the textual digital world into the video broadcast world.
In the end, the future of democracy rests on an informed electorate. Through these new forms of annotation and interface metaphors, drawing together legislation and the legislative process, we are taking powerful new steps towards a world in which citizens can play a far more active role in the governing process, continuing the long tradition of leveraging the digital world for government transparency from the earliest White House document repository under the Clinton administration. As we unsilo and interconnect these vast repositories of society, bridging modalities, languages, geographies, domains and the planet itself, we hope to unlock fundamentally new insights and understandings of the world itself.