As we expand our new Advertising Inventory Files (AIF) Captioning Time dataset, we're particularly interested in understanding the landscape of closed captioning of advertisements on television news. Are most captioned advertisements for consumer products, while advertising related to public health, policy issues and other critical societal topics goes uncaptioned in ways that would disenfranchise members of society who rely upon captioning to understand the news? Is the language of advertising fundamentally different from the language of news, either in word choice or in emotional undercurrents? Do advertisements offer counternarratives to the news programming in which they are embedded, perhaps negating the editorial decisions of news channels? Have advertisements changed in the Covid-19 era, and either way, are they presenting scenes and promoting behaviors at odds with public health messaging?
To help explore these questions and broaden understanding of how to work with Captioning Mode data, we will shortly be releasing modified versions of the raw TTXT files generated by ccextractor for each broadcast. Each modified file keeps the raw TTXT output format, but the caption text is removed from all news-related lines and retained only for the POP and PAI caption modes, which correspond to advertisements. In this way, these TTXT files are exactly as output by ccextractor, except that only the advertising captions are retained while the news captions are blanked out.
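To make the format concrete, here is a minimal Python sketch that parses TTXT records of the form start|end|mode|text and blanks the text of every record whose mode is not POP or PAI, mirroring the filtering described above. It assumes each caption record sits on its own line with four pipe-separated fields. The names `parse_ttxt_line`, `blank_news` and `AD_MODES` are hypothetical, and this is an illustration, not the pipeline actually used to produce the released files.

```python
from typing import NamedTuple, Optional

# Caption modes treated as advertising, per the description above.
AD_MODES = {"POP", "PAI"}


class Caption(NamedTuple):
    start: str  # e.g. "00:04:24,463"
    end: str    # e.g. "00:04:25,798"
    mode: str   # e.g. "POP", "PAI", "RU3"
    text: str   # caption text (may be empty)


def parse_ttxt_line(line: str) -> Optional[Caption]:
    """Split one 'start|end|mode|text' record; return None if it is malformed.

    maxsplit=3 keeps any '|' characters inside the caption text intact.
    """
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        return None
    start, end, mode, text = parts
    return Caption(start, end, mode, text)


def blank_news(line: str) -> str:
    """Keep advertising records as-is and drop the text of all other modes."""
    line = line.rstrip("\n")
    cap = parse_ttxt_line(line)
    if cap is None or cap.mode in AD_MODES:
        return line  # malformed lines and ad lines pass through unchanged
    return f"{cap.start}|{cap.end}|{cap.mode}|"


# The first record uses placeholder news text; the second is an ad line.
for raw in ["00:04:15,638|00:04:16,105|RU3| placeholder news text",
            "00:04:24,463|00:04:25,798|POP| Ohhh. Cheesecake."]:
    print(blank_news(raw))
# prints:
# 00:04:15,638|00:04:16,105|RU3|
# 00:04:24,463|00:04:25,798|POP| Ohhh. Cheesecake.
```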
For example, for the July 29, 2010 ABC World News Now 1AM-3AM PST broadcast, the lines surrounding the first commercial break look like:
00:04:15,638|00:04:16,105|RU3|
00:04:16,172|00:04:17,640|RU3|
00:04:17,707|00:04:18,808|RU3|
00:04:24,463|00:04:25,798|POP| Ohhh. Cheesecake.
00:04:25,816|00:04:28,201|POP| Ok. What if I just had
00:04:25,816|00:04:28,201|POP| a small slice?
00:04:28,235|00:04:29,636|POP| I was good today,
00:04:28,235|00:04:29,636|POP| I deserve it!
00:04:29,670|00:04:32,021|POP| Or, I could have a medium slice
00:04:29,670|00:04:32,021|POP| and some celery sticks
00:04:32,072|00:04:33,472|POP|and they would ccel each other
00:04:32,072|00:04:33,472|POP| out, right?
00:04:33,490|00:04:34,941|POP| Or...Ok.
00:04:33,490|00:04:34,941|POP| I could ha one large slice
00:04:34,975|00:04:36,359|POP| and jog in place as I eat it
00:04:36,410|00:04:39,078|POP| Or...Ok. How about one large
00:04:36,410|00:04:39,078|POP| slice while jogging in place
00:04:39,113|00:04:40,113|POP| followed by eight celery...
00:04:40,147|00:04:41,531|POP| MMM
00:04:40,147|00:04:41,531|POP| Raspberrcheesecake...
00:04:41,582|00:04:43,616|POP| I have been thinking about this
00:04:41,582|00:04:43,616|POP| all day.
00:04:43,651|00:04:44,817|POP| Wow, and you've
00:04:43,651|00:04:44,817|POP| lost weight!
00:04:44,835|00:04:45,652|POP|Oh yeah,
00:04:45,686|00:04:48,121|POP|You're welcome. thank you!
00:04:48,188|00:04:49,122|POP| [ Female Announcer ] Yop.
00:04:49,156|00:04:50,790|POP| With 30 delicious flavors
00:04:52,126|00:04:59,065|POP| Yoplait.
00:04:52,126|00:04:59,065|POP| It is so good.
00:06:01,695|00:06:02,945|POP| OUCH! OW! OOPS!
00:06:03,063|00:06:05,298|POP| IT'S NEO TO GO!®
00:06:05,366|00:06:07,083|POP|READY. AIM. PROTECT.
00:06:07,184|00:06:08,935|POP| NEOSPORIN® GIVES YOU
00:06:07,184|00:06:08,935|POP| INFECTION-PROTECTION,
00:06:09,019|00:06:10,086|POP| AND PAIN RELIEF.
00:06:10,187|00:06:12,104|POP| NEO TO GO!®
00:06:10,187|00:06:12,104|POP| PLUS PAIN RELIEF.
00:06:12,189|00:06:14,123|POP| EVERY CUT.
00:06:12,189|00:06:14,123|POP| EVERY TIME.
00:06:12,189|00:06:14,123|POP| EVERYWHERE.
00:06:54,647|00:06:56,015|RU3|
00:06:56,182|00:06:57,150|RU3|
00:06:57,250|00:06:59,385|RU3|
00:06:59,552|00:07:00,987|RU3|
00:07:01,120|00:07:02,555|RU3|
You can see that the "RU3" news lines are blank, since we have removed the text of the news captioning, while the advertising lines are intact. You can also see that the first advertisement ends at 4m59s and is followed by an uncaptioned commercial, after which another captioned commercial begins at 6m1s. Remember that captioning files do not encode uncaptioned time: blocks of time that have no associated captions appear only as gaps in the timecode sequence.
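Because uncaptioned stretches show up only as gaps, one way to locate them is to compare each record's start time with the latest end time seen so far. The following is a minimal Python sketch assuming the HH:MM:SS,mmm timecodes shown in the excerpt; the function names and the 5-second `min_gap` default are illustrative choices, not part of the dataset.

```python
def to_seconds(tc: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timecode to seconds."""
    hms, ms = tc.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000


def find_gaps(records, min_gap=5.0):
    """Yield (gap_start, gap_end, length) in seconds for uncaptioned stretches.

    records: iterable of (start, end) timecode strings in file order.
    Multi-row captions repeat the same timecodes, so we compare each start
    against the latest end time seen so far rather than the previous row.
    """
    latest_end = None
    for start, end in records:
        s, e = to_seconds(start), to_seconds(end)
        if latest_end is not None and s - latest_end >= min_gap:
            yield latest_end, s, s - latest_end
        latest_end = e if latest_end is None else max(latest_end, e)
```

Run over the Yoplait and Neosporin records above, this would report a gap of about 62.6 seconds between 00:04:59,065 and 00:06:01,695, the uncaptioned commercial described above. The threshold matters: a very small `min_gap` would also flag the brief pauses between caption screens within a single ad.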
You'll notice that nothing delineates the two captioned commercials, other than the fact that in this case a third, uncaptioned commercial appears between them. Captioning files do not provide any information separating distinct stories or commercials, so to find the boundary between two commercials you would need to use semantic similarity estimation or construct a database of known commercials and their scripts.
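As a rough illustration of the database approach, the Python sketch below scores a block of advertising caption text against a small table of known scripts. It uses a simple lexical measure (the fraction of a known script's distinct words that appear in the block) as a stand-in for true semantic similarity. The `KNOWN_SCRIPTS` table is hypothetical, seeded with text from the excerpt above, and the 0.6 threshold is an arbitrary assumption.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; punctuation and symbols such as ® are ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def containment(script: set[str], block: set[str]) -> float:
    """Fraction of a known script's distinct tokens that appear in the block."""
    return len(script & block) / len(script) if script else 0.0


def match_known_ads(block_text: str, known_scripts: dict[str, str],
                    threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return [(ad_name, score), ...] for known scripts found in the block, best first."""
    block = tokens(block_text)
    hits = []
    for name, script in known_scripts.items():
        score = containment(tokens(script), block)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference database, seeded with caption text from the excerpt.
KNOWN_SCRIPTS = {
    "yoplait_cheesecake": "Wow, and you've lost weight! With 30 delicious flavors "
                          "Yoplait. It is so good.",
    "neosporin_neo_to_go": "Neo to go! Ready. Aim. Protect. Neosporin gives you "
                           "infection-protection, and pain relief.",
}
```

Containment is used instead of a symmetric measure such as Jaccard similarity so that a block holding several back-to-back ads can still match each of them in full. Real captions contain typos like "ccel" and "ha" in the excerpt, so a production system would need fuzzier matching than exact word overlap.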