VGKG & GEN4: HTTP Versus HTTPS Benchmarking

While the online news landscape has largely transitioned to HTTPS in much of the world, this evolution brings with it considerable computational overhead. Because of its cryptographic processing, fetching a resource over HTTPS is vastly more expensive than traditional HTTP ingest. The complexity of TLS also means that its functions are typically outsourced to third-party libraries. Together, these two factors can lead to unusual performance and stability side effects.

For example, one widely used networking library enables connection reuse by default, which can significantly improve performance when requesting multiple resources from the same server. With HTTP requests, this feature works as expected. With HTTPS requests, however, 30 minutes after the first connection to a host, all further connections to that host begin behaving in unusual ways and returning unexpected errors. The only workarounds are to turn connection reuse off or to restart the process every 30 minutes.
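A middle ground between those two workarounds is to keep reusing connections but retire each one before it reaches the problematic age. The following is a minimal sketch of that idea using only Python's standard `http.client`. It is not the library discussed above, and the `AgeLimitedPool` class, its `max_age` default of 25 minutes, and the injectable `clock` and `connect` parameters are all hypothetical names chosen for illustration.

```python
import http.client
import time
from urllib.parse import urlsplit


class AgeLimitedPool:
    """Reuse one connection per (scheme, host, port), but never past max_age seconds."""

    def __init__(self, max_age=25 * 60, clock=time.monotonic, connect=None):
        # max_age is an assumed safety margin below the 30-minute mark.
        self.max_age = max_age
        self._clock = clock
        self._connect = connect or self._default_connect
        self._conns = {}  # key -> (connection, created_at)

    @staticmethod
    def _default_connect(scheme, host, port):
        cls = (http.client.HTTPSConnection if scheme == "https"
               else http.client.HTTPConnection)
        return cls(host, port, timeout=30)

    def get_connection(self, url):
        parts = urlsplit(url)
        key = (parts.scheme, parts.hostname, parts.port)
        entry = self._conns.get(key)
        if entry is not None:
            conn, created_at = entry
            if self._clock() - created_at < self.max_age:
                return conn  # still young enough to reuse
            conn.close()  # too old: drop it and reconnect below
        conn = self._connect(*key)
        self._conns[key] = (conn, self._clock())
        return conn

    def close(self):
        for conn, _ in self._conns.values():
            conn.close()
        self._conns.clear()
```

A caller would ask the pool for a connection before each request, for example `conn = pool.get_connection(url)` followed by `conn.request("GET", path)` and `conn.getresponse().read()`. Whether recycling by age actually avoids the errors depends on their cause inside the affected library, so this should be treated as something to test rather than a guaranteed fix.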

In another example, the library version of a major networking tool outperforms its CLI sibling by a factor of 2-3x on HTTP requests. However, when requesting the same URLs over HTTPS, the result reverses and the CLI is 2x faster. More broadly, comparing HTTP and HTTPS requests can uncover significant differences in both performance and stability between the CLI and library interfaces of common tools.
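A comparison of this kind can be run with a small timing harness that treats each interface as an interchangeable fetch function and records both elapsed time and error counts. The sketch below is illustrative only: it pairs Python's `urllib.request` with the `curl` command line (assumed to be on the PATH) purely as stand-ins for "a library" and "a CLI". They are not the tools measured above, nor are they two interfaces to the same codebase, and the function names are hypothetical.

```python
import subprocess
import time
import urllib.request


def fetch_with_library(url, timeout=30):
    """Fetch in-process using the standard library; returns the body size."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def fetch_with_cli(url, timeout=30):
    """Fetch by spawning the curl CLI (assumed installed); returns the body size."""
    out = subprocess.run(
        ["curl", "-sS", "--max-time", str(timeout), url],
        capture_output=True, check=True,
    )
    return len(out.stdout)


def benchmark(fetch, urls, repeats=3, timer=time.perf_counter):
    """Return (best wall-clock seconds to fetch all urls, total error count)."""
    best = None
    errors = 0
    for _ in range(repeats):
        start = timer()
        for url in urls:
            try:
                fetch(url)
            except Exception:
                errors += 1  # count failures so stability is visible too
        elapsed = timer() - start
        best = elapsed if best is None else min(best, elapsed)
    return best, errors
```

Running `benchmark` four times, once per combination of fetch function and URL scheme over the same set of hosts, gives the grid needed to see whether the ranking of the two interfaces flips between HTTP and HTTPS. Taking the best of several repeats reduces noise from transient network conditions, though it also hides the cost of the first, uncached TLS handshake.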