Generative AI Experiments: Debugging A Networking Issue With GenAI Copilots Vs Stack Overflow & GitHub

Recently, we had to diagnose and fix an extremely specialized edge-case networking issue in a third-party utility we rely on for certain queuing tasks (one of many reasons we have been steadily replacing external library code for key task types with bespoke code better suited to the unique challenges of global-scale, high-performance distributed applications). This was an excellent opportunity to explore how useful generative AI coding copilots are for this kind of debugging.

The specific issue was that, under certain conditions, the queue server would return only part of the requested record, truncating it mid-record. Extensive manual code analysis failed to turn up any obvious problems, so we turned to several major coding copilots with two different approaches. In the first, we handed them the entire code path from record lookup to wire transmission and asked them to analyze it, variously telling them there was a bug or asking them to identify specific kinds of potential problems. In the second, we gave them a pseudocode description of the desired workflow and asked them to generate the code from scratch, to test whether their code generation was better than their code analysis.

Neither approach uncovered the actual culprit. Instead, the copilots led us on myriad wild-goose chases and false paths, confidently asserting that many correct lines of code were incorrect.

Ultimately, several Stack Overflow posts put us on the correct path, and a code sample on GitHub finally gave us the answer: the server incorrectly assumed that socket writes of less than 1024 bytes would never result in a partial write, and therefore never checked the return value of its socket write to see whether fewer than the expected number of bytes had actually been written.
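The fix for this class of bug is the classic send-all loop: a socket write may transmit fewer bytes than requested regardless of payload size, so the return value must be checked and the remainder retried. A minimal sketch in Python (the utility in question is not ours to show, so the names here are illustrative):

```python
def send_all(sock, data: bytes) -> None:
    """Send every byte of `data`, looping on partial writes.

    A socket send() may transmit fewer bytes than requested -- even
    for payloads well under 1024 bytes -- so we must check how many
    bytes were actually written and retry with the remainder.
    """
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
```

Python's standard library already provides `socket.sendall()` with this behavior; the loop above simply makes explicit the return-value check that the queue server was missing.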

In this case, a few Google searches would have given us the answer in under ten minutes, rather than the hours we spent being led confidently and assertively astray by the copilots.