Google I/O 2024 Musings: As Models Move From The Lab To Life, Interface Takes Center Stage

As LLMs, LSMs, LMMs and other "Large X Models" continue their transition from laboratory novelties and enterprise tools into the hands of ordinary consumers, the interfaces through which they are used will increasingly take center stage. Models will move from turn-based textual interfaces, uninterruptible outputs and response delays of seconds to minutes toward interruptible voice-based interaction, becoming increasingly flexible in their modalities and contexts. From a model standpoint, the major change is the addition of voice and visual modalities, which has already been underway for some time and will become commonplace across models rather than relegated to the largest and most powerful models or to more limited purpose-built ones. Speed will become central to model design, since consumer applications can tolerate only minimal response latency, though the speed-versus-capability tradeoff is already being explored as companies confront the enormous operational costs of running the largest models. Interruptibility doesn't require model modifications per se so much as out-of-band monitoring and injection, though it does demand low model latency to respond to those interruptions.
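
To make that out-of-band pattern concrete, here is a minimal sketch of one way it might be wired up. Everything here is illustrative: `stream_tokens` is a hypothetical stand-in for a real streaming model API, and `listen` stands in for whatever voice-activity detection would notice the user talking over the model.

```python
import asyncio


async def stream_tokens():
    # Hypothetical stand-in for a real streaming model API.
    for token in ["Sure,", " the", " weather", " today", " is", "..."]:
        await asyncio.sleep(0.2)  # simulated per-token model latency
        yield token


async def speak(interrupt: asyncio.Event):
    # In-band channel: emit tokens as they arrive, checking the
    # out-of-band interrupt flag between tokens.
    async for token in stream_tokens():
        if interrupt.is_set():
            print("\n[interrupted mid-utterance]")
            return
        print(token, end="", flush=True)


async def listen(interrupt: asyncio.Event):
    # Out-of-band channel: a stand-in for voice-activity detection
    # noticing that the user has started speaking.
    await asyncio.sleep(0.5)
    interrupt.set()


async def main():
    interrupt = asyncio.Event()
    await asyncio.gather(speak(interrupt), listen(interrupt))


asyncio.run(main())
```

Note that even in this toy version, responsiveness is bounded by the per-token latency of the generator: the interrupt is only noticed between tokens, which is precisely why low model latency matters for interruptibility.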

In short, the future of consumer-based applications doesn't require much in the way of model changes beyond what is already underway. Instead, consumer application of AI centers on wrapping these models within interfaces such as voice and application tie-ins (mobile, enterprise collaboration tools, content generation and productivity packages, etc.) and eventually building them directly and more transparently into devices via on-device AI. A thermostat or toaster might embed a mini-LMM tuned for its narrow domain but with sufficient flexibility to handle unusual requests like "Make my toast brown but not so hot that the butter melts too much" or "My friends are visiting at 5 and get cold easily, so keep the house comfortable for me but not too cold for them" and work out what settings to use, as sketched below. Similarly, an enterprise collaboration tool might offer AI agents as built-in team members, much as Google demonstrated at I/O. Native integration into productivity tools will make models more directly accessible and more seamless to invoke. Each of these centers on interface.
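
One plausible shape for that toaster scenario, sketched below with an entirely hypothetical stub in place of the on-device model itself: the device exposes a narrow settings schema, the mini-model translates the fuzzy request into structured values, and the device validates the result before acting. The schema fields and ranges are invented for illustration.

```python
import json

# Hypothetical schema the toaster's mini-model must emit:
# natural language in, validated device settings out.
SETTINGS_SCHEMA = {"browning_level": (1, 7), "max_temp_c": (120, 230)}


def on_device_model(request: str) -> str:
    # Stub for a small, domain-tuned on-device model; a real one
    # would be prompted to return only JSON matching the schema.
    return '{"browning_level": 4, "max_temp_c": 160}'


def settings_for(request: str) -> dict:
    raw = json.loads(on_device_model(request))
    for key, (lo, hi) in SETTINGS_SCHEMA.items():
        value = raw.get(key)
        if value is None or not lo <= value <= hi:
            raise ValueError(f"model proposed out-of-range {key}: {value}")
    return raw


print(settings_for("Make my toast brown but not so hot that the butter melts too much"))
```

The interesting design choice is that the model never actuates anything directly; it only proposes settings within a schema the device enforces, which keeps a flexible interface in front of a safely constrained appliance.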

LLMs today can trivially cluster emails into topical threads, summarize each major point, extract and compile action items, and even draft memos, write reports, craft images and code websites. Yet it is only by integrating these tasks into the native interfaces of the tools we already use, so that they participate seamlessly in those environments, that they come to feel natural, much as building spelling and grammar checking natively into word processors fundamentally changed our relationship with automated writing assistance.
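
The capability itself really is that trivial today. A minimal sketch using Google's generative AI Python SDK, assuming an API key is configured; the model name, sample emails and prompt are all illustrative:

```python
import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key="...")  # assumes a valid Gemini API key
model = genai.GenerativeModel("gemini-1.5-flash")

emails = [
    "From: dana -- Can you send the Q3 numbers before Friday's review?",
    "From: sam -- Reminder: book the offsite venue by end of week.",
]

prompt = (
    "Group these emails by topic, summarize each group in one line, "
    "and list any action items with owners:\n\n" + "\n".join(emails)
)
print(model.generate_content(prompt).text)
```

The point of the paragraph above stands, though: a dozen lines of API glue is not the hard part. Surfacing the same result inside the mail client itself, without the user ever writing a prompt, is where the interface work lies.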

In the end, the future of AI will hinge less on fundamental advances in model capabilities and more on novel wrappings of models into new workflows and contexts, all centering on interface.