Validity, Breadth, and Density in a Model of Language Generation
The recent successes of large language models (LLMs) have led to a surge of theoretical research into language generation. A recent line of work proposes an abstract view, called language generation in the limit, in which generation is a game between an adversary and an algorithm: the adversary enumerates strings from an unknown language K, chosen from a countable collection of candidate languages, and after seeing a finite sample of these strings, the algorithm must generate strings from K that it has not yet seen. This formalism highlights a key tension: the trade-off between validity (the algorithm should produce only strings from the language) and breadth (it should be able to produce many of the language's strings). The same trade-off is central in applied uses of language generation, where it appears as a balance between hallucination (generating invalid utterances) and mode collapse (generating only a restricted set of outputs). Despite its importance, this trade-off has been challenging to study quantitatively. We survey recent work in this model, including a set of results that quantifies the trade-off between validity and breadth using measures of density.
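To make the abstract game concrete, the following is a minimal sketch of the interaction under illustrative assumptions: the countable collection is shrunk to two integer languages, and the generator is a deliberately naive consistency rule (not one of the algorithms from the surveyed results). Because the naive rule commits to the first candidate consistent with the sample, it can overshoot the true language and lose validity, which is exactly the hallucination side of the trade-off.

```python
# Toy simulation of the generation-in-the-limit game. The candidate
# collection, the target K, and naive_generate are all illustrative
# stand-ins, not the algorithms from the work being surveyed.

def evens():
    n = 0
    while True:
        yield str(n)
        n += 2

def multiples_of_four():
    n = 0
    while True:
        yield str(n)
        n += 4

# A (here: tiny) collection of candidate languages, each given by an
# enumerator; membership is approximated by a long finite prefix.
CANDIDATES = {"evens": evens, "multiples_of_four": multiples_of_four}

def prefix(lang, k=1000):
    gen = CANDIDATES[lang]()
    return [next(gen) for _ in range(k)]

def consistent(lang, sample):
    members = set(prefix(lang))
    return all(s in members for s in sample)

def naive_generate(sample):
    # Commit to the first candidate consistent with the sample and emit
    # its first enumerated string not yet seen -- a deliberately naive
    # rule that sacrifices validity when the guessed language is too large.
    for lang in CANDIDATES:
        if consistent(lang, sample):
            for s in prefix(lang):
                if s not in sample:
                    return s
    return None

# The adversary enumerates the unknown K (here: multiples of four, a
# sublanguage of the evens); the algorithm must output unseen strings of K.
K = "multiples_of_four"
K_members = set(prefix(K))
sample = []
for s in prefix(K, 10):
    sample.append(s)
    guess = naive_generate(sample)
    print(f"after {len(sample)} strings: guess={guess!r}, valid={guess in K_members}")
```

In this run the naive rule keeps proposing "2", a string of the larger candidate but not of K, so every guess is invalid; pursuing breadth carelessly costs validity, and the surveyed density results make this tension quantitative.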
This talk is based on joint work with Sendhil Mullainathan and Fan Wei.