July 20th talk – Occam’s razor in machine learning (Tom Sterkenburg)

Tom Sterkenburg (MCMP, LMU Munich) will present on Occam’s razor in machine learning on July 20th, 4PM. Please see the abstract below.

The talk will take place on site at HLRS, Room Baden and also be streamed via webex.

To join in person you will need to register, please send a mail to nico.formanek@hlrs.de.

Join online via webex: https://unistuttgart.webex.com/unistuttgart/j.php?MTID=m9fbede5de288df8835c19726ac87578c

This event is connected to a master class with Tom on Occam’s razor in machine learning on July 21st. Should you be interested in joining the master class, please check out this link: Master Class on Occam’s Razor with Tom Sterkenburg.

Occam’s razor in machine learning

I give an overview of my work on Occam’s razor, the methodological principle to prefer simplicity in inductive inference. This principle presents us with two philosophical problems: what is simplicity (the problem of definition), and why is it good to prefer it (the problem of justification)? I observe that the mathematical theory of machine learning holds the promise to answer these problems, for (versions of) Occam’s razor in machine learning, by (1) giving a formal notion of simplicity and (2) connecting this formal notion of simplicity to formal guarantees of successful learning. I investigate whether this promise holds good.

I first look at Kolmogorov complexity and the Solomonoff-Levin theory of universal prediction based on it. I argue that this theory neither gives a robust notion of simplicity nor a theoretical connection between simplicity and learning guarantees.

I then look at statistical learning theory, the (still) standard mathematical framework for machine learning. I argue that this theory does give us a robust notion of simplicity, in the form of the notion of capacity of a model class, and a theoretical connection between simplicity and learning guarantees, in the form of uniform convergence bounds.

I finally look at the contemporary discussion around “the generalization puzzle” that the learning success of modern machine learning algorithms cannot adequately be accounted for by the classical framework of statistical learning theory. Proposals for a new and more adequate theory routinely evoke a version of Occam’s razor. I argue that these proposals bring us in an important sense back to square one: the relevant principle neither relies on a robust notion of simplicity nor is underwritten by theoretical guarantees.