Wouter Kool will defend his thesis entitled 'Learning and Optimization in Combinatorial Spaces'. Supervisor: Prof. Dr. M. Welling. Co-supervisors: Dr. H.C. van Hoof and Prof. Dr. J.A.S. Gromicho.
Event details of PhD Defence Wouter Kool
Date
23 November 2022
Time
11:00 - 12:30
Aula - Lutherse kerk

Singel 411
1012 WN Amsterdam

Abstract

In this thesis, we develop methods for using machine learning to solve combinatorial optimization problems, with a focus on vehicle routing problems. The thesis consists of two parts. In Part I (Chapters 3 and 4), we develop practical methods using machine learning models to solve different variants of vehicle routing problems. As these models represent probability distributions over combinatorial spaces, in Part II (Chapters 5 and 6), we focus on sampling from such models and optimizing their parameters.
Specifically, in Chapter 3, we use reinforcement learning to train the Attention Model, which represents a construction heuristic, to solve different variants of routing problems. In Chapter 4, we present Deep Policy Dynamic Programming, which uses another learned model to guide a restricted dynamic programming algorithm, yielding improved performance on routing problems and the ability to handle complex constraints such as time windows.
Given the deterministic nature of combinatorial problems, duplicate samples from the models in Part I are uninformative, so Part II focuses on sampling without replacement from such models. In Chapter 5, we present ancestral Gumbel-top-k sampling as an efficient method for drawing samples without replacement from structured models over combinatorial domains, and we illustrate its general applicability beyond routing problems. In Chapter 6, we derive statistical gradient estimators based on such samples without replacement, which can be used to improve the gradient-based training procedure for the model in Chapter 3.
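To give a flavour of the core idea behind Gumbel-top-k sampling, the sketch below shows the flat (single categorical distribution) case: perturbing each log-probability with independent Gumbel noise and taking the k largest perturbed values yields a sample of k distinct categories without replacement. This is a minimal illustration using NumPy with made-up example probabilities, not the thesis's ancestral variant, which extends the trick to sequence models over combinatorial domains.

```python
import numpy as np

def gumbel_top_k(log_probs, k, rng):
    """Draw k categories without replacement via the Gumbel-top-k trick:
    add i.i.d. Gumbel(0, 1) noise to each log-probability and return the
    indices of the k largest perturbed values."""
    gumbels = rng.gumbel(size=len(log_probs))
    perturbed = log_probs + gumbels
    # Indices of the k largest perturbed log-probabilities, largest first
    return np.argsort(perturbed)[-k:][::-1]

# Example: a categorical distribution over 5 items (probabilities are illustrative)
log_probs = np.log(np.array([0.4, 0.3, 0.15, 0.1, 0.05]))
rng = np.random.default_rng(0)
sample = gumbel_top_k(log_probs, k=3, rng=rng)
```

Because the k selected categories are all distinct by construction, repeated draws never waste computation on duplicates, which is what makes the approach attractive for the deterministic combinatorial problems described above.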