During the Amsterdam AI Impact Festival, Remco Hogerwerf, alumnus of the IvI Bachelor’s Computer Science, was awarded the Amsterdam AI Thesis Prize. His thesis, ‘Space Filling Curves For Sequencing Image Patches in Vision Transformers’, explores how alternative sequencing methods can improve how Vision Transformers learn spatial relationships in images. Remco pitched this research to the community at the festival in four minutes.

Understanding Vision Transformers and Pixel Ordering

Vision Transformers (ViTs) are strong models for understanding images. They read images in a fixed order, like a zigzag pattern, which can break the natural closeness of parts in the image. In his thesis project, Hogerwerf studied different ways of ordering image pixels, called space-filling curves (SFCs). Examples are Raster-scan, Hilbert, Peano, Moore, Onion, and Z-curves. These curves rearrange the image into a sequence, and the locality-preserving ones keep pixels that are close in the image close in the sequence as well.
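To make the idea of a scan order concrete, the sketch below compares a plain raster scan with a Z-curve (also known as Morton order) on a small grid. It is an illustration only, not code from the thesis: the function names `z_index`, `z_curve_order` and `raster_order` are hypothetical, and bit interleaving is simply the standard way to compute a Z-curve position.

```python
def z_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of (row, col) into one Z-curve (Morton) index."""
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)      # column bits fill even positions
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bits fill odd positions
    return index


def z_curve_order(size: int) -> list:
    """All (row, col) cells of a size x size grid, visited in Z-curve order."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    return sorted(cells, key=lambda rc: z_index(rc[0], rc[1]))


def raster_order(size: int) -> list:
    """All (row, col) cells visited row by row, left to right."""
    return [(r, c) for r in range(size) for c in range(size)]


if __name__ == "__main__":
    print("raster :", raster_order(4)[:8])
    print("z-curve:", z_curve_order(4)[:8])
```

On a 4 x 4 grid, the first four cells of the raster scan all lie in the top row, while the first four cells of the Z-curve form the top-left 2 x 2 block, so neighbours in the sequence are also neighbours in the image.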

A New Method for Better Spatial Understanding

In his thesis, Hogerwerf introduces a new method for ViTs. First, images are turned into a long line of pixels using SFCs. Then this line is cut into patches. This adds helpful spatial information directly into the model. A second idea uses several SFCs at different scales to give even more spatial clues. He also tests what happens when positional encodings (extra information about patch order) are removed.
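The two steps described above, flattening first and patching second, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the thesis: it assumes a square single-channel image stored as nested lists, uses the Z-curve as the example SFC, and the names `morton_key` and `sfc_patches` are hypothetical.

```python
def morton_key(row: int, col: int, bits: int = 16) -> int:
    """Z-curve position of a pixel, obtained by interleaving row and column bits."""
    key = 0
    for b in range(bits):
        key |= ((col >> b) & 1) << (2 * b)
        key |= ((row >> b) & 1) << (2 * b + 1)
    return key


def sfc_patches(image: list, patch_len: int) -> list:
    """Flatten a square 2D image along the Z-curve, then cut the line into patches."""
    size = len(image)
    # Step 1: visit every pixel in curve order to get one long 1D sequence.
    order = sorted(
        ((r, c) for r in range(size) for c in range(size)),
        key=lambda rc: morton_key(rc[0], rc[1]),
    )
    sequence = [image[r][c] for r, c in order]
    if len(sequence) % patch_len != 0:
        raise ValueError("patch_len must divide the number of pixels")
    # Step 2: slice the 1D sequence into equally sized, consecutive patches.
    return [sequence[i:i + patch_len] for i in range(0, len(sequence), patch_len)]


if __name__ == "__main__":
    # Toy 4 x 4 image whose pixel value is its raster position (row * 4 + col).
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    for patch in sfc_patches(image, patch_len=4):
        print(patch)
```

In this toy example, every patch of four consecutive sequence elements covers one 2 x 2 block of the image, which shows how the scan order alone can carry spatial information into the patches.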

Strong Results From Extensive Testing

Hogerwerf ran many tests on common image datasets. He compared six SFC-based ViT models with a normal ViT, keeping all training settings the same. Using SFCs made the models 3–7% more accurate and up to 20% faster to train. The Z-curve gave the best improvement. Using several SFCs together added another 2–3% accuracy. The tests also showed that adding positional encodings actually made things worse when SFCs were used, meaning the SFCs already give enough spatial information.

Benefits for Future Vision Models

Overall, Hogerwerf’s results show that a simple scan order that keeps nearby image areas close together can replace positional embeddings completely and make the model both more accurate and more efficient to train. In practice, this new method offers great potential for fields where spatial details in images are crucial. In addition, the research shows that fundamental optimisations are essential for the development of more sustainable AI. By organising data processing more efficiently, the models trained faster. Such algorithmic improvements are indispensable for reducing the growing energy demand and ecological footprint of large-scale AI systems.

Remco Hogerwerf with his supervisor Rein van den Boomgaard

Amsterdam Thesis Award

The Amsterdam AI Thesis Award is awarded to Bachelor’s and Master’s students who present exciting and innovative work in the field of AI and Data Science research. The award is organised to promote new AI students and AI research, encourage diversity, and foster collaboration within the AI and Data Science communities.

Stay tuned for the winners of the AI Amsterdam Master’s Thesis Prize, Dominykas Šeputis and Liang Telkamp, alumni in the Master’s Artificial Intelligence.