John Kloosterman

About Me

I am a Lecturer in Computer Science and Engineering at the University of Michigan. Recently I have been teaching EECS 183, an introductory computer science course both for students who intend to major in computer science and for those who want to integrate computer science skills into their other interests.

Office Hours, Winter 2019

Monday 4:00-5:00PM, 3611 Bob and Betty Beyster Building
Tuesday 2:30-3:30PM, 1837 East Hall
Wednesday 3:00-4:30PM, 3rd floor, Duderstadt Center
Thursday 10:00-11:00AM, 1837 East Hall

I am also happy to set up an appointment over e-mail.

Teaching

EECS 183 (Elementary Programming Concepts), Winter 2019, Fall 2018
EECS 280 (Programming and Data Structures), Winter 2017

Press

EECS 280 Becomes Third Largest Course at U-M
University of Michigan CSE, April 2017

Research

Autonomous vehicle security: Autonomous vehicle software, such as Apollo, is implemented as a distributed system, with multiple modules communicating using message passing. I am creating tools that can automatically identify vulnerabilities in the ways these modules interact, which developers can use to secure these critical systems.

GPU multi-kernel execution: When multiple kernels run at the same time on the same GPU, they can often achieve higher throughput than when they run consecutively, because at times they have complementary resource requirements. At other times, they interfere with each other. My work finds resource partitions that limit the impact of this interference.

GPU register file design: GPUs need to have hundreds of kilobytes of register file, because so many threads are executing simultaneously. However, not many of these registers are accessed in any given period of time. RegLess (published MICRO 2017) is a technique to save energy using a much smaller register structure that stores only active registers.

GPU memory coalescing: Nearby threads on a GPU tend to access nearby locations in memory, allowing requests to the same cache lines to be merged to increase memory throughput. WarpPool (published MICRO 2015) used a new type of memory locality between loads made by different thread groups to merge more requests.
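
As a rough illustration of the coalescing idea described above, the following sketch counts how many cache-line requests a set of loads needs once requests to the same line are merged. It is a toy model, not WarpPool's hardware design: the 128-byte line size, the 32-thread warps, the access pattern, and the helper name `coalesce` are all assumptions chosen for the example.

```python
from typing import Iterable, List

CACHE_LINE_BYTES = 128  # assumed cache-line size for this sketch


def coalesce(addresses: Iterable[int], line_bytes: int = CACHE_LINE_BYTES) -> List[int]:
    """Return the sorted, distinct cache-line base addresses touched by the loads."""
    return sorted({(addr // line_bytes) * line_bytes for addr in addresses})


# Two hypothetical 32-thread warps whose loads interleave within the same lines.
warp_a = [8 * t for t in range(32)]      # bytes 0, 8, ..., 248
warp_b = [8 * t + 4 for t in range(32)]  # bytes 4, 12, ..., 252

uncoalesced = len(warp_a) + len(warp_b)                  # one request per load
per_warp = len(coalesce(warp_a)) + len(coalesce(warp_b))  # merge within each warp only
pooled = len(coalesce(warp_a + warp_b))                   # merge across both warps

print(uncoalesced, per_warp, pooled)
```

In this pattern, merging within each warp already cuts the request count sharply, and merging across the two warps removes the remaining duplicate requests because both warps touch the same lines.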

Publications

Scratch That (But Cache This): A Hybrid Register Cache / Scratchpad for GPUs
Jonathan Bailey, John Kloosterman, Scott Mahlke
CASES 2018

RegLess: Just-in-Time Operand Staging for GPUs
John Kloosterman, Jonathan Beaumont, D. Anoushe Jamshidi, Jonathan Bailey, Trevor Mudge, Scott Mahlke
MICRO 2017

WarpPool: Sharing Requests with Inter-Warp Coalescing for Throughput Processors
John Kloosterman, Jonathan Beaumont, Mick Wollman, Ankit Sethia, Ron Dreslinski, Trevor Mudge, Scott Mahlke
MICRO 2015

local_malloc: malloc() for OpenCL local memory (poster)
John Kloosterman, Joel Adams
ACM SRC Poster, SC13