In-person + Virtual
16 -20 May
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Central European Summer Time (UTC +2). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.
Back To Schedule
Wednesday, May 18 • 17:25 - 18:00
Tower of Babel: Making Apache Spark, Kubeflow, and Kubernetes Play Nice - Holden Karau, Netflix

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.

Working with big data matrices is challenging, Kubernetes allows users to elastically scale, but can only have a pod as large as a node, which may not be large enough to fit the matrix in memory. While Kubernetes allows for other paradigms on top of it which allows pods to coordinate on individual jobs, setting them up and making them play nice with ML platforms is not straightforward. Using Apache Spark and Apache Mahout we can work with matrices of any dimension and distribute them across an unbounded number of pods/nodes, and we can use Kubeflow to make our work quickly and easily reproducible. In this talk, we’ll discuss how we used Apache Spark and Mahout to denoise DICOM images of lungs of COVID patients and published our Pipeline with Kubeflow to make the process easily repeatable which could help doctors in more resource limited hospitals, as well as other researchers seeking to automate the detection of COVID.

Click here to view captioning/translation in the MeetingPlay platform!

avatar for Holden Karau

Holden Karau

Open Source Engineer, Netflix
Holden Karau is a queer transgender Canadian, Apache Spark committer, Apache Software Foundation member, and an active open source contributor. She also extends her passion for building community with industry projects including Scaling for Python for ML and teaching distributed computing... Read More →

Wednesday May 18, 2022 17:25 - 18:00 CEST
Pavilion 4, Room B | Level 2 | Central Forum