Apache Spark is a unified analytics engine for large-scale data processing. People are moving Spark and batch workload to Kubernetes due to its uprising popularity. There are many challenges to running Spark efficiently on Kubernetes, for example, supporting autoscaling-based workloads. In this talk, we discuss building a large scale Spark Service on top of Kubernetes. We will also walk through autoscaling on a multi-tenant platform with advanced features such as physical isolation, min/max capacity setting, bin-packing, scale-in and scale out controls, and more. These improvements show significant CPU and memory utilization savings for Spark on Kubernetes.
Click here to view captioning/translation in the MeetingPlay platform!