The ATLAS experiment at CERN is one of the largest scientific machines built to date, and its computing needs will keep growing as it explores proton collisions at higher energy and luminosity. Recent R&D on integrating cloud infrastructures with ATLAS' Worldwide LHC Computing Grid resources identified Kubernetes as a commonly available, ideal substrate. While Kubernetes is widely known for its service management capabilities, it also offers powerful batch controllers for containerised workloads. We exploited these capabilities to build ephemeral batch clusters with over 100k vCPU to process tasks that require quick turnaround, provide GPU resources that are not widely available in our own infrastructure, or create interactive facilities where users can easily spin up private clusters for their distributed analysis from a notebook.