Reproducibility has been a cornerstone of scientific research for the last 400 years. In practice, a publication often supports it with only charts or a sample of data. With the advent of GitHub, papers now sometimes include code and sample data too. Unfortunately, this still leaves a lot of “leg work” to the person reproducing the results (which is why this is often a task for graduate students). Kubeflow, an open source, cloud native data science platform, changes that by making all steps from data cleansing to visualization quickly and easily reproducible, which in turn makes iterative advances much easier and faster. In this talk, we’ll discuss a peer-reviewed article that was published not only with corresponding code, but with a Kubeflow Pipeline, so that anyone may download, check, and iteratively improve the results. While the paper itself is interesting, the talk will focus on why publishing not only code and data but full pipelines benefits not only grad students tasked with verifying results, but the entire academic community.