English

The future of big data is Kubernetes

Kubernetes and cloud native technologies are bringing a significant paradigm shift to infrastructure, big data deployments and applications, and these changes are fast rather than incremental. We claim that the existing big data frameworks are all built on outdated infrastructure components, and that the changes proposed for them are too little, too late. Our pitch is that Kubernetes solves these problems and brings a better separation of concerns between compute, SQL, streaming and the underlying infrastructure, whether cloud, on-prem or hybrid. Kubernetes has become the de facto standard “runtime fabric”, purposely designed and built for the cloud and for scale, and it does things the right way. It is an overall game changer, and we hope that the big data landscape will benefit from it.

Big data frameworks rely on several projects and services, such as YARN for scheduling and ZooKeeper for consistency. While these were great tools for on-prem environments, they have failed to keep pace with the requirements and technologies of dynamic cloud environments. This talk familiarizes the audience with the benefits of running big data workloads on Kubernetes, introduces the changes made by the Kubernetes community, and takes a deep dive into the most popular data frameworks, such as Spark, Zeppelin and Kafka.

János Mátyás
CTO, Banzai Cloud

János is an open source committer at the Apache Software Foundation and a contributor to several Cloud Native Computing Foundation projects. His primary interests are scheduling, distributed systems and cloud native technologies. He has contributed to making Spark and Zeppelin run on Kubernetes and is an active committer to the Kubicorn project. He was the CTO and co-founder of SequenceIQ, a startup acquired by Hortonworks, where he started the Cloudbreak project to provision Hadoop clusters in the cloud. Currently he is the CTO at Banzai Cloud, where he is developing a microservice platform based on Kubernetes to deploy Spark, Zeppelin, Kafka and other big data tools the cloud native way, removing the YARN and ZooKeeper dependencies and replacing them with CNCF native technologies like the Kubernetes scheduler, etcd, Prometheus and Istio. He is an active presenter at several large conferences, such as Hadoop Summit, Strata, DataWorks Summit and the Apache Big Data conference, and the organizer of the Kubernetes and Cloud Native Computing meetup in Budapest.