Apache Spark seems shiny and easy on the surface, but after living with it for more than a year, we can see its not-so-shiny parts too. Performance issues, memory problems, and development hiccups are as much a part of our war as the good parts.
At enbrite.ly we measure traffic quality in online advertising by analyzing vast amounts of data. Our data infrastructure has to process billions of events per day on a TB scale. In the past 6 months we have worked on our data platform based on Apache Spark. We encountered many barriers and surprises we hadn’t anticipated.
In the presentation I share our stories and experiences with Apache Spark: the good, the bad, and the ugly parts of it.
Gulyás Máté
CTO, Co-Founder, enbrite.ly
Mate is Co-founder and CTO of enbrite.ly, a Budapest-based startup with the vision of creating the next-generation decision support system in online advertising that covers the market needs of the future. Mate has many years of experience with Big Data architectures, data analytics pipelines, operating infrastructure, and growing organizations by focusing on culture. Besides enbrite.ly, he is Chief Architect at Dmlab, a leading data analytics company providing innovative data products and services. Mate teaches Big Data analytics at Budapest University of Technology and Economics and runs courses for companies. He is a speaker at local and international conferences and meetups.