Top 7 mistakes to avoid when using Apache Spark

What are the caveats when you take Spark into production? Many companies adopt Apache Spark because it provides a unified platform across different workloads, be it machine learning, real-time data processing, or plain SQL and ETL. Many also choose Spark because it’s easy to get started with and scales fairly simply as your data grows.

I collected the top caveats from Datapao’s Spark projects and will present the ones you can’t afford to overlook when taking Spark into production.

Tóth Zoltán
CTO, Datapao

I design and implement data analytics architectures at Datapao. Besides working on data infrastructures, I work as a principal instructor and consultant at Databricks, the company founded by the creators of Apache Spark. Earlier, I worked on the Spark integration project at RapidMiner and led the Data Engineering and Business Modeling teams at Prezi.