BigQuery without big costs

The Google Cloud Platform (GCP) provides a broad spectrum of tools that can complement or even replace local infrastructure for handling big data, and it enables users to scale seamlessly from proof of concept to global operations.

Emarsys moved to the Google Cloud Platform in 2016, and today most of our services use it to store vast amounts of data, to communicate with each other, and to run data loading and transformation jobs at scale.

By early 2017, new product development efforts had exposed serious bottlenecks in our Google BigQuery-based data warehouse as it was then structured, in terms of both query cost and latency. Both a restructuring of the data and a shift to real-time stream processing were needed. The bulk transformation of the data and the introduction of live streams into a traditionally batch world posed interesting technological challenges, which we solved cost-effectively using a variety of tools.

During the presentation we will demonstrate two tools. The first is a Python-based tool that transforms billions of rows of data, relying for most of the process on very simple operations that are free of charge on GCP. The second is an Apache Beam pipeline, written in Java and running on Google Cloud Dataflow, that transforms and joins multiple incoming event streams in real time.

Balogh Ádám
Data scientist, Emarsys

After obtaining a PhD in biomedical signal processing, Adam worked on R&D projects with a focus on translating innovations into production. Currently, as a data scientist, he develops and implements models that enable personalization of the services provided by Emarsys.

Tóth Krisztián
Data scientist, Emarsys

As a data scientist, Krisztian helps Emarsys clients gain better insights into their customers and enables them to personalize their interactions at scale. Prior to working at Emarsys, Krisztian co-founded three startups and worked as a consultant for major multinational companies and as a consulting data scientist on international research projects.