Dodging the cost of scalability in data analysis with CPU efficiency

Multi-node operations for data analysis have enjoyed huge attention in recent years. Over the same period, however, single nodes have grown tremendously in CPU and memory capacity. We revisit single-node operations for data analysis, using, for example, efficient vectorised data processing primitives. We also describe our DuckDB project, which provides these primitives behind a SQL interface.

Dr. Hannes Mühleisen
Senior Researcher, CWI Database Architectures Group