This year, the weekend of the 3rd and 4th of February was the one chosen. This was the third time I came to Brussels in the middle of winter to attend what is one of my favourite conferences. It was the coldest of the three, with temperatures dropping to -3 °C, and the first time it actually snowed!
Nonetheless it was fuller than ever, gathering more than 10 thousand developers and enthusiasts from all over the world. The Rust and Go devrooms were especially full, with huge queues at the door. The usual advice of picking one or two devrooms per day and sticking to them holds true, now more than ever. Otherwise you might find yourself spending more time in queues than listening to talks.
This year I spent most of Saturday morning in the Monitoring and Cloud devroom and the afternoon in the Go devroom. Sunday was split between the MySQL devroom and the HPC, Data Science and Big Data devroom.
Saturday’s highlights
From the “10 years of Python 3” talk:
- Dropbox working on mypy to annotate types in their large codebase.
- Python 3.6 is faster than 2.7.
- Instagram codebase is Python, migrating from 2 to 3 and seeing 12% less CPU and 30% less memory usage. (Lisa Guo at PyCon)
- 2.7 -> 3.7: 27 new modules, e.g., asyncio, enum, pathlib, statistics, venv.
- Coroutines, async/await.
- python3statement.org
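A minimal sketch of some of the Python 3 features mentioned above: async/await coroutines, plus the enum and pathlib modules from the batch of new modules. The function and path names are made up for illustration, not from the talk.

```python
import asyncio
import enum
from pathlib import Path
from typing import List


class State(enum.Enum):
    # enum (added in 3.4) gives named, comparable constants
    PENDING = 1
    DONE = 2


async def check(path: Path) -> State:
    # Native coroutine syntax (async/await, added in 3.5); the type
    # annotations are what tools like mypy check statically.
    await asyncio.sleep(0)  # yield control back to the event loop
    return State.DONE if path.exists() else State.PENDING


async def main() -> List[State]:
    # pathlib (added in 3.4) for object-oriented filesystem paths
    paths = [Path("."), Path("this-path-does-not-exist")]
    return await asyncio.gather(*(check(p) for p in paths))


results = asyncio.run(main())  # asyncio.run itself arrived in 3.7
print(results)
```

None of this parses on Python 2.7, which is a big part of why large codebases need deliberate migration plans like Instagram's.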
From the monitoring and cloud devroom:
- The collection of libraries used by Google to export metrics is now open-sourced at opencensus.io;
- Supports, among others, exporting to Prometheus;
- Also supports tracing;
- Efficient instrumentation, can turn collection on and off.
- Use contextual structured data;
- consistent naming;
- let the use-case dictate ingest pattern;
- ask questions, validate hypotheses with data.
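As a sketch of what "contextual structured data" with "consistent naming" can look like in practice, here is a minimal example using only Python's stdlib logging. The field names (request_id, user_id) and the JSON shape are my own illustration, not from the talk.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields stay machine-parseable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Contextual fields attached by the LoggerAdapter below.
            **getattr(record, "context", {}),
        }
        return json.dumps(payload)


stream = io.StringIO()  # captured here for the demo; a service would log to stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The adapter stamps the same consistently named fields on every line,
# so downstream tooling can filter and aggregate on them.
ctx = logging.LoggerAdapter(
    logger, {"context": {"request_id": "req-42", "user_id": 7}}
)
ctx.info("cache miss")
ctx.info("query took %d ms", 12)

lines = stream.getvalue().splitlines()
print(lines[0])
```

Because every line carries the same keys, "asking questions of the data" becomes a filter on request_id rather than a regex over free-form text.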
From the Go devroom:
- Google bundles Docker containers, cobra, viper and some custom scripts to distribute to developers a dev environment with all the tools already set up;
- can run with local or remote docker engine;
- authenticates with hashicorp vault;
- elPrep, originally implemented in Common Lisp, but the only kind of GC available was stop-the-world and sequential, introducing significant slowdowns;
- Java and Go have concurrent GCs;
- Looked into C++, Java and Go and built three similar implementations;
- Java was faster than C++ but consumed almost double the memory; Go was faster than both with almost no memory overhead.
- Go APIs for Tensorflow, both for training and consuming models.
- Automate finding an offending revision with Go tests and git bisect.
- There will be a GopherCon Iceland in 2018.
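The git bisect point is straightforward to automate: `git bisect run` reruns a test command at each step until it pins down the first bad commit. Below is a self-contained toy demo in a throwaway repo (my own construction, not the talk's setup); with a real Go project the run command would be `go test ./...` instead of the `grep` stand-in.

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo

# Four commits: the third one introduces the "bug" (state flips to bad).
echo good > state && git add state && git commit -qm "good 1"
git commit -q --allow-empty -m "good 2"
echo bad > state && git commit -qam "introduces the bug"
git commit -q --allow-empty -m "later"

git bisect start HEAD HEAD~3      # bad = HEAD, good = 3 commits back
git bisect run grep -q good state # exit 0 = good, non-zero = bad
git bisect log | grep "first bad commit"
git bisect reset
```

Exit code 0 from the run command marks a revision good, anything else (except 125, which means "skip") marks it bad, so plain `go test` output maps onto bisect with no glue code.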
Sunday’s highlights
From the MySQL devroom:
- MySQL 8.0 focus is on efficiency, doing more with the same hardware;
- Solved Double Write, some REDO log bottlenecks, improved UPDATE performance;
- will use adaptive spinning for spin locks.
- Roles are coming to MySQL 8;
- implemented as locked users with expired passwords;
- workflow currently is: create role, grant privileges to role, create user, grant role to user, set (default) role.
- set role is not permanent by default;
- histogram support coming to MySQL 8;
- motivation: e.g., join order has a huge impact on performance;
- Will provide information about value distribution to query planner;
- Recommended for columns that are not the first column of any index and are used in WHERE clauses, ORDER BY, or IN subqueries;
- best fit for columns with low cardinality, skewed/uneven and relatively stable distributions;
- Not updated automatically.
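Both MySQL 8 features above map to a few short statements. A hedged sketch (the database, table, column, and account names are invented for illustration):

```sql
-- Roles: create the role, grant privileges to it, create the user,
-- grant the role to the user, then set the (default) role.
CREATE ROLE 'app_read';
GRANT SELECT ON appdb.* TO 'app_read';
CREATE USER 'alice'@'%' IDENTIFIED BY 'change-me';
GRANT 'app_read' TO 'alice'@'%';
-- A granted role is not active by default: activate it per session
-- with SET ROLE, or make it stick with a default role.
SET DEFAULT ROLE 'app_read' TO 'alice'@'%';

-- Histograms: built on demand and not refreshed automatically,
-- so rerun ANALYZE after the data distribution changes significantly.
ANALYZE TABLE orders UPDATE HISTOGRAM ON status WITH 32 BUCKETS;
ANALYZE TABLE orders DROP HISTOGRAM ON status;
```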
From the high performance computing, data science and big data devroom:
- On the operational/security side of big data, containers and OpenStack help,
- using Cilium and CoreOS Clair;
- CrateDB, a scalable SQL-99-compatible database built on top of Lucene, Netty, Antlr and Elasticsearch.
- Postgres wire protocol,
- Clustering, replication, partitioned tables,
- Good for monitoring, stream analysis, text analysis, timeseries and geospatial queries;
- uTensor, tensorflow with microcontrollers,
- train model on normal hardware, export to do inference on microcontrollers;
- Dask, a pure-Python implementation providing a real-enough dataframe interface for distributed data,
- Apache Beam, a unified programming model providing efficient and portable data processing pipelines,
- has runners for Apex, Spark, Flink, Gearpump, Google Cloud Dataflow, with more coming;
All talks are available on FOSDEM’s YouTube channel or by room on the FOSDEM video recordings website.