Personal takeaways from Open Source Summit

Two weeks ago I was asked on short notice to go to Open Source Summit North America 2023 in Vancouver. Last week I took a beautiful morning floatplane ride to get there from my home on Vancouver Island. Fast, scenic, downtown to downtown, and actually a good price once you compare the alternatives. I arrived just in time for day one of the conference and started out by attending the keynotes. After three awesome days I made my way back home on a Twin Otter again.

Over the course of the three days, keynotes kicked off each morning. My personal favourites were the talk from Cory Doctorow, the presentation from the NASA Jet Propulsion Laboratory, and the BC Government talk about their open source DevHub. The rest of each day was full of interesting sessions and networking: after sessions, in the hallways, in the exhibition hall, and even outside the Vancouver Convention Centre, including one evening at a function at the Vancouver Aquarium.

The main reasons I went to the event were to represent the Trino community and Starburst to the audience and attendees, and specifically to sit down for an interview with John Furrier and Rob Strechay for SiliconANGLE theCUBE. I blogged a bit more about my attendance and the interview on the Trino website. Head on over there, read the interview directly on SiliconANGLE, or watch the recording.

Beyond the discussion in the interview with John and Rob, here are a few other thoughts from the event and beyond.

Community and collaboration succeed

Given that I attended an open source conference where community building and leadership are a major aspect of the event, you might think that the conclusion that this work and collaboration succeed is a self-fulfilling prophecy. That might well be true, but I tend to think there is much more to it. The success of large initiatives is truly staggering, and statements such as “open source is not part of the IT industry, open source is the IT industry” are still baffling to hear when I think back to where we all started. Open source is no longer the hidden gem or the unknown underpinning of some proprietary software. Today, you cannot build any reasonably complex software without using open source tools, libraries, and frameworks. And that is a good thing: collaboration across the industry keeps improving, and this level of quality could not be reached in any other manner. Secrets are much less valuable than collaborative efforts.

Proprietary databases are losing

The strength of these communities is also evident when you look at the database market. All proprietary vendors are in decline and under severe pressure from open source competitors. In the RDBMS world, PostgreSQL seems to be the only growing system. Companies that work against the community, like MongoDB and Elastic, are overwhelmed with negative feedback, and alternatives that work for the community often act as rallying points. I learned that forks of MongoDB from before the license change are available and thriving. The community around OpenSearch tops that with impressive growth and a bright, inclusive future for all interested participants.

Big Data is dead, long live big data in lakehouses

What used to be called Big Data, requiring a dedicated Hadoop cluster and queried with MapReduce or Hive, is now considered painful, small, and thoroughly legacy. The new open lakehouse table formats such as Iceberg and Delta Lake are what everyone is looking at as the remedy for ever-larger datasets and ever-higher analytics demands. They are all powered by open source efforts, and they have a fair shot at taking us all to the next level. Which brings me to my next point.

Data mesh continues to be a dream

The idea that data is a product, owned and managed by separate teams, made available in a marketplace of data sources and data products, and reusable for insights across business units, is often referred to as data mesh. From all the talks I have seen about it, however, it seems a bit like the new emperor has no clothes yet. Often the implemented data mesh usages are just complex setups with a lakehouse that present a bunch of data in views or materialized views disguised as data products. The lift for implementing such a system still seems too large, and only big companies seem to pull it off. And even these companies, in my opinion, don't really create a distributed architecture of many data sources and data products with rich metadata and interactions. They also typically don't unlock many different data sources, but instead focus on a lakehouse. Trino, of course, can do it all.

Trino is on a great path

As part of the event I also attended a number of sessions about Presto, the legacy Facebook project that Trino evolved from. While the Presto project is still doing some interesting work, my admittedly biased view is that Trino is on the right track. Sticking with the Java runtime and innovating on that platform has so far served the project very well. I just can't see how porting only the one Hive connector to a C++-based library provides enough benefit to offset losing all the other connectors. And even the performance benefits of that approach might be smoke and mirrors when compared to a modern Trino setup with Java 17, caching, and all the other performance improvements Trino has received in the last three or more years.

Trino adoption among Presto users and beyond is bound to keep rising thanks to our variety of connectors, including crucial support for lakehouses with Iceberg, Delta Lake, and Hudi. In parallel, Trino provides connectors for all the RDBMS, NoSQL, streaming, and other data sources. The reality is that all companies use many different data sources. The strong collaboration with more and more community members is showing positive impact across the code base and results in further adoption of Trino. The success and performance of platforms like Starburst Enterprise, Starburst Galaxy, and other systems embedding Trino in their products and data platforms yields benefits for all users and all the individual products. IBM is trying with Presto and watsonx, and the preview I saw looks nice; I am just not so sure about the narrow connector availability. I hear that Db2 and Netezza might become available, and if that is the case, I would applaud a contribution to Trino 😉

Next up is Trino Fest

It's an exciting time for Trino and the Trino community. I hope you can all join us next month at Trino Fest, a free event spread over two half days. We had so many good speaker submissions that we had to extend the schedule.
