

Last week we had a coming-out party for our audacious vision at TileDB. We hosted a webinar where I shared what we have been working on for the past several years to make this vision a reality. We argue that the market is saturated with thousands of purpose-built data(base) systems. Those create a lot of noise for analysts and scientists dealing with vital data problems across numerous important application domains, making their lives very difficult and slowing down Science for all of us. We explain that there is huge engineering overlap across all those data systems and, therefore, it is possible to build a single system that can manage all data types in a unified way, for all applications, in a foundational and future-proof manner. We call such a system the universal database. And we built one, so we thought to share how we did it!

The full webinar recording

I am also providing the gist below in text, in case you folks are too busy or bored to listen to some random dude talking about the exact opposite of what cloud vendors are telling you and how the market works. I look forward to hearing your thoughts and initiating a necessary dialogue in our industry.

The problem with purpose-built data systems

While we have built enormous sophistication over the past five decades in relational databases, which manage tabular data perfectly, most of the data being generated out there (e.g., by the Sciences) is not tabular. This data is typically massive and seemingly diverse, and cannot be managed effectively with "traditional" databases.

In the meantime, the cloud has changed data management radically. Organizations with large quantities of data prefer storing it in cheap cloud stores in the form of files, effectively separating storage from compute. This gave rise to "lake houses", which pretty much boil down to the following: (i) dump your data into cloud buckets as files, (ii) adopt a "hammer" for scalable compute, and (iii) treat data management as an afterthought by applying "hacks".

To add insult to injury, Machine Learning is undergoing massive hype. As everyone jumps on the bandwagon and builds numerous pieces of software around ML, an important mistake is made: everyone thinks that ML is a compute problem, whereas it is in fact a data management problem. Because those ML models are trained on data, they are served on data, and they themselves constitute… well, data.

Therefore, a data management mess ensues. Thousands of "data systems" - databases, warehouses, lake houses, metadata stores, governance systems, catalogs, ML model/feature stores - flood the market. VCs spend inordinate amounts of money on new startups around those systems, with the recipe being: (i) GitHub stars or a Hacker News top story, coupled with (ii) some top-university pedigree or top-tech-company previous affiliation for the founders. And you may say: "So what? The market is well capitalized and more talent gets hired." Sometimes I think they are just doing it to troll us, as they can even invest in companies developing the exact same software branded with a different name, or in systems with the exact same features but with marginal performance differences.

We are currently working with some organizations who are actually trying to solve some very important problems. Problems important for Humanity, those kinds of problems. And these folks, being scientists and despite being more than capable of handling data engineering tasks, are lost in this data system noise. They either end up using way too many systems for their problem, or they build them in-house because no system fits their needs. So they lose a lot of time and money for their organization, their work is slowed down and, therefore, Science is slowed down.
