Something Weird is Going on at Databricks
A third stealth acquisition with implications for Decentralized A.I.
Hey Guys,
Welcome to Data Science Learning Center Premium.
During the pandemic, few companies have grown their valuation as quickly as Databricks has. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. I’d suggest its valuation in mid-2022 is around $50 billion. That’s quite a jump, since in February 2021 Databricks raised $1 billion at a $28 billion valuation against ARR of $425 million.
In October 2021, Databricks, the data analysis and AI software start-up, made its second acquisition, of German no-code company 8080 Labs, as it extended its focus on data applications for non-computer scientists.
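To put those figures side by side, here is a quick back-of-the-envelope sketch in Python. The $28 billion valuation and $425 million ARR are the February 2021 numbers above; the $50 billion figure is only my own rough guess, not a reported number, and the function names are just for illustration.

```python
def revenue_multiple(valuation_usd: float, arr_usd: float) -> float:
    """Valuation divided by annual recurring revenue (ARR)."""
    return valuation_usd / arr_usd


def percent_increase(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new / old - 1) * 100


FEB_2021_VALUATION = 28e9  # from the February 2021 funding round
FEB_2021_ARR = 425e6       # ARR reported alongside that round
MID_2022_GUESS = 50e9      # assumption: my rough estimate, not a reported figure

multiple = revenue_multiple(FEB_2021_VALUATION, FEB_2021_ARR)
jump = percent_increase(FEB_2021_VALUATION, MID_2022_GUESS)

print(f"Feb 2021 valuation / ARR: {multiple:.1f}x")
print(f"Implied jump to the $50B guess: {jump:.0f}%")
```

So the February 2021 round priced Databricks at roughly 66 times its ARR, and my $50 billion guess would mean a jump of a bit under 80% since then.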
Databricks, like Snowflake, is just really finding traction. Some of Databricks’ biggest investors, including Google, Amazon and Salesforce, have been pushing into the low-code/no-code space in recent years. This technology enables non-coders to develop enterprise apps across business functions, a trend also referred to as the rise of the “citizen developer”. However, Databricks wasn’t done there.
The Race for the One-Stop Shop for Data
With origins in academia and the open source community, Databricks was founded in 2013 by the original creators of Apache Spark™, Delta Lake and MLflow.
As the world’s first and only lakehouse platform in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI.
In many ways, Databricks and Snowflake overlap. You might call it the race to build a one-stop shop for your data. With enterprises large and small racing to build out their data infrastructure, one foundational piece they all need is an easy place to store their data. Both companies are poised to leverage that storage layer for better ROI.
Data lakehouses combine the storage and analytics features of data warehouses and data lakes, with vendors like Databricks offering solutions for specific industries.
Surprise, Cortex Labs is Joining Databricks
The most recent news caught me a bit off guard:
Today, more than 7,000 organizations worldwide — including ABN AMRO, Condé Nast, H&M Group, Regeneron and Shell — rely on Databricks to enable massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics.
After launching industry-specific data lakehouses for the retail, financial services and healthcare sectors over the past three months, Databricks is releasing a solution targeting the media and the entertainment (M&E) sector.
Cortex Labs is quite an interesting enigma here. Cloud data lakes and warehouses have become a critical element in answering enterprise data management needs.
As enterprises grow their investments in data platforms, they increasingly want to go beyond using data for internal analytics and start integrating predictions from machine learning (ML) models to create a competitive advantage for their products and services. This is not a sponsored post, but this acquisition really surprised me.
For example, financial institutions deploy ML models to detect fraudulent transactions in real-time, and retailers use ML models to personalize product recommendations for each customer.
In June 2020, Databricks acquired Redash, an Israeli open-source tech company focused on data visualization.
In October 2021, Databricks acquired 8080 Labs, the maker of bamboolib, a data exploration tool that does not require coding to use. The deal brought one of the most popular trends in enterprise development, the rise of low-code/no-code solutions, to Databricks’ Lakehouse Platform.
Now, in April 2022, Databricks has acquired Cortex Labs. To accelerate model serving and MLOps on its platform, Databricks announced that Cortex Labs, a Bay Area-based MLOps startup, has joined the company.
Cortex Labs is the maker of Cortex, a popular open-source platform for deploying, managing and scaling ML models in production. Cortex Labs was backed by leading infrastructure software investors Pitango Venture Capital, Engineering Capital, Uncorrelated Ventures, at.inc/, and Abstraction Capital, as well as angels Jeremey Schneider and Lior Gavish.
Cortex Labs has a significant blockchain component. Just look at their Twitter.
Cortex Labs is Really a Decentralized A.I. Company
Think about it, Cortex enables engineers and data scientists to deploy ML models in production without worrying about DevOps or cloud infrastructure. Companies from cybersecurity, biotechnology, retail, and other industries use Cortex to scale production ML workloads reliably, securely and cost effectively.
This also has implications for A.I. Dapps and decentralized A.I. research.
So this is actually pretty fascinating.
Cortex uses Solidity as the smart contract language to reduce friction and offers a rich array of tools for developers to write and integrate AI into smart contracts.
I think this is a very smart move by Databricks, which has raised $3.5 billion so far. When they go public, they will be a very important company.
Databricks has become a one-stop solution for the entire analytics team, in place of the giant vendors. Databricks’ unique offerings include Machine Learning Runtime, managed MLflow, Collaborative Notebooks, DataFrames and Spark SQL libraries. Its unified analytics platform allows a team of Data Engineers, Data Analysts, Data Scientists and Machine Learning Engineers to work on a project together.
As for Cortex Labs, the company’s founders were students at Berkeley when they observed that one of the problems around creating machine learning models was finding a way to deploy them. While there was a lot of open-source tooling available, data scientists are not experts in infrastructure.
What the four founders did was take a set of open-source tools and combine them with AWS services to provide a way to deploy models more easily.
Amazon SageMaker, of course, only works on the Amazon cloud, while Cortex will eventually work on any cloud.
Databricks originally caught on with enterprise clients through a version of the big data tool Apache Spark, an alternative to the Hadoop technology for storing lots of different kinds of data in massive quantities. It is on pace to generate $1 billion or more in 2022 revenue, a growth rate of over 75%. But I think in 2022 it’s reaching Snowflake levels of hype and utility. That’s actually pretty hard to do.
Both Databricks and Snowflake offer strong scalability, but scaling up and down is easier with Snowflake. In Snowflake, processing and storage layers scale independently.
Databricks auto-scales depending on the workload: it can scale down when the platform has been 100% idle for long enough, and it then removes idle workers from under-utilized clusters.
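To make the scale-down idea concrete, here is a minimal Python sketch of an idle-timeout rule. This is not Databricks’ actual algorithm; the `Worker` class, the `pick_idle_workers` function, the 600-second timeout and the `min_workers` floor are all assumptions of mine, made up purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    last_busy_at: float  # time (in seconds) when the worker last ran a task


def pick_idle_workers(
    workers: List[Worker],
    now: float,
    idle_timeout: float = 600.0,  # hypothetical threshold, not a Databricks default
    min_workers: int = 1,         # hypothetical floor so the cluster never hits zero
) -> List[Worker]:
    """Return workers that a simple idle-timeout rule would remove."""
    # A worker counts as idle once it has gone idle_timeout seconds without work.
    idle = [w for w in workers if now - w.last_busy_at >= idle_timeout]
    # Remove the longest-idle workers first.
    idle.sort(key=lambda w: w.last_busy_at)
    # Never shrink the cluster below min_workers.
    removable = max(0, len(workers) - min_workers)
    return idle[:removable]


cluster = [Worker("a", 0.0), Worker("b", 900.0), Worker("c", 100.0)]
to_remove = pick_idle_workers(cluster, now=1000.0)
print([w.name for w in to_remove])
```

With these made-up numbers, workers "a" and "c" have been idle for at least ten minutes and get picked for removal, while "b" stays because it was busy recently.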
Databricks said:
We’re thrilled to welcome Co-founders Omer Spillinger and David Eliahu to the Databricks team. Together, we’ll be working to realize our shared vision of an end-to-end, multi-cloud platform that empowers enterprises to deliver machine learning applications to their customers.
Databricks is up to more stealthy ninja stuff than I expected from them. We’re watching you!
Check out Cortex Labs on Github: github.com/cortexlabs/cortex
It took them just 4 years and 4 months to get acquired. Databricks did not disclose the amount or mention the true nature of what Cortex Labs does in their blog. Which is pretty Ninja, lmao.
Databricks is an enormous bet for Andreessen Horowitz. Microsoft is likely also a significant backer.
What do you think?
If you want to support me so I can keep writing, please don’t hesitate to give me tips, a paid subscription or some donation. With a conversion rate of less than two percent, this Newsletter exists mostly by the grace of my goodwill (passion for A.I. and Data Science) & my own experience of material poverty as I try to pivot into the Creator Economy.
Thanks for reading!