Apache Doris

Software Development

San Francisco, California 2,577 followers

Apache Doris is an open-source real-time data warehouse based on MPP architecture.

About us

Apache Doris is an open-source real-time data warehouse based on MPP architecture, known for its speed and ease of use. It supports real-time data ingestion and real-time query response in both high-concurrency point query and high-throughput analysis scenarios, so users can process and analyze large datasets with very low latency. In June 2022, Apache Doris graduated from the Apache Incubator and became a Top-Level Project of the Apache Software Foundation (ASF). It has attracted nearly 600 contributors, and more than 20,000 developers use Apache Doris today. Doris also runs in production at more than 2,000 companies around the world and is trusted by business giants such as AWS, Fuse, JD.com, Lenovo, OPPO, Shopee, TikTok, Tencent, Vivo, Xiaomi, and more. We welcome more open source technology enthusiasts to join the Apache Doris community and discover infinite possibilities together! Learn more about Apache Doris on GitHub: https://github.com/apache/doris Join the Apache Doris community on Slack: https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2gmq5o30h-455W226d79zP3L96ZhXIoQ

Website: https://doris.apache.org/
Industry: Software Development
Company size: 201-500 employees
Headquarters: San Francisco, California
Type: Nonprofit
Founded: 2018

Locations

Primary

San Francisco, California 94102, US

Beijing, Beijing 100086, CN

Employees at Apache Doris

Shirley Hu

Apache Doris Developer Advocate

Updates

Apache Doris

2,577 followers
6mo
📢 We are thrilled to announce the release of Apache Doris 2.1.0! For our long-time supporters, allow me to re-introduce Apache Doris with its new features and substantially improved data writing and query performance. For those who are new to Apache Doris, this is a great time for a proof of concept to see how it performs in your use case. Buckle up and get ready for:
🚶♂️ 100% faster out-of-the-box performance, as shown by TPC-DS benchmark tests
🚶♀️ Improved data lake analytics capabilities: 4 to 6 times faster than Trino and Spark
🏃♂️ Solid support for semi-structured data analysis
🏃♀️ Materialized views across multiple tables to accelerate multi-table joins
💃 Enhanced real-time writing efficiency powered by AUTO_INCREMENT columns, AUTO PARTITION, forward placement of MemTable, and Group Commit
🕺 Better workload management for higher performance stability
https://lnkd.in/gjVXD6gQ #database #dataengineering #analytics #bigdata #opensource

Another big leap: Apache Doris 2.1.0 is released - Apache Doris

doris.apache.org

Apache Doris reposted this

VeloDB

682 followers
15h
Compute Clusters in VeloDB 🌟 💡 Question 3️⃣ 💡 How to achieve flexible caching?
In a compute-storage decoupled architecture, object storage and HDFS are typically used as remote shared storage, so the initial I/O requests are often slow. How can we ensure high performance in these situations, and in multi-cluster scenarios as well? VeloDB addresses these challenges with a carefully designed cache management mechanism:
For a single compute cluster, VeloDB defaults to an LRU caching strategy. When the cache is large enough to hold all hot data, it delivers the same performance as a compute-storage coupled architecture at a much lower storage cost. VeloDB also offers manual cache control, allowing users to prioritize certain tables for caching. During cluster scaling, VeloDB automatically pre-heats or migrates caches based on statistical information, so that query service stays smooth despite the changes.
For multiple compute clusters, VeloDB provides cross-cluster cache synchronization to accelerate queries, and it supports partition-level control over that synchronization. The cache of each compute cluster operates independently, allowing users to control cache size as needed.
#database #cloudcompute #dataengineer
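The LRU default mentioned in this post can be illustrated with a small sketch. This is not VeloDB's implementation: the `LRUCache` class, the block keys, and the `load_from_remote` callback are hypothetical names chosen for illustration, and the cache only models the general least-recently-used eviction idea in front of slow remote storage.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache in front of slow remote storage.

    Hypothetical illustration only; not how VeloDB implements its cache.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()  # oldest entry first, newest last

    def get(self, key, load_from_remote):
        if key in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(key)
            return self._blocks[key]
        # Cache miss: pay the slow remote read once, then keep the block.
        value = load_from_remote(key)
        self._blocks[key] = value
        if len(self._blocks) > self.capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return value

    def keys(self):
        return list(self._blocks)
```

When the capacity covers the whole hot set, every repeated read is a hit and the remote store is touched only once per block, which is the situation the post describes as matching coupled-storage performance.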
Apache Doris

2,577 followers
2d
Real-time data processing is more challenging than offline batch processing because it involves complicated operations like multi-stream JOINs and dimension table changes. It demands more development and maintenance effort, and because system stability must be guaranteed, it often leads to redundant and wasted resources. We are excited to invite the data platform team of TikTok to talk about how they use Apache Doris in their real-time data architecture and how they benefit from it, which could serve as a model for effective real-time data warehousing. https://lnkd.in/gRRpqvkg TikTok also has some job openings for engineers familiar with Apache Doris: https://lnkd.in/gdjxFbmw #TikTok #dataplatform #realtime #dataprocessing #datawarehouse #livestream #ecommerce
Apache Doris reposted this

VeloDB

682 followers
6d
Compute Clusters in VeloDB 🌟 💡 Question 2️⃣ 💡 How to allow multiple nodes to process data writes simultaneously?
After years of exploration, most relational databases have adopted a single-writer, multi-reader architecture, in which only one cluster is allowed to write data into the shared storage. VeloDB, however, enables multiple clusters to write simultaneously. It relies on a Multi-Version Concurrency Control (MVCC) mechanism and a shared metadata center for transaction coordination. Data is first submitted to multiple clusters for transformation processing, and distributed coordination happens only during the metadata update phase. The cluster that obtains the lock first writes successfully, while the other clusters retry. Since most of the write overhead lies in the transformation step, this distributed coordination mechanism and optimistic locking design allow multiple readers and multiple writers, and let multiple clusters further increase concurrent write throughput.
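The optimistic write path described in this post can be sketched roughly as follows. Everything here is an assumption made for illustration: `MetadataCenter`, `try_commit`, and `write` are hypothetical names, and a version counter checked under a local lock stands in for whatever coordination VeloDB actually uses. The point is only the shape of the flow: the expensive transformation runs without coordination, and only the short metadata update is serialized and retried.

```python
import threading


class MetadataCenter:
    """Hypothetical shared metadata store guarded by a version counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.committed = []

    def try_commit(self, expected_version, segment):
        # Commit only if no other writer has committed since we read the version.
        with self._lock:
            if self.version != expected_version:
                return False
            self.committed.append(segment)
            self.version += 1
            return True


def write(meta, cluster, rows):
    # Expensive step, done without coordination: transform rows into a segment.
    segment = (cluster, sorted(rows))
    # Cheap step: retry the metadata update until this writer wins.
    attempts = 0
    while True:
        attempts += 1
        seen = meta.version
        if meta.try_commit(seen, segment):
            return attempts
```

A writer that loses the race repeats only the metadata update, not the transformation, which is why the retry cost stays small in this model.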
Apache Doris

2,577 followers
1w
📢 Apache Doris 2.1.6 is released! This version comes with optimizations and new features in data lakehousing, semi-structured data management, query execution, and more. Highlights include: ☑ Data writeback to #Iceberg tables ☑ Wider support for transparent rewriting in async materialized views ☑ More flexible ingestion, conversion, and processing of semi-structured data https://lnkd.in/gvaNhgrG #database #datalakehouse #opensource

Apache Doris 2.1.6 just released - Apache Doris

doris.apache.org

Apache Doris reposted this

VeloDB

682 followers
1w
Compute Clusters in VeloDB 🌟 The multi-compute cluster architecture of VeloDB is designed to provide read-write isolation and query workload isolation. Implementing such an architecture may not seem challenging in a cloud-native solution with decoupled computation and storage. From a product perspective, however, many key aspects still require careful design. We are going to answer a series of questions about it over a few posts.
💡 Question 1️⃣ 💡 How to ensure data consistency across the compute clusters?
With computation and storage decoupled, data sits in shared storage that multiple compute clusters can access. VeloDB has undergone in-depth refactoring to achieve shared metadata. After data is written into shared storage, the shared metadata is updated first, and only then is the write result returned. Other clusters access the shared metadata center and retrieve the latest data.
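The ordering in this answer (write data, publish it in shared metadata, then acknowledge) can be shown with a minimal sketch. `SharedStore`, `write`, and `read` are hypothetical names, and two in-memory containers stand in for remote storage and the metadata center; this only models the ordering, not VeloDB's actual design.

```python
class SharedStore:
    """Hypothetical stand-in for shared storage plus a shared metadata center."""

    def __init__(self):
        self.objects = {}  # file name -> rows (remote shared storage)
        self.visible = []  # file names published in shared metadata


def write(store, name, rows):
    store.objects[name] = list(rows)  # 1. write data to shared storage
    store.visible.append(name)        # 2. update shared metadata
    return "ok"                       # 3. only then acknowledge the write


def read(store):
    # A reader in any cluster resolves files through metadata,
    # so data that is stored but not yet published stays invisible.
    return [row for name in store.visible for row in store.objects[name]]
```

Because the acknowledgement comes after the metadata update, any cluster that reads after a successful write sees that write, while files still in flight are ignored.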
Apache Doris

2,577 followers
1w
📢 Apache Doris Flink Connector 24.0.0 is released! It supports Flink 1.20 and allows high-speed data reading from Apache Doris using Arrow Flight SQL. Download from GitHub: https://lnkd.in/gPa87rbm Arrow Flight SQL for 10X faster data transfer: https://lnkd.in/gvR7F_hk

Release Apache Doris Flink Connector 24.0.0 Release · apache/doris-flink-connector

github.com

Apache Doris

2,577 followers
1w
Thanks to Riccardo Bernardi, Eng. for introducing Routine Load, one of our most frequently used data ingestion methods, in a succinct way. It's always valuable to receive user feedback on our technologies and hear how they describe them in their own words. https://lnkd.in/gfB7RW4k #Kafka #database #dataengineer #OLAP #opensource #dataplatform

What are routine loads in Apache Doris? And why are they advantageous…

medium.com

Apache Doris

2,577 followers
2w
We've been thrilled to witness the incredible formation of #Lakehouse V2 at Ortege AI in the past months. It's truly an honor to see how Apache Doris is fueling the growth and innovation of a #blockchain and #AI #startup. A huge thank you to Justin Trollip for sharing this exciting journey with us. We look forward to the great products you make. https://lnkd.in/g3HwRjjT #database #analytics #Databricks #dataengineering
Apache Doris

2,577 followers
2w
A quick start guide to building a data lakehouse solution with Apache Doris and Apache #Iceberg. It allows you to: 1️⃣ Build a unified platform for federated data analysis, using the high-performance query engine of Doris to join data from Iceberg and other data sources. 2️⃣ Manage and build Iceberg tables directly through Doris, cleaning and processing data in Doris and writing the results back to Iceberg. 3️⃣ Build an open data storage platform that shares Doris data with upstream and downstream systems for further processing through the Iceberg table engine. https://lnkd.in/djrBNGXA #DataLakehouse #dataengineering #opensource #dataanalytics

Similar pages

Apache Hudi

DuckDB

Polars

Apache Iceberg

Apache XTable (Incubating)

MotherDuck

VeloDB

StarRocks

Delta Lake

MinIO