Apache Doris

Software Development

San Francisco, California 2,577 followers

Apache Doris is an open-source real-time data warehouse based on MPP architecture.

About us

Apache Doris is an open-source real-time data warehouse based on MPP architecture, known for its speed and ease of use. It supports real-time data ingestion and real-time query response in both high-concurrency point query and high-throughput analysis scenarios. With it, users can process and analyze large datasets in the blink of an eye. In June 2022, Apache Doris graduated from the Apache Incubator and became a top-level project of the Apache Software Foundation (ASF). The project has nearly 600 contributors, and more than 20,000 developers use Apache Doris today. Doris also runs in production at more than 2,000 companies around the world and is trusted by business giants such as AWS, Fuse, JD.com, Lenovo, OPPO, Shopee, TikTok, Tencent, Vivo, and Xiaomi. We welcome more open source technology enthusiasts to join the Apache Doris community and discover infinite possibilities together!
Learn more about Apache Doris on GitHub: https://github.com/apache/doris
Join the Apache Doris community on Slack: https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2gmq5o30h-455W226d79zP3L96ZhXIoQ

Website: https://doris.apache.org/
Industry: Software Development
Company size: 201-500 employees
Headquarters: San Francisco, California
Type: Nonprofit
Founded: 2018

Updates

  • Apache Doris

    📢 We are thrilled to announce the release of Apache Doris 2.1.0! For our long-time users, allow us to reintroduce Apache Doris with its new features and substantially improved data writing and query performance. For those who are new to Apache Doris, this is a great time to run a proof of concept and see how it performs in your use case. Buckle up and get ready for:
    🚶‍♂️ 100% faster out-of-the-box performance, as shown by TPC-DS benchmark tests
    🚶‍♀️ Improved data lake analytics capabilities: 4 to 6 times faster than Trino and Spark
    🏃‍♂️ Solid support for semi-structured data analysis
    🏃‍♀️ Materialized views across multiple tables to accelerate multi-table joins
    💃 Enhanced real-time writing efficiency powered by the AUTO_INCREMENT column, AUTO PARTITION, forward placement of MemTable, and Group Commit
    🕺 Better workload management for higher performance stability
    https://lnkd.in/gjVXD6gQ #database #dataengineering #analytics #bigdata #opensource

    Another big leap: Apache Doris 2.1.0 is released - Apache Doris

    doris.apache.org

  • Apache Doris reposted this

    VeloDB

    682 followers

    Compute Clusters in VeloDB 🌟 💡 Question 3️⃣ 💡 How to achieve flexible caching? In a compute-storage decoupled architecture, where object storage and HDFS are typically used as remote shared storage, initial I/O requests often respond slowly. How can we ensure high performance in these situations, and in multi-cluster scenarios as well? VeloDB addresses these challenges with a cache management mechanism. For a single compute cluster, VeloDB defaults to an LRU caching strategy. When the cache is large enough to hold all hot data, it delivers the same performance as the compute-storage coupled architecture at a much lower storage cost. VeloDB also offers manual cache control, allowing users to prioritize certain tables for caching. During cluster scaling, VeloDB automatically pre-heats or migrates caches based on statistical information so that query service stays smooth despite the changes. For multiple compute clusters, VeloDB provides cross-cluster cache synchronization to accelerate queries, with synchronization control down to the partition level. The cache of each compute cluster operates independently, allowing users to control cache size as needed. #database #cloudcompute #dataengineer
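
The LRU strategy mentioned in this post can be illustrated with a small Python sketch. It is a generic least-recently-used cache, not VeloDB's implementation; the class name `LRUCache`, the `slow_remote_read` helper, and the block keys are all hypothetical and exist only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: evicts the least recently used block when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._blocks = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, load_from_remote):
        if key in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(key)
            self.hits += 1
            return self._blocks[key]
        # Cache miss: fetch from the (slow) remote store and cache it.
        self.misses += 1
        value = load_from_remote(key)
        self._blocks[key] = value
        if len(self._blocks) > self.capacity:
            # Drop the least recently used block.
            self._blocks.popitem(last=False)
        return value

    def cached_keys(self):
        return list(self._blocks)


def slow_remote_read(key):
    # Hypothetical stand-in for a read from object storage or HDFS.
    return f"data-for-{key}"


cache = LRUCache(capacity=2)
for block in ["a", "b", "a", "c"]:
    cache.get(block, slow_remote_read)
```

With a capacity of two blocks, the second read of "a" is served from the cache, and loading "c" evicts "b" because "b" was used least recently. If the capacity covered all hot blocks, every repeated read would be a hit, which is the situation the post compares to a coupled architecture.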

  • Apache Doris

    Real-time data processing is more challenging than offline batch processing because it involves complicated operations such as multi-stream JOINs and dimension table changes. It demands more development and maintenance effort, and because system stability must be guaranteed, it often leads to redundant and wasted resources. We are excited to invite the data platform team of TikTok to talk about how they use Apache Doris in their real-time data architecture and how they benefit from it, which could serve as a model for effective real-time data warehousing. https://lnkd.in/gRRpqvkg TikTok also has some job openings for engineers familiar with Apache Doris: https://lnkd.in/gdjxFbmw #TikTok #dataplatform #realtime #dataprocessing #datawarehouse #livestream #ecommerce

  • Apache Doris reposted this

    VeloDB

    Compute Clusters in VeloDB 🌟 💡 Question 2️⃣ 💡 How to allow multiple nodes to process data writes simultaneously? After years of exploration, most relational databases have adopted a read-heavy architecture, where only one cluster is allowed to write data into the shared storage. VeloDB, however, enables multiple clusters to write simultaneously. It uses a Multi-Version Concurrency Control (MVCC) mechanism and a shared metadata center for transaction coordination. Data is first submitted to multiple clusters for transformation processing, followed by distributed coordination during the metadata update phase. The cluster that obtains the lock first commits its write, and the other clusters retry. Since the overhead of data writing occurs primarily during transformation, this distributed coordination mechanism and optimistic locking design allow for multi-read and multi-write capabilities, while also using multiple clusters to further increase concurrent write throughput.
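
The optimistic coordination described in this post can be sketched in Python as a versioned compare-and-set. This is a simplified illustration under stated assumptions, not VeloDB's actual protocol: `MetadataCenter`, `try_commit`, `write_with_retry`, and the cluster names are hypothetical, and a single in-process lock stands in for the distributed lock.

```python
import threading


class MetadataCenter:
    """Hypothetical shared metadata service holding one table version."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.log = []  # (version, cluster) pairs, in commit order

    def try_commit(self, expected_version, cluster):
        # Compare-and-set: succeed only if nobody committed since we looked.
        with self._lock:
            if self.version != expected_version:
                return False
            self.version += 1
            self.log.append((self.version, cluster))
            return True


def write_with_retry(meta, cluster, rows):
    # The expensive transformation runs once, outside any coordination.
    transformed = sorted(rows)
    attempts = 0
    while True:
        attempts += 1
        seen = meta.version
        if meta.try_commit(seen, cluster):
            return attempts, transformed


meta = MetadataCenter()

# Two clusters observe the same version, then race to commit.
seen_a = meta.version
seen_b = meta.version
a_ok = meta.try_commit(seen_a, "cluster-a")  # first committer wins
b_ok = meta.try_commit(seen_b, "cluster-b")  # stale version, rejected
b_retry_ok = meta.try_commit(meta.version, "cluster-b")  # retry succeeds

attempts, data = write_with_retry(meta, "cluster-c", [3, 1, 2])
```

Only the short metadata update is serialized. The transformation work, which the post identifies as the main cost of a write, can proceed on every cluster in parallel, and a losing cluster repeats only the cheap commit step.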

  • Apache Doris

    📢 Apache Doris 2.1.6 is released! This version comes with optimizations and new features in data lakehousing, semi-structured data management, query execution, and more. Highlights include but are not limited to: ☑ Data writeback to #Iceberg tables ☑ Wider support for transparent rewriting in async materialized view ☑ More flexible ingestion, conversion, and processing of semi-structured data https://lnkd.in/gvaNhgrG #database #datalakehouse #opensource

    Apache Doris 2.1.6 just released - Apache Doris

    doris.apache.org

  • Apache Doris reposted this

    VeloDB

    Compute Clusters in VeloDB 🌟 The multi-compute cluster architecture of VeloDB is designed to provide read-write isolation and query workload isolation. Implementing such an architecture may not appear challenging in a cloud-native solution with decoupled computation and storage. From a product perspective, however, many key aspects still require careful design. We are going to answer a series of questions about it over a few posts. 💡 Question 1️⃣ 💡 How to ensure data consistency across the compute clusters? With computation and storage decoupled, data lives in shared storage that is accessible to multiple compute clusters. VeloDB has undergone in-depth refactoring to achieve shared metadata. After data is written into shared storage, the shared metadata is updated first, and only then is the write result returned. Other clusters access the shared metadata center and retrieve the latest data.
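
The write ordering described in this post (store the data, update the shared metadata, then acknowledge) can be sketched in Python as follows. This is a toy model under stated assumptions rather than VeloDB's design: `SharedStorage`, `SharedMetadata`, `ComputeCluster`, and the cluster names "etl" and "bi" are hypothetical.

```python
class SharedStorage:
    """Hypothetical stand-in for object storage shared by all clusters."""

    def __init__(self):
        self.objects = {}

    def put(self, path, rows):
        self.objects[path] = list(rows)

    def get(self, path):
        return self.objects[path]


class SharedMetadata:
    """Hypothetical metadata center: lists the files readers may see."""

    def __init__(self):
        self.visible_paths = []

    def publish(self, path):
        self.visible_paths.append(path)
        return len(self.visible_paths)  # new metadata version


class ComputeCluster:
    def __init__(self, name, storage, metadata):
        self.name = name
        self.storage = storage
        self.metadata = metadata
        self._next_segment = 0

    def write(self, rows):
        path = f"{self.name}/segment-{self._next_segment}"
        self._next_segment += 1
        self.storage.put(path, rows)            # 1. data into shared storage
        version = self.metadata.publish(path)   # 2. shared metadata updated
        return version                          # 3. only now acknowledge

    def read(self):
        # Readers resolve files through the shared metadata, never directly.
        rows = []
        for path in self.metadata.visible_paths:
            rows.extend(self.storage.get(path))
        return rows


storage = SharedStorage()
metadata = SharedMetadata()
writer = ComputeCluster("etl", storage, metadata)
reader = ComputeCluster("bi", storage, metadata)

storage.put("etl/unpublished", [99])  # stored but never published
before = reader.read()
ack = writer.write([1, 2, 3])
after = reader.read()
```

Because readers go through the metadata, a file that exists in storage but has not been published stays invisible, and any write that has been acknowledged is already visible to every other cluster.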

  • Apache Doris

    Thanks to Riccardo Bernardi, Eng. for introducing Routine Load, one of our most frequently used data ingestion methods, in a succinct way.  It's always valuable to receive user feedback on our technologies and hear how they describe them in their own words. https://lnkd.in/gfB7RW4k #Kafka #database #dataengineer #OLAP #opensource #dataplatform

    What are routine loads in Apache Doris? And why are they advantageous…

    medium.com

  • Apache Doris

    We've been thrilled to witness the incredible formation of #Lakehouse V2 at Ortege AI in the past months. It's truly an honor to see how Apache Doris is fueling the growth and innovation of a #blockchain and #AI #startup. A huge thank you to Justin Trollip for sharing this exciting journey with us. We look forward to the great products you make. https://lnkd.in/g3HwRjjT #database #analytics #Databricks #dataengineering

  • Apache Doris

    A quick start guide to building a data lakehouse solution with Apache Doris and Apache #Iceberg. It allows you to:
    1️⃣ Build a unified platform for federated data analysis, using the high-performance query engine of Doris to join data from Iceberg and other data sources.
    2️⃣ Manage and build Iceberg tables directly through Doris, cleaning and processing data in Doris and writing the results back to Iceberg.
    3️⃣ Build an open data storage platform that shares Doris data with upstream and downstream systems for further processing through the Iceberg table engine.
    https://lnkd.in/djrBNGXA #DataLakehouse #dataengineering #opensource #dataanalytics
