Apache Flink

Technology, Information and Internet

Stateful Computations over Data Streams

About us

Apache Flink is an open-source framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This page isn't affiliated with the Apache Flink project and doesn't represent PMC opinions. For official news, please check the communication channels provided by the project: https://flink.apache.org/posts/

Website
https://flink.apache.org/
Industry
Technology, Information and Internet
Company size
1 employee
Type
Nonprofit

Updates

  • Flink is leading the Streaming space 🚀

    Alex Campos

    Digital Data Strategist

    Customers are becoming digital, and digital needs data NOW. Data analytics quickly evolved from legacy data warehouses to Big Data, but always looking at "what happened", a retrospective view. Businesses need fresh data to move faster and make smarter decisions, and that means real-time data. I am happy to introduce the "Stream Processing Landscape", a general guideline to how Apache Flink, a leading open source engine for real-time data, is setting the pace for stream processing and fits into the enterprise ecosystem.

    🟢 Structured: well-known and well-governed, structured data is estimated to make up around 20% of all corporate data in the world. It is mainly stored in databases, with defined schemas. Apache Flink commonly leverages a CDC strategy to consume data from databases in real time (see the sketch after this post).

    🟢 Unstructured: the remaining 80% of enterprise data is a combination of unstructured formats. Cost-effective and scalable Big Data solutions, such as cloud-native storage and Apache Kafka, help companies safeguard and store years of logs, machine data and images.

    🟢 Enterprise Apps: streaming data should integrate with the application ecosystem, triggering actions for next-best offers, ad-hoc advertising and up-selling opportunities. Specialized solutions for marketing, point of sale and enterprise management can be augmented with real-time data and AI.

    🟢 Data Ecosystem: Apache Flink leverages the most robust and mature frameworks and engines available in the data ecosystem, including open data formats like Apache Parquet and newer data management approaches such as Apache Iceberg and Fluss for Lakehouses. These open standards ensure interoperability and the freedom to adopt any tool, any vendor.

    #StreamProcessing #RealtimeData #ApacheFlink #ApacheIceberg #Lakehouse
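    A minimal sketch of the CDC pattern described above, using Flink SQL from Scala. The table shop.orders and the connection details are hypothetical, and the mysql-cdc connector requires the Flink CDC connector dependency on the classpath:

        import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

        object CdcSketch {
          def main(args: Array[String]): Unit = {
            val env = TableEnvironment.create(EnvironmentSettings.inStreamingMode())

            // Register a CDC source over the (hypothetical) MySQL table shop.orders;
            // every INSERT/UPDATE/DELETE on it reaches Flink as a change event.
            env.executeSql(
              """CREATE TABLE orders (
                |  order_id BIGINT,
                |  customer_id BIGINT,
                |  amount DECIMAL(10, 2),
                |  PRIMARY KEY (order_id) NOT ENFORCED
                |) WITH (
                |  'connector'     = 'mysql-cdc',
                |  'hostname'      = 'localhost',
                |  'port'          = '3306',
                |  'username'      = 'flink',
                |  'password'      = 'secret',
                |  'database-name' = 'shop',
                |  'table-name'    = 'orders'
                |)""".stripMargin)

            // A continuously updating aggregate over the change stream.
            env.executeSql(
              "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
            ).print()
          }
        }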

  • Bayu Setiawan

    Architecting Scalable Lakehouse Solutions with Modern Stack | Python, Airflow, DBT, Spark, Kafka, Flink, Iceberg, ClickHouse, MinIO, Docker & k8s

    This article sat in my Medium drafts for more than a month; I almost forgot to hit publish! I’m excited (and a little nervous) to finally share my first deep dive into a real-time data pipeline, and also my first time using Kafka, Debezium, Flink, Scala, and ClickHouse all together. The goal? Keep today’s data blazing fast while storing history cheaply. In the write-up, I walk through:

    1. Setting up CDC from PostgreSQL to Kafka.
    2. Streaming writes into ClickHouse with an Apache Flink Scala job in the hot layer (a sketch follows this post).
    3. Splitting local vs. remote storage inside ClickHouse.
    4. Daily Apache Airflow jobs running the cold-layer tasks: Bronze Hot -> Bronze Cold -> Silver Cold -> Gold Cold.
    5. A simple dashboard to bring it all to life.

    It’s far from perfect, but I learned a ton, and I hope this write-up saves you a few rabbit holes (and headaches) if you’re building something similar.

    #Kafka #DataEngineering #RealtimeAnalysis #Clickhouse #ApacheFlink
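    For step 2, a simplified sketch of what such a hot-layer job can look like. The OrderEvent shape, the orders_hot table, and the connection details are illustrative; it assumes flink-connector-jdbc plus the ClickHouse JDBC driver on the classpath:

        import java.sql.PreparedStatement

        import org.apache.flink.connector.jdbc.{JdbcConnectionOptions, JdbcExecutionOptions, JdbcSink, JdbcStatementBuilder}
        import org.apache.flink.streaming.api.scala._

        // Hypothetical event shape for the hot layer.
        case class OrderEvent(orderId: Long, status: String, amount: Double)

        object HotLayerJob {
          def main(args: Array[String]): Unit = {
            val env = StreamExecutionEnvironment.getExecutionEnvironment

            // Stand-in source; in the article the events come from Kafka topics
            // populated by Debezium CDC on PostgreSQL.
            val events: DataStream[OrderEvent] = env.fromElements(
              OrderEvent(1L, "paid", 42.0),
              OrderEvent(2L, "shipped", 13.5))

            // Bind each event to the INSERT statement's placeholders.
            val setParams: JdbcStatementBuilder[OrderEvent] =
              (stmt: PreparedStatement, e: OrderEvent) => {
                stmt.setLong(1, e.orderId)
                stmt.setString(2, e.status)
                stmt.setDouble(3, e.amount)
              }

            // Batched writes into ClickHouse over JDBC.
            events.addSink(JdbcSink.sink[OrderEvent](
              "INSERT INTO orders_hot (order_id, status, amount) VALUES (?, ?, ?)",
              setParams,
              JdbcExecutionOptions.builder().withBatchSize(500).build(),
              new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                .withUrl("jdbc:clickhouse://localhost:8123/default")
                .withDriverName("com.clickhouse.jdbc.ClickHouseDriver")
                .build()))

            env.execute("hot-layer-clickhouse")
          }
        }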

  • If you are in the Amsterdam area, this is a great opportunity to learn from Flink experts.

    Alex Campos

    Digital Data Strategist

    Hallo Nederland! 🇳🇱 Amsterdam is ready to receive some "real-time" data with Apache Flink® Lab Day hosted by Ververica 🚀 I am excited to bring to life the very first Lab Day powered by Apache Flink, and the beautiful city of Amsterdam will be our first stop 🌏

    🔥 What to expect from Lab Day, hosted by Ververica | Original creators of Apache Flink®?
    ⏩ Leverage Apache Flink® to develop an end-to-end use case.
    ⏩ Build your first streaming application (a minimal example follows this post).
    ⏩ Connect to the most common data sources, such as Kafka and databases.
    ⏩ Become a Streaming Champion.

    📅 Thursday, September 18, 2025
    🕣 8:30 AM - 12:00 PM
    📍 Central Amsterdam, Netherlands

    Secure your spot here: https://lnkd.in/dec3SgQ3

    Bouke van der Meer Francie Kastl Jaime López Maciej Mojsiewicz Michael R Misurell Spencer Arnold Karin Landers Joseph Gade Mitchell Gray Rémi Forest Mark Maxwell Roma Astemberg Thomas Gérard Jun Q. Yan Yue
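    For a head start before the session, a first streaming application in Scala can be as small as the sketch below. The broker address and the "events" topic are hypothetical, and it assumes the flink-connector-kafka dependency:

        import org.apache.flink.api.common.eventtime.WatermarkStrategy
        import org.apache.flink.api.common.serialization.SimpleStringSchema
        import org.apache.flink.connector.kafka.source.KafkaSource
        import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
        import org.apache.flink.streaming.api.scala._

        object FirstStreamingApp {
          def main(args: Array[String]): Unit = {
            val env = StreamExecutionEnvironment.getExecutionEnvironment

            // Read the (hypothetical) "events" topic from a local Kafka broker.
            val source = KafkaSource.builder[String]()
              .setBootstrapServers("localhost:9092")
              .setTopics("events")
              .setGroupId("lab-day")
              .setStartingOffsets(OffsetsInitializer.earliest())
              .setValueOnlyDeserializer(new SimpleStringSchema())
              .build()

            env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka")
              .map(_.toUpperCase)  // stand-in for real business logic
              .print()

            env.execute("first-streaming-app")
          }
        }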

  • Apache Flink reposted this

    Wide Table Support and Blob Columns in Apache Hudi.

    To improve query performance, users often denormalize relational schemas into wide tables, performing multi-stream joins in real-time processing systems like Apache Flink. Similarly, #MachineLearning feature stores curate 100s or 1000s of features per entity to build deeper models and improve training granularity. Additionally, large columns such as blobs (images, PDFs) can outweigh primitive/nested columns, making traditional columnar storage formats inefficient. To handle such wide tables, #lakehouse table formats need to evolve!

    Hudi’s new "Column Family" work introduces a new way to organize and update data, improving write, read, and compaction performance. So, how does it work?
    - Instead of storing all columns in a single file, Hudi splits data into column families (groups of related columns).
    - Each FileGroup consists of multiple column family files, making updates more granular (see the toy sketch after this post).

    Benefits?
    ✅ Faster writes: instead of rewriting all columns, only the affected column families get updated.
    ✅ Efficient reads: queries scan only the relevant column families, reducing memory/compute overhead.
    ✅ Improved sort-merge joins: sorting within column families enhances Flink’s join performance.
    ✅ Independent compaction: compaction runs per column family, avoiding full-row rewrites.

    How column families benefit wide tables and blob columns:
    ✅ Wide tables: update costs drop, since only the modified column families get rewritten.
    ✅ Blob columns: blob data can live in a separate column family, preventing unnecessary read overhead.

    Read more in the RFC: https://lnkd.in/din64Thk

    #dataengineering #softwareengineering
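    To make the layout idea concrete, here is a toy Scala sketch (not Hudi’s actual API) of a file group split into column families, where an update rewrites only the family it touches:

        // Toy model: one file per column family instead of one file for all columns.
        object ColumnFamilySketch {
          type Row = Map[String, Any]

          case class FileGroup(families: Map[String, Vector[Row]])

          // Rewrite only the family that the update touches; all other
          // families' files stay untouched.
          def update(fg: FileGroup, family: String, rowIdx: Int, changes: Row): FileGroup = {
            val rows      = fg.families(family)
            val rewritten = rows.updated(rowIdx, rows(rowIdx) ++ changes)
            fg.copy(families = fg.families.updated(family, rewritten))
          }

          def main(args: Array[String]): Unit = {
            val fg = FileGroup(Map(
              "core"     -> Vector(Map("order_id" -> 1, "status" -> "new")),
              "features" -> Vector(Map("f1" -> 0.1, "f2" -> 0.2)),          // wide ML features
              "blobs"    -> Vector(Map("invoice_pdf" -> Array.empty[Byte])) // large blob column
            ))
            // Updating "status" rewrites only the "core" family; "features"
            // and "blobs" are untouched, which is the claimed write win.
            println(update(fg, "core", 0, Map("status" -> "paid")).families("core"))
          }
        }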

