Apache Flink

Technology, Information and Internet

Stateful Computations over Data Streams

About us

Apache Flink is an open-source framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This page isn't affiliated with the Apache Flink project and doesn't represent PMC opinions. For official news, please check the communication channels provided by the project: https://flink.apache.org/posts/

Website
https://flink.apache.org/
Industry
Technology, Information and Internet
Company size
1 employee
Type
Nonprofit

Updates

  • Flink is leading the Streaming space 🚀

    Alex Campos

    Digital Data Strategist

    Customers are becoming digital, and digital needs data NOW. Data analytics quickly evolved from legacy data warehouses to Big Data, but always looking at "what happened", a retrospective view. Businesses need fresh data to move faster and make smarter decisions, and that means real-time data. I am happy to introduce the "Stream Processing Landscape", a general guideline to how Apache Flink, a leading open source engine for real-time data, is setting the pace for stream processing and fits into the enterprise ecosystem.

    🟢 Structured: well-known and well-governed, structured data is estimated to make up around 20% of all corporate data in the world. It is mainly stored in databases, with defined schemas. Apache Flink commonly leverages a CDC strategy to consume data from databases in real time (see the sketch after this post).

    🟢 Unstructured: the remaining 80% of enterprise data is a combination of unstructured formats. Cost-effective and scalable Big Data solutions, such as cloud-native storage and Apache Kafka, help companies safeguard and store years of logs, machine data and images.

    🟢 Enterprise Apps: streaming data should integrate with the application ecosystem, triggering actions for next-best offers, ad-hoc advertising and up-selling opportunities. Specialized solutions for marketing, point of sale and enterprise management can be augmented with real-time data and AI.

    🟢 Data Ecosystem: Apache Flink leverages the most robust and mature frameworks and engines available in the data ecosystem, including open data formats like Apache Parquet and newer data management approaches such as Apache Iceberg and Fluss for Lakehouses. These open standards ensure interoperability and the freedom to adopt any tool, any vendor.

    #StreamProcessing #RealtimeData #ApacheFlink #ApacheIceberg #Lakehouse
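    A minimal sketch of the CDC pattern described above, using Flink SQL from Scala. The table shop.orders and the connection details are hypothetical, and the mysql-cdc connector requires the Flink CDC connector dependency on the classpath:

        import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

        object CdcSketch {
          def main(args: Array[String]): Unit = {
            val env = TableEnvironment.create(EnvironmentSettings.inStreamingMode())

            // Register a CDC source over the (hypothetical) MySQL table shop.orders;
            // every INSERT/UPDATE/DELETE on it reaches Flink as a change event.
            env.executeSql(
              """CREATE TABLE orders (
                |  order_id BIGINT,
                |  customer_id BIGINT,
                |  amount DECIMAL(10, 2),
                |  PRIMARY KEY (order_id) NOT ENFORCED
                |) WITH (
                |  'connector'     = 'mysql-cdc',
                |  'hostname'      = 'localhost',
                |  'port'          = '3306',
                |  'username'      = 'flink',
                |  'password'      = 'secret',
                |  'database-name' = 'shop',
                |  'table-name'    = 'orders'
                |)""".stripMargin)

            // A continuously updating aggregate over the change stream.
            env.executeSql(
              "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
            ).print()
          }
        }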

  • Bayu Setiawan

    Architecting Scalable Lakehouse Solutions with Modern Stack | Python, Airflow, DBT, Spark, Kafka, Flink, Iceberg, ClickHouse, MinIO, Docker & k8s

    This article sat in my Medium drafts for more than a month; I almost forgot to hit publish! I’m excited (and a little nervous) to finally share my first deep dive into a real-time data pipeline, and also my first time using Kafka, Debezium, Flink, Scala, and ClickHouse all together. The goal? Keep today’s data blazing fast while storing history cheaply. In the write-up, I walk through:

    1. Setting up CDC from PostgreSQL to Kafka.
    2. Streaming writes into ClickHouse with an Apache Flink Scala job in the hot layer (a sketch follows this post).
    3. Splitting local vs. remote storage inside ClickHouse.
    4. Daily Apache Airflow jobs running the cold-layer tasks: Bronze Hot -> Bronze Cold -> Silver Cold -> Gold Cold.
    5. A simple dashboard to bring it all to life.

    It’s far from perfect, but I learned a ton, and I hope this write-up saves you a few rabbit holes (and headaches) if you’re building something similar.

    #Kafka #DataEngineering #RealtimeAnalysis #Clickhouse #ApacheFlink
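    For step 2, a simplified sketch of what such a hot-layer job can look like. The OrderEvent shape, the orders_hot table, and the connection details are illustrative; it assumes flink-connector-jdbc plus the ClickHouse JDBC driver on the classpath:

        import java.sql.PreparedStatement

        import org.apache.flink.connector.jdbc.{JdbcConnectionOptions, JdbcExecutionOptions, JdbcSink, JdbcStatementBuilder}
        import org.apache.flink.streaming.api.scala._

        // Hypothetical event shape for the hot layer.
        case class OrderEvent(orderId: Long, status: String, amount: Double)

        object HotLayerJob {
          def main(args: Array[String]): Unit = {
            val env = StreamExecutionEnvironment.getExecutionEnvironment

            // Stand-in source; in the article the events come from Kafka topics
            // populated by Debezium CDC on PostgreSQL.
            val events: DataStream[OrderEvent] = env.fromElements(
              OrderEvent(1L, "paid", 42.0),
              OrderEvent(2L, "shipped", 13.5))

            // Bind each event to the INSERT statement's placeholders.
            val setParams: JdbcStatementBuilder[OrderEvent] =
              (stmt: PreparedStatement, e: OrderEvent) => {
                stmt.setLong(1, e.orderId)
                stmt.setString(2, e.status)
                stmt.setDouble(3, e.amount)
              }

            // Batched writes into ClickHouse over JDBC.
            events.addSink(JdbcSink.sink[OrderEvent](
              "INSERT INTO orders_hot (order_id, status, amount) VALUES (?, ?, ?)",
              setParams,
              JdbcExecutionOptions.builder().withBatchSize(500).build(),
              new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                .withUrl("jdbc:clickhouse://localhost:8123/default")
                .withDriverName("com.clickhouse.jdbc.ClickHouseDriver")
                .build()))

            env.execute("hot-layer-clickhouse")
          }
        }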

  • If you are in the Amsterdam area, this is a great opportunity to learn from Flink experts.

    Alex Campos

    Digital Data Strategist

    Hallo Nederland! 🇳🇱 Amsterdam is ready to receive some "real-time" data with Apache Flink® Lab Day hosted by Ververica 🚀 I am excited to bring to life the very first Lab Day powered by Apache Flink, and the beautiful city of Amsterdam will be our first stop 🌏

    🔥 What to expect from Lab Day, hosted by Ververica | Original creators of Apache Flink®?
    ⏩ Leverage Apache Flink® to develop an end-to-end use case.
    ⏩ Build your first streaming application (a minimal example follows this post).
    ⏩ Connect to the most common data sources, such as Kafka and databases.
    ⏩ Become a Streaming Champion.

    📅 Thursday, September 18, 2025
    🕣 8:30 AM - 12:00 PM
    📍 Central Amsterdam, Netherlands

    Secure your spot here: https://lnkd.in/dec3SgQ3

    Bouke van der Meer Francie Kastl Jaime López Maciej Mojsiewicz Michael R Misurell Spencer Arnold Karin Landers Joseph Gade Mitchell Gray Rémi Forest Mark Maxwell Roma Astemberg Thomas Gérard Jun Q. Yan Yue
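    For a head start before the session, a first streaming application in Scala can be as small as the sketch below. The broker address and the "events" topic are hypothetical, and it assumes the flink-connector-kafka dependency:

        import org.apache.flink.api.common.eventtime.WatermarkStrategy
        import org.apache.flink.api.common.serialization.SimpleStringSchema
        import org.apache.flink.connector.kafka.source.KafkaSource
        import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
        import org.apache.flink.streaming.api.scala._

        object FirstStreamingApp {
          def main(args: Array[String]): Unit = {
            val env = StreamExecutionEnvironment.getExecutionEnvironment

            // Read the (hypothetical) "events" topic from a local Kafka broker.
            val source = KafkaSource.builder[String]()
              .setBootstrapServers("localhost:9092")
              .setTopics("events")
              .setGroupId("lab-day")
              .setStartingOffsets(OffsetsInitializer.earliest())
              .setValueOnlyDeserializer(new SimpleStringSchema())
              .build()

            env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka")
              .map(_.toUpperCase)  // stand-in for real business logic
              .print()

            env.execute("first-streaming-app")
          }
        }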

  • Apache Flink reposted this

    Wide Table Support and Blob Columns in Apache Hudi.

    To improve query performance, users often denormalize relational schemas into wide tables, performing multi-stream joins in real-time processing systems like Apache Flink. Similarly, #MachineLearning feature stores curate 100s or 1000s of features per entity to build deeper models and improve training granularity. Additionally, large columns such as blobs (images, PDFs) can outweigh primitive/nested columns, making traditional columnar storage formats inefficient. To handle such wide tables, #lakehouse table formats need to evolve!

    Hudi’s new "Column Family" work introduces a new way to organize and update data, improving write, read, and compaction performance. So, how does it work?
    - Instead of storing all columns in a single file, Hudi splits data into column families (groups of related columns).
    - Each FileGroup consists of multiple column family files, making updates more granular (see the toy sketch after this post).

    Benefits?
    ✅ Faster writes: instead of rewriting all columns, only the affected column families get updated.
    ✅ Efficient reads: queries scan only the relevant column families, reducing memory/compute overhead.
    ✅ Improved sort-merge joins: sorting within column families enhances Flink’s join performance.
    ✅ Independent compaction: compaction runs per column family, avoiding full-row rewrites.

    How column families benefit wide tables and blob columns:
    ✅ Wide tables: update costs drop, since only the modified column families get rewritten.
    ✅ Blob columns: blob data can live in a separate column family, preventing unnecessary read overhead.

    Read more in the RFC: https://lnkd.in/din64Thk

    #dataengineering #softwareengineering
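    To make the layout idea concrete, here is a toy Scala sketch (not Hudi’s actual API) of a file group split into column families, where an update rewrites only the family it touches:

        // Toy model: one file per column family instead of one file for all columns.
        object ColumnFamilySketch {
          type Row = Map[String, Any]

          case class FileGroup(families: Map[String, Vector[Row]])

          // Rewrite only the family that the update touches; all other
          // families' files stay untouched.
          def update(fg: FileGroup, family: String, rowIdx: Int, changes: Row): FileGroup = {
            val rows      = fg.families(family)
            val rewritten = rows.updated(rowIdx, rows(rowIdx) ++ changes)
            fg.copy(families = fg.families.updated(family, rewritten))
          }

          def main(args: Array[String]): Unit = {
            val fg = FileGroup(Map(
              "core"     -> Vector(Map("order_id" -> 1, "status" -> "new")),
              "features" -> Vector(Map("f1" -> 0.1, "f2" -> 0.2)),          // wide ML features
              "blobs"    -> Vector(Map("invoice_pdf" -> Array.empty[Byte])) // large blob column
            ))
            // Updating "status" rewrites only the "core" family; "features"
            // and "blobs" are untouched, which is the claimed write win.
            println(update(fg, "core", 0, Map("status" -> "paid")).families("core"))
          }
        }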

