Showing 170 open source projects for "etl"

  • 1
    Ethereum ETL

    Python scripts for ETL (extract, transform and load) jobs for Ethereum

    Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery. Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
    Downloads: 3 This Week
  • 2
    Embedded Template Library (ETL)

    Embedded Template Library

    C++ is a great language to use for embedded applications and templates are a powerful aspect. The standard library can offer a great deal of well-tested functionality, but there are some parts of the standard library that do not fit well with deterministic behavior and limited resource requirements. These limitations usually preclude the use of dynamically allocated memory and containers with open-ended sizes. What is needed is a template library where the user can declare the size, or...
    Downloads: 0 This Week
  • 3
    InfluxDB

    The open source time series database

    InfluxDB is an open source time series datastore designed to handle high write and query loads. Time series is currently the fastest growing database category there is, and InfluxDB is here to ensure businesses can keep up. InfluxDB provides infrastructure and application monitoring, IoT monitoring and analytics and more. It has APIs for storing and querying data, processing it in the background for ETL or monitoring and alerting purposes. This data can also be visualized, explored and more...
    Downloads: 38 This Week
  • 4
    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 5 This Week
  • 5
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    ... ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. Convert column names to be compatible with Amazon Athena and the AWS Glue Catalog. Run a query against Amazon CloudWatch Logs Insights and convert the results to a Pandas DataFrame. Get a QuickSight dashboard ID given a name, failing if more than one ID is associated with that name. List IAM policy assignments in the current Amazon QuickSight account.
    Downloads: 11 This Week
  • 6
    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 6 This Week
  • 7
    Dungbeetle

    A distributed job server

    Dungbeetle is a lightweight distributed job server developed by Zerodha for queuing and asynchronously executing large numbers of SQL read jobs, such as reports, against SQL databases. The results of each job are written to a separate results database, where the calling application can read and process them.
    Downloads: 6 This Week
  • 8
    Rubix ML

    A high-level machine learning and deep learning library for PHP

    Rubix ML is a free open-source machine learning (ML) library that allows you to build programs that learn from your data using the PHP language. We provide tools for the entire machine learning life cycle from ETL to training, cross-validation, and production with over 40 supervised and unsupervised learning algorithms. In addition, we provide tutorials and other educational content to help you get started using ML in your projects. Our intuitive interface is quick to grasp while hiding a lot...
    Downloads: 6 This Week
  • 9
    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 6 This Week
  • 10
    lakeFS

    lakeFS - Git-like capabilities for your object storage

    ... is version controlled and you can easily time-travel between consistent snapshots of the lake. Easier ETL testing: test your ETLs on top of production data, in isolation, without copying anything. Safely experiment and test on full production data. Easily collaborate on production data with your team. Automate data quality checks within data pipelines.
    Downloads: 8 This Week
  • 11
    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflow

    .... As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you. Hamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, and BI dashboards. The Hamilton UI lets you automatically visualize, catalog, and monitor execution.
    Downloads: 5 This Week
  • 12
    Daft

    Distributed DataFrame for Python designed for the cloud

    Daft is a framework for ETL, analytics and ML/AI at scale. Its familiar Python DataFrame API is built to outperform Spark in both speed and ease of use. Daft plugs directly into your ML/AI stack through efficient zero-copy integrations with essential Python libraries such as PyTorch and Ray. It also allows requesting GPUs as a resource for running models. Daft runs locally with a lightweight multithreaded backend. When your local machine is no longer sufficient, it scales seamlessly to run out...
    Downloads: 5 This Week
  • 13
    Steampipe

    Zero-ETL, infinite possibilities. Live query APIs, code & more

    Steampipe is the zero-ETL solution for getting data directly from APIs and services. We offer these Steampipe engines. SQL has been the data access standard for decades. It levels the playing field for your team, easily integrates with other systems, and accelerates delivery. Painlessly join live cloud configuration data with internal or external data sets to create new insights. Your cloud is a live database that changes fast. Don't wait on ETL to sync, or rely on old data. Crunch it where...
    Downloads: 2 This Week
  • 14
    UIforETW

    User interface for recording and managing ETW traces

    UIforETW is a Windows performance tracing companion that wraps the Event Tracing for Windows (ETW) toolchain in an approachable GUI. It standardizes trace collection profiles, launches WPR/xperf with the right providers, and organizes the resulting .etl files for repeatable investigations. The tool streamlines the entire loop—record, annotate, open in WPA/XperfView—so engineers can focus on finding scheduling stalls, I/O bottlenecks, GC pauses, or GPU hitches instead of memorizing command-line...
    Downloads: 5 This Week
  • 15
    Erigon

    Ethereum implementation on the efficiency frontier

    Erigon is an implementation of Ethereum (execution client), on the efficiency frontier, written in Go. For an Archive node of Ethereum Mainnet we recommend >=3TB storage space: 1.8TB state (as of March 2022), 200GB temp files (can symlink or mount folder <datadir>/etl-tmp to another disk). Ethereum Mainnet Full node (see --prune* flags): 400GB. Erigon by default is an "all in one binary" solution, but it's possible to start TxPool as a separate process. The same is true of: JSON RPC layer (RPCDaemon...
    Downloads: 5 This Week
  • 16
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    Pathway is an open-source framework designed for building real-time data applications using reactive and declarative paradigms. It enables seamless integration of live data streams and structured data into analytical pipelines with minimal latency. Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously...
    Downloads: 4 This Week
  • 17
    omniparser

    Native Golang ETL streaming parser and transform library

    Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.
    Downloads: 1 This Week
  • 18
    TiDB

    Open Source NewSQL Database

    TiDB is an open source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is currently the most actively developed open source NewSQL database, and has a rich set of features including horizontal scalability, strong consistency, and high availability.
    Downloads: 1 This Week
  • 19
    Superduper

    Superduper: Integrate AI models and machine learning workflows

    ... developers to completely avoid implementing MLOps, ETL pipelines, model deployment, data migration, and synchronization. Using Superduper is simply "CAPE": Connect to your data, apply arbitrary AI to that data, package and reuse the application on arbitrary data, and execute AI-database queries and predictions on the resulting AI outputs and data.
    Downloads: 3 This Week
  • 20
    Prefect

    Prefect is a workflow orchestration framework

    Prefect is an open-source modern workflow orchestration tool for scheduling, monitoring, and managing data workflows and tasks. It enables Python-native pipeline definitions with robust retries, caching, observability, and a powerful UI—ideal for data engineering and ETL processes.
    Downloads: 1 This Week
  • 21
    ImportExcel

    PowerShell module to import/export Excel spreadsheets, without Excel

    ... styling or conditional formatting programmatically. The module is optimized for performance (streaming where possible) and supports large datasets, making it useful for ETL tasks, automated reporting, and data analysis in pure PowerShell environments. It integrates well with scheduled jobs and CI pipelines where generating or consuming spreadsheets is part of an automated workflow.
    Downloads: 2 This Week
  • 22
    AlaSQL

    JavaScript SQL database for browser and Node.js for relational tables

    AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and nested JSON data (NoSQL). Export, store, and import data from localStorage, IndexedDB, or Excel. We focus on speed by taking advantage of the dynamic nature of JavaScript when building up queries. Real-world solutions demand flexibility regarding where data comes from and where it is to be stored. We focus on flexibility by making sure you can import/export and query directly on data...
    Downloads: 2 This Week
  • 23
    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 1 This Week
  • 24
    Apache Spark

    A unified analytics engine for large-scale data processing

    ... (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 1 This Week
  • 25
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 1 This Week