Hive

This guide will get you up and running with Apache Iceberg™ using Apache Hive™, including sample code to highlight some powerful features. You can learn more about Iceberg's Hive runtime by checking out the Hive section.

Docker Images🔗

The fastest way to get started is to use the Apache Hive Docker image, which provides a SQL-like interface to create and query Iceberg tables from your laptop. You need to have Docker Desktop installed.

Take a look at the Tags tab of the Apache Hive Docker images to see the available Hive versions.

Set the version variable.

export HIVE_VERSION=4.0.0

To accommodate both Intel-based (x86_64) and Apple Silicon (M1, M2, M3) Macs when running your Docker container, you can use the --platform flag to specify the desired architecture. Apple Silicon Macs use the arm64 architecture, while Intel Macs use the amd64 architecture. Start the container using the option --platform linux/arm64 for a Mac with an M-series chip (on an Intel Mac, use --platform linux/amd64 instead):

docker run -d --platform linux/arm64 -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
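
If the same command has to work on both kinds of Mac, the platform value can be derived from the host architecture instead of being hard-coded. The following is a minimal sketch: the helper name hive_platform is hypothetical, and it assumes that uname -m reports arm64 or aarch64 on ARM machines and x86_64 on Intel machines.

```shell
# Hypothetical helper: map a CPU architecture name to a Docker --platform value.
# With no argument it inspects the current host via `uname -m`.
hive_platform() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    arm64|aarch64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

# Example usage (same flags as the docker run command above):
# docker run -d --platform "$(hive_platform)" -p 10000:10000 -p 10002:10002 \
#   --env SERVICE_NAME=hiveserver2 --name hive4 "apache/hive:${HIVE_VERSION}"
```

Passing an architecture name explicitly, as in hive_platform x86_64, overrides detection, which can be useful when a terminal runs under emulation.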

The docker run command above configures Hive to use the embedded Derby database for the Hive Metastore. The Hive Metastore functions as the Iceberg catalog, which is used to locate Iceberg files; the files themselves can be stored anywhere.

Give HiveServer2 (HS2) a little time to come up in the Docker container, and then start the Hive Beeline client using the following command to connect to the HS2 container you already started:

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'
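
Instead of waiting a fixed amount of time for HS2, a script can poll until a connection succeeds. The sketch below is one possible approach, not part of the Hive tooling: retry is a hypothetical helper, the attempt count and delay are arbitrary, and the usage line assumes that beeline exits with a non-zero status when it cannot connect.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage: poll HS2 with a trivial query, at most 30 attempts, 2 seconds apart.
# retry 30 2 docker exec hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e 'show databases;'
```

Once the polling command succeeds, the interactive Beeline session can be started as shown above.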

The Beeline prompt appears:

0: jdbc:hive2://localhost:10000>

You can now run SQL queries to create Iceberg tables and query the tables.

show databases;

Creating a Table🔗

To create your first Iceberg table in Hive, run a CREATE TABLE command. Let's create a table named nyc.taxis, where nyc is the database name and taxis is the table name.

CREATE DATABASE nyc;
CREATE TABLE nyc.taxis
(
  trip_id bigint,
  trip_distance float,
  fare_amount double,
  store_and_fwd_flag string
)
PARTITIONED BY (vendor_id bigint) STORED BY ICEBERG;

Iceberg catalogs support the full range of SQL DDL commands, including CREATE TABLE, CREATE TABLE AS SELECT, CREATE TABLE LIKE TABLE, ALTER TABLE, and DROP TABLE.

Writing Data to a Table🔗

After your table is created, you can insert records.

INSERT INTO nyc.taxis
VALUES (1000371, 1.8, 15.32, 'N', 1), (1000372, 2.5, 22.15, 'N', 2), (1000373, 0.9, 9.01, 'N', 2), (1000374, 8.4, 42.13, 'Y', 1);

Reading Data from a Table🔗

To read a table, simply use the Iceberg table's name.

SELECT * FROM nyc.taxis;
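
Queries do not have to be typed at the interactive prompt. As a rough sketch, a small wrapper can pass a single statement to Beeline with its -e option; hive_sql is a hypothetical helper name, and the container name and JDBC URL are assumed to match the commands used earlier in this guide.

```shell
# Hypothetical helper: run one SQL statement in the hive4 container through Beeline.
# Assumes the container name (hive4) and JDBC URL used earlier in this guide.
hive_sql() {
  docker exec hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "$1"
}

# Example usage:
# hive_sql 'SELECT * FROM nyc.taxis;'
# hive_sql 'SELECT vendor_id, count(*) FROM nyc.taxis GROUP BY vendor_id;'
```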

Next steps🔗

Adding Iceberg to Hive🔗

If you already have a Hive 4.0.0 or later environment, it comes with Iceberg 1.4.3 included. No additional downloads or jars are needed. If you have a Hive 2.3.x or Hive 3.1.x environment, see Enabling Iceberg support in Hive.
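
A setup script can branch on this distinction. The following sketch uses a hypothetical helper, hive_has_builtin_iceberg, that takes a version string such as 3.1.3 or 4.0.0 and compares only the major version; it does not distinguish pre-release builds from final releases.

```shell
# Hypothetical helper: succeed if a Hive version string has major version 4 or higher,
# i.e. a release line that ships with Iceberg included.
hive_has_builtin_iceberg() {
  major="${1%%.*}"   # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) echo "not a version number: $1" >&2; return 2 ;;
  esac
  [ "$major" -ge 4 ]
}

# Example usage:
# if hive_has_builtin_iceberg "$HIVE_VERSION"; then
#   echo "Iceberg is bundled; no extra jars needed"
# else
#   echo "See: Enabling Iceberg support in Hive"
# fi
```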

Learn More🔗

To learn more about setting up a database other than Derby, see Apache Hive Quick Start. You can also set up a standalone metastore, HS2 and Postgres. Now that you're up and running with Iceberg and Hive, check out the Iceberg-Hive docs to learn more!