
dirgrab 0.3.1

CLI tool to concatenate file contents from directories, respecting Git context.
dirgrab-0.3.1 is not a library.

dirgrab ๐Ÿ“โšก


dirgrab walks a directory (or Git repository), selects the files that matter, and concatenates their contents for easy copy/paste into language models. It can write to stdout, a file, or your clipboard, and it ships with a library crate so the same logic can be embedded elsewhere.

Highlights

  • 🔧 Configurable defaults – merge built-in defaults with global config.toml, project-local .dirgrab.toml, .dirgrabignore, and CLI flags.
  • 🧭 Git-aware out of the box – untracked files are included by default and the listing is scoped to the selected subdirectory; use --tracked-only to drop untracked files and --all-repo to cover the whole repository.
  • 🗂️ Structured context – optional directory tree, per-file headers, PDF text extraction, and deterministic file ordering for stable diffs.
  • 🧮 Better stats – -s/--stats now prints summary totals plus a per-file token leaderboard, and you can pick which reports to show each run.
  • 🙅 Safety nets – automatically ignores the active output file, respects .gitignore, and gracefully skips binary/non-UTF8 files.

Installation

cargo install dirgrab
# or from a local checkout
# cargo install --path .

Check it worked:

dirgrab --version

Usage

dirgrab [OPTIONS] [TARGET_PATH]

TARGET_PATH defaults to the current directory. When invoked inside a Git repo, dirgrab scopes the listing to that subtree unless you pass --all-repo.
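The subtree rule can be pictured as a filter over the repository-relative paths Git reports, followed by the sort that keeps output order stable between runs. A minimal sketch, assuming the paths are already relative to the repository root; `scope_and_sort` is a hypothetical helper, not part of dirgrab-lib.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: keep only repo-relative paths under `subtree`
/// (unless `all_repo` is set), then sort them so the order is the same
/// on every run.
fn scope_and_sort(paths: &[PathBuf], subtree: &Path, all_repo: bool) -> Vec<PathBuf> {
    let mut selected: Vec<PathBuf> = paths
        .iter()
        // Path::starts_with compares whole components, so "src" does
        // not match "srcs/x.rs".
        .filter(|p| all_repo || p.starts_with(subtree))
        .cloned()
        .collect();
    // PathBuf's ordering is component-wise, which gives a stable listing.
    selected.sort();
    selected
}
```

Because `Path::starts_with` compares whole components, a target of `src` does not accidentally pull in a sibling such as `srcs/`.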

Common Options

  • -o, --output [FILE] – write to a file (defaults to dirgrab.txt if no name is given). Conflicts with --clipboard.
  • -c, --clipboard – copy to the system clipboard instead of stdout or a file.
  • --no-headers / --no-tree / --no-pdf – disable headers, the directory tree, or PDF extraction.
  • -e, --exclude <PATTERN> – add glob-style excludes (applied after config files).
  • --tracked-only – Git mode: limit to tracked files. (Compatibility note: -u/--include-untracked still forces inclusion if you need it.)
  • --all-repo – Git mode: operate on the entire repository even if the target is a subdirectory.
  • --include-default-output – allow dirgrab.txt back into the run.
  • --no-git – ignore Git context entirely and walk the filesystem.
  • --no-config – ignore global/local config files and .dirgrabignore.
  • --config <FILE> – load an additional TOML config file (applied after global/local unless --no-config).
  • --token-ratio <FLOAT> – override the characters-to-tokens ratio used by --stats (defaults to 3.6).
  • --tokens-exclude-tree / --tokens-exclude-headers – subtract tree or header sections when estimating tokens.
  • -s, --stats [REPORT...] – print stats reports to stderr. Defaults to overview + top-files=5; provide explicit reports like --stats overview top-files=10.
  • -v, -vv, -vvv – increase log verbosity (Warn, Info, Debug, Trace).
  • -h, --help / -V, --version – CLI boilerplate.
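The stats options estimate tokens from a characters-to-tokens ratio (the value --token-ratio overrides). A minimal sketch of that arithmetic, of the tree/header subtraction, and of a top-files style ranking; the function names, rounding up, and breaking ties by path are assumptions for illustration, not dirgrab-lib's API.

```rust
/// Rough token estimate: characters divided by a characters-per-token
/// ratio (3.6 is the README's stated default). Rounding up is an
/// assumption of this sketch.
fn estimate_tokens(text: &str, chars_per_token: f64) -> usize {
    let chars = text.chars().count() as f64;
    (chars / chars_per_token).ceil() as usize
}

/// Sketch of --tokens-exclude-tree / --tokens-exclude-headers: estimate
/// on the full output minus the characters of the excluded sections.
fn estimate_with_exclusions(full: &str, excluded: &[&str], ratio: f64) -> usize {
    let excluded_chars: usize = excluded.iter().map(|s| s.chars().count()).sum();
    let kept = full.chars().count().saturating_sub(excluded_chars);
    (kept as f64 / ratio).ceil() as usize
}

/// Hypothetical "top-files=N" report: rank files by estimated tokens,
/// largest first, ties broken by path so the report is stable.
fn top_files<'a>(files: &[(&'a str, &str)], ratio: f64, n: usize) -> Vec<(&'a str, usize)> {
    let mut ranked: Vec<(&'a str, usize)> = files
        .iter()
        .map(|(path, body)| (*path, estimate_tokens(body, ratio)))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    ranked.truncate(n);
    ranked
}
```

With the default ratio, a 10-character file counts as 3 tokens under this rounding (10 / 3.6 ≈ 2.78).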

Configuration Files

dirgrab layers configuration in the following order (later wins):

  1. Built-in defaults
  2. Global config + ignore
    • Linux: ~/.config/dirgrab/config.toml & ~/.config/dirgrab/ignore
    • macOS: ~/Library/Application Support/dirgrab/config.toml & …/ignore
    • Windows: %APPDATA%\dirgrab\config.toml & ignore
  3. Project-local config: <target>/.dirgrab.toml
  4. Project-local ignore patterns: <target>/.dirgrabignore
  5. CLI flags (--tracked-only, --no-tree, etc.)
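"Later wins" can be read as: each layer may leave a setting unset, and the last layer that does set it decides. A minimal sketch under that reading; `Layer`, `Settings`, and `merge` are hypothetical names, the built-in defaults mirror the ones this README states (tree on, untracked files included, ratio 3.6), and accumulating exclude patterns across layers instead of replacing them is an assumption.

```rust
/// Hypothetical subset of dirgrab's settings; `None` means
/// "this layer does not set the value".
#[derive(Clone, Debug, Default, PartialEq)]
struct Layer {
    include_tree: Option<bool>,
    tracked_only: Option<bool>,
    token_ratio: Option<f64>,
    exclude: Vec<String>,
}

/// Fully resolved settings after all layers are applied.
#[derive(Debug, PartialEq)]
struct Settings {
    include_tree: bool,
    tracked_only: bool,
    token_ratio: f64,
    exclude: Vec<String>,
}

/// Apply layers in order (global, project-local, ignore file, CLI) on top
/// of the built-in defaults: scalar values from later layers overwrite
/// earlier ones, while exclude patterns are appended.
fn merge(layers: &[Layer]) -> Settings {
    let mut s = Settings {
        include_tree: true,
        tracked_only: false,
        token_ratio: 3.6,
        exclude: Vec::new(),
    };
    for layer in layers {
        if let Some(v) = layer.include_tree {
            s.include_tree = v;
        }
        if let Some(v) = layer.tracked_only {
            s.tracked_only = v;
        }
        if let Some(v) = layer.token_ratio {
            s.token_ratio = v;
        }
        s.exclude.extend(layer.exclude.iter().cloned());
    }
    s
}
```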

Sample config.toml:

[dirgrab]
exclude = ["Cargo.lock", "*.csv", "node_modules/", "target/"]
include_tree = true
add_headers = true
convert_pdf = true
tracked_only = false
all_repo = false

[stats]
enabled = true
token_ratio = 3.6
tokens_exclude = ["tree"]
reports = ["overview", "top-files=8"]

Ignore files use the same syntax as .gitignore. CLI -e patterns and the active output file name are appended last, so the freshly written file is never re-ingested accidentally.
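A minimal sketch of that assembly order: config and ignore-file patterns first, then CLI -e patterns, then the output file. `final_excludes` is hypothetical; it only builds the pattern list and does not reimplement .gitignore matching, and placing the built-in dirgrab.txt exclusion at the very end is an assumption.

```rust
/// Hypothetical helper: build the final exclude list in order.
/// The active output file is always appended, and dirgrab.txt stays
/// excluded unless `include_default_output` is set.
fn final_excludes(
    config_patterns: &[&str],
    cli_patterns: &[&str],
    output_file: Option<&str>,
    include_default_output: bool,
) -> Vec<String> {
    let mut out: Vec<String> = config_patterns.iter().map(|p| p.to_string()).collect();
    out.extend(cli_patterns.iter().map(|p| p.to_string()));
    if let Some(name) = output_file {
        // Never read back the file this run is writing.
        out.push(name.to_string());
    }
    // Avoid a duplicate entry when -o was given without a name.
    if !include_default_output && output_file != Some("dirgrab.txt") {
        out.push("dirgrab.txt".to_string());
    }
    out
}
```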

Examples

# Grab the current repo subtree (includes untracked files) and show stats
dirgrab -s

# Limit to tracked files only and exclude build artifacts
dirgrab --tracked-only -e "*.log" -e "target/"

# Force a whole-repo snapshot from within a subdirectory
dirgrab --all-repo

# Plain directory mode with custom excludes, writing to the default file
dirgrab --no-git -e "*.tmp" -o

# Ignore all config files and use only built-in defaults for a “clean” run
dirgrab --no-config --no-tree --no-headers

Behaviour Notes

  • Git scope & ordering – Paths are gathered via git ls-files, scoped to the target subtree unless --all-repo is set, and the final list is sorted for deterministic output. Non-Git mode uses walkdir with the same ordering.
  • File headers & tree – Headers and tree sections remain enabled by default; toggle them per run or through config files.
  • PDF handling – Text is extracted from PDFs unless disabled. PDFs that fail to extract and binary files are skipped, with short, informative log messages.
  • Stats – When --stats is active (or enabled in config), stderr shows the requested reports (default: totals + top files). Exclude tree/headers, adjust the ratio, or pick different reports via config or CLI.
  • Safety – dirgrab.txt stays excluded unless explicitly re-enabled, and any active -o FILE target is auto-excluded for that run.
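The notes above amount to a simple pipeline: take the sorted file list, drop anything binary or not valid UTF-8, and emit each remaining file, optionally behind a header. A minimal sketch; `concatenate`, the NUL-byte binary heuristic, and the `--- FILE: path ---` header text are hypothetical and are not dirgrab's actual output format. PDF extraction and the tree section are left out.

```rust
/// Hypothetical concatenation step. `files` is assumed to be sorted
/// already. Returns the combined text and the paths that were skipped.
fn concatenate(files: &[(&str, &[u8])], add_headers: bool) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut skipped = Vec::new();
    for (path, bytes) in files {
        // Treat any NUL byte as a binary marker, then require valid UTF-8.
        let text = match std::str::from_utf8(bytes) {
            Ok(t) if !bytes.contains(&0) => t,
            _ => {
                skipped.push(path.to_string());
                continue;
            }
        };
        if add_headers {
            // Illustrative header only; the real format may differ.
            out.push_str(&format!("--- FILE: {} ---\n", path));
        }
        out.push_str(text);
        // Keep file boundaries on separate lines.
        if !text.ends_with('\n') {
            out.push('\n');
        }
    }
    (out, skipped)
}
```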

Library (dirgrab-lib)

The same engine powers dirgrab-lib; import it to drive custom tooling:

use dirgrab_lib::{grab_contents, GrabConfig};
// build a GrabConfig and call grab_contents(&config)

See docs.rs for API details.

Changelog

See CHANGELOG.md for the full release history.

License

Licensed under either of:

Contributing

Issues and PRs are welcome! Please run cargo fmt, cargo clippy, and cargo test before submitting.