[go: up one dir, main page]

dirgrab 0.3.0

CLI tool to concatenate file contents from directories, respecting Git context.
# dirgrab ๐Ÿ“โšก

[![Crates.io](https://img.shields.io/crates/v/dirgrab.svg)](https://crates.io/crates/dirgrab)
[![Docs.rs](https://docs.rs/dirgrab-lib/badge.svg)](https://docs.rs/dirgrab-lib)

`dirgrab` walks a directory (or Git repository), selects the files that matter, and concatenates their contents for easy copy/paste into language models. It can write to stdout, a file, or your clipboard, and it ships with a library crate so the same logic can be embedded elsewhere.

## Highlights

- ๐Ÿ”ง **Configurable defaults** โ€“ merge built-in defaults with global `config.toml`, project-local `.dirgrab.toml`, `.dirgrabignore`, and CLI flags.
- ๐Ÿงญ **Git-aware out of the box** โ€“ untracked files are included by default, scoped to the selected subdirectory, with `--tracked-only` and `--all-repo` to opt out.
- ๐Ÿ—‚๏ธ **Structured context** โ€“ optional directory tree, per-file headers, PDF text extraction, and deterministic file ordering for stable diffs.
- ๐Ÿงฎ **Better stats** โ€“ `-s/--stats` now prints bytes, words, and an approximate token count with configurable ratio and exclusion toggles.
- ๐Ÿ™… **Safety nets** โ€“ automatically ignores the active output file, respects `.gitignore`, and gracefully skips binary/non-UTF8 files.

## Installation

```bash
cargo install dirgrab
# or from a local checkout
# cargo install --path .
```

Check it worked:

```bash
dirgrab --version
```

## Usage

```bash
dirgrab [OPTIONS] [TARGET_PATH]
```

`TARGET_PATH` defaults to the current directory. When invoked inside a Git repo, `dirgrab` scopes the listing to that subtree unless you pass `--all-repo`.

### Common Options

- `-o, --output [FILE]` โ€“ write to a file (defaults to `dirgrab.txt` if no name is given). Conflicts with `--clipboard`.
- `-c, --clipboard` โ€“ copy to the system clipboard instead of stdout or a file.
- `--no-headers` / `--no-tree` / `--no-pdf` โ€“ disable headers, the directory tree, or PDF extraction.
- `-e, --exclude <PATTERN>` โ€“ add glob-style excludes (applied after config files).
- `--tracked-only` โ€“ Git mode: limit to tracked files. (Compatibility note: `-u/--include-untracked` still forces inclusion if you need it.)
- `--all-repo` โ€“ Git mode: operate on the entire repository even if the target is a subdirectory.
- `--include-default-output` โ€“ allow `dirgrab.txt` back into the run.
- `--no-git` โ€“ ignore Git context entirely and walk the filesystem.
- `--no-config` โ€“ ignore global/local config files and `.dirgrabignore`.
- `--config <FILE>` โ€“ load an additional TOML config file (applied after global/local unless `--no-config`).
- `--token-ratio <FLOAT>` โ€“ override the characters-to-tokens ratio used by `--stats` (defaults to 3.6).
- `--tokens-exclude-tree` / `--tokens-exclude-headers` โ€“ subtract tree or header sections when estimating tokens.
- `-s, --stats` โ€“ print bytes, words, and approximate tokens to stderr when finished.
- `-v, -vv, -vvv` โ€“ increase log verbosity (Warn, Info, Debug, Trace).
- `-h, --help` / `-V, --version` โ€“ CLI boilerplate.

### Configuration Files

`dirgrab` layers configuration in the following order (later wins):

1. Built-in defaults
2. Global config + ignore
   - Linux: `~/.config/dirgrab/config.toml` & `~/.config/dirgrab/ignore`
   - macOS: `~/Library/Application Support/dirgrab/config.toml` & `โ€ฆ/ignore`
   - Windows: `%APPDATA%\dirgrab\config.toml` & `ignore`
3. Project-local config: `<target>/.dirgrab.toml`
4. Project-local ignore patterns: `<target>/.dirgrabignore`
5. CLI flags (`--tracked-only`, `--no-tree`, etc.)

Sample `config.toml`:

```toml
[dirgrab]
exclude = ["Cargo.lock", "*.csv", "node_modules/", "target/"]
include_tree = true
add_headers = true
convert_pdf = true
tracked_only = false
all_repo = false

[stats]
enabled = true
token_ratio = 3.6
tokens_exclude = ["tree"]
```

`ignore` files use the same syntax as `.gitignore`. CLI `-e` patterns and the active output file name are appended last, so the freshly written file is never re-ingested accidentally.

### Examples

```bash
# Grab the current repo subtree (includes untracked files) and show stats
dirgrab -s

# Limit to tracked files only and exclude build artifacts
dirgrab --tracked-only -e "*.log" -e "target/"

# Force a whole-repo snapshot from within a subdirectory
dirgrab --all-repo

# Plain directory mode with custom excludes, writing to the default file
dirgrab --no-git -e "*.tmp" -o

# Use project defaults but ignore configs for a โ€œcleanโ€ run
dirgrab --no-config --no-tree --no-headers
```

## Behaviour Notes

- **Git scope & ordering** โ€“ Paths are gathered via `git ls-files`, scoped to the target subtree unless `--all-repo` is set, and the final list is sorted for deterministic output. Non-Git mode uses `walkdir` with the same ordering.
- **File headers & tree** โ€“ Headers and tree sections remain enabled by default; toggle them per run or through config files.
- **PDF handling** โ€“ Text is extracted from PDFs unless disabled. Failures and binary files are skipped with informative (but less noisy) logs.
- **Stats** โ€“ When `--stats` is active (or enabled in config), stderr shows bytes, words, and an approximate token count. Exclude tree/headers or change the ratio via config or CLI.
- **Safety** โ€“ `dirgrab.txt` stays excluded unless explicitly re-enabled, and any active `-o FILE` target is auto-excluded for that run.

## Library (`dirgrab-lib`)

The same engine powers `dirgrab-lib`; import it to drive custom tooling:

```rust
use dirgrab_lib::{grab_contents, GrabConfig};
# // build a GrabConfig and call grab_contents(&config)
```

See [docs.rs](https://docs.rs/dirgrab-lib) for API details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for the full release history.

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE)
- MIT license ([LICENSE-MIT]LICENSE-MIT)

## Contributing

Issues and PRs are welcome! Please run `cargo fmt`, `cargo clippy`, and `cargo test` before submitting.