# dirgrab
[crates.io](https://crates.io/crates/dirgrab) | [docs.rs](https://docs.rs/dirgrab-lib)
`dirgrab` walks a directory (or Git repository), selects the files that matter, and concatenates their contents for easy copy/paste into language models. It can write to stdout, a file, or your clipboard, and it ships with a library crate so the same logic can be embedded elsewhere.
## Highlights
- **Configurable defaults** - merge built-in defaults with global `config.toml`, project-local `.dirgrab.toml`, `.dirgrabignore`, and CLI flags.
- **Git-aware out of the box** - untracked files are included by default and the listing is scoped to the selected subdirectory; pass `--tracked-only` or `--all-repo` to opt out of either behaviour.
- **Structured context** - optional directory tree, per-file headers, PDF text extraction, and deterministic file ordering for stable diffs.
- **Better stats** - `-s/--stats` prints bytes, words, and an approximate token count, with a configurable ratio and exclusion toggles.
- **Safety nets** - automatically ignores the active output file, respects `.gitignore`, and gracefully skips binary/non-UTF8 files.
## Installation
```bash
cargo install dirgrab
# or from a local checkout
# cargo install --path .
```
Check it worked:
```bash
dirgrab --version
```
## Usage
```bash
dirgrab [OPTIONS] [TARGET_PATH]
```
`TARGET_PATH` defaults to the current directory. When it lies inside a Git repo, `dirgrab` scopes the listing to the `TARGET_PATH` subtree unless you pass `--all-repo`.
### Common Options
- `-o, --output [FILE]` - write to a file (defaults to `dirgrab.txt` if no name is given). Conflicts with `--clipboard`.
- `-c, --clipboard` - copy to the system clipboard instead of stdout or a file.
- `--no-headers` / `--no-tree` / `--no-pdf` - disable headers, the directory tree, or PDF extraction.
- `-e, --exclude <PATTERN>` - add glob-style excludes (applied after config files).
- `--tracked-only` - Git mode: limit to tracked files. (Compatibility note: `-u/--include-untracked` still forces inclusion if you need it.)
- `--all-repo` - Git mode: operate on the entire repository even if the target is a subdirectory.
- `--include-default-output` - allow `dirgrab.txt` back into the run.
- `--no-git` - ignore Git context entirely and walk the filesystem.
- `--no-config` - ignore global/local config files and `.dirgrabignore`.
- `--config <FILE>` - load an additional TOML config file (applied after global/local unless `--no-config`).
- `--token-ratio <FLOAT>` - override the characters-to-tokens ratio used by `--stats` (defaults to 3.6).
- `--tokens-exclude-tree` / `--tokens-exclude-headers` - subtract tree or header sections when estimating tokens.
- `-s, --stats` - print bytes, words, and approximate tokens to stderr when finished.
- `-v, -vv, -vvv` - increase log verbosity (Warn, Info, Debug, Trace).
- `-h, --help` / `-V, --version` - CLI boilerplate.
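To make the `--stats` numbers concrete, here is a minimal sketch of how such an estimate could be computed. It assumes the token count is simply the character count divided by the ratio, rounded up; the `TextStats` type and `estimate_stats` function are hypothetical names for illustration and are not taken from `dirgrab` itself, whose rounding may differ.

```rust
/// Size summary for a block of text (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct TextStats {
    pub bytes: usize,
    pub words: usize,
    pub approx_tokens: usize,
}

/// Estimate stats for `text`, assuming tokens ~= characters / `chars_per_token`.
/// Rounding up is an assumption of this sketch.
pub fn estimate_stats(text: &str, chars_per_token: f64) -> TextStats {
    let chars = text.chars().count();
    // Guard against a zero, negative, or NaN ratio instead of dividing by it.
    let approx_tokens = if chars_per_token > 0.0 {
        (chars as f64 / chars_per_token).ceil() as usize
    } else {
        0
    };
    TextStats {
        bytes: text.len(),
        words: text.split_whitespace().count(),
        approx_tokens,
    }
}
```

With the default ratio of 3.6, a 24-character string comes out at roughly 7 tokens under this sketch; a larger ratio yields a smaller estimate.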
### Configuration Files
`dirgrab` layers configuration in the following order (later wins):
1. Built-in defaults
2. Global config + ignore
   - Linux: `~/.config/dirgrab/config.toml` & `~/.config/dirgrab/ignore`
   - macOS: `~/Library/Application Support/dirgrab/config.toml` & `…/ignore`
   - Windows: `%APPDATA%\dirgrab\config.toml` & `ignore`
3. Project-local config: `<target>/.dirgrab.toml`
4. Project-local ignore patterns: `<target>/.dirgrabignore`
5. CLI flags (`--tracked-only`, `--no-tree`, etc.)
Sample `config.toml`:
```toml
[dirgrab]
exclude = ["Cargo.lock", "*.csv", "node_modules/", "target/"]
include_tree = true
add_headers = true
convert_pdf = true
tracked_only = false
all_repo = false
[stats]
enabled = true
token_ratio = 3.6
tokens_exclude = ["tree"]
```
`ignore` files use the same syntax as `.gitignore`. CLI `-e` patterns and the active output file name are appended last, so the freshly written file is never re-ingested accidentally.
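The "later wins" layering can be pictured as a fold over optional values. The sketch below is an illustration only: `Layer`, `Resolved`, and `resolve` are hypothetical names, the field names are borrowed from the sample config, and it assumes that scalar settings are overridden while exclude patterns accumulate across layers. The actual merge logic in `dirgrab` may differ.

```rust
/// One configuration layer; `None` means "this layer does not set the value".
/// Hypothetical type: field names mirror the sample config only.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Layer {
    pub include_tree: Option<bool>,
    pub add_headers: Option<bool>,
    pub tracked_only: Option<bool>,
    pub exclude: Vec<String>,
}

/// Fully resolved settings after all layers have been applied.
#[derive(Debug, Clone, PartialEq)]
pub struct Resolved {
    pub include_tree: bool,
    pub add_headers: bool,
    pub tracked_only: bool,
    pub exclude: Vec<String>,
}

/// Apply layers in order: later layers override scalar values,
/// while exclude patterns are appended (an assumption of this sketch).
pub fn resolve(layers: &[Layer]) -> Resolved {
    // Built-in defaults: tree and headers on, untracked files included.
    let mut out = Resolved {
        include_tree: true,
        add_headers: true,
        tracked_only: false,
        exclude: Vec::new(),
    };
    for layer in layers {
        if let Some(v) = layer.include_tree {
            out.include_tree = v;
        }
        if let Some(v) = layer.add_headers {
            out.add_headers = v;
        }
        if let Some(v) = layer.tracked_only {
            out.tracked_only = v;
        }
        out.exclude.extend(layer.exclude.iter().cloned());
    }
    out
}
```

Passing the layers as `[global, local, cli]` means a CLI flag always has the final say on a scalar setting, while every layer's exclude patterns remain in effect.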
### Examples
```bash
# Grab the current repo subtree (includes untracked files) and show stats
dirgrab -s
# Limit to tracked files only and exclude build artifacts
dirgrab --tracked-only -e "*.log" -e "target/"
# Force a whole-repo snapshot from within a subdirectory
dirgrab --all-repo
# Plain directory mode with custom excludes, writing to the default file
dirgrab --no-git -e "*.tmp" -o
# Ignore all config files for a "clean" run on built-in defaults, without tree or headers
dirgrab --no-config --no-tree --no-headers
```
## Behaviour Notes
- **Git scope & ordering** - Paths are gathered via `git ls-files`, scoped to the target subtree unless `--all-repo` is set, and the final list is sorted for deterministic output. Non-Git mode uses `walkdir` with the same ordering.
- **File headers & tree** - Headers and the tree section are enabled by default; toggle them per run or through config files.
- **PDF handling** - Text is extracted from PDFs unless disabled. Extraction failures and binary files are skipped with an informative log message.
- **Stats** - When `--stats` is active (or enabled in config), stderr shows bytes, words, and an approximate token count. Exclude tree/headers or change the ratio via config or CLI.
- **Safety** - `dirgrab.txt` stays excluded unless explicitly re-enabled, and any active `-o FILE` target is auto-excluded for that run.
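The scoping and ordering described in the first note can be sketched as a small pure function. This is an illustration under assumptions, not the tool's implementation: `scope_and_sort` is a hypothetical helper that takes repository-relative paths (such as those `git ls-files` would print), keeps the ones under an optional subdirectory, and sorts them so repeated runs produce the same order.

```rust
use std::path::{Path, PathBuf};

/// Keep only paths under `scope` (when given) and return them sorted.
/// Hypothetical helper for illustration; not part of dirgrab's API.
pub fn scope_and_sort(paths: Vec<PathBuf>, scope: Option<&Path>) -> Vec<PathBuf> {
    let mut kept: Vec<PathBuf> = paths
        .into_iter()
        // `Path::starts_with` compares whole components,
        // so a scope of "src" does not match "src2/x.rs".
        .filter(|p| scope.map_or(true, |s| p.starts_with(s)))
        .collect();
    // Sorting makes the output independent of the order paths were discovered in.
    kept.sort();
    // Drop duplicates, which are adjacent after sorting.
    kept.dedup();
    kept
}
```

Passing `None` as the scope corresponds to the whole-repository case, while `Some(Path::new("src"))` mimics running against a subdirectory.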
## Library (`dirgrab-lib`)
The same engine powers `dirgrab-lib`; import it to drive custom tooling:
```rust
use dirgrab_lib::{grab_contents, GrabConfig};
// Build a GrabConfig, then call grab_contents(&config).
```
See [docs.rs](https://docs.rs/dirgrab-lib) for API details.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for the full release history.
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT license ([LICENSE-MIT](LICENSE-MIT))
## Contributing
Issues and PRs are welcome! Please run `cargo fmt`, `cargo clippy`, and `cargo test` before submitting.