AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally a melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling.

The repo provides inference scripts, checkpoints, and simple Python APIs, so you can generate clips from prompts or incorporate the models into applications. It also contains training code and recipes, so researchers can fine-tune on custom data or explore new objectives without building infrastructure from scratch. Example notebooks, CLI tools, and audio utilities help with prompt design, conditioning on reference audio, and post-processing to produce ready-to-share outputs.
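As a rough illustration of the Python inference path, the sketch below generates short clips from text prompts with a pretrained MusicGen checkpoint. It assumes the `facebook/musicgen-small` checkpoint and the `MusicGen.get_pretrained`, `set_generation_params`, `generate`, and `audio_write` helpers; the prompt strings, clip duration, and output file names are purely illustrative.

```python
# Minimal text-to-music sketch with a pretrained MusicGen checkpoint.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # length of each generated clip, in seconds

descriptions = [
    'lo-fi hip hop beat with warm piano chords',
    'upbeat synthwave with a driving bassline',
]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

for idx, one_wav in enumerate(wav):
    # audio_write appends the file extension and applies loudness normalization.
    audio_write(f'clip_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```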
Features
- MusicGen for text-to-music with optional melody conditioning (see the sketch after this list)
- AudioGen for text-to-sound effects and ambient audio
- EnCodec neural audio codec for discrete tokenization and efficient modeling
- Ready-to-use checkpoints and straightforward Python/CLI inference
- Training recipes and scripts for fine-tuning on custom datasets
- Example notebooks and utilities for prompting, conditioning, and post-processing
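The following sketch shows melody conditioning, assuming the `facebook/musicgen-melody` checkpoint and its `generate_with_chroma` method; `reference.wav` is a placeholder path for your own reference audio.

```python
# Melody-conditioned generation sketch: the model follows the chroma of a
# reference recording while matching the text description.
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=10)

melody, sr = torchaudio.load('reference.wav')  # [channels, samples]
wav = model.generate_with_chroma(
    descriptions=['orchestral arrangement with strings and brass'],
    melody_wavs=melody[None],   # add a batch dimension -> [1, channels, samples]
    melody_sample_rate=sr,
)
audio_write('melody_variation', wav[0].cpu(), model.sample_rate, strategy='loudness')
```

Sound-effect generation follows the same pattern: load a checkpoint with `AudioGen.get_pretrained('facebook/audiogen-medium')` and call `generate` with text prompts (AudioGen has no melody conditioning).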