Organize Media With Python: A Comprehensive Guide
Organizing your ever-growing media library can feel like a daunting task. This article introduces a Python-based solution designed to streamline the process. It uses the mutagen library to scan folders, extract metadata, identify duplicates, and organize your media files according to your preferences. Let's dive into the details of this versatile tool.
Introduction
This Python script, the Media & Music Library Organizer, is designed to help you manage your media files efficiently. It automates several key tasks, including scanning directories, extracting metadata from audio, video, and image files, storing this data in a SQLite database, identifying duplicate files, and renaming/moving files based on user-defined templates. It also supports exporting playlists in the M3U8 format. This tool requires the mutagen library, which can be installed using pip.
Features
This Media & Music Library Organizer comes packed with features designed to simplify the management of your media collection:
- Scanning and Metadata Extraction: The script recursively scans specified directories, extracting metadata from audio, video, and image files using the mutagen library. This metadata includes information such as artist, album, title, track number, duration, and more.
- SQLite Database Storage: All extracted metadata is stored in a SQLite database. This allows for efficient querying and manipulation of the data. The database schema includes fields for file path, size, modification time, media type, SHA256 hash, duration, and various metadata tags.
- Duplicate Detection: The script can identify duplicate files based on either their SHA256 hash or a lightweight audio "fingerprint" consisting of the artist, title, and duration. This helps you clean up your library and free up disk space.
- File Renaming and Moving: The script allows you to rename and move files based on user-defined templates. These templates can include metadata fields such as artist, album, title, track number, and more. This enables you to organize your files according to your preferred naming scheme.
- Playlist Export: The script can export playlists in the M3U8 format. You can create playlists based on various criteria, such as album, artist, or a custom query.
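To make the database storage feature more concrete, here is a minimal sketch of what such a table and an insert-or-update step could look like. The table name, column names, and the helper names open_db_sketch and upsert_file_sketch are assumptions for illustration; the article only lists the kinds of fields stored, not the script's actual schema.

```python
import sqlite3

# Hypothetical column set, modeled on the fields the article lists.
COLUMNS = ["path", "size", "mtime", "media_type", "sha256",
           "duration", "artist", "album", "title", "track"]

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path       TEXT PRIMARY KEY,
    size       INTEGER,
    mtime      REAL,
    media_type TEXT,
    sha256     TEXT,
    duration   REAL,
    artist     TEXT,
    album      TEXT,
    title      TEXT,
    track      INTEGER
);
CREATE INDEX IF NOT EXISTS idx_files_sha256 ON files(sha256);
"""

# Insert a row, or overwrite every non-key column if the path already exists.
# ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
UPSERT = (
    f"INSERT INTO files ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(':' + c for c in COLUMNS)}) "
    "ON CONFLICT(path) DO UPDATE SET "
    + ", ".join(f"{c} = excluded.{c}" for c in COLUMNS if c != "path")
)


def open_db_sketch(db_path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    return conn


def upsert_file_sketch(conn, rec):
    """Store one file record; missing fields are saved as NULL."""
    row = {c: rec.get(c) for c in COLUMNS}
    conn.execute(UPSERT, row)
    conn.commit()
```

Keying the table on the file path means that rescanning a folder updates existing rows instead of piling up duplicates.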
Dependencies
Before you can use the Media & Music Library Organizer, you need to install the mutagen library. You can do so using pip:
pip install mutagen
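Once mutagen is installed, reading tags from a file takes only a few lines. The following is a rough sketch, not the script's own code: the function name extract_metadata_sketch and the returned keys are assumptions, and only the common title, artist, and album tags are read. It falls back to the filename when mutagen is missing or cannot parse the file.

```python
from pathlib import Path

try:
    import mutagen
except ImportError:  # the sketch still works, using filename fallback only
    mutagen = None


def extract_metadata_sketch(p: Path) -> dict:
    """Return basic tags for a file, falling back to the filename as title."""
    meta = {"title": p.stem, "artist": None, "album": None, "duration": None}
    if mutagen is None:
        return meta
    try:
        # easy=True asks mutagen for a simplified, format-independent tag view.
        audio = mutagen.File(str(p), easy=True)
    except Exception:
        audio = None
    if audio is None:  # unrecognized or unreadable file
        return meta
    try:
        tags = audio.tags or {}
        for key in ("title", "artist", "album"):
            values = tags.get(key)
            if values:
                first = values[0] if isinstance(values, list) else values
                meta[key] = str(first)
        length = getattr(getattr(audio, "info", None), "length", None)
        if length is not None:
            meta["duration"] = float(length)
    except Exception:
        pass  # keep whatever was collected so far
    return meta
```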
Usage
The script is invoked from the command line with various arguments and subcommands. Here's a breakdown of the available commands and options:
Scanning Files
To scan one or more directories and extract metadata, use the scan subcommand:
python your_script_name.py scan /path/to/music /path/to/videos
You can also use the --no-hash option to skip SHA256 hashing, which can speed up the scanning process:
python your_script_name.py scan --no-hash /path/to/music
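Hashing is the slow part of a scan because every byte of every file has to be read. A chunked SHA256 routine along the following lines keeps memory use flat even for large video files. The signature matches the file_sha256 helper described in this article, but the body is a sketch rather than the script's exact code.

```python
import hashlib
from pathlib import Path


def file_sha256(p: Path, chunk: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with p.open("rb") as f:
        while True:
            block = f.read(chunk)
            if not block:  # empty bytes object means end of file
                break
            h.update(block)
    return h.hexdigest()
```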
Displaying Statistics
To display library statistics, such as the total number of files and the disk space used by each media type, use the stats subcommand:
python your_script_name.py stats
Finding Duplicates
To find duplicate files, use the dupes subcommand. You can choose to identify duplicates by either SHA256 hash or audio fingerprint:
python your_script_name.py dupes --by hash
python your_script_name.py dupes --by fingerprint
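The two modes differ only in the key used to group files. Here is a minimal sketch of that grouping, assuming each record is a dict with path, sha256, artist, title, and duration keys. The names fingerprint and find_dupes are hypothetical, as is the choice to compare case-insensitively and round the duration to whole seconds.

```python
from collections import defaultdict


def fingerprint(rec):
    """Build a lightweight (artist, title, duration) key, or None if incomplete."""
    artist = (rec.get("artist") or "").strip().casefold()
    title = (rec.get("title") or "").strip().casefold()
    duration = rec.get("duration")
    if not artist or not title or duration is None:
        return None
    # Assumption: durations within the same rounded second count as equal.
    return (artist, title, round(duration))


def find_dupes(records, by="hash"):
    """Return lists of paths that share the same hash or fingerprint."""
    groups = defaultdict(list)
    for rec in records:
        key = rec.get("sha256") if by == "hash" else fingerprint(rec)
        if key:  # skip files with no hash or an incomplete fingerprint
            groups[key].append(rec["path"])
    return [paths for paths in groups.values() if len(paths) > 1]
```

Hash matching only catches byte-identical copies, while the fingerprint can also flag the same track stored in two formats, at the cost of occasional false positives.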
Organizing Files
To rename and move or copy files based on a template, use the organize subcommand. You must specify the destination directory and the template to use:
python your_script_name.py organize --dest /path/to/organized/library --template '{media_type}/{artist}/{album}/{track:02d} - {title}{ext}'
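The template uses Python's format-string syntax, so {track:02d} pads the track number to two digits. The sketch below shows one way such a template could be rendered. It reuses the helper names sanitize_component and render_template from this article, but the bodies, the set of characters replaced, and the placeholder values for missing tags are assumptions.

```python
import re
from pathlib import Path

# Characters that are invalid in filenames on at least one major OS.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_component(s: str) -> str:
    """Make a tag value safe to use as a single path component."""
    cleaned = _INVALID.sub("_", s).strip(" .")
    return cleaned or "Unknown"


def render_template(rec, template):
    """Fill the template from a file record, with fallbacks for missing tags."""
    src = Path(rec["path"])
    fields = {
        "media_type": rec.get("media_type") or "other",
        "artist": sanitize_component(rec.get("artist") or "Unknown Artist"),
        "album": sanitize_component(rec.get("album") or "Unknown Album"),
        "title": sanitize_component(rec.get("title") or src.stem),
        "track": rec.get("track") or 0,  # must be an int for {track:02d}
        "ext": src.suffix.lower(),
    }
    return template.format(**fields)


rec = {"path": "/in/song.MP3", "media_type": "audio", "artist": "AC/DC",
       "album": "Back in Black", "title": "Hells Bells", "track": 1}
print(render_template(rec, "{media_type}/{artist}/{album}/{track:02d} - {title}{ext}"))
# prints: audio/AC_DC/Back in Black/01 - Hells Bells.mp3
```

Sanitizing each value before substitution matters: a slash inside a tag such as "AC/DC" would otherwise create an extra directory level.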
Use the --move option to move files instead of copying them. The --dry option performs a dry run, showing you what would be done without actually making any changes. The --query option allows you to filter the files that are processed based on a custom query:
python your_script_name.py organize --dest /path/to/organized/library --template '{media_type}/{artist}/{album}/{track:02d} - {title}{ext}' --move --dry --query 'genre:rock year>=1990'
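The article does not spell out the query grammar, so the following is only a guess at how a query such as 'genre:rock year>=1990' might be evaluated. It assumes whitespace-separated terms that must all match, that ":" means a case-insensitive substring match, and that the comparison operators work on numbers. The function names match parse_query and query_match from this article; the bodies are illustrative.

```python
import re
from typing import Any, Dict, List, Tuple

# field, operator, value; two-character operators are tried first.
_TERM = re.compile(r"^(\w+)(>=|<=|:|=|>|<)(.+)$")


def parse_query(q: str) -> List[Tuple[str, str, str]]:
    """Split a query into (field, operator, value) triples."""
    terms = []
    for token in q.split():  # note: values containing spaces are not supported
        m = _TERM.match(token)
        if m:
            terms.append((m.group(1), m.group(2), m.group(3)))
    return terms


def query_match(rec: Dict[str, Any], q: str) -> bool:
    """Return True if the record satisfies every term of the query."""
    for field, op, value in parse_query(q):
        actual = rec.get(field)
        if actual is None:
            return False
        if op == ":":  # substring match, ignoring case
            if value.casefold() not in str(actual).casefold():
                return False
            continue
        try:
            left, right = float(actual), float(value)
        except (TypeError, ValueError):
            # Non-numeric values only support exact equality.
            if op != "=" or str(actual).casefold() != value.casefold():
                return False
            continue
        ok = {"=": left == right, ">": left > right, "<": left < right,
              ">=": left >= right, "<=": left <= right}[op]
        if not ok:
            return False
    return True
```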
Exporting Playlists
To export a playlist in the M3U8 format, use the playlist subcommand. You must specify the criteria for creating the playlist (album, artist, or query) and the output file:
python your_script_name.py playlist --by album --name "My Album" --out my_album.m3u8
python your_script_name.py playlist --by artist --name "My Artist" --out my_artist.m3u8
python your_script_name.py playlist --by query --query 'genre:rock' --out rock.m3u8
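An M3U8 file is simply a UTF-8 encoded M3U playlist: a #EXTM3U header followed by an #EXTINF line and a path for each track. Writing one could look roughly like the sketch below. The function name write_m3u8 and the record keys are assumptions, and the script's actual output may differ in detail.

```python
from pathlib import Path


def write_m3u8(records, out: Path) -> None:
    """Write an extended M3U playlist encoded as UTF-8."""
    lines = ["#EXTM3U"]
    for rec in records:
        duration = rec.get("duration")
        # By convention, -1 marks an unknown duration.
        seconds = int(round(duration)) if duration is not None else -1
        label = " - ".join(x for x in (rec.get("artist"), rec.get("title")) if x)
        if not label:
            label = Path(rec["path"]).stem
        lines.append(f"#EXTINF:{seconds},{label}")
        lines.append(rec["path"])
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
```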
Code Explanation
Let's break down some of the key functions and code snippets in the script:
- normstr(s: Optional[str]) -> Optional[str]: This function normalizes a string by applying Unicode normalization (NFKC) and stripping whitespace. This helps ensure consistency in metadata.
- safe_int(v) -> Optional[int] and safe_float(v) -> Optional[float]: These functions safely convert a value to an integer or float, respectively. They handle potential errors and return None if the conversion fails.
- file_sha256(p: Path, chunk: int = 1024 * 1024) -> str: This function calculates the SHA256 hash of a file. It reads the file in chunks to handle large files efficiently.
- media_type_for_ext(ext: str) -> str: This function determines the media type (audio, video, image, or other) based on the file extension.
- extract_metadata(p: Path) -> Dict[str, Any]: This function extracts metadata from a file using the mutagen library. It handles various tag formats and falls back to the filename if necessary.
- open_db(db_path: Path) -> sqlite3.Connection: This function opens a connection to the SQLite database and creates the necessary tables and indexes.
- upsert_file(conn: sqlite3.Connection, rec: Dict[str, Any]): This function inserts or updates a file record in the database. It uses the ON CONFLICT clause to handle duplicate entries.
- scan_paths(conn: sqlite3.Connection, roots: List[Path], hash_files: bool = True): This function scans the specified directories and adds or updates file records in the database.
- humanize_duration(sec: Optional[float]) -> str: This function converts a duration in seconds to a human-readable format (e.g., "1:23:45").
- cmd_stats(conn: sqlite3.Connection): This function executes the stats subcommand and prints library statistics.
- cmd_dupes(conn: sqlite3.Connection, by: str): This function executes the dupes subcommand and finds duplicate files.
- sanitize_component(s: str) -> str: This function sanitizes a string for use as a filename or directory component. It removes or replaces invalid characters.
- render_template(rec: Dict[str, Any], template: str) -> str: This function renders a template using metadata fields from a file record.
- cmd_organize(conn: sqlite3.Connection, dest: Path, template: str, dry: bool, move: bool, subset_query: Optional[str]): This function executes the organize subcommand and renames or moves files based on a template.
- parse_query(q: str) -> List[Tuple[str, str, str]] and query_match(rec: Dict[str, Any], q: str) -> bool: These functions parse and execute a query to filter files based on metadata.
- cmd_playlist(conn: sqlite3.Connection, by: str, name: Optional[str], out: Path, query: Optional[str]): This function executes the playlist subcommand and exports a playlist in the M3U8 format.
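A few of the small helpers listed above could look roughly like this. The signatures follow the list, but the bodies are a sketch with some assumptions of my own: normstr returns None for empty strings, safe_int accepts "3/12"-style track tags, and humanize_duration prints "?" when the duration is unknown.

```python
import unicodedata
from typing import Optional


def normstr(s: Optional[str]) -> Optional[str]:
    """Apply NFKC normalization and strip surrounding whitespace."""
    if s is None:
        return None
    s = unicodedata.normalize("NFKC", s).strip()
    return s or None  # assumption: treat empty strings as missing


def safe_int(v) -> Optional[int]:
    """Convert to int, returning None instead of raising on bad input."""
    try:
        if isinstance(v, str):
            v = v.split("/")[0].strip()  # assumption: "3/12" means track 3
        return int(v)
    except (TypeError, ValueError):
        return None


def humanize_duration(sec: Optional[float]) -> str:
    """Format seconds as H:MM:SS, or M:SS when under an hour."""
    if sec is None:
        return "?"
    total = int(round(sec))
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


print(humanize_duration(5025))  # prints: 1:23:45
```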
Conclusion
This Python script provides a comprehensive solution for organizing and managing media libraries. By automating tasks such as scanning, metadata extraction, duplicate detection, and file renaming, it helps users keep their media collections organized and easily accessible. The script's flexibility, combined with its support for various media formats and metadata tags, makes it a valuable tool for anyone managing a large media library. Give it a try and reclaim control of your media!