Organize Media With Python: A Comprehensive Guide

by Lucas

Organizing an ever-growing media library can feel like a daunting task. This article introduces a Python script that uses the mutagen library to scan folders, extract metadata, identify duplicates, and organize your media files according to your preferences. Let's dive into the details of this versatile tool.

Introduction

This Python script, the Media & Music Library Organizer, helps you manage your media files efficiently. It automates several key tasks: scanning directories, extracting metadata from audio, video, and image files, storing that data in a SQLite database, identifying duplicate files, and renaming files and copying or moving them according to user-defined templates. It can also export playlists in the M3U8 format. The script requires the mutagen library, which can be installed with pip.

Features

This Media & Music Library Organizer comes packed with features designed to simplify the management of your media collection:

  • Scanning and Metadata Extraction: The script recursively scans specified directories, extracting metadata from audio, video, and image files using the mutagen library. This metadata includes information such as artist, album, title, track number, duration, and more.
  • SQLite Database Storage: All extracted metadata is stored in a SQLite database. This allows for efficient querying and manipulation of the data. The database schema includes fields for file path, size, modification time, media type, SHA256 hash, duration, and various metadata tags.
  • Duplicate Detection: The script can identify duplicate files based on either their SHA256 hash or a lightweight audio "fingerprint" consisting of the artist, title, and duration. This helps you clean up your library and free up disk space.
  • File Renaming and Moving: The script can rename files and copy or move them into a new directory structure based on user-defined templates. Templates can reference metadata fields such as artist, album, title, and track number, so you can organize your files according to your preferred naming scheme.
  • Playlist Export: The script can export playlists in the M3U8 format, built by album, by artist, or from a custom query.
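To make the storage idea concrete, here's a minimal sketch of a table covering the fields listed above, together with hypothetical versions of open_db and upsert_file that use SQLite's ON CONFLICT upsert. The table name files, the column names, and the choice of path as the unique key are assumptions, not the script's actual schema. ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path        TEXT PRIMARY KEY,
    size        INTEGER,
    mtime       REAL,
    media_type  TEXT,
    sha256      TEXT,
    duration    REAL,
    artist      TEXT,
    album       TEXT,
    title       TEXT,
    track       INTEGER
);
CREATE INDEX IF NOT EXISTS idx_files_sha256 ON files(sha256);
"""

COLUMNS = ["path", "size", "mtime", "media_type", "sha256",
           "duration", "artist", "album", "title", "track"]


def open_db(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows can be indexed by column name
    conn.executescript(SCHEMA)
    return conn


def upsert_file(conn: sqlite3.Connection, rec: Dict[str, Any]) -> None:
    # Insert a new row, or overwrite the non-key columns if the path already exists.
    cols = ", ".join(COLUMNS)
    placeholders = ", ".join(f":{c}" for c in COLUMNS)
    updates = ", ".join(f"{c} = excluded.{c}" for c in COLUMNS if c != "path")
    conn.execute(
        f"INSERT INTO files ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT(path) DO UPDATE SET {updates}",
        {c: rec.get(c) for c in COLUMNS},  # missing fields are stored as NULL
    )
    conn.commit()
```

Calling upsert_file twice for the same path leaves a single row holding the newer values, which is what lets a rescan refresh the library instead of duplicating it.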

Dependencies

Before you can use the Media & Music Library Organizer, you need to install the mutagen library. You can do so using pip:

pip install mutagen
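Once mutagen is installed, reading tags takes only a few lines. The sketch below uses mutagen.File(path, easy=True), which returns None for unrecognized formats and exposes common tags as lists of strings. The fallback of parsing an "Artist - Title" file name and the keys in the returned dictionary are assumptions for illustration, not necessarily how the script's extract_metadata behaves.

```python
from pathlib import Path
from typing import Any, Dict, Optional

try:
    import mutagen
except ImportError:  # lets the sketch run (filename fallback only) without mutagen
    mutagen = None


def first_tag(tags: Any, key: str) -> Optional[str]:
    # "Easy" tags map keys such as "artist" to lists of strings.
    if tags is None:
        return None
    values = tags.get(key)
    return str(values[0]).strip() if values else None


def extract_metadata(p: Path) -> Dict[str, Any]:
    meta: Dict[str, Any] = {"artist": None, "title": None, "album": None, "duration": None}
    audio = None
    if mutagen is not None:
        try:
            audio = mutagen.File(str(p), easy=True)  # None if the format is unrecognized
        except mutagen.MutagenError:
            audio = None  # known extension, but unreadable or corrupt data
    if audio is not None:
        for key in ("artist", "title", "album"):
            meta[key] = first_tag(audio.tags, key)
        length = getattr(audio.info, "length", None)
        meta["duration"] = float(length) if length else None
    # Hypothetical fallback: derive "Artist - Title" from the file name.
    if not meta["title"]:
        stem = p.stem
        if " - " in stem:
            artist, title = stem.split(" - ", 1)
            meta["artist"] = meta["artist"] or artist.strip()
            meta["title"] = title.strip()
        else:
            meta["title"] = stem
    return meta
```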

Usage

The script is invoked from the command line with various arguments and subcommands. Here's a breakdown of the available commands and options:

Scanning Files

To scan one or more directories and extract metadata, use the scan subcommand:

python your_script_name.py scan /path/to/music /path/to/videos

You can also use the --no-hash option to skip SHA256 hashing, which can speed up the scanning process:

python your_script_name.py scan --no-hash /path/to/music
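The speed-up from --no-hash comes from the fact that hashing must read every byte of every file. Here's a minimal sketch of a chunked SHA256 helper matching the file_sha256 signature described later, plus a recursive walk that skips hashing on request. The walk_media generator and the fields of the records it yields are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterator, List, Optional


def file_sha256(p: Path, chunk: int = 1024 * 1024) -> str:
    # Read fixed-size chunks so large video files never sit in memory at once.
    h = hashlib.sha256()
    with p.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def walk_media(roots: List[Path], hash_files: bool = True) -> Iterator[Dict[str, object]]:
    # Hypothetical scanner: yields one record per regular file under each root.
    for root in roots:
        for p in sorted(root.rglob("*")):
            if not p.is_file():
                continue
            st = p.stat()
            digest: Optional[str] = file_sha256(p) if hash_files else None
            yield {"path": str(p), "size": st.st_size, "mtime": st.st_mtime, "sha256": digest}
```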

Displaying Statistics

To display library statistics, such as the total number of files and the disk space used by each media type, use the stats subcommand:

python your_script_name.py stats
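A per-type summary like the one stats prints boils down to a single GROUP BY query. This sketch assumes a files table with media_type and size columns (hypothetical names); human_size is an illustrative formatter, not a function from the script.

```python
import sqlite3
from typing import List, Tuple


def library_stats(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    # One row per media type: (media_type, file count, total bytes).
    return conn.execute(
        "SELECT media_type, COUNT(*), COALESCE(SUM(size), 0) "
        "FROM files GROUP BY media_type ORDER BY media_type"
    ).fetchall()


def human_size(n: float) -> str:
    # Format a byte count with binary units, e.g. 1536 -> "1.5 KiB".
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"
```

Printing is then a loop such as: for media_type, count, total in library_stats(conn): print(media_type, count, human_size(total)).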

Finding Duplicates

To find duplicate files, use the dupes subcommand. You can choose to identify duplicates by either SHA256 hash or audio fingerprint:

python your_script_name.py dupes --by hash
python your_script_name.py dupes --by fingerprint
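Both modes amount to grouping records by a key and keeping groups with more than one member. In this sketch the fingerprint lower-cases artist and title and rounds the duration to whole seconds; that rule and the record layout are assumptions, since the article does not say exactly how the script compares durations.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Any, Dict, List, Optional


def dupe_key(rec: Dict[str, Any], by: str) -> Optional[Hashable]:
    if by == "hash":
        return rec.get("sha256")  # None for files scanned with --no-hash
    if by == "fingerprint":
        artist, title, duration = rec.get("artist"), rec.get("title"), rec.get("duration")
        if not (artist and title and duration):
            return None  # not enough metadata to build a fingerprint
        # Assumed rule: case-insensitive text, duration rounded to the nearest second.
        return (artist.casefold().strip(), title.casefold().strip(), round(duration))
    raise ValueError(f"unknown mode: {by}")


def find_dupes(records: List[Dict[str, Any]], by: str) -> List[List[str]]:
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for rec in records:
        key = dupe_key(rec, by)
        if key is not None:
            groups[key].append(rec["path"])
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Rounding is crude: 200.4 s and 200.6 s land in different buckets, so a tolerance-based comparison may work better for re-encoded copies.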

Organizing Files

To rename files and copy or move them into a new folder structure based on a template, use the organize subcommand. You must specify the destination directory and the template to use:

python your_script_name.py organize --dest /path/to/organized/library --template '{media_type}/{artist}/{album}/{track:02d} - {title}{ext}'
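The placeholders in this template ({track:02d}, {ext}) look like Python's str.format syntax, so one plausible way to render it is str.format_map over the record's fields, sanitizing each text value first so that a tag like "AC/DC" cannot create an extra folder. This is a sketch under that assumption; the set of replaced characters and the fallback values for missing tags are illustrative choices, not the script's documented behavior.

```python
import re
from typing import Any, Dict

# Characters invalid in Windows file names, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Hypothetical fallbacks used when a tag is missing from the record.
DEFAULTS: Dict[str, Any] = {"artist": "Unknown Artist", "album": "Unknown Album",
                            "title": "Untitled", "track": 0}


def sanitize_component(s: str) -> str:
    # Replace invalid characters; Windows also rejects trailing dots.
    s = _INVALID.sub("_", s).strip().rstrip(".")
    return s or "_"


def render_template(rec: Dict[str, Any], template: str) -> str:
    fields = dict(DEFAULTS)
    for key, value in rec.items():
        if value is None:
            continue  # keep the default for missing tags
        fields[key] = sanitize_component(value) if isinstance(value, str) else value
    return template.format_map(fields)
```

The result is a relative path, so the target would be Path(dest) / render_template(rec, template).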

Add --move to move files instead of copying them, and --dry to perform a dry run that shows what would be done without changing anything. The --query option restricts processing to files whose metadata matches a query; this example selects rock files from 1990 or later:

python your_script_name.py organize --dest /path/to/organized/library --template '{media_type}/{artist}/{album}/{track:02d} - {title}{ext}' --move --dry --query 'genre:rock year>=1990'
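The article does not document the full query grammar, but the example 'genre:rock year>=1990' suggests whitespace-separated terms of the form field, operator, value, all of which must match. Here's a minimal sketch under that assumption, where ':' means case-insensitive substring match and >=, <=, >, <, = compare numerically when both sides are numbers. These semantics are guesses for illustration, and values containing spaces are not supported.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# field, operator, value; longer operators come first so ">=" wins over ">".
_TERM = re.compile(r"^(\w+)(>=|<=|>|<|=|:)(.+)$")


def parse_query(q: str) -> List[Tuple[str, str, str]]:
    terms = []
    for token in q.split():
        m = _TERM.match(token)
        if not m:
            raise ValueError(f"bad query term: {token!r}")
        terms.append((m.group(1), m.group(2), m.group(3)))
    return terms


def _num(v: Any) -> Optional[float]:
    try:
        return float(v)
    except (TypeError, ValueError):
        return None


def query_match(rec: Dict[str, Any], q: str) -> bool:
    for field, op, want in parse_query(q):
        have = rec.get(field)
        if have is None:
            return False  # a missing field never matches
        if op == ":":
            if want.casefold() not in str(have).casefold():
                return False
            continue
        a, b = _num(have), _num(want)
        if a is None or b is None:
            # Non-numeric values: only "=" applies, compared case-insensitively.
            if op != "=" or str(have).casefold() != want.casefold():
                return False
            continue
        ok = {">=": a >= b, "<=": a <= b, ">": a > b, "<": a < b, "=": a == b}[op]
        if not ok:
            return False
    return True
```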

Exporting Playlists

To export a playlist in the M3U8 format, use the playlist subcommand. Choose how the playlist is built with --by (album, artist, or query), supply the album or artist with --name or the filter with --query, and set the output file with --out:

python your_script_name.py playlist --by album --name "My Album" --out my_album.m3u8
python your_script_name.py playlist --by artist --name "My Artist" --out my_artist.m3u8
python your_script_name.py playlist --by query --query 'genre:rock' --out rock.m3u8
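An M3U8 file is an M3U playlist stored as UTF-8 text: an #EXTM3U header, then an optional #EXTINF:<seconds>,<label> line before each file path, with -1 meaning an unknown length. The sketch below writes that format from a list of records; the record keys and the decision to write absolute paths are assumptions, not necessarily what cmd_playlist does.

```python
from pathlib import Path
from typing import Any, Dict, List


def write_m3u8(records: List[Dict[str, Any]], out: Path) -> None:
    lines = ["#EXTM3U"]
    for rec in records:
        duration = rec.get("duration")
        seconds = int(round(duration)) if duration else -1  # -1 = unknown length
        # Prefer "Artist - Title"; fall back to the file name without extension.
        label = " - ".join(x for x in (rec.get("artist"), rec.get("title")) if x) or Path(rec["path"]).stem
        lines.append(f"#EXTINF:{seconds},{label}")
        lines.append(str(Path(rec["path"]).resolve()))
    # The .m3u8 extension signals UTF-8, so non-ASCII names survive intact.
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
```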

Code Explanation

Let's break down some of the key functions and code snippets in the script:

  • normstr(s: Optional[str]) -> Optional[str]: This function normalizes a string by applying Unicode normalization (NFKC) and stripping whitespace. This helps ensure consistency in metadata.
  • safe_int(v) -> Optional[int] and safe_float(v) -> Optional[float]: These functions safely convert a value to an integer or float, respectively. They handle potential errors and return None if the conversion fails.
  • file_sha256(p: Path, chunk: int = 1024 * 1024) -> str: This function calculates the SHA256 hash of a file. It reads the file in chunks (1 MiB by default), so large files never have to be loaded into memory all at once.
  • media_type_for_ext(ext: str) -> str: This function determines the media type (audio, video, image, or other) based on the file extension.
  • extract_metadata(p: Path) -> Dict[str, Any]: This function extracts metadata from a file using the mutagen library. It handles various tag formats and falls back to the filename if necessary.
  • open_db(db_path: Path) -> sqlite3.Connection: This function opens a connection to the SQLite database and creates the necessary tables and indexes.
  • upsert_file(conn: sqlite3.Connection, rec: Dict[str, Any]): This function inserts or updates a file record in the database. It uses SQLite's ON CONFLICT clause, so rescanning a file updates its existing record instead of creating a second one.
  • scan_paths(conn: sqlite3.Connection, roots: List[Path], hash_files: bool = True): This function scans the specified directories and adds or updates file records in the database.
  • humanize_duration(sec: Optional[float]) -> str: This function converts a duration in seconds to a human-readable format (e.g., "1:23:45").
  • cmd_stats(conn: sqlite3.Connection): This function executes the stats subcommand and prints library statistics.
  • cmd_dupes(conn: sqlite3.Connection, by: str): This function executes the dupes subcommand and finds duplicate files.
  • sanitize_component(s: str) -> str: This function sanitizes a string for use as a filename or directory component. It removes or replaces invalid characters.
  • render_template(rec: Dict[str, Any], template: str) -> str: This function renders a template using metadata fields from a file record.
  • cmd_organize(conn: sqlite3.Connection, dest: Path, template: str, dry: bool, move: bool, subset_query: Optional[str]): This function executes the organize subcommand and renames or moves files based on a template.
  • parse_query(q: str) -> List[Tuple[str, str, str]] and query_match(rec: Dict[str, Any], q: str) -> bool: These functions parse and execute a query to filter files based on metadata.
  • cmd_playlist(conn: sqlite3.Connection, by: str, name: Optional[str], out: Path, query: Optional[str]): This function executes the playlist subcommand and exports a playlist in the M3U8 format.
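Several of these helpers are small enough to sketch in full. The versions below follow the descriptions above (NFKC normalization, None on failed conversion, "1:23:45"-style durations, extension-based media types), but the extension lists, the handling of "3/12"-style track numbers, and the format for durations under an hour are assumptions.

```python
import unicodedata
from typing import Any, Optional

# Illustrative extension lists; the script's real lists may differ.
AUDIO_EXTS = {".mp3", ".flac", ".ogg", ".opus", ".m4a", ".wav", ".aac"}
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def normstr(s: Optional[str]) -> Optional[str]:
    # NFKC folds compatibility forms (e.g. full-width letters) into plain ones.
    if s is None:
        return None
    s = unicodedata.normalize("NFKC", s).strip()
    return s or None  # assumption: treat blank strings as missing


def safe_int(v: Any) -> Optional[int]:
    # Assumption: tags may store "track/total" such as "3/12"; keep the first part.
    try:
        return int(str(v).split("/")[0].strip())
    except (TypeError, ValueError):
        return None


def safe_float(v: Any) -> Optional[float]:
    try:
        return float(v)
    except (TypeError, ValueError):
        return None


def humanize_duration(sec: Optional[float]) -> str:
    if sec is None:
        return "-"
    h, rem = divmod(int(round(sec)), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


def media_type_for_ext(ext: str) -> str:
    ext = ext.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    if ext in IMAGE_EXTS:
        return "image"
    return "other"
```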

Conclusion

This Python script provides a comprehensive solution for organizing and managing media libraries. By automating tasks such as scanning, metadata extraction, duplicate detection, and file renaming, it helps users keep their media collections organized and easily accessible. Its flexibility, combined with support for various media formats and metadata tags, makes it a valuable tool for anyone managing a large media library. Give it a try and reclaim control of your media!