tl;dr: kinda crappy xargs-ish Windows console app using .NET parallelism and thread-safety
FileMunger is a high-performance file-processing Windows console app designed to quickly traverse directories, handle symbolic links properly, and then perform some action on each file.
It uses parallel processing optimized for both CPU and I/O performance.
It only opens files for reading, because it's kinda dumb to have a submachine gun with no safety. OTOH, it was originally written to do R/W, on the off chance I wanted to seriously hose something important. The changes for
This is a conceptual offshoot of my LNKFileAnalyzer.
Note: this code (including comments and README) was generated by the us.anthropic.claude-3-7-sonnet-20250219-v1:0 "AI" model under close supervision.
No attempts were made to do rational things, so note the following:
- It doesn't make use of NuGet packages, e.g. Microsoft's System.CommandLine.
- It has not been thoroughly checked or tested for common "AI" sub-optimizing misfeaturettes such as off-by-one errors, pointless code, or just generally bad style.
- Currently (2025-03-31), the desired work must be implemented in, or called from, program.cs/Program/ProcessFileAsync().
  IOW, you need to program what you want done to each file, and then build and run it.
  - This will be remedied Real Soon Now with the ability to exec a shell command, script, executable, etc., just like a Grown-up Program!
- Testing? What's that? (Built and known to run, at least, on late-model Windows 10.)
- The load measurement and throttling code hasn't been tested and might (likely) or might not (unlikely) hose your system with too many threads. Remedy: def get a better computer with all-SSD storage.
- This is almost certainly poky AF on spinning disks, but since I don't have any of those, I don't know.
- I don't know if this is a limitation, but currently it doesn't skip "standard" (i.e. unreadable) files and folders such as pagefile.sys, \Windows, etc. (There is extensive (LOL) exception handling for failed open-for-read.)
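Since the per-file work lives in ProcessFileAsync(), here is a minimal sketch of what a read-only action might look like. It assumes a hypothetical signature `Task<long> ProcessFileAsync(string path, CancellationToken ct)`; the real method in program.cs may differ. The file is opened with `FileAccess.Read` (never writable) and a permissive `FileShare`, and failed opens (locked or protected files such as pagefile.sys) are treated as skips rather than fatal errors. Counting bytes is only a placeholder for real work.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-file hook: returns the number of bytes read, or -1 if the
// file could not be opened for reading (missing, locked, access denied, ...).
static async Task<long> ProcessFileAsync(string path, CancellationToken ct = default)
{
    try
    {
        // Read-only open; sharing ReadWrite|Delete avoids blocking other processes.
        await using var stream = new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.ReadWrite | FileShare.Delete,
            bufferSize: 81920,
            useAsync: true);

        var buffer = new byte[81920];
        long total = 0;
        int read;
        // Placeholder "work": count the bytes in the file.
        while ((read = await stream.ReadAsync(buffer.AsMemory(), ct)) > 0)
        {
            total += read;
        }
        return total;
    }
    catch (Exception ex) when (ex is UnauthorizedAccessException or IOException)
    {
        // Locked, protected, or vanished files are skipped, not fatal.
        return -1;
    }
}

if (args.Length > 0)
{
    Console.WriteLine($"{args[0]}: {await ProcessFileAsync(args[0])} bytes");
}
```

Returning a sentinel instead of rethrowing keeps one bad file from tearing down a parallel run; a real version would probably also log why the open failed.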
Features:
- High-performance file system traversal with optimized CPU and I/O usage
- Proper handling of symbolic links to prevent recursive loops
- File association gathering to identify file extensions and their associated applications
- Multi-threaded processing that scales with available processors
- I/O-aware throttling to optimize performance based on disk capabilities
- Support for filtering files by wildcard pattern
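For the pattern filtering, .NET already implements Windows wildcard semantics: `Directory.EnumerateFiles(path, pattern)` applies them during enumeration, and `System.IO.Enumeration.FileSystemName.MatchesWin32Expression` tests a single name. A small sketch using the latter, with a hypothetical helper name `MatchesFileSpec` (not necessarily how FileMunger does it):

```csharp
using System;
using System.IO.Enumeration;

// Hypothetical helper: does a file name match a Windows-style wildcard spec?
// MatchesWin32Expression understands '*' and '?' (including Win32 DOS quirks)
// and ignores case by default.
static bool MatchesFileSpec(string fileSpec, string fileName) =>
    FileSystemName.MatchesWin32Expression(fileSpec, fileName);

foreach (var name in new[] { "Report.DOCX", "notes.txt" })
{
    Console.WriteLine($"{name}: {MatchesFileSpec("*.docx", name)}");
}
```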
Usage:
```cmd
FileMunger [options]
```
| Option | Description |
|---|---|
| `--directories`, `-dir`, `-d` | Comma-separated list of directories to process. Special value `all` processes all local drive roots. Default: current drive root |
| `--recursive`, `-r` `[yes/no]` | Process subdirectories recursively. Default: `yes` |
| `--filespec`, `-f` | File specification pattern (Windows wildcards). Default: `*.*` |
| `--verbosity`, `-v` | Output verbosity level: `high`, `medium`, `low`, or `none`. Default: `low` |
| `--help`, `-h`, `-?` | Show help message |

Examples:
```cmd
FileMunger --directories C:\Data,D:\Backup --filespec *.docx
FileMunger -d all -r no -v high
FileMunger C:\Data
```
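Since System.CommandLine isn't used, the options have to be parsed by hand. A minimal sketch of such a parser follows; the names (`ParseArgs`, `TakeValue`, the tuple fields) are hypothetical, not FileMunger's actual code. It treats a bare argument as a directory, as in `FileMunger C:\Data`, and expands `all` to the fixed drives reported by `DriveInfo.GetDrives()`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical: consume the value following an option, if there is one.
static string? TakeValue(string[] argv, ref int i) =>
    i + 1 < argv.Length ? argv[++i] : null;

// Hypothetical hand-rolled parser for the options in the usage table.
static (List<string> Directories, bool Recursive, string FileSpec, string Verbosity, bool Help)
    ParseArgs(string[] argv)
{
    var dirs = new List<string>();
    bool recursive = true, help = false;
    string fileSpec = "*.*", verbosity = "low";

    for (int i = 0; i < argv.Length; i++)
    {
        switch (argv[i].ToLowerInvariant())
        {
            case "--directories" or "-dir" or "-d":
                dirs.AddRange((TakeValue(argv, ref i) ?? "").Split(',',
                    StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries));
                break;
            case "--recursive" or "-r":
                // "-r" alone means yes; "-r no" turns recursion off.
                if (i + 1 < argv.Length &&
                    (argv[i + 1].Equals("yes", StringComparison.OrdinalIgnoreCase) ||
                     argv[i + 1].Equals("no", StringComparison.OrdinalIgnoreCase)))
                {
                    recursive = argv[++i].Equals("yes", StringComparison.OrdinalIgnoreCase);
                }
                else
                {
                    recursive = true;
                }
                break;
            case "--filespec" or "-f":
                fileSpec = TakeValue(argv, ref i) ?? fileSpec;
                break;
            case "--verbosity" or "-v":
                verbosity = (TakeValue(argv, ref i) ?? verbosity).ToLowerInvariant();
                break;
            case "--help" or "-h" or "-?":
                help = true;
                break;
            default:
                if (argv[i].StartsWith('-'))
                    throw new ArgumentException($"Unknown option: {argv[i]}");
                dirs.Add(argv[i]);   // bare argument, e.g. "FileMunger C:\Data"
                break;
        }
    }

    if (dirs.Any(d => d.Equals("all", StringComparison.OrdinalIgnoreCase)))
    {
        // "all" = every fixed local drive root.
        dirs = DriveInfo.GetDrives()
            .Where(d => d.DriveType == DriveType.Fixed)
            .Select(d => d.RootDirectory.FullName)
            .ToList();
    }
    else if (dirs.Count == 0)
    {
        // Default: root of the current drive.
        dirs.Add(Path.GetPathRoot(Environment.CurrentDirectory)!);
    }

    return (dirs, recursive, fileSpec, verbosity, help);
}

var opts = ParseArgs(args);
Console.WriteLine($"dirs={string.Join(";", opts.Directories)} recursive={opts.Recursive} " +
                  $"filespec={opts.FileSpec} verbosity={opts.Verbosity} help={opts.Help}");
```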
FileMunger produces both console output and a FileAssociations.txt file containing details about all file extensions found during processing, including:
- The file extension (.txt, .exe, etc.)
- The registered file type
- The friendly name of the file type
- The MIME content type
- The perceived file type
- The command used to open files with this extension
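The extension-gathering step can run safely on many worker threads with a `ConcurrentDictionary`. A sketch, assuming a hypothetical `RecordExtension` helper that counts files per lowercase extension. Looking up the registered type, friendly name, MIME type, perceived type, and open command is left out here; on Windows that information comes from the registry or the shell's `AssocQueryString` API, which makes it Windows-specific.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Extension -> number of files seen; safe for concurrent updates from worker threads.
var extensionCounts = new ConcurrentDictionary<string, int>();

// Hypothetical helper, called once per file by whichever thread processed it.
void RecordExtension(string path)
{
    string ext = Path.GetExtension(path).ToLowerInvariant();   // "" when there is no extension
    if (ext.Length == 0) ext = "(none)";
    extensionCounts.AddOrUpdate(ext, 1, (_, count) => count + 1);
}

string[] sample = { @"C:\a\report.docx", @"C:\b\Notes.TXT", @"C:\b\todo.txt", @"C:\c\Makefile" };
Parallel.ForEach(sample, path => RecordExtension(path));

// One line per extension, sorted; association details would be appended per line.
foreach (var (extension, n) in extensionCounts.OrderBy(kv => kv.Key, StringComparer.Ordinal))
{
    Console.WriteLine($"{extension}\t{n}");
}
```

Normalizing to lowercase before the update keeps `.TXT` and `.txt` from becoming separate entries, regardless of which thread sees which file first.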
FileMunger is optimized for high performance:
- Automatically scales processing threads based on CPU cores
- Throttles I/O operations based on disk performance
- Uses reader-writer locks for efficient concurrent access
- Processes different physical drives in parallel
- Uses efficient memory management for large directory trees
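The first and fourth points above can be sketched with TPL Dataflow: one `ActionBlock` per drive root, each capped at `Environment.ProcessorCount` concurrent files, with all drives running at the same time. This is an illustration under assumptions, not FileMunger's actual pipeline; `ProcessByDriveAsync` and the grouping by `Path.GetPathRoot` are hypothetical, and a path root identifies a volume, which is not always a separate physical disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical: one ActionBlock per drive root, so drives are worked in parallel while
// each drive's concurrency is capped at the number of logical processors.
static async Task<int> ProcessByDriveAsync(IEnumerable<string> files, Func<string, Task> work)
{
    int processed = 0;
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };

    var blocks = new List<ActionBlock<string>>();
    foreach (var drive in files.GroupBy(f => Path.GetPathRoot(f) ?? string.Empty))
    {
        var block = new ActionBlock<string>(async path =>
        {
            await work(path);
            Interlocked.Increment(ref processed);   // atomic: several blocks run at once
        }, options);

        foreach (var file in drive) block.Post(file);
        block.Complete();   // no more input for this drive
        blocks.Add(block);
    }

    // Wait for every drive's block to drain.
    await Task.WhenAll(blocks.Select(b => b.Completion));
    return processed;
}

int total = await ProcessByDriveAsync(
    Enumerable.Range(0, 20).Select(k => Path.Combine(Path.GetTempPath(), $"file{k}.dat")),
    _ => Task.Delay(5));   // placeholder work; a real run would open and process the file
Console.WriteLine($"Processed {total} files");
```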
Requirements:
- Windows operating system
- .NET 9.0 or later
- Administrator rights (to access certain system directories)
To build FileMunger from source:
```cmd
dotnet build -c Release
```
FileMunger employs several techniques for thread safety:
- `ReaderWriterLockSlim` for efficient concurrent-read/exclusive-write operations
- `Interlocked` operations for atomic counter updates
- `ConcurrentDictionary` for thread-safe collections
- TPL Dataflow for parallel processing with controlled concurrency
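A sketch of how the first three primitives can fit together, assuming a hypothetical shared set of visited directories (`MarkVisited`, `filesSeen`, and `skipsByReason` are illustrative names). Lookups take the shared read lock, only a real insert takes the exclusive write lock (and re-checks, since another thread may have inserted in between), counters use `Interlocked`, and per-key tallies go in a `ConcurrentDictionary`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var visitedLock = new ReaderWriterLockSlim();
var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
long filesSeen = 0;
var skipsByReason = new ConcurrentDictionary<string, int>();

// Returns true only for the first caller to mark a given path.
bool MarkVisited(string path)
{
    visitedLock.EnterReadLock();          // many readers may hold this at once
    try
    {
        if (visited.Contains(path)) return false;
    }
    finally
    {
        visitedLock.ExitReadLock();
    }

    visitedLock.EnterWriteLock();         // exclusive: no readers or other writers
    try
    {
        return visited.Add(path);         // false if another thread won the race
    }
    finally
    {
        visitedLock.ExitWriteLock();
    }
}

// 1000 work items over 100 distinct "directories", each seen 10 times.
Parallel.For(0, 1000, n =>
{
    Interlocked.Increment(ref filesSeen);
    if (!MarkVisited($"dir{n % 100}"))
    {
        skipsByReason.AddOrUpdate("AlreadyVisited", 1, (_, c) => c + 1);
    }
});

Console.WriteLine($"items={Interlocked.Read(ref filesSeen)} unique={visited.Count} " +
                  $"repeats={skipsByReason["AlreadyVisited"]}");
```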
FileMunger safely traverses symbolic links by:
- Detecting reparse points using FileAttributes
- Resolving targets using Windows API calls
- Tracking visited targets to avoid cycles
- Supporting directory and file symbolic links
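A sketch of cycle-safe traversal along these lines, with one substitution: instead of calling the Windows API directly, it uses .NET's `FileSystemInfo.ResolveLinkTarget` (available since .NET 6) to resolve a link to its final target. Each directory is keyed by its resolved target, and a key that was already walked is skipped, which breaks loops such as a link pointing back at an ancestor. `WalkFiles` is a hypothetical name, and this is not FileMunger's actual traversal.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical depth-first walk: follows directory links, but each resolved target is walked once.
static List<string> WalkFiles(string root)
{
    var files = new List<string>();
    // Case-insensitive keys, matching NTFS's default behavior.
    var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    var pending = new Stack<DirectoryInfo>();
    pending.Push(new DirectoryInfo(root));

    while (pending.Count > 0)
    {
        var dir = pending.Pop();
        if (!dir.Exists) continue;   // vanished directory or dangling link
        string key = dir.FullName;

        if (dir.Attributes.HasFlag(FileAttributes.ReparsePoint))
        {
            // A link: key it by its final target so loops and duplicate links collapse.
            var target = dir.ResolveLinkTarget(returnFinalTarget: true);
            if (target is null || !target.Exists) continue;
            key = target.FullName;
        }

        if (!visited.Add(Path.TrimEndingDirectorySeparator(key))) continue;   // already walked

        try
        {
            foreach (var file in dir.EnumerateFiles()) files.Add(file.FullName);
            foreach (var sub in dir.EnumerateDirectories()) pending.Push(sub);
        }
        catch (Exception ex) when (ex is UnauthorizedAccessException or IOException)
        {
            // Unreadable directory (e.g. protected system folders): skip it and carry on.
        }
    }
    return files;
}

if (args.Length > 0)
{
    Console.WriteLine($"{WalkFiles(args[0]).Count} files under {args[0]}");
}
```

An explicit stack instead of recursion also keeps very deep trees from exhausting the call stack.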
The I/O performance throttling dynamically adjusts concurrency based on:
- Current disk read/write throughput
- Available system resources
- The number of logical processors
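The exact adjustment rule isn't spelled out here, so the following illustrates one common approach rather than FileMunger's actual algorithm: additive-increase/multiplicative-decrease (AIMD). After each sampling interval, if measured throughput improved, allow one more concurrent operation, up to a cap; otherwise halve the limit. `NextConcurrency` and the cap of twice the logical processor count are hypothetical choices.

```csharp
using System;

// Hypothetical AIMD rule for the number of concurrent file operations.
static int NextConcurrency(int current, double previousBytesPerSec, double currentBytesPerSec, int maxConcurrency)
{
    if (currentBytesPerSec > previousBytesPerSec)
        return Math.Min(current + 1, maxConcurrency);   // still gaining: probe one step higher
    return Math.Max(current / 2, 1);                    // flat or worse: back off sharply
}

int cap = Environment.ProcessorCount * 2;   // assumed cap: I/O-bound work can exceed core count
int limit = Environment.ProcessorCount;
double[] samples = { 100e6, 150e6, 180e6, 170e6, 175e6 };   // illustrative numbers, not measurements

double previous = 0;
foreach (double sample in samples)
{
    limit = NextConcurrency(limit, previous, sample, cap);
    previous = sample;
    Console.WriteLine($"throughput {sample / 1e6:F0} MB/s -> limit {limit}");
}
```

Halving on any stall backs off quickly when a disk saturates, while the one-step increase probes upward slowly; the resulting limit would be applied by whatever code hands out work to the file processors.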
Author:
- Lynne Whitehorn (https://github.com/lynnewu)