LeanProductivity MarkItDown Batch Converter (No GUI)
A simple, no-GUI batch converter that turns various file formats into Markdown using MarkItDown.
Built for developers, writers, and knowledge workers who want a local, fast, and scriptable solution.
Tutorial
Features
- Converts DOCX, PDF, PPTX, HTML, EPUB, TXT, JSON, XML, CSV, audio files (MP3, WAV, M4A), and more to Markdown
- Supports dry runs (simulate conversion)
- Optional force-convert even if files are already up to date
- Supports selective extension filtering (.docx, .pdf, etc.)
- Uses CLI-based fallback for better audio conversion stability
- All conversion runs locally; no data is sent to the cloud
Requirements
- Python 3.9+ (recommended: 3.10+)
- Dependencies (install via pip install -r requirements.txt):
  - markitdown
  - pydub
  - speechrecognition
- ffmpeg binary available in resources/bin/ffmpeg.exe or in your system PATH
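The ffmpeg requirement names two possible locations. A minimal sketch of checking both, bundled copy first and then the system PATH, might look like the following. The helper name find_ffmpeg is hypothetical and not part of the script, which only looks at the bundled path.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(bundled: Path = Path("resources") / "bin" / "ffmpeg.exe") -> Optional[str]:
    """Return a usable ffmpeg path, or None if neither location has one.

    Hypothetical helper: checks the bundled binary first, then the system PATH.
    """
    if bundled.is_file():
        return str(bundled)
    # shutil.which returns None when the executable is not on PATH
    return shutil.which("ffmpeg")


location = find_ffmpeg()
print(location or "ffmpeg not found; audio conversion may fail")
```

The returned string could then be assigned to pydub's AudioSegment.converter, as the script does with its bundled path.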
Usage
python "LP MID Bulk Converter.py"
You'll be prompted to enter file extensions to convert:
Enter file extensions to convert (comma-separated, e.g. docx,pdf,html):
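As an illustration of how such comma-separated input can be normalized, here is a small sketch. The function name parse_extensions is an assumption for this example; unlike the script's own parsing, it also strips a leading dot, so both "pdf" and ".pdf" are accepted.

```python
def parse_extensions(raw: str) -> set:
    """Turn 'docx, .PDF,html' into {'.docx', '.pdf', '.html'}."""
    extensions = set()
    for part in raw.split(","):
        # Trim whitespace, lower-case, and drop any leading dot the user typed
        cleaned = part.strip().lower().lstrip(".")
        if cleaned:
            extensions.add(f".{cleaned}")
    return extensions


print(sorted(parse_extensions("docx, .PDF,,html ")))
# ['.docx', '.html', '.pdf']
```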
All matching files under the configured input folder will be recursively converted to .md
and saved under the output folder, maintaining structure.
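The structure-preserving mapping can be sketched as below. The function name destination_for and the example folder names are hypothetical; the path arithmetic mirrors what the script does with relative_to and with_suffix.

```python
from pathlib import Path


def destination_for(src: Path, input_root: Path, output_root: Path) -> Path:
    """Mirror src's location under input_root into output_root, with a .md suffix."""
    relative = src.relative_to(input_root)
    return output_root / relative.with_suffix(".md")


src = Path("Input") / "reports" / "2024" / "summary.docx"
print(destination_for(src, Path("Input"), Path("Output")).as_posix())
# Output/reports/2024/summary.md
```

One consequence of this mapping: two sources in the same folder that differ only by extension (for example summary.docx and summary.pdf) resolve to the same .md path.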
Configuration
Edit these lines in the script to set your folders:
input_folder = Path(r"d:\GitProjects\Input\Demo Files")
output_folder = Path(r"d:\GitProjects\Output")
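If editing the script for every run is inconvenient, the folders could instead be read from the command line. The sketch below is one possible approach, not something the script currently supports; the --input and --output flags are hypothetical, and the defaults are the paths shown above.

```python
import argparse
from pathlib import Path


def parse_folders(argv=None):
    """Read input/output folders from command-line flags (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Batch-convert files to Markdown.")
    parser.add_argument("--input", type=Path,
                        default=Path(r"d:\GitProjects\Input\Demo Files"))
    parser.add_argument("--output", type=Path,
                        default=Path(r"d:\GitProjects\Output"))
    args = parser.parse_args(argv)
    return args.input, args.output


# Passing an explicit list here stands in for real command-line arguments
input_folder, output_folder = parse_folders(["--input", "docs", "--output", "markdown"])
print(f"Input:  {input_folder}")
print(f"Output: {output_folder}")
```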
Conversion Summary
After running, the script outputs:
- ✅ Converted files
- ⏭️ Skipped (up-to-date)
- ❌ Errors
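The "skipped (up-to-date)" category comes from a modification-time comparison between each source file and its Markdown output. A minimal sketch of that decision, using a hypothetical helper needs_conversion and temporary files with made-up timestamps:

```python
import os
import tempfile
from pathlib import Path


def needs_conversion(src: Path, dst: Path, force: bool = False) -> bool:
    """Return True when dst is missing, older than src, or force is set."""
    if force or not dst.exists():
        return True
    return dst.stat().st_mtime < src.stat().st_mtime


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "test.docx"
    dst = Path(tmp) / "test.md"
    src.write_text("source")
    print(needs_conversion(src, dst))        # True: no output yet
    dst.write_text("converted")
    os.utime(src, (1_600_000_000, 1_600_000_000))  # pretend the source is old
    os.utime(dst, (1_700_000_000, 1_700_000_000))  # and the output is newer
    print(needs_conversion(src, dst))        # False: output is up to date
    print(needs_conversion(src, dst, True))  # True: force overrides the check
```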
Example Output
▶️ Converting file: test.docx
✅ Converted: test.md
⏭️ Skipped (up-to-date): test.pdf
❌ Error converting test.wav: UnknownValueError - No speech detected
=== Conversion Summary ===
Converted: 12
Skipped: 3
Errors: 1
Author
Sascha D. Kasper - LeanProductivity
GitHub | YouTube
License
MIT License. See LICENSE.txt for details.
Script
# --- APPLICATION METADATA ---
APP_NAME = "LeanProductivity MarkItDown Batch Converter no GUI"
APP_DESCRIPTION = "A no GUI batch converter for MarkItDown to convert various file formats to Markdown."
VERSION = "00.07.20250620"
AUTHOR_NAME = "Sascha D. Kasper - LeanProductivity"
HELP_URL = "https://github.com/microsoft/markitdown"
# -----------------------------
import os
from pathlib import Path
from markitdown import MarkItDown
# --- Optional: FFMPEG path for audio support (if pydub or similar is used) ---
try:
    from pydub import AudioSegment
    ffmpeg_path = os.path.join("resources", "bin", "ffmpeg.exe")
    if not Path(ffmpeg_path).is_file():
        raise FileNotFoundError(f"FFmpeg not found at {ffmpeg_path}")
    AudioSegment.converter = ffmpeg_path
except ImportError:
    pass  # pydub not installed or not needed for current file types
except Exception as e:
    print(f"⚠️ FFmpeg configuration warning: {e}")
# --- Configuration ---
input_folder = Path(r"d:\GitProjects\Input\Demo Files") # set this to your input folder
output_folder = Path(r"d:\GitProjects\Output") # set this to your output folder
# --- Extension input from user ---
ext_input = input("Enter file extensions to convert (comma-separated, e.g. docx,pdf,html): ")
supported_extensions = {
    f".{ext.strip().lower()}"
    for ext in ext_input.split(",")
    if ext.strip()
}
force_convert = False  # set to True to ignore modification times
dry_run = False  # set to True to simulate conversion only
# --- Init ---
md = MarkItDown()
files_converted = 0
files_skipped = 0
errors = 0
# --- Conversion ---
import subprocess  # used to call the markitdown CLI

for root, _, files in os.walk(input_folder):
    for file in files:
        src_path = Path(root) / file
        if src_path.suffix.lower() not in supported_extensions:
            continue
        rel_path = src_path.relative_to(input_folder)
        dst_path = output_folder / rel_path.with_suffix(".md")
        # Skip files whose Markdown output is at least as new as the source
        if dst_path.exists() and not force_convert:
            if dst_path.stat().st_mtime >= src_path.stat().st_mtime:
                print(f"⏭️ Skipped (up-to-date): {rel_path}")
                files_skipped += 1
                continue
        if dry_run:
            print(f"Would convert: {rel_path}")
            files_converted += 1
            continue
        try:
            # print(f"▶️ Converting file: {src_path} (resolved: {src_path.resolve()})") - uncomment for debugging
            # print(f"Working directory: {os.getcwd()}") - uncomment for debugging
            # Create the output directory only when a file is actually written,
            # so that dry runs leave the output folder untouched
            dst_path.parent.mkdir(parents=True, exist_ok=True)
            result = subprocess.run(
                ["markitdown", str(src_path.resolve())],
                capture_output=True,
                text=True,
                check=True
            )
            with open(dst_path, "w", encoding="utf-8") as f:
                f.write(result.stdout)
            # Count each file exactly once, and only after a successful write
            print(f"✅ Converted: {rel_path}")
            files_converted += 1
        except subprocess.CalledProcessError as e:
            print(f"❌ CLI failed for {rel_path}: {e.stderr.strip()}")
            errors += 1
        except Exception as e:
            print(f"❌ Error converting {rel_path}: {type(e).__name__} - {e}")
            errors += 1
# --- Summary ---
print("\n=== Conversion Summary ===")
print(f"Converted: {files_converted}")
print(f"Skipped: {files_skipped}")
print(f"Errors: {errors}")
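The per-file step in the loop above (run a command on the source, capture stdout, write it to the destination) can also be isolated into a function, which makes it easier to try out on its own. The sketch below is an illustration under assumptions: convert_with_cli and its command parameter are hypothetical, and the demo substitutes a small Python one-liner for the markitdown CLI so it runs without MarkItDown installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def convert_with_cli(src: Path, dst: Path, command=("markitdown",)) -> None:
    """Run `command <src>` and write its stdout to dst as UTF-8.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        [*command, str(src.resolve())],
        capture_output=True,
        text=True,
        check=True,
    )
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(result.stdout, encoding="utf-8")


# Stand-in command (an assumption for the demo): echoes the file in upper case
fake_cli = (sys.executable, "-c",
            "import sys; print(open(sys.argv[1]).read().upper())")

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "note.txt"
    src.write_text("hello")
    dst = Path(tmp) / "out" / "note.md"
    convert_with_cli(src, dst, command=fake_cli)
    print(dst.read_text(encoding="utf-8").strip())
```

With the default command, the call is equivalent to the subprocess.run invocation in the script.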