Evernote Archive Guide

A local, browsable HTML archive of 4,344 Evernote notes (4.7 GB), converted from .enex exports before account closure on March 11, 2026.


Background and Plan

In February 2026, the Evernote account was exported as 112 .enex files (78 standalone notebooks + 34 notebooks nested in 7 stack directories), totaling 4.7 GB. The goal was to convert this export into a self-contained HTML archive that:

Preserves the look and feel of Evernote notes
Keeps web clippings intact with their original HTML structure
Embeds images inline with relative paths
Extracts all attachments (PDFs, audio, etc.) alongside each note
Transcribes 443 audio dictation files using OpenAI Whisper
Produces a navigable index page organized by stacks and notebooks

The conversion is handled by a single Python script (convert.py) with three phases that can be run independently.

What Was Done

Phase 1: Parse & Extract (ENEX to HTML)

The script walks export/ for all .enex files and determines each output folder from the file's position in the directory tree (stack directories become parent folders). It then stream-parses each file with xml.etree.ElementTree.iterparse, which keeps memory use bounded on files up to 1.7 GB.
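The folder mapping described above could look roughly like the sketch below. The function name output_folder and its parameters are hypothetical, and name sanitizing (spaces to underscores, stripped characters) is left out; convert.py may do this differently.

```python
from pathlib import Path

def output_folder(enex_path: Path, export_root: Path, notes_root: Path) -> Path:
    """Mirror an .enex file's position under export/ as a folder under notes/.

    Hypothetical helper: a file inside a stack directory keeps that directory
    as its parent; a standalone notebook lands directly under notes_root.
    """
    rel = enex_path.relative_to(export_root)
    # rel.parent.parts is empty for standalone notebooks, one entry for stacks.
    return notes_root.joinpath(*rel.parent.parts, rel.stem)

stacked = output_folder(Path("export/Alt-Process/Cyanotype.enex"), Path("export"), Path("notes"))
standalone = output_folder(Path("export/PB1099.enex"), Path("export"), Path("notes"))
```

Here stacked is notes/Alt-Process/Cyanotype and standalone is notes/PB1099.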

For each <note> element, it:

Extracts title, created/updated dates, source URL, and other attributes
Decodes each <resource> element's base64 data, writes the file to a _resources/ subfolder, and computes its MD5 hash
Converts the ENML content to HTML:
- <en-note> becomes a styled <div>
- <en-media> for images becomes <img> matched by MD5 hash
- <en-media> for audio becomes <audio controls>
- <en-media> for PDFs and other files becomes download links
- <en-todo> becomes HTML checkboxes
- All other HTML passes through (preserving web clippings)
Wraps the content in a clean HTML template with Evernote-like CSS styling, note title, dates, and source URL

Result: 4,344 HTML files and 16,543 extracted resources in 26 seconds.

Phase 2: Transcribe Audio (Whisper)

After extraction, the script walks notes/ for audio files (.m4a, .wav, .mp3, .amr, .aac), loads the Whisper base model, and transcribes each file. For every transcription:

A .txt file is saved alongside the audio
A collapsible <details> transcription block is injected into the parent HTML file below the <audio> tag
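
As an illustration of the injection step only (not the Whisper call itself), here is a minimal sketch. The helper name inject_transcription, the CSS class, and the exact markup are assumptions; it also assumes the audio file name is passed in the same form it takes in the src attribute.

```python
import html
import re

def inject_transcription(note_html: str, audio_name: str, text: str) -> str:
    """Insert a collapsible transcription block after the matching <audio> element.

    Hypothetical helper: convert.py's actual markup and matching logic may differ.
    """
    block = (
        '<details class="transcription"><summary>Transcription</summary>'
        f"<p>{html.escape(text)}</p></details>"
    )
    # Match an <audio> element whose src ends with the given audio file name.
    pattern = re.compile(
        r'(<audio\b[^>]*src="[^"]*' + re.escape(audio_name) + r'"[^>]*>\s*</audio>)'
    )
    # A callable replacement avoids backslash handling in the transcription text.
    return pattern.sub(lambda m: m.group(1) + "\n" + block, note_html, count=1)
```

If no matching player is found, the page is returned unchanged, which keeps the step harmless for notes whose audio tag was edited by hand.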

Result: 437 of 443 audio files transcribed with content (6 were empty/silent), 269,295 total words. Ran on CPU in ~57 minutes.

Phase 3: Generate Index

Creates notes/index.html with all notebooks grouped by stack, note counts per notebook, and clickable links to every note. Notes containing audio recordings are tagged with a 🎤 emoji for easy identification. A legend at the top shows the total count.

Result: Full index with 4,339 note links (5 .html files in _resources/ folders are embedded web page attachments, not notes), 222 tagged as audio notes.
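
A sketch of how the note list for the index might be gathered, including the audio flag and the rule that .html files inside _resources/ folders are not notes. The names collect_notes, AUDIO_EXTS, and SITE_PAGES are hypothetical, and the real script may detect audio differently.

```python
from pathlib import Path

AUDIO_EXTS = {".m4a", ".wav", ".mp3", ".amr", ".aac"}
SITE_PAGES = {"index.html", "guide.html", "changelog.html"}

def collect_notes(notes_root: Path) -> dict[str, list[tuple[str, bool]]]:
    """Map each notebook path (relative to notes_root) to (note file, has_audio) pairs."""
    notebooks: dict[str, list[tuple[str, bool]]] = {}
    for page in sorted(notes_root.rglob("*.html")):
        rel = page.relative_to(notes_root)
        # Skip embedded web-page attachments stored in _resources/ folders.
        if any(part.endswith("_resources") for part in rel.parts[:-1]):
            continue
        # Skip the archive's own top-level pages.
        if len(rel.parts) == 1 and rel.name in SITE_PAGES:
            continue
        res_dir = page.with_name(page.stem + "_resources")
        has_audio = res_dir.is_dir() and any(
            f.suffix.lower() in AUDIO_EXTS for f in res_dir.iterdir()
        )
        notebooks.setdefault(rel.parent.as_posix(), []).append((rel.name, has_audio))
    return notebooks
```

The returned keys look like "Alt-Process/Cyanotype" for stacked notebooks, so grouping by stack is a matter of splitting on the first slash.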

Bug Fix: Path Encoding

During testing, html.escape() was found to be encoding apostrophes in src/href attributes (' became &#x27;), breaking image paths for notes like "Egypt's corrupt decades." The fix was to switch to urllib.parse.quote() percent-encoding, which browsers handle correctly for file paths.

Directory Structure

Evernote/
  notes/                          ← Generated HTML archive (browse from index.html)
    index.html                    ← Start here
    guide.html                    ← This file
    changelog.html                ← Version history
    [Stack_Name]/                 ← Stacks become parent folders
      [Notebook_Name]/
        [Note_Title].html
        [Note_Title]_resources/   ← Images, audio, PDFs for that note
          image.jpeg
          audio.m4a
          audio.txt               ← Whisper transcription
    [Notebook_Name]/              ← Standalone notebooks (no stack)
      ...
  export/                         ← Original .enex files (DO NOT modify)
  en_backup.db                    ← Original Evernote database (DO NOT modify)
  convert.py                      ← The conversion script
  CLAUDE.md                       ← Instructions for Claude Code

Stacks

Stack Directory	Notebooks
All Tech	Frontend Development, Minecraft Modding, Reference
Alt-Process	Carbon Printing, Cyanotype, Gum bichromate
Egypt - Ancient Stack	Egypt - Ancient, Gardiner Sign List
Hipsta Stories	Gathering of Spirits, Instant Transit, Iris and the Magic Eye, Scratch, Song of the Siren
Letha F. Swope Stack	Additions in Woodsville, Letha F. Swope, Letters
Notebook Stack	17 notebooks (PB1099, The Papyrus Diary, Web History of AZ, etc.)
Thinker's Club Stack	The Dovekeepers

Filename Conventions

Spaces become underscores
Characters < > : " / \ | ? * are stripped
Maximum 200 characters
Duplicate titles within a notebook get _2, _3, etc.
Special characters like apostrophes and emoji are preserved in filenames and percent-encoded in HTML links
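
The conventions above can be sketched as a single function. The name safe_filename, the "Untitled" fallback, and the choice to truncate before adding a duplicate suffix are assumptions, not confirmed details of convert.py.

```python
import re

MAX_LEN = 200
FORBIDDEN = r'[<>:"/\\|?*]'  # the characters listed above

def safe_filename(title: str, taken: set[str]) -> str:
    """Turn a note title into a file stem, unique within one notebook."""
    stem = re.sub(FORBIDDEN, "", title).replace(" ", "_")[:MAX_LEN]
    stem = stem or "Untitled"  # assumption: fallback for titles that strip to nothing
    candidate, n = stem, 1
    while candidate in taken:
        n += 1
        candidate = f"{stem}_{n}"  # second copy gets _2, third _3, and so on
    taken.add(candidate)
    return candidate

seen: set[str] = set()
first = safe_filename("Egypt's corrupt decades", seen)   # "Egypt's_corrupt_decades"
second = safe_filename("Egypt's corrupt decades", seen)  # "Egypt's_corrupt_decades_2"
```

Apostrophes and emoji pass through untouched here; they only get percent-encoded later, when the name is written into an href or src attribute.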

How to Use the Archive

Browse Notes

Open notes/index.html in any web browser. All notes are listed by notebook, grouped under stacks. Click any note to view it. Notes with audio recordings are marked with 🎤. Each note has a "← Back to index" link at the top.

Open a Specific Notebook

Navigate to notes/[Notebook_Name]/ in Finder or a file browser. Each .html file is a self-contained note.

Listen to Audio

Audio notes have an embedded <audio> player. Click play in the browser. The original .m4a/.wav files are in the _resources/ subfolder next to the note.

Read Transcriptions

Transcribed audio notes have a collapsible "Transcription" section below the audio player. Click it to expand. Transcriptions are also saved as .txt files alongside the audio in _resources/.

Search

Use your OS file search (Spotlight on macOS) to find notes by title, or use browser find (Cmd+F) within individual notes. For full-text search across all notes, use:

grep -r "search term" notes/ --include="*.html" -l

How to Re-run or Maintain

Running the Converter

# Run all three phases
python3 convert.py

# Run specific phases
python3 convert.py 1       # Phase 1: Parse & Extract only
python3 convert.py 2       # Phase 2: Whisper transcription only
python3 convert.py 3       # Phase 3: Generate index only
python3 convert.py 1 3     # Multiple phases
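
For readers curious how such phase arguments can be handled, here is one possible sketch. The function select_phases is hypothetical and is not taken from convert.py; it would be called as select_phases(sys.argv[1:]).

```python
def select_phases(argv: list[str]) -> list[int]:
    """Return the phases to run, in ascending order; no arguments means all three."""
    if not argv:
        return [1, 2, 3]
    chosen = set()
    for arg in argv:
        if arg not in ("1", "2", "3"):
            raise SystemExit(f"unknown phase: {arg!r} (expected 1, 2, or 3)")
        chosen.add(int(arg))
    return sorted(chosen)
```

Sorting the result means "3 1" still runs extraction before indexing, which matters because Phase 3 reads what Phase 1 writes.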

When to Re-run

Phase 1 overwrites existing notes. Re-run if you modify convert.py's parsing or template logic. Delete notes/ first for a clean run.
Phase 2 is safe to re-run. It overwrites existing .txt files, re-injects transcriptions, and processes every audio file each time.
Phase 3 regenerates index.html. Re-run after any changes to the notes/ folder structure. It does not overwrite changelog.html or guide.html — those are maintained separately.

Dependencies

Python 3 (tested with 3.13)
openai-whisper — pip install openai-whisper (Phase 2 only)
ffmpeg — required by Whisper for audio decoding (Phase 2 only)
No other dependencies. Phases 1 and 3 use only the Python standard library.

Upgrading Whisper Transcriptions

To re-transcribe with a better model (e.g., medium or large), edit the model = whisper.load_model("base") line in convert.py and re-run Phase 2. Larger models are more accurate but significantly slower on CPU.

Technical Details

ENML Conversion

ENML (Evernote Markup Language) is a restricted subset of XHTML with custom tags:

ENML Tag	Converted To
`<en-note>`	`<div class="en-note">`
`<en-media type="image/*">`	`<img src="...">`
`<en-media type="audio/*">`	`<audio controls src="...">`
`<en-media type="application/pdf">`	`<a href="...">filename.pdf</a>`
`<en-todo checked="true"/>`	`<input type="checkbox" checked disabled>`
`<en-todo checked="false"/>`	`<input type="checkbox" disabled>`
`<en-crypt>`	`[encrypted content]`
All other HTML	Passed through unchanged
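
The table can be read as a handful of tag rewrites. The sketch below uses regular expressions for brevity; the function names, the files_by_hash mapping, and the "[missing resource]" placeholder are assumptions, and convert.py may well use a different parsing approach.

```python
import html
import re
from urllib.parse import quote

def attrs(tag: str) -> dict[str, str]:
    """Parse key="value" attribute pairs out of a single tag string."""
    return dict(re.findall(r'([\w-]+)="([^"]*)"', tag))

def convert_enml(enml: str, files_by_hash: dict[str, str]) -> str:
    """Rewrite Evernote-specific tags; all other markup passes through unchanged."""

    def media(match: re.Match) -> str:
        a = attrs(match.group(0))
        path = files_by_hash.get(a.get("hash", ""))
        if path is None:
            return "[missing resource]"  # assumption: placeholder for unmatched hashes
        src = quote(path)
        mime = a.get("type", "")
        if mime.startswith("image/"):
            return f'<img src="{src}">'
        if mime.startswith("audio/"):
            return f'<audio controls src="{src}"></audio>'
        name = html.escape(path.rsplit("/", 1)[-1])
        return f'<a href="{src}">{name}</a>'

    def todo(match: re.Match) -> str:
        checked = attrs(match.group(0)).get("checked") == "true"
        return f'<input type="checkbox"{" checked" if checked else ""} disabled>'

    out = re.sub(r"<en-media\b[^>]*?(?:/>|>\s*</en-media>)", media, enml)
    out = re.sub(r"<en-todo\b[^>]*?(?:/>|>\s*</en-todo>)", todo, out)
    out = re.sub(r"<en-crypt\b[^>]*>.*?</en-crypt>", "[encrypted content]", out, flags=re.S)
    return out.replace("<en-note", '<div class="en-note"', 1).replace("</en-note>", "</div>")
```

Because only the en-* tags are touched, clipped web pages keep their original structure and inline styles.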

Resource Matching

Each <en-media> tag has a hash attribute (MD5 hex digest). The script computes hashlib.md5(decoded_bytes).hexdigest() for each resource and uses this to map media tags to extracted files.
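
A minimal sketch of the hash computation, assuming the base64 payload is taken straight from the resource's data element. The helper name resource_hash is hypothetical.

```python
import base64
import hashlib

def resource_hash(b64_data: str) -> tuple[bytes, str]:
    """Decode a <resource> payload and return (raw bytes, MD5 hex digest)."""
    # b64decode discards characters outside the base64 alphabet by default,
    # so line breaks inside the payload are harmless.
    raw = base64.b64decode(b64_data)
    return raw, hashlib.md5(raw).hexdigest()

raw, digest = resource_hash("aGVs\nbG8=")  # base64 for b"hello", split across lines
# digest == "5d41402abc4b2a76b9719d911017c592"
```

The digest is computed over the decoded bytes, not the base64 text, which is what makes it comparable to the hash attribute on en-media tags.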

Memory Safety

Files like PB1099.enex (1.7 GB) and All Instagram Posts.enex (669 MB) require streaming. The script uses iterparse and calls elem.clear() after processing each note to prevent the full XML tree from accumulating in memory.
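
The streaming pattern looks roughly like this. The generator name iter_notes is hypothetical, and the sample below is a tiny in-memory stand-in for a real .enex file.

```python
import io
import xml.etree.ElementTree as ET

def iter_notes(source):
    """Yield (title, content) for each <note>, freeing each element once handled."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "note":
            yield elem.findtext("title", default=""), elem.findtext("content", default="")
            # Drop the note's children (including large base64 resources).
            elem.clear()

sample = io.BytesIO(
    b"<en-export>"
    b"<note><title>A</title><content>one</content></note>"
    b"<note><title>B</title><content>two</content></note>"
    b"</en-export>"
)
titles = [title for title, _content in iter_notes(sample)]  # ["A", "B"]
```

Note that elem.clear() empties each note but leaves a small empty element attached to the root, so memory still grows slightly with note count, just not with note size.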

Path Encoding

All file paths in src and href attributes use urllib.parse.quote() percent-encoding. This correctly handles apostrophes, emoji (e.g., the Pop-👁ris notebook), commas, parentheses, and other characters that html.escape() would break.
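
The difference between the two encodings is easy to see on a sample path. The path below is illustrative, built from the note title mentioned earlier.

```python
import html
from urllib.parse import quote

rel_path = "Egypt's_corrupt_decades_resources/image.jpeg"

escaped = html.escape(rel_path)  # "Egypt&#x27;s_corrupt_decades_resources/image.jpeg"
encoded = quote(rel_path)        # "Egypt%27s_corrupt_decades_resources/image.jpeg"

# Non-ASCII characters are encoded as UTF-8 bytes; "/" is kept by default.
emoji_path = quote("Pop-👁ris/note.html")  # "Pop-%F0%9F%91%81ris/note.html"
```

quote() leaves "/" alone by default, so a whole relative path can be encoded in one call without breaking its folder separators.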

Preservation Notes

The export/ directory and en_backup.db are the original, untouched source data. Everything in notes/ can be regenerated from them.
The archive is fully self-contained with no external dependencies at browse time — all resources use relative paths, no CDN links, no JavaScript.
Notes retain their original creation and modification dates in the metadata header.
Web clippings preserve their original HTML structure and inline styles.