Evernote Archive Guide

A local, browsable HTML archive of 4,344 Evernote notes (4.7 GB), converted from .enex exports before account closure on March 11, 2026.


Background and Plan

In February 2026, the Evernote account was exported as 112 .enex files (78 standalone notebooks plus 34 notebooks nested in 7 stack directories), totaling 4.7 GB. The goal was to convert this export into a self-contained HTML archive.

The conversion is handled by a single Python script (convert.py) with three phases that can be run independently.

What Was Done

Phase 1: Parse & Extract (ENEX to HTML)

The script walks export/ for all .enex files and determines each file's output folder from its position in the directory tree (stack directories become parent folders). It then stream-parses each file with xml.etree.ElementTree.iterparse, which keeps memory use bounded on files as large as 1.7 GB.
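The folder mapping described above could look roughly like the following. This is a minimal sketch, not the code in convert.py: the function names output_folder and find_enex_files are hypothetical, and it assumes each notebook's output folder is named after the .enex file without its extension.

```python
from pathlib import Path


def find_enex_files(export_root: Path) -> list[Path]:
    """Return every .enex file under export_root, including stack subfolders."""
    return sorted(export_root.rglob("*.enex"))


def output_folder(enex_path: Path, export_root: Path, notes_root: Path) -> Path:
    """Map export/<Stack>/<Notebook>.enex to notes/<Stack>/<Notebook>/.

    Standalone notebooks sit directly in export_root, so their relative
    parent is "." and they land directly under notes_root.
    """
    relative = enex_path.relative_to(export_root)
    return notes_root / relative.parent / relative.stem
```

For example, export/All Tech/Reference.enex would map to notes/All Tech/Reference/ under these assumptions.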

For each <note> element, it:

  1. Extracts title, created/updated dates, source URL, and other attributes
  2. Decodes each <resource> element's base64 data, writes the file to a _resources/ subfolder, and computes its MD5 hash
  3. Converts the ENML content to HTML (the tag mappings are listed under ENML Conversion)
  4. Wraps the content in a clean HTML template with Evernote-like CSS styling, note title, dates, and source URL
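As an illustration of step 1, the sketch below reads metadata from a single parsed <note> element. The element names (title, created, updated, note-attributes/source-url) and the timestamp layout follow the ENEX export format; the function names and the "Untitled" fallback are assumptions made for this example and may differ from convert.py.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_enex_timestamp(text):
    """Parse an ENEX timestamp such as 20130730T205204Z into an aware datetime."""
    if not text:
        return None
    parsed = datetime.strptime(text.strip(), "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def note_metadata(note: ET.Element) -> dict:
    """Collect the fields used in the HTML header of a note."""
    return {
        # "Untitled" is an assumed fallback, not taken from convert.py
        "title": note.findtext("title", default="Untitled"),
        "created": parse_enex_timestamp(note.findtext("created")),
        "updated": parse_enex_timestamp(note.findtext("updated")),
        "source_url": note.findtext("note-attributes/source-url"),
    }
```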

Result: 4,344 HTML files and 16,543 extracted resources in 26 seconds.

Phase 2: Transcribe Audio (Whisper)

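After extraction, the script walks notes/ for audio files (.m4a, .wav, .mp3, .amr, .aac), loads the Whisper base model, and transcribes each file. Every transcription is saved as a .txt file alongside the audio in _resources/ and shown in a collapsible "Transcription" section below the audio player in the note.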

Result: 437 of 443 audio files transcribed with content (6 were empty/silent), 269,295 total words. Ran on CPU in ~57 minutes.
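A transcription pass along these lines might be sketched as follows. It assumes the openai-whisper package and uses its load_model and transcribe calls; the function names, the choice to skip empty results, and the audio.txt naming (taken from the directory listing in this guide) are assumptions rather than a copy of convert.py.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".m4a", ".wav", ".mp3", ".amr", ".aac"}


def find_audio_files(notes_root: Path) -> list[Path]:
    """Return all audio files under notes_root, matched by extension."""
    return sorted(
        p for p in notes_root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_all(notes_root: Path, model_name: str = "base") -> int:
    """Transcribe every audio file and write the text next to it as .txt."""
    import whisper  # imported lazily so file discovery works without it

    model = whisper.load_model(model_name)
    written = 0
    for audio in find_audio_files(notes_root):
        text = model.transcribe(str(audio))["text"].strip()
        if text:  # assumed behavior: silent recordings produce no .txt file
            audio.with_suffix(".txt").write_text(text, encoding="utf-8")
            written += 1
    return written
```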

Phase 3: Generate Index

Creates notes/index.html with all notebooks grouped by stack, note counts per notebook, and clickable links to every note. Notes containing audio recordings are tagged with a 🎀 emoji for easy identification. A legend at the top shows the total count.

Result: Full index with 4,339 note links (5 .html files in _resources/ folders are embedded web page attachments, not notes), 222 tagged as audio notes.
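The distinction between notes and embedded .html attachments can be illustrated with a small collector. This is a sketch under the assumption that anything inside a folder ending in _resources is an attachment and that top-level pages (index.html, guide.html, changelog.html) are not notes; collect_notes is a hypothetical name.

```python
from collections import defaultdict
from pathlib import Path


def collect_notes(notes_root: Path) -> dict[str, list[Path]]:
    """Group note pages by notebook folder, skipping attachments and top-level pages."""
    notebooks = defaultdict(list)
    for page in sorted(notes_root.rglob("*.html")):
        relative = page.relative_to(notes_root)
        if len(relative.parts) == 1:
            continue  # index.html, guide.html, changelog.html
        if any(part.endswith("_resources") for part in relative.parts[:-1]):
            continue  # embedded web page attachment, not a note
        notebooks[relative.parent.as_posix()].append(page)
    return dict(notebooks)
```

The per-notebook counts shown in the index would then be the lengths of these lists.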

Bug Fix: Path Encoding

Testing revealed that html.escape() was encoding apostrophes in src/href attributes (' became &#x27;), which broke image paths for notes such as "Egypt's corrupt decades." The fix was to switch to urllib.parse.quote() percent-encoding, which browsers handle correctly for file paths.

Directory Structure

Evernote/
  notes/                          ← Generated HTML archive (browse from index.html)
    index.html                    ← Start here
    guide.html                    ← This file
    changelog.html                ← Version history
    [Stack_Name]/                 ← Stacks become parent folders
      [Notebook_Name]/
        [Note_Title].html
        [Note_Title]_resources/   ← Images, audio, PDFs for that note
          image.jpeg
          audio.m4a
          audio.txt               ← Whisper transcription
    [Notebook_Name]/              ← Standalone notebooks (no stack)
      ...
  export/                         ← Original .enex files (DO NOT modify)
  en_backup.db                    ← Original Evernote database (DO NOT modify)
  convert.py                      ← The conversion script
  CLAUDE.md                       ← Instructions for Claude Code

Stacks

| Stack Directory | Notebooks |
|---|---|
| All Tech | Frontend Development, Minecraft Modding, Reference |
| Alt-Process | Carbon Printing, Cyanotype, Gum bichromate |
| Egypt - Ancient Stack | Egypt - Ancient, Gardiner Sign List |
| Hipsta Stories | Gathering of Spirits, Instant Transit, Iris and the Magic Eye, Scratch, Song of the Siren |
| Letha F. Swope Stack | Additions in Woodsville, Letha F. Swope, Letters |
| Notebook Stack | 17 notebooks (PB1099, The Papyrus Diary, Web History of AZ, etc.) |
| Thinker's Club Stack | The Dovekeepers |

Filename Conventions

How to Use the Archive

Browse Notes

Open notes/index.html in any web browser. All notes are listed by notebook, grouped under stacks. Click any note to view it. Notes with audio recordings are marked with 🎀. Each note has a "← Back to index" link at the top.

Open a Specific Notebook

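Navigate to notes/[Notebook_Name]/ (or notes/[Stack_Name]/[Notebook_Name]/ for a notebook inside a stack) in Finder or a file browser. Each .html file is one note, and its attachments sit in the matching _resources/ folder.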

Listen to Audio

Audio notes have an embedded <audio> player. Click play in the browser. The original .m4a/.wav files are in the _resources/ subfolder next to the note.

Read Transcriptions

Transcribed audio notes have a collapsible "Transcription" section below the audio player. Click it to expand. Transcriptions are also saved as .txt files alongside the audio in _resources/.

Search

Use your OS file search (Spotlight on macOS) to find notes by title, or use browser find (Cmd+F) within individual notes. For full-text search across all notes, use:

grep -r "search term" notes/ --include="*.html" -l

How to Re-run or Maintain

Running the Converter

# Run all three phases
python3 convert.py

# Run specific phases
python3 convert.py 1       # Phase 1: Parse & Extract only
python3 convert.py 2       # Phase 2: Whisper transcription only
python3 convert.py 3       # Phase 3: Generate index only
python3 convert.py 1 3     # Multiple phases
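How the phase numbers on the command line might be interpreted is sketched below. This is a hypothetical reading of the usage shown above (no arguments runs all three phases), not the argument handling in convert.py; select_phases is an invented name.

```python
def select_phases(args: list[str]) -> list[int]:
    """Turn command-line arguments like ["1", "3"] into a sorted list of phases."""
    if not args:
        return [1, 2, 3]  # no arguments: run everything
    phases = sorted({int(arg) for arg in args})  # int() raises ValueError on junk
    invalid = [p for p in phases if p not in (1, 2, 3)]
    if invalid:
        raise ValueError(f"unknown phase(s): {invalid}")
    return phases
```

A script using this would pass sys.argv[1:] and run the selected phases in ascending order, so that extraction precedes transcription and indexing.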

When to Re-run

Dependencies

Upgrading Whisper Transcriptions

To re-transcribe with a better model (e.g., medium or large), edit the model = whisper.load_model("base") line in convert.py and re-run Phase 2. Larger models are more accurate but significantly slower on CPU.

Technical Details

ENML Conversion

ENML (Evernote Markup Language) is a restricted subset of XHTML with custom tags:

| ENML Tag | Converted To |
|---|---|
| <en-note> | <div class="en-note"> |
| <en-media type="image/*"> | <img src="..."> |
| <en-media type="audio/*"> | <audio controls src="..."> |
| <en-media type="application/pdf"> | <a href="...">filename.pdf</a> |
| <en-todo checked="true"/> | <input type="checkbox" checked disabled> |
| <en-todo checked="false"/> | <input type="checkbox" disabled> |
| <en-crypt> | [encrypted content] |
| All other HTML | Passed through unchanged |
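The mappings in this table can be approximated with a few regular-expression substitutions. The sketch below is one possible approach, not the converter in convert.py: it assumes a resources dict mapping MD5 hashes to relative file paths, treats every non-image, non-audio attachment as a download link, and uses an invented "[missing attachment]" placeholder when a hash is unknown.

```python
import html
import re
from urllib.parse import quote

EN_MEDIA = re.compile(r"<en-media\b([^>]*?)/?>(?:\s*</en-media>)?", re.IGNORECASE)
ATTRIBUTE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def convert_media(match: re.Match, resources: dict) -> str:
    """Replace one <en-media> tag with <img>, <audio>, or a download link."""
    attrs = dict(ATTRIBUTE.findall(match.group(1)))
    path = resources.get(attrs.get("hash", ""))
    if path is None:
        return "[missing attachment]"  # assumed placeholder
    src = quote(path)
    mime = attrs.get("type", "")
    if mime.startswith("image/"):
        return f'<img src="{src}">'
    if mime.startswith("audio/"):
        return f'<audio controls src="{src}"></audio>'
    name = path.rsplit("/", 1)[-1]
    return f'<a href="{src}">{html.escape(name)}</a>'


def enml_to_html(enml: str, resources: dict) -> str:
    """Apply the tag mappings; all other markup passes through unchanged."""
    body = EN_MEDIA.sub(lambda m: convert_media(m, resources), enml)
    body = re.sub(r'<en-todo\b[^>]*checked="true"[^>]*/?>',
                  '<input type="checkbox" checked disabled>', body)
    body = re.sub(r"<en-todo\b[^>]*/?>", '<input type="checkbox" disabled>', body)
    body = re.sub(r"<en-crypt\b[^>]*>.*?</en-crypt>", "[encrypted content]",
                  body, flags=re.DOTALL)
    body = re.sub(r"<en-note\b[^>]*>", '<div class="en-note">', body)
    return body.replace("</en-note>", "</div>")
```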

Resource Matching

Each <en-media> tag has a hash attribute (MD5 hex digest). The script computes hashlib.md5(decoded_bytes).hexdigest() for each resource and uses this to map media tags to extracted files.
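The hash computation described here can be shown in a few lines. The sketch assumes the resource's base64 text and a target filename are already in hand; extract_resource is a hypothetical helper name, and only the hashlib.md5(...).hexdigest() call is taken from the text above.

```python
import base64
import hashlib
from pathlib import Path


def extract_resource(b64_data: str, filename: str, resources_dir: Path):
    """Decode one resource, write it to disk, and return (md5_hex, path).

    The returned digest is the key that <en-media hash="..."> tags refer to.
    """
    raw = base64.b64decode(b64_data)  # line breaks in the base64 text are ignored
    digest = hashlib.md5(raw).hexdigest()
    resources_dir.mkdir(parents=True, exist_ok=True)
    target = resources_dir / filename
    target.write_bytes(raw)
    return digest, target
```

Collecting these pairs into a dict per note gives the lookup table that media tags are resolved against.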

Memory Safety

Files like PB1099.enex (1.7 GB) and All Instagram Posts.enex (669 MB) require streaming. The script uses iterparse and calls elem.clear() after processing each note to prevent the full XML tree from accumulating in memory.
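The streaming pattern can be sketched as a generator. This is a minimal illustration of iterparse with elem.clear(), under the assumption that each note is fully processed before the next one is requested; iter_notes is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def iter_notes(enex_path):
    """Yield each <note> element as soon as its closing tag has been parsed."""
    for _event, elem in ET.iterparse(enex_path, events=("end",)):
        if elem.tag == "note":
            yield elem
            # Drop the note's children and text (including large base64
            # resource data) once the caller has finished with it.
            elem.clear()
```

Because the element is cleared when the generator resumes, callers need to read everything they want from a note inside the loop body rather than keeping the element for later.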

Path Encoding

All file paths in src and href attributes use urllib.parse.quote() percent-encoding. This correctly handles apostrophes, emoji in notebook names, commas, parentheses, and other characters that html.escape() would break.
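The difference between the two encodings is easy to see side by side. The path below is an illustrative example built from the note title mentioned in this guide, not a path copied from the archive, and encode_path is a hypothetical wrapper.

```python
import html
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Percent-encode a relative file path for use in src/href attributes."""
    return quote(path)  # "/" is left intact by default


example = "Egypt's corrupt decades_resources/image.jpeg"

# HTML entity escaping: the apostrophe becomes &#x27;
entity_escaped = html.escape(example)

# Percent-encoding: the apostrophe becomes %27 and spaces become %20
percent_encoded = encode_path(example)
```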

Preservation Notes