A local, browsable HTML archive of 4,344 Evernote notes (4.7 GB), converted from .enex exports before account closure on March 11, 2026.
In February 2026, the Evernote account was exported as 112 .enex files (78 standalone notebooks + 34 notebooks nested in 7 stack directories), totaling 4.7 GB. The goal was to convert this into a self-contained, browsable HTML archive.
The conversion is handled by a single Python script (convert.py) with three phases that can be run independently.
The script walks export/ for all .enex files, determines the output folder from each file's position in the directory tree (stack directories become parent folders), and stream-parses each file with xml.etree.ElementTree.iterparse for memory safety on files up to 1.7 GB.
For each <note> element, it:
- Decodes each <resource> element's base64 data, writes the file to a _resources/ subfolder, and computes its MD5 hash
- <en-note> becomes a styled <div>
- <en-media> for images becomes <img>, matched by MD5 hash
- <en-media> for audio becomes <audio controls>
- <en-media> for PDFs and other files becomes download links
- <en-todo> becomes HTML checkboxes

Result: 4,344 HTML files and 16,543 extracted resources in 26 seconds.
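For the image case, the substitution can be sketched like this (a regex-based simplification; `replace_media_tags` and `hash_to_file` are illustrative names, and the real script works on the parsed XML tree rather than raw text):

```python
import re
import urllib.parse

def replace_media_tags(note_html, hash_to_file):
    """Swap <en-media ... hash="..."> image tags for <img> tags pointing
    at the extracted files in the note's _resources/ folder.

    hash_to_file maps each resource's MD5 hex digest to its extracted
    filename, built during resource extraction.
    """
    def repl(match):
        filename = hash_to_file.get(match.group(1))
        if filename is None:
            return "<!-- missing resource -->"
        # Percent-encode the filename so spaces, apostrophes, etc. survive
        return '<img src="_resources/%s">' % urllib.parse.quote(filename)
    return re.sub(r'<en-media[^>]*hash="([0-9a-f]{32})"[^>]*/?>', repl, note_html)
```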
After extraction, the script walks notes/ for audio files (.m4a, .wav, .mp3, .amr, .aac), loads the Whisper base model, and transcribes each file. For every transcription:
- A .txt file is saved alongside the audio
- A <details> transcription block is injected into the parent HTML file below the <audio> tag

Result: 437 of 443 audio files transcribed with content (6 were empty/silent), 269,295 total words. Ran on CPU in ~57 minutes.
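The transcription pass can be sketched roughly as follows (assuming the openai-whisper package; `find_audio_files` and `transcribe_all` are illustrative names, and the HTML injection step is omitted):

```python
import pathlib

AUDIO_EXTS = {".m4a", ".wav", ".mp3", ".amr", ".aac"}

def find_audio_files(notes_dir):
    """All audio recordings under notes/, in a stable order."""
    return [p for p in sorted(pathlib.Path(notes_dir).rglob("*"))
            if p.suffix.lower() in AUDIO_EXTS]

def transcribe_all(notes_dir="notes"):
    import whisper  # pip install openai-whisper
    model = whisper.load_model("base")
    for path in find_audio_files(notes_dir):
        text = model.transcribe(str(path))["text"].strip()
        if not text:
            continue  # skip empty/silent recordings
        # Save the transcription next to the audio file
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
```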
Creates notes/index.html with all notebooks grouped by stack, note counts per notebook, and clickable links to every note. Notes containing audio recordings are tagged with a 🎤 emoji for easy identification. A legend at the top shows the total count.
Result: Full index with 4,339 note links (5 .html files in _resources/ folders are embedded web page attachments, not notes), 222 tagged as audio notes.
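A simplified sketch of the index generation (`build_index` is an illustrative name; this version flattens the stack grouping and omits the audio tagging and legend):

```python
import pathlib
import urllib.parse

def build_index(notes_dir="notes"):
    """Write a minimal notes/index.html: one section per notebook folder,
    with a note count and a link to every note."""
    root = pathlib.Path(notes_dir)
    parts = ["<h1>Notes</h1>"]
    for nb_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # glob only the notebook's top level, so _resources/ files are excluded
        notes = sorted(nb_dir.glob("*.html"))
        parts.append(f"<h2>{nb_dir.name} ({len(notes)})</h2><ul>")
        for note in notes:
            href = urllib.parse.quote(f"{nb_dir.name}/{note.name}")
            parts.append(f'<li><a href="{href}">{note.stem}</a></li>')
        parts.append("</ul>")
    (root / "index.html").write_text("\n".join(parts), encoding="utf-8")
```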
During testing, html.escape() was found to be encoding apostrophes in src/href attributes (' became &#x27;), breaking image paths for notes like "Egypt's corrupt decades." Fixed by switching to urllib.parse.quote() percent-encoding, which browsers handle correctly for file paths.
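The difference is easy to reproduce (the title here is from the note mentioned above; the printed values are what the two functions return):

```python
import html
import urllib.parse

title = "Egypt's corrupt decades"

# html.escape (quote=True by default) turns the apostrophe into an HTML
# entity, which does not exist on disk, so the link breaks:
print(html.escape(title))         # Egypt&#x27;s corrupt decades

# urllib.parse.quote percent-encodes instead; browsers decode %27 and %20
# correctly when resolving local file paths ('/' is left alone by default):
print(urllib.parse.quote(title))  # Egypt%27s%20corrupt%20decades
```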
| Stack Directory | Notebooks |
|---|---|
| All Tech | Frontend Development, Minecraft Modding, Reference |
| Alt-Process | Carbon Printing, Cyanotype, Gum bichromate |
| Egypt - Ancient Stack | Egypt - Ancient, Gardiner Sign List |
| Hipsta Stories | Gathering of Spirits, Instant Transit, Iris and the Magic Eye, Scratch, Song of the Siren |
| Letha F. Swope Stack | Additions in Woodsville, Letha F. Swope, Letters |
| Notebook Stack | 17 notebooks (PB1099, The Papyrus Diary, Web History of AZ, etc.) |
| Thinker's Club Stack | The Dovekeepers |
Characters that are illegal in filenames (< > : " / \ | ? *) are stripped from note titles; duplicate titles get _2, _3, etc. suffixes.

Open notes/index.html in any web browser. All notes are listed by notebook, grouped under stacks. Click any note to view it. Notes with audio recordings are marked with 🎤. Each note has a "← Back to index" link at the top.
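The filename sanitization rules can be sketched as follows (`safe_filename` is an illustrative name, not necessarily how convert.py is factored):

```python
ILLEGAL = '<>:"/\\|?*'

def safe_filename(title, used):
    """Strip characters that are illegal on common filesystems and
    de-duplicate repeated titles with _2, _3, ... suffixes.

    `used` is the set of filenames already taken in this notebook folder.
    """
    name = "".join(c for c in title if c not in ILLEGAL).strip() or "Untitled"
    candidate, n = name, 1
    while candidate in used:
        n += 1
        candidate = f"{name}_{n}"
    used.add(candidate)
    return candidate
```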
Navigate to notes/[Notebook_Name]/ in Finder or a file browser. Each .html file is a self-contained note.
Audio notes have an embedded <audio> player. Click play in the browser. The original .m4a/.wav files are in the _resources/ subfolder next to the note.
Transcribed audio notes have a collapsible "Transcription" section below the audio player. Click it to expand. Transcriptions are also saved as .txt files alongside the audio in _resources/.
Use your OS file search (Spotlight on macOS) to find notes by title, or use browser find (Cmd+F) within individual notes. For full-text search across all notes, use:
```shell
grep -r "search term" notes/ --include="*.html" -l
```
```shell
# Run all three phases
python3 convert.py

# Run specific phases
python3 convert.py 1     # Phase 1: Parse & Extract only
python3 convert.py 2     # Phase 2: Whisper transcription only
python3 convert.py 3     # Phase 3: Generate index only
python3 convert.py 1 3   # Multiple phases
```
- Re-run Phase 1 after changing convert.py's parsing or template logic. Delete notes/ first for a clean run.
- Re-running Phase 2 overwrites the .txt files and re-injects transcriptions. It processes all audio files each time.
- Re-running Phase 3 regenerates index.html; re-run it after any changes to the notes/ folder structure. It does not overwrite changelog.html or guide.html — those are maintained separately.
- Dependencies: pip install openai-whisper (Phase 2 only).

To re-transcribe with a better model (e.g., medium or large), edit the model = whisper.load_model("base") line in convert.py and re-run Phase 2. Larger models are more accurate but significantly slower on CPU.
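The phase selection could look something like this (a sketch; `select_phases` is an illustrative helper, not necessarily how convert.py is structured):

```python
import sys

def select_phases(argv):
    """Map CLI arguments to the ordered list of phases to run.

    No arguments means run everything: `python3 convert.py` -> [1, 2, 3],
    while `python3 convert.py 1 3` -> [1, 3].
    """
    return sorted({int(a) for a in argv}) if argv else [1, 2, 3]

if __name__ == "__main__":
    print(select_phases(sys.argv[1:]))
```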
ENML (Evernote Markup Language) is a restricted subset of XHTML with custom tags:
| ENML Tag | Converted To |
|---|---|
<en-note> | <div class="en-note"> |
<en-media type="image/*"> | <img src="..."> |
<en-media type="audio/*"> | <audio controls src="..."> |
<en-media type="application/pdf"> | <a href="...">filename.pdf</a> |
<en-todo checked="true"/> | <input type="checkbox" checked disabled> |
<en-todo checked="false"/> | <input type="checkbox" disabled> |
<en-crypt> | [encrypted content] |
| All other HTML | Passed through unchanged |
Each <en-media> tag has a hash attribute (MD5 hex digest). The script computes hashlib.md5(decoded_bytes).hexdigest() for each resource and uses this to map media tags to extracted files.
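The hash computation itself is a one-liner over the decoded bytes (`resource_md5` is an illustrative name):

```python
import base64
import hashlib

def resource_md5(base64_data):
    """MD5 hex digest of a resource's decoded bytes, matching the hash
    attribute on the corresponding <en-media> tag."""
    decoded = base64.b64decode(base64_data)
    return hashlib.md5(decoded).hexdigest()
```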
Files like PB1099.enex (1.7 GB) and All Instagram Posts.enex (669 MB) require streaming. The script uses iterparse and calls elem.clear() after processing each note to prevent the full XML tree from accumulating in memory.
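The streaming pattern can be sketched as a generator (`iter_notes` is an illustrative name; clearing each <note> after processing frees its subtree instead of letting the whole document accumulate):

```python
import xml.etree.ElementTree as ET

def iter_notes(enex_path):
    """Stream <note> elements from a large .enex file without building
    the full XML tree in memory."""
    for _event, elem in ET.iterparse(enex_path, events=("end",)):
        if elem.tag == "note":
            yield elem
            elem.clear()  # free the note's subtree once it is processed
```

The caller must finish with each yielded element before advancing the iterator, since the element is cleared on the next step.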
All file paths in src and href attributes use urllib.parse.quote() percent-encoding. This correctly handles apostrophes, emoji (e.g., the Pop-πris notebook), commas, parentheses, and other characters that html.escape() would break.
export/ directory and en_backup.db are the original, untouched source data. Everything in notes/ can be regenerated from them.