dict-to-flashcard: Local-first spaced repetition with macOS Dictionary and Logseq

I made a thing to help me remember words by gluing 5 apps together

November 17, 2024

TL;DR

I made a tiny Alfred workflow that gets a word's definition from the macOS Dictionary (with extra steps) and converts it into a Logseq flashcard. This lets me practice words I met while reading, on the phone and offline too.

repo.

alfred workflow.

twitter shitposting as I made this.

Introduction

I learn languages. At the time of writing, Serbian. This is not a very popular language, so I can’t use Memrise or other spaced repetition apps to help me with words and phrases.

I use Logseq to store my notes, and it has its own spaced repetition system called flashcards, whose format is a no-brainer. I (and every Mac) also have Apple Dictionary, which ships with good enough dictionaries and works very nicely as a lookup app.

So I thought, let’s mate the two.

Extracting definitions

The macOS Dictionary app works great. You can select the dictionaries you want to use and the order in which translations appear, and it updates as you type. It doesn't need an internet connection, and the quality of the dictionaries is great. There is no Serbian dictionary, but there is a Croatian thesaurus and a Croatian-English dictionary that works in both directions, which is good enough, at least for learning purposes.

It also has no intention of giving these definitions to you in any form except showing them in the interface. No API, no shortcuts, no nothing.

But if it doesn't need internet, there have to be files. And those files can be found and parsed, right?

Right, there are files. And I'm not the first person to have this train of thought. Someone has done the parsing, and then someone else improved it and made it into a Python package called apple-peeler.

The repo: apple-peeler. The README has a references section that mentions a very enjoyable Hacker News discussion.

After running this tool you get one file per dictionary, where every line is a definition in some XML, which the Dictionary app probably renders as is. To find a particular word I used ripgrep:

-> rg 'd:title="odškrinuti"' ../apple-peeler-output/Croatian.xml --max-count=1 --no-line-number
<d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" id="59304" d:title="odškrinuti" class="entry" lang="hr"><span class="hg x_xh0"><span role="text" class="hw">odškrinuti </span><span class="pr"><span class="gp tg_pr"> | </span><span d:prn="UK_IPA solitary" lexid="m-hr0059298.007" class="ph t_other">odškrínuti<d:prn></d:prn></span><span class="gp tg_pr"> | </span></span></span><span class="sg"><span lexid="m-hr0059298.002" class="se1 x_xd0"><span role="text" class="posg x_xdh"><span d:pos="1" class="pos">svrš. <d:pos></d:pos></span><span class="infg"><span class="gp tg_infg">(</span><span class="sy">prez. </span><span class="inf">odškrinem </span><span class="pr"><span class="gp tg_pr"> | </span><span d:prn="UK_IPA solitary" lexid="m-hr0059298.008" class="ph t_other">òdškrīnēm<d:prn></d:prn></span><span class="gp tg_pr"> |</span></span><span class="gp">, </span><span class="sy">pril. pr. </span><span class="inf">-uvši </span><span class="pr"><span class="gp tg_pr"> | </span><span d:prn="UK_IPA solitary" lexid="m-hr0059298.009" class="ph t_other">-ūvši<d:prn></d:prn></span><span class="gp tg_pr"> |</span></span><span class="gp">, </span><span class="sy">prid. trp. </span><span class="inf">odškrinut </span><span class="pr"><span class="gp tg_pr"> | </span><span d:prn="UK_IPA solitary" lexid="m-hr0059298.010" class="ph t_other">òdškrīnūt<d:prn></d:prn></span><span class="gp tg_pr"> |</span></span><span class="gp tg_infg">) </span></span></span><span role="text" class="gg x_xd1"><span class="gp tg_gg">[</span>što<span class="gp tg_gg">] </span></span><span lexid="m-hr0059298.004" class="msDict x_xd1 t_core"><span d:def="1" role="text" class="df t_standard">malo otvoriti, jedva malo rastvoriti<span class="gp tg_df">. 
</span><d:def></d:def></span><span role="text" class="xrg"><span class="xr">pritvoriti</span><span class="gp tg_xrg">: </span></span><span role="text" class="eg"><span class="ex">odškrinuti vrata </span><span class="gp tg_eg"> | </span></span><span role="text" class="eg"><span class="ex">odškrinuti prozor</span><span class="gp tg_eg">. </span></span></span></span></span></d:entry>
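The same lookup can be sketched in Ruby for use inside a script. This is only an illustration: `find_entry` is a hypothetical helper name, and it assumes the one-entry-per-line layout described above. It returns the first line that carries the matching `d:title` attribute, like `--max-count=1` does.

```ruby
# Sketch: find the first entry line for a word in an apple-peeler output file.
# `find_entry` is a hypothetical helper, not part of apple-peeler.
def find_entry(path, word)
  # Same marker the ripgrep call searches for.
  marker = %(d:title="#{word}")
  File.foreach(path, encoding: "UTF-8") do |line|
    # Stop at the first hit, like --max-count=1.
    return line.chomp if line.include?(marker)
  end
  nil # the word is not in this dictionary file
end
```

Calling `find_entry("../apple-peeler-output/Croatian.xml", "odškrinuti")` would return the entry's XML as a string, or `nil` when the word is missing. Scanning line by line is slower than ripgrep, but it avoids loading the whole file into memory.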

Making sense of definitions XML

That's the main part of the app, but there is not much to tell about it. It was mostly a trial-and-error process, and it's not finished. But it's good enough for my purposes.

I have a text file with my findings, but I don't know how generic they are. For example, there are two types of definitions in the Croatian thesaurus: one is wrapped in nodes with the classes .se2.x_xd1.hasSn, and the other has the classes .msDict.x_xd1.t_core. And there is a chance I haven't met all the types yet, for definitions or for the other things I try to extract.

Speaking of, I managed to extract the word itself, transcription, definitions and phrases. More than enough for a flashcard.
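To give an idea of what such extraction can look like, here is a minimal sketch using REXML, the XML parser bundled with Ruby. It is not the app's actual code. The sample is a trimmed version of the "odškrinuti" entry above, the class names (`hw`, `ph`, `df`, `ex`) are the ones visible in that entry, and the helper names are made up for illustration. It only covers the `.msDict` layout from the sample; whether the `.se2.x_xd1.hasSn` layout uses the same inner classes is not something this sketch knows.

```ruby
require "rexml/document"

# Trimmed copy of the entry shown earlier (pronunciation attributes and
# inflections removed to keep the sample short).
SAMPLE_ENTRY = <<~XML
  <d:entry xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng" d:title="odškrinuti" class="entry">
    <span class="hg x_xh0"><span class="hw">odškrinuti </span><span class="pr"><span class="ph t_other">odškrínuti</span></span></span>
    <span class="msDict x_xd1 t_core">
      <span class="df t_standard">malo otvoriti, jedva malo rastvoriti<span class="gp tg_df">. </span></span>
      <span class="eg"><span class="ex">odškrinuti vrata </span></span>
      <span class="eg"><span class="ex">odškrinuti prozor</span></span>
    </span>
  </d:entry>
XML

# Concatenate all text below a node, including text in nested elements.
def inner_text(node)
  node.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then inner_text(child)
    else ""
    end
  end.join
end

# Stripped text of every descendant whose class list contains `name`,
# in document order.
def texts_with_class(root, name)
  found = []
  root.each_recursive do |el|
    classes = (el.attributes["class"] || "").split
    found << inner_text(el).strip if classes.include?(name)
  end
  found
end

# Hypothetical top-level helper: one hash per dictionary entry.
def extract_fields(xml)
  root = REXML::Document.new(xml).root
  {
    word: texts_with_class(root, "hw").first,
    transcription: texts_with_class(root, "ph").first,
    definitions: texts_with_class(root, "df"),
    phrases: texts_with_class(root, "ex")
  }
end
```

Matching on single class names rather than full class combinations keeps the sketch short, at the cost of possibly picking up nodes from layouts it was never tried on.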

A way to find problematic words and new layouts would be to go through all the words in a dictionary and compare the text that the app extracts and the text that is there, and raise a flag if the two are very different by some metric, even by size. I might do this at some point, and pull requests are very welcome.
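A rough sketch of that check in Ruby, comparing by size only. The helper names and the 0.5 threshold are arbitrary assumptions, tags are stripped with a simple regex, and XML entities are not decoded, so treat it as a starting point and not a finished metric.

```ruby
# Visible text of an entry: replace tags with spaces, collapse whitespace.
def visible_text(xml)
  xml.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip
end

# Fraction of the entry's non-whitespace characters that survived extraction.
# `extracted_parts` is an array of strings the extractor produced.
def coverage(xml, extracted_parts)
  total = visible_text(xml).gsub(/\s+/, "").length
  return 1.0 if total.zero?
  kept = extracted_parts.join.gsub(/\s+/, "").length
  kept.to_f / total
end

# Flag entries where less than `threshold` of the text was extracted.
# The default of 0.5 is a guess, not a tuned value.
def suspicious?(xml, extracted_parts, threshold: 0.5)
  coverage(xml, extracted_parts) < threshold
end
```

Running `suspicious?` over every line of a dictionary file and printing the flagged titles would give a list of entries worth looking at by hand. The useful threshold probably differs per dictionary, since some entries carry a lot of text (inflections, cross-references) that is skipped on purpose.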

Formatting it as a flashcard

Logseq has a very easy flashcard format. The two main concepts are marking a block as a card with #card and wrapping parts of it in {{cloze ...}}, which hides them while you are trying to remember them. The parsing is not ideal (I think it gives up on more than 3 clozes in a block, and clozes don't play nicely with some markdown formatting) but it's good enough.

This is what I got:

- ### {{cloze #{heading.text}}}
  {{cloze #{GENDERS[gender] || gender}#{inflections}}}

  Definitions

  #{definitions}#{definitions2}

  #{phrases.empty? ? "" : "Phrases\n\n  " + phrases}
  #card
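For readers who want to try the format without the rest of the app, here is a simplified, self-contained take on the template above. `render_card` and its keyword arguments are hypothetical. The `GENDERS` lookup and the second definitions list are replaced by a single free-form `grammar` string and one array, because their real contents aren't shown here.

```ruby
# Sketch: render extracted fields as a Logseq card block.
# Continuation lines are indented by two spaces so they stay inside the block.
def render_card(word:, grammar: "", definitions: [], phrases: [])
  lines = ["- ### {{cloze #{word}}}"]
  lines << "  {{cloze #{grammar}}}" unless grammar.empty?
  lines << "" << "  Definitions" << ""
  definitions.each { |definition| lines << "  #{definition}" }
  unless phrases.empty?
    lines << "" << "  Phrases" << ""
    phrases.each { |phrase| lines << "  #{phrase}" }
  end
  lines << "  #card" # the tag that turns the block into a flashcard
  lines.join("\n") + "\n"
end

puts render_card(
  word: "odškrinuti",
  grammar: "svrš.",
  definitions: ["malo otvoriti, jedva malo rastvoriti."],
  phrases: ["odškrinuti vrata", "odškrinuti prozor"]
)
```

Both the heading and the grammar line are wrapped in clozes, so only the definitions and phrases are visible until the card is revealed.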

And this is what it looks like rendered:

Cards in Logseq are just blocks. That means they are easy to modify manually: if the app formatted something incorrectly or I want to add my own note, I can do that right when the card is shown to me. This proved extremely useful. I often add translations from the languages I know, and sometimes synonyms, phrases and associations are more use to me than the definition itself.

Wrapping it all into an Alfred workflow

I use Alfred to run things, be it an app, a file, or a helper. It's really nice and lets you make your own workflows. So I made one. It takes the word, finds the definition, puts it into Logseq, and also opens the word in the Dictionary app.
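The last two steps can be sketched as a small Ruby script that a workflow could call. This is a guess at the shape, not the workflow's real script: the helper names are made up, and it assumes the card is appended to a Markdown page inside the Logseq graph and that the word is opened through the `dict://` URL scheme that the macOS Dictionary app handles.

```ruby
require "erb" # for ERB::Util.url_encode

# Build a dict:// URL for the word, percent-encoding non-ASCII characters.
def dictionary_url(word)
  "dict://#{ERB::Util.url_encode(word)}"
end

# Append a rendered card to a Logseq page file (a plain Markdown file).
def append_card(page_path, card)
  File.open(page_path, "a") { |file| file.puts(card) }
end

# Ask macOS to open the word in the Dictionary app. Only works on macOS.
def open_in_dictionary(word)
  system("open", dictionary_url(word))
end
```

A workflow step would then call `append_card` with the path of a page in the graph and the rendered card, followed by `open_in_dictionary` with the word it received from Alfred. Appending to a file works because Logseq stores pages as plain files and picks up changes from disk.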

Conclusion

At the moment of writing I have 192 cards, and 8 to repeat now. Spaced repetition works OK, the whole thing is convenient and, surprisingly, doesn't break every other word. The styles could be better but could be worse.

It also checks the local-first and works-on-mobile boxes, because Logseq can sync its files via iCloud. I actually use both: I read offline, and I practice on mobile.

The video demo is in the repo, go check it out.
