The scheduler problem

I believe the scheduler problem is the biggest open problem in the Anki/spaced repetition learning community. As with any good research problem, there are two questions to consider: what are the problems we want to solve, and how do we solve them? I have no idea how to solve them, but I hope I can at least contribute by clarifying the questions.

I've written countless times that Anki improved my life drastically. Anki is far superior to any non-assisted learning technique that exists. We have received an impressive amount of very positive feedback. That's great, I'm really proud of my contributions to Anki, and I wish more people used it... but that does not mean that Anki is optimal. I would invite any student, and more generally anyone who loves to learn or has to learn, to use Anki or another spaced repetition system such as SuperMemo (SM). However, given that Anki already exists, is a part of my life, and that I am an important contributor to this ecosystem, the question remains: how can the software be improved?

What is a scheduler

The role of the scheduler is basically to decide what the user should review now. Technically, it is more complex, but I am trying to get the big picture first. The user may choose to review everything, or just their guitar-related questions because they currently have their guitar at hand, or their geography because that is what their next exam is about... The scheduler is in charge of deciding which questions to ask and of recording the answers given to those questions.

There are two questions that a scheduler algorithm must answer: what input does it require to make its decisions, and what should it optimize?

Input

The input may vary a lot depending on the app. I believe there are two kinds of inputs: inputs that remain consistent over time and inputs that change. For example, if the user only has access to a computer during breaks at school, then it is important to know that they have access to the system for at most 30 minutes each school day and not at all during holidays... We may also state that some decks are more important, e.g. learning French may be more important than learning Latin because of the way the school grades those subjects.

And there is the far bigger category of input: the inputs that get logged over time. Some inputs are useful but currently nonexistent for most users. For example, SuperMemo's users are supposed to log their sleeping hours, because sleep is supposed to have a large influence on the learning process. Anki does not have this information, and any new scheduler for Anki should take into consideration that this data is not available. Of course, it could ask the user to enter this information in the future, or any other information, but it should also be able to work without it.

The input usually contains the past reviews. In the case of Anki, for each review it contains the question's id, the timestamp, the time taken to answer (usually capped at a minute, on the assumption that if a card took more than a minute, the user simply left their computer/phone and stopped using the app, so the time is not useful information), and the answer button pressed. It should be noted that Anki stores the question's unique identifier and not the actual question. This means that if the question has changed for any reason, this information is lost. For example, let's say you have the question "Who is the prime minister of the U.K.?"; there is a risk that the user changed the answer from "Theresa May" to "Boris Johnson" instead of creating a new card, which means that the scheduler does not know that the answer has changed and that the card is as good as new.
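To make the shape of this log concrete, here is a minimal sketch in Python. The names `Review`, `make_review` and `history_by_card`, as well as the exact fields and the 60-second cap, are my own illustrative assumptions; this is not Anki's actual schema.

```python
from dataclasses import dataclass

# Assumption: anything longer than this means the user walked away.
MAX_ANSWER_SECONDS = 60


@dataclass(frozen=True)
class Review:
    card_id: int           # identifier of the question, not its content
    timestamp: int         # when the answer was given (seconds since epoch)
    answer_seconds: float  # time taken to answer, already capped
    button: int            # answer button pressed (assumed: 1 = failed)


def make_review(card_id, timestamp, raw_seconds, button):
    """Build a Review, capping the answer time as described above."""
    capped = min(raw_seconds, MAX_ANSWER_SECONDS)
    return Review(card_id, timestamp, capped, button)


def history_by_card(reviews):
    """Group reviews by card id, each group in chronological order."""
    history = {}
    for review in sorted(reviews, key=lambda r: r.timestamp):
        history.setdefault(review.card_id, []).append(review)
    return history
```

Since the history is keyed by `card_id` only, nothing in it can reveal that the text of a card was edited between two reviews, which is exactly the limitation described above.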

The input also contains the set of all questions. Anki uses this last piece of information in a very basic way, simply by having a few parameters that can be configured on a deck-by-deck basis. The user has to configure those parameters themselves, which is needlessly complex. Indeed, the scheduler could figure out that if a lot of cards in a deck are hard, the next card will probably be hard too, and so the parameters should be set accordingly (for example, by lowering the number of new cards seen each day).
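As an illustration of a scheduler setting such a parameter by itself, here is a small Python sketch. Everything in it is hypothetical: the function name `suggested_new_cards`, the 10% target failure rate, and the rule of scaling the limit proportionally are assumptions of mine, not something Anki does.

```python
def suggested_new_cards(base_limit, recent_buttons,
                        target_fail_rate=0.1, min_limit=1):
    """Suggest a daily new-card limit from recent answers in a deck.

    recent_buttons holds the answer buttons recently pressed in the
    deck; button 1 is assumed to mean that the card was failed.
    """
    if not recent_buttons:
        # No history yet: keep whatever limit was configured.
        return base_limit
    failures = sum(1 for button in recent_buttons if button == 1)
    fail_rate = failures / len(recent_buttons)
    if fail_rate <= target_fail_rate:
        return base_limit
    # The deck looks hard: shrink the limit proportionally.
    scaled = round(base_limit * target_fail_rate / fail_rate)
    return max(min_limit, scaled)
```

For instance, with a configured limit of 20 and half of the recent answers failed, this sketch suggests 4 new cards a day instead of 20. A real scheduler would need a far better model, but the user would no longer have to tune the number by hand.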

The set of questions could tell us much more, however. It is easy to imagine that if two cards are siblings (i.e. generated from the same information, such as "Drink(English)"->"Boire(French)" and "Boire(French)"->"Drink(English)"), the easiness of both cards is correlated.

If a card asks "what is a square" and the answer is "a rectangle that is a rhombus", then this card is at least as complex as the cards containing the definitions of "rectangle" and "rhombus" (or maybe they are actually easier... or they are related in some other way that I do not know).

We can assume that the card "Trinkt(German)"->"Drink(English)" is easier than "Boire(French)"->"Drink(English)". In theory, both questions are really similar, and the computer may have trouble seeing why one is easier, but for a human it is clear that "drink" sounds similar to "trinkt" and not to "boire".

I should note that knowing what the user wants to optimize may also require more input data, but I'll leave that for the next section.

Optimizing

The very complex question that must be solved is: what should the scheduler optimize? Here is an (absolutely non-exhaustive) list of examples I can think of:

Passing an exam

If you have an exam and you know you need at least 12/20 to pass your class, and the exam is on the 28th of June with a second attempt on the 15th of July, then your goal is to have enough knowledge and skills to get at least 12/20 by the 15th of July. Since nothing is certain, I assume we should state the goal as getting at least 12/20 with 99% probability on the 15th of July. And since you want your month of June mostly free, you can also try to succeed with 95% probability on the 28th of June. In theory, you want to get exactly 12/20, because any better grade is time lost that you could have spent doing more interesting things.

I should note that, unless the exam contains only very basic questions such as "what is the capital of France" and "what is the name of the river in Berlin", it is actually hard to model the grade you'll get. I should also note that the goal is not to get 12/20 on average, because if there is a 50% chance that you get 20/20 and a 50% chance that you get 4/20, your average is 12/20 but you still have a high probability of failing the exam.
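The difference between "12/20 on average" and "12/20 with high probability" can be sketched in Python for the simplest possible exam. The assumptions are strong and entirely mine: the exam consists of independent questions worth one point each, and we somehow know the probability of answering each one correctly. The function names are illustrative only.

```python
def score_distribution(recall_probs):
    """dist[k] = probability of exactly k correct answers.

    Assumes the questions are independent of each other.
    """
    dist = [1.0]  # with zero questions, the score is 0 for sure
    for p in recall_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # this question is failed
            new[k + 1] += mass * p     # this question is answered
        dist = new
    return dist


def expected_score(recall_probs):
    """Average number of correct answers."""
    return sum(recall_probs)


def pass_probability(recall_probs, pass_mark):
    """Probability of getting at least pass_mark correct answers."""
    return sum(score_distribution(recall_probs)[pass_mark:])


if __name__ == "__main__":
    probs = [0.6] * 20  # 20 questions, each known with 60% probability
    print(expected_score(probs))        # average score
    print(pass_probability(probs, 12))  # chance of actually passing
```

With twenty questions each known with 60% probability, the average score is 12/20, yet the probability of reaching 12 is far from the 95% or 99% targets mentioned above. A scheduler aiming at passing would have to optimize the second number, not the first.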

Being first at a contest

This problem is harder than the last one, because the actual amount of knowledge and skills you need depends on the other contestants. In the worst case, if everyone uses the same scheduler as you, the scheduler can't guarantee anything, since there is always some loser! So the best approximation to this problem consists simply in trying to get the maximal score... or at least the maximal score possible while satisfying some other constraints. For example, maybe the maximal score requires spending 3 hours reviewing each day, but you have a job and can't spend 3 hours each day reviewing; then the scheduler should also take into account the amount of time you have, and possibly the fact that you are reviewing some questions in a noisy place on public transit and some other questions in your private bedroom...

Note that spending 3 hours each day reviewing is probably a bad idea. So a scheduler should probably never suggest spending that much time in order to optimize learning; but the scheduler will have a hard time discovering how tired you are if you are mentally exhausted by anything other than reviewing. Or maybe the scheduler should take into account that if you start to give an unusual number of bad answers, then you should stop and get some rest.

Learning as quickly as possible

Maybe you don't care about a specific date, and you want to learn something as quickly as possible. Let's say that you want to learn Greek. The Greek alphabet has 24 letters, in upper and lower case, so there are 49 letters[1] you need to learn. Your goal is going to be learning them as quickly as possible, since it is a prerequisite to any further learning of Greek, and you can't wait until the exam occurs to know them. I imagine that this requires a very different scheduler from the one in the previous cases.

Getting a shallow knowledge of a subject

Sometimes, I don't care about knowing a subject well; I just want to have a shallow knowledge of it. I believe this is what we call "culture générale" (general culture) in France: we need to show off that we know a lot of things even if we don't know anything well. A lot of people in politics, newspapers... are supposed to be able to discuss any subject; what matters is your style, the way you present your (nonexistent) knowledge by using a few buzzwords. I would believe it to be really useful for salespeople, for job interviews... As you can guess from my writing, I'm not a fan of it, but I know it exists, and that it is a skill which opens up a lot of possibilities. I admit that I have used it myself to prove to some people that I have a basic knowledge of their field of specialty, even if I can never cite anything that is not in the introductory chapter of the 101 lecture on the subject; that is usually sufficient for people to consider that they can have a serious discussion with me, even if we can't have a research-level discussion[2].

I would assume that in this case, the scheduler should be really different from the previous cases, since your goal is not actually to know anything well, but just to ensure that you always have a few buzzwords accessible in your mind and that when someone uses a buzzword, you know which field this buzzword belongs to. So you can actually concentrate on some easy cards and quickly decide to ignore anything that is difficult.

Learning because it may one day be useful

Similarly to the last point, you may also want to learn something in case it turns out to be useful, and still know it well. For example, I can imagine that it is useful to know the countries of the world. I don't expect to ever need the full list, but if someone comes from a country which is not often in the news, it may be useful to be able to show that I know where that country is. That probably shows more respect for their country than most people give. Or maybe just because, if I need to take a trip there one day, I'll have in mind the list of nearby countries that I may want to visit. As I have no particular reason to learn this, I can take a lot of time, and in this case, it's not easy to see what to optimize. I guess I could state that I want to optimize the usefulness of the time I spend using the spaced repetition app, maybe...

Optimizing the pleasure

I'm learning guitar and piano. Anki helped me a lot with it. However, there is no goal for me other than to be able to play music (and maybe cruise using it...). I'm also learning the lyrics of songs, because it often creates strong emotions in me when I sing (badly) a song I like. There is probably no reason to do all of this. Maybe what should be optimized is the pleasure I get, because it associates the use of Anki with something pleasurable... I'm not really sure of myself here, because Anki allows me to concentrate on the more complex scales (on piano), which is not really pleasurable directly. It only gets pleasurable later, when I can actually play well things that use the skill developed thanks to the scales.

Why this is an imperfect setting

Who we teach

As I recently wrote, Anki is nice for people who are self-motivated. It does not solve in the slightest the far bigger problem of teaching all the children, teenagers and students who don't want to learn, who don't see the point (yet), who do not know how to read or who have trouble doing it. The problems I listed above entirely ignore this question. They totally ignore the crucial issue, which is ensuring that people actually use the software!

How to use domain specific knowledge

To teach how to recognize birds or mushrooms, some developers use a huge database of pictures that are known to be correctly labelled, and each time the user uses the app, a new picture is shown. This ensures the user does not learn to recognize a particular picture, but actually learns to recognize the species. This does not even seem to be possible in the current setting[3].

Similarly, Duolingo teaches foreign languages, and they use a lot of context that is available only for foreign languages. For example, if there is a typo, they can guess which general grammar rule the user forgot, and so they can make decisions about what the user should review which depend on the exact error the user made[4]. They can show hints for each word independently.

Current solutions

Currently, the best solution I have heard of is the SuperMemo algorithm, version 18. I must acknowledge that I have not spent the time required to understand it correctly and to see whether the article contains enough information to implement it in a distinct piece of software, but that is certainly something that should eventually be tried. Anki currently uses a variation of version 2 of the SuperMemo algorithm, and it is obvious that 18>2.

I have huge admiration for Woz, the creator of SuperMemo and arguably the creator of spaced repetition. But I am not convinced that nothing better can be done simply because he is the creator, has spent 35 years working on spaced repetition, and has access to a huge amount of data. I would expect that, even for a genius, there is a limit to what you can do mostly alone. Furthermore, during most of those 35 years, he would not have had access to machine learning and other modern tools to analyze data, and while his insights seem amazing when reading his wiki, it is not obvious to me that he did not miss some other important variables. I already explained that access to the data is possible, even if potentially complex, so I do not think that obstacle means the problem can't ultimately be solved. Any scheduler requiring data from a lot of users is going to start the race behind Anki and SuperMemo, but that does not remove all hope.

I should also mention that Duolingo published some research, but it seems doubtful that their system can be used in a non-centralized system. I went to a human learning workshop, but I don't actually know how any general solution can be found from their research. In particular, while I offered to help them interface with Anki and its user base, they never contacted me. It seems a real user base is harder to deal with than some clean data set obtained through Amazon Mechanical Turk, with people in settings you can easily control. But maybe there is more research I missed that would be of interest.

Notes

[1] There are two lower case sigmas (σ and ς).

[2] I guess that, especially in France where researchers are really poorly regarded, anyone showing respect for your research subject is already friendlier than a random person you meet at a social event.

[3] Actually, if we can access the database through JavaScript, it could be possible, since a question is an arbitrary web page, but that's not trivial at all.

[4] I believe they used to do so, but they stopped; it is not clear why.

Effective altruism and criticism toward activism: Answer to a paradox

For a little while now, I have been exploring the notion of Effective Altruism, EA for short. My readings on the topic so far have been very interesting[1], and I would like to add an idea of my own that I deem important and have yet to read elsewhere. If this has ever been written down somewhere, I can at least attest that it is all too well hidden. Personally, I believe that it should be discussed in introductions to EA.

Note

[1] I have attended some meetings of the French EA group, and have seen nothing but discussions so far. As it seems I have been more effective through direct action against LGBTphobia in high schools, for all my uncertainties about it, it seemed pointless for me to join.

Collaborative decks in Anki

A lot of people want to create collaborative decks for Anki. By September 2018, I had already made quite a few add-ons, so some people contacted me to discuss collaborative decks. It has been in the back of my head ever since. I'm going to try to write down every thought I have had and why it seems quite complex.

How hard can it be to code a feature to let users resize images in a piece of software?

In this post, I expect to show you why it may be difficult to create a seemingly simple program, and in particular to do it well. I'll showcase this with the last program I wrote, an add-on for Anki. More precisely, the most wanted add-on for Anki, according to the votes of the users of Anki's subreddit: being able to resize images in the editor. This seems to be a simple add-on; after all, resizing by dragging a corner has been available in every editing software for decades[1]. In this post, I intend to document all of the things which made me lose time when I created the add-on "Resize image" for Anki. I also created a video showing how the add-on works.

I'm going to mostly consider the coding problems related to add-ons. This is going to be technical, but I'm going to try to give some intuition to people who don't code. I'm going to consider the changes in the order I made them.

Note

[1] Apart from LaTeX, but let's not consider it.

How I learn lyrics with anki

After years of using Anki, I finally found a nice way to learn lyrics. I think I tried three different methods before finding one which works for me. More precisely, I found it a few months ago, and after testing it, I can finally say I found something which works.

Learning how to play music with anki

I've been playing music for half of my life. But while I enjoyed sight-reading sheet music, and sometimes practiced the boring parts a little (scales, arpeggios), I was stuck. Here is a list of what changed:

  • The most frustrating thing for me was that I relied on sheet music. This means that if you gave me a piano or guitar without sheet music, I wasn't able to play anything. I found that ridiculous, and Anki helped me solve that.
  • Similarly, I played classical guitar, and I didn't know how to read tab. Because, honestly, there are so many chords that I kept forgetting them. This means that if you gave me a song with tab, and there are hundreds of thousands of them, I couldn't play it, because it was not written in a way I could easily read. I don't know every single chord yet (and I'll probably never know them all), but I know far more chords today than I knew before I started Anki, and it clearly helps with learning songs and doing improv.

The examples in this post are related to ocarina, guitar, piano, harmonica and tin whistle. I will explain what differs and what is similar for all of those instruments. Some explanations may not always be clear if you don't know the instruments I'm talking about. But don't worry, if you don't understand, just read the next paragraph; you should be able to get the general idea.

This article will be illustrated using almost only cards that I actually saw on the day I was writing this article. You can find here my [piano], [guitar] and [ocarina] decks. They are far from perfect, and some typos may still be in them. But they may help you understand what I write here. And maybe you can find them useful in your collection.

Anki and learning which requires practice (origami, knots, instruments...)

I use Anki to learn things which require practice: origami, drawing, music, rope (knots and shibari). Music will be considered in another text.

I consider two kinds of practical knowledge:

  • some practice requires making choices regularly (like drawing, or musical improv)
  • some practice requires learning and practicing some exact moves over and over. That may be the case when you want to learn a musical piece, or how to tie some particular knot.

I don't have any idea how to deal with the first kind of knowledge, thus I'll only consider the second kind. I'll list here different methods, which depend on what I want to learn. I don't know in general how to decide which method is the best one.

Lists in anki: desiderata and partial solution

In this text, I assume you are familiar with Anki, and in particular that you know what a field, a card, a card type (aka template), a note and a note type (aka a model) are, and that you have an idea of the rules used by Anki to decide which cards should be generated or not.

There is one big limitation in Anki: it concerns lists[1]. Here I list my troubles, the existing workarounds I know, their limits, and the functionality I would really want. Sadly, this functionality seems to require such a big modification of Anki's underlying model that I fear that no add-on can answer my request. In particular if I want this request to also be satisfied in the smartphone applications, which do not allow installing add-ons.

Learning a list of things is hard, but it's something I sometimes want to do. A poem/song is just a list of lines. Sometimes, a mathematical notion has 4 distinct names. E.g. a pullback is also called a fiber product, a fibered product and a Cartesian square. In some other cases, a mathematical object admits many distinct definitions[2]. E.g. I've got 5 definitions of left-trivial monoids. And I also wanted to see if I could learn the list of the prime numbers less than 100, mostly to see how hard it is to learn an arbitrary list.

Notes

[1] I assume here that sets are lists, with an arbitrary order.

[2] This is in general considered to be a proof that the object is really interesting.

Note on an introduction to Anki given at 35C3

This post is a comment about a self-organised workshop, Introduction to Anki, which I gave at #35C3 (35th Chaos Communication Congress, a congress of 17k hackers). This workshop was announced on Anki's subreddit, where I asked for ideas. I received a lot of useful feedback from this subreddit and from the related Discord server. The main audience of the current blog post is thus those people, already in Anki's community. This post contains ideas in random order.

The trolley problem, and what you should do if I'm on the tracks

Originally published in French and crossposted on LessWrong. Translation by Épiphanie.

Trigger warning: Death, suicide, and murder. Trolley problem.

This is quite the conventional ethical conundrum: You are near train tracks, and a train is rolling down the hill. It is going to run over 4 people who are tied to the rails of the main track. However, you can change the train's direction to a secondary track by pulling a lever, so that it runs over only one guy, also tied to the rails. Should you pull the lever?

I do believe there is a more interesting way to frame it: What would you choose if you were yourself tied to the rails, alone, while the train is not heading toward you yet? My own answer is very simple: I want the person deciding where the train should go to have _no doubts_ they should pull the lever! Because, for lack of context, I assume that the other four people are just me, or rather copies of me. That's a bit simplistic; of course they are not perfect clones. But as far as concrete predicates go, they are indistinguishable. That is to say, my odds of being alone on the tracks are 1 in 5, and my odds of being in the group are 4 in 5. And tell you what, I prefer dying with 20% probability because of what someone did, rather than dying with 80% probability because no one was ever willing to take the burden of responsibility.
