Google Summer of Code, point of view of a new admin/org

It has been a month since the announcement that AnkiDroid was selected for Google Summer of Code (GSoC). Here is the story of a new admin in a first-time organization. As is standard to state, views are my own, not those of my employer or my organization. This post explains how we went from unprepared to a huge success even before the end of the application phase!

GSoC is a yearly event financed by Google. Organizations apply, describing their open source software and the languages and tools they use, and listing some potential projects. Google then selects organizations and lists them on its blog, and students can decide where to apply. The selected students can earn from 1500$ to 3500$ depending on the country they live in, paid by Google, and gain a nice line on their resume.

We are competing for attention with some of the biggest open source organizations in the world, from programming languages and tools (Django, Python, Elm Tooling, GCC, Godot, Apache), operating systems (Debian, FreeBSD, Gentoo), command line tools (FFmpeg, Git, GNU Mailman), user-facing software (VLC, Chromium) and websites (Internet Archive)... I thought that most students would apply to a project they already know; one must be so proud to state that they contributed to Python or VLC that we would never be able to compete with them in attracting students. I mean, in your career you're pretty sure to meet people who code in Python and like it, and people who use VLC daily to watch movies; everybody will be thankful to you for being a part of such wonderful tools! Compared to them, AnkiDroid has 2 million active users and a big community of language learners and medical school students, but I saw no reason why we would attract any developers. There are a bunch of developers using AnkiDroid of course, and I know plenty of them... but the ones I know are almost all contributors to AnkiDroid, so that leads to a huge bias!

So, let's say I was surprised when we got 180 emails from interested students. This led the AnkiDroid team to have secrets, for the first time ever. Up to March, virtually every single discussion was public. However, we felt like we could not ask students for their feedback on the rules we would use to select them. Many would be biased toward rules that would select them. Furthermore, it's not "the whole AnkiDroid community" who would be mentors, but only four of us, so in a way, I find it acceptable that the people who would do the actual work choose the rules to select who we would mentor.

For the first time ever, I really felt like I deserved my role as maintainer. Previously, I only worked on Anki(Droid) code. This is to be compared to the first of the current maintainers, Mike, who did all the tasks to automate the deployment process and improve automated testing, and also started the Open Collective. I have the power to manage the Open Collective if necessary, and to decide what to do with the 16k$ we currently have, but it's a power that I never wanted and that I'll try to use only if nobody else can do it. I used to only do code, and suddenly, I started writing a lot of documents: for Google's application, for students, and for people who gave us money too.

I had to write a letter to students explaining what we would take into account when choosing them, how they should apply, and what we expect. We didn't prepare this in advance since we thought we'd get a few candidates we already knew. And we quickly realized that to choose between 180 people, or even 16 people, we needed more formal criteria. I went to find Nicolas, the creator of AnkiDroid, who has dealt for years with GSoC in MediaWiki, to give us advice and feedback; that was extremely helpful. I'm kind of sad that students cannot thank him directly, but I guess that dealing with MediaWiki (the tool behind Wikipedia) means that AnkiDroid is actually a small project for him!

Suddenly, I understood why applications for internships with MediaWiki, Inkscape or Git are so hard, and why they require one or two pull requests (code contributions) from each candidate. I find it distasteful to require free work from students in order for them to compete for money. Discovering a code base takes, at best, hours of work. And GSoC attracts so many people that most students will not be selected. I'm a-okay with it because AnkiDroid has an Open Collective, with 16k$ in our account, so we could pay a little bit of money to the students if they want; but still we only pay 10$/hour and at most 200$/month. It's not a job, but a tip from our users to a free software project[1]. To be fair, the same rules apply to maintainers; we are not asking anything from the students that we don't require of ourselves first. Clearly, we had to have these kinds of drastic requirements, asking people to act to prove their interest.

On the other hand, we were ready to help people onboard. We took time - a lot of time - answering questions, annotating issues as "good first issues" so that they can have something to work on[2], and reviewing. I lost time reviewing. There were many errors that students made that I could have corrected myself quickly. Correcting code interpreted by a computer is so much easier than explaining something to a human. However, if I did the correction, the student/contributor would not have learned, and that would be a clear loss in the medium term. We spent an incredible amount of time teaching about atomic commits, commit messages, and rebasing interactively to correct typos and errors instead of adding a commit on top of another. We tried to explain why we want tests to pass on every intermediate commit, and that small commits help the reviewers review more easily. In March, we got 38 contributors[3], while we only got 26 drafts of GSoC applications. I suspect that some people realized that it's hard to contribute correctly, harder in some sense than doing an academic exercise, and decided not to go through the whole application process.

I want to mention some PRs that were important to me. One task in AnkiDroid was slow. It was rarely done but could take dozens of seconds when the user ran it. I did try to optimize it as much as possible but could not get something correct without fundamentally rewriting the database layer. One contributor made a one-line change, a WHERE clause in a query, and that saved quite some time; somehow I totally missed it. Another contributor wanted to decouple multiple elements in a big class in order to add features. This led to splitting a PR into two preliminary parts, and one of those preliminary parts was also split into another simpler PR!

So many contributors simultaneously meant that, for the first time ever, we really had to require people to ask us to be assigned tasks. I think we had one or two conflicts where people corrected the same bug. That almost never occurred before.

There are three people we rejected before they submitted their application. In all cases, it was people we were not able to have a discussion with. However, the three cases were very different. One person understood neither how to copy and fill in the Google Doc template, nor how to join the Discord server the team uses to discuss. Another asked us multiple times to review an application which did not follow the template and where a lot of information was missing. The last one was someone very talkative. It was one of the longest applications I read, and I could not make sense of it. For example, they explained that the PR they submitted saved a lot of time. When we answered that it does not compile, and that even if it did, we wanted actual measured numbers, we got an answer that they are also a developer and that it's clear it saves time... Generally, most of the conversation seemed empty of actual technical meaning, and I totally failed to explain what we required in the team.

We also required candidates to have written one test. Except that we hadn't clearly explained what our rules about tests were and how to find missing tests. So I wrote a document about tests on our wiki.

I thought that our codebase was not really excellent, even if I have tried to improve it as much as possible since I joined. I thought it would be hard and only people motivated by their love for AnkiDroid would take the time to understand and contribute. At best, I was expecting some simple little issues and tests. I was really astonished to find people who had just discovered Anki actually making real non-trivial quality changes that would have taken me more than an hour - and probably took them far more since they didn't know the codebase. I'm happy that "have used AnkiDroid previously" was a preference and not a requirement, as I would not have wanted to reject such good contributions arbitrarily, even if I'm not sure what motivates them to participate.

Multiple people wanted to improve our UI. It's really old-looking; we are not a beautiful app. This is a strong complaint from new users, and it's quite probable that we lost a lot of them because of that. Medical school students are so desperate that if someone they trust tells them we are a really great tool, they'll try it, but more casual learners may not care enough to try something that looks old. However, we rejected most of the UI change proposals. I also wrote a wiki page explaining why, so that we don't have to repeat ourselves over and over and have a source of truth that all maintainers agree on. Essentially, it's really easy to have a strong opinion about the size of a text or a color, and really long discussions can ensue. We DON'T want to deal with it and will not deal with it unless the person can convince us with real arguments. Improving accessibility is great. Adding a missing feature too. But changing something just to make it more beautiful is not acceptable currently. We have very vocal hard-core users; they want to keep the app free of distractions and very basic to use, so they can concentrate on learning.

There are also people who arrive with proposals that make no sense to us. In the two cases I have in mind, people wanted to introduce new features because it's cool, because every good app has it, and in both cases they failed to answer how it would actually benefit our users. As an example, biometric identification is of absolutely NO interest since we save everything in a "media" folder, plus a database, on the user's phone. Using encryption here would require rewriting a big part of the backend. We don't ever want to have to deal with security at the level of AnkiDroid; if the user ever needs privacy, they should handle it at the phone level. We know that we are not competent to do it right and we won't give false safety to our users.

There were only two PRs where I had to ask the contributor for changes due to efficiency concerns. One was doing O(n^2) work where O(n) was possible. It was easily corrected by using a more efficient SQL query, and honestly, I'm happy the contributor understood what I answered, because I don't know how I could have explained the problem if they didn't already know how to consider this kind of question. In the other PR, the contributor was committing data to the database immediately instead of buffering, that is, keeping the pending changes in RAM and writing them out when the rest of the system decides that it's time to save.
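To make these two concerns concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not AnkiDroid's code: the cards table, its columns, and every function and class name below are hypothetical, invented only for illustration. The first pair of functions contrasts running one query per card with letting a single query do the grouping; the small class shows buffering, where changes wait in RAM and are written in one transaction.

```python
import sqlite3


def make_db(n_cards=200):
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cards (id INTEGER PRIMARY KEY, deck_id INTEGER, due INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cards (id, deck_id, due) VALUES (?, ?, ?)",
        [(i, i % 5, 0) for i in range(n_cards)],
    )
    conn.commit()
    return conn


def counts_one_query_per_card(conn):
    """Slow pattern: n cards, and for each card a query that scans the table."""
    counts = {}
    for (deck_id,) in conn.execute("SELECT deck_id FROM cards").fetchall():
        row = conn.execute(
            "SELECT COUNT(*) FROM cards WHERE deck_id = ?", (deck_id,)
        ).fetchone()
        counts[deck_id] = row[0]
    return counts


def counts_single_query(conn):
    """Same result from one query: the database does the grouping itself."""
    rows = conn.execute(
        "SELECT deck_id, COUNT(*) FROM cards GROUP BY deck_id"
    ).fetchall()
    return dict(rows)


class BufferedDueWriter:
    """Keep pending changes in RAM and write them in a single transaction."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # list of (due, card_id) pairs not yet written

    def set_due(self, card_id, due):
        # Nothing touches the database here; the change is only remembered.
        self.pending.append((due, card_id))

    def flush(self):
        # "with conn" commits once at the end, or rolls back on error.
        with self.conn:
            self.conn.executemany(
                "UPDATE cards SET due = ? WHERE id = ?", self.pending
            )
        self.pending.clear()
```

Both counting functions return the same dictionary, so the faster one can replace the slower one without changing behaviour. With the writer, the caller decides when flush() runs, which is the point of buffering: many small changes cost one commit instead of one commit each.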

Luckily, I had planned to take 11 days of holiday before Easter. This gave me plenty of time to do all of this work. The trouble is that I realized I was getting frustrated pretty quickly. I know perfectly well that I can NOT ask every candidate to read and remember every single thing we wrote: the entire wiki, the letter to candidates, the hints given in the template. Worse, when I gave feedback to a candidate, for the sake of fairness, I published it - anonymized and generalized - to all candidates in a #feedback channel. This channel is 4813 words long now, so of course people can't remember every random thing I wrote. However, it still felt frustrating when I had to repeat something a third time, to a third person, realizing that all of the previous work does not mean everyone knows in detail what we asked them to do. It is my personal rule that I always start PR reviews by thanking the person and explaining why the work is great[4]. It's far harder to start reviews of student applications with positive feedback; I'm not yet sure why. I assume that reading applications is just not something I find as intrinsically rewarding as reading code; I don't expect to be learning new things - in the sense that there is never a moment where I think "this is wise, I would not have thought about it, I hope I'll reuse this technique to write better".

I really feel outside of my comfort zone. I love to code. I never intended to have to deal with 50+ more people in a month. It's not just that I encountered that many people, even if I rarely meet so many people at once. It's that those people depend on me, on my feedback; they know that I'll make a decision that can have a huge impact on their summer, and thus their resume. I had to play the role of someone who knows what he is doing, not only in terms of code, but in terms of higher-level decisions. This is entirely new and unexpected. I believe I'm doing it correctly, but I also want time for myself, and I don't want the newcomers to be blocked for multiple days either. Both other maintainers are also overwhelmed, so adding my load to their plates would not be nice.

To be honest, one unexpected thing is that some contributors who joined 3 weeks ago already started to answer questions and give advice to people who arrived a week ago. That's really beautiful to see. I do not always agree, and sometimes I catch a mistake, usually because they don't have the higher-level view, but that's still really really cool, and I look forward to giving them reviewer rights in a few months if they are still here.

Notes

[1] I decided not to take a single cent. The amount I could make is so low compared to my job income that it's not worth the time I would spend declaring it to the tax administration and getting my employer's approval.

[2] We already had some good first issues. Not enough for 30+ people!

[3] We had 37 contributors in the whole of 2020!

[4] Except for current core contributors. I don't feel I need to ensure that core contributors still feel welcomed. After all, they have the keys.

What the "Rationalist Community" means to me

For years, I wanted to write about what the aspiring rationalist community meant to me. Seeing a lot of people criticizing it, sometimes with arguments I agree with, often with ones which do not represent the reality I've seen, I made a Twitter thread about it. Threads are helpful because I find it more acceptable to write whatever comes through my mind randomly, which makes it easier to write than a blog post. I'll translate and post it here too.

Continue reading

The scheduler problem

The scheduler problem is, I believe, the biggest open problem in the Anki/spaced repetition learning community. As with any good research problem, there are two questions to consider: what are the problems we want to solve, and how to solve them. I've no idea how to solve them, but at least, I hope I can  […]

Continue reading

Effective altruism and criticism toward activism: Answer to a paradox

For a little while now, I have been exploring the notion of Effective Altruism - EA for short. My readings on the topic so far have been very interesting[1], and I would like to add my own idea that I deem important and have yet to read elsewhere. If this has ever been written down somewhere, I can at least attest that it is all too well hidden. Personally, I believe that it should be discussed in introductions to the EA topic.

Note

[1] I have attended some meetings with the French EA group, and have seen nothing but discussions so far. As it seems I have been more effective through direct actions against LGBTPhobia in high-school - for all my uncertainties about them - it had seemed pointless for me to join.

Continue reading

Collaborative decks in Anki

A lot of people want to create collaborative decks for Anki. In September 2018, I had already made quite a few add-ons, and so some people contacted me to discuss collaborative decks. It has been in the back of my head ever since. I'm going to try to write down every thought I had and why it seems quite complex.

Continue reading

How hard can it be to code a feature to let users resize images in a software.

In this post, I expect to show you why it may be difficult to create a seemingly simple program. In particular, to do it well. I'll showcase this with the last program I wrote, an add-on for Anki. More precisely, the most wanted add-on for Anki, according to the vote of users of Anki's subreddit: being able to resize images in the editor. This seems to be a simple add-on; after all, resizing by dragging a corner has been done in every editing software for decades[1]. In this post, I intend to document all of the things which made me lose time when I created the add-on "Resize image" for Anki. I also created a video showing how the add-on works.

Note

[1] Apart from LaTeX, but let's not consider it.

Continue reading

How I learn lyrics with anki

After years of using anki, I finally found a nice way to learn lyrics. I think I tried three different methods before finding one which works for me. More precisely, I found it a few months ago, and after testing it, I can finally say I found something which works.

To be more precise, I want to learn the lyrics of songs I love. Songs I've heard many times, and whose meaning I know. The method I give here would not be efficient for a new song. In this post, I'll first explain what I want anki to do, and why I want it. I'll explain how to do it in a second part.

Continue reading

Learning how to play music with anki

I've been playing music for half of my life. But while I enjoyed sight-reading sheet music, and sometimes practiced the boring parts a little bit (scales, arpeggios), I have been stuck. Here is a list of what changed:

  • The most frustrating thing for me was that I relied on sheet music. This means that if you gave me a piano or guitar without sheet music, I wasn't able to play anything. I found that ridiculous, and anki helped me solve that.
  • Similarly, I played classical guitar, and I didn't know how to read tabs. Because, honestly, there are so many chords, I keep forgetting them. This means that if you gave me a song with tabs, and there are hundreds of thousands of them, I couldn't play it, because it was not written in a way I can easily read. I don't know every single chord yet (and I'll probably never know them all), but I know far more chords today than I knew before I started anki, and it clearly helps with learning songs and doing improv.

The examples in this post are related to ocarina, guitar, piano, harmonica and tin whistle. I will explain what differs and what is similar for all of those instruments. Some explanations may not always be clear if you don't know the instruments I'm talking about. But don't worry, if you don't understand, just read the next paragraph; you should be able to get the general idea.

This article will be illustrated using almost only cards that I actually saw on the day I was writing this article. You can find here my [piano], [guitar] and [ocarina] decks. They are far from being perfect; some typos may still be in them. But they may help you understand what I write here. And maybe you can find them useful in your collection.

Continue reading

Anki and learning which require practice (origami, knot, instrument...)

I use anki to learn things which require practice. Origami, drawing, music, rope (knots and shibari). Music will be considered in another text.

I consider two kinds of practical knowledge:

  • some practice requires making choices regularly (like drawing, or musical improv)
  • some practice requires learning and practicing some exact moves over and over. That may be the case when you want to learn a musical piece, or how to tie some particular knot.

I don't have any idea how to deal with the first kind of knowledge, thus I'll only consider the second kind. I'll list here different methods, which depend on what I want to learn. I don't know in general how to decide which method is the best one.

Continue reading

Lists in anki: desiderata and partial solution

In this text, I assume you are familiar with anki, and in particular know what a field, a card, a card's type (aka template), a note and a note's type (aka a model) are, and that you have an idea of the rules anki uses to decide which cards should be generated or not.

There is one big limitation in anki: it concerns lists[1]. Here I list my troubles, the existing workarounds I know, their limits, and the functionality I would really want. Sadly, this functionality seems to require such a big modification of anki's underlying model that I fear that no add-on can answer my request. In particular if I want this request to also be satisfied in the smartphone applications, which do not allow add-ons.

Learning a list of things is hard, but it's something I sometimes want to do. A poem/song is just a list of lines. Sometimes, a mathematical notion has 4 distinct names. E.g. a pullback is also called a fiber product, a fibered product and a Cartesian square. In some other cases, a mathematical object admits many distinct definitions[2]. E.g. I've got 5 definitions of left-trivial monoids. And I also wanted to see if I can learn the list of the prime numbers less than 100. Mostly to see how hard it is to learn an arbitrary list.

Notes

[1] I assume here that sets are lists, with an arbitrary order

[2] This is in general considered to be a proof that the object is really interesting

Continue reading

Note on an introduction to Anki given at 35C3

This post is a comment about a self-organised workshop, Introduction to anki, that I gave at #35C3 (35th Chaos Communication Congress, a congress of 17k hackers). This workshop was announced on anki's subreddit, where I asked for ideas. I received a lot of useful feedback from this subreddit and from the related discord server. The main audience of the current blog post is thus those people, already in anki's community. This post contains ideas in random order.

Continue reading

The trolley problem, and what you should do if I'm on the tracks

Originally published in French and crossposted on LessWrong. Translation by Épiphanie.

Trigger warning: Death, suicide, and murder. Trolley problem.

This is quite the conventional ethical conundrum: You are near train tracks, and a train is rolling down the hill. It is going to run over 4 people who are tied to the rails of the main track. However, you can change the train's direction to a secondary track by pulling a lever, so that it runs over only one guy, also tied to the rails. Should you pull the lever?

I do believe there is a more interesting way to frame it: What would you choose if you were yourself tied to the rails, alone, while the train is not heading toward you yet? My own answer is very simple: I want the person deciding where the train should go to have _no doubts_ they should pull the lever! Because, for lack of context, I assume that the other four people are just me, or rather copies of me. That's a bit simplistic; of course they are not perfect clones. But as far as concrete predicates go, they are indistinguishable. That is to say, I have a 1 in 5 chance of being alone on the tracks, and a 4 in 5 chance of being in the group of four. And tell you what, I prefer dying with 20% probability because of what someone did, rather than dying with 80% probability because no one was ever willing to take the burden of responsibility.

Continue reading
