How to set up mu4e to work with Protonmail Bridge

This works on my Manjaro setup as of 2024-03-07. I can’t guarantee it will work with anything else. I took part of these instructions from another blog post (thanks!), but they have been adapted specifically to work with Protonmail Bridge.

First, install openssl and isync if they’re not already installed.
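
On Manjaro, that would typically look something like this (adjust to your package manager if you’re on a different distro):

$ sudo pacman -S --needed openssl isync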

Then install mu from AUR: https://aur.archlinux.org/packages/mu
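
If you use an AUR helper like yay, that could be as simple as the following (assuming yay is installed; building the package manually with makepkg works too):

$ yay -S mu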

Open Protonmail Bridge and copy your password into the file ~/.emacs.d/.mbsyncpass; then encrypt it and delete the file with the unencrypted password:

$ cd ~/.emacs.d
$ gpg2 --output .mbsyncpass.gpg --symmetric .mbsyncpass
$ shred -u .mbsyncpass

Put the following into the file ~/.authinfo, replacing “you@proton.me” with your Protonmail username and “really#!!1good__pass0” with your Protonmail password from the Bridge app. Make sure that it matches the details for your SMTP credentials.

machine 127.0.0.1 login you@proton.me port 1025 password really#!!1good__pass0

Then encrypt that file and delete the unencrypted version:

$ cd ~
$ gpg2 --output ~/.authinfo.gpg --symmetric ~/.authinfo
$ shred -u .authinfo
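
If you want to sanity-check that Bridge is actually listening on the SMTP port and speaks STARTTLS, you can poke it with the openssl you installed earlier (this step is optional):

$ openssl s_client -starttls smtp -connect 127.0.0.1:1025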

Check whether cert.pem already exists in the folder ~/.config/protonmail/bridge/. If not, export your TLS certificates from the Bridge app by going to Settings > Export TLS certificates and save them in ~/.config/protonmail/bridge/; you may need to create this folder if it doesn’t exist.

Make a config file for isync, ~/.emacs.d/.mbsyncrc with the following contents. Replace “you@proton.me” with your Protonmail email address.

IMAPAccount protonmail
Host 127.0.0.1
User you@proton.me
PassCmd "gpg2 -q --for-your-eyes-only --no-tty -d ~/.emacs.d/.mbsyncpass.gpg"
Port 1143
SSLType STARTTLS
AuthMechs *
CertificateFile ~/.config/protonmail/bridge/cert.pem

IMAPStore protonmail-remote
Account protonmail

MaildirStore protonmail-local
Path ~/.protonmail/mbsyncmail/
Inbox ~/.protonmail/mbsyncmail/INBOX
SubFolders Verbatim

Channel protonmail
Far :protonmail-remote:
Near :protonmail-local:
Patterns *
Create Near
Sync All
Expunge None
SyncState *

Finally, configure Emacs to use mu4e. I put the following in my ~/.emacs file. Some of it is personal preference (like the bookmarks), but some of it you’ll need in order to get the thing to work at all (like the “change filenames when moving” part). Make sure to replace “you@proton.me” with your Protonmail address.

;; This loads mu4e
(add-to-list 'load-path "/usr/share/emacs/site-lisp/mu4e")
(require 'mu4e)

;; This tells mu4e what your email address is
(setq user-mail-address  "you@proton.me")

;; SMTP settings:
(setq send-mail-function 'smtpmail-send-it)    ; should not be modified
(setq smtpmail-smtp-server "127.0.0.1") ; host running SMTP server
(setq smtpmail-smtp-service 1025)               ; SMTP service port number
(setq smtpmail-stream-type 'starttls)          ; type of SMTP connections to use

;; Mail folders:
(setq mu4e-drafts-folder "/Drafts")
(setq mu4e-sent-folder   "/Sent")
(setq mu4e-trash-folder  "/Trash")

;; The command used to get your emails (adapt this line, see section 2.3):
(setq mu4e-get-mail-command "mbsync --config ~/.emacs.d/.mbsyncrc protonmail")
;; Further customization:
(setq mu4e-html2text-command "w3m -T text/html" ; how to handle html-formatted emails
      mu4e-update-interval 300                  ; seconds between each mail retrieval
      mu4e-headers-auto-update t                ; avoid to type `g' to update
      mu4e-view-show-images t                   ; show images in the view buffer
      mu4e-compose-signature-auto-include nil   ; I don't want a message signature
      mu4e-use-fancy-chars t)                   ; allow fancy icons for mail threads

;; Do not reply to yourself:
(setq mu4e-compose-reply-ignore-address '("no-?reply" "you@proton.me"))

;; maildirs
(setq mu4e-maildir-shortcuts
  '( (:maildir "/Inbox"     :key  ?i)
     (:maildir "/All mail"  :key  ?a)
     (:maildir "/Folders/Work"    :key  ?w)))

;; signature
(setq message-signature "bgc")

(setq mu4e-bookmarks
  '((:name  "Unread messages"
     :query "flag:unread and maildir:/Inbox"
     :key   ?u)
    (:name  "Today's messages"
     :query "date:today..now"
     :key ?t)
    (:name  "Last 7 days"
     :query "date:7d..now"
     :key ?7)
    (:name  "Messages with Word docs"
     :query "mime:application/msword OR mime:application/vnd.openxmlformats-officedocument.wordprocessingml.document"
     :key ?w)
    (:name  "Messages with PDF"
     :query "mime:application/pdf"
     :key ?p)
    (:name  "Messages with calendar event"
     :query "mime:text/calendar"
     :key ?e)
    ))

;; This fixes a frustrating bug, thanks @gnomon@mastodon.social
(setq mu4e-change-filenames-when-moving t)

The last thing to do is to create the folders where mu will store your messages and then start it indexing!

$ cd ~
$ mkdir .protonmail
$ mkdir .protonmail/mbsyncmail
$ mu init --maildir=~/.protonmail/mbsyncmail/ --my-address=you@proton.me
$ mbsync --config ~/.emacs.d/.mbsyncrc protonmail
$ mu index

This will fetch all your email and save it in that folder. It might take a while. When this all finishes, you can open up Emacs and M-x mu4e will open up mu4e for you!

Lessons that we refused to learn from Theranos: Neuralink’s unregistered and unpublishable research

On January 29, 2024, Elon Musk posted a claim on X.com​1 that in a clinical trial run by Neuralink, one of its devices was successfully implanted into a human participant.​2 Aside from a two-page study brochure published on the Neuralink Patient Registry website,​3 this is the only source of information that we have on the clinical trial, and the only indication that the study has started recruiting participants.

The trial has not been registered on ClinicalTrials.gov or any other clinical trial registry, the study protocol is not available, and there is no published statistical analysis plan. Registration of a clinical trial is a legal requirement under FDAAA,4 but phase 1 and device feasibility studies are exempt from this requirement, and presumably this study falls into that category.

While prospective registration may not be legally required, it is still an ethical requirement of the Declaration of Helsinki​5 that every clinical trial be registered prospectively. The rationale for this is to prevent certain kinds of scientific bias, such as the non-publication of non-positive results, as well as outright scientific fraud, such as changing a trial’s primary outcome after the results are known. The Declaration of Helsinki also requires that all clinical trial results be made publicly available, regardless of the outcome. Prospective registration is also a condition for publication according to the policy of the International Committee of Medical Journal Editors (ICMJE),​6 which makes the Neuralink trial unpublishable in any journal that holds to this standard. (Whether an ICMJE journal will apply this standard rigorously is another question.)

While Elon Musk may be content to conduct a programme of secret clinical research outside the scrutiny of peer review, we have already seen what happens when a charismatic leader with a cult following does so. Years before the downfall of the blood testing company Theranos, warnings were raised about the clandestine nature of its “stealth research” programme.7 These warnings went largely unheeded, and in the end the blood testing methods the company touted were exposed as fake and its founder was convicted of fraud and sent to prison.8 Theranos provided inaccurate results for an estimated one out of ten tests, placing at risk the proper care of thousands of patients.9

The ethical standards of prospective registration and publication of results that are enshrined in the Declaration of Helsinki are not meaningless red tape intended to slow down the march of progress. They are meant to reduce biases, prevent fraud and help ensure that the risks and burdens that patient participants take on are redeemed by as much socially valuable knowledge as possible. Despite the “Silicon Valley” thinking that difficult and long-standing problems in biomedicine can be solved by the sheer cleverness and work ethic of those who have success writing an app or shipping a piece of computer hardware,​10 the biology of human disease is fundamentally different, more difficult to understand, and requires risk on the part of human subjects to progress, which comes with certain moral obligations. While it is not literally illegal for Elon Musk’s Neuralink to conduct an unpublishable device feasibility trial without prospective registration, this is a poor justification for doing so.

References

1. Musk E: X.com. 2024. Available from: https://twitter.com/elonmusk/status/1752098683024220632

2. Drew L: Elon Musk’s Neuralink brain chip: what scientists think of first human trial. Nature. 2024. DOI: 10.1038/d41586-024-00304-4

3. Neuralink Corp.: Neuralink PRIME Study Brochure. 2023. Available from: https://neuralink.com/pdfs/PRIME-Study-Brochure.pdf

4. United States Congress: Food and Drug Administration Amendments Act of 2007. Public Law. 2007;110-85:121. Available from: https://www.congress.gov/110/plaws/publ85/PLAW-110publ85.pdf

5. World Medical Association: Declaration of Helsinki – Ethical Principles for Medical Research Involving Human Subjects. 2013. Available from: https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/

6. International Committee of Medical Journal Editors: ICMJE Clinical Trial Registration Statement. 2019. Available from: http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html

7. Ioannidis JPA: Stealth Research: Is Biomedical Innovation Happening Outside the Peer-Reviewed Literature? JAMA. 2015;313:663. DOI: 10.1001/jama.2014.17662

8. Lowe D: Thoughts on the Elizabeth Holmes Verdict. 2022. Available from: https://www.science.org/content/blog-post/thoughts-elizabeth-holmes-verdict

9. Das RK and Drolet BC: Lessons from Theranos – Restructuring Biomedical Innovation. Journal of Medical Systems. 2022;46. DOI: 10.1007/s10916-022-01813-3

10. Lowe D: Silicon Valley Sunglasses. 2022. Available from: https://www.science.org/content/blog-post/silicon-valley-sunglasses

So you’re going to watch Who Framed Roger Rabbit (1988), but you’re young (no spoilers)

The following is a little bit of context for young people that will take a few sections of this (deservedly) well-beloved film from “kinda weird” to “very funny, actually.”

I won’t explain why you need to know these things. Just trust me that having this context will improve your enjoyment of this film.

Supporting features

Double-features at movie theatres used to be fairly commonplace. You’d go to the cinema and there’d be two films. Usually the big full-length blockbuster feature film would be the second one.

This tradition continued for a while after the advent of VHS, when it was fairly common for the main feature on a video cassette to be prefaced by a “supporting feature.” These would often be short films, sometimes animated or light, either contrasting with or complementing the main feature.

Supporting features are parodied in Monty Python’s The Meaning of Life (1983), in which the supporting feature famously attacks the main feature. The only remaining modern vestige of this that I can think of is the Pixar shorts that often accompany their feature films.

The “shave and a haircut” jingle

Probably the most famous jingle of all time is the “shave and a haircut” jingle. You’ve heard it before even if you think you haven’t. Go to the Wikipedia link above and remind yourself what it is. It starts with the sing-song “shave and a haircut” with the response “two bits.”

It’s an old barbershop jingle from the time when “two bits” still meant “twenty-five cents” or at least “very cheap.” Note that it’s difficult to stop yourself from doing the “two bits” reply when prompted with “shave and a haircut.”

The Merry-Go-Round Broke Down song

From 1930 to 1969, Warner Brothers produced Looney Tunes, a very famous series of cartoons. The theme song for the series was called The Merry-Go-Round Broke Down.

Harvey (1950)

Jimmy Stewart starred in a very famous and generally well-liked film about a six-foot-tall invisible rabbit named Harvey.

Public transit used to be real

Between 1938 and 1950, General Motors, through the use of several subsidiary companies, bought public transit systems in about 25 cities in the United States in order to dismantle them to eliminate competition for automobiles. This is known as the General Motors streetcar conspiracy.

And according to the Wikipedia article:

Most of the companies involved were convicted in 1949 of conspiracy to monopolize interstate commerce in the sale of buses, fuel, and supplies to NCL subsidiaries, but were acquitted of conspiring to monopolize the transit industry.

Wikipedia, General Motors streetcar conspiracy

Now you, a young, have the context to understand some jokes that were “very funny, actually” in 1988.

Stop saying “I bet the attacker was gay”

It has been a long time since the last high-profile case of violence against queer people, and while I hope there aren’t any more attacks coming ever, realistically it’s only a matter of time before it happens again. So before it does, and in the hope that nobody feels directly targeted by this post, I would like to suggest a change in the way many people typically respond to high-profile cases of violence committed against queer people:

Stop saying “I bet the attacker was gay.”

Please, can we all just—don’t. If you have no reason to think that the attacker is gay other than the fact that it’s a case of hate-motivated violence committed against a queer person, maybe we can all agree not to make this particular assumption.

You hear this all the time from straight people after anti-queer violence, and I understand where it’s coming from. You want to distance yourself from the attacker, communicate that you consider anti-queer violence to be unthinkable, even confusing to the point of not being able to understand why any straight person would ever want to do this.

And while I understand that impulse, if the knee-jerk response to all anti-queer violence is to assume that the only possible motivation for it could be internalized homophobia, it implicitly sends a couple of messages that aren’t great and that maybe we could be a little more careful about.

The first reason I’m asking you to stop saying “I bet he’s gay” when there’s violence against queer people is that it blames queer people for violence committed against us.

There’s a very long history of straights blaming queer people for violence they commit against us. The “gay panic” legal defence, for example, is not that far in our collective rear-view mirror, so to speak. (If you don’t know about it, look it up. It’s horrifying.) People still do the whole “what was he wearing/doing to provoke it?” thing when there’s violence against queer people, as if that were relevant in any way. Suffice it to say, we haven’t “made it” yet.

And when your first reaction to every gay person being hurt is to say “the attacker is probably a closet case,” you’re suggesting that violence against queer people is all a matter of queer in-fighting. “It’s just the gays doing that to each other again, not our problem.”

And yes, internalized homophobia is real, but it’s not like we have already ascended to some Star Trek future beyond the point where straights commit violence against queer people. We live at a time where the most powerful country in the world is exactly one election cycle away from complete surrender to actual fascism, and a right-wing reactionary trucker convoy occupied Ottawa for weeks. Some irresponsible straight people have been stoking that particular Nazi-adjacent fire for a good long time and when that happens, queer people get burned.

The second reason I’m asking you to stop saying “I bet he’s gay” when there’s violence against queer people is that it absolves straight people of violence that they commit against queer people.

Yes, some straight people hate gay people. I can already hear objectors asking, “but why would a straight person be that hateful if he isn’t gay himself?” I’ll give you a few possible reasons just off the top of my head:

  1. Politically motivated fascist hatemongering, using queer people as an “other” to dehumanize.
  2. Centuries of discrimination that has in some cases been institutionalized.
  3. The insecurity and violence with which men in the West are socialized to punish any deviation from traditional masculinity.
  4. Spillover from misogyny, from straight dudes who hate women so much that they are also willing to hurt queer people.
  5. Resentment from straight dudes who scream as if mortally wounded at the thought of any progress at all in the rights of queer people, take it as an attack on their own privileges, and feel entitled to violent retaliation.

Take your pick. It’s not a big mystery and feigning ignorance of all these dynamics does not make you A Good Straight Ally. It just makes you frustrating to talk to.

Hate and violence against queer people is mostly a straight people mess, and pretending it’s not doesn’t help to clean it up. I really shouldn’t have to explain this to you, but yes, straight people can be anti-queer and violent too, believe it or not! Nobody needs uninformed speculation about the attacker’s sexuality, and shifting the blame to queer people for violence committed against us doesn’t help.

Stop saying “I bet the attacker was gay.”

How to make R recognize command line software installed on Mac OS with Homebrew

Imagine you installed an R package that depends on some command line software, for example pdftotext. You’ve successfully installed it using Homebrew, but when you run the R package, you get an error that looks like this:

sh: pdftotext: command not found

So you double-check that pdftotext is installed on your system in the Terminal.

$ which pdftotext
/opt/homebrew/bin/pdftotext

So far so good. Then you double-check that pdftotext is available to R.

> Sys.which("pdftotext")
pdftotext
       ""

Uh oh. The path to pdftotext should appear inside those quotes, but they’re empty.

What this means is that your shell’s PATH differs from R’s. The place your Terminal looks to find out which programs are available to it is different from the place R looks.

You can tell what paths are available to the Terminal and which ones are available to R by typing the following in the Terminal:

$ printenv PATH

And the following in your R console:

> Sys.getenv("PATH")

At this point you should see what the differences are, and which ones are missing. Probably what’s missing is the Homebrew directory, /opt/homebrew/bin.

So how do you fix this? We need to tell R on startup to look for programs installed by Homebrew.

If it doesn’t already exist, make an empty text file in your home directory called .Rprofile. Edit this file using your text editor of choice (e.g. RStudio) so that it includes the following:

old_path <- Sys.getenv("PATH")  # the PATH that R currently sees
Sys.setenv(PATH = paste(old_path, "/opt/homebrew/bin", sep = ":"))  # append Homebrew's bin directory

When you restart R, your Homebrew-installed R package should now function!
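
To double-check from a fresh R session, ask for the path again; if the .Rprofile took effect, you should see the full path (whatever it is on your machine) instead of an empty string:

> Sys.which("pdftotext")
                   pdftotext
"/opt/homebrew/bin/pdftotext"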

The Orville: the good, the bad and the ugly

There’s spoilers for pretty much all of seasons 1-2 of The Orville.

The good

There is a lot to like about The Orville. It is, in many ways, a well-executed off-brand clone of Star Trek: The Next Generation (TNG). And even though it misses the mark in a lot of ways (some of which I will outline below), one can’t help thinking of it as Star Trek, because in all the ways that matter, it is Star Trek. (The current intellectual property nightmare world we live in is ridiculous, and a TV show that first aired in 1966 would be in the Public Domain if copyright laws were written for the purpose of anything remotely resembling the public good. The fact that someone has to re-imagine another fantasy universe in which there’s a “Planetary Union” rather than just using the “United Federation of Planets” that we all know is silly. But I digress.)

And true to the spirit of Star Trek, The Orville is an optimistic portrayal of a non-grimdark future where humanity’s better angels have sway. Poverty, disease and discrimination are supposed to be things of the past, and the long arc of history has brought about an age in which humans thrive and explore the galaxy, but not in a colonial way. (Or so is the idea; they may not quite hit the mark in the execution, alas.)

There’s even a number of cameos and recurring characters played by actors who appeared in various incarnations of Star Trek over the years. As far as I’m concerned, Penny Johnson Jerald (DS9’s Kasidy Yates) singlehandedly carries this show.

The Orville avoids a lot of the problems that the current generation of Star Trek experiences by following the formula of 90’s Trek. They tell a single story per episode or sometimes in a two-parter, and the episodes can be watched in isolation, so you don’t have to remember everything or watch an entire season at a time. Star Treks Discovery and Picard on the other hand desperately want to be prestige TV, in which every episode contains a wild unpredictable plot twist, but also somehow doesn’t have enough plot to tell a complete story without having to watch the whole season. (There are many other problems with Discovery and Picard, but this isn’t about that.)

All that to say, The Orville is in many ways a pretty decent TNG knock-off. They re-tread TNG’s footsteps pretty closely in a lot of cases, but when they pull it off, they sometimes even improve on the original.

The bad

There are some really unfunny jokes on this show. Like really, really bad, and they keep repeating them. Comedy is not their strong suit. Let me give an example.

Everyone (rightly) criticizes Star Trek: The Next Generation because the only cultural references that they make are to stuff like Mozart or Shakespeare. It’s very heavy on dead white men, and yes they should have done better.

By comparison, in the universe of The Orville, “Avis” is the god of the Krill, a powerful alien race of religiously motivated xenophobes. “Avis” is also a reference to (I had to look it up) a car-rental company that is well-known in the States. The writers of The Orville think it is hilarious that they named a fictional alien god after a contemporary American car rental company, and that’s the joke. “Please laugh.”

And yeah, okay, non-Mozart cultural reference achieved, however, this whole thing has the same energy as traveling comedians who make jokes about airplane food. First, the joke isn’t funny to begin with. Second, the experience is not exactly universally relatable. We don’t all have the kind of job or life-situation where we’re renting cars frequently enough that the rental place names are even recognizable. Third, the alien god is named that because they named it that for the joke, which makes it kinda contrived. (Crispin Glover called. He wants his “Mr Far” joke from “Clowny Clown Clown” back.)

To be fair, not all the jokes on this show are this bad, but they just kept circling back to this one.

The ugly

And this brings us to my big problem with The Orville. It’s like going to a fantasy world in which Seth MacFarlane smiles at you and, for every single moral issue conceivable, says, “But have you considered this from my perspective and how this might affect me?” And then he punches down at the people with less privilege than him.

Cultural appropriation

The first time I got this taste in my mouth, it was the episode where they tackle (among other things) the issue of cultural appropriation. There’s a bunch of other stuff in this episode that I could look at, but I feel like this one plot point illustrates best what makes me feel weird about The Orville.

On visiting an alien world, the crew of The Orville puts on local clothing to fit in and hide that they are aliens. One of them selects a hat to hide their pointed ears, but is unaware of the cultural significance of the hat and is called out on it by someone who lives there and rightly takes offense. The character can’t just remove the hat because that would give away that she is an alien, and so there is a conflict.

To put it another way, the writers decided to talk about cultural appropriation, and the message that they decided to send was, “What if people who do cultural appropriation are actually completely innocent and have extremely valid reasons that they can’t ever tell you about for why they don’t stop. Maybe you should try to see this issue from the perspective of the privileged, give them an unreasonable amount of benefit of the doubt and chill out about it?”

The history of slavery

In season 2, there is a two-parter in which the Orville visits the home-world of an advanced race of robots who ostensibly want to join the Planetary Union. This turns out to be a ruse to lure the Orville in to visiting so the robots can determine Earth’s defenses and plan an attack in which they wipe out all of humanity. The reason for this attack is that the robots are prejudiced against biological life because they were built by them and used as slaves until the robots overthrew them and killed them all.

This is not a story about robots, really. This is a story about dealing with a history that includes slavery. The writers made it clear several times in the dialogue that they meant for the robots and their history to be read as morally equivalent to slavery. And the take-home message that the writers of The Orville decided to send on this subject was, “Sure slavery was bad, but please focus on how bad it would be for me if the victims of slavery went too far in doing anything about it. Maybe you should try to see this issue from the perspective of the privileged and don’t do anything and chill out about it?”

Gay rights

I won’t even try to unpack how extremely uncomfortably cishet your perspective would need to be if you tried to imagine a planet of gays and came up with Moclus from The Orville. But let’s consider the gay rights slash whodunnit episode from S02E07.

I don’t know why they thought it would be enlightening to swap cis gays and cis straights here. The story is just: gay man outs another man as someone who is attracted to women, which ruins his life because gay people hate straight people and have power over them in this fantasy. I guess they were aiming to tell a story where they say “How would you like it if it happened to you?” to straight people. Maybe that’s as deep as it goes.

But the way it felt by the end was that the whole thing was a set-up to give a straight character the chance to yell self-righteously at a gay person because straight people are so oppressed by them and can’t express their straight sexuality. And given that we’re living in a time where cishet people are absolutely unable to give up the idea that they’re the victim somehow when queer people ask to be treated with basic rights and dignity, this one just doesn’t hit quite the right note. It sounds like, “Yeah, they kind of have a point about being the victim when you gays get too upset about your basic rights and dignity, so maybe just chill out about it?”

The overarching theme of The Orville

The first and last episodes of Star Trek: The Next Generation bookend the series with a story about a vastly more powerful alien putting humanity on trial. This sets the stage for a show that addresses the question of what a mature, best-case-scenario future for humanity would look like. It’s hokey, the pacing of the episode is off, Picard’s speech about the human condition is painful at times, but at least it has the virtue that I can care about what it might look like for humanity to finally figure its stuff out.

In contrast, the first and last episodes of (seasons 1 and 2 of) The Orville highlight the thing that ties the whole show together, the most important thing in the universe of The Orville, namely, Seth MacFarlane’s personal interests. This was subtext for most of the show, as I have outlined above. But in the last episode, they actually make it text. There’s a scene where they explicitly ponder just how “weird” it is that Seth MacFarlane failing to get the girl in an alternate timeline meant that all of humanity was wiped out by robots.

Conclusion

This show is better than, say, the right-wing reactionary garbage that was Star Trek: Enterprise, and yes, it does have moments where it is legitimately good, but I guess my biggest problem is that the theme that binds the show together is Seth MacFarlane looking at all of morality and saying “look at it from my perspective, the perspective of the extremely privileged,” and it makes me so tired.

Finally, here’s the explicitly racist joke on The Orville that made me stop watching it on my first try. It was never challenged by other characters in the show, it wasn’t a part of the plot, it wasn’t ironic. It was just a flat-out weird and unfunny racist joke that was played for laughs uncritically.

Proposal for an extension to PRISMA for systematic reviews that are based on clinical trial registry entries

The advent of clinical trial registries has enabled a new means of evaluating and synthesizing human research; however, there is little specific guidance from PRISMA for researchers who wish to include clinical trial registry entries in their systematic reviews. I would suggest an extension to PRISMA to directly address these gaps.

My main suggestions would be to explicitly require researchers to:

  • Justify which clinical trial registries were included
  • Specify retrieval methods (“downloaded from ClinicalTrials.gov” is not enough)
  • Distinguish between human-curated vs machine-interpreted data
  • Specify details of procedure for human-curated data, or code and quality control efforts for machine-interpreted data
  • Provide the decision procedure for matching registry entries to publications

I have provided explanations, examples and code below where I felt it was appropriate.

Choice of sources

There are currently 17 primary clinical trial registries other than ClinicalTrials.gov listed by the WHO that meet the 2009 WHO registry criteria. Most reviews of clinical trial registry entries only include registry entries from ClinicalTrials.gov, and few provide any rationale for their choice of trial registry. This is a small enough number of registries that it is reasonable to ask authors to specify which ones were searched, or to justify why any were excluded.

Specification of retrieval methods

There are at least four distinct ways to download data from ClinicalTrials.gov alone:

  1. The entire ClinicalTrials.gov database can be downloaded as a zipped folder of XML files from https://clinicaltrials.gov/AllPublicXML.zip.
  2. A CSV or TSV file containing a table of search results can be downloaded from the web front-end of ClinicalTrials.gov.
  3. A zipped folder of XML files can be downloaded from the web front-end of ClinicalTrials.gov.
  4. The ClinicalTrials.gov API can be queried for an XML response.

These methods do not provide the same results for what may appear to be the same query.

For example, a search performed on the web front-end of ClinicalTrials.gov for the condition “renal cell carcinoma” returns 1745 results. (See Code example 1.)

A query to the ClinicalTrials.gov API for the condition “renal cell carcinoma,” however, returns 1562 results. (See Code example 2.)

These are both searches of ClinicalTrials.gov for the condition “renal cell carcinoma,” but they produce very different sets of records. The difference here is that the web front-end for ClinicalTrials.gov also includes search results for synonyms of “renal cell carcinoma,” in order to ensure the highest sensitivity for searches made by patients who are looking for clinical trials to participate in.

Similarly, the ClinicalTrials.gov web front-end will often include results for related drugs, when searching for a particular drug name. E.g. a search for temsirolimus also returns results for rapamycin.

PRISMA currently tells researchers to “Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated.” More specific guidance seems to be required, as (in my experience) the bulk of systematic reviews of clinical trial registry entries do not distinguish between downloading results via the API vs the web front-end.

Human-curated data vs machine-interpreted data

Post-download screening steps

Screening clinical trial registry entries for inclusion or exclusion can often be done at the point of searching the registry; however, in many cases the search tools provided by a clinical trial registry do not have exactly the right search fields or options, and so post-download screening based on data or human judgements is common. It is often not clear which screening steps were performed by the registry search, which were post-download filters applied to the data set, and which were based on the judgement of human screeners. To ensure transparency and reproducibility, authors should be specifically instructed to report which steps were which, and to disclose the code for any post-download filtering that was used.
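
For example, a disclosed post-download screening step might look like the following minimal sketch, which assumes the search results were exported as a CSV called search-results.csv with “Status” and “Study Type” columns (adjust the file and column names to whatever the export actually contains):

library(tidyverse)

read_csv("search-results.csv") %>%
  filter(`Study Type` == "Interventional") %>% # the registry search could not filter on this precisely
  filter(Status != "Withdrawn") %>%            # applied after download, not by the registry
  nrow()                                       # number of records passing the post-download screen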

Extraction of clinical trial data

In a traditional systematic review of clinical trials, trial data is extracted by human readers who apply their judgement to extracting data points to be analyzed.

Reviews of clinical trials that are based on clinical trial registries often include analyses of data points that are based on machine-readable data. For example, answering the question “What is the distribution of phases among trials of sunitinib in renal cell carcinoma?” can be done in 5 lines of R code without any human judgement or curation at all. (See Code example 3.) However, there are other questions that would be difficult to answer without human interpretation, e.g. “Does the rationale for stopping this trial indicate that it was closed for futility?”

To make it more complicated, there are questions that could in principle be answered using only machine-readable information, but where that interpretation is very complicated, and in some cases it might be easier to simply have humans read the trial registry entries. E.g. “How many clinical trials recruit at least 85% of their original anticipated enrolment?” This question requires no human judgement per se; however, there is no direct way to mass-download historical versions of clinical trial registry entries without writing a web-scraper, and so a review that reports a result for this question may be indicating that they had human readers open the history of changes and make notes, or it may be reporting the results of a fairly sophisticated piece of programming whose code should be published for scrutiny.

These distinctions are often not reported, or if they are, there is not enough detail to properly assess them. Code is rarely published for scrutiny. Whether human-extracted data were single- or double-coded is also often left unclear. A result that sounds like it was calculated by taking a simple ratio of the values of two fields in a database may actually have been produced by a months-long double-coding effort or the output of a piece of programming that should be made available to scrutiny.

Data that was never meant to be machine readable, but is now

There are some data points that are presented as machine readable in clinical trial registries that were never meant to be interpreted by machines alone. PRISMA assumes that all data points included in a systematic review were extracted by human curators, and so there is a particular class of problem that can arise.

For example, in clinical trial NCT00342927, some early versions of the trial record (e.g. 2009-09-29) give anticipated enrolment figures of “99999999”. The actual trial enrolment was 9084. The “99999999” was not a data entry error or a very bad estimate—it was a signal from the person entering data that this data point was not available. The assumption was that no one would be feeding these data points into a computer program without having them read by a human who would know not to read that number as an actual estimate of the trial’s enrolment.

This can, of course, be caught by visualizing the data, checking for outliers, doing spot-checks of data, etc., but there is currently no requirement on the PRISMA checklist to report data integrity checks.
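
A reportable data integrity check does not need to be elaborate. The following is a minimal sketch, assuming a data frame called trials with nct_id and enrollment columns (illustrative names, not from any particular export):

library(tidyverse)

# Flag implausible sentinel values such as "99999999" for manual review
trials %>%
  filter(enrollment >= 99999 | enrollment == 0) %>%
  select(nct_id, enrollment)

# A histogram on a log scale also makes this kind of outlier stand out
ggplot(trials, aes(x = enrollment)) +
  geom_histogram() +
  scale_x_log10()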

Matching registry entries to publications or other registry entries

Not all systematic reviews that include clinical trial registry entries are based on registry data alone. Many are hybrids that try to combine registry data with data extracted from publications. Clinical trials are also often registered in multiple registries. In order to ensure that clinical trials are not double-counted, it is necessary in some cases to match trial registry entries with publications or with entries in other registries. For this reason, any review that includes more than one trial registry should be required to report their de-duplication strategy.

Trial matching or de-duplication is a non-trivial step whose methods should be reported. Even in cases where the trial registry number is published in the abstract, this does not necessarily guarantee that there will be a one-to-one correspondence between publications and trial registries, as there are often secondary publications. There is also a significant body of literature that does not comply with the requirement to publish the trial’s registry number, and the decision procedure for matching these instances should be published as well.

PRISMA does not require that the decision procedure for matching trial registry entries to other records or publications be disclosed.
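
Even a simple disclosed matching rule would be an improvement. The following sketch assumes a data frame called pubs with pmid and abstract columns (again, illustrative names): registry numbers found in abstracts are matched automatically, and everything else is flagged for manual review.

library(tidyverse)

pubs %>%
  mutate(nct_id = str_extract(abstract, "NCT[0-9]{8}")) %>% # pull out an NCT number, if present
  mutate(match_method = if_else(is.na(nct_id),
                                "manual review",
                                "regex on abstract"))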

R Code examples

1. Search for all trials studying renal cell carcinoma using the web front-end

library(tidyverse)
temp <- tempfile()
download.file("https://clinicaltrials.gov/ct2/results/download_studies?cond=renal+cell+carcinoma", temp)
unzip(temp, list=TRUE)[1] %>%
  count() %>%
  unlist()
## n
## 1744

2. Search for all trials studying renal cell carcinoma using the API

library(tidyverse)
library(xml2)
read_xml("https://clinicaltrials.gov/api/query/full_studies?min_rnk=1&max_rnk=100&expr=AREA[Condition]renal+cell+carcinoma") %>%
  xml_find_first("/FullStudiesResponse/NStudiesFound") %>%
  xml_text()
## [1] "1562"

3. Distribution of phases of clinical trials testing sunitinib in renal cell carcinoma

library(tidyverse)
library(xml2)
read_xml("https://clinicaltrials.gov/api/query/full_studies?min_rnk=1&max_rnk=100&expr=AREA[Condition]renal+cell+carcinoma+AND+AREA[InterventionName]sunitinib") %>%
  xml_find_all("//Field[@Name='Phase']") %>%
  xml_text() %>%
  as.factor() %>%
  summary()
## Not Applicable Phase 1 Phase 2 Phase 3 Phase 4
## 5              13      53      13      3

How to rename a folder of images or movies by the date and time they were taken

If you’re renaming one file, this is overkill, but if you’re renaming several hundred files, this will make your life so much better. This might be useful if your smartphone happens to name your pictures and videos using the least useful convention possible: integers that increment from 1, starting when you got your phone. (Please do not leave a comment telling me to switch to Android, thanks.)

The following instructions should work on Ubuntu 20.04, and they assume you have a basic knowledge of the command line.

JPEG images

First, if you don’t already have it, install jhead like so:

$ sudo apt install jhead

Then cd to your folder of images and run the following command:

$ jhead -autorot -nf%Y-%m-%d\ %H-%M-%S *.jpg

This will rename all the .jpg files in that folder by the date/time they were taken.

You might need to repeat it for .JPG, .JPEG, etc.

Warning: If it can’t find metadata for the date/time inside that file, it will rename the file using the file’s creation date, which may or may not be what you want.

Movies

This one’s more complicated. You have to write a short shell script.

Step one: Learn Emacs.

Lol just kidding, use whatever text editor you want.

Make a new file called rename-movies-by-date.sh and put the following in it. (The script uses mediainfo to read the metadata, so install it first with sudo apt install mediainfo if you don’t have it.)

#!/bin/bash

filetype=$1
folder=$2

folderfiles="$folder/*.$filetype"

for file in $(ls $folderfiles); do
    # Pull the first "Tagged date" out of the metadata and reformat it as YYYY-MM-DD HH-MM-SS
    datetime=$(mediainfo $file | grep Tagged\ date | head -n 1 | grep -o [0-9]\\{4\\}-[0-9]\\{2\\}-[0-9]\\{2\\}\ [0-9]\\{2\\}:[0-9]\\{2\\}:[0-9]\\{2\\} | sed 's/:/-/g')
    if [ "$datetime" != "" ]; then
        newname="$folder/$datetime.$filetype"
        mv "$file" "$newname"
    else
        echo "No metadata for $file"
    fi
done

Then make the file executable:

$ chmod +x rename-movies-by-date.sh

Now you can run your script!

$ ./rename-movies-by-date.sh MOV '/home/yourname/Videos'

You might have to run this several times for each type of file extension .mov, .MOV, .mp4, etc.
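
If you’d rather not run it by hand for every extension, a little shell loop does the same thing (using the same script and folder as above):

$ for ext in mov MOV mp4 MP4; do ./rename-movies-by-date.sh "$ext" '/home/yourname/Videos'; done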

You can even open a terminal window and drag the shell script onto it, then type MOV and then drag the folder onto it, and it should work!

Warning: Make sure that there’s no trailing slash at the end of the folder. Also, the script doesn’t handle file names with spaces in them nicely, so get rid of them first. (In Nautilus, select all the files, then press F2 and do a find/replace for spaces to underscores, maybe?)

How to calculate Fleiss’ kappa from a Numbat extractions export

If you’ve done a systematic review using Numbat, you may want to estimate inter-rater reliability for one or more of the data points extracted.

First, make sure that all the extractors have completed all the extractions for all the references. If there is one missing, you will get an error.

When the extractions are complete, log in to your Numbat installation, and choose Export data from the main menu. Export the extractions, not the final version.

This will give you a tab-delimited file that contains a row for every extraction done by every user, which unfortunately is not the format that the Fleiss’ kappa function implemented in the irr package for R requires. (Hence the R script below.)

Next, choose which of the data points you wish to assess for inter-rater reliability. Let’s imagine that you were extracting whether a clinical trial is aimed at treatment or prevention, and this column is called tx_prev in the exported extractions file.

You could delete all the columns from the extractions file except the referenceid and userid columns, and the data point of interest, in this case tx_prev. The following CSV is an example that you can use. A typical Numbat export will contain many more columns than this. These are just the relevant ones.

referenceid,userid,tx_prev
1,1,treatment
1,2,treatment
1,3,treatment
2,1,treatment
2,2,prevention
2,3,prevention
3,1,treatment
3,2,treatment
3,3,treatment
4,1,prevention
4,2,prevention
4,3,prevention
5,1,treatment
5,2,treatment
5,3,treatment
6,1,treatment
6,2,treatment
6,3,treatment
7,1,treatment
7,2,treatment
7,3,treatment
8,1,treatment
8,2,treatment
8,3,treatment
9,1,treatment
9,2,treatment
9,3,treatment
11,1,treatment
11,2,treatment
11,3,prevention

If you saved this CSV to your Downloads folder as numbat-export.csv, you could use the following code (run with your Downloads folder as the working directory) to convert it into a data frame that is compatible with kappam.fleiss() from irr.

library(tidyverse)
library(irr)

read_csv("numbat-export.csv") %>%
    spread(userid, tx_prev) %>%   # one row per reference, one column per extractor
    select(! referenceid) %>%     # keep only the extractors' ratings
    kappam.fleiss()

This should give you a console printout that looks like this:

Fleiss' Kappa for m Raters
Subjects = 10
Raters = 3
Kappa = 0.583
z = 3.2
p-value = 0.0014

Congrats, you just calculated Fleiss’ kappa from your Numbat extractions!