How to get R to parse the <study_design> field from XML files

The registry helpfully provides a facility for downloading machine-readable XML files of its data: a search result can be saved as a zipped folder of XML files, one file per trial.

Unfortunately, a big zipped folder of XML files is not that helpful. Even after parsing a whole bunch of trials into a single data frame in R, there are a few fields that are written in the least useful format ever. For example, the <study_design> field usually looks something like this:

Allocation: Non-Randomized, Endpoint Classification: Safety Study, Intervention Model: Single Group Assignment, Masking: Open Label, Primary Purpose: Treatment

So, I wrote a little R script to help us all out. Do a search on the registry, then save the unzipped search result in a new directory called search_result/ in your ~/Downloads/ folder. The following script will parse each XML file in that directory, collecting them into a single data frame called "trials", and then explode the <study_design> field into individual columns.

So for example, based on the example field above, it would create new columns called “Allocation”, “Endpoint_Classification”, “Intervention_Model”, “Masking”, and “Primary_Purpose”, populated with the corresponding data.
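The splitting logic can be checked on its own before running the full script. Here is a sketch in Python (for illustration only; the R script below relies on the same negative-lookahead regex), showing how a <study_design> string breaks into key-value pairs only at commas that sit outside parentheses:

```python
import re

# Split on commas NOT inside parentheses: a comma is skipped whenever the
# lookahead finds only non-parenthesis characters followed by a ")".
PATTERN = r", *(?![^()]*\))"

field = ("Allocation: Non-Randomized, Endpoint Classification: Safety Study, "
         "Intervention Model: Single Group Assignment, Masking: Open Label, "
         "Primary Purpose: Treatment")

pairs = re.split(PATTERN, field)
design = dict(p.split(": ", 1) for p in pairs)
# design["Allocation"] is "Non-Randomized", and so on

# A comma inside parentheses does not delimit a new pair:
tricky = "Masking: Double Blind (Subject, Investigator), Primary Purpose: Treatment"
assert re.split(PATTERN, tricky) == [
    "Masking: Double Blind (Subject, Investigator)",
    "Primary Purpose: Treatment",
]
```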

# The XML package parses the trial files; plyr provides rbind.fill()
library(XML)
library(plyr)

# Change path as necessary
path <- "~/Downloads/search_result/"

xml_file_names <- dir(path, pattern = "\\.xml$")

counter <- 1

# Makes data frame by looping through every XML file in the specified directory
for ( xml_file_name in xml_file_names ) {
  xmlfile <- xmlTreeParse(file.path(path, xml_file_name))
  xmltop <- xmlRoot(xmlfile)
  data <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
  if ( counter == 1 ) {
    trials <- data.frame(t(data), row.names = NULL)
  } else {
    newrow <- data.frame(t(data), row.names = NULL)
    trials <- rbind.fill(trials, newrow)
  }
  # Progress report; this will be good for very large sets of XML files
  print(paste0(
    xml_file_name, " processed (",
    format(100 * counter / length(xml_file_names), digits = 2),
    "% complete)"
  ))
  counter <- counter + 1
}

# Data frame has been constructed. Comment out the following two loops
# (until the "un-cluttering" part) in the case that you are not interested
# in exploding the <study_design> column.

columns <- vector()

for ( stu_des in trials$study_design ) {
  # splits by commas NOT in parentheses
  for ( pair in strsplit(stu_des, ", *(?![^()]*\\))", perl = TRUE) ) {
    newcol <- substr(pair, 0, regexpr(':', pair) - 1)
    columns <- c(columns, newcol)
  }
}

for ( newcol in unique(columns) ) {
  # get rid of spaces and special characters
  newcol <- gsub('([[:punct:]])|\\s+', '_', newcol)
  if ( newcol != "" ) {
    # add the new column
    trials[, newcol] <- NA
    i <- 1
    for ( stu_des2 in trials$study_design ) {
      for ( pairs in strsplit(stu_des2, ", *(?![^()]*\\))", perl = TRUE) ) {
        for ( pair in pairs ) {
          if ( gsub('([[:punct:]])|\\s+', '_', substr(pair, 0, regexpr(':', pair) - 1)) == newcol ) {
            # everything after the colon and space is the value
            trials[i, ncol(trials)] <- substr(pair, regexpr(':', pair) + 2, nchar(pair))
          }
        }
      }
      i <- i + 1
    }
  }
}

# Un-clutter the working environment

rm(i, counter, data, newcol, newrow, columns, pair, pairs,
   stu_des, stu_des2, xml_file_name, xml_file_names, xmlfile, xmltop)

# Get nice NCT id's

get_nct_id <- function ( row_id_info ) {
  return (unlist(row_id_info)["nct_id"])
}

trials$nct_id <- sapply(trials$id_info, get_nct_id)

# Clean up the enrollment field

trials$enrollment[trials$enrollment == "NULL"] <- NA

trials$enrollment <- as.numeric(trials$enrollment)




Carlisle, Benjamin Gregory. "How to get R to parse the <study_design> field from XML files." Web blog post. The Grey Literature. 06 Oct 2016. Web. 22 Feb 2018.



Gotcha! This is why piracy happens



This summer, I took a two-week long course on systematic reviews and meta-analytic techniques for which there was some required software, in this case, Stata. As a McGill student, I was encouraged to buy the student version, which was about $50 for “Stata Small.” Not bad. I’ve paid more for textbooks. So I got out my credit card, bought the license, installed it on my computer, and ran the very first example command of the course. I immediately got a string of red letter error text.

The error message was telling me that my license did not allow me enough variables to complete the command. I checked the license, and it said I was allowed 120 variables. I checked the “Variable manager” in Stata, and I had only assigned 11 variables. (I checked the variable limit beforehand in fact, and made sure that none of the data sets that we’d be working with had more than 120 variables. None of them came close to that limit.)

So I emailed Stata technical support. It turns out that the meta-analysis package for Stata creates “hidden variables.” Lots of them, apparently. So many that the software cannot accomplish the most basic commands. Then they tried to up-sell me to “Stata SE.” For $100 more, they said, they would send me a license for Stata that would allow me to run the meta-analysis package—for realsies this time.

I asked for a refund and decided that if I really needed Stata, I would use the copy that’s installed on the lab computers. (Now I’m just using the meta package in R, which does everything Stata does, just with a bit more effort.)

For the record: I am perfectly fine with paying for good software. I am not okay with a one-time purchase turning me into a money-pump. I thought that the “small” student license would work. All their documentation suggested it would. If I had upgraded to “Stata SE,” would that have actually met my needs, or would they have forced me to upgrade again later, after I’d already made Stata a part of my workflow?

It probably would have been okay, but the “gotcha” after the fact soured me on the prospect of sending them more money, and provided all the incentive I need to find a way to not use Stata.


A few years ago, I bought a number of pieces of classical music through the iTunes Store. I shopped around, compared different performances, and found recordings that I really liked. This was back when the iTunes store had DRM on their music.

I’ve recently switched to Linux, and now much of the music that I legally bought and paid for can’t be read by my computer. Apple does have a solution for me, of course! For about $25, I can subscribe to a service of theirs that will allow me to download a DRM-free version of the music that I already paid for.

This is why I won’t even consider buying television programmes through the iTunes Store: It’s not that I think that I will want to re-watch the shows over and over and I’m afraid of DRM screwing that up for me. It’s because I’ve had some nasty surprises from iTunes in the past, and I can borrow the DVDs from the Public Library for free.

For the record: I do not mind paying for digital content. But I won’t send you money if I think there’s a “gotcha” coming after the fact.

I’m really trying my best

People who produce good software or music should be compensated for their work. I don’t mind pulling out my wallet to help make that happen. But I don’t want to feel like I’m being tricked, especially if I’m actually making an effort in good faith to actually pay for something.

Since DRM is almost always fairly easily circumvented, it only punishes those who pay for digital content. And this is why I’m sympathetic to those who pirate software, music, TV shows, etc.




Carlisle, Benjamin Gregory. "Gotcha! This is why piracy happens." Web blog post. The Grey Literature. 22 May 2015. Web. 22 Feb 2018.



Proof of prespecified endpoints in medical research with the bitcoin blockchain



The gerrymandering of endpoints or analytic strategies in medical research is a serious ethical issue. “Fishing expeditions” for statistically significant relationships among trial data or meta-analytic samples can confound proper inference by statistical multiplicity. This may undermine the validity of research findings, and even threaten a favourable balance of patient risk and benefit in certain clinical trials. “Changing the goalposts” for a clinical trial or a meta-analysis when a desired endpoint is not reached is another troubling example of the kind of scientific fraud that becomes possible when endpoints are not specified in advance.

Pre-specifying endpoints

Choosing endpoints to be measured and analyses to be performed in advance of conducting a study is a hallmark of good research practice. However, if a protocol is published on an author’s own web site, it is trivial for an author to retroactively alter her own “pre-specified” goals to align with the objectives pursued in the final publication. Even a researcher who is acting in good faith may find it less than compelling to tell her readers that endpoints were pre-specified, with only her word as a guarantee.

Advising a researcher to publish her protocol in an independent venue such as a journal or a clinical trial registry in advance of conducting research does not solve this problem, and even creates some new ones. Publishing a methods paper is a lengthy and costly process with no guarantee of success—it may not be possible to find a journal interested in publishing your protocol.

Pre-specifying endpoints in a clinical trial registry may be feasible for clinical trials, but these registries are not open to meta-analytic projects. Further, clinical trial registry entries may be changed, and it is much more difficult (although still possible) to retrieve previous versions of a trial registry entry than it is to retrieve the current one. For example, there is still no way to automate downloading of XML-formatted historical trial data from the registry in the same way that the current version of trial data can be automatically downloaded and processed. Burying clinical trial data in the “history” of a registry is not a difficult task.

Publishing analyses to be performed prior to executing the research itself potentially sets up a researcher to have her project “scooped” by a faster or better-funded rival research group who finds her question interesting.

Using the bitcoin blockchain to prove a document’s existence at a certain time

Bitcoin uses a distributed, permanent, timestamped, public ledger of all transactions (called a “blockchain”) to establish which addresses have been credited with how many bitcoins. The blockchain indirectly provides a method for establishing the existence of a document at a particular time that can be independently verified by any interested party, without relying on a medical researcher’s moral character or the authority (or longevity) of a central registry. Even if the NIH’s servers were destroyed by a natural disaster, as long as there were any full bitcoin nodes left running in the world, the method described below could be used to confirm that a paper’s analytic method was established at the time the authors claim.


  1. Prepare a document containing the protocol, including explicitly pre-specified endpoints and all prospectively planned analyses. I recommend using a non-proprietary document format (e.g. an unformatted text file or a LaTeX source file).
  2. Calculate the document’s SHA256 digest and convert it to a bitcoin private key.
  3. Import this private key into a bitcoin wallet, and send an arbitrary amount of bitcoin to its corresponding public address. After the transaction is complete, I recommend emptying the bitcoin from that address to another address that only you control, as anyone given the document prepared in (1) will have the ability to generate the private key and spend the funds you just sent to it.
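The digest in step (2) can be sketched as follows, here in Python for illustration; the protocol text is hypothetical, and deriving the corresponding public bitcoin address would additionally require secp256k1 elliptic-curve multiplication and Base58Check encoding, which a bitcoin wallet or library performs for you:

```python
import hashlib

def document_digest_hex(document_bytes: bytes) -> str:
    """SHA256 digest of the exact bytes of the protocol document.

    The 256-bit digest can be interpreted as a bitcoin private key;
    changing even one character of the document yields an unrelated key.
    """
    return hashlib.sha256(document_bytes).hexdigest()

# Hypothetical protocol text standing in for the real document
protocol = b"Primary endpoint: overall survival at 24 months; log-rank test."
key = document_digest_hex(protocol)  # 64 hex characters = 256 bits

# Any alteration produces a completely different digest
assert document_digest_hex(protocol + b" ") != key
```

Because the digest is deterministic, anyone holding the exact original document can regenerate the same key, which is also why the funds sent to its address should be moved immediately afterward.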


The incorporation into the blockchain of the first transaction using the address generated from the SHA256 digest of the document provides an undeniably timestamped record that the research protocol prepared in (1) is at least as old as the transaction in question. Care must be taken not to accidentally modify the protocol after this point, since only an exact copy of the original protocol will generate an identical SHA256 digest. Even the alteration of a single character will make the document fail an authentication test.

To prove a document’s existence at a certain point in time, a researcher need only provide the document in question. Any computer can calculate its SHA256 digest and convert it to a private key with its corresponding public address. Anyone can then search the blockchain for transactions involving this address and check the date when the transaction happened, proving that the document must have existed at least as early as that date.
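The authentication test itself amounts to recomputing the digest from the supplied document and comparing. A minimal sketch in Python (illustrative only; the document bytes are hypothetical, and the address-derivation and blockchain-search steps are omitted):

```python
import hashlib

def matches_claimed_key(document_bytes: bytes, claimed_key_hex: str) -> bool:
    """True iff the document's SHA256 digest equals the claimed private key.

    A match proves the supplied document is byte-identical to the one
    from which the timestamped address was originally derived.
    """
    return hashlib.sha256(document_bytes).hexdigest() == claimed_key_hex

doc = b"Pre-specified analyses: intention-to-treat; alpha = 0.05."
claimed = hashlib.sha256(doc).hexdigest()

assert matches_claimed_key(doc, claimed)
assert not matches_claimed_key(doc + b"!", claimed)  # any edit fails the test
```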


This strategy would prevent a researcher from retroactively changing an endpoint or adding / excluding analyses after seeing the results of her study. It is simple, economical, trustless, non-proprietary, independently verifiable, and provides no opportunity for other researchers to steal the methods or goals of a project before its completion.

Unfortunately, this method would not prevent a malicious team of researchers from preparing multiple such documents in advance, in anticipation of a need to defraud the medical research establishment. To be clear, under a system as described above, retroactively changing endpoints would no longer be a question of simply deleting a paragraph in a Word document or in a trial registry. This level of dishonesty would require planning in advance (in some cases months or years), detailed anticipation of multiple contingencies, and in many cases, the cooperation of multiple members of a research team. At that point, it would probably be easier to just fake the numbers than it would be to have a folder full of blockchain-timestamped protocols with different endpoints, ready in case the endpoints need to be changed.

Further, keeping a folder of blockchain-timestamped protocols would be a very risky pursuit—all it would take is a single honest researcher in the lab to find those protocols, and she would have a permanent, undeniable and independently verifiable proof of the scientific fraud.


Fraud in scientific methods erodes confidence in the medical research establishment, which is essential to its function of generating new scientific knowledge, and cases where pre-specified endpoints are retroactively changed cast doubt on the rest of medical research. A method by which anyone can verify the existence of a particular detailed protocol prior to research would lend support to the credibility of medical research, and be one less thing about which researchers have to say, “trust me.”




Carlisle, Benjamin Gregory. "Proof of prespecified endpoints in medical research with the bitcoin blockchain." Web blog post. The Grey Literature. 25 Aug 2014. Web. 22 Feb 2018.



Copyright laws in Canada; Narnia; Tatooine


I have a conspiracy theory that there is a common reason why the most recent Superman, Star Trek and Muppets films are all mediocre offerings at best, and the reason is that there are problems with copyright law. Please watch the following educational Youtube video, and continue reading below.

Required viewing

The problem with copyright is that it only allows for “corporate fan-fiction”

We all like to see old stories re-told. That’s just part of the human experience—taking ideas, building on them, and combining them with other ones in interesting ways. A large proportion of our culture could be described in this way. There are very few truly novel ideas out there.

The reason that you see Superman, Star Trek and Muppets re-makes at the movies is because the ideas that they’re based on still have some mileage left in them. We, as a culture, still want fresh new stories with Kermit the Frog, James Kirk or Clark Kent, and the underlying premisses and characters are still engaging enough that people will pay good money to see them.

To put this in other terms, humans really like fan-fiction. I’m not using the term in a derogatory way. Fan-fiction can be pretty awesome. (There is, of course bad fan-fiction too.) I’m just saying that we should include the Star Trek reboot in what we count as fan-fiction, since it’s a re-hashed creative work based on another artist’s original work.

It’s well-funded fan-fiction. It’s licensed fan-fiction. But it’s fan-fiction, nonetheless.

My theory is that we have “corporate fan-fiction” that tends toward suckiness because of the unwillingness of large movie studios to take the risk of doing something interesting with their work, combined with the lack of creative competition that extremely long periods of copyright affords them. This is why we’ll probably never see a black actor play Superman on the big screen. (Well, that and systemic racism.) This is also why we just had a re-make of Star Trek II (complete with a white-washed Khan).

Only one company owns the rights to make Superman, so there’s no competing studio making a more creative Superman film. Hence they can get away with selling us another film with the same premiss and the same villain and there’s nothing anyone can do about it.

On the other hand, Sherlock Holmes is a character from a series of books that are now squarely in the Public Domain and there’s a whole industry of Sherlock Holmes re-imaginings. There’s a chapter in This Is How You Die that is Sherlock Holmes fan-fiction. The Magician’s Nephew by C. S. Lewis is, tangentially, Sherlock Holmes fan-fiction. (Don’t believe me? Read paragraph 2 of chapter 1.)

Moving from the written word to the screen, you can see Iron Man play Sherlock Holmes, then you can turn on the BBC and watch white Khan play Sherlock Holmes, then you can see (an even whiter) Lt. Cdr. Data play Sherlock Holmes, and to finish it off, on CBS there’s a Sherlock Holmes with a gender-swapped Watson, which I haven’t seen but I’ve heard is pretty good.

Brent Spiner as Data as Sherlock is pretty awful, I have to admit. But the nice thing about Sherlock Holmes is that if you don’t care for one version, you can move on to another one, and there are enough Sherlock Holmeses out there that one of them will likely strike your fancy.

The point is, after a certain point, copyright law stops encouraging creativity and starts stifling it, and when a creative work enters the Public Domain, brand new opportunities open up around it—not just downloading the original from Project Gutenberg, but taking the source material, re-mixing it, and telling a new story based on an old theme.

So the question becomes, what is entering the Public Domain these days?

Copyright law in Canada

The above video gives a very US-centric view of copyright law, but copyright law varies from country to country. As the video states, in 1998, US copyright was extended from 50 to 70 years after the death of the artist. The same thing happened in the UK in 1995. This never happened in Canada. In Canada, a creative work falls into the Public Domain only 50 years after the artist dies.

I was talking to my colleagues at work about this today, and thought to myself, Can I think of any authors that died 50 years ago? And I answered myself: Yes, yes I can.

C. S. Lewis died in 1963, hence his books are now in the Public Domain in Canada

Do you know what this means? First of all, it means that if you’re in Canada, contact me and I will email you ebooks of the complete Chronicles of Narnia, AND IT WILL BE COMPLETELY LEGAL. Or actually, here’s a link to Project Gutenberg Canada. You can get it for yourself there. (This is very exciting for me. Anyone who knew me in high school will be able to attest to how much of a C. S. Lewis nerd I was. This guy is probably the reason why I ended up with two degrees in philosophy.)

It also means that if you are a producer at a Canadian TV network, and you are so inclined, you can produce your own Narnia-based fan-fiction show, and sell it in Canada. And it’s in the Public Domain, so you can go any direction with the source material that you like, without consulting Disney, the publisher or C. S. Lewis’ estate.

No one can stop you from writing a story about a queer Muslim girl who goes to Narnia. You could make it a murder mystery with a gender-swapped Anne of Green Gables (“man of Green Gables”) who murders Mister Darcy in the land of Calormen. Sherlock Holmes would solve the case of course.

There’s a million creative angles that will never be explored by Disney. Why do the Pevensies have to be white and British? Could Aslan be a villain? What happened with the magic rings at the end of The Last Battle? CBC TV producers—contact me. I have ideas!

Any Canadian can make a gritty reboot of those bloodless and uninteresting Narnia films that Disney made a few years ago. If you do it in Canada, you don’t need Disney’s permission. (Note that the Disney Narnia films are still under copyright—it’s their source material, the books, that are in the Public Domain, and only in Canada, so you’d have to hold off for 20 years before bringing it to the US or UK market.)

The fact that Canadian copyright laws are different gives us creative opportunities that Americans and British people don’t have, and I think that by even further loosening up copyright laws here, Canada could become a world leader in producing new creative work.

My proposed copyright reform

Right now, Canada has a 20-year head start on Public Domain content as compared to the US and the UK. This is awesome. This is why all my Canadian readers are downloading The Magician’s Nephew to your e-readers as we speak, I’m sure.

Unfortunately, anyone who’s reading this from the US or the UK is out of luck. For the next twenty years, you’ll have to pay for stuff we can download in Canada for free. I’m not saying this (just) out of a sense of smug superiority. I’m saying this out of a genuine fear of creeping copyright law. I would love to see a whole bunch of Americans get really upset about the fact that I can download The Lion, the Witch and the Wardrobe for free and they can’t, because a popular backlash against copyright extensions in the States might remove a bit of pressure for Canadian law to conform to the American and British standard of life + 70 years.

But do you know what would be even better? Rolling back copyright law in Canada to something like the 1710 Statute of Anne would be even better.

Like the video says, Star Wars: A New Hope was made in 1977, so it would be in the Public Domain by now. We could have different studios making competing Star Wars films. Who knows? The impending let-down that will be the upcoming Disney Star Wars films could even be avoided by a Canadian market where creative works enter the Public Domain in a reasonable amount of time.





Carlisle, Benjamin Gregory. "Copyright laws in Canada; Narnia; Tatooine." Web blog post. The Grey Literature. 27 Mar 2014. Web. 22 Feb 2018.



Why I dumped Gmail


Reason one: I need my email to work, whether I follow the rules on Google Plus or not

Google has linked so many different products with so many different sets of rules to the same account that I feel like I can’t possibly know when I am breaking some of its terms of use. And I’m not even talking about specifically malicious activity, like using software to scrape information from a Google app or a DDoS attack. I mean something as basic as using a pseudonym on Google Plus, or a teenager revealing that she lied about her age when signing-up for her Gmail account. (These are both things that have brought about the deletion of a Google account, including Gmail.)

For starters, I think it is a dangerous and insensitive policy to require all users to use their real names on the Internet, but putting that aside, I don’t want to risk having all my emails deleted and being unable to contact anyone because of some Murph / Benjamin confusion on Google Plus.

Reason two: it’s actually not okay for Google to read my email

Google never made it a secret that they read everyone’s email. Do you remember when you first started seeing the targeted ads in your Gmail? I bet you called a friend over to look. “Look at this,” you said, “we were just talking about getting sushi tonight, and now there’s an ad for Montréal Sushi in my mailbox! That’s so creepy,” you said.

And then you both laughed. Maybe you made a joke about 1984. Over time, you got comfortable with the fact that Google wasn’t even hiding the fact that they read your mail. Or maybe you never really made the connexion between the ads and the content of your email. Maybe you thought, “I have nothing to hide,” and shrugged it off, or did some mental calculation that the convenience of your Gmail was worth the invasion of privacy.

I guess over time I changed my mind about being okay with it.

And no, this isn’t because I have some huge terrible secret, or because I’m a criminal or anything like that. I just don’t want to send the message that I’m okay with this sort of invasion of privacy anymore. Google’s unspoken challenge to anyone who questions their targeted ads scheme has always been, This the price you pay for a free service like Gmail. If you don’t like it, you can leave.

This is me saying, I don’t like it. I’m leaving.

Reason three: Gmail isn’t even that good anymore

When I signed up for Gmail, there were three things that set it apart:

  1. Tag and archive emails—forget folders!
  2. 10 gigabytes of space—never delete an email again!
  3. Web-based interface—access it from anywhere!

I’ll deal with each of these in turn.

1. Tagging was fun, but it only really works in the Gmail web interface, or in an app specifically designed for use with Gmail. Unfortunately, Gmail just doesn’t play nicely with other email apps, like the one in Mac OS X, or Mail on the iPhone or the BlackBerry. You could make it work through IMAP, having it tell your mail client that each tag was a folder, but it was always a bit screwy, and I never figured out how to put something in two tags through a 3rd-party app or mobile device.

The value of being able to organise emails by simultaneously having them in two categories is outweighed by the fact that I couldn’t access this functionality except through the browser.

2. The amount of space that Gmail provides for emails is not very much these days. I have a website (you may have guessed) and it comes with unlimited disc space for web hosting and emails. 10 gigabytes is just not that big a deal anymore.

3. I can do this with my self-hosted email as well, and I don’t have to suffer through an interface change (“upgrade”) just because Google says so.

So what’s the alternative?

Full disclosure: I haven’t shut down my Google account. I’m forwarding my Gmail to my self-hosted email account, so people who had my old Gmail account can still contact me there for the foreseeable future. I am also still using a number of other Google products, like the Calendar and Google Plus, but my life would not go down in flames quite so quickly if those stopped working as compared to a loss of email access.

Basically, I am moving as many “mission critical” aspects of my life away from Google as I can, to keep my technological eggs in a few more baskets. Email, for example, will be handled by my web host, of which I make backups on a regular basis.

I’m not trying to go cold-turkey on Google. I’m just not going to pretend to be as comfortable as I used to be as a guest on Google’s servers.

Update (2013 Nov 18)

I switched back to the Thunderbird email client a couple weeks ago. It supports tagging and archiving, just like Gmail.




Carlisle, Benjamin Gregory. "Why I dumped Gmail." Web blog post. The Grey Literature. 27 Sep 2013. Web. 22 Feb 2018.



What would it take to convince you that someone was from the future?



It’s so good!

Ever since the pilot first aired as a free download in the iTunes Store, I’ve been a big fan of the television programme Continuum. It is a sci-fi show about a police officer from the year 2077 sent 65 years into her past. By rights, any conspicuously Canadian time-travel TV show should be terrible, but this one turned out to be pretty good.

Continuum season 2 spoilers ahead

Until part-way through the second season, the main character, Kiera, hides the fact that she is from the future from her partner, Carlos. During an episode in the second season, for reasons pertaining to the plot of that episode, Kiera has to reveal to her partner that she’s from the future.

He doesn’t believe her, and says that her prescience regarding the murders in question suggests that she is in cahoots with the serial killer who is the villain of this episode. Of course, by the end of the episode, Kiera proves that she is not a serial killer, and that she is telling the truth, that she comes from 2077. Carlos sees Kiera’s future-handgun—a “transformable multifunctional weapon with a holographic sight,” (source: Internet Movie Firearms Database, which apparently exists) and all is well.

When this happened, it got me thinking, first in the context of Continuum. If I were Carlos, and my partner had intimate knowledge of a serial killer’s whereabouts, victims, modus operandi, etc. that she couldn’t explain, and the best alternate explanation to her complicity in the murders is that she is from the future, I’m not sure that even seeing a gun with a holographic sight would be enough to convince me. I mean, just because a technology looks advanced doesn’t mean that it’s beyond the capabilities of secret government or corporate engineers.

I guess I would assume that the future-gun was just something that had been developed by a secret government organisation or something. Even the special CPS suit that Kiera wears that has a multi-touch interface—it’s not so futuristic that if I saw someone wearing it, I’d immediately think, She’s from the future! I’d probably go with the simpler explanation: She’s one of the serial killers, and she happens to have a large budget for futuristic firearms.

In some ways, it reminds me of the British scientists who concluded that the duck-billed platypus was just a very convincing fraud, when a specimen was first taken from Australia to Great Britain.

What would it take?

Then I started thinking in general terms, outside the context of Continuum, what would it take to convince me that some other person is from the future?

I don’t think specific knowledge of the future would cut it for me in terms of proof that someone came from the future. As the Continuum episode proved, I would be more likely to believe that a person either has some sort of “insider information” or that she is somehow causing the event she has foreknowledge of. Even if it were something like the exact time and location of a lightning strike, I would probably be more likely to think, This person has developed some way to cause a massive electric arc that looks like lightning, than I would be to think, She’s from the future.

If it were a Sports Almanac that accurately predicted the outcome of every sporting event, like in Back to the Future, it would be harder to maintain my scepticism in the long run, but I would probably start looking for evidence that the alleged record from the future wasn’t being changed to reflect likely predicted outcomes of sporting events.

What about technology from the future?

It would have to be something pretty remarkable to make me think that it is more likely to be from the future than something produced by very well-funded scientists and engineers. The following are things that, although impressive, would not convince me that they are from the future:

  • Back to the Future Flying Delorean—this sort of thing is possible in 2013, although expensive
  • Back to the Future Hoverboard—there is a crowdfunding project for this
  • Holograms/multi-touch clothes, Continuum-style—after the smartwatches, smartshirts are certain to be next
  • Working invisibility device
  • Star Trek-style tricorder—people have made these already
  • Star Trek-style dermal regenerator—would be cool, but not “you’re from the future” cool
  • Star Trek-style replicator—3D printer anyone?

The following are things that probably would convince me that they are from the future:

  • A structure that is bigger on the inside than it is outside (e.g. a functioning Tardis)
  • A working teleporter—although I’d need to do a large number of tests myself to make sure that it was genuine and not just a sophisticated illusion
  • An actual trip through the time machine itself—although I would need a good amount of convincing even then

What about you? Where would you draw the line?


    title = {What would it take to convince you that someone was from the future?},
    journaltitle = {The Grey Literature},
    author = {Benjamin Gregory Carlisle},
    address = {Montreal, Canada},
    date = 2013-09-25,
    url = {}


Carlisle, Benjamin Gregory. "What would it take to convince you that someone was from the future?" Web blog post. The Grey Literature. 25 Sep 2013. Web. 22 Feb 2018. <>


Carlisle, Benjamin Gregory. (2013, Sep 25). What would it take to convince you that someone was from the future? [Web log post]. Retrieved from

A review of and rationale for Bitmessage


Bitmessage address

There is a little bit of crypto-anarchist in all of us, and bowing to that spirit, I’ve been checking out Bitmessage, the peer-to-peer encrypted messaging protocol, which I commend to you as a fun experiment. It is not ready for mass-consumption (yet), but it shows a lot of promise. If you would like to test for yourself whether or not the software is actually impervious to the NSA’s prying eyes, download the software with a friend (Mac, PC) and send each other messages whose content would normally get you on the “do not fly” list, then book a flight and go to the airport. Say goodbye to your loved ones before you leave—there is limited internet access for inmates at Guantanamo Bay.

“I have no need to use Bitmessage because I have nothing to hide”

You might be asking yourself why anyone would even be interested in using Bitmessage. After all, if you are not emailing about child pornography or drug trafficking, you have nothing to worry about, right? I can think of four reasons why I might want to switch as many of my communications to Bitmessage as possible. I will outline them below, and then go into some of the more interesting aspects of the protocol that I’ve been thinking about.

1. You’re probably guilty of something.

The laws of Canada, the United States and pretty much every modern nation state are, at the moment, so convoluted, contradictory and difficult to understand and follow that it is likely that every single person alive today has already done enough to warrant time in jail, whether they realise it or not.

Affluent straight white males without mental health issues don’t notice this because it is not good politics to arrest them very much. But it doesn’t take much of an excuse for the police to take a homeless guy away, or to declare that a black guy was acting suspiciously and treat him accordingly. This might not bother you if you are, in fact, an affluent straight white male without mental health issues, until you realise that:

2. Sometimes messages that are perfectly acceptable in their proper context, can be very damning when quoted out of context.

I don’t know from first-hand experience, but I’m betting that the NSA/CSIS/whoever’s getting my email doesn’t have a human carefully reading through every single email and flagging the ones that are explicitly related to terrorism. I bet they’re searching for key words and phrases. And if you happen to have the wrong sorts of phrases in your emails, combined with the wrong sorts of Google searches, etc., you might find that the government might decide to scrutinise what you’re doing, and this is a problem because of reason 1, outlined above.

3. Just because you know that what you’re doing is fine doesn’t mean that someone else won’t punish you for it.

For example, I am a homosexual. This is fine. My family/friends/co-workers think it’s fine. I’ve never been physically attacked or had anything like that happen to me because of it. However, if I were planning on going to Russia as a tourist during the Olympics, I would be rightly afraid, and not because I’m doing anything even slightly wrong.

4. Solidarity with the less privileged is a generous gesture.

Let’s imagine that you are an affluent straight white male who has very vanilla sexual tastes and no mental health issues, whose email correspondence is so clear that it could never be misconstrued, and, against all odds, you know for certain that nothing you have done is now, or ever will be, illegal. You still might want to switch to something like Bitmessage, just out of solidarity with the rest of us who aren’t so privileged. After all, if we have to constantly switch between regular email and Bitmessage, we’ll eventually mess up.

Now that you’re all on-board the Bitmessage train, let’s settle down and look at what it is and what it does differently.

What I find interesting about the Bitmessage protocol itself

Bitmessage is still in beta, so it does not allow rich text formatting, attachments (although in principle, it could) or any sort of filtering, searching, tagging or sorting of your messages once they’re in your inbox. It’s more of a proof-of-principle release than anything designed for actual communication.

Unfortunately, it’s not pretty.

The big selling point of Bitmessage is that all messages are encrypted, and the protocol makes it difficult even to discern the sender / receiver of a message. Beyond that, it has some interesting properties that could be exploited, the most interesting of which I will explain here:

The address of the sender of a bitmessage can’t be faked

Something you may not have known about regular email: there is nothing preventing anyone from sending an email from any address she chooses. Any person could send an email from your email address to any other email address, without needing access to your email account. That’s why you sometimes see extra information next to the sender’s name in Gmail. It’s kind of like regular mail: you can write anyone’s name you like in the top-left corner of an envelope and have it delivered.

Have you ever wondered why, whenever you sign up for something online, you have to put in your username / password, then it sends you an email, and then you click a link in the email, which sends you back to the site you came from? It’s a weird and awkward system, and it’s not even very secure, but we’ve been doing it for so long that we’ve forgotten just how weird it is.

If a website required a bitmessage address for sign-up instead of an email address, the sign-up process could be streamlined or changed in a number of ways: It could be as simple as, “Fill out the Captcha to reveal a bitmessage address. Send a blank bitmessage to the address to sign up. Your bitmessage address is your username. You will receive a password in reply to the message you sent.” There you go. That might be what sign-ups look like, once web servers start installing a bitmessage client.

Just by building a protocol such that sender addresses can’t be faked, we can finally eliminate the cost of constantly writing software to confirm the identity of a user.
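
The mechanism behind unforgeable sender addresses is public-key cryptography: an address is derived from a hash of the sender’s public key, every message is signed, and a receiver checks both that the signature matches the public key and that the public key matches the claimed address. Bitmessage itself uses elliptic-curve signatures; as a self-contained stand-in using only the Python standard library, here is the same idea built on a hash-based (Lamport) one-time signature. All names here are illustrative, not the real protocol:

```python
import hashlib
import secrets

def keygen():
    # One secret pair per bit of a SHA-256 message digest.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk

def address(pk):
    # The address is just a hash of the public key, so nobody can
    # claim it without holding the corresponding secrets.
    return "BM-" + hashlib.sha256(b"".join(a + b for a, b in pk)).hexdigest()[:24]

def sign(sk, message):
    h = int.from_bytes(hashlib.sha256(message).digest(), "big")
    # Reveal one secret from each pair, chosen by the message's bits.
    return [sk[i][(h >> i) & 1] for i in range(256)]

def verify(pk, message, sig):
    h = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return all(hashlib.sha256(sig[i]).digest() == pk[i][(h >> i) & 1]
               for i in range(256))

sk, pk = keygen()
msg = b"meet me in 2077"
sig = sign(sk, msg)
assert verify(pk, msg, sig)            # a genuine message checks out
assert not verify(pk, b"forged", sig)  # a tampered message does not
```

Signing twice with the same Lamport key leaks secrets, which is one reason real protocols use elliptic-curve signatures instead; the point of the sketch is only that the address commits to the key, so a sender address cannot be faked without it.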

Proof of work: the end of spam?

If you’ve tried sending a bitmessage, one thing you might have noticed is that your computer needs to do a “proof of work” before the network will accept a message for sending. It takes about a minute on my year-old MacBook Air, and the bigger the message is, the more difficult the proof of work becomes.

This isn’t much of a burden, if you’re sending messages only as fast as you can type them, but it becomes a huge drain on computer resources and electricity if you’re trying to send thousands of spam messages. This in itself may make spam uneconomical. Also, having an incentive to keep emails short and to keep attachments to a minimum is a good thing.
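
The idea fits in a few lines. This is a simplified hashcash-style proof of work, not the real Bitmessage algorithm (which uses double SHA-512 and scales its target with message length); the difficulty parameter here is illustrative:

```python
import hashlib

def proof_of_work(message, difficulty_bits=16):
    # Find a nonce such that sha256(nonce || message) falls below a
    # target; each extra difficulty bit doubles the expected work.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(nonce.to_bytes(8, "big") + message).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def check(message, nonce, difficulty_bits=16):
    # Verifying costs one hash; finding the nonce costs ~2^bits hashes.
    digest = hashlib.sha256(nonce.to_bytes(8, "big") + message).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

msg = b"hello over bitmessage"
nonce = proof_of_work(msg)
assert check(msg, nonce)
```

That asymmetry, cheap to check but expensive to produce, is exactly what makes a million spam messages uneconomical while leaving a single typed message barely slower.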

There are some legit cases in which you may want to send out a message to a large number of recipients. Some people actually do want to receive updates on projects they’re interested in, for example. Bitmessage allows “broadcasting” for this. Users subscribe to a bitmessage address, and anyone who has the address receives messages broadcast to it, which requires only one proof of work.

Addresses are not human-readable

A bitmessage address doesn’t look like an email address or a Twitter handle. In fact, you can’t really pick an address in the same way you did for pretty much every other web service you ever signed up for. Addresses are generated either randomly or deterministically from a user-chosen passphrase.

This is an example of a bitmessage address:


These addresses are not meant to be read and transcribed by humans. To be honest, relying on humans to remember and reliably communicate specific exact strings of characters never worked very well. (E.g. “Was that a one or a lower-case L in your email address?” or “Did you mean the numeral ‘6’ or the letters ‘S-I-X’?”)

Even better, it means that we can finally move past “vanity” email addresses. You don’t have to deal with addresses like [email protected] and you don’t have to try to think up a dignified user name when signing up for a Gmail to use for applying for jobs when your real name is already taken. We can give up on human-readable addresses and let QR codes and the copy / paste commands take over.

Addresses are “disposable” by design

You can make as many addresses as you like, and by virtue of the fact that they are disposable, you can use one per project / contact / context, and keep track of how other users get your contact info. For example, you might be a member of a message board for Russian political dissidents. You could post a bitmessage address there, and mark in your bitmessage client where you posted it and when, and if you ever received a message on that address, you’d know that ultimately, that’s where the guy got it from. You could trade another address with your family, use another just for a particular school project, or print a QR code for another one on a poster for your indy-music band, etc.

Deterministically generated addresses are an interesting property of the bitmessage protocol as well. Be sure to use a very good passphrase if you want to generate them in this way, otherwise you run the risk of an “address collision,” where you and another person have generated the same address. If this happens, you will receive each other’s messages.
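
The collision risk is easy to see in a toy model. The real Bitmessage scheme derives an elliptic-curve keypair from the passphrase and encodes a hash of the public keys; the derivation below is illustrative only, and exists to show why two users who choose the same passphrase end up with the same address:

```python
import hashlib

def deterministic_address(passphrase):
    # Same passphrase in, same address out. That is the whole point,
    # and also the whole danger.
    seed = hashlib.sha512(passphrase.encode("utf-8")).digest()
    return "BM-" + hashlib.sha256(seed).hexdigest()[:24]

# Two strangers who both pick a weak passphrase collide:
assert deterministic_address("password123") == deterministic_address("password123")
# A long, unusual passphrase is effectively collision-proof:
assert deterministic_address("password123") != deterministic_address(
    "plover continuum 2077 holographic sight")
```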

This could be a bug or a feature, depending on how you look at it. I can imagine that a government agency or a company might want to have a copy of all their employees’ communications—one that can’t be deleted in the case of a scandal. You could write a little application for use on company computers that generates addresses, and when it does so, it informs the user, as well as a “listening computer,” which uses an agreed-upon set of passphrases to deterministically generate the same addresses. In this way, a government agency or a department within a company wouldn’t have the option of deleting old emails that would reflect poorly on them, unless they go and delete them on the “listening computer.”

Emails were never really reliable, anyway

Do you remember when Gmail went down for a few hours a couple of years ago? I really didn’t know what to do. I was in shock. I went outside, wandered around and reconsidered the priorities in my life.

Gmail, like all email servers, has a single central physical location. This means that if power goes out, or a meteorite strikes, or if climate change floods that location, you no longer have your normal means of communication. It doesn’t happen much, but it could. Bitmessage is a distributed peer-to-peer network, so it doesn’t have the same sorts of vulnerabilities.

Even outside of catastrophes, email is a bit unreliable. Sometimes emails legitimately go missing. Again, not often—it’s usually a user error or a misfiled message. But the fact is, you can’t tell if someone has received a message you sent them, or if the message disappeared into the ether. Automatic spam filters are also common culprits for the loss of email messages.

Bitmessage sends receipts for messages, indicating that the receiver’s client has downloaded the message from the network, but not that she has read it.

Have you tried Bitmessage yet?

Let me know if you do download it and give it a try. Hit me up and we can send secret messages to each other!


    title = {A review of and rationale for Bitmessage},
    journaltitle = {The Grey Literature},
    author = {Benjamin Gregory Carlisle},
    address = {Montreal, Canada},
    date = 2013-08-13,
    url = {}


Carlisle, Benjamin Gregory. "A review of and rationale for Bitmessage" Web blog post. The Grey Literature. 13 Aug 2013. Web. 22 Feb 2018. <>


Carlisle, Benjamin Gregory. (2013, Aug 13). A review of and rationale for Bitmessage [Web log post]. Retrieved from

My default typed greeting changed since I became a Dvorak typist


In 2007 I switched from QWERTY to Dvorak. I’m agnostic about which one is more efficient. I like Dvorak because I feel like I move my hands less. Also it’s great for the entertainment value (“What is wrong with your keyboard?”), and because it makes it that much harder for someone looking over my shoulder to guess my passwords.

Before I switched, my default greeting when chatting with someone was “hi.” It was two letters and it could be typed with one hand, which made it faster than “hello,” which required hand-switching.

Since I converted to Dvorak, I’ve noticed that my default typed greeting has become “yo,” and I think it’s for the same reason—in Dvorak, you can type “yo” with one hand, but you can’t type “hi” with one hand.
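
The claim is easy to check mechanically. Assuming the standard touch-typing split of letters between hands (individual typists vary), a few lines of Python confirm which greetings are one-handed on each layout:

```python
# Letters assigned to the left hand under standard touch typing.
QWERTY_LEFT = set("qwertasdfgzxcvb")
DVORAK_LEFT = set("pyaoeuiqjkx")

def one_handed(word, left_hand):
    letters = set(word)
    # One-handed if every letter is on the left, or none are.
    return letters <= left_hand or letters.isdisjoint(left_hand)

assert one_handed("hi", QWERTY_LEFT)         # right hand only on QWERTY
assert not one_handed("hello", QWERTY_LEFT)  # "hello" needs both hands
assert one_handed("yo", DVORAK_LEFT)         # left hand only on Dvorak
assert not one_handed("hi", DVORAK_LEFT)     # "h" is right, "i" is left
```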

Just weird is all.


    title = {My default typed greeting changed since I became a Dvorak typist},
    journaltitle = {The Grey Literature},
    author = {Benjamin Gregory Carlisle},
    address = {Montreal, Canada},
    date = 2013-05-26,
    url = {}


Carlisle, Benjamin Gregory. "My default typed greeting changed since I became a Dvorak typist" Web blog post. The Grey Literature. 26 May 2013. Web. 22 Feb 2018. <>


Carlisle, Benjamin Gregory. (2013, May 26). My default typed greeting changed since I became a Dvorak typist [Web log post]. Retrieved from

How to automatically back up WordPress or ownCloud using cron jobs


Recently I set up WordPress for my research group in the Medical Ethics Unit. We will be blogging our journal clubs and posting links to our publications and upcoming events. In related news, my research group has been using Dropbox to coordinate papers in progress, sharing of raw data, citations, and all manner of other information. This was working pretty well, but we have been bumping up against the upper limit of our capacity on Dropbox for a while, so I installed ownCloud on the web host we got for the research group blog. I’m pretty happy with how nice it is to use and administer.

Of course one of our concerns is making sure that we don’t lose any data in the case of the failure of our web host. This is unlikely, but it does happen, and we don’t want to run into a situation where we try to log in to our cloud-based file storage / sharing service and find that months’ worth of research is gone forever.

For a few weeks, the following was more-or-less my workflow for making backups:

  1. Log in to phpMyAdmin
  2. Make a dump file of the WP database (choose database > Export > Save as file … )
  3. Make a dump file of the ownCloud database
  4. Save to computer and label with appropriate date
  5. Log in to web server using FTP
  6. Copy contents of WP’s /wp-content/ to a date-labelled folder on my computer
  7. Copy contents of ownCloud’s /data/ to a date-labelled folder on my computer

This worked pretty well, except that it was a pain for me to have to do this every day, and I know that if I ever forgot to do it, that would be when something terrible happened. Fortunately for me, my boss mentioned that he had an old but still serviceable iMac sitting in his office that he wanted to put to some good purpose.

I decided to make a fully automatic setup that would make backups of our remotely hosted data and save it locally without any input on my part, so I can just forget about it. I made it with cron jobs.

Server side cron jobs

First, I set up some cron jobs on the server side. The first one waits until midnight every day, then dumps all the MySQL databases into a gzipped file on my web host, then zips up the WordPress /wp-content/ and ownCloud /data/ folders and puts them in the backup folder as well. The second server-side cron job empties the backup folder every day at 23h00.

  • 0 0 * * * PREFIX=`date +%y-%m-%d`; mysqldump -u USERNAME -h HOSTNAME -pPASSWORD --all-databases | gzip > /path/to/backup/folder/${PREFIX}-DBNAME-db.sql.gz; zip -r /path/to/backup/folder/${PREFIX} /path/to/wordpress/wp-content/; zip -r /path/to/backup/folder/${PREFIX} /path/to/owncloud/data/;
  • 0 23 * * * rm -r /path/to/backup/folder/*

A few notes for someone trying to copy this set-up

  • Your web host might be in a different time zone, so you might need to keep that in mind when coordinating cron jobs on your web host with ones on a local machine.
  • My web host provided a cron job editor that automatically escapes special characters like %, but you might have to add back-slashes to make yours work if you’re manually editing with crontab -e.
  • You might want to put a .htaccess file in your backup directory with the following in it: “Options -Indexes” (remove the quotes of course). This stops other people from going to your backup directory in a browser and helping themselves to your files. You could also name your backup directory with a random hash of letters and numbers if you wanted to make it difficult for people to steal your backed-up data.

Local cron job

Then on the local machine, the old iMac, I set up the following cron job. It downloads the files and saves them to a folder on an external hard disc every day at 6h00.

  • 0 6 * * * PREFIX=`date +%y-%m-%d`; curl${PREFIX}-DBNAME-db.sql.gz > "/Volumes/External HD/Back-ups/${PREFIX}-DBNAME-db.sql.gz"; curl${PREFIX} > "/Volumes/External HD/Back-ups/${PREFIX}"; curl${PREFIX} > "/Volumes/External HD/Back-ups/${PREFIX}";

If you were super-paranoid about losing data, you could install this on multiple local machines, or change the timing so that the cron jobs run twice a day, or as often as you liked, really. As long as the machines are always turned on, connected to the internet and have access to the folder where the backups will go, they should work fine.


This isn’t a super-secure way to back up your files, but then we’re more worried about losing data accidentally than having it stolen maliciously. I don’t think the world of medical ethics is cut-throat enough that our academic rivals would stoop to stealing our data in an effort to scoop our papers before we can publish them. That said, I’m not about to give away the exact URL where our backups are stored, either.

The practical upshot of all this is that now we have at least three copies of any file we’re working on. There’s one on the computer being used to edit the document, there’s one stored remotely on our web host, and there’s a copy of all our files backed up once a day on the old iMac at the Medical Ethics Unit.


    title = {How to automatically back up WordPress or ownCloud using cron jobs},
    journaltitle = {The Grey Literature},
    author = {Benjamin Gregory Carlisle},
    address = {Montreal, Canada},
    date = 2013-05-20,
    url = {}


Carlisle, Benjamin Gregory. "How to automatically back up WordPress or ownCloud using cron jobs" Web blog post. The Grey Literature. 20 May 2013. Web. 22 Feb 2018. <>


Carlisle, Benjamin Gregory. (2013, May 20). How to automatically back up WordPress or ownCloud using cron jobs [Web log post]. Retrieved from

Internet vigilante justice against the police in Montréal through social media


I hate Instagram too, but arresting someone for using it is ridiculous

It’s hard to trust the police in Montréal these days. “Officer 728” is a household name, known for her abuse of power, which was caught on video. There was also a famous CCTV video of a prostrate man being brutally kicked repeatedly by the Montréal police. This problem isn’t restricted to Montréal either. Recently a police officer in Vancouver was caught on video punching a cyclist in the face while putting him in handcuffs.

Technology and the abuse of police power

I used to largely dismiss reports of police abuses of power. When I saw graffiti saying, “eff the police” or something to that effect, I used to chalk it up to conspiracy theorists and delinquent youths. Now that it’s all on Youtube, it’s harder to ignore the problem.

(I also used to dismiss those who spray-painted “burn the banks” in a number of parts of Montréal as conspiracy theorists, but since 2008, I can kind of see where they’re coming from.)

We’re entering into an age when abuses of power by police are being caught on tape more and more often. I don’t think that police abusing their power is a new thing, or even that the rates have changed recently. I’m of the position that it might just be more visible because of the recent development that nearly everyone is carrying around a camera in their pocket that can instantly upload video of police brutality to Youtube. The Google Glass project (and the clones that are sure to follow) may make this even more common.

This is unsettling to me, partly because it might mean that a lot of the times I dismissed claims of police abuse, I was in the wrong.

We should all be legitimately outraged by this

More importantly though, this should make us all angry because this is not how justice works in Canada. Even if the robbery suspect was completely guilty of every crime the police suspected, we don’t allow individual police officers to dole out their own personal vengeance in the form of physical beatings. We certainly don’t allow groups of police officers to do so against suspected criminals as they lie helpless in the snow, and most emphatically, there is no place in Canadian justice for criminals to be punished in this way (or any other) without due process or without even having been formally charged with a crime.

A police officer punching a restrained person is much worse than a regular citizen punching another citizen. This is because the police are, so to speak, the final guarantee that the government has power over its citizens and that there is the rule of law in a country. The most basic reason for others not to steal your stuff is that if they do, there’s a good chance that the police will come and take away their freedom in such a way that it’s not worth it for most people to engage in that behaviour. All laws largely work on the same principle. Sure, there are other sanctions that a government can use, like taxation, but even that is underwritten by the threat of police coming and putting you in prison if you break the tax laws.

So, when a police officer physically abuses a citizen, he shakes our faith in the proper functioning of the machinery of government. This makes the issue not just one of bad PR for a particular police department, but one of general faith in our country to work in a just and equitable way. Further, if the police are vigilantes and there is no recourse, it legitimizes vigilante justice by the people against the police.

This means that when a police officer abuses his power, there must be some recourse that is transparent, timely and just. There can’t even be the appearance that the police are above the law, otherwise what you will see is ordinary citizens taking the law into their own hands to bring the police to justice, which is a very scary prospect.

Ordinary citizens are taking the law into their own hands to bring the police to justice

In response to the issues I have described above, as well as a number of much less famous examples of abuse of police power during the protests in Montréal, there has been a movement toward the use of social media to identify the police who are abusing their power. This is being done by citizens who believe that there has been abuse of power by police in Montréal, and that the normal channels of addressing these abuses have been of no avail.

They are collecting photos, videos, identification numbers, names and addresses of police officers, cataloguing their transgressions and calling for retribution.

The police are calling this “intimidation.” They are calling for it to be taken down. They’re (rightly) complaining that there is no way for a police officer who is wrongly accused in this way to clear his name, and that the police, and even some non-police are being put in danger because of this.

What needs to happen

I have not been involved in the student protests in Montréal. I have never been beaten by the police. I generally believe that if I call 911, it will be the “good guys” who show up at my door. That said, I can understand why someone who was abused by a police officer might be tempted to post this information out of frustration at the ineffectiveness of the official recourse against such abuse.

In some ways, the police have been implicitly training us to use these methods if we want anything to get done: Likely the police officer from Vancouver would have gotten away with punching the cyclist in the face if the cyclist’s friend hadn’t caught it on video and posted it to Youtube.

If the police want us to use official channels to address police abuses, they have to give us reason to think that it’s better to do that than to just appeal to the Internet for justice. Politically-motivated arrests of people for posting “intimidating” things online won’t cut it.

I think we will only see a real change in public attitudes toward police brutality given the following three conditions.

  1. The official channels must be transparent. It must be clear to everyone that something is being done, and we have to see that officers who abuse their power are appropriately punished. Confidence in the relationship between the state and its citizens is what’s at stake, and so the solution must be one that publicly restores confidence.
  2. Official channels must be timely. The old adage, “justice delayed is justice denied” applies here. If citizens get impatient waiting for justice to be applied, they may be tempted to take it into their own hands.
  3. Finally, official recourse against police abuse must be just. This is where an official means of recourse against police brutality could actually outdo Internet vigilantes. Internet vigilante justice will always be faster and more transparent than anything official could ever be, but an official channel can enforce punishments fitting to the crime, and can claim legitimacy in a way that vigilantes never can.

If a police officer publicly faced criminal charges, rather than just a “paid leave of absence” followed by “counselling” and this happened in short order after an accusation of abuse, this would do a lot to restore faith in official channels. The people of Montréal might even learn that the legitimate checks and balances are preferable to pursuing vigilante justice through social media.


    title = {Internet vigilante justice against the police in Montréal through social media},
    journaltitle = {The Grey Literature},
    author = {Benjamin Gregory Carlisle},
    address = {Montreal, Canada},
    date = 2013-04-06,
    url = {}


Carlisle, Benjamin Gregory. "Internet vigilante justice against the police in Montréal through social media" Web blog post. The Grey Literature. 06 Apr 2013. Web. 22 Feb 2018. <>


Carlisle, Benjamin Gregory. (2013, Apr 06). Internet vigilante justice against the police in Montréal through social media [Web log post]. Retrieved from



All content © Benjamin Carlisle