Clinical agnosticism and when trials say “maybe”—a presentation for #SummerSchool hosted by scholar.social

On 2020 August 4, I gave a presentation on Clinical Agnosticism as a part of #SummerSchool, a free, online, interdisciplinary academic conference hosted by scholar.social.

You can download the slides from my presentation here. I transcribed my presentation in the Notes for each slide (Click View > Notes), if you want to know what I said, too!

If you want more information on this subject, this research was based on my doctoral thesis, The Moral Efficiency of Clinical Trials in Anti-cancer Drug Development. Chapter 5 will be of particular relevance.

The risks and harms of 3rd party tech platforms in academia

CW: some strong language, description of abuse

Apologies for the rambly nature of this post. I wrote it in airports, partly out of frustration, and I may come back and make it more readable later.

In this post, I’m going to highlight some of the problems that come along with using 3rd party tech companies’ platforms on an institutional level in academia. Tech companies have agendas that are not always compatible with academia, and we have mostly ignored that. Briefly, the core problem with using these technologies, and entrenching them in academic life, is that it is an abdication of certain kinds of responsibility. We are giving up control over many of the structures that are necessary for participation in academic work and life, and the people we’re handing the keys to are often hostile to certain members of the academic community, in ways that are often difficult to see.

I have included a short “too long; didn’t read” at the end of each section, and some potential alternatives.

Using a tech company’s services is risky

There’s an old saying: “There’s no such thing as the cloud; it’s just someone else’s computer.” And it’s true, with all the risks that come with using someone else’s computer. The usual response to this is something along the lines of “I don’t care, I have nothing to hide.” But even if that’s true, privacy isn’t the only reason someone might have for avoiding the use of 3rd party tech companies’ services.

For starters, sometimes tech companies fail on a major scale that could endanger entire projects. Do you remember in 2017 when a bug in Google Docs locked thousands of people out of their own files because they were flagged as a violation of the terms of use?

https://twitter.com/widdowquinn/status/925360317743460352

Or more recently, here’s an example of a guy who got his entire company banned by Google by accident, proving that you can lose everything because of someone else’s actions:

TIFU by getting google to ban our entire company while on the toilet

And of course, this gets worse for members of certain kinds of minorities. Google and Facebook, for example, both have real-names policies, which are hostile to trans people and to Indigenous North Americans:

https://boingboing.net/2015/02/14/facebook-tells-native-american.html

There are other risks beyond just data loss. For example, if your research involves confidential data, putting it on a 3rd party server where others can access it may overstep the consent of your research subjects, and potentially violate the terms under which your institutional review board approved your study. This may also be the case for web apps that include Google Analytics.

tl;dr—If your academic work depends on a 3rd party tech company’s services, you risk: losing your work at a critical time for reasons that have nothing to do with your own conduct; violating research subject consent; and you may be excluding certain kinds of minorities.

Alternatives—In this section, I have mostly focused on data sharing risks. You can avoid using Google Docs and Dropbox by sharing files on a local computer through Syncthing, or by installing an encrypted Nextcloud on a server.

Tech companies’ agendas are often designed to encourage abuse against certain minorities

I have touched on this already a bit, but it deserves its own section. Tech companies have agendas and biases that do not affect everyone equally. For emphasis: technology is not neutral. It is always a product of the people who built it.

For example, I have been on Twitter since 2011. I have even written Twitter bots. I have been actively tweeting for most of that time, both personally and about my research. And because I am a queer academic, I have been the target of homophobic trolls nearly constantly.

I have received direct messages and public replies to my tweets in which I was told to kill myself, called a “fag,” and in which a user told me he hopes I get AIDS. Twitter also closed my account for a short period of time because someone reported me for using a “slur”—you see, I used the word “queer.” To describe myself. And for this, there was a short period of time in which I was locked out, and it took some negotiation with Twitter support, and the deletion of some of my tweets to get back on.

I was off Twitter for a number of months because of this and out of a reluctance to continue to provide free content to a website that’s run by a guy who periodically retweets content that is sympathetic to white supremacists:

Twitter CEO slammed for retweeting man who is pro-racial profiling

And this isn’t something that’s incidental to Twitter / Facebook that could be fixed. It is a part of their core business model, which is about maximising engagement. And the main way they do that is by keeping people angry and yelling at each other. These platforms exist to encourage abuse, and they are run by people who will never have to endure it. That’s their meal-ticket, so to speak. And most of that is directed at women, members of racial minorities and queer people.

I have been told that if I kept my Twitter account “professional” and avoided disclosing my sexuality, I wouldn’t have problems with abuse. I think the trolls would find me again if I did open a new account, but even if it were the case that I could go back into the closet, at least for professional purposes, there are four reasons why I wouldn’t want to:

  • My experience as a queer academic medical ethicist gives me a perspective that is relevant. I can see things that straight people miss, and I have standing to speak about those issues because of my personal experiences.
  • Younger queer people in academia shouldn’t have to wonder if they’re the only one in their discipline.
  • As a good friend of mine recently noted, it’s unfair to make me hide who I am, while the straight men all have “professor, father and husband” or the like in their Twitter bios.
  • I shouldn’t have to carefully avoid any mention of my boyfriend or my identity in order to participate in academic discussions, on pain of receiving a barrage of abuse from online trolls.

I’m not saying that everyone who uses Twitter or Facebook is bad. But I am extremely uncomfortable about the institutional use of platforms like Google/Facebook/Twitter for academic communications. When universities, journals, academic departments, etc. use them, they are telling us all that this kind of abuse is the price of entry into academic discussions.

tl;dr—Using 3rd-party tech company platforms for academic communications, etc. excludes certain people or puts them in the way of harm, and this disproportionately affects women, members of racial minorities and queer people.

Alternatives—In this section, I have mostly focused on academic communications. For micro-blogging, there is Mastodon, for example (there are even instances for science communication and for academics generally). If you are an institution like an academic journal, a working RSS feed (or several, depending on your volume of publications) is better than a lively Twitter account.

Tech companies are not transparent in their decisions, which often cannot be appealed

Some of the problems with using 3rd party tech company platforms go beyond just the inherent risks in using someone else’s computer, or abuse by other users—in many cases, the use of their services is subject to the whims of their support personnel, who may make poor decisions out of carelessness, a discriminatory policy, or for entirely inscrutable or undisclosed reasons. And because these are private companies, there may be nothing that compels them to explain themselves, and no way to appeal such a decision, leaving anyone caught in a situation like this unable to participate in some aspect of academic life.

For example, in the late 00’s, I tried to make a purchase with Paypal and received an error message. I hadn’t used my account for years, and I thought it was just that my credit card needed to be updated. On visiting the Paypal website, I found that my account had been closed permanently. I assumed this was a mistake that could be resolved, so I contacted Paypal support. They informed me that I had somehow violated their terms of use, and that this decision could not be appealed under any circumstance. The best explanation for this situation that I could ever get from them was, to paraphrase, “You know what you did.”

This was baffling to me, as I hadn’t used Paypal in years and I had no idea what I could have possibly done. I tried making a new account with a new email address. When I connected my financial details to this account, it was also automatically closed. I’ve tried to make a new account a few times since, but never with success. As far as I can tell, there is no way for me to ever have a Paypal account again.

And that wasn’t a problem for me until a few months ago when I tried to register for some optional sessions at an academic conference that my department nominated me to attend. In order to confirm my place, I needed to pay a deposit, and the organizers only provided Paypal (not cash or credit card) as a payment option.

And this sort of thing is not unique to my situation either. Paypal has a long, terrible and well-documented history of arbitrarily closing accounts (and appropriating any money involved). This is usually in connexion with Paypal’s bizarre and sometimes contradictory policies around charities, but this also affects people involved in sex work (reminder: being a sex worker is perfectly legal in Canada).

Everything worked out for me in my particular situation at this conference, but it took work. After several emails, I was eventually able to convince them to make an exception and allow me to pay by cash on arrival, but I still had to go through the process of explaining to them why I have no Paypal account, why I could try making a new one, but it wouldn’t work, and that I wasn’t just being a technophobe or difficult to work with on purpose. I was tempted to just opt out of the sessions because I didn’t want to go through the embarrassment of explaining my situation.

And my problem with Paypal was a “respectable” one—it’s just some weird mistake that I’ve never been able to resolve with Paypal. Now imagine trying to navigate a barrier to academic participation like that if you were a person whose Paypal account was closed because you got caught using it for sex work. Do you think you’d even try to explain that to a conference organizer? Or would you just sit those sessions out?

tl;dr—When you use services provided by tech companies, you may be putting up barriers to entry for others that you are unaware of.

Alternatives—This section was about money, and there aren’t that many good solutions. Accept cash. And when someone asks for special accommodation, don’t ask them to justify it.

Conclusion

Technology isn’t neutral. It’s built by people, who have their own biases, agendas and blind-spots. If we really value academic freedom, and we want to encourage diversity in academic thought, we need to be very critical about the technology that we adopt at the institutional level.

How to get any medical journal into your RSS reader even if they don’t provide an RSS feed

What is RSS?

For the non-initiate, RSS is a very useful protocol that is used all over the web. You can think of it as a way of separating a stream of content from the website where it’s normally viewed. Nearly every blog has an RSS feed, as do news sources, web comics, and even academic journals. Podcasts are like a specialised version of RSS for audio files only.

What makes RSS great is that I can take all the RSS links from all the news sites, blogs, webcomics and journals that I’m interested in and put them together into a single aggregator. In this way, I don’t have to be constantly checking all these websites to see if there’s new stuff posted.

But what about medical journals that don’t provide RSS?

Unfortunately, there are some medical journals that do not have an RSS feed. For example, the JNCI (the Journal of the National Cancer Institute) does not have one. (If I’m wrong, please put the link in the comments.) So if I want to know what’s been published recently in the JNCI, I have to visit their site, or look on their Twitter. This is annoying, since the whole point of RSS is to have all the content you want to consume (or as much of it as possible) in the same place.

Pubmed allows users to save any search as an RSS feed

Pubmed provides a wonderful and open, standards-compliant service, but almost no one seems to know about it! This is great for people who are actively researching a subject, and also for people who just want to keep up with a particular journal or subject area.

Some of you have probably figured out where I’m going with this by now, but if you haven’t, I’ll spell it out. Let’s continue with the example of JNCI.

How to put new articles from any journal into Feedly

This assumes you already have an account on Feedly, but you can do this with any RSS reader, of course.

  1. Visit Pubmed in your browser
  2. Click “Advanced” under the search field
  3. Under “Builder,” click “All fields” and choose “Journal”
  4. In the text field beside the box where you selected “Journal,” enter the name of the journal you’re interested in. It will autocomplete. If you have done this correctly, you should see something like “Journal of the National Cancer Institute”[Journal] in the uneditable text field at the top.
  5. Click “Search”
  6. Under the search field at the top of the page, click the “Create RSS” link
  7. Choose how far back you want your search to go (I chose 20)
  8. Click the “Create RSS” button
  9. Right-click the orange “XML” button and click “Copy link”
  10. Go to Feedly, and paste the link into the “Search” field at the top right
  11. There should be one result, click “Follow” and choose which collection you want to keep it in

You’re done! Now whenever Pubmed indexes a new entry for that journal, it will appear in your RSS reader!

You can also make RSS feeds for any search you want on Pubmed

Of course, you may not be interested in everything a journal has to say, so you can refine the search to only include “breast cancer” or you can drop the journal identity part of the search entirely. The world is your oyster!
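The search term that the Advanced builder produces can also be assembled by hand. The following is a minimal R sketch, assuming that Pubmed’s search page at https://pubmed.ncbi.nlm.nih.gov/ accepts the term in a “term” query parameter. The function names pubmed_term and pubmed_search_url are made up for illustration. It only builds a link to the search results; the RSS link itself still has to be created with the “Create RSS” button as described above.

```r
# Build a Pubmed search term for a journal, optionally narrowed to a topic.
# pubmed_term() and pubmed_search_url() are illustrative names, not part of Pubmed.
pubmed_term <- function(journal, topic = NULL) {
  term <- paste0('"', journal, '"[Journal]')
  if (!is.null(topic)) {
    term <- paste0(term, ' AND "', topic, '"')
  }
  term
}

# Assumption: the search page takes the term in a "term" query parameter.
pubmed_search_url <- function(term) {
  paste0(
    "https://pubmed.ncbi.nlm.nih.gov/?term=",
    utils::URLencode(term, reserved = TRUE)
  )
}

term <- pubmed_term("Journal of the National Cancer Institute", "breast cancer")
print(term)
print(pubmed_search_url(term))
```

Dropping the topic argument gives the journal-only search from the steps above.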

How to get R to parse the <study_design> field from clinicaltrials.gov XML files

Clinicaltrials.gov helpfully provides a facility for downloading machine-readable XML files of its data. Here’s an example of a zipped file of 10 clinicaltrials.gov XML files.

Unfortunately, a big zipped folder of XML files is not that helpful. Even after parsing a whole bunch of trials into a single data frame in R, there are a few fields that are written in the least useful format ever. For example, the <study_design> field usually looks something like this:

Allocation: Non-Randomized, Endpoint Classification: Safety Study, Intervention Model: Single Group Assignment, Masking: Open Label, Primary Purpose: Treatment

So, I wrote a little R script to help us all out. Do a search on clinicaltrials.gov, then save the unzipped search result in a new directory called search_result/ in your ~/Downloads/ folder. The following script will parse through each XML file in that directory, putting each one in a new data frame called “trials”, then it will explode the <study_design> field into individual columns.

So for example, based on the example field above, it would create new columns called “Allocation”, “Endpoint_Classification”, “Intervention_Model”, “Masking”, and “Primary_Purpose”, populated with the corresponding data.

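library(XML)   # xmlTreeParse, xmlRoot, xmlSApply, xmlValue
library(plyr)  # rbind.fill

# Change path as necessary
path <- "~/Downloads/search_result/"

setwd(path)
# The pattern is a regular expression: match only names ending in ".xml"
xml_file_names <- dir(path, pattern = "\\.xml$")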

counter <- 1

# Makes data frame by looping through every XML file in the specified directory
for ( xml_file_name in xml_file_names ) {
  
  xmlfile <- xmlTreeParse(xml_file_name)
  
  xmltop <- xmlRoot(xmlfile)
  
  data <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
  
  if ( counter == 1 ) {
    
    trials <- data.frame(t(data), row.names = NULL)
    
  } else {
    
    newrow <- data.frame(t(data), row.names = NULL)
    trials <- rbind.fill (trials, newrow)
    
  }
  
  # This will be good for very large sets of XML files
  
  print (
    paste0(
      xml_file_name,
      " processed (",
      format(100 * counter / length(xml_file_names), digits = 2),
      "% complete)"
    )
  )
  
  counter <- counter + 1
  
}

# Data frame has been constructed. Comment out the following two loops
# (until the "un-cluttering" part) in the case that you are not interested
# in exploding the <study_design> column.

# Collects every key that appears in any <study_design> field
columns <- character(0)

for ( stu_des in trials$study_design ) {
  # splits by commas NOT in parentheses
  for (pair in strsplit( stu_des, ", *(?![^()]*\\))", perl=TRUE)) {
    newcol <- substr( pair, 0, regexpr(':', pair) - 1 )
    columns <- c(columns, newcol)
  }
}

for ( newcol in unique(columns) ) {
  
  # get rid of spaces and special characters
  newcol <- gsub('([[:punct:]])|\\s+','_', newcol)
  
  if (newcol != "") {
    
    # add the new column
    trials[,newcol] <- NA
    
    i <- 1
    
    for ( stu_des2 in trials$study_design ) {
      
      for (pairs in strsplit( stu_des2, ", *(?![^()]*\\))", perl=TRUE)) {
        
        for (pair in pairs) {
          
          if ( gsub('([[:punct:]])|\\s+','_', substr( pair, 0, regexpr(':', pair) - 1 )) == newcol ) {
            
            # Address the column by name rather than assuming it is the last one
            trials[i, newcol] <- substr( pair, regexpr(':', pair) + 2, nchar(pair) )
            
          }
          
        }
        
      }
      
      i <- i+1
      
    }
    
  }
  
}

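# Un-clutter the working environment
# (only removes the temporary variables that actually exist, so no warnings
# are raised if, for example, only one XML file was processed)

temporary_variables <- c(
  "i", "counter", "data", "newcol", "newrow", "columns", "pair", "pairs",
  "stu_des", "stu_des2", "xml_file_name", "xml_file_names", "xmlfile", "xmltop"
)
remove (list = intersect(temporary_variables, ls()))
remove (temporary_variables)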

# Get nice NCT id's

get_nct_id <- function ( row_id_info ) {
  
  return (unlist (row_id_info) ["nct_id"])
  
}

trials$nct_id <- lapply(trials$id_info, function(x) get_nct_id (x))

# Clean up enrolment field

trials$enrollment[trials$enrollment == "NULL"] <- NA

trials$enrollment <- as.numeric(trials$enrollment)
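The explode step in the script above can be tried in isolation, without downloading any XML. This is a minimal sketch that applies the same comma-splitting regular expression to the single example field quoted earlier. The helper name explode_study_design is made up for illustration and is not part of the script.

```r
# The example <study_design> value quoted in the text
study_design <- paste(
  "Allocation: Non-Randomized",
  "Endpoint Classification: Safety Study",
  "Intervention Model: Single Group Assignment",
  "Masking: Open Label",
  "Primary Purpose: Treatment",
  sep = ", "
)

# explode_study_design() is an illustrative helper, not part of the script above
explode_study_design <- function(x) {
  # Split on commas that are NOT inside parentheses (same regex as the script)
  pairs <- strsplit(x, ", *(?![^()]*\\))", perl = TRUE)[[1]]
  # Everything before the first colon is the key, everything after is the value
  keys <- sub(":.*$", "", pairs)
  values <- sub("^[^:]*: *", "", pairs)
  # Replace spaces and punctuation in the keys so they make tidy column names
  names(values) <- gsub("[[:punct:]]|\\s+", "_", keys)
  values
}

print(explode_study_design(study_design))
```

The result is a named character vector, one element per key, which is the shape that the script spreads across new columns of the data frame.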

Useful references:

  • https://www.r-bloggers.com/r-and-the-web-for-beginners-part-ii-xml-in-r/
  • http://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns
  • http://stackoverflow.com/questions/21105360/regex-find-comma-not-inside-quotes

It’s Movember! Review your knowledge of Bayes’ theorem before getting your PSA test.

Background info

There are 3 million people in the US currently living with prostate cancer. There are approximately 320 million people in the US today, roughly half of whom have prostates. Hence, let us take the prevalence of prostate cancer among those who have prostates to be approximately 3 in 160, or just under 2%.

The false positive (type I error) rate is reported at 33% for PSA velocity screening, or as high as 75%. The false negative (type II error) rate is reported as between 10% and 20%. For the purpose of this analysis, let’s give the PSA test the benefit of the doubt, and attribute to it the lowest type I and type II error rates, namely 33% and 10%.

Skill testing question

If some random person with a prostate from the United States, where the prevalence of prostate cancer is 2%, receives a positive PSA test result, where that test has a false positive rate of 33% and a false negative rate of 10%, what is the chance that this person actually has prostate cancer?

Bayes’ theorem

Recall Bayes’ theorem from your undergraduate Philosophy of Science class. Let us define the hypothesis we’re interested in testing and the evidence we are considering as follows:

P(h): The prior probability that this person has cancer
P(e|¬h): The false positive (type I error) rate
P(¬e|h): The false negative (type II error) rate

P(h) = 3/160
P(e|¬h) = 0.33
P(¬e|h) = 0.10

Given these definitions, the quantity we are interested in calculating is P(h|e), the probability that the person has prostate cancer, given a positive PSA test result. We can calculate this value using the following formulation of Bayes’ theorem:

P(h|e) = P(h) / [ P(h) + ( P(e|¬h) P(¬h) ) / ( P(e|h) ) ]

From the above probabilities and the laws of probability, we can derive the following missing quantities.

P(¬h) = 1 – 3/160
P(e|h) = 0.90

These can be inserted into the formula above. The answer to the skill-testing question is that there is a 4.95% chance that the randomly selected person in question will have prostate cancer, given a positive PSA test result.

What if we know more about the person in question?

Let’s imagine that the person is not selected at random. Say that this person is a man with a prostate and he is over 60 years old.

According to Zlotta et al, the prevalence of prostate cancer rises to over 40% in men over age 60. If we redo the above calculation with this base rate, P(h) = 0.40, we find that P(h|e) rises to 64.5%.
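Both figures above can be checked with a few lines of R. This is a minimal sketch of the formulation of Bayes’ theorem given earlier; the function name posterior_given_positive and its argument names are made up for illustration.

```r
# Posterior probability of cancer given a positive test, using
# P(h|e) = P(h) / [ P(h) + ( P(e|not h) P(not h) ) / P(e|h) ]
# posterior_given_positive() is an illustrative name.
posterior_given_positive <- function(prior, false_positive, false_negative) {
  p_e_given_h <- 1 - false_negative  # P(e|h), the true positive rate
  p_not_h <- 1 - prior               # P(not h)
  prior / (prior + (false_positive * p_not_h) / p_e_given_h)
}

# Random person with a prostate: P(h) = 3/160
general <- posterior_given_positive(3 / 160, 0.33, 0.10)

# Man over 60: P(h) = 0.40
over_60 <- posterior_given_positive(0.40, 0.33, 0.10)

# Print both posteriors as percentages
print(round(100 * c(general = general, over_60 = over_60), 2))
```

Only the prior changes between the two calls, which is the point of the exercise: the same test result means something very different depending on the base rate.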

Take-home messages

  1. Humans are very bad at intuiting probabilities. See Wikipedia for recommended reading on the Base Rate Fallacy.
  2. Having a prostate is neither a necessary nor a sufficient condition for being a man. Just FYI.
  3. Don’t get tested for prostate cancer unless you’re in a higher-risk group, because the base rate of prostate cancer is so low in the general population that if you get a positive result, it’s likely to be a false positive.

An unexpected link between computer science and the ethics of consent in the acutely comatose

Yesterday, Dr Weijer from Western U came to the STREAM research group at McGill to give a talk on the ethics of fMRI studies on acutely comatose patients in the intensive care unit. One of the topics he briefly covered (not the main topic of his talk) was that of patients who may be “awake,” but generally unaware of their surroundings, while in an acutely comatose state of some kind. Using an fMRI, questions can be asked of some of these subjects, by telling them to imagine playing tennis for “yes,” and to imagine navigating their home for “no.” Since the areas of the brain for these two tasks are very different, these can be used to distinguish responses with some accuracy. In some rare cases, patients in this condition are able to consistently answer biographical questions, indicating that they are, in some sense, conscious.

One of the questions that arises is: Could we use this method to involve a comatose patient in decision-making regarding her own care, in cases where we were able to establish this sort of communication?

Informed consent in medical ethics is usually conceived in terms of: disclosure, capacity and voluntariness, and the most obvious question to arise in the types of cases we’re considering is whether or not you could ever know with certainty that a comatose person has the capacity to make such decisions in such a state. (Indeed, a comatose patient is often the example given of someone who does not have the capacity to consent.) Dr Weijer was generally sceptical on that front.

Partway through his discussion, I had the impression that the problem was strangely familiar. If we abstract away some of the details of the situation in question, we are left with an experimenter who is sending natural language queries into a black box system, which replies with a digital (0/1) output, and then the experimenter has to make the best evaluation she can as to whether the black box contains a person, or if it is just an “automatic” response of some kind.

Those of you with some background in computer science will recognise this as the Turing Test. Over the 65 years since it was first suggested, for one reason or another, most people have abandoned the Turing Test as a way to address the question of artificial intelligence, although it still holds a certain popular sway, as claims of chatbots that can beat the Turing Test still make the news. While many would deny that it is even an important question whether a chatbot can make you believe it is a person, at least in the fMRI/coma patient version, no one can dispute that there is something important at stake.

So I started learning Lojban .ui

This Friday past, I started learning Lojban. For the non-initiate, Lojban is a constructed language based on predicate logic that is syntactically unambiguous. I’d known about it for years, probably hearing about it first on CBC, maybe 10 years ago. It’s the sort of thing that shows up in Dinosaur Comics or in XKCD periodically. Up until this weekend, the existence of Lojban had mostly been one of those “cocktail party facts,” but then I finally took the plunge. After 1 weekend of working on it, I’m about 35% of the way through Lojban for Beginners, having downloaded it to my Kobo for reference during the car ride to Stratford.

It’s often billed as being an ideal language for fields like law, science or philosophy, due to its unambiguous and culturally neutral nature. So I set out to find out certain specialised terms from my field, bioethics, and it turns out that they mostly don’t exist yet. This, of course, offers some exciting opportunities for a grad student. :)

I’ve convinced a few people in Montréal to learn Lojban with me, and even found a Montrealer who speaks Lojban on an IRC channel. (Yes, IRC still exists!) We may “ckafi pinxe kansa,” as they say in Lojban, apparently.

If you too want to get in on the ground floor of Lojban Montréal, let me know!

Proof of prespecified endpoints in medical research with the bitcoin blockchain

NOTICE (2022-05-24)

This blog post was written in 2014, when I still naively hoped that the myriad problems with cryptocurrency might still be solved. I am now somewhat embarrassed to have written this in the first place, but will leave the post up for historical reasons. (Quite a number of medical journal articles link here now, for better or for worse.)

While the following methods are valid as far as they go, I absolutely DO NOT recommend actually using them to timestamp research protocols. In fact, I recommend that you never use a blockchain for anything, ever.

Introduction

The gerrymandering of endpoints or analytic strategies in medical research is a serious ethical issue. “Fishing expeditions” for statistically significant relationships among trial data or meta-analytic samples can confound proper inference by statistical multiplicity. This may undermine the validity of research findings, and even threaten a favourable balance of patient risk and benefit in certain clinical trials. “Changing the goalposts” for a clinical trial or a meta-analysis when a desired endpoint is not reached is another troubling example of a potential scientific fraud that is possible when endpoints are not specified in advance.

Pre-specifying endpoints

Choosing endpoints to be measured and analyses to be performed in advance of conducting a study is a hallmark of good research practice. However, if a protocol is published on an author’s own web site, it is trivial for an author to retroactively alter her own “pre-specified” goals to align with the objectives pursued in the final publication. Even a researcher who is acting in good faith may find it less than compelling to tell her readers that endpoints were pre-specified, with only her word as a guarantee.

Advising a researcher to publish her protocol in an independent venue such as a journal or a clinical trial registry in advance of conducting research does not solve this problem, and even creates some new ones. Publishing a methods paper is a lengthy and costly process with no guarantee of success—it may not be possible to find a journal interested in publishing your protocol.

Pre-specifying endpoints in a clinical trial registry may be feasible for clinical trials, but these registries are not open to meta-analytic projects. Further, clinical trial registry entries may be changed, and it is much more difficult (although still possible) to download previous versions of trial registries than it is to retrieve the current one. For example, there is still no way to automate downloading of XML-formatted historical trial data from www.clinicaltrials.gov in the same way that the current version of trial data can be automatically downloaded and processed. Burying clinical trial data in the “history” of a registry is not a difficult task.

Publishing analyses to be performed prior to executing the research itself potentially sets up a researcher to have her project “scooped” by a faster or better-funded rival research group who finds her question interesting.

Using the bitcoin blockchain to prove a document’s existence at a certain time

Bitcoin uses a distributed, permanent, timestamped, public ledger of all transactions (called a “blockchain”) to establish which addresses have been credited with how many bitcoins. The blockchain indirectly provides a method for establishing the existence of a document at a particular time that can be independently verified by any interested party, without relying on a medical researcher’s moral character or the authority (or longevity) of a central registry. Even in the case that the NIH’s servers were destroyed by a natural disaster, if there were any full bitcoin nodes left running in the world, the method described below could be used to confirm that a paper’s analytic method was established at the time the authors claim.

Method

  1. Prepare a document containing the protocol, including explicitly pre-specified endpoints and all prospectively planned analyses. I recommend using a non-proprietary document format (e.g. an unformatted text file or a LaTeX source file).
  2. Calculate the document’s SHA256 digest and convert it to a bitcoin private key.
  3. Import this private key into a bitcoin wallet, and send an arbitrary amount of bitcoin to its corresponding public address. After the transaction is complete, I recommend emptying the bitcoin from that address to another address that only you control, as anyone given the document prepared in (1) will have the ability to generate the private key and spend the funds you just sent to it.

Result

The incorporation into the blockchain of the first transaction using the address generated from the SHA256 digest of the document provides an undeniably timestamped record that the research protocol prepared in (1) is at least as old as the transaction in question. Care must be taken not to accidentally modify the protocol after this point, since only an exact copy of the original protocol will generate an identical SHA256 digest. Even the alteration of a single character will make the document fail an authentication test.

To prove a document’s existence at a certain point in time, a researcher need only provide the document in question. Any computer would be able to calculate its SHA256 digest and convert to a private key with its corresponding public address. Anyone can search for transactions on the blockchain that involve this address, and check the date when the transaction happened, proving that the document must have existed at least as early as that date.
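The hashing step, and the claim that changing a single character changes the digest, can be illustrated in R. This is a minimal sketch that assumes the digest package is installed; the protocol text is a placeholder, not a real protocol. It shows only the SHA256 calculation. Converting the digest to a bitcoin private key and making the transaction are not shown.

```r
# Assumption: the "digest" package is installed (install.packages("digest")).
library(digest)

# Write a stand-in protocol to a temporary file; the text is a placeholder.
protocol_path <- tempfile(fileext = ".txt")
writeLines("Primary endpoint: overall survival at 12 months", protocol_path)

# SHA256 digest of the file's contents, as a hexadecimal string
original_digest <- digest(protocol_path, algo = "sha256", file = TRUE)

# Change a single character ("12" becomes "13") and hash again
altered_path <- tempfile(fileext = ".txt")
writeLines("Primary endpoint: overall survival at 13 months", altered_path)
altered_digest <- digest(altered_path, algo = "sha256", file = TRUE)

print(original_digest)
print(altered_digest)

# TRUE only if the two files are byte-for-byte identical
print(original_digest == altered_digest)
```

Anyone holding an exact copy of the original file can recompute the same digest, which is what makes the check independently verifiable.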

Discussion

This strategy would prevent a researcher from retroactively changing an endpoint or adding / excluding analyses after seeing the results of her study. It is simple, economical, trustless, non-proprietary, independently verifiable, and provides no opportunity for other researchers to steal the methods or goals of a project before its completion.

Unfortunately, this method would not prevent a malicious team of researchers from preparing multiple such documents in advance, in anticipation of a need to defraud the medical research establishment. To be clear, under a system as described above, retroactively changing endpoints would no longer be a question of simply deleting a paragraph in a Word document or in a trial registry. This level of dishonesty would require planning in advance (in some cases months or years), detailed anticipation of multiple contingencies, and in many cases, the cooperation of multiple members of a research team. At that point, it would probably be easier to just fake the numbers than it would be to have a folder full of blockchain-timestamped protocols with different endpoints, ready in case the endpoints need to be changed.

Further, keeping a folder of blockchain-timestamped protocols would be a very risky pursuit—all it would take is a single honest researcher in the lab to find those protocols, and she would have a permanent, undeniable and independently verifiable proof of the scientific fraud.

Conclusion

Fraud in scientific methods erodes confidence in the medical research establishment, and that confidence is essential to its function of generating new scientific knowledge. Cases where pre-specified endpoints are retroactively changed cast doubt on the rest of medical research. A method by which anyone can verify the existence of a particular detailed protocol prior to research would lend support to the credibility of medical research, and be one less thing about which researchers have to say, “trust me.”

CBC’s “Dr C” and the problem of doctor-centred care

A CBC-hosted blog has been following the story of “Dr C.” CBC describes him as “a St. John’s physician training in internal medicine. He’s also a writer, and he’s documenting his life since being diagnosed with cancer.” His blog posts show up on the CBC Health Twitter account periodically, and they pass through my newsreader on a fairly regular basis.

For the last few months, I felt uncomfortable every time I saw one of his blog posts go by, and I couldn’t put my finger on why that might be. I think today I can finally articulate my misgivings.

A doctor’s privilege

I feel like the underlying assumption for CBC’s intense coverage, and the voice that “Dr C” has in expressing his experience with cancer is that when it’s a doctor who is diagnosed with cancer, he will have some interesting insights on the matter. In fact that’s the whole premiss of the “Dr C” blog. This makes me uncomfortable because in the modern medical system, a doctor’s voice is always the most important.

Fortunately, this is less the case than it used to be. It used to be that nurses were trained to stand up out of respect when a doctor entered a hospital room, for example. But even today in 2014, the opinion of a doctor is, in the last analysis, the only one that really matters in the healthcare system, and in a lot of ways, that shouldn’t be the case.

What is patient-centred care?

Before a conspiracy theorist mistakes what I’m writing about, I want to clarify that I’m not saying that an untrained quack should be given the same voice as a medical doctor on issues like vaccine safety, or the efficacy of “alternative” medical therapies. I’m not advocating for that at all. I’m fully on the side of medical science, and I have rather mainstream views on that matter, even though I work in the Medical Ethics Unit. (It turns out that the real evils of drug development and medical practice are rather mundane things, mostly done under the light of peer-reviewed scrutiny. Go figure.)

What I’m talking about is patient-centred healthcare, a concept that most medical professionals agree on, or at least pay lip-service to. It is a somewhat nebulous umbrella concept, and it is aspirational in nature—a healthcare worker can always try to be more patient-centred.

The idea itself is not controversial. Every healthcare worker would likely say that she wants to be patient-centred, and this includes things like catering her care toward the patient’s own idiosyncratic values, taking into account the patient’s strengths, and seeing the patient’s family as the unit of care, rather than maintaining the fiction that it is possible to treat a disease process in an individual without regard for the rest of the patient’s life.

So what’s the problem with Dr C’s blog?

I have nothing against “Dr C” from the CBC. I think it’s terrible that he (or anyone) has cancer, and I wish him the best in his treatment and recovery. I’m even glad that his blog has given him a place to work through his thoughts. I hope that he’s a more sympathetic physician as a result, and that his insights have helped other people to deal with their own cancer diagnoses.

That said, I feel like the way in which a doctor’s opinions are privileged in any discussion on healthcare is very troubling, and I can’t shake the feeling that this blog pushes it one step further. It’s as if they’re saying that privileging a doctor’s voice when he’s the one treating the cancer isn’t enough. We also have to get a doctor to tell us what it’s like to be a patient, because then it will be something worth listening to.

How to automatically back up WordPress or ownCloud using cron jobs

Recently I set up WordPress for my research group in the Medical Ethics Unit. We will be blogging our journal clubs, posting links to our publications and upcoming events. In related news, my research group has been using Dropbox to coordinate papers in progress, sharing of raw data, citations, and all manner of other information. This was working pretty well, but we have been bumping up against the upper limit of our capacity on Dropbox for a while, so I installed ownCloud on the web host we got for the research group blog. I’m pretty happy with how nice it is to use and administer.

Of course one of our concerns is making sure that we don’t lose any data in the case of the failure of our web host. This is unlikely, but it does happen, and we don’t want to run into a situation where we try to log in to our cloud-based file storage / sharing service and find that months’ worth of research is gone forever.

For a few weeks, the following was more-or-less my workflow for making backups:

  1. Log in to phpMyAdmin
  2. Make a dump file of the WP database (choose database > Export > Save as file … )
  3. Make a dump file of the ownCloud database
  4. Save to computer and label with appropriate date
  5. Log in to web server using FTP
  6. Copy contents of WP’s /wp-content/ to a date-labelled folder on my computer
  7. Copy contents of ownCloud’s /data/ to a date-labelled folder on my computer

This worked pretty well, except that it was a pain for me to have to do this every day, and I know that if I ever forgot to do it, that would be when something terrible happened. Fortunately for me, my boss mentioned that he had an old but still serviceable iMac sitting in his office that he wanted to put to some good purpose.

I decided to make a fully automatic setup that would make backups of our remotely hosted data and save it locally without any input on my part, so I can just forget about it. I made it with cron jobs.

Server side cron jobs

First, I set up some cron jobs on the server side. The first one waits until midnight every day, then dumps all the MySQL databases into a gzipped file on my web host, then zips up the WordPress /wp-content/ and ownCloud /data/ folders and puts them in the backup folder as well. The second server-side cron job empties the backup folder every day at 23h00.

  • 0 0 * * * PREFIX=`date +%y-%m-%d`; mysqldump -u USERNAME -h HOSTNAME -pPASSWORD --all-databases | gzip > /path/to/backup/folder/${PREFIX}-DBNAME-db.sql.gz; zip -r /path/to/backup/folder/${PREFIX}-wordpress-files.zip /path/to/wordpress/wp-content/; zip -r /path/to/backup/folder/${PREFIX}-owncloud-files.zip /path/to/owncloud/data/;
  • 0 23 * * * rm -r /path/to/backup/folder/*

A few notes for someone trying to copy this set-up

  • Your web host might be in a different time zone, so you might need to keep that in mind when coordinating cron jobs on your web host with ones on a local machine.
  • My web host provided a cron job editor that automatically escapes special characters like %, but you might have to add back-slashes to make yours work if you’re manually editing with crontab -e.
  • You might want to put a .htaccess file in your backup directory with the following in it: “Options -Indexes” (remove the quotes of course). This stops other people from going to your backup directory in a browser and helping themselves to your files. You could also name your backup directory with a random hash of letters and numbers if you wanted to make it difficult for people to steal your backed-up data.
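One way to sidestep the percent-sign escaping entirely is to move the commands out of the crontab and into a small script, so the crontab line itself contains no % characters. What follows is a sketch of that idea rather than what I actually run: the function names and the file name backup.sh are made up for illustration, and USERNAME, HOSTNAME, PASSWORD, DBNAME and the paths are the same placeholders as in the cron jobs above.

```shell
#!/bin/sh
# Hypothetical helper file, e.g. saved as /path/to/backup.sh

# Print today's date as YY-MM-DD, the prefix used in the backup file names.
backup_prefix() {
    date +%y-%m-%d
}

# Dump all databases and zip both data folders into the directory given as $1.
run_backup() {
    backup_dir=$1
    prefix=$(backup_prefix)
    mysqldump -u USERNAME -h HOSTNAME -pPASSWORD --all-databases \
        | gzip > "${backup_dir}/${prefix}-DBNAME-db.sql.gz"
    zip -r "${backup_dir}/${prefix}-wordpress-files.zip" /path/to/wordpress/wp-content/
    zip -r "${backup_dir}/${prefix}-owncloud-files.zip" /path/to/owncloud/data/
}
```

The matching crontab entry would then be something like: 0 0 * * * . /path/to/backup.sh; run_backup /path/to/backup/folder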

Local cron job

Then on the local machine, the old iMac, I set up the following cron job. It downloads the files and saves them to a folder on an external hard disc every day at 6h00.

  • 0 6 * * * PREFIX=`date +%y-%m-%d`; curl http://www.your-web-site.com/back-up/${PREFIX}-DBNAME-db.sql.gz > "/Volumes/External HD/Back-ups/${PREFIX}-DBNAME-db.sql.gz"; curl http://www.your-web-site.com/back-up/${PREFIX}-wordpress-files.zip > "/Volumes/External HD/Back-ups/${PREFIX}-wordpress-files.zip"; curl http://www.your-web-site.com/back-up/${PREFIX}-owncloud-files.zip > "/Volumes/External HD/Back-ups/${PREFIX}-owncloud-files.zip";
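The same download step can be written as a short script, which makes it easier to quote the path to the external disc (it contains a space) and to stop curl from saving a server error page as if it were a backup. This is a sketch, not my actual setup: the function names and the file name fetch.sh are hypothetical, and the URL and paths are the placeholders from the cron job above.

```shell
#!/bin/sh
# Hypothetical helper file, e.g. saved as /path/to/fetch.sh

# Print the three file names the server-side job creates for date prefix $1.
backup_file_names() {
    printf '%s\n' "$1-DBNAME-db.sql.gz" "$1-wordpress-files.zip" "$1-owncloud-files.zip"
}

# Download today's backups from base URL $1 into directory $2.
fetch_backups() {
    prefix=$(date +%y-%m-%d)
    backup_file_names "$prefix" | while IFS= read -r name; do
        # -f makes curl exit with an error on an HTTP failure instead of
        # saving the server's error page; -sS hides the progress meter
        # but still prints error messages.
        curl -fsS -o "$2/$name" "$1/$name"
    done
}
```

The crontab entry on the local machine would then be something like: 0 6 * * * . /path/to/fetch.sh; fetch_backups http://www.your-web-site.com/back-up "/Volumes/External HD/Back-ups"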

If you were super-paranoid about losing data, you could install this on multiple local machines, or you could change the timing so that the cron jobs run twice a day, or as often as you liked, really. As long as the machines are always turned on, connected to the internet and have access to the folder where the backups will go, they should work fine.

Stoop-n-scoop

This isn’t a super-secure way to back up your files, but then we’re more worried about losing data accidentally than having it stolen maliciously. I don’t think the world of medical ethics is cut-throat enough that our academic rivals would stoop to stealing our data in an effort to scoop our papers before we can publish them. That said, I’m not about to give away the exact URL where our backups are stored, either.

The practical upshot of all this is that now we have at least three copies of any file we’re working on. There’s one on the computer being used to edit the document, there’s one stored remotely on our web host, and there’s a copy of all our files backed up once a day on the old iMac at the Medical Ethics Unit.