The nuclear option for blocking Facebook and Google

I took the list of domains from the following pages:

  • https://qz.com/1234502/how-to-block-facebook-all-the-urls-you-need-to-block-to-actually-stop-using-facebook/
  • https://superuser.com/questions/1135339/cant-block-connections-to-google-via-hosts-file

And then I edited my computer’s /etc/hosts file to include the following lines.

This blocks my computer from contacting Google and Facebook, and now a lot of sites load way faster. It still allows YouTube, but you can un-comment those lines too, if you like.

Put it on your computer too!

(To view the big block of text that you need to copy into your /etc/hosts, click the following button. It’s hidden by default because it’s BIG.)
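
For a sense of what the block looks like: hosts-file entries are just an unroutable address followed by a domain name, one per line (whether the list maps domains to 0.0.0.0 or 127.0.0.1, the effect is the same: lookups resolve to an address that goes nowhere). The lines below are only a few illustrative examples of the format, not the actual list, which is much longer and lives behind the button:

# Illustrative format only; the real list contains many more domains
0.0.0.0 facebook.com
0.0.0.0 www.facebook.com
0.0.0.0 google-analytics.com
# 0.0.0.0 youtube.com   <- the YouTube lines are commented out by default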

How to get R to parse the <study_design> field from clinicaltrials.gov XML files

Clinicaltrials.gov helpfully provides a facility for downloading machine-readable XML files of its data. Here’s an example of a zipped file of 10 clinicaltrials.gov XML files.

Unfortunately, a big zipped folder of XML files is not that helpful. Even after parsing a whole bunch of trials into a single data frame in R, there are a few fields that are written in the least useful format ever. For example, the <study_design> field usually looks something like this:

Allocation: Non-Randomized, Endpoint Classification: Safety Study, Intervention Model: Single Group Assignment, Masking: Open Label, Primary Purpose: Treatment

So, I wrote a little R script to help us all out. Do a search on clinicaltrials.gov, then save the unzipped search result in a new directory called search_result/ in your ~/Downloads/ folder. The following script will parse each XML file in that directory, combine them all into a single data frame called “trials”, and then explode the <study_design> field into individual columns.

So for example, based on the example field above, it would create new columns called “Allocation”, “Endpoint_Classification”, “Intervention_Model”, “Masking”, and “Primary_Purpose”, populated with the corresponding data.

require ("XML")
require ("plyr")

# Change path as necessary
path = "~/Downloads/search_result/"

setwd(path)
xml_file_names <- dir(path, pattern = "\\.xml$")

counter <- 1

# Makes data frame by looping through every XML file in the specified directory
for ( xml_file_name in xml_file_names ) {
  
  xmlfile <- xmlTreeParse(xml_file_name)
  
  xmltop <- xmlRoot(xmlfile)
  
  data <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
  
  if ( counter == 1 ) {
    
    trials <- data.frame(t(data), row.names = NULL)
    
  } else {
    
    newrow <- data.frame(t(data), row.names = NULL)
    trials <- rbind.fill (trials, newrow)
    
  }
  
  # Print progress (useful when processing very large sets of XML files)
  
  print (
    paste0(
      xml_file_name,
      " processed (",
      format(100 * counter / length(xml_file_names), digits = 2),
      "% complete)"
    )
  )
  
  counter <- counter + 1
  
}

# Data frame has been constructed. Comment out the following two loops
# (until the "un-cluttering" part) in the case that you are not interested
# in exploding the <study_design> column.

columns <- vector()

for ( stu_des in trials$study_design ) {
  # split on commas that are NOT inside parentheses (negative lookahead; requires perl = TRUE)
  for (pair in strsplit( stu_des, ", *(?![^()]*\\))", perl=TRUE)) {
    newcol <- substr( pair, 0, regexpr(':', pair) - 1 )
    columns <- c(columns, newcol)
  }
}

for ( newcol in unique(columns) ) {
  
  # get rid of spaces and special characters
  newcol <- gsub('([[:punct:]])|\\s+','_', newcol)
  
  if (newcol != "") {
    
    # add the new column
    trials[,newcol] <- NA
    
    i <- 1
    
    for ( stu_des2 in trials$study_design ) {
      
      for (pairs in strsplit( stu_des2, ", *(?![^()]*\\))", perl=TRUE)) {
        
        for (pair in pairs) {
          
          if ( gsub('([[:punct:]])|\\s+','_', substr( pair, 0, regexpr(':', pair) - 1 )) == newcol ) {
            
            trials[i, ncol(trials)] <- substr( pair, regexpr(':', pair) + 2, nchar(pair) )
            
          }
          
        }
        
      }
      
      i <- i+1
      
    }
    
  }
  
}

# Un-clutter the working environment

rm(i, counter, data, newcol, newrow, columns, pair, pairs,
   stu_des, stu_des2, xml_file_name, xml_file_names, xmlfile, xmltop)

# Get nice NCT IDs

get_nct_id <- function ( row_id_info ) {
  
  return (unlist (row_id_info) ["nct_id"])
  
}

trials$nct_id <- sapply(trials$id_info, get_nct_id)  # character vector rather than a list column

# Clean up enrolment field

trials$enrollment[trials$enrollment == "NULL"] <- NA

trials$enrollment <- as.numeric(trials$enrollment)
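
Once the script has finished, you can sanity-check the exploded columns directly. For example (the column names here are the ones assumed from the example <study_design> field above):

# Tabulate a couple of the exploded columns
table(trials$Allocation, useNA = "ifany")
table(trials$Primary_Purpose, useNA = "ifany")

# Summarise the cleaned-up enrollment figures
summary(trials$enrollment)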

Useful references:

  • https://www.r-bloggers.com/r-and-the-web-for-beginners-part-ii-xml-in-r/
  • http://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns
  • http://stackoverflow.com/questions/21105360/regex-find-comma-not-inside-quotes

The answer to the question

On October 9, inspired by the STREAM research group’s Forecasting Project, I posed a question to the Internet: “Do you know how the election is going to turn out?” I tweeted it at news anchors, MPs, celebrities, academics, friends and family alike.

I’m very happy with the response! I got 87 predictions, and only 11 of them were what I would consider “spam.” I took those responses and analysed them to see if there were any variables that predicted better success in forecasting the result of the election.

The take-home message is: No. Nobody saw it coming. The polls had the general proportion of the vote pretty much correct, but since polls do not reflect the distribution of voters in individual ridings, the final seat count was very surprising. This may even suggest that the Liberals got the impetus for a majority result from the fact that everyone expected they would only narrowly eke out a victory over the incumbent Tories.

You can view the final report in web format or download it as a PDF.

Can you predict the outcome of Canada’s 42nd federal election?

The STREAM (Studies of Translation, Ethics and Medicine) research group at McGill University, of which I’m a part, has been working on a project for the last year or so in which we elicit forecasts of clinical trial results from experts in their field. We want to see how well-calibrated clinical trialists are, and to see which members of a team are better or worse at predicting trial outcomes like patient accrual, safety events and efficacy measures.

Inspired by this, I borrowed some of the code we have been using to get forecasts from clinical trial investigators and applied it to Canada’s 42nd federal election. Now I’m asking you to do your best to predict how many seats each party will get, and who will win in your riding.

Let’s see how well we, as a group, can predict the outcome, and see if there are regional or demographic predictors for who is better or worse at predicting election results. The more people who make predictions, the better the data set I’ll have at the end, so please submit a forecast, and ask your friends!

The link for the forecasting tool is here: http://www.bgcarlisle.com/elxn42/

Just to make it interesting: I will personally buy a beer for the forecaster who gives me the best prediction out of them all.* :)

* If you are younger than 18 years of age, you get a fancy coffee, not a beer. No purchase necessary, only one forecast per person. Forecaster must provide email with the prediction in order for me to contact him/her. In the case of a tie, one lucky beer-receiver will be chosen randomly. Having the beer together with me is conditional on the convenience of both parties (e.g. if you live in Vancouver or something, I’ll just figure out a way to buy you a beer remotely, since I’m in Montreal). You may consult any materials, sources, polls or whatever. This is a test of your prediction ability, not memory, after all. Prediction must be submitted by midnight on October 18, 2015.

Short story prompt for Lojban enthusiasts: la cizra mensi

Short story prompt: la cizra mensi

The hero of your short story has found a way to summon the Weird Sisters of Macbeth fame to inquire after the future. Worried that the witches will try to trick your hero by giving a prophecy that can be favourably and plausibly read one way, but that also admits an alternate, surprising and terrible interpretation consistent with its words, your hero finds a way to force the witches to speak in Lojban.

Unfortunately for the hero of your story, a witch’s prophecy can backfire in unexpected ways that still respect the letter of the prophecy itself, even if it’s delivered in a language that’s syntactically unambiguous.

Macbeth 1.3

In the spirit of this short story prompt, I have rendered the first part of Macbeth, act 1 scene 3 into Lojban for your enjoyment. Corrections and suggestions welcome. :)

termafyfe’i 1: [1] .i doi lo mensi do pu zvati ma

termafyfe’i 2: .i lo jai bu’u lo nu catra lo xarju

termafyfe’i 3: .i doi lo mensi do zvati ma

termafyfe’i 1: .i lo fetspe be lo blopre pu cpana be lo galtupcra ku ralte lo narge

[5] gi’e omnomo gi’e omnomo gi’e omnomo .i lu ko dunda fi mi li’u se cusku mi .i lu ko cliva doi lo termafyfe’i li’u lo zargu citka cagna cu se krixa .i lo nakspe be lo se go’i pu klama la .alepos. gi’e bloja’a la .tirxu. .i ku’i ne’i lo julne mi lo te go’i fankla

[10] .ije mi simsa be lo ratcu poi claxu lo rebla ku co’e gi’e co’e gi’e co’e

termafyfe’i 2: .i mi dunda do pa lo brife

termafyfe’i 1: .i do xendo

termafyfe’i 3: .i mi co’e pa lo drata

termafyfe’i 1: [15] .i mi ralte ro da poi drata .i je’a lo blotcana cu bifca’e ro da poi farna be fi lo makfartci pe lo blopre ku’o zi’e poi se djuno .i mi ba simsa be lo sudysrasu bei lo ka sudga ku rincygau

[20] .i lo nu sipna ku ba canai lo donri ku .a lo nicte ku dandu za’e lo galtu dinju canko gacri .i zo’e ba dapma renvi .i ba ca lo tatpi jeftu be li so pi’i so cu jdika lo ka stali .e lo ka pacna .e lo ka gleki

[25] .i zu’u lo bloti to’e pu’i se daspo .i zu’unai lo go’i vilti’a se renro .i ko viska lo se ralte be mi

termafyfe’i 2: .i ko jarco fi mi .i ko jarco fi mi

termafyfe’i 1: .i mi nau ralte lo tamji be fi lo blosazri

[30] poi ca lo nu zdani klama ku bloti janli morsi

[.i ne’i damri]

termafyfe’i 3: .i damri .i damri .ua .i la .makbet. je’a tolcliva

ro da poi termafyfe’i: .i lo cizra mensi noi xance jgari simxu zi’e noi klama be fo lo xamsi .e lo tumla be’o sutra

[35] cu klama fi’o tadji tu’a di’e .i ciroi klama lo tu’a do .i ciroi klama lo tu’a mi .i ciroi ji’a klama .iki’ubo krefu fi li so .i ko smaji .i lo makfa cu bredi

[.i nerkla fa la .makbet. .e la .bankos.]

Switching to left-handed Dvorak

I’m doing an experiment. A lot of my thesis work consists of me clicking between form elements, spreadsheet cells, or parts of text documents, entering short bits of text and then clicking away to another thing.

I’ve been trying to bring my efficiency up, but running into a wall. The rate-limiting step in my workflow is not my typing speed or how quickly I find information, but rather how fast I can switch from mouse to keyboard.

A few years back, I switched from QWERTY to Dvorak, which was distressing at the time, but turned out to have been an excellent life choice. (Highly recommended!) I’m going to try left-handed Dvorak out for a bit and see how it goes. :)

Dr Susan’s counselling service for para-magical, epi-paranormal and time-travel adjacent children and young adults

Dr Susan’s Counselling Service

For this year’s upcoming NaNoWriMo, I think I have settled on an idea and a title.

A recurring trope in sci-fi and fantasy is the minor character who helps the main character accomplish a fantastical and difficult-to-believe goal (e.g. returning to her own non-dystopian timeline, saving a magical kingdom, etc.), often at great cost to herself, without any hope of sharing in that victory, and with little or no proof that anything of importance happened at all. I want to write a series of short stories about this sort of character, and about the therapist whose job it is to help them pick up the pieces of their shattered lives after they discover that they are living in a dystopian timeline bad enough that a time-traveller needed their help to go back and prevent it, leaving them hopelessly behind.

The title will be Dr Susan’s counselling service for para-magical, epi-paranormal and time-travel adjacent children and young adults.

Proof of prespecified endpoints in medical research with the bitcoin blockchain

NOTICE (2022-05-24)

This blog post was written in 2014, when I still naively hoped that the myriad problems with cryptocurrency might still be solved. I am now somewhat embarrassed to have written this in the first place, but will leave the post up for historical reasons. (Quite a number of medical journal articles link here now, for better or for worse.)

While the following methods are valid as far as they go, I absolutely DO NOT recommend actually using them to timestamp research protocols. In fact, I recommend that you never use a blockchain for anything, ever.

Introduction

The gerrymandering of endpoints or analytic strategies in medical research is a serious ethical issue. “Fishing expeditions” for statistically significant relationships among trial data or meta-analytic samples can confound proper inference by statistical multiplicity. This may undermine the validity of research findings, and even threaten a favourable balance of patient risk and benefit in certain clinical trials. “Changing the goalposts” for a clinical trial or a meta-analysis when a desired endpoint is not reached is another troubling kind of scientific fraud that becomes possible when endpoints are not specified in advance.

Pre-specifying endpoints

Choosing endpoints to be measured and analyses to be performed in advance of conducting a study is a hallmark of good research practice. However, if a protocol is published on an author’s own web site, it is trivial for an author to retroactively alter her own “pre-specified” goals to align with the objectives pursued in the final publication. Even a researcher who is acting in good faith may find it less than compelling to tell her readers that endpoints were pre-specified, with only her word as a guarantee.

Advising a researcher to publish her protocol in an independent venue such as a journal or a clinical trial registry in advance of conducting research does not solve this problem, and even creates some new ones. Publishing a methods paper is a lengthy and costly process with no guarantee of success—it may not be possible to find a journal interested in publishing your protocol.

Pre-specifying endpoints in a clinical trial registry may be feasible for clinical trials, but these registries are not open to meta-analytic projects. Further, clinical trial registry entries may be changed, and it is much more difficult (although still possible) to download previous versions of trial registries than it is to retrieve the current one. For example, there is still no way to automate downloading of XML-formatted historical trial data from www.clinicaltrials.gov in the same way that the current version of trial data can be automatically downloaded and processed. Burying clinical trial data in the “history” of a registry is not a difficult task.

Publishing analyses to be performed prior to executing the research itself potentially sets up a researcher to have her project “scooped” by a faster or better-funded rival research group who finds her question interesting.

Using the bitcoin blockchain to prove a document’s existence at a certain time

Bitcoin uses a distributed, permanent, timestamped, public ledger of all transactions (called a “blockchain”) to establish which addresses have been credited with how many bitcoins. The blockchain indirectly provides a method for establishing the existence of a document at a particular time that can be independently verified by any interested party, without relying on a medical researcher’s moral character or the authority (or longevity) of a central registry. Even if the NIH’s servers were destroyed by a natural disaster, as long as there were any full bitcoin nodes left running in the world, the method described below could be used to confirm that a paper’s analytic method was established at the time the authors claim.

Method

  1. Prepare a document containing the protocol, including explicitly pre-specified endpoints and all prospectively planned analyses. I recommend using a non-proprietary document format (e.g. an unformatted text file or a LaTeX source file).
  2. Calculate the document’s SHA256 digest and convert it to a bitcoin private key (see the sketch in R after this list).
  3. Import this private key into a bitcoin wallet, and send an arbitrary amount of bitcoin to its corresponding public address. After the transaction is complete, I recommend emptying the bitcoin from that address to another address that only you control, as anyone given the document prepared in (1) will have the ability to generate the private key and spend the funds you just sent to it.
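
To make step (2) concrete, here is a minimal sketch in R, assuming the digest package is installed; the filename protocol.txt is just a placeholder, and the conversion to wallet-import format is left to a bitcoin wallet or library:

# install.packages("digest") if needed
library(digest)

# SHA256 digest of the protocol file (placeholder filename)
protocol_hash <- digest("protocol.txt", algo = "sha256", file = TRUE)
print(protocol_hash)

# The 64-character hex digest is a 256-bit number, which can (with
# overwhelming probability) be used directly as a bitcoin private key;
# encoding it in wallet-import format (base58check) is left to the wallet.

# Verification later is just re-hashing and comparing: changing even one
# character of the protocol produces a completely different digest.
identical(protocol_hash, digest("protocol.txt", algo = "sha256", file = TRUE))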

Result

The incorporation into the blockchain of the first transaction using the address generated from the SHA256 digest of the document provides an undeniably timestamped record that the research protocol prepared in (1) is at least as old as the transaction in question. Care must be taken not to accidentally modify the protocol after this point, since only an exact copy of the original protocol will generate an identical SHA256 digest. Even the alteration of a single character will make the document fail an authentication test.

To prove a document’s existence at a certain point in time, a researcher need only provide the document in question. Any computer can calculate its SHA256 digest and convert it to a private key with its corresponding public address. Anyone can then search the blockchain for transactions involving this address and check the date of the earliest one, proving that the document must have existed at least as early as that date.

Discussion

This strategy would prevent a researcher from retroactively changing an endpoint or adding / excluding analyses after seeing the results of her study. It is simple, economical, trustless, non-proprietary, independently verifiable, and provides no opportunity for other researchers to steal the methods or goals of a project before its completion.

Unfortunately, this method would not prevent a malicious team of researchers from preparing multiple such documents in advance, in anticipation of a need to defraud the medical research establishment. To be clear, under a system as described above, retroactively changing endpoints would no longer be a question of simply deleting a paragraph in a Word document or in a trial registry. This level of dishonesty would require planning in advance (in some cases months or years), detailed anticipation of multiple contingencies, and in many cases, the cooperation of multiple members of a research team. At that point, it would probably be easier to just fake the numbers than it would be to have a folder full of blockchain-timestamped protocols with different endpoints, ready in case the endpoints need to be changed.

Further, keeping a folder of blockchain-timestamped protocols would be a very risky pursuit—all it would take is a single honest researcher in the lab to find those protocols, and she would have a permanent, undeniable and independently verifiable proof of the scientific fraud.

Conclusion

Fraud in scientific methods erodes confidence in the medical research establishment, a confidence that is essential to its function of generating new scientific knowledge. Cases where pre-specified endpoints are retroactively changed cast doubt on the rest of medical research. A method by which anyone can verify that a particular, detailed protocol existed prior to the research would lend support to the credibility of medical research, and be one less thing about which researchers have to say, “trust me.”

If I were the philosopher-king of the world

  • All measurements would be metric, even in America. This would include recipes. (“T” is different from “t”? Why?)
  • The 12-hour clock would be abolished in favour of a 24-hour clock.
  • Time zones would be abolished. Every clock would be set to GMT, Beijing time, I don’t care, just make it consistent.
  • Dates would be written in compliance with ISO 8601.
  • Daylight savings time would be abolished.
  • The calendar would be reformed to something that makes sense.
  • Academic references and style guidelines would be standardised across all disciplines and publications once and for all. Academic citations would be hyperlinked in all electronic versions.
  • Usernames and passwords would be gone forever.
  • AAA batteries would be illegal.
  • DRM would be illegal, and for that matter, copyright / patent law would be a very different thing.
    • Maybe copyright on a creative work would last 28 years only, with no extensions? (Ahem, Disney.)
  • There would be a universal maximum wage, indexed to a universal minimum wage, and a guaranteed basic income.
  • The laws regarding royal succession for Canada would be changed such that the queen of Canada is chosen by random lottery among Canadian citizens.

I have opinions.

How to play “Dave’s Famous Telephone Charades”

Dave’s Famous Telephone Charades is a party game that requires a minimum of 4 “participants,” as well as a certain critical mass of reasonably creative “audience members,” probably no less than 4. It was a perennial favourite of my circle of friends when I was an undergrad at Western.

Here’s how it works:

  1. Players 1–4 go to a separate room where they can’t see or hear the audience members talking.
  2. The audience members choose a scene to be acted out silently by the players.
    • The instructions for the scene to be acted out should be simple—aim for 1 sentence.
    • The scene should lend itself easily to physical movement and interpretation.
    • The scene must be something that can be acted out silently.
    • Examples include: “washing the dishes,” “an otter in its natural habitat,” “a day in the life of a …”
  3. Player 1 comes back to the room with the audience, where he is told the scene to be acted out. He is given 10 seconds to think about what exactly he will do.
  4. Player 2 comes into the room and watches player 1 silently act out the scene given to him. The scene should be about 30 seconds long, tops. To be clear: no one tells players 2–4 what the scene is until after the game is finished.
  5. Player 3 enters and player 2 acts out the scene from memory, not knowing the instructions that were given to player 1.
  6. Player 4 enters and player 3 acts out the scene from memory.
  7. Player 4 acts out the scene as best she can from memory, narrating what she thinks she is acting out.
  8. Player 3 corrects player 4.
  9. Player 2 corrects player 3.
  10. Player 1 reveals the instructions she was given in the first place.

This sort of game only works with certain kinds of people in the right sort of mood, but when you have the right combination of people with the right sort of energy all together in the same place, it can be hilarious.