App Dev – The Grey Literature

The newest version of Mac OS, called “Mountain Lion,” includes “Dictation,” which is a piece of system software that takes speech and converts it to text. This is nothing new, of course. I remember that I had a piece of dictation software for my old Windows 98 PC. You had to “train” the software to understand what you said, and even then it was wildly inaccurate, but in principle, this sort of software has existed for a long time. Dictation on Mac OS is much better than the one I had back in 1998, but of course it is not perfect.

That particular piece of software I had on my PC was not built in to the operating system. I had to pay for it. Not only that, but because it didn’t work very well, I never got another dictation programme again. But now that this one is built into the OS, I think I’m going to try an experiment.

Here’s my inspiration: In Star Trek, every character keeps a “log,” and because it’s the future, it’s an audio log. In The Next Generation, they were often shown as video (b)logs. Sometimes, in order to advance the plot, a character would be shown searching through his own (or another person’s) logs. What was interesting was that the search would usually be a semantic keyword search. Something like, “Computer, show me all log entries relating to the warp core” (or whatever they were interested in at the time). With dictation software now a standard feature in OS X, we’re at a point where we could write an app that does exactly what the computer did in Star Trek.

The workflow will be as follows: Take a video (or a set of videos) that you’re interested in, and extract the audio. Divide the one big audio file into hundreds of smaller (say, ten-second-long), overlapping audio files, each annotated with its start time in the original video. Pass each of these smaller files through the system’s speech-to-text dictation software and save the resulting text to a file alongside that start time. And voilà, you have generated a time-encoded text index for your video, just like the one on YouTube, but without having to upload the file.
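To make the chunking step concrete, here is a minimal Python sketch of how the overlapping windows and the resulting index might be computed. Everything in it is an assumption for illustration: the two-second overlap is an arbitrary choice, and `transcribe` is a hypothetical stand-in for the speech-to-text step, not a real Dictation API.

```python
def chunk_windows(total_seconds, chunk_seconds=10.0, overlap_seconds=2.0):
    """Return (start, end) times, in seconds, of overlapping chunks."""
    if chunk_seconds <= 0 or not (0 <= overlap_seconds < chunk_seconds):
        raise ValueError("need chunk_seconds > 0 and 0 <= overlap < chunk")
    step = chunk_seconds - overlap_seconds  # how far each chunk advances
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break  # the last chunk reaches the end of the audio
        start += step
    return windows


def build_index(total_seconds, transcribe):
    """Pair each chunk's start time with the text heard in that chunk.

    `transcribe(start, end)` is a hypothetical callable standing in for
    whatever speech-to-text step is used on that slice of audio.
    """
    return [(start, transcribe(start, end))
            for start, end in chunk_windows(total_seconds)]


if __name__ == "__main__":
    # Fake transcriber so the sketch runs without any speech software.
    fake = lambda start, end: f"words spoken from {start:.0f}s to {end:.0f}s"
    for start, text in build_index(25, fake):
        print(start, text)
```

The overlap is there so that a phrase straddling a chunk boundary still appears whole in at least one chunk.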

Wrap this all up in a shiny OS X app and put it on the App Store. Sell it for $0.99.

Then, if you had a bunch of videos (say, seasons 5–6 of Doctor Who) and you wanted to find all references to “the Silence,” you could install the app, have it index your iTunes library, and then search your videos for certain keywords or phrases.
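The search itself could be as simple as a case-insensitive substring match over the time-coded text. The Python sketch below assumes the index is a list of (start time in seconds, transcribed text) pairs; the sample entries are made up purely for illustration, and a real "semantic" search would need more than substring matching.

```python
def format_timestamp(seconds):
    """Turn a number of seconds into an h:mm:ss string."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def search_index(index, phrase):
    """Return start times of every chunk whose text contains the phrase."""
    needle = phrase.casefold()
    return [start for start, text in index if needle in text.casefold()]


if __name__ == "__main__":
    # Made-up index entries, not real transcripts.
    sample_index = [
        (0.0, "previously on the show"),
        (8.0, "what do you know about the Silence"),
        (16.0, "nothing at all"),
        (3725.0, "THE SILENCE will fall"),
    ]
    for start in search_index(sample_index, "the silence"):
        print(format_timestamp(start))
```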

Actually, this might work. If anyone wants to collaborate with me on this one, hit me up in the comments.

Edit: I take it back. A quick experiment with Dictation indicates that we are nowhere near having the technology to be able to do this.

Logo for Montréal Métro Exits

On and off for the last little while, I’ve been working on a side-project: something for when I don’t want to think about research ethics anymore. I was inspired to do this by something I heard on CBC a while back. A guy in London, UK made an iPhone app that would tell you which subway car to ride so that you would be closest to the exit when you got off.

I thought that this was a great idea. I would certainly use an application like that! Turns out someone already did it for Montréal, but they did a crappy job of it. The data set is incomplete, and the interface leaves much to be desired. Also, this other app tells you nothing about which car to board in order to transfer. In fact, it tells you only which métro car to exit in order to be near the exit, not which métro car to enter, which seems to undermine the point of the app. You need to know which car to board before you get on the train. (You can’t just infer one from the other, since in some cases the train approaches from the right side of the platform and in some cases it approaches from the left.)

I decided to write an app that would be really simple from the user’s perspective—just choose two stations, and the app tells you which car to get into at your departure station, and then which car to get into at your transfer station(s) (if applicable). I thought it would be a good exercise, just as practice for some other ideas for iPhone apps that I’ve had.
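For the curious, here is one way the lookup could be structured, as a rough Python sketch. This is not the app's actual code: the `Leg` type, the table layout, and every station, line, and car number below are hypothetical placeholders. The one idea it does borrow from the above is that the table is keyed by direction of travel, since the best car depends on which way the train approaches the platform.

```python
from typing import NamedTuple


class Leg(NamedTuple):
    board_at: str   # station where you get on
    line: str
    direction: str  # terminus the train is heading toward
    alight_at: str  # station where you get off
    goal: str       # "exit", or the line you transfer to


# Placeholder data: (alight_at, line, direction, goal) -> car number.
CAR_TABLE = {
    ("Station B", "Line 1", "Terminus X", "Line 2"): 2,
    ("Station B", "Line 1", "Terminus Y", "Line 2"): 7,
    ("Station C", "Line 2", "Terminus Z", "exit"): 5,
}


def cars_to_board(legs, table=CAR_TABLE):
    """For each leg of a trip, say which car to board at its first station."""
    advice = []
    for leg in legs:
        car = table[(leg.alight_at, leg.line, leg.direction, leg.goal)]
        advice.append((leg.board_at, car))
    return advice


if __name__ == "__main__":
    trip = [
        Leg("Station A", "Line 1", "Terminus X", "Station B", "Line 2"),
        Leg("Station B", "Line 2", "Terminus Z", "Station C", "exit"),
    ]
    for station, car in cars_to_board(trip):
        print(f"At {station}, board car {car}")
```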

So, a couple of weeks ago, I donned my lab coat, grabbed a clipboard and went to every métro station in Montréal and wrote down where all the exits were. I also collected information regarding transfers. Writing the app wasn’t so hard, although submitting it to the iTunes store was a bit of a headache. That said, it was approved on my first try, and it took less than a week. (Thanks, Apple!)

It was getting Apple to process my tax forms that was the longest part of the development process.

The app was approved on Friday the 18th, and Apple processed my Canadian tax info last Tuesday. I had to fill out some US tax forms (just indicating that I wasn’t a US citizen) and then today they finally started selling my app on the iTunes store.

Tell your friends! Seriously. Every month I get roughly 300 visits to my blog from people in the Montréal area. If I could get a few of you guys to post this to your Facebook, I’d be raking it in. :)

Now that I’ve sort of figured out how to write and submit an app for the iPhone, I’ve got my sights set on bigger cities where this sort of app hasn’t been written before. (Yes, there are still some. Not many!) Also, I have a few ideas for other, better iPhone apps that I think could be a lot of fun. I’m not about to start posting my ideas on the internet though: That’s a great way to have someone else make my app before I do. :P

Tag: App Dev

Semantic video indexing app

Montréal Métro iPhone app