PDFs as open data in NYC?

I had a recent Twitter exchange with @NYCAging, the twitter account for New York City’s Department for the Aging. I think it points up how inconsistent the efforts are across city agencies to provide access to their data.

In this case, DFTA was touting the “staggering” amount of data it had recently produced. But the document the tweet linked to was a PDF.

My reply was that a PDF is fine, but they should also make the raw data available so others (whether it’s the Council of Senior Centers and Services, Foursquare or the NY Times) can use the data for their own analysis.

They wrote back to ask, “Could you use an Excel version if we could get our hands on one?” To me, this reply is evidence that, without a clear city policy requiring agencies to provide data in consistent, coordinated ways, agencies are left to their own devices to publish data (or not) as they see fit.

In fact, although Mayor Bloomberg has made it clear that the city’s DataMine is where data should be published, DFTA apparently isn’t participating in DataMine, despite having a “staggering” amount of data to share.

The other thing that struck me about their reply is that the PDF was obviously created from an Excel spreadsheet to begin with (you can even see the lettered columns across the top and numbered rows along the left side). So they can certainly “get their hands on” the original spreadsheet. It’d be just as easy to publish the spreadsheet as the PDF. (And some people would consider it an embarrassment to publish a PDF given how difficult it is to extract information from that format.)

When we reach the point where posting data like this online is a matter of city policy, and agencies no longer feel they need to crow about posting a PDF because it’s simply expected of them, that will be a sign the city is actually embracing the digital era. We’re not quite there yet.

