The “Socratic Method” of publishing city data?

I was encouraged at the OpenGov Camp this past Sunday by an announcement from NYC DoITT that the city will be using Socrata to provide online access to its data.  It’s a great platform.  It doesn’t ensure that the city will actually provide good data, update it in a timely way, or expand its available data sets, but it’s a good step forward and hopefully a harbinger of better things to come.

The city is seeking feedback here.  They’ve indicated that an “end of summer launch” is planned for a NYC/Socrata rollout.  Here’s an example of what the site might look like.

OpenBaltimore opened my eyes

Earlier this year I had tweeted that a new municipal data portal — OpenBaltimore — blew away sites like NYC’s Datamine.

I was asked by Alex Howard (@digiphile) for my thoughts on OpenBaltimore and other, similar portals.  At the time I didn’t realize OpenBaltimore was using Socrata, but after I looked into it further, I came away impressed.  The platform is visually appealing, easy to search, and offers multiple ways of accessing/extracting data.

(I don’t want to endorse the Socrata product/service, but it seems to me to be a good choice for NYC.)

Useful features, and lots of them

One nice aspect of the platform is the ability to preview the data immediately in your browser (no downloading needed just to see what it contains).  You can also view more details about each row in the file.  And you can visualize the data in multiple ways: using an interactive map option built into the platform (if the dataset has a location component) or using one of nine different chart options.

And if you want to download a data set, the platform offers at least eight export formats, as well as an API for programmatic access.  NYC says that “all datasets will now be available as APIs” once they replace Datamine with NYC/Socrata.
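To give a rough sense of why several export formats are useful, here is a minimal Python sketch that loads the same small data set from a CSV export and from a JSON export, then runs the same calculation on either one.  The sample rows, the column names, and the helper functions (`load_csv`, `load_json`, `total_complaints`) are all made up for illustration; they are not taken from NYC’s files or from Socrata’s actual API.

```python
import csv
import io
import json

# Hypothetical exports of the same tiny data set (made-up rows and columns).
CSV_EXPORT = """agency,complaints
DOT,120
DSNY,85
"""

JSON_EXPORT = (
    '[{"agency": "DOT", "complaints": "120"},'
    ' {"agency": "DSNY", "complaints": "85"}]'
)


def load_csv(text):
    """Parse a CSV export into a list of dicts keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def load_json(text):
    """Parse a JSON export (assumed here to be a list of objects)."""
    return json.loads(text)


def total_complaints(records):
    """Sum the 'complaints' column, whichever export the records came from."""
    return sum(int(record["complaints"]) for record in records)


csv_records = load_csv(CSV_EXPORT)
json_records = load_json(JSON_EXPORT)
print(total_complaints(csv_records), total_complaints(json_records))
```

Someone who just wants a spreadsheet can open the CSV directly, while a developer can work from the JSON; the point is that both start from the same published data.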

Short links and “perma” links are available for each data set.  And there’s a “Discuss” option where anyone can attach notes and commentary to a data set.  It’s user-generated metadata: you can immediately see, for example, if anyone else has commented about the data’s quality, or completeness, or how up-to-date it is.  I didn’t notice too many comments at the OpenBaltimore site, but there were some, and they were helpful (including responses from that city’s data team).

The visualization option includes a map, but it didn’t seem to have real-time geocoding.  So even if a list has street addresses, it can’t be mapped through Socrata on the fly.  Each list needs a “location column,” which presumably means lat/lon.  (It’s easy to submit feature requests to the Socrata team, though, so hopefully we’ll be seeing this addition soon.)
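To make the “location column” point concrete, here is a small Python sketch that reads an exported CSV and separates the rows that already carry coordinates (and so could be mapped right away) from the rows that have only a street address (and would need geocoding first).  The sample rows, the column names `latitude` and `longitude`, and the `split_by_location` helper are assumptions for illustration only; I don’t know how Socrata actually stores or checks its location columns.

```python
import csv
import io

# Hypothetical export: some rows carry coordinates, others only an address.
SAMPLE = """name,address,latitude,longitude
Library A,100 Main St,40.7128,-74.0060
Library B,200 Broadway,,
Library C,300 Park Ave,40.7580,-73.9855
"""


def parse_coordinate(value):
    """Return the cell as a float, or None if it is blank or not a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def split_by_location(text, lat_col="latitude", lon_col="longitude"):
    """Separate rows that can be mapped now from rows that need geocoding."""
    mappable, needs_geocoding = [], []
    for row in csv.DictReader(io.StringIO(text)):
        lat = parse_coordinate(row.get(lat_col))
        lon = parse_coordinate(row.get(lon_col))
        has_valid_point = (
            lat is not None
            and lon is not None
            and -90 <= lat <= 90
            and -180 <= lon <= 180
        )
        if has_valid_point:
            mappable.append((row["name"], lat, lon))
        else:
            needs_geocoding.append((row["name"], row["address"]))
    return mappable, needs_geocoding


mappable, needs_geocoding = split_by_location(SAMPLE)
print(len(mappable), "rows ready to map;", len(needs_geocoding), "need geocoding")
```

A publisher could run a check like this before uploading, so it’s clear up front which data sets will show up on the map and which won’t.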

All in all it’s really great.  Other cities use Socrata, including Chicago, Seattle, and even smaller municipalities such as Manor, Texas (pop. 6,500).

However, not a silver bullet

Even though OpenBaltimore’s portal has been online for just a few months, there are already criticisms (for example, data hasn’t been updated since February, some data sets have quality problems, etc.).  Many people (including me) have leveled these same criticisms at NYC’s Datamine effort.  So simply having a better portal won’t solve these issues.

But at least a platform like Socrata will make it easier to deploy data sets, it’ll certainly make it easier for the public to access those data sets, and it’ll make it easier to suggest improvements to the substance and the process.

NYC’s Datamine was an improvement in some ways over earlier open data efforts in New York.  Now that it’s been around for two years, I think it’s fair to say that Datamine is clunky at best.  I can’t wait for it to be replaced by something better.  I’m looking forward to the NYC/Socrata rollout.

What do you think?


  1. Steven,

    Thank you for sharing your enthusiasm for the planned evolution of the NYC Datamine this summer. You are correct that the Open Data platform isn’t a silver bullet that instantly heals all the woes associated with public data access or dissemination. I’m the founder & CEO of Socrata and we have worked with dozens of governments on this important topic. The evolution that we see, and one that I hope gives you and the residents of NYC and Baltimore and every other jurisdiction hope, is that organizations go through an Open Data adoption lifecycle.

    They start by describing public data on crufty HTML pages with links to data in largely inaccessible download formats. Then they prop up static catalogs, with better descriptions and more consistently described data, with easier-to-find links to data in slightly more common download formats. Then they evolve to more interactive, dynamically updated catalogs with data that can be explored interactively and even visualized and shared. Data is offered in a multitude of download formats and available via application programming interfaces (APIs), empowering civic developers to build applications on top of public data. That’s where the reference Open Data site, Data.Gov, is in the Open Data adoption lifecycle. See Data.Gov, which recently moved on to the Socrata Open Data Platform.

    But that’s where the Open Data adoption lifecycle curve kicks into high gear and the publishing organization starts to see engagement around their data. That encourages them. And so they put more data online. Then they recognize that those APIs that let programmers consume their data can also be used to keep the data up to date with the underlying business systems that capture or generate the data in the first place.

    The final step in the Open Data adoption lifecycle is the recognition that they should instrument the underlying business systems to make data available as a byproduct of doing business. There’s no longer an export from the business system and an import into the data site. It’s automatic. The business system generates and collects data and stores it in a business-process optimized format and automation copies and transforms the data into a consumption-optimized format on the public facing data site.

    So there’s hope. Patience is required. But the first step is an important one and sooner or later we’ll get to where access to data is pervasive.

    Kevin Merritt
    Founder & CEO

    • Thanks Kevin. Great overview of the cycle. I’ve been involved in open data efforts with municipal, state, and federal government agencies for more than 20 years, and I’ve filed more FOIA / FOIL requests than I can count (from back when electronic data wasn’t even a possibility to now, when it is), so I’m quite familiar with the need to be patient. I also understand that this is a process, and that agencies / elected officials will eventually realize the value in making more data more widely & easily accessible. Hopefully Socrata’s platform will help.

      But I also understand the need to hold government’s feet to the fire, as it were. So I (and many others, I hope) will always keep the pressure on, in whatever ways we can, to ensure that the holdouts realize they’re on the losing side. Because even though a platform like Socrata makes it easier (and maybe even automatic as you say) to open up access to data, that still won’t be enough to persuade some public officials (elected or otherwise) that open data is a good idea.

      Also, one thing I should’ve noted in my post is that even though the use of a platform like Socrata is very helpful in terms of making it easy to publish data-as-APIs and in easily accessible export formats, I’m still concerned about an over-reliance on apps as the driving force behind making more data available (see my post here). It’s not just about apps. There’s still a great deal of value in enabling a Community Board member, or local reporter, or concerned citizen who wants to write a letter to their Councilmember, to simply be able to download a spreadsheet that they can use to make their case about a local issue. They don’t necessarily need an app. But having the data easily accessible is essential.

      Looking forward to continuing the conversation.

  2. I agree on all your follow-on comments. Getting governments to share their data requires a multi-pronged strategy that includes technology enablement, civic demand and policy direction/legislation.

    We see three core data consuming user groups, each of which has different accessibility requirements. Programmers want API access. Scientists, journalists, analysts and researchers need bulk access to the data in machine-readable, downloadable formats. Interested citizens need an easy-to-navigate online exploration experience where they can quickly get answers to their questions, such as “how did my city councilmember vote on x?” or “how much did the city spend on garbage collection in my neighborhood?” The Socrata Open Data Platform meets all of their needs, plus the less-often considered needs of the data publisher. Easy-to-use data and metadata maintenance capabilities, workflow, data hygiene, analytics and automation are essential.

