NYC’s open data legislation: reading between the lines

TL;DR (i.e., the summary)

NYC is about to adopt what some are calling “landmark” and “historic” legislation regarding open data.  Does the hype match the reality?

I offer the analysis below not as a critique of the City Council.  I think they probably tried to negotiate as good a bill as they thought they could achieve.  I offer it more as food for thought for those of us who will be seeking the data that may eventually become available because of the legislation (and for those of us who rely on data that’s currently available that may become less so due to the bill).

Hopefully my concerns represent a worst case scenario.  If the bill’s implementation indeed lives up to the “landmark” status bestowed on its passage, that would be a great thing.

For example, the Council’s committee report on the bill [Word doc] suggested that substantial city data sets such as the Building Information System (BIS) or the Automated City Register Information System (ACRIS) would be made available in open, accessible formats due to the legislation. If that happens, that would be great.  But for each of the handful of examples like that suggested at yesterday’s Council committee meeting, I could offer several more that I believe might escape the requirements of this bill.

My overall sense is that somewhere during the two-plus years the bill has been on the table, the details got in the way of the original vision embodied in this proposal.  And, as they say, the devil is in the details.  If you’re interested in my take on those gory details, please read on.


An important step

The bill is important, in a way. It’s an acknowledgment by the City Council (and the Mayor, if he signs it) that city agencies need to provide public access to data sets online, in a standardized electronic format.

In doing so, it goes a step beyond FOIL — the New York State law since the mid-1970s that has required agencies (including local government) to provide public access to data.  Though FOIL has adapted to the times to some extent — the courts and policymakers now understand that FOIL applies to electronic data as well as printed material — it is still a reactive approach.  You have to submit a FOIL request (and have a good idea of what data you’re requesting) for an agency to respond and give you access.  New York’s Committee on Open Government describes it as “pull” vs. “push”. [PDF]

Some smart agencies have realized that posting data electronically saves money, time, and effort. By posting data online proactively, before the agency even receives a single FOIL letter  (“pushing” it so people don’t have to “pull” it), it avoids having to respond individually to FOIL requests.

So the City Council bill acknowledges that pushing is better than pulling.

Those devilish details

But will the legislation require agencies to post data online?  To some extent, yes.  But how far that goes depends on how it’s interpreted, and how aggressively it’s implemented (and perhaps how strongly the public reacts, since it seems like the only enforcement mechanism is public reaction).

The first substantive part of the bill says that within a year, agencies need to post their data at the city’s online data portal.  But let’s look closely at the language.  Section 23-502(a) says that within a year, agencies don’t need to publish all their data to the portal.  Only “the public data sets that agencies make available on the Internet” need to be included in the portal (emphasis mine).

In other words, if an agency has refused to provide public access to a data set, or perhaps only allows access to that data after you’ve paid a fee and/or signed a license agreement, or otherwise hasn’t already posted the data online — that data is exempt.

Then it gives agencies another loophole.  The next sentence says that even if an agency has a data set online, it doesn’t need to post it on the portal if they “cannot” put it on the portal.  (“Cannot” isn’t defined in the bill.  Does it mean “doesn’t want to”? Does it mean the data’s too complex for some reason?  “Cannot” seems to offer quite a bit of wiggle room.)

The bill further states:

the agency shall report to the department and to the council which public data set or sets that it is unable to make available, the reasons why it cannot do so and the date by which the agency expects that such public data set or sets will be available on the single web portal.

I’m not a lawyer, but it seems to me that if an agency doesn’t want to comply, it just needs to give a reason.  And it needs to give a date by when it will add the data to the portal.  The date could be two years from now, or it could be two decades from now.  That part of the bill doesn’t have a deadline.

Without aggressive support from the top — the Mayor and/or perhaps a new Chief Data Officer position with some teeth — agencies could just take their ball and go home and not play the open data game.  And the public will be the worse for it without much recourse.

Over-reliance on “the portal”

Let’s be optimistic and assume that all city agencies (even the current holdouts – I’m looking at you, City Planning Department & MapPLUTO) decide to post their data online.

The bill doesn’t say, or even mention as an option, that agencies can keep posting the data online at their own websites.  Instead, it has to be posted on “a single web portal that is linked to nyc.gov”.

But I’m not as enthusiastic as I once was for the portal approach (currently implemented here).

  1. Data for APIs, or people?

At first I thought the portal would be so much better than the city’s earlier Datamine site. But the site seems to focus heavily on APIs and web service access to the data, which might be great for programmers and app developers, but not so good for people, like Community Board staff, or reporters, or students, or anyone else who just wants to download the data and work with the files themselves.

  2. Some agency websites are doing a better job

Also, why not allow — even encourage — agencies to continue posting data on their own websites?  I think that, in many instances, the individual agencies are doing a better job than the data portal. The files available for downloading from agency sites such as Finance, City Planning, Buildings, and Health are more up to date, more comprehensive (though still hardly complete), and easier to understand than what I can find on the portal.

I think it would be ok if both approaches existed (portal and individual agency sites). But the way the bill is worded, I think the risk is that agencies are more likely to do only what they have to do or what they’re expected to do.  Since the bill focuses on the portal, I think we may see individual agency data sites wither away, the rationale being why bother with individual sites since they have to post to the portal.  With sites such as City Planning’s Bytes of the Big Apple (which is really great, with the exception of the PLUTO license/fee), I think that could be a big loss for the many people and organizations who have come to rely on the high quality data access that these agency sites provide.  Hopefully I’ll be proven wrong.

  3. The current portal falls far short of a forum for public discussion

The bill requires DoITT to

implement an on-line forum to solicit feedback from the public and to encourage public discussion on open data policies and public data set availability on the web portal.

But if the current portal is the model for this online forum, I’m concerned.

When I access data from the agencies themselves, I can talk with the people directly responsible for creating and maintaining the data I’m seeking. I can have conversations with them to understand the data’s limitations. I can discuss with them how I’m planning to use the data, and if they think my expectations of the data are realistic.

In contrast, the portal requires me to either go through a web form (which I’ve done, and received zero communication in return), or to contact someone who has no identification beyond their name (or some online handle).  Do they work for an agency?  Do they even work for New York City?  I have no idea; the portal provides no information.  So much for a site that’s supposed to be promoting “transparency in government.”

To me, the portal is somewhat analogous to the city’s 311 system and the recent articles about putting the city’s Green Book online.  Though 311 is great in a lot of ways, it has put a wall between the public and individual city agency staff members.  Try finding a specific staffperson’s contact information via nyc.gov, like the New York Times recently did.  It’s almost impossible; you have to communicate through 311. Similarly, the online data portal — if it ends up replacing agency websites as sources for online data access — will make it difficult to locate someone knowledgeable about the data.

This widens the “data gap” — the gap of knowledge between data creators and data users.  In order to know whether a particular data set meets my needs (if I’m creating an app, or even just writing a term paper), sometimes a written description of the data is not enough.  I may need to actually talk with someone about the data set.

But good luck finding that person through the data portal.

And even when people have used the portal to submit online comments, I don’t know if anything ever comes of it.  It looks like only 14 of the 800+ datasets at the portal have comments (sort the list by “Most Comments”).  All of the comments raise important questions about the data.  For example, two people offered comments about the HPD Registration data available through the portal.  They asked “Is there any plan to expand it?” and “Could you help us?”  Both remain unanswered.

Maybe everyone who commented was contacted “offline”, as they say.  Either way, this hardly constitutes a forum for public discussion.  No public interactivity.  No transparency.  No guidance.  It’s no wonder there’s been so little use of the portal’s “Discuss” button (and I use the term “Discuss” loosely).

Public data inventory

Another section of the bill has a nugget of hope.  But the way it’s worded, I’m not too optimistic.

Section 23-506(a) says that within 18 months, DoITT shall present a “compliance plan” to the Mayor, the Council, and the public.  Among other things, the plan must “include a summary description of public data sets under the control of each agency.”

In effect, this “summary description” (if it’s done right) will be the public data inventory that advocates have been pushing for (and which has been required by the NYC Charter since 1989). That’s a good thing. At least now we’ll know what data sets each agency maintains.

Hopefully it’ll be a comprehensive list. I guess the list’s comprehensiveness will be up to DoITT to enforce. (And if the list comes up obviously short, perhaps some enterprising FOILers can point out — very publicly — where the holes are 😉 ).

But that same section of the bill also says that the plan “shall prioritize such public data sets for inclusion on the single web portal on or before December 31, 2018”.  So it still relies solely on the data portal. And it gives the city another 6 years to make the data public. As someone said on Twitter, “sheesh”!

Then there’s another loophole.  The bill allows agencies to avoid meeting even the 2018 deadline by allowing them to

state the reasons why such [public data] set or sets cannot be made available, and, to the extent practicable, the date by which the agency that owns the data believes that it will be available on the single web portal.

“[T]o the extent practicable”?  When the agency “believes” it’ll be available?  Wow.  Those are some loose terms.  If I ran an agency and didn’t want to provide online access to my department’s data, I’d probably feel pretty confident I could continue preventing public access while easily complying with the law.

Where does this all leave us?

It looks like the City Council will pass this law, despite its limitations.  In fact, DoITT was so confident the law will pass, it emailed its February 2012 newsletter on the day the Council’s technology committee voted on the bill (Feb. 28, a day ahead of the expected full Council vote).  Here’s what the newsletter said about Intro 29-A:

“Will be voted on and then passed”?  I guess the full Council vote is pretty much a foregone conclusion.

That leaves us to hope that the bill’s implementation will address the issues I’ve outlined above, and any others that advocates may have identified.  Fingers crossed?

(Disclaimer: my viewpoints on this blog are my own, not necessarily my employer’s.)

Some NYC OpenData improvements – small but important victory!

I noticed today that NYC’s new OpenData site (on the Socrata platform) has made some modest improvements since I blogged about it earlier this month, and since several people have responded to comments from Socrata’s CEO.

In particular, many of the files listed in the Socrata/OpenData site as “GIS” files or “shapefiles” are now actually available for download as shapefiles.  You have to dig a bit to find the download option; it’s not available via the “Export” button. You have to click the “About” button, and then scroll down to the “Attachments” section of the About page.  But in many cases, you’ll now find a zipped file containing a GIS shapefile.  Small but important victory!

The back story

When the OpenData site first launched, I was very concerned because there was no option to actually download most geospatial data sets — you could only access them as spreadsheets or web services via an API.  That’s not very helpful for people who want to work with the actual data using geographic information systems.  And it was a step backward, since many agencies already provide the GIS data for download, and earlier versions of the OpenData site had made the data available for direct download.

It also seemed like it was extra work for the agencies and for us — extra work to convert the data from GIS format into spreadsheets, for example, and then extra work for the public to try to convert the data back into GIS format once they had downloaded a spreadsheet from the OpenData site.  Seems pretty silly.

It also seemed like it was an example of DoITT not understanding the needs of the public — which includes Community Boards, urban planning students, journalists, and many others who routinely use GIS to analyze and visualize data.  Spreadsheets and APIs are nice for app developers — and the “tech community” broadly speaking — but what about the rest of us?

More public access to data, not less

If the city adds the shapefiles as a download option, that’s providing more open access to data, not less.  But by not offering GIS data along with the other formats, the Socrata system seems to be limiting access.  I’d hope that NYC would be as open and flexible and accommodating as possible when it comes to accessing public data.  Socrata’s CEO seems to argue that with the Socrata platform it’s too hard to do that.  If he’s right, maybe we should just stick with a tried and true approach — NYC agency websites already provide direct download of GIS data along with many other formats.

But I know that we can do better.  In fact, Chicago’s open data portal (also powered by Socrata) has offered many GIS datasets for direct download from Day 1.  Actually, Chicago has 159 datasets tagged as “GIS” files, while New York only has 69. What’s up with that, NYC? I thought NYC was the best in everything when it comes to open data?

Still more to be done

Alas, even though we’re talking about a victory here, we can’t pop open the champagne quite yet.  Several of NYC’s data sets via the Socrata site aren’t as current as what you can already get from agency websites.  For example:

  • zoning is current as of August 2011, but you can download more current data (September 2011) from the ever-improving Planning Department’s Bytes of the Big Apple website; and
  • building footprints are older (September 2010) than what you can download from DoITT’s GIS site itself (click through DoITT’s online agreement and you’ll get a buildings database from March 2011).

Also, some data sets described on the Socrata/OpenData site as “shapefiles” are still not available in GIS format.  Some examples:

  • NYC’s landmarks data.  The OpenData site describes this data as a “point shapefile … for use in Geographic Information Systems (GIS).”  But it’s only available from the OpenData site as a spreadsheet (or similar format) or via an API.
  • Waterfront Access Plans.  The OpenData site describes this file as a “polygon shapefile of parklands on the water’s edge in New York City … for mapping all open spaces on the water’s edge in New York City.”  But like the landmarks data, it’s only available as a spreadsheet or via an API.  False advertising, if you ask me.  But if you go to the source (the City Planning Department), the shapefile is there for all to access.  So why is the Socrata/OpenData site any better? I’m still wondering that myself.

And the Socrata/OpenData site still doesn’t provide the kind of meaningful data descriptions (or metadata) that you’ll get from agency websites such as Bytes of the Big Apple or Dept of Finance — data descriptions that are absolutely essential for the public to understand whether the information from NYC OpenData is worth accessing.

But hope springs eternal — someone listened to our concerns about lack of actual geospatial data downloads, maybe they’ll also listen when it comes to everything else. Fingers crossed!

Pretty NYC WiFi map, but not useful beyond that

@nycgov posted a tweet on Friday touting the map of WiFi hotspots on the new NYC OpenData site.  I was impressed the city was trying to get the word out about some of the interesting data sets they’ve made public. It was retweeted, blogged about, etc many many times over during the day.

The map is nice (with little wifi symbols  marking the location of each hotspot).  And it certainly seems to show that there are lots of hotspots throughout the city, especially in Manhattan.

But when I took a close look, I was less than impressed.  Here’s why:

  • No metadata.  The NYC Socrata site has zero information on who created the data, why it was created, when it was created, source(s) for the wifi hotspots, etc.  So if I wanted to use this data in an app, or for analysis, or just to repost on my own website, I’d have no way of confirming the validity of the data or whether it met my needs.  Not very good for a site that’s supposed to be promoting transparency in government.
  • No contact info.  The wifi data profile says that “Cam Caldwell” created the data on Oct. 7, 2011 and uploaded it Oct 10.  But who is Cam?  Does this person work for a city agency?  It says the data was provided by DoITT, but does Cam work at DoITT?
    • If I click the “Contact Data Owner” link I just get a generic message form.  I used the “Contact Data Owner” link for a different data set last week, and still haven’t heard back.  Not even confirmation that my message was received, let alone who received it.  Doesn’t really inspire confidence that I can reach out to someone who knows about the data in order to ask questions about the wifi locations.
  • No links for more information. The “About” page provides a couple of links that seem like they might describe the data, but they don’t.

If I were to use the wifi data for a media story, or to analyze whether my Community Board has more or fewer hotspots than other Boards, or if I wanted to know if the number of hotspots in my area has changed over time, the NYC Socrata site isn’t helpful.

Even looking at the map on its own, it’s not very helpful.  Without knowing if the list of hotspots is comprehensive (does it include the latest hotspots in NYC parks? does it include the new hotspots at MTA subway stations? etc) or up to date (the Socrata site says the list of wifi sites is “updated as needed” – what does that mean?), I have zero confidence in using the data beyond just a pretty picture.

I’m sure if I clicked the “Contact Data Owner” link, eventually I’d get answers to these questions. But that’s not the point.  The point is that the new NYC OpenData site bills itself as a platform to facilitate how “public information can be used in meaningful ways.”  But if the wifi data is any guide, the OpenData site makes it almost impossible to meaningfully do anything with the data.

The wifi data is another example of how I think NYC’s implementation of the new Socrata platform is a step backwards.  Other NYC websites that provide access to public data — the City Planning Department’s Bytes of the Big Apple site as well as agency-specific sites from Finance, Buildings, HPD, and others — all provide detailed metadata, data “dictionaries”, and other descriptive information about available data files.  This contextual and descriptive information actually makes these data sets useful and meaningful, inviting the public to become informed consumers and repurposers of the city’s data.

The Socrata platform, in and of itself, seems great.  But NYC hasn’t done a very good job at all of putting it to use.  #opendata #fail

NYC’s new OpenData website: soars and falters all at once

UPDATE (10/13/11)

This evening I received a call from NYC DoITT.   They were mainly calling to tell me that they changed the official rules for BigApps 3.0.  Yesterday the rules said that no new data would be added to the OpenData site until after the BigApps competition.  As I said in my blog, why wait?  But DoITT saw that and agreed.  So now that clause has been removed from the rules (see section D.1).  DoITT says that they agree the data should be accessible whether there’s a competition in effect or not.  That’s great news!  I’m looking forward to more dialogue on the other issues I’ve raised below.

___________________________________________

ORIGINAL POST

New York City yesterday announced its new version of what had been called its “Datamine” website, a single online point of entry to access the city’s digital data holdings.

I’ve critiqued the Datamine project before, but I was heartened by the city’s choice to use the Socrata platform to upgrade Datamine. As I wrote a couple of months ago:

NYC’s Datamine was an improvement in some ways over earlier opendata efforts in New York. Now that it’s been around for two years, I think it’s fair to say that Datamine is clunky at best. For me, I can’t wait for it to be replaced by something better. I’m looking forward to the NYC/Socrata roll out.

Yesterday’s announcement came with great fanfare: 230 new data sets! (so they say), BigApps 3.0!, cash prizes!, etc.

But is “NYC OpenData” any better than Datamine?

After digging into the site for several hours last night and today, I’d have to say yes and no. It has some great stuff with great promise, but it still falls flat in some key areas. I look forward to using it for the APIs, but for the raw data I’ll go back to the individual agencies that in many cases are doing a better job of providing access to the data.  Overall the city has come a long way with open data, but I still think the city’s concept of data-as-economic-engine is misguided.  More on that below.

The good

Socrata’s platform is impressive. I’ve blogged about it before, but it’s worth summarizing some of the high points:

  • You can immediately preview the data in your browser (no downloading needed just to see what it contains). And you can view more details about each row in the file — very helpful if you’re interested in one particular aspect of the data.
  • You can visualize  the data in multiple ways — using an interactive map option built into the platform or using one of 9 different chart options.
  • If you want to download/export  a data set, they give you at least 8 formats for extracting/exporting.
  • Short links and “perma” links are available to each data set.
  • There’s a “Discuss”  option where anyone can attach notes and commentary for each data set.  It’s user-generated metadata — you can immediately see, for example, if anyone else has commented about the data’s quality, or completeness, or how up-to-date it is.

The big news with this new approach is the availability of an API for programmatic access for each data set in the Socrata system.  On its face, the APIs look great, and the city deserves kudos for implementing them.  Socrata has developed a template for developers to hook into the data — either row by row, selected queries, or to view metadata — and the template also provides data publishers with guidance on how to structure their data for automated consumption.  And, it seems that DoITT has created web services for the mapped data sets, which is a big step forward.
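To make the three access patterns concrete (row by row, selected queries, and metadata), here is a minimal sketch of how request URLs for a Socrata-hosted data set might be put together. Treat it as an illustration under assumptions, not a description of NYC's actual setup: the domain `data.example.org`, the dataset ID `abcd-1234`, and the `borough` column are all hypothetical, and the `/resource/` and `/api/views/` paths and `$`-prefixed parameters follow Socrata's published SODA conventions, which may differ from what a given portal or platform version exposes. The sketch only builds URLs; it does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def rows_url(domain, dataset_id, where=None, limit=100, offset=0):
    """Build a URL for fetching rows, optionally filtered, one page at a time."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        # A filter expression turns a plain row dump into a "selected query".
        params["$where"] = where
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


def metadata_url(domain, dataset_id):
    """Build a URL for the data set's descriptive metadata."""
    return f"https://{domain}/api/views/{dataset_id}.json"


# Hypothetical domain, dataset ID, and column name, for illustration only.
first_page = rows_url("data.example.org", "abcd-1234")
filtered = rows_url("data.example.org", "abcd-1234", where="borough = 'MN'", limit=50)
about = metadata_url("data.example.org", "abcd-1234")

for url in (first_page, filtered, about):
    print(url)
```

Paging with a limit and an offset is the "row by row" case; adding a filter is the "selected query" case. Either way, this is work a programmer can do and a Community Board staffer probably can't, which is the point of the sections that follow.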

There are other improvements with specific data sets, such as:

  • It looks like the map data for NYC park boundaries is fixed — I posted a detailed review last year about how the parks data via Datamine was basically impossible to use.  I had to scrape the NYC Parks website to convert it to a useful format. But now the park names are included with the park IDs in the same file. (However, this improvement is tempered by the fact that I can view the map of parks on NYC’s Socrata website, but I can’t download the data in a mapped format. I discuss that in more detail below.)

There are some interesting new data sets.  Two things that caught my eye are:

  • School zones are included in the data, which is something I had urged the city to include [PDF] when the BigApps competition was first announced in 2009.  (School zones are the key determinant as to where your child can attend public elementary school, rather than the administrative school districts.)  But the earlier version of Datamine included school zone boundaries, so this isn’t really new.
  • HPD Registrations.  Unfortunately the data dictionary accompanying this file can be cryptic, so I couldn’t easily decipher exactly what the file includes. But it seems to be a list of almost 140,000 buildings in the city registered as “multiple dwellings” along with each building’s landlord/owner, managing agent(s), and building details.  Should come in pretty handy for anyone interested in the landlord landscape in New York.

Here’s an example of why the data dictionary is not very helpful – the excerpt below is trying to tell us what the “REG-INDV-HM-UNIT-NO” field means:

Um, what?

I thought it was also intriguing (in an inside baseball kind of way) that the interactive maps used at the NYC Socrata site to show mapped views of the data are from ESRI.  And the API/web services provided for the mapped data files are ESRI-based.  DoITT’s GIS unit has made a point of using non-ESRI technology for its interactive maps (Citymap, Scout, ZoLA, etc). But the GIS web services for Socrata all come from DoITT.  Wonder what’s happening there.

The not-so-good

The Mayor’s news release about the new Socrata site proclaims that more than 230 new data sets are included. We don’t get any details about which ones; the release simply says that:

Examples of this new data include a directory of HHC Facilities; electricity, gas and steam consumption available by zip code; and school attendance and report statistics.

But I looked pretty closely at what new data sets I could find, and I was hard pressed to identify more than a few dozen.

Examples of old data masquerading as new simply because it’s available through the new Socrata site include many of the files from NYC’s Dept of Finance, such as:

  • Condominium comparable rental income listings (38 individual datasets);
  • Cooperative comparable rental income listings  (40 datasets); and
  • Summary of Neighborhood (Property) Sales (21 datasets).

That’s almost 100 data sets right there, close to half the number the city says are newly available.  But each of these has been online, for free download, at Finance’s website for several years.  This page notes that coop sales information has been available since 2006, and Finance started making the data available for batch download a couple of years after that.  The Neighborhood Sales data was put online a couple of years ago.  And Finance’s website has more thorough information about the data sets and how to use them than the Socrata site.

Other not-so-new examples include:

  • Street centerlines.  These are from DoITT circa 2009. In contrast, the City Planning Department “LION” file at DCP’s website is from September 2010, and is updated regularly.
  • Building perimeters. From DoITT circa 2010.  But DoITT has a more recent file at their website for direct download (click through the online agreement and you’ll find building footprints from March 2011).
  • Coastal boundaries. From City Planning, but this was posted on the Bytes of the Big Apple site last month.  Great data set, but not new.
  • Campaign contributions. From the NYC Campaign Finance Board.  The data is current (covering the 2013 election cycle), but the files are already available in batch format and via a searchable website from CFB.
  • Landmarks data. There are multiple, conflicting data sets at the NYC Socrata site regarding landmarks.  For example, one data set of “NYC Landmarks” is from 2009, another (called “LPC Landmark Points”) is from 2010.  Either way, there have been several new landmarks and historic districts designated since then by the Landmarks Commission.

Even if there was only one new data set in the new Socrata site, that’s better than nothing. But there’s so much data maintained by city agencies that is still not easily, publicly accessible.  My blog post when BigApps was first announced in 2009 has a listing of some key data files that still haven’t seen the light of day.

The city should be doing a better job, especially since there’s been so much pressure on them to improve their open data policies, they have an avowed policy of doing so, and they’re also required by a state law (FOIL) to do so. Frustrating.

One of my biggest and longest standing gripes is about property data.  There are a number of property-related files on the NYC Socrata website.  But nothing that allows us to come close to the City Planning Department’s “MapPLUTO” dataset.  The city still charges a fee (up to $3,000 per year) with a restrictive license agreement in order to access the PLUTO data, a mapped file of all properties in NYC with a wealth of information about each one (zoning, ownership, building heights, land use categories, assessed value, etc).  It’s an essential data set for anyone trying to understand real estate, urban planning, neighborhood change, and more in the city.

When will City Planning get it? They’ve done such a great job of making other data sets available — files they used to charge for but now provide for free, and in better formats, with great metadata, and updated frequently.  The agency obviously spends a lot of time preparing these other data sets that are freely available, so I don’t buy the argument that the PLUTO fee covers their “costs” of doing extra work to put PLUTO together.  I just don’t understand.  And property data is so incredibly useful in NYC — certainly to the big real estate players, but I’m not concerned about them.  If it were free for everyone, at least we’d have a chance at a level playing field — helping “the little guy” do property analysis and mapping so he/she can analyze land use, understand policy implications, etc.

Data for people, not just machines

Data access — at least in this first iteration of the new Socrata site — seems to be weighted toward APIs, and therefore app developers. I understand the value of the API approach — I’ve developed apps myself, and at CUNY we have online sites that can definitely make use of the APIs. And I was kind of amazed that DoITT opened these up.  So the APIs are good, and perhaps they’re worth the effort to create and maintain a one-stop-shop like NYC Socrata.

But for the average user (someone at a Community Board, or a local media outlet, or a City Councilmember’s office), the city’s implementation of the Socrata system seems to work against them.

For example, with one or two exceptions I wasn’t able to download any mapped data sets from NYC Socrata.  Many files (45 by my count) are described as “GIS datasets”, and they’re obviously in ESRI’s “shapefile” format to begin with, but the “Export” option only provides flat files (CSV, JSON, XLS, XML for example), and not even the now-ubiquitous KML format (used by Google and many others).

If I click the API link for these data sets, this enables me to view the data as map layers in my desktop GIS application.  But I can’t extract any actual data from these links in order to work with it on my own.  The screenshot below (from ESRI’s ArcCatalog application) seems promising, but the inability to download the mapped data itself is very limiting.

It’d be easy enough (I’m assuming) to just add shapefiles to the list of Socrata’s data export formats. The shapefile format (.SHP) is already basically an open one (all the major open source GIS packages read it), so why force GIS users to do extra work to access GIS data?  And why have DoITT go through extra work converting from SHP to something else, just to have the user convert it back again? For “point” locations this isn’t a big deal; it’s easy enough to convert latitude/longitude coordinates into a mapped data set.  But this isn’t straightforward at all for polygons (district boundaries, for example) or lines (streets, transit routes, etc.). I’m not saying don’t provide the data in the other formats, just add SHP to the list where appropriate.  (Some GIS datasets are available as GIS downloads: school zones, for example. But this is an exception, as far as I can tell.)
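To illustrate why point data is the easy case, here is a minimal sketch in Python that turns a flat CSV export into GeoJSON points, which most GIS packages can open. It is only an illustration: the function name, the column names ("latitude", "longitude", "name") and the sample rows are my assumptions, not the actual layout of any file on the NYC Socrata site.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_text, lat_field="latitude", lon_field="longitude"):
    """Convert CSV rows with lat/lon columns into a GeoJSON FeatureCollection.

    The default column names are hypothetical; pass the real ones for your file.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        # Every column other than the coordinates becomes an attribute.
        properties = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample data: one usable row and one row with blank coordinates.
sample = "name,latitude,longitude\nSite A,40.7128,-74.0060\nSite B,,\n"
collection = csv_points_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Nothing comparable works for polygons or lines, because a flat file with one row per record has no obvious place to put the ordered list of vertices that defines each shape. That is the gap a native GIS export would fill.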

Indeed, not having GIS-ready formats is a step backward. If I visit the City Planning Department’s “Bytes of the Big Apple” website, I can download a wealth of files in GIS format, and several of them are updated regularly. It’s great. Hopefully the NYC OpenData site doesn’t supplant the individual agency sites. For now, they’re better for me, and I’d imagine they’re better for many other users.

And having the raw data, rather than just API access, gives users more flexibility.  For example, during the preparation for Hurricane Irene, several organizations downloaded NYC Datamine files in GIS format to create interactive maps of evacuation zones and evacuation sites.  (And these groups helped the city in a big way because the city’s own maps and website were down, making it difficult if not impossible to get essential information from NYC.gov.)  But the city changed several of the evacuation sites just a day or two before the storm was going to hit.  If the outside organizations didn’t have the raw data that we could update ourselves, our presentation of the evacuation sites would’ve been incorrect and misleading.  I wouldn’t want to rely on the city updating its API in a crisis situation like that, given how rocky the city’s digital response was to the storm itself.

Tying open data to app competitions & economic growth is the wrong approach

(Note: my concern here still stands, but the city has modified its position a bit, which is great.  See the 10/13 Update above.)

I think the real issue here is that the city’s open data efforts are being driven more by the desire to use data access as a way to leverage economic development, and less about true government transparency.

For example, as with the first two BigApps competitions, no new data files will be added to the Socrata site until the latest BigApps competition is over (see section D.1 at the official rules).  Why wait?  Why should app developers get preference?  What about the rest of us? Is NYC providing data just so app developers can do free work for the city, and so the city can make a news splash about open data? Open data should be open 24/7 — and should be updated on a regular basis — not just when it’s convenient for the city and for developers.

Next steps

I understand that the new NYC Socrata site is a work in progress, and will almost certainly be improved going forward.  But for now, although it includes lots of data, much of this has already been available elsewhere.  The APIs are intriguing, but I hope they don’t preclude other ways for people rather than machines or apps to access the data.

At this point, with few exceptions I still would prefer to go to the individual agency websites (or even talk to agency staff and request the files via email, or even via disks & snail mail!) to get the data — from what I’ve seen so far, chances are it’ll be more timely, in better quality, and I’ll have better access to metadata/explanations of the files.

I’m even wondering if, instead of a Socrata-like site, it might be better to encourage the agencies directly responsible for creating the data to continue their efforts to provide public access, and to have them engage with people using the data so they’d see the benefits of open data (and/or realize that it’s not so bad to provide access to their files to the broad public in easily accessible ways).  At the least, the new NYC Socrata site shouldn’t prevent this agency-specific work from being done.

I’ve already had a good, late-night exchange on Twitter with DoITT on some of these issues. I’ll be submitting feedback directly at the Socrata website.  And hopefully the dialogue will continue.