TL;DR (i.e., the summary)
NYC is about to adopt what some are calling “landmark” and “historic” legislation regarding open data. Does the hype match the reality?
I offer the analysis below not as a critique of the City Council. I think they probably tried to negotiate as good a bill as they thought they could achieve. I offer it more as food for thought for those of us who will be seeking the data that may eventually become available because of the legislation (and for those of us who rely on data that’s currently available that may become less so due to the bill).
Hopefully my concerns represent a worst case scenario. If the bill’s implementation indeed lives up to the “landmark” status bestowed on its passage, that would be a great thing.
For example, the Council’s committee report on the bill [Word doc] suggested that substantial city data sets such as the Building Information System (BIS) or the Automated City Register Information System (ACRIS) would be made available in open, accessible formats due to the legislation. If that happens, that would be great. But for each of the handful of examples like that suggested at yesterday’s Council committee meeting, I could offer several more that I believe might escape the requirements of this bill.
My overall sense is that somewhere during the two-plus years the bill has been on the table, the details got in the way of the original vision embodied in this proposal. And, as they say, the devil is in the details. If you’re interested in my take on those gory details, please read on.
An important step
The bill is important, in a way. It’s an acknowledgment by the City Council (and the Mayor, if he signs it) that city agencies need to provide public access to data sets online, in a standardized electronic format.
In doing so, it goes a step beyond FOIL, the New York State law that since the mid-1970s has required agencies (including local governments) to provide public access to data. FOIL has adapted to the times to some extent (the courts and policymakers now understand that it applies to electronic data as well as printed material), but it is still a reactive approach. You have to submit a FOIL request (and have a good idea of what data you’re requesting) for an agency to respond and give you access. New York’s Committee on Open Government describes it as “pull” vs. “push”. [PDF]
Some smart agencies have realized that posting data electronically saves money, time, and effort. By posting data online proactively, before the agency even receives a single FOIL letter (“pushing” it so people don’t have to “pull” it), it avoids having to respond individually to FOIL requests.
So the City Council bill acknowledges that pushing is better than pulling.
Those devilish details
But will the legislation require agencies to post data online? To some extent, yes. But how far that goes depends on how it’s interpreted, and how aggressively it’s implemented (and perhaps how strongly the public reacts, since it seems like the only enforcement mechanism is public reaction).
The first substantive part of the bill says that within a year, agencies need to post their data at the city’s online data portal. But let’s look closely at the language. Section 23-502(a) does not require agencies to publish all their data to the portal within that year. Only “the public data sets that agencies make available on the Internet” need to be included in the portal (emphasis mine).
In other words, if an agency has refused to provide public access to a data set, or perhaps only allows access to that data after you’ve paid a fee and/or signed a license agreement, or otherwise hasn’t already posted the data online — that data is exempt.
Then it gives agencies another loophole. The next sentence says that even if an agency has a data set online, the agency doesn’t need to post it on the portal if it “cannot” do so. (“Cannot” isn’t defined in the bill. Does it mean “doesn’t want to”? Does it mean the data’s too complex for some reason? “Cannot” seems to offer quite a bit of wiggle room.)
The bill further states:
the agency shall report to the department and to the council which public data set or sets that it is unable to make available, the reasons why it cannot do so and the date by which the agency expects that such public data set or sets will be available on the single web portal.
I’m not a lawyer, but it seems to me that if an agency doesn’t want to comply, it just needs to give a reason. And it needs to give a date by which it will add the data to the portal. The date could be two years from now, or it could be two decades from now. That part of the bill doesn’t have a deadline.
Without aggressive support from the top — the Mayor and/or perhaps a new Chief Data Officer position with some teeth — agencies could just take their ball and go home and not play the open data game. And the public will be the worse for it without much recourse.
Over-reliance on “the portal”
Let’s be optimistic and assume that all city agencies (even the current holdouts – I’m looking at you, City Planning Department & MapPLUTO) decide to post their data online.
The bill doesn’t say, or even mention as an option, that agencies can keep posting the data online at their own websites. Instead, it has to be posted on “a single web portal that is linked to nyc.gov”.
But I’m not as enthusiastic as I once was for the portal approach (currently implemented here).
- Data for APIs, or people?
At first I thought the portal would be so much better than the city’s earlier Datamine site. But the site seems to focus heavily on APIs and web service access to the data, which might be great for programmers and app developers, but not so good for people, like Community Board staff, or reporters, or students, or anyone else who just wants to download the data and work with the files themselves.
- Some agency websites are doing a better job
Also, why not allow — even encourage — agencies to continue posting data on their own websites? I think that, in many instances, the individual agencies are doing a better job than the data portal. The files available for downloading from agency sites such as Finance, City Planning, Buildings, and Health are more up to date, more comprehensive (though still hardly complete), and easier to understand than what I can find on the portal.
I think it would be OK if both approaches existed (portal and individual agency sites). But the way the bill is worded, I think the risk is that agencies will do only what they have to do or what they’re expected to do. Since the bill focuses on the portal, I think we may see individual agency data sites wither away, the rationale being: why bother with individual sites when the data has to be posted to the portal anyway? With sites such as City Planning’s Bytes of the Big Apple (which is really great, with the exception of the PLUTO license/fee), I think that could be a big loss for the many people and organizations who have come to rely on the high quality data access that these agency sites provide. Hopefully I’ll be proven wrong.
- The current portal falls far short of a forum for public discussion
The bill requires DoITT to
implement an on-line forum to solicit feedback from the public and to encourage public discussion on open data policies and public data set availability on the web portal.
But if the current portal is the model for this online forum, I’m concerned.
When I access data from the agencies themselves, I can talk with the people directly responsible for creating and maintaining the data I’m seeking. I can have conversations with them to understand the data’s limitations. I can discuss with them how I’m planning to use the data, and if they think my expectations of the data are realistic.
In contrast, the portal requires me to either go through a web form (which I’ve done, and received zero communication in return), or to contact someone who has no identification beyond their name (or some online handle). Do they work for an agency? Do they even work for New York City? I have no idea; the portal provides no information. So much for a site that’s supposed to be promoting “transparency in government.”
To me, the portal is somewhat analogous to the city’s 311 system and the recent articles about putting the city’s Green Book online. Though 311 is great in a lot of ways, it has put a wall between the public and individual city agency staff members. Try finding a specific staffperson’s contact information via nyc.gov, like the New York Times recently did. It’s almost impossible; you have to communicate through 311. Similarly, the online data portal — if it ends up replacing agency websites as sources for online data access — will make it difficult to locate someone knowledgeable about the data.
This widens the “data gap” — the gap of knowledge between data creators and data users. In order to know whether a particular data set meets my needs (if I’m creating an app, or even just writing a term paper), sometimes a written description of the data is not enough. I may need to actually talk with someone about the data set.
But good luck finding that person through the data portal.
And even when people have used the portal to submit online comments, I don’t know if anything ever comes of it. It looks like only 14 of the 800+ datasets at the portal have comments (sort the list by “Most Comments”). All of the comments raise important questions about the data. For example, two people offered comments about the HPD Registration data available through the portal. They asked “Is there any plan to expand it?” and “Could you help us?” Both remain unanswered.
Maybe everyone who commented was contacted “offline”, as they say. Either way, this hardly constitutes a forum for public discussion. No public interactivity. No transparency. No guidance. It’s no wonder there’s been so little use of the portal’s “Discuss” button (and I use the term loosely).
Public data inventory
Another section of the bill has a nugget of hope. But the way it’s worded, I’m not too optimistic.
Section 23-506(a) says that within 18 months, DoITT shall present a “compliance plan” to the Mayor, the Council, and the public. Among other things, the plan must “include a summary description of public data sets under the control of each agency.”
In effect, this “summary description” (if it’s done right) will be the public data inventory that advocates have been pushing for (and which has been required by the NYC Charter since 1989). That’s a good thing. At least now we’ll know what data sets each agency maintains.
Hopefully it’ll be a comprehensive list. I guess the list’s comprehensiveness will be up to DoITT to enforce. (And if the list comes up obviously short, perhaps some enterprising FOILers can point out — very publicly — where the holes are 😉 ).
But that same section of the bill also says that the plan “shall prioritize such public data sets for inclusion on the single web portal on or before December 31, 2018”. So it still relies solely on the data portal. And it gives the city another 6 years to make the data public. As someone said on Twitter, “sheesh”!
Then there’s another loophole. The bill allows agencies to avoid meeting even the 2018 deadline by allowing them to
state the reasons why such [public data] set or sets cannot be made available, and, to the extent practicable, the date by which the agency that owns the data believes that it will be available on the single web portal.
“[T]o the extent practicable”? When the agency “believes” it’ll be available? Wow. Those are some loose terms. If I ran an agency and didn’t want to provide online access to my department’s data, I’d probably feel pretty confident I could continue preventing public access while easily complying with the law.
Where does this all leave us?
It looks like the City Council will pass this law, despite its limitations. In fact, DoITT was so confident the law would pass that it emailed its February 2012 newsletter on the day the Council’s technology committee voted on the bill (Feb. 28, a day ahead of the expected full Council vote). Here’s what the newsletter said about Intro 29-A:
“Will be voted on and then passed”? I guess the full Council vote is pretty much a foregone conclusion.
That leaves us to hope that the bill’s implementation will address the issues I’ve outlined above, and any others that advocates may have identified. Fingers crossed?
(Disclaimer: my viewpoints on this blog are my own, not necessarily my employer’s.)