@nycgov posted a tweet on Friday touting the map of WiFi hotspots on the new NYC OpenData site.  I was impressed the city was trying to get the word out about some of the interesting data sets they’ve made public. It was retweeted, blogged about, etc., many times over during the day.

The map is nice (with little wifi symbols  marking the location of each hotspot).  And it certainly seems to show that there are lots of hotspots throughout the city, especially in Manhattan.

But when I took a close look, I was less than impressed.  Here’s why:

  • No metadata.  The NYC Socrata site has zero information on who created the data, why it was created, when it was created, source(s) for the wifi hotspots, etc.  So if I wanted to use this data in an app, or for analysis, or just to repost on my own website, I’d have no way of confirming the validity of the data or whether it met my needs.  Not very good for a site that’s supposed to be promoting transparency in government.
  • No contact info.  The wifi data profile says that “Cam Caldwell” created the data on Oct. 7, 2011 and uploaded it Oct 10.  But who is Cam?  Does this person work for a city agency?  It says the data was provided by DoITT, but does Cam work at DoITT?
    • If I click the “Contact Data Owner” link I just get a generic message form.  I used the “Contact Data Owner” link for a different data set last week, and still haven’t heard back.  Not even confirmation that my message was received, let alone who received it.  Doesn’t really inspire confidence that I can reach out to someone who knows about the data in order to ask questions about the wifi locations.
  • No links for more information. The “About” page provides a couple of links that seem like they might describe the data, but they don’t.

If I were to use the wifi data for a media story, or to analyze whether my Community Board has more or fewer hotspots than other Boards, or if I wanted to know if the number of hotspots in my area has changed over time, the NYC Socrata site isn’t helpful.

Even on its own, the map isn’t very helpful.  Without knowing if the list of hotspots is comprehensive (does it include the latest hotspots in NYC parks? does it include the new hotspots at MTA subway stations? etc.) or up to date (the Socrata site says the list of wifi sites is “updated as needed” – what does that mean?), I have zero confidence in using the data as anything beyond a pretty picture.

I’m sure if I clicked the “Contact Data Owner” link, eventually I’d get answers to these questions. But that’s not the point.  The point is that the new NYC OpenData site bills itself as a platform to facilitate how “public information can be used in meaningful ways.”  But if the wifi data is any guide, the OpenData site makes it almost impossible to meaningfully do anything with the data.

The wifi data is another example of how I think NYC’s implementation of the new Socrata platform is a step backwards.  Other NYC websites that provide access to public data — the City Planning Department’s Bytes of the Big Apple site as well as agency-specific sites from Finance, Buildings, HPD, and others — all provide detailed metadata, data “dictionaries”, and other descriptive information about available data files.  This contextual and descriptive information actually makes these data sets useful and meaningful, inviting the public to become informed consumers and repurposers of the city’s data.

The Socrata platform, in and of itself, seems great.  But NYC hasn’t done a very good job at all of putting it to use.  #opendata #fail

11 Responses

  1. I’ve not seen the city’s policy on releasing data, but last week, at an event we organized that explored governance systems for the .nyc TLD, David Bollier, a noted authority on the commons, outlined the policy of Linz, Austria. Their policy, developed as a “practical economic matter,” provides the following reasons for releasing data:

    o reduce costs
    o avoid dependency on outside vendors
    o promote local initiatives
    o strengthen the economy
    o create value
    o establish transparency
    o and legal certainty

    See more on the meeting at


    Tom Lowenhaupt

  2. There are inaccuracies in your post as well as what seems to be a deliberate attempt to mislead and misinform.

    For example, you have a bold bullet indicating the wifi hotspot map has no metadata. That’s misleading and inaccurate. The dataset has more than 30 populated metadata fields. Simply click on the [About] button to see a full page of metadata. These may not be the metadata fields you want or you may have issues with their quality, but to say there is no metadata is not accurate.

    As you discovered, there is a “Contact the Data Owner” link. You didn’t click on it, but it’s a simple form that proxies an email to the actual person responsible for maintaining the data. That’s there mostly for legacy communication reasons. Some people still like to contact somebody when they have a question. But the “web 2.0” way to do this is to ask your questions and provide your feedback in context, right alongside the data. That’s why the NYC Open Data Platform has a [Discuss] button which opens a discussion pane next to the data. The data owners can not only answer your question, but share that answer with anyone who visits the data after you. Have comments about the data quality? Post your comments so someone can address them. Together, we’re socially enriching public data, making it better for everyone.

    I don’t think you have adequately considered the alternative to this map and reported on the map in that context. Many cities have a “list” of wifi hotspots. It’s merely a table that provides no geographic context whatsoever. Not only does the NYC map show geographic context, it also visually presents something that is hard for many less sophisticated data lovers to immediately grasp in tabular data – density. It’s very quick and easy to see that the density of wifi locations is higher in Manhattan than in other areas. In your opening paragraph you recognized that Manhattan has a lot of wifi hotspots. The UI did its most important job: it communicated information without getting in your way at all.

    While you personally might be able to download a shapefile and work with it, many others are excluded from even participating in the Open Data discussion if that’s all that’s offered. Before we cast out the new NYC Open Data Platform, let’s first recognize its new strengths and then let’s agree to work together to improve on its weaknesses. I can assure you that you have Socrata’s full commitment. I know our new colleagues in NYC government are eager to provide citizens with the data and metadata that’s valuable to New Yorkers. So the only outstanding commitment we need is from you – the citizen. Will you help us by being part of the solution?

    Kevin Merritt
    Founder & CEO of Socrata
    Champion for Open Data for All

    • Those are some strong accusations about misleading and misinforming. Hopefully you can do a better job delivering public data to the public, rather than casting aspersions on my legitimate concerns.

      Your claim about metadata is interesting. The wifi hotspots file does indeed have “metadata.” But the metadata is for things such as “symbol.height” and “symbol.xoffset” – nothing of real use to anyone other than the most sophisticated GIS technician. So if you can tell me what the difference is between useless metadata and no metadata, then I’ll stand corrected.

      I think it’s nice that Socrata provides the opportunity to comment on individual data elements. But my issues didn’t have anything to do with individual data elements. I did post my comments so someone could address them. Unfortunately, even your response above doesn’t really address my issues, so I don’t see how using your “web 2.0” approach will be any better.

      You seem to think that I only want the ability to download a shapefile. That’s an oddly limited — and perhaps purposeful — misreading of my concern. I love the idea that Socrata provides all sorts of data formats for downloading. And the APIs/web services are great, as I’ve noted in my posts. (Maybe you missed those parts?) But what I don’t understand is why Socrata can’t also provide direct downloading of the original GIS data that was then reformatted into all those other file types (reformatting it was more work for you, and restoring it to the original format will be more work for us). I don’t want to preclude access to anyone. But if the original data was in a shapefile, why not provide that format too? (And hopefully provide some real metadata, and an actual city agency staff person as a contact so I can talk to a real person about the data — instead of relying on some web 2.0 black box.)

      And I know it’s possible for the Socrata system to do that: for example, Chicago’s Socrata website includes links to all sorts of shapefiles. And I see that as of yesterday, the data contact for the NYC WiFi hotspots posted a shapefile of the wifi data. That’s great! Thank you. But why not do that for all the other GIS data sets that so many people are eager to download?

      It’s pretty silly to criticize a request like that. I’m asking for more open access to data, not less. But by not offering GIS data along with the other formats, your system is limiting access. NYC should actually live up to its hype of being the “number one digital city” and be as open and flexible and accommodating as possible when it comes to accessing public data. But you seem to be arguing that with Socrata it’s too hard to do that. That’s kinda lame. Maybe we should just stick with a tried and true approach — NYC agency websites already provide direct download of GIS data along with many other formats. We’re doing just great without Socrata, so why does the city have to conform to your restrictive approach? For my money, individual city agencies are already doing a better job than you are. Though Socrata has some great potential (as I’ve already pointed out), in this case you’ve set NYC back a couple of steps. That’s not progress. That doesn’t further the cause of Open Data. I don’t get it.

      And I really wish you wouldn’t question my commitment to the “open data movement” with trite comments about wishing I was part of the solution. I’m not getting paid to advocate for open data. I’ve been doing this with passion for more than two decades, and I will call them like I see them.

    • I’m not really sure how you start a civil conversation by accusing someone of “a deliberate attempt to mislead and misinform.”

      Yes, metadata exists; there’s also a lot missing from it.

      Any New Yorker with a wireless connection knows that there are hotspots all over the city, and that some of them are fee based. I don’t have to be a rocket scientist to guess this map displays “Location of wifi hotspots in the city with basic descriptive information,” but the larger questions about whether these are private or public, how they were collected … those aren’t answered. The only substantive metadata is buried: the data was provided by DoITT. So there’s a city agency in here somewhere. Does Cam Caldwell work for that agency? Should I report errors in the data to him? Otherwise, where should errors be reported? Can I register a new data point with DoITT? Or does Socrata own this data set now? Who should I report errors to? DoITT or Socrata? That matters. Not just for the shelf life of this map.

      Open Data is about more (way more) than telling customer-citizens where to find things. New York needs access to data so that citizens and journalists and advocates can participate meaningfully in policy debates and decision making. If I want to know where the nearest hot-spot to my dentist’s office is, the map has me covered. If I want to draw any meaningful conclusions about the potential for a wireless grid to address digital divide concerns, I have a thousand more questions. If I want to look at where the city is supporting free internet access, or whether there are parts of the city that are underserved by library wireless … no dice. I can’t even filter the fee based access points off — if I download a CSV I have to manually tweak the data to distinguish free from fee-based sites and even then I have no idea whether these are provided by the city or not.

      I’m not sure how Socrata views their role in improving transparency and access to information in New York City, but I know that Steven has been a tireless advocate for better access to public data. The Oasis maps are unparalleled in the useful and meaningful information they provide about access to recreational space, green space and more in New York City.

  3. I have to agree with Steven here. Our vision of Open Data is not that a person can look at a map full of symbols and say, “Hey! There are a lot of wifi hotspots in NYC!” but rather that they can use the data for something useful. Open Data is much more than just showing the data, and contests like NYC BigApps, I think, make that clear.

    Now, I understand that here we are talking about limitations of the Socrata system and issues with NYC’s workflow for publishing data on it. Socrata obscures the data by trying to provide some value through automatic visualizations. So, my recommendation: make it much more obvious WHERE the original data is, with something at the same level as Discuss, More views, etc. We are talking about data; let’s get the file before anything else!

    The second issue is that NYC is not providing the original data for every dataset, and sometimes they provide it by publishing it on an ArcGIS server. Well, those things are just wrong and they should stop doing them. That does not allow me to use the data, and therefore it is not Open Data.

    In general, I am fine with Socrata trying to provide some value on top of the data, but please, as an add-on. The fundamental job of NYC Open Data is to help me find data and facilitate the download; the other stuff is just a bonus.

    A more fundamental issue here is why an Open Data site decided to use a non-Open Source platform for such a fundamental piece of infrastructure. I would be willing to contribute to improving Socrata, but why? So that it can be sold over and over to more taxpayers? NYC Open Data should be Open Source, and then we would not only make comments, but actually provide patches to fix the issues we find.

  4. How does NYC Open Data/Socrata explain the Heating Oil data? There are 700+ records without an address in the web table, and then when you download the data, the address and lat/lng’s are concatenated into one field. And is there a link for the metadata behind the Heating Oil data? It’s not easy for me to see where that is.

    @Kevin Merritt – I don’t think anyone is trying to discredit the attempts at open data and opengov; it’s just that there is little dialogue between the city and the people who use the data daily, and we have noticed a decline in the data quality. Not too long ago, I used to read all about FGDC and all this hoopla about data standards. I used to work for a guy who, prior to joining Esri, was the champion/evangelist of Geodatabases and their domain design, so that any of the local governments he worked with could not ruin their data by failing to fill out new records/forms correctly. Now, I don’t want to digress on issues with Geodatabases and Esri in general, but there was a solid attempt to create logical systems that protected data from getting too dirty. And it seems like Open Data, and even some data on Data Mine, has led us a few steps back in data quality.

    And as the old saying goes, “garbage in, garbage out.” There’s clearly an eagerness in the NYC community to help clean data and develop data standards. So let us help.

  5. Also, @Kevin Merritt – I love the fact that Chicago opened up their crime data; it truly is great for analyses and for forward thinking about Chicago. But perhaps there are some records that should be geographically masked, because I was able to find some crimes classed as sexual assault by a family member, meaning that the X,Y is shared by the crime location, the perpetrator and the victim. I can’t think of any other crime types, but perhaps that specific type could be mapped to a census block centroid or something. I’m not sure of the legality, but it seems to me that victims (minors) of this type of crime shouldn’t have to worry about being exposed. If you’d like, I could email you offline about a specific record # and address instance of this.

