Proposed NYS Senate & Assembly districts available in GIS format

UPDATE Nov. 5, 2012

In preparation for the Nov. 2012 election, many news organizations and others are linking to our interactive State Legislature and Congressional redistricting maps. We’ve posted examples at the Center for Urban Research website.


UPDATE Sept. 7, 2012

We’ve updated our map of redistricted State Senate and Assembly districts, highlighting the differences in race/ethnicity characteristics between total population and voter-eligible population – in other words, comparing the characteristics of all those who live in the new districts versus the smaller group who will be eligible to vote for each district’s representatives.  In some cases the differences are striking.

Our examination of the district-by-district data is available here.  The New York Times gave our analysis a shout-out in their CityRoom primary election day column.

You can also visit our original NYS redistricting “comparinator” map described below, at www.urbanresearchmaps.org/nyredistricting/map.html


UPDATE February 5, 2012

You can visualize these proposed districts in relation to the current New York State Senate and Assembly districts with our new interactive redistricting map.  We developed the interactive map in collaboration with The New York World, and here’s an article using the maps to describe the redistricting process in the Empire State.  For more background on the interactive map, visit this blog post.


Original Post

If you’re hoping to use GIS or any of the online mapping tools to map the legislative district lines in New York State that were proposed today by the state’s redistricting task force, you’ll have some work to do.  The Task Force released PDF maps as well as “block assignment lists” for the proposed districts.

Unless you’d like to use the shapefiles and/or KML files that our team at the CUNY Graduate Center created!  Here’s our web page with the info: http://www.urbanresearch.org/news/proposed-nys-districts-in-gis-format

Happy redistricting mapping!

Access to local GIS data

Rob Goodspeed has an interesting post about his survey of the policies and practices of local governments in Massachusetts regarding GIS data. It looks like a good read. In my experience (in New York State), local governments can have more interesting GIS data (for example, tax parcels and real property records) than the state or Feds, but their data access policies and/or practices can be more limiting. There are major exceptions (NYC, for example), but even New York City requires a fee and restrictive license to access its property data.

I look forward to reading Rob’s paper.  (Rob, among other things, is a PhD student at MIT.)  Nick Grossman of Civic Commons first alerted me to the paper via Twitter.

Some NYC OpenData improvements – small but important victory!

I noticed today that NYC’s new OpenData site (on the Socrata platform) has made some modest improvements since I blogged about it earlier this month, and since several people have responded to comments from Socrata’s CEO.

In particular, many of the files listed in the Socrata/OpenData site as “GIS” files or “shapefiles” are now actually available for download as shapefiles.  You have to dig a bit to find the download option — it’s not available via the Export button.  You have to click through to each data set’s About page, and then scroll down to the “Attachments” section.  But in many cases, you’ll now find a zipped file containing a GIS shapefile.  Small — but important — victory!

The back story

When the OpenData site first launched, I was very concerned because there was no option to actually download most geospatial data sets — you could only access them as spreadsheets or web services via an API.  That’s not very helpful for people who want to work with the actual data using geographic information systems.  And it was a step backward, since many agencies already provide the GIS data for download, and earlier versions of the OpenData site had made the data available for direct download.

It also seemed like it was extra work for the agencies and for us — extra work to convert the data from GIS format into spreadsheets, for example, and then extra work for the public to try to convert the data back into GIS format once they had downloaded a spreadsheet from the OpenData site.  Seems pretty silly.

It also seemed like it was an example of DoITT not understanding the needs of the public — which includes Community Boards, urban planning students, journalists, and many others who routinely use GIS to analyze and visualize data.  Spreadsheets and APIs are nice for app developers — and the “tech community” broadly speaking — but what about the rest of us?

More public access to data, not less

If the city adds the shapefiles as a download option, that’s providing more open access to data, not less.  But by not offering GIS data along with the other formats, the Socrata system seems to be limiting access.  I’d hope that NYC would be as open and flexible and accommodating as possible when it comes to accessing public data.  Socrata’s CEO seems to argue that with the Socrata platform it’s too hard to do that.  If he’s right, maybe we should just stick with a tried and true approach — NYC agency websites already provide direct download of GIS data along with many other formats.

But I know that we can do better.  In fact, Chicago’s open data portal (also powered by Socrata) has offered many GIS datasets for direct download from Day 1.  Actually, Chicago has 159 datasets tagged as “GIS” files, while New York only has 69.  What’s up with that, NYC?  I thought NYC was the best in everything when it comes to open data?

Still more to be done

Alas, even though we’re talking about a victory here, we can’t pop open the champagne quite yet.  Several of NYC’s data sets via the Socrata site aren’t as current as what you can already get from agency websites.  For example:

  • zoning is current as of August 2011, but you can download more current data (September 2011) from the ever-improving Planning Department’s Bytes of the Big Apple website; and
  • building footprints are older (September 2010) than what you can download from DoITT’s GIS site itself (click through DoITT’s online agreement and you’ll get a buildings database from March 2011).

Also, some data sets described on the Socrata/OpenData site as “shapefiles” are still not available in GIS format.  Some examples:

  • NYC’s landmarks data.  The OpenData site describes this data as a “point shapefile … for use in Geographic Information Systems (GIS).”  But it’s only available from the OpenData site as a spreadsheet (or similar format) or via an API.
  • Waterfront Access Plans.  The OpenData site describes this file as a “polygon shapefile of parklands on the water’s edge in New York City … for mapping all open spaces on the water’s edge in New York City.”  But like the landmarks data, it’s only available as a spreadsheet or via an API.  False advertising, if you ask me.  But if you go to the source (the City Planning Department), the shapefile is there for all to access.  So how is the Socrata/OpenData site any better?  I’m still wondering that myself.

And the Socrata/OpenData site still doesn’t provide the kind of meaningful data descriptions (or metadata) that you’ll get from agency websites such as Bytes of the Big Apple or Dept of Finance — data descriptions that are absolutely essential for the public to understand whether the information from NYC OpenData is worth accessing.

But hope springs eternal — someone listened to our concerns about lack of actual geospatial data downloads, maybe they’ll also listen when it comes to everything else. Fingers crossed!

Pretty NYC WiFi map, but not useful beyond that

@nycgov posted a tweet on Friday touting the map of WiFi hotspots on the new NYC OpenData site.  I was impressed the city was trying to get the word out about some of the interesting data sets they’ve made public.  It was retweeted, blogged about, and otherwise shared many times over during the day.

The map is nice (with little wifi symbols marking the location of each hotspot).  And it certainly seems to show that there are lots of hotspots throughout the city, especially in Manhattan.

But when I took a close look, I was less than impressed.  Here’s why:

  • No metadata.  The NYC Socrata site has zero information on who created the data, why it was created, when it was created, source(s) for the wifi hotspots, etc.  So if I wanted to use this data in an app, or for analysis, or just to repost on my own website, I’d have no way of confirming the validity of the data or whether it met my needs.  Not very good for a site that’s supposed to be promoting transparency in government.
  • No contact info.  The wifi data profile says that “Cam Caldwell” created the data on Oct. 7, 2011 and uploaded it Oct 10.  But who is Cam?  Does this person work for a city agency?  It says the data was provided by DoITT, but does Cam work at DoITT?
    • If I click the “Contact Data Owner” link I just get a generic message form.  I used the “Contact Data Owner” link for a different data set last week, and still haven’t heard back.  Not even confirmation that my message was received, let alone who received it.  Doesn’t really inspire confidence that I can reach out to someone who knows about the data in order to ask questions about the wifi locations.
  • No links for more information. The “About” page provides a couple of links that seem like they might describe the data, but they don’t.

If I were to use the wifi data for a media story, or to analyze whether my Community Board has more or fewer hotspots than other Boards, or if I wanted to know if the number of hotspots in my area has changed over time, the NYC Socrata site isn’t helpful.

Even looking at the map on its own, it’s not very helpful.  Without knowing if the list of hotspots is comprehensive (does it include the latest hotspots in NYC parks? does it include the new hotspots at MTA subway stations? etc) or up to date (the Socrata site says the list of wifi sites is “updated as needed” – what does that mean?), I have zero confidence in using the data beyond just a pretty picture.

I’m sure if I clicked the “Contact Data Owner” link, eventually I’d get answers to these questions. But that’s not the point.  The point is that the new NYC OpenData site bills itself as a platform to facilitate how “public information can be used in meaningful ways.”  But if the wifi data is any guide, the OpenData site makes it almost impossible to meaningfully do anything with the data.

The wifi data is another example of how I think NYC’s implementation of the new Socrata platform is a step backwards.  Other NYC websites that provide access to public data — the City Planning Department’s Bytes of the Big Apple site as well as agency-specific sites from Finance, Buildings, HPD, and others — all provide detailed metadata, data “dictionaries”, and other descriptive information about available data files.  This contextual and descriptive information actually makes these data sets useful and meaningful, inviting the public to become informed consumers and repurposers of the city’s data.

The Socrata platform, in and of itself, seems great.  But NYC hasn’t done a very good job at all of putting it to use.  #opendata #fail

NYC’s new OpenData website: soars and falters all at once

UPDATE (10/13/11)

This evening I received a call from NYC DoITT.  They were mainly calling to tell me that they changed the official rules for BigApps 3.0.  Yesterday the rules said that no new data would be added to the OpenData site until after the BigApps competition.  As I said in my blog, why wait?  But DoITT saw that and agreed.  So now that clause has been removed from the rules (see section D.1).  DoITT says that they agree the data should be accessible whether there’s a competition in effect or not.  That’s great news!  I’m looking forward to more dialogue on the other issues I’ve raised below.

___________________________________________

ORIGINAL POST

New York City yesterday announced its new version of what had been called its “Datamine” website, a single online point of entry to access the city’s digital data holdings.

I’ve critiqued the Datamine project before, but I was heartened by the city’s choice to use the Socrata platform to upgrade Datamine. As I wrote a couple of months ago:

NYC’s Datamine was an improvement in some ways over earlier opendata efforts in New York. Now that it’s been around for two years, I think it’s fair to say that Datamine is clunky at best. For me, I can’t wait for it to be replaced by something better. I’m looking forward to the NYC/Socrata roll out.

Yesterday’s announcement came with great fanfare: 230 new data sets! (so they say), BigApps 3.0!, cash prizes!, etc.

But is “NYC OpenData” any better than Datamine?

After digging into the site for several hours last night and today, I’d have to say yes and no. It has some great stuff with great promise, but it still falls flat in some key areas. I look forward to using it for the APIs, but for the raw data I’ll go back to the individual agencies that in many cases are doing a better job of providing access to the data.  Overall the city has come a long way with open data, but I still think the city’s concept of data-as-economic-engine is misguided.  More on that below.

The good

Socrata’s platform is impressive. I’ve blogged about it before, but it’s worth summarizing some of the high points:

  • You can immediately preview the data in your browser (no downloading needed just to see what it contains). And you can view more details about each row in the file — very helpful if you’re interested in one particular aspect of the data.
  • You can visualize the data in multiple ways — using an interactive map option built into the platform or using one of 9 different chart options.
  • If you want to download/export a data set, they give you at least 8 formats for extracting/exporting.
  • Short links and “perma” links are available to each data set.
  • There’s a “Discuss” option where anyone can attach notes and commentary for each data set.  It’s user-generated metadata — you can immediately see, for example, if anyone else has commented about the data’s quality, or completeness, or how up-to-date it is.

The big news with this new approach is the availability of an API for programmatic access for each data set in the Socrata system.  On their face, the APIs look great, and the city deserves kudos for implementing them.  Socrata has developed a template for developers to hook into the data — either row by row, selected queries, or to view metadata — and the template also provides data publishers with guidance on how to structure their data for automated consumption.  And it seems that DoITT has created web services for the mapped data sets, which is a big step forward.
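As a rough illustration of what that row-by-row access looks like, here’s a minimal sketch against a Socrata-style endpoint.  To be clear, the dataset ID “abcd-1234” is a made-up placeholder, not one of the city’s actual files:

import json
import urllib.request

# Hypothetical dataset ID; the $limit parameter caps the rows returned.
url = "https://data.cityofnewyork.us/resource/abcd-1234.json?$limit=5"
with urllib.request.urlopen(url) as resp:
    rows = json.load(resp)

for row in rows:
    print(row)  # each row comes back as a dictionary of field/value pairs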

There are other improvements with specific data sets, such as:

  • It looks like the map data for NYC park boundaries is fixed — I posted a detailed review last year about how the parks data via Datamine was basically impossible to use.  I had to scrape the NYC Parks website to convert it to a useful format. But now the park names are included with the park IDs in the same file. (However, this improvement is tempered by the fact that I can view the map of parks on NYC’s Socrata website, but I can’t download the data in a mapped format. I discuss that in more detail below.)

There are some interesting new data sets.  Two things that caught my eye are:

  • School zones are included in the data, which is something I had urged the city to include [PDF] when the BigApps competition was first announced in 2009.  (School zones are the key determinant as to where your child can attend public elementary school, rather than the administrative school districts.)  But the earlier version of Datamine included school zone boundaries, so this isn’t really new.
  • HPD Registrations.  Unfortunately the data dictionary accompanying this file can be cryptic, so I couldn’t easily decipher exactly what the file includes. But it seems to be a list of almost 140,000 buildings in the city registered as “multiple dwellings” along with each building’s landlord/owner, managing agent(s), and building details.  Should come in pretty handy for anyone interested in the landlord landscape in New York.

Here’s an example of why the data dictionary is not very helpful – the excerpt below is trying to tell us what the “REG-INDV-HM-UNIT-NO” field means:

Um, what?

I thought it was also intriguing (in an inside baseball kind of way) that the interactive maps used at the NYC Socrata site to show mapped views of the data are from ESRI.  And the API/web services provided for the mapped data files are ESRI-based.  DoITT’s GIS unit has made a point of using non-ESRI technology for its interactive maps (Citymap, Scout, ZoLA, etc). But the GIS web services for Socrata all come from DoITT.  Wonder what’s happening there.

The not-so-good

The Mayor’s news release about the new Socrata site proclaims that more than 230 new data sets are included. We don’t get any details about which ones; the release simply says that:

Examples of this new data include a directory of HHC Facilities; electricity, gas and steam consumption available by zip code; and school attendance and report statistics.

But I looked pretty closely at what new data sets I could find, and I was hard pressed to identify more than a few dozen.

Examples of old data masquerading as new simply because it’s available through the new Socrata site include many of the files from NYC’s Dept of Finance, such as:

  • Condominium comparable rental income listings (38 individual datasets);
  • Cooperative comparable rental income listings (40 datasets); and
  • Summary of Neighborhood (Property) Sales (21 datasets).

That’s almost 100 data sets right there, close to half the number the city says are newly available.  But each of these has been online, for free download, at Finance’s website for several years.  This page notes that coop sales information has been available since 2006, and Finance started making the data available for batch download a couple of years after that.  The Neighborhood Sales data was put online a couple of years ago.  And Finance’s website has more thorough information about the data sets and how to use them than the Socrata site.

Other not-so-new examples include:

  • Street centerlines.  These are from DoITT circa 2009. In contrast, the City Planning Department “LION” file at DCP’s website is from September 2010, and is updated regularly.
  • Building perimeters. From DoITT circa 2010.  But DoITT has a more recent file at their website for direct download (click through the online agreement and you’ll find building footprints from March 2011).
  • Coastal boundaries. From City Planning, but this was posted on the Bytes of the Big Apple site last month.  Great data set, but not new.
  • Campaign contributions. From the NYC Campaign Finance Board.  The data is current (covering the 2013 election cycle), but the files are already available in batch format and via a searchable website from CFB.
  • Landmarks data. There are multiple, conflicting data sets at the NYC Socrata site regarding landmarks.  For example, one data set of “NYC Landmarks” is from 2009, another (called “LPC Landmark Points”) is from 2010.  Either way, there have been several new landmarks and historic districts designated since then by the Landmarks Commission.

Even if there was only one new data set in the new Socrata site, that’s better than nothing. But there’s so much data maintained by city agencies that is still not easily, publicly accessible.  My blog post when BigApps was first announced in 2009 has a listing of some key data files that still haven’t seen the light of day.

The city should be doing a better job — especially since there’s been so much pressure on them to improve their open data policies, they have an avowed policy of doing so, and they’re also required by state law (FOIL) to do so.  Frustrating.

One of my biggest and longest standing gripes is about property data.  There are a number of property-related files at the NYC Socrata website.  But nothing that comes close to the City Planning Department’s “MapPLUTO” dataset.  The city still charges a fee (up to $3,000 per year) with a restrictive license agreement in order to access the PLUTO data — a mapped file of all properties in NYC with a wealth of information about each one (zoning, ownership, building heights, land use categories, assessed value, etc).  It’s an essential data set for anyone trying to understand real estate, urban planning, neighborhood change, and more in the city.

When will City Planning get it? They’ve done such a great job of making other data sets available — files they used to charge for but now provide for free, and in better formats, with great metadata, and updated frequently.  The agency obviously spends a lot of time preparing these other data sets that are freely available, so I don’t buy the argument that the PLUTO fee covers their “costs” of doing extra work to put PLUTO together.  I just don’t understand.  And property data is so incredibly useful in NYC — certainly to the big real estate players, but I’m not concerned about them.  If it were free for everyone, at least we’d have a chance at a level playing field — helping “the little guy” do property analysis and mapping so he/she can analyze land use, understand policy implications, etc.

Data for people, not just machines

Data access — at least in this first iteration of the new Socrata site — seems to be weighted toward APIs, and therefore app developers. I understand the value of the API approach — I’ve developed apps myself, and at CUNY we have online sites that can definitely make use of the APIs. And I was kind of amazed that DoITT opened these up.  So the APIs are good, and perhaps they’re worth the effort to create and maintain a one-stop-shop like NYC Socrata.

But for the average user — someone at a Community Board, or a local media outlet, or a City Councilmember’s office — the city’s implementation of the Socrata system seems weighted against them.

For example, with one or two exceptions I wasn’t able to download any mapped data sets from NYC Socrata.  Many files (45 by my count) are described as “GIS datasets”, and they’re obviously in ESRI’s “shapefile” format to begin with, but the “Export” option only provides flat files (CSV, JSON, XLS, XML for example), and not even the now-ubiquitous KML format (used by Google and many others).

If I click the API link for these data sets, this enables me to view the data as map layers in my desktop GIS application.  But I can’t extract any actual data from these links in order to work with it on my own.  The screenshot below (from ESRI’s ArcCatalog application) seems promising, but the inability to download the mapped data itself is very limiting.

It’d be easy enough (I’m assuming) to just add shapefiles to the list of Socrata’s data export formats. The shapefile format (.SHP) is already basically an open one (all the major open source GIS packages read it), so why force GIS users to do extra work to access GIS data?  And why have DoITT go through extra work converting from SHP to something else, just to have the user convert it back again?  For “point” locations this isn’t a big deal — it’s easy enough to convert latitude/longitude coordinates into a mapped data set.  But this isn’t straightforward at all for polygons (district boundaries, for example) or lines (streets, transit routes, etc). I’m not saying don’t provide the data in the other formats, just add SHP to the list where appropriate.  (Some GIS datasets are available as GIS downloads: school zones, for example. But this is an exception, as far as I can tell.)
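To show how light the lift is for point data, here’s a minimal sketch, assuming the open source pyshp library and a hypothetical “hotspots.csv” with name/lon/lat columns.  Polygons and lines are exactly where this stops being trivial:

import csv
import shapefile  # the open source pyshp library

# Build a point shapefile from a flat file of coordinates.
w = shapefile.Writer("hotspots", shapeType=shapefile.POINT)
w.field("name", "C", size=80)

with open("hotspots.csv", newline="") as f:  # hypothetical input file
    for row in csv.DictReader(f):
        w.point(float(row["lon"]), float(row["lat"]))  # x = lon, y = lat
        w.record(row["name"])

w.close()  # writes hotspots.shp / .shx / .dbf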

Indeed, not having GIS-ready formats is a step backward. If I visit the City Planning Department’s “Bytes of the Big Apple” website, I can download a wealth of files in GIS format, and several of them are updated regularly. It’s great. Hopefully the NYC OpenData site doesn’t supplant the individual agency sites. For now, they’re better for me, and I’d imagine they’re better for many other users.

And having the raw data, rather than just API access, gives users more flexibility.  For example, during the preparation for Hurricane Irene, several organizations (ours included) downloaded NYC Datamine files in GIS format to create interactive maps of evacuation zones and evacuation sites.  (And these groups helped the city in a big way because the city’s own maps and website were down, making it difficult if not impossible to get essential information from NYC.gov.)  But the city changed several of the evacuation sites just a day or two before the storm was going to hit.  If the outside organizations didn’t have the raw data that we could update ourselves, our presentation of the evacuation sites would’ve been incorrect and misleading.  I wouldn’t want to rely on the city updating its API in a crisis situation like that, given how rocky the city’s digital response was to the storm itself.

Tying open data to app competitions & economic growth is the wrong approach

(Note: my concern here still stands, but the city has modified its position a bit, which is great.  See the 10/13 Update above.)

I think the real issue here is that the city’s open data efforts are being driven more by the desire to use data access as a way to leverage economic development, and less about true government transparency.

For example, as with the first two BigApps competitions, no new data files will be added to the Socrata site until the latest BigApps competition is over (see section D.1 of the official rules).  Why wait?  Why should app developers get preference?  What about the rest of us?  Is NYC providing data just so app developers can do free work for the city, and so the city can make a news splash about open data?  Open data should be open 24/7 — and should be updated on a regular basis — not just when it’s convenient for the city and for developers.

Next steps

I understand that the new NYC Socrata site is a work in progress, and will almost certainly be improved going forward.  But for now, although it includes lots of data, much of this has already been available elsewhere.  The APIs are intriguing, but I hope they don’t preclude other ways for people rather than machines or apps to access the data.

At this point, with few exceptions I still would prefer to go to the individual agency websites (or even talk to agency staff and request the files via email, or even via disks & snail mail!) to get the data — from what I’ve seen so far, chances are it’ll be more timely, in better quality, and I’ll have better access to metadata/explanations of the files.

I’m even wondering whether, instead of a Socrata-like site, it might be better to encourage the agencies directly responsible for creating the data to continue their efforts to provide public access, and to have them engage with people using the data so they’d see the benefits of open data (and/or realize that it’s not so bad to provide the broad public with easy access to their files).  At the least, the new NYC Socrata site shouldn’t preclude this agency-specific work.

I’ve already had a good, late-night exchange on Twitter with DoITT on some of these issues. I’ll be submitting feedback directly at the Socrata website.  And hopefully the dialogue will continue.

NYC bikeshare maps & spatial analysis: an exploration of techniques

UPDATE (Feb. 2012)

  1. Reader Steve Vance suggests in the comments below that I could use Google Refine to parse the JSON file and convert it to Excel without relying on the tedious Microsoft Word editing process I summarize below.  He’s right.  Google Refine is amazing. It converted the JSON file to rows/columns in about a second.  And it has powerful editing/cleaning capabilities built-in.  Thanks Google!
  2. Alas, I had hoped to test Google Refine on the latest list of user-suggested bikeshare stations.  But when I checked in mid-February, the link at http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points no longer returns all the detailed info about each suggested site.  It only returns an ID and lat/lon for each site.  There’s another link I found that returns the details (http://a841-tfpweb.nyc.gov/bikeshare/get_point_info?point=1), but it seems to return just one site at a time (change the “point=1” value; a looping sketch is shown below).  Sigh.  If someone wanted to replicate what I’ve done with the latest data, perhaps either NYC DOT or OpenPlans could provide the file directly.
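In case it helps anyone trying the same thing, here’s a sketch (assuming the get_point_info endpoint still responds the way it did when I checked) that walks the point IDs one at a time and collects the details:

import json
import urllib.request

BASE = "http://a841-tfpweb.nyc.gov/bikeshare/get_point_info?point={}"

details = []
for point_id in range(1, 101):  # the first 100 IDs, just as an example
    try:
        with urllib.request.urlopen(BASE.format(point_id), timeout=10) as resp:
            details.append(json.load(resp))
    except OSError:
        continue  # skip IDs that error out or no longer exist

print(len(details), "points retrieved")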

Original Post (Sept. 2011)

Two weeks ago New York City announced an ambitious bikeshare program, designed to provide 10,000 bikes at 600 bike-sharing stations in Manhattan and parts of Brooklyn by next summer.  I had two immediate thoughts:

  1. I wondered if all 10,000 new bikers will ride like delivery staff and further terrorize me and my pedestrian 5-year-olds; and
  2. safe or not, the bike stations would be put somewhere, and maps can likely help figure out where.

I’m a cartographer, so I’ll focus on the second issue for the purpose of this blog post. My maps and analysis below don’t provide any definitive answers — they’re more of an exploration of spatial analysis techniques using the bikeshare data as an example.  I don’t know if this will be helpful to DOT, but if it is, then that’s great.  If not, hopefully at least they’ll be of interest to GIS and biking geeks alike.

NYC’s bikeshare stations: crowdsourcing suggestions

To help figure out where the bikeshare stations might be located, the city’s Dept of Transportation partnered with OpenPlans to provide an interactive map where anyone could suggest a location and provide a reason why they thought it was a good spot. If someone has already picked your favorite spot on the map, you can select that marker and click a “♥ Support Station!” button to register your approval.  Added up, these supporting clicks can provide a “rating” of how many people like each location.

It’s a great, easy to use app. Within just a few days several thousand people had posted their suggestions.  According to DOT,

As of September 20 at 3:30pm [just 6 days after the suggest-a-site went live], we have received 5,566 individual station nominations and 32,887 support clicks.

(via OpenPlans)

But the map looked overwhelmed! Manhattan was covered, as was most of downtown Brooklyn.  It seemed like almost everyone wanted a bikeshare station on their block.  New York Magazine put it this way:

As you can see in the map above, New Yorkers have spoken: The best spots for bike stations are … everywhere/wherever is right next to them.

I wondered how useful this crowdsourced data actually would be for identifying the best sites for bikesharing stations.  NYC DOT says it will be conducting “an intensive community process” to involve multiple stakeholders in helping decide where the 600 stations will go.  Presumably several factors will determine station locations, but it seemed like the crowdsourced data could play a key role — hopefully the website was more than just a PR ploy.

Given all those “dots on a map,” it seemed like a good opportunity to examine how spatial analysis tools could be used — first to see if the crowdsourced location patterns meant anything, but then to see if there’s any value to using them in siting analysis.  Had “the crowd” told us something new and useful, or was it something we already knew and would be better determined through DOT’s public process?

Spatial patterns

Luckily OpenPlans (and DOT) designed the suggest-a-station website so all those dots on the map could be scooped up via a simple HTTP request and converted to GIS format.  At the end of this post I describe how we got the data and put it into a mappable format.  Once we did, we were able to analyze it spatially.  I’ll post the shapefile, as well as a version at Google’s Fusion Tables, shortly.

A few days after the program was announced, DOT produced a “heat map” that “illustrated the number of suggestions and supports per square mile as of September 19” (map at right).

Our version of a “heat map” using the September 19 data (based on the results as of 9am that day) is shown below.  (Our map uses the same rating scale as the DOT map, but its slightly different patterns could be due to different model specifications to create the map.  We used ArcGIS’s “Kernel Density” function to develop our map — DOT may have used a different method. Even if we both used kernel estimation, this technique can result in different surface patterns based on different inputs such as cell size and search radius.)
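For anyone curious how these surfaces get built, here’s a rough sketch of weighted kernel density estimation in Python.  It uses random stand-in points rather than the real suggestion data, and it is not DOT’s model or our exact ArcGIS settings:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
pts = rng.random((500, 2)) * 10000       # stand-in x/y coordinates
ratings = rng.integers(1, 20, size=500)  # stand-in support ratings (weights)

# The bandwidth plays the role of the search radius; the grid spacing
# below plays the role of the cell size. Both change the surface you get.
kde = gaussian_kde(pts.T, weights=ratings)
xs, ys = np.mgrid[0:10000:200j, 0:10000:200j]
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)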

But do these maps really tell us anything useful? Some people tweeted that the concentration of suggested bikeshare sites matched New York’s “hipster” population.  Others said that the patterns were “almost perfectly congruent with race/class/culture divides” in the city.

I disagree — I don’t think the suggested bikeshare patterns match any obvious demographic characteristics, whether it’s race/ethnicity or “hipsterism”.  (This may be worth pursuing further, but for now I leave that to others.)

I think a more likely relationship is based on where people work.  The orange-to-red areas on both maps — indicating a high concentration of suggested bikeshare sites with high ratings from website visitors — match the locations of the city’s commercial areas: Manhattan below 59th Street and downtown Brooklyn.

Another possibility, though, is that people who suggested bikeshare locations were just following DOT’s preferences – a spatial version of survey response bias.  In its bikeshare FAQ, DOT says that phase 1 of the program will focus on the following areas:

Manhattan’s Central Business District and nearby residential areas, including Brooklyn neighborhoods of DUMBO, Downtown, Fort Greene, Bedford-Stuyvesant, Williamsburg, Greenpoint and Park Slope

The NYC Planning Department produced a map of these areas in a Spring 2009 report [PDF] as follows:

Superimposing the Phase 1 area on the rating density map above shows that there’s almost an exact match between Phase 1 (outlined in dark pink) and the highest concentrations of suggested/supported sites (the dark orange and red areas on the map):

So based on these density maps (“heat maps”), it’s not clear whether the overall patterns tell us anything interesting about the wisdom of the crowd, or useful about where to put bikesharing stations.

Digging deeper

But whether the overall patterns mean anything or not, maybe the suggested locations could be analyzed to see if they have value as criteria for local siting decisions. In other words, within the patterns, maybe we can use the crowd’s suggestions as a key piece of analytic information, providing quantifiable indicators about where the stations should go.

More than 2,700 bikeshare locations (as of Sept. 25) were suggested within the Phase 1 area — four and a half times the 600 sites that will eventually be sited.  Perhaps they covered every possible bikeshare site.  But perhaps there’s also a pattern (or patterns) to the suggestions that will help with the decision to whittle 2,700 down to 600.

For simplicity’s sake I evaluated the suggested station locations against one criterion — proximity to subway station entrances.  Obviously there are other factors to examine (threshold bikeshare station density, proximity to specific residential or employment centers, terrain, etc).  But several people have noted that a bikeshare program can extend the reach of subways — transit riders could ride to a distant subway more easily, cheaply, and quickly than by bus or cab, or when they reach their subway stop they could pick up a bike and ride to their final destination without the hassles of a cab, etc.  So my assumption is that proximity to subway stations will be a key factor in determining bikeshare station locations.

But how do the suggested locations from the DOT/OpenPlans map compare with that hypothesis?  Are the highly rated bikeshare sites near subway stops?  About 10% of the suggested sites included reasons that mentioned subways.  Did website visitors suggest enough bikeshare sites near subways to make it easier for DOT to pick and choose which ones are best?

(Btw, this same type of analysis can be applied to bike routes, for example.  I just wanted to focus on one component for now.)

Spatial analytics

I used several spatial analysis techniques available through ArcGIS’s toolbox to shed some light on these questions.  The tools are powerful, and ESRI has made them easy to use and interpret.  The tools also underscore the power of GIS beyond making maps — extracting information based on the spatial relationships of multiple geo-referenced data sets.

In order to compare suggested bikeshare sites with subway stations, I used the file of subway entrances/exits available from MTA (current as of July 19, 2011).  The file provides the latitude/longitude of 1,866 entrances and exits, identifies the station name for each one, and lists the routes that serve these stations.  It provides a more precise spatial measure of access to the subways than a single point representing the center of each station (which is how stations are shown on most interactive and print subway maps).

With this file, we can determine how close each suggested bikeshare site is to the actual spots where people exit and enter the subway system.

To calculate proximity, I used the “Near” feature in the ArcGIS Toolbox, which “[d]etermines the distance from each feature in the input features to the nearest feature in the near features.”  I analyzed 5,857 suggested bikeshare sites based on the DOT/OpenPlans map as of Sept. 25 (see data discussion at the end of this post).  Here are some statistics:

  • 92 sites were within 25 feet of a subway entrance;
  • fully one-third (1,954 suggested sites) were between 25 and 500 feet of a subway entrance (the length from one Manhattan avenue to the next is usually about 600 feet);
  • another quarter of the sites (1,677) were between 500 and 1,250 feet (1,250 ft being roughly a quarter mile, the rule-of-thumb distance that people will walk for public transportation); and
  • the remaining 2,134 were more than a quarter mile from a subway entrance.
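For readers without ArcGIS, the same “Near” calculation can be sketched in a few lines of Python with a k-d tree.  The coordinates below are random stand-ins, assuming a projected coordinate system measured in feet:

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
entrances = rng.random((1866, 2)) * 50000  # stand-in entrance x/y, in feet
sites = rng.random((5857, 2)) * 50000      # stand-in suggested-site x/y

dist, nearest = cKDTree(entrances).query(sites)  # feet to closest entrance

for lo, hi in [(0, 25), (25, 500), (500, 1250), (1250, np.inf)]:
    n = int(((dist >= lo) & (dist < hi)).sum())
    print(f"{n} sites between {lo} and {hi} feet of an entrance")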

Seems like lots of bikeshare stations were suggested in close proximity to subway entrances.  If the actual bikeshare sites will be near subway entrances, which entrances should we pick?

(An aside: since just over a third of suggested bikeshare sites were located relatively far away from subway entrances, we can also evaluate these patterns.  The hypothesis would be that if people are picking up bikes at subway stations, they’re using them to travel to destinations further away from subway stops.  Therefore some of the bikeshare sites will need to be located in these “destination” areas, and DOT will need some spatial criteria for locating them.  I’ll save this for a follow up blog post.  Thanks to Kristen Grady for suggesting it.)

One way to visualize the bikeshare/subway entrance relationships is with the following map, showing the subway stations in blue (just a center-point representing the middle of the station) and the bikeshare sites color-coded by proximity (I’ve limited the display of bikeshare sites to only those within 500 feet of a subway entrance so the map wasn’t too cluttered):

This map might be helpful, but you have to visually decide which clusters of close-by bikeshare sites are the most concentrated in order to prioritize which subway stations to focus on.  The map also omits the rating values.

If we incorporate ratings, the map below is an example of the result.  It only shows bikeshare sites very close to subway entrances — within 50 feet — and ranks the symbol size based on rating.  (We could just as easily pick another distance threshold, or display several maps each using a different distance threshold.)

This helps us focus on which subway stations might be best for a nearby bikeshare station, based on suggested bikeshare sites nearby that are ranked highest.

But we can use GIS to be more precise.  Another approach would be to visualize the pattern of the subway entrances themselves, based on average rating of each entrance’s closest bikeshare sites.  In other words, I’d like to use the ratings given to each suggested bikeshare site and assign those ratings to their closest subway entrances.  This will have the effect of combining subway proximity with bikeshare rating, and the resulting map will integrate these patterns.

Here’s an example of the result, with the rated subway entrances juxtaposed with the density map of rated bikeshare sites from earlier in this post:

This map says, “If you want to put bikeshare stations near subway entrances, these are the entrances you’d pick based on the average rating of the closest stations suggested by ‘the crowd’.”  It’s a way of prioritizing the bikeshare station siting process.  These subway entrances are the ones you’d likely start with, based on the preferences of the (bike)riding public who contributed to the DOT/OpenPlans map.

It looks like many subway entrances follow the overall pattern of bikeshare sites with the highest ratings. But there are some interesting differences in the above map. A couple of sites are completely outside the Phase 1 area (an outlier each in the Bronx and Queens), and only two subway entrances with average high ratings are in Brooklyn. The rest are in lower Manhattan. But only one of the Manhattan sites is near the highest rated area centered around NYU:

Here’s another view of this area, with the rated subway entrances overlain on a Bing street map:

In order to create the rated subway entrance map, I used the Voronoi polygon technique, also known as Thiessen polygons (Voronoi was a Russian mathematician, Thiessen an American meteorologist).  Voronoi polygons are enclosed areas surrounding each point (subway entrance) such that all the other locations (in this case, bikeshare sites) within the polygon are closer to the enclosed subway entrance than to any other entrance.  The subway entrance Voronoi polygons look like this:

Here’s a close up, with the subway entrances displayed as pink stars, and the suggested bikeshare stations as blue dots:

The blue dots (bikeshare sites) within a polygon are closer to that particular polygon’s subway entrance than to any other entrance in the city.  Other GIS techniques, such as creating a buffer around each subway entrance, or even the “Near” calculations I described earlier in this post, wouldn’t apply the “closest” criterion to all the points automatically and at once.

The other nice thing about creating Voronoi polygons is that the attributes of the reference points are transferred to the polygons (the polygons end up with more than just a random ID number; in this case, they include all the corresponding subway entrance attributes).  From there I did a spatial join in ArcGIS, joining the bikeshare sites to the polygons.  This automatically calculates the count of all points in each polygon, as well as statistics such as average and sum for any numeric attributes in the point file.  In this case, each subway entrance Voronoi polygon gets a count of the bikeshare sites within it (i.e., the ones that are closest to that entrance) as well as the summed rating and average rating.
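Since a point falls inside a subway entrance’s Voronoi polygon exactly when that entrance is its nearest one, the count/average arithmetic of the spatial join can also be sketched without building the polygons at all: assign each site to its closest entrance, then aggregate.  (Random stand-in coordinates again, assuming projected x/y.)

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
entrances = rng.random((1866, 2)) * 50000
sites = rng.random((5857, 2)) * 50000
ratings = rng.integers(1, 20, size=5857).astype(float)

_, owner = cKDTree(entrances).query(sites)  # the Voronoi cell each site is in

counts = np.bincount(owner, minlength=len(entrances))  # sites per entrance
sums = np.bincount(owner, weights=ratings, minlength=len(entrances))
avg_rating = np.divide(sums, counts, out=np.zeros_like(sums),
                       where=counts > 0)               # average rating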

From there we could create a choropleth map of the Voronoi polygons. But since we’re interested in the entrance locations rather than an aggregated area around them, I chose to create a graduated symbol map of the actual subway entrances. So I did an attribute join between the Voronoi polygons and the entrances using the shapefile ID field.  That enabled me to make the “Average rating by subway entrance” map above.

Limitations

One limitation to the Voronoi approach is that closeness is measured “as the crow flies.” There are other techniques that measure proximity using “Manhattan distance” (i.e., distance along streets rather than a straight line), such as ESRI’s Network Analyst extension for ArcGIS, but I’ll leave that to the DOT analysts who are going to decide on the actual bike share sites.

Other limitations of this approach have to do with the data themselves.  The bikeshare data from the DOT/OpenPlans website has issues such as:

  • entries accompanied by fictitious names (some examples from the Sept. 25 data include “Andy Warhol”, “George Costanza”, “Holden Caulfield”, “Lady Liberty”, and “United States”.  One or more people using the “United States” pseudonym submitted 51 entries throughout Manhattan, Brooklyn, and Queens, plus a single entry in the Bronx); and
  • multiple entries submitted by a single person. Someone – or some people – named Ryan submitted 143 entries.  Someone named Andrew Watanabe submitted 85 entries. Ryan and Andrew were the top two submitters.  After them and “United States”, there were 4 others who submitted 40 or more entries. It’s possible that these were all sincere. But some seem to be pretty goofy. Of Watanabe’s 85 suggested sites, for example, several included the following reasons:
    • “When whales accidentally swim into the Gowanus, they will be able to ride bike share bikes back out to sea.” (site on the Gowanus Canal)
    • “This will keep drunk booksellers from passing out on the sidewalk.” (site near the Bedford Ave L train stop in Williamsburg)
    • “When the zombie apocalypse comes, they will be riding bicycles. BRAAAAAINS!” (site in the middle of Mt. Laurel Cemetery in Queens)

Multiple entries might be fine, but if someone started plunking down markers on the map just for fun, this doesn’t really help us with meaningful location criteria.

There’s another concern about the crowdsourced data – the squeaky wheel problem.  The first map below shows the bikeshare suggestion pattern as of September 19; the second map below shows the patterns as of September 25.  The more recent map shows a new concentration of sites at the northern tip of Roosevelt Island (as well as a greater concentration in lower Manhattan and downtown Brooklyn, areas that already were very dense):

 

Sept. 19 patterns

 

Sept. 25 patterns

Why did northern Roosevelt Island all of a sudden become such a bikeshare hotspot?  I can’t say for certain.  But in a blog post on September 14 at the Roosevelt Islander, residents were urged to add sites to the DOT/OpenPlans map.  The post ended with the pitch:

So here’s what you can do to bring bike sharing to Roosevelt Island. Click on this link and say you want a bike sharing station on Roosevelt Island – do it now – please [emphasis added]

I don’t think making a pitch like this is a bad thing. (Far from it! It seems to have succeeded in getting attention on bikeshare sites on the island).  But whoever will be analyzing the sites from the DOT/OpenPlans map will need to decide if (and how) they should discount these crowdsourced lobbying efforts so the squeaky wheels don’t skew the map.

Making sense of it all

My analysis in this post is more for illustration than for actually determining best locations for bikeshare stations. A more rigorous analysis would need to deal with the data limitations I mentioned above, and also factor in other criteria.

But it was a fun exploration of the data and the techniques, and hopefully provides some useful ideas if readers are thinking of other spatial analysis projects involving proximity (especially the “closest” criteria).  I’m indebted to DOT & OpenPlans for enabling the creation of an interesting data set — the suggested bikeshare sites — for me to brush up on my spatial analysis skills.

Does my initial exploration shed any light on the wisdom of the crowd? It’s probably too early to tell (or my analysis was too limited to meaningfully evaluate the suggested sites).  But even so, I think the techniques I’ve described are helpful for prioritizing sites and for quantifying the results.  In that respect, the crowd’s input is a good thing.

Data issues, as always

Here are the steps we used to download the suggested bikeshare sites from the DOT/OpenPlans website in order to map and analyze the data:

  1. We used Fiddler to figure out that the suggested station locations were being maintained in a text file (in JSON format) available via http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points (Dave Burgoon ferreted this out).
  2. The JSON data looks like this:
{
"id":"4830",
"lat":"40.742031",
"lon":"-73.777397",
"neighborhood":"Fresh Meadows",
"user_name":"David",
"user_avatar_url":"",
"user_zip":"11355",
"reason":"There is no public transportation from Brooklyn-Queens greenway (Underhill Ave) to Flushing Meadows. By placing bike stations from Cunningham Park thru Kissena Park to Flushing Meadows will allow residents enjoy the parks more.",
"ck_rating_up":"1",
"voted":false
}

I don’t know of a straightforward way to read a JSON file into a desktop GIS package, so I needed to restructure the file into rows & columns.  I chose to do that with a series of Find/Replace statements in MS Word (perhaps there’s a better/more efficient way, but this approach worked; a short script like the sketch after the sample row below would also do it), then added a row of field names, and saved the result as a .TXT file (one row of which is shown below):

id,lat,lon,neighborho,username,avatar,zipcode,reason,rating,voted
4830,40.742031,-73.777397,Fresh Meadows,David,,11355,There is no public transportation from Brooklyn-Queens greenway (Underhill Ave) to Flushing Meadows. By placing bike stations from Cunningham Park thru Kissena Park to Flushing Meadows will allow residents enjoy the parks more.,1,FALSE
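For what it’s worth, here’s the script version of that restructuring.  It’s a hedged sketch, assuming the feed is a JSON array of objects like the excerpt above; it writes the same rows and columns using Python’s json and csv modules:

import csv
import json

FIELDS = ["id", "lat", "lon", "neighborhood", "user_name",
          "user_avatar_url", "user_zip", "reason", "ck_rating_up", "voted"]

with open("get_bikeshare_points.json") as f:  # the saved JSON feed
    points = json.load(f)

with open("bikeshare_points.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for p in points:
        writer.writerow([p.get(k, "") for k in FIELDS])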
  3. We’re primarily an ESRI shop at the CUNY Center for Urban Research (with periodic forays into open source, as well as a longstanding reliance on MapInfo for some key tasks).  So my next step was to convert this to a shapefile — which I did by using ArcGIS’s “Display X/Y Points” tool to create a point file based on the lat/lon values.
  4. Just in case there were multiple points at the same location, I ran the ArcGIS script called “Collect Events”, which aggregates point data based on location, and creates a new shapefile of each unique location with a count of all the points at each location.
    • I downloaded the JSON file a couple of times between Sept 19 and 25.  In the latest one (September 25, downloaded at 11pm) there were 55 points at latitude 40.7259, longitude -73.99 (a location at the intersection of E. 3rd Street and Second Ave in Manhattan).  But the user-supplied ZIP Codes and comments for most of these points indicated that they should have been all over the city.
    • Turns out this location is the center point of the Google map that’s displayed at the DOT bikeshare website.  If you zoom in on the DOT/OpenPlans map you’ll bring the map center into close view — and you can see the heavy map marker shadow due to all the points placed at that spot:

    • Presumably what happened here is that when you click the “Suggest Station” button, a marker is put at this spot by default. The marker is accompanied by a note that says DRAG ME! Then click ‘Confirm Station.’  But I’m assuming that 55 people didn’t drag the marker, but just left it there after they had entered their information. (I guess that’s not too bad — only 1 percent of the people using the site didn’t follow directions.)
    • Earlier this week (9/27) it looks like these sites were removed from the live map.  For my purposes, I removed those points from the shapefile, otherwise it would skew the analysis.  I could have put them somewhere in the ZIP Code that was entered with each spot, but I couldn’t be sure of the precise location (the reasons were vague regarding location), and I didn’t want to skew the analysis the other way.
  5. Other data notes:
    • There were 8 locations outside the immediate New York City area – some as far away as Montreal and Portland, Oregon.
    • The reason provided for the Portland location was: “Even though it’s a whole continent from NYC it always seems to me like our cultures admire one another. I think NYC would enjoy all the benefits of positioning one of their Bike Share stations in Portland as sign of goodwill and mutual admiration.”
  6. There were also 53 points with lat/lon = 0, which I assumed was just a data entry/processing error.

Out of the 5,973 points as of 11pm September 25, after I removed the 55 locations and zoomed in on the points in or immediately near New York City (and omitting the 8 outside the city and the 53 with lat/lon=0), I ended up with 5,857 points.

Mapping Hurricane Irene in NYC (plus some thoughts on the city’s digital response to the storm)

A disaster, natural or otherwise, always creates an opportunity to demonstrate the power of maps. Hurricane Irene did not disappoint. In New York City, which hadn’t seen a hurricane of this magnitude in decades, there were at least a half dozen websites with interactive maps related to the storm (plus at least one PDF map – more on that below) that were used extensively and were tweeted about extensively. My team at the CUNY Graduate Center was in the mix with our OASISnyc.net site, and I was watching with keen interest as more maps kept coming online as Irene kept coming closer. I thought I’d share some observations below about how Irene was mapped in New York.

I think I kept good track of the various maps that were deployed, but I’m sure my list and descriptions are incomplete so please chime in if I’ve missed anyone or mischaracterized any of the efforts.

The Context

Hurricane maps are nothing new, but usually the maps show the path of a hurricane while it’s happening or analyze its impact after the storm has passed.  This time, for New York City, the more interesting and useful maps were focused primarily on the possibility of evacuation, and the potential impact of the storm on New York’s shores.

(That said, the damage from Irene continues north of NYC, and several important mapping efforts are helping with the recovery effort there. For example, follow tweets from @DonMeltz and @watershedpost in upstate New York, or @jarlathond in Vermont.)

The interest in these maps was also perhaps more intense than in earlier situations. First, New Yorkers almost never evacuate for anything (at least on a scale of hundreds of thousands of residents), so the idea that so many people from only certain areas of the city needed to move to higher ground meant that everyone wanted/needed to know: am I in the evacuation zone? And that meant maps.

Second, online interest in this storm in particular was high.  Other storms have hit since Twitter and Facebook have been around, but not in the New York area and not at this scale.  One writer for GigaOm who had lived through hurricanes on the Gulf Coast wrote that she was “overwhelmed” by the “overall hoopla surrounding Irene online.”  For her, it replaced TV as a key source of news (I agree: I barely checked TV news throughout the storm; Twitter and weather-related websites provided all the information I needed, and the news from these sources was more up-to-date).  And because so many New Yorkers were online and hungry for information about evacuations and storm impacts, online maps were critically important.

Will I Need to Evacuate?

Mayor Bloomberg and other officials started talking about the possibility of evacuation on Wednesday (8/24). That night, my wife reminded me that our flagship mapping site OASISnyc.net included a layer of “coastal storm impact zones”.

Actually, we’ve had that data online since 2007, when it was a Map of the Day on Gothamist. It shows areas at greatest risk of storm surges from a hurricane (and, as it turns out, those areas closely match the boundaries of the city’s evacuation zones – see screenshot below). I had also received a couple of emails that night from other groups wanting to map the evacuation zones and were worried that the city’s mapping resources weren’t up to the task.

So I wrote a blog post about how using OASIS could help people see if they were in harm’s way if the storm hit the city. I published the post the next morning (Thursday, 8/25).

That same morning, Gothamist posted an item about the potential for evacuation, and they embedded the city’s evacuation zone map. I was the first to add a comment on the Gothamist piece (via our @oasisnycmaps Twitter account), and I included a link to my blog post and to the maps.

PDF maps: blessing, curse, or both?

Let’s look at the city’s evacuation zone map [PDF - see image at right]. It’s a PDF file. It shows all the city’s streets in black ink, in an 8.5″ x 11″ layout, overlain on color shaded areas (muted green, yellow, and brown) corresponding to the 3 evacuation zones A, B, and C. And it has the evacuation center locations labeled on the map.

So it puts a lot of information into one map, which is challenging on its own. But trying to view that as a PDF online can be especially problematic. People who expected something better complained — it was described (perhaps too harshly) as “terrible” and “useless” in the Gothamist comments. People said it was hard to read, took too long to download, didn’t work well on mobile phones, etc. And quickly after I posted my comment at Gothamist, several people were thankful that they could access OASIS as an alternative to the city’s map.

Distributing a PDF map in a situation like this has pros and cons. On the one hand, it’s flexible: the PDF can be viewed in any web browser, downloaded to your computer and viewed offline, or printed out to share with someone who doesn’t have Internet access. And lots of people on Twitter were appreciative. On the other hand, it’s not something that can be easily updated, and it’s not what the growing population of digitally savvy New Yorkers would expect or desire. NYC has been touting itself as the most digital city on the planet, and all they could do was put out a PDF? People were underwhelmed.

To be fair, the city also had an online “Hurricane Evacuation Zone Finder.” You’d type in an address and it would display a zoomed-in zone map of your location. But that provided little context, and it wasn’t as user-friendly as the public was expecting. For a long time this type of web service would’ve been considered state of the art. But these days, I think a lot of people were wondering whether New York couldn’t do better.

Luckily the city had posted a dataset in GIS format representing the city’s evacuation zone boundaries. It was available on Datamine, and anyone could download it for free and use it without restriction. So when people asked me if I had the evacuation zones in a format that could be mapped, I just pointed them to Datamine.
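
For the technically inclined, having the zones as a shapefile means the “am I in the evacuation zone?” question reduces to a simple point-in-polygon test. Here’s a minimal sketch using the geopandas library; the filename and the “ZONE” attribute name are my assumptions, so check the actual Datamine download.

```python
# Minimal sketch of the kind of lookup the city's Zone Finder performs,
# using the evacuation zone shapefile from Datamine. The filename and
# the "ZONE" attribute are assumptions -- check the actual download.
import geopandas as gpd
from shapely.geometry import Point

zones = gpd.read_file("hurricane_evacuation_zones.shp")
# NYC data often ships in State Plane (EPSG:2263); reproject to lon/lat
zones = zones.to_crs(epsg=4326)

def zone_for_point(lon, lat):
    """Return the evacuation zone containing the given point, or None."""
    hits = zones[zones.contains(Point(lon, lat))]
    return hits.iloc[0]["ZONE"] if not hits.empty else None

# Example: a pre-geocoded point near Battery Park
print(zone_for_point(-74.015, 40.703))
```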

(In a mix of optimism and revisionist history, New York City’s Panglossian chief digital officer was quoted saying that “As always, we support and encourage developers to develop civic applications using public data” (emphasis added), in reference to other groups that were using the evacuation zone map in their websites. I chuckled when I read this. If you’ve been around this business for more than just a year or two, you’d know that it hasn’t “always” been this way. It’s terrific that at least some of the city’s data is openly available now. But let’s keep it in perspective, and also remember that there are still important public datasets the city is not making easily available to developers or others.)

NYC.gov goes down

By Thursday afternoon, online interest in NYC’s impending evacuation announcement was so intense that not only did the city’s zone finder application go down, but even the city’s website — in particular, its homepage — was inaccessible.

Although the city will certainly congratulate itself for using social media to get the word out (and I agree they did a good job in this area), it’s not good that a city striving to be the nation’s premier digital city could not even serve up its homepage at the exact moment when everyone was relying on that web page for information on what was happening next. And with a situation as complex as an approaching hurricane, 140-character tweets are just not enough. I can’t imagine it’s easy to withstand several million hits in a day, but I and a lot of others expected better.

(After complaints from the community, at least one civic activist posted links to hurricane resources on his own site and shared them via Twitter.)

The city was left apologizing for no web access and pointing people to its PDF map (at this point hosted on Tumblr and elsewhere). Mayor Bloomberg posted the PDF on his website, but that’s the least he could do. Simply taking a PDF and putting it on another website? Doesn’t take much to pull that off.

More Maps Come Online

In the meantime, more maps appeared. WNYC was next. John Keefe, the public radio station’s Senior Executive News Producer, mashed up the city’s evacuation zone data with Google Maps, and put a simple, easy to use interface together. The map didn’t include evacuation centers at first, but it was clean, effective, and … in the absence of the city’s online resources … it worked.

In fact, several people noted the irony. When @nycgov tweeted that the city’s hurricane zone finder was down “due to high traffic”, a Google representative quickly tweeted back that “WNYC’s map is based on NYC OEM data and is running fine.”

John has developed a successful system for creating news-oriented maps in short order, and his hurricane map was the latest example. And it was embeddable, so sites such as Gothamist, which originally embedded the city’s PDF map, quickly replaced the PDF with WNYC’s interactive map. People were happy.

By Friday morning, the city was still having difficulty providing online access to its web page and its hurricane evacuation zone finder app, so more mapping sites stepped up. ESRI published an interactive map of the evacuation zones and evacuation centers using their relatively new ArcGIS.com online platform. The map looked great, and included the evacuation centers that WNYC’s map was missing.

But the ESRI map didn’t have the slimmed down, focused look and feel of WNYC’s site. It included ArcGIS.com options such as geographic feature editing that maybe weren’t needed for this situation. (That’s just a quibble. Though at one point I clicked “Edit” and it seemed like I was about to delete all of Zone A!)

One nice thing about the WNYC site is that it uses Google’s Fusion Tables service on the backend, which makes it easy to set up geographic data and then overlay that data on a Google map or any other modern, online mapping site. At the CUNY Graduate Center we’ve started to use Fusion Tables to integrate community-oriented mapped information into the OASISnyc site. By Friday morning we were able to use Fusion Tables to display the city’s evacuation centers on OASIS’s maps. The OASIS site provides a wealth of information such as subway and bus routes, schools, public housing sites, etc., so it provided a way (hopefully an easy way) to locate evacuation sites in relation to these other locations.
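
(Part of what makes Fusion Tables handy for this kind of rapid mashup is its SQL-style query API, which lets any site pull the same rows. Here’s a rough sketch of reading a public table from Python; the table ID, column names, and endpoint details are illustrative only, not our actual setup.)

```python
# Rough sketch of querying a public Fusion Table via its SQL-style API.
# The table ID and column names below are hypothetical.
import csv
import io
import urllib.parse

import requests

TABLE_ID = "1abcDEFghiJKLmnoP"  # hypothetical table of evacuation centers
sql = "SELECT Name, Address, Latitude, Longitude FROM " + TABLE_ID
url = "https://www.google.com/fusiontables/api/query?sql=" + urllib.parse.quote(sql)

resp = requests.get(url)
for row in csv.DictReader(io.StringIO(resp.text)):  # SELECT results come back as CSV
    print(row["Name"], row["Latitude"], row["Longitude"])
```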

By Friday, Google had also stepped in with a mapping service of its own, a customized version of its crisis mapping application.

Originally Google’s map included neither the city’s evacuation zones nor the evacuation centers, but it did include several other layers of data related to potential storm impacts (like the storm surge map at OASIS). The federal weather and environmental agencies such as NOAA and FEMA have consistently done a great job of providing free, online access to observation and modeling data about storms, and Google put this information to use.

Regional Maps

On Friday our team at the CUNY Graduate Center also made two enhancements to our mapping applications to make it easy for a wide range of people to find out if they might be hardest hit by Irene. First, we reconfigured the OASIS maps so the storm surge layer could load quickly. We created a pre-cached tiled layer instead of a dynamic layer, and also set up the map page so that most of the dynamic layers were turned off by default. This made the map page load quicker, and made the storm surge layer load instantaneously. (Our site had bogged down a bit on Thursday due to increased traffic — site usage almost tripled to 9,000 pageviews, driven largely by my comment at Gothamist with a link to OASISnyc.net — so quick loading was key.)
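
(To give a sense of what pre-caching involves: instead of the server drawing the layer on every request, you render every tile in the map pyramid once, ahead of time. The sketch below just enumerates the XYZ tiles covering a rough NYC bounding box at a few zoom levels; the bounding box and zoom range are ballpark figures, and the actual rendering step depends on your map server.)

```python
# Simplified sketch: enumerate the XYZ tiles covering NYC at zooms 10-14.
# Each (z, x, y) would then be rendered once and written to the tile cache.
import math

def tile_coords(lat, lon, zoom):
    """Convert lat/lon to tile indices (standard Web Mercator tile math)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

SOUTH, WEST, NORTH, EAST = 40.49, -74.27, 40.92, -73.68  # rough NYC bounding box

tiles = []
for zoom in range(10, 15):
    x_min, y_max = tile_coords(SOUTH, WEST, zoom)
    x_max, y_min = tile_coords(NORTH, EAST, zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            tiles.append((zoom, x, y))

print(len(tiles), "tiles to pre-render")
```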

We also added the storm surge layer to an interactive mapping site we maintain with the Long Island Index, focused on Nassau and Suffolk counties. It seemed that the storm might have a greater impact on Long Island. The storm surge data we used for OASIS was statewide in scope (it was created by NY SEMO), so we coordinated with our partners at the Index and updated the site Friday afternoon.

Newsday included a link to the LI Index mapping site, and usage soared over the weekend.

Understandably, an organization such as WNYC would limit its map to the city’s 5 boroughs. But there weren’t similar maps for any other part of the tri-state region.

Even though mandatory evacuations had been called for much of Long Island’s south shore, the best data available on those areas were lists (some in PDF format) of affected addresses and affected streets. Given the surge in usage of the LI Index mapping site, I like to think that we helped meet a key need.

Mandatory Evacuation and More Maps

During the day on Friday, Mayor Bloomberg announced the city’s mandatory evacuation plans. The scramble was on to see if you were in Zone A!

Not to be outdone by WNYC, Google, or anyone else, the New York Times launched its version of an interactive evacuation zone map late in the day Friday.

Like WNYC’s version, the NY Times map was focused and easy to use. But it was also limited to NYC, despite the Times’s readership outside the 5 boroughs, many of whom had also been required to evacuate.

By then, WNYC and Google had also added the locations of evacuation centers to their maps.

Lessons Learned?

So what to make of all these maps?

I think the first thing is that they were all generally helpful. When the nation’s premier digital city was incapable of providing digital information in a timely, useful way, others stepped in and succeeded.

These efforts, however, suffered to some extent from inconsistencies and lack of coordination.

For example, different mapping sites displayed different kinds of information in ways that may have been confusing to the person on the street.

Google and OASIS posted storm surge zones and the city (and WNYC, ESRI, and the Times – and eventually Google too) posted evacuation zones. Ultimately what most people wanted to know was if they lived in evacuation Zone A. The storm surge areas were important in terms of anticipating where the storm would do the most damage, but perhaps a more pressing issue was the evacuation.

But this difference in approaches underscores the lack of coordination among the various mapping entities. It was as if everyone just wanted to get *their* map online.

We’re as guilty of that as anyone. I know top staff at OEM and I easily could’ve contacted them to coordinate the OASIS layer with theirs. But it was somewhat frantic at the time, and the communication didn’t happen. I’d say WNYC was the most earnest in this regard, since they probably just saw a hole that needed to be filled – the city was talking about evacuation, but the city’s evacuation map was sorely lacking or not online.

But once WNYC went online, as far as I know there was little coordination among them, us, ESRI, the NY Times, Google, etc. I think you could reasonably ask — since WNYC’s map worked perfectly well, and provided the information about evacuation zones — why have essentially the same map from ESRI, Google, and the NY Times? Were these groups talking with each other? For the media outlets (WNYC and the Times), was it just a competition thing?

I do know that when the city’s GIS community was more cohesive, this probably would’ve been coordinated a bit more, perhaps through GISMO. Not that the lack of cohesion is a bad thing necessarily. And not to fault GISMO or other coordinating groups. But I wonder if better information could’ve been provided to the public if all of us making the maps had been in communication.

For example, for at least a day WNYC’s map lacked the evacuation center locations. I added the locations to OASIS using Fusion Tables. Then WNYC added the locations to its map, also using Fusion Tables. We easily could’ve shared the backend data, but WNYC never contacted us to discuss it. I sent a tweet to @jkeefe about it, but didn’t hear back. It was important to keep the evacuation center data up-to-date and consistent because the city changed the locations of 4 centers before Irene hit. Keeping the maps in sync would’ve minimized any confusion for the public.

Overall, I think the biggest takeaway is that the Mayor’s office and NYC agencies – especially DoITT (since they’re responsible for coordinating the city’s technology resources) – need to engage better with mapping/data/online communities in a much more open, collaborative way.

Despite the city’s talk of apps and open data, there’s still very much a closed approach on the city’s part when it comes to public/private partnerships. True, the city has developed partnerships with local startup tech companies. But the city’s nonprofit and academic communities, along with established private entities, have much to share and have proven they have the technological resources to do as good a job as the city, if not a better one, at providing essential information online.

In terms of mapping Hurricane Irene in NYC, NGOs filled a big void. The city should not only recognize that effort, but cultivate it and help sustain it so that it works more smoothly and effectively next time.

Some good opendata news for NYC

The “Socratic Method” of publishing city data?

I was encouraged at the OpenGov Camp this past Sunday by an announcement from NYC DoITT  that the city will be using Socrata to provide online access to its data.  It’s a great platform.  It doesn’t ensure that the city will actually provide good data, or update it in a timely way, or expand its available data sets — but it’s a good step forward and hopefully a harbinger of better things to come.

The city is seeking feedback here.  They’ve indicated that an “end of summer launch” is planned for a NYC/Socrata rollout.  Here’s an example of what the site might look like.

OpenBaltimore opened my eyes

Earlier this year I had tweeted that a new municipal data portal — OpenBaltimore — blew away sites like NYC’s Datamine.

I was asked by Alex Howard (@digiphile) for my thoughts on OpenBaltimore and other, similar portals.  At the time I didn’t realize OpenBaltimore was using Socrata, but after I looked into it further, I came away impressed.  The platform is visually appealing, easy to search, and offers multiple ways of accessing/extracting data.

(I don’t want to endorse the Socrata product/service, but it seems to me to be a good choice for NYC.)

Useful features, and lots of them

One nice aspect of the platform is the ability to immediately preview the data, in your browser (no downloading needed just to see what it contains).  You can also view more details about each row in the file.  And you can visualize  the data in multiple ways — using an interactive map option built into the platform (if the dataset has a location component) or using one of 9 different chart options.

And if you want to download/export  a data set, they give you at least 8 formats for extracting/exporting, as well as an API for programmatic access.  NYC says that “all datasets will now be available as APIs” once they replace Datamine with NYC/Socrata.
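
(To show what “available as APIs” could mean in practice: on Socrata, each dataset gets a predictable JSON endpoint. Here’s a sketch, assuming NYC’s portal ends up at a domain like data.cityofnewyork.us; the dataset ID below is invented.)

```python
# Sketch of reading a Socrata dataset through its JSON API.
# The domain is a guess at where NYC's portal will live; the
# dataset ID is hypothetical.
import requests

url = "https://data.cityofnewyork.us/resource/abcd-1234.json"
rows = requests.get(url, params={"$limit": 5}).json()  # "$limit" pages the results
for row in rows:
    print(row)
```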

Short links and “perma” links are available for each data set.  And there’s a “Discuss” option where anyone can attach notes and commentary to each data set.  It’s user-generated metadata — you can immediately see, for example, if anyone else has commented about the data’s quality, or completeness, or how up-to-date it is.  I didn’t notice too many comments at the OpenBaltimore site, but there were some, and they were helpful (including responses from that city’s data team).

The built-in map option, however, didn’t seem to have real-time geocoding.  So even if a list has street addresses, it can’t be mapped through Socrata on the fly.  Each list needs a “location column”, which presumably means lat/lon.  (It’s easy to submit feature requests to the Socrata team, though, so hopefully we’ll be seeing this addition soon.)
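
(In the meantime, one workaround is to geocode the addresses yourself before uploading, so each row carries the lat/lon the platform wants. Here’s a sketch using the geopy library against the free Nominatim service; the input file and column names are hypothetical.)

```python
# Sketch: add a lat/lon "location column" to an address list before upload.
# The input file and the "address" column are hypothetical.
import csv
import time

from geopy.geocoders import Nominatim

geocoder = Nominatim(user_agent="datamine-prep-sketch")

with open("centers.csv") as src, open("centers_geocoded.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["latitude", "longitude"])
    writer.writeheader()
    for row in reader:
        loc = geocoder.geocode(row["address"] + ", New York, NY")
        row["latitude"] = loc.latitude if loc else ""
        row["longitude"] = loc.longitude if loc else ""
        writer.writerow(row)
        time.sleep(1)  # Nominatim's usage policy: at most one request per second
```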

All in all it’s really great.  Other cities use Socrata, including Chicago, Seattle, and even smaller municipalities such as Manor, Texas (pop. 6,500).

However, not a silver bullet

Even though OpenBaltimore’s portal has been online for just a few months, already there are criticisms (for example, data hasn’t been updated since February, some data sets have quality problems, etc).  Many people (including me) have leveled these same criticisms at NYC’s Datamine effort.  So simply having a better portal won’t solve these issues.

But at least a platform like Socrata will make it easier to deploy data sets, it’ll certainly make it easier for the public to access those data sets, and it’ll make it easier to suggest improvements to the substance and the process.

NYC’s Datamine was an improvement in some ways over earlier opendata efforts in New York.  Now that it’s been around for two years, I think it’s fair to say that Datamine is clunky at best.  Personally, I can’t wait for it to be replaced by something better.  I’m looking forward to the NYC/Socrata roll out.

What do you think?

On the lookout for ‘open data fatigue’ in NYC

I watched today’s news event by New York City’s Mayor Bloomberg and his colleagues about the city’s new “Digital Road Map” [PDF]. Impressive effort, including the livestream webcast.

But I thought the Twitter stream during the Mayor’s webcast was especially interesting. Seemed to me that there were just as many tweets about real-world problems (potholes, cops on the beat, subway service, etc) as there were about the technology announcements themselves. The technology is cool, and I agree it’s critically important for the city’s competitiveness, but it needs to be considered in the context of the substantive issues of improving city services, quality of life, engaging real people, maintaining a robust economy, etc.

I always worry when I see the city touting its technology efforts without also including local Community Boards, neighborhood groups, business advocates, urban planners, other elected officials, etc. who rely on access to public data so they can hold government accountable and do their jobs better. In my view, these groups need the data more than app developers do. That is why open data efforts and policies are so important.

[Editorial update: I realized that in the preceding paragraph I omitted a critically important constituency regarding open data: the media.  I was thinking back on the many FOIL requests I've made, the various lawsuits I've been party to, and the hundreds of data requests I've submitted over the past two decades in an ongoing effort to pry loose public data sets from government agencies.  But I realized that even my longstanding involvement in data access efforts pales in comparison to the work done day in and day out by reporters, editors, and journalists to not only further the open data cause, but just to do their jobs.

Media organizations absolutely rely on unfettered access to public data so they can shine a light onto government activities and educate us all about what our public officials are doing, perhaps especially when those officials don't want us to know.  So when we think about improving city (and state and federal) government by developing a "digital road map", the Foursquares and Tumblrs of the world are just distractions.  Provide unprecedented access to government data for the press -- and bloggers and tweeters -- and that will do more for better government than any number of Facebook pages, Foursquare check-ins, or officially-sanctioned NYC hackathons.]

But the city seems more focused on apps than on community. I understand the economic development appeal of fostering startups. But the open data movement long predated apps.  I highlighted this in my post last year (see the “Misplaced Priorities” section).

Apps are great (I use them constantly, and I’ve even developed one myself). And kudos to the city and its agencies for responding to app developers and making data more open so the developers can do great things with the data (things even the city might not do).

I just hope the latest announcements by the city will result in more real and lasting efforts to make data easier to access than the latest check-in craze. The Mayor already expressed some hesitation about making data accessible when a reporter asked him about CrashStat. CrashStat is a great example of my point — it wasn’t created to be an “app” per se; it’s an effort by a local nonprofit group to use public data to educate the public and hold government agencies more accountable about traffic injuries and fatalities. But the Mayor said he didn’t even know what CrashStat was, while making excuses about not making data available if it’s not in electronic format, or needs to be vetted, or is “sensitive”.  Blah blah blah – we’ve heard all that before, and it undermines my confidence in the city’s pronouncements that more data will really be made open.  (I’d link to the city’s webcast at nyc.gov but it stops right when the Q&A begins.)

(In the livestream video, the Crashstat question comes at 27:00, and the Mayor acknowledges he doesn’t know what it is at 27:10. Thanks to Joly MacFie for the video link.)

So who knows, if the Mayor starts actually using Foursquare more and experiences ‘check-in fatigue‘, maybe he’ll eventually get ‘open data fatigue’ too. Let’s hope he stays as vigorous about public data access as he and his agencies say they will.

(photo via TechCrunch from IntangibleArts)

Open data in NYC? That’s so 2009.

Last fall I had high hopes that New York City would loosen the shackles that agencies too often held tightly around “their” data sets.  The city’s BigApps competition had just been announced, the new Data Mine website was launched with many data sets I never imagined would see the light of day, and the city (i.e., the Mayor’s office and his agencies) seemed to be jumping on the open data bandwagon.

These days, I’m less optimistic about NYC’s #opendata efforts.  Sure, there are bright spots (DOT, MTA, some aspects of City Planning’s Bytes of the Big Apple).  But for the past several months I’ve been hearing rumors that Data Mine will be updated “soon” and “any week now”.  So far, nothing new on the site — data is still from 2009.  I’ve also been hearing that Data Mine will be updated when the next BigApps competition is announced.  Maybe that’ll happen, but even if a new BigApps prompts the city to update Data Mine, that’s problematic – I explain below.

Words …

When Mayor Bloomberg announced BigApps, he made a big deal of how the city would be “providing information to New Yorkers as fast and in as many ways as possible” and of helping entrepreneurs use city data to “increase accessibility and transparency in City government, generate jobs, and improve the quality of life for New Yorkers.”

And since then, the city’s own information technology agency indicated it would usher in a sea change in how city agencies made data publicly accessible.  This was somewhat buried in DoITT’s “30 Day Report” (issued Feb. 2010), but page 29 featured a section titled “Open Data/Transparent Information Architecture”.  It said [PDF],

In 2010, DoITT will work to establish citywide policies around “open data.” These efforts will align with Mayoral initiatives of openness and transparency, and further improve access to information by creating citywide standards that are practical and feasible. As a start, City agencies should be required to make available, to the greatest extent possible, all public‐facing data in usable electronic formats for publication in the NYC DataMine. This mandate would apply to all public data that is not subject to a valid restriction, such as public safety or personal information.  City data is by and large the property of the people it serves, and DoITT will be at the forefront of continuing to make it available in as many ways as possible. [Emphasis added.]

Note that this policy was meant to be only a beginning, and that DoITT would be “at the forefront” of aggressively making public data widely available.

… vs. action

The Data Mine website was launched in October 2009.  Most of the data sets at the site had a vintage of 2009 (and some were substantially older — for example, NYC Economic Development Corporation provides geographic data sets that are “based on PLUTO 2005” [PLUTO is the city's tax parcel data]).

The Data Mine website itself claims that it will be “… refreshed when new data becomes available.”  The data update frequency for many data sets on Data Mine is listed as daily (such as detailed school information from Dept of Education and traffic and parking data from Dept of Transportation), monthly (recycling rates from Dept of Sanitation), or quarterly (most of the geographic data from the Parks Department).  Others are listed as “annually” or “as required”, but the “as required” data sets include NYC landmarks and historic districts (several of which have been updated since Fall 2009) and 311 data.

Even though some of these data updates are already publicly available directly from the individual agencies, Data Mine — as the city’s portal to public data access — hasn’t kept up.  And it appears that Data Mine is really just an adjunct to the city’s BigApps competition, which is focused primarily on application development (and the resulting economic development from these apps), not so much on transparency and open data access.

For example, testimony from DoITT’s commissioner at a recent City Council hearing for Intro 029 (a bill requiring city agencies to provide formalized open access to their data) was revealing.  Among other things, she explained that the Mayor’s office would wait till the next iteration of the BigApps competition before updating the Data Mine website with new data sets.  (Note that this is the same Commissioner who issued DoITT’s 30-day report cited above.)

The commissioner’s presentation starts about 9 minutes into the clip below.  Here’s her testimony [PDF].  Another disconcerting point she made in her comments was that the Mayor’s office wanted to put a priority on data that they believed had value to the public (rather than posting data regardless of how the public might use it or value it).

Misplaced priorities

Linking Data Mine to BigApps has at least two problems.  The first is: Why wait?  Some agencies are already taking steps on their own to publish data and update it regularly (such as City Planning and Transportation).  I don’t see any reason to delay updates to Data Mine.  Otherwise the site is stale, and sends the wrong message.

In this era, it’s a no-brainer to make data widely and easily available, given all the amazing things people are doing with public data (helping reduce costs, promote economic development, enhance quality of life, improve government efficiency, etc).  As one blogger put it, “there’s really no reason for the city to spend the time to ‘discuss’ when the city could spend the time to ‘do’.”

The other problem is that we shouldn’t have to rely on a competition to make data publicly available.

Remember that when the state’s Freedom of Information Law (FOIL) was first enacted (in the mid-1970s), “apps” didn’t exist. It was all about accountability. The public had a right to know what its government knew — and to have easy access to that information so we could evaluate legislative, executive, and regulatory decision making.

Actually, the “legislative declaration” to FOIL in New York State makes a bolder statement: that public awareness of government actions is essential to maintain “a free society”. FOIL also emphasizes that people will (hopefully) understand and participate more fully in government when they know fully what their government is up to.

So apps are cool and powerful, but open government and open data go much deeper than the latest iPhone app to find the best parking spots.  The more the city ties public data access to app development and competitions like BigApps, the more it veers away from facilitating the public’s fundamental right to know.

App competitions also take attention away from the vibrant community of nonprofits, neighborhood planning groups, Community Boards, and others who want to improve quality of life in the city and steer a progressive course when it comes to local development and citywide policies.  Not to mention the mainstream media and bloggers.  These players may not be developing apps, but they’re doing good work in other ways.  Information access for these groups and individuals is vital.  Some city agencies are smart and know how to work strategically with these groups to move good policies forward.  But too often agencies hunker down and get defensive, and don’t want anyone to have access to data.

Clay Johnson, former director of Sunlight Labs, also makes this point at his “InfoVegan” blog.  And even some BigApps competitors noted the downsides of relying on a competition to make public data accessible.

A better approach

I think the city’s open data efforts would be greatly enhanced by:

  • passing the City Council’s Intro 029;
  • opening up more data (things like property data that are still restricted by a license and access fees);
  • redesigning Data Mine as a pointer to existing agency data repositories; and
  • ensuring that public data sets are refreshed as often as practical.

Of course, we can’t place our faith in just putting the data out there.  It still takes people making policies and actually improving things.  It still takes an educated public to take action, etc.  But having more data, as long as it’s not in closed formats and is widely accessible, is a good thing.

(Disclaimer: my viewpoints on this blog are my own, not necessarily my employer’s.)
