NYC bikeshare maps & spatial analysis: an exploration of techniques

UPDATE (Feb. 2012)

  1. Reader Steve Vance suggests in the comments below that I could use Google Refine to parse the JSON file and convert it to Excel without relying on the tedious Microsoft Word editing process I summarize below.  He’s right.  Google Refine is amazing. It converted the JSON file to rows/columns in about a second.  And it has powerful editing/cleaning capabilities built-in.  Thanks Google!
  2. Alas, I had hoped to test Google Refine on the latest list of user-suggested bikeshare stations.  But when I checked in mid-February, the link at http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points no longer returned all the detailed info about each suggested site.  It only returns an ID and lat/lon for each site.  There’s another link I found that returns the details (http://a841-tfpweb.nyc.gov/bikeshare/get_point_info?point=1), but it seems to return only one site at a time (change the “point=1” value to get another).  Sigh.  If someone wanted to replicate what I’ve done with the latest data, perhaps either NYC DOT or OpenPlans could provide the file directly.
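Pulling the details one site at a time is at least easy to script once you have the IDs from get_bikeshare_points. Here is a minimal Python sketch. It assumes the get_point_info URL pattern quoted above still responds, which it may not. It also assumes the caller supplies a `fetch` function; that is a hypothetical helper, not part of any DOT or OpenPlans API.

```python
from urllib.parse import urlencode

# The per-site endpoint quoted in the update; it may no longer respond, so this is
# only a sketch of the one-request-per-ID loop.
POINT_INFO_URL = "http://a841-tfpweb.nyc.gov/bikeshare/get_point_info"


def point_info_url(point_id):
    """Build the per-site URL by changing the "point" query value."""
    return POINT_INFO_URL + "?" + urlencode({"point": point_id})


def fetch_point_details(point_ids, fetch):
    """Request details for each ID, one call at a time.

    `fetch` is a caller-supplied (hypothetical) function that takes a URL and
    returns the response body, e.g. a thin wrapper around urllib.request.urlopen.
    """
    details = {}
    for point_id in point_ids:
        details[point_id] = fetch(point_info_url(point_id))
    return details
```

The loop takes `fetch` as an argument instead of hard-coding a network call. That keeps it easy to test, and it leaves room to add a polite delay between requests.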

Original Post (Sept. 2011)

Two weeks ago New York City announced an ambitious bikeshare program, designed to provide 10,000 bikes at 600 bike-sharing stations in Manhattan and parts of Brooklyn by next summer.  I had two immediate thoughts:

  1. I wondered if all 10,000 new bikers would ride like delivery staff and further terrorize me and my pedestrian 5-year-olds; and
  2. safe or not, the bike stations would be put somewhere, and maps can likely help figure out where.

I’m a cartographer, so I’ll focus on the second issue for the purpose of this blog post. My maps and analysis below don’t provide any definitive answers — they’re more of an exploration of spatial analysis techniques using the bikeshare data as an example.  I don’t know if this will be helpful to DOT, but if it is, then that’s great.  If not, hopefully at least they’ll be of interest to GIS and biking geeks alike.

NYC’s bikeshare stations: crowdsourcing suggestions

To help figure out where the bikeshare stations might be located, the city’s Dept of Transportation partnered with OpenPlans to provide an interactive map where anyone could suggest a location and provide a reason why they thought it was a good spot. If someone has already picked your favorite spot on the map, you can select that marker and click a “♥ Support Station!” button to register your approval.  Added up, these supporting clicks can provide a “rating” of how many people like each location.

It’s a great, easy-to-use app. Within just a few days several thousand people had posted their suggestions.  According to DOT,

As of September 20 at 3:30pm [just 6 days after the suggest-a-site went live], we have received 5,566 individual station nominations and 32,887 support clicks.

(via OpenPlans)

But the map looked overwhelmed! Manhattan was covered, as was most of downtown Brooklyn.  It seemed like almost everyone wanted a bikeshare station on their block.  New York Magazine put it this way:

As you can see in the map above, New Yorkers have spoken: The best spots for bike stations are … everywhere/wherever is right next to them.

I wondered how useful this crowdsourced data actually would be for identifying the best sites for bikesharing stations.  NYC DOT says it will be conducting “an intensive community process” to involve multiple stakeholders in helping decide where the 600 stations will go.  Presumably several factors will determine station locations, but it seemed like the crowdsourced data could play a key role — hopefully the website was more than just a PR ploy.

Given all those “dots on a map,” it seemed like a good opportunity to examine how spatial analysis tools could be used — first to see if the crowdsourced location patterns meant anything, but then to see if there’s any value to using them in siting analysis.  Had “the crowd” told us something new and useful, or was it something we already knew and would be better determined through DOT’s public process?

Spatial patterns

Luckily OpenPlans (and DOT) designed the suggest-a-station website so all those dots on the map could be scooped up via a simple HTTP request and converted to GIS format.  At the end of this post I describe how we got the data and put it into a mappable format.  Once we did, we were able to analyze it spatially.  I’ll post the shapefile, as well as a version at Google’s Fusion Tables, shortly.

A few days after the program was announced, DOT produced a “heat map” that “illustrated the number of suggestions and supports per square mile as of September 19” (map at right).

Our version of a “heat map” using the September 19 data (based on the results as of 9am that day) is shown below.  (Our map uses the same rating scale as the DOT map; its slightly different patterns could reflect differences in how each map was modeled.  We used ArcGIS’s “Kernel Density” function to develop our map, while DOT may have used a different method.  Even if we both used kernel estimation, this technique can produce different surface patterns depending on inputs such as cell size and search radius.)
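The Python sketch below shows why cell size and search radius matter. It estimates kernel density with a quartic kernel, which is the kernel shape ESRI documents for its Kernel Density tool. This is an illustration, not a reproduction of ArcGIS’s output. The grid layout, the projected coordinates and the use of ratings as weights are all assumptions.

```python
import math


def quartic_kernel_density(points, weights, cell_size, radius, xmin, ymin, ncols, nrows):
    """Density at the centre of each grid cell using a quartic (biweight) kernel.

    points are (x, y) in projected units (e.g. feet); weights could be the
    suggestion ratings. Row 0 is the bottom row (starting at ymin).
    """
    r2 = radius ** 2
    grid = []
    for row in range(nrows):
        cy = ymin + (row + 0.5) * cell_size
        values = []
        for col in range(ncols):
            cx = xmin + (col + 0.5) * cell_size
            total = 0.0
            for (px, py), w in zip(points, weights):
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < r2:
                    # Kernel peaks at the point and falls smoothly to zero at the search radius
                    total += (3 / math.pi) * w * (1 - d2 / r2) ** 2
            values.append(total / r2)
        grid.append(values)
    return grid
```

Doubling `radius` spreads each point over four times the area and cuts its peak to a quarter. A coarser `cell_size` samples the surface at fewer locations. Either change alters the pattern, even with identical input points.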

But do these maps really tell us anything useful? Some people tweeted that the concentration of suggested bikeshare sites matched New York’s “hipster” population.  Others said that the patterns were “almost perfectly congruent with race/class/culture divides” in the city.

I disagree — I don’t think the suggested bikeshare patterns match any obvious demographic characteristics, whether it’s race/ethnicity or “hipsterism”.  (This may be worth pursuing further, but for now I leave that to others.)

I think a more likely relationship is based on where people work.  The orange-to-red areas on both maps — indicating a high concentration of suggested bikeshare sites with high ratings from website visitors — match the locations of the city’s commercial areas: Manhattan below 59th Street and downtown Brooklyn.

Another possibility, though, is that people who suggested bikeshare locations were just following DOT’s preferences – a spatial version of survey response bias.  In its bikeshare FAQ, DOT says that phase 1 of the program will focus on the following areas:

Manhattan’s Central Business District and nearby residential areas, including Brooklyn neighborhoods of DUMBO, Downtown, Fort Greene, Bedford-Stuyvesant, Williamsburg, Greenpoint and Park Slope

The NYC Planning Department produced a map of these areas in a Spring 2009 report [PDF] as follows:

Superimposing the Phase 1 area on the rating density map above shows that there’s almost an exact match between Phase 1 (outlined in dark pink) and the highest concentrations of suggested/supported sites (the dark orange and red areas on the map):

So based on these density maps (“heat maps”), it’s not clear whether the overall patterns tell us anything interesting about the wisdom of the crowd, or anything useful about where to put bikesharing stations.

Digging deeper

But whether the overall patterns mean anything or not, maybe the suggested locations could be analyzed to see if they have value as criteria for local siting decisions. In other words, within the patterns, maybe we can use the crowd’s suggestions as a key piece of analytic information, providing quantifiable indicators about where the stations should go.

More than 2,700 bikeshare locations (as of Sept. 25) were suggested within the Phase 1 area, four and a half times the 600 stations that will eventually be built.  Perhaps they covered every possible bikeshare site.  But perhaps there’s also a pattern (or patterns) to the suggestions that will help with the decision to whittle 2,700 down to 600.

For simplicity’s sake I evaluated the suggested station locations against one criterion: proximity to subway station entrances.  Obviously there are other factors to examine (threshold bikeshare station density, proximity to specific residential or employment centers, terrain, etc.).  But several people have noted that a bikeshare program can extend the reach of subways.  Transit riders could ride to a distant subway more easily, cheaply and quickly than by bus or cab.  Or, when they reach their subway stop, they could pick up a bike and ride to their final destination without the hassles of a cab, etc.  So my assumption is that proximity to subway stations will be a key factor in determining bikeshare station locations.

But how do the suggested locations from the DOT/OpenPlans map compare with that hypothesis?  Are the highly rated bikeshare sites near subway stops?  About 10% of the suggested sites included reasons that mentioned subways.  Did website visitors suggest enough bikeshare sites near subways to make it easier for DOT to pick and choose which ones are best?

(Btw, this same type of analysis can be applied to bike routes, for example.  I just wanted to focus on one component for now.)

Spatial analytics

I used several spatial analysis techniques available through ArcGIS’s toolbox to shed some light on these questions.  The tools are powerful, and ESRI has made them easy to use and interpret.  The tools also underscore the power of GIS beyond making maps — extracting information based on the spatial relationships of multiple geo-referenced data sets.

In order to compare suggested bikeshare sites with subway stations, I used the file of subway entrances/exits available from MTA (current as of July 19, 2011).  The file provides the latitude/longitude of 1,866 entrances and exits, identifies the station name for each one, and lists the routes that serve these stations.  It provides a more precise spatial measure of access to the subways than a single point representing the center of each station (which is how stations are shown on most interactive and print subway maps).

With this file, we can determine how close each suggested bikeshare site is to the actual spots where people exit and enter the subway system.

To calculate proximity, I used the “Near” feature in the ArcGIS Toolbox, which “[d]etermines the distance from each feature in the input features to the nearest feature in the near features.”  I analyzed 5,857 suggested bikeshare sites based on the DOT/OpenPlans map as of Sept. 25 (see data discussion at the end of this post). Here are some statistics:

  • 92 sites were within 25 feet of a subway entrance;
  • fully one-third (1,954 suggested sites) were between 25 and 500 feet from a subway entrance (the length from one Manhattan avenue to the next is usually about 600 feet);
  • more than a quarter of the sites (1,677) were between 500 and 1,250 feet away (1,250 feet being roughly a quarter mile, the rule-of-thumb distance that people will walk for public transportation); and
  • the remaining 2,134 were more than a quarter mile from a subway entrance.
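The kind of calculation the Near tool performs, plus the banding above, fits in a few lines of Python. This sketch assumes the site and entrance coordinates have already been projected to a planar system measured in feet, such as NY State Plane. It compares every site with every entrance, which is simple but slow for large files. How ties at exactly 25, 500 or 1,250 feet were counted in the post is unknown, so the inclusive upper bounds are an assumption.

```python
import math
from collections import Counter

# Distance bands used above, in feet: (inclusive upper bound, label).
# Treating each upper bound as inclusive is an assumption.
BANDS = [(25, "0-25 ft"), (500, "25-500 ft"), (1250, "500-1,250 ft")]


def nearest_distance(site, entrances):
    """Straight-line distance from one site to its closest entrance (brute force)."""
    return min(math.hypot(site[0] - ex, site[1] - ey) for ex, ey in entrances)


def distance_band(distance):
    for upper, label in BANDS:
        if distance <= upper:
            return label
    return "over 1,250 ft"


def tally_bands(sites, entrances):
    """Count sites per band; coordinates must be projected in feet (e.g. State Plane)."""
    return Counter(distance_band(nearest_distance(s, entrances)) for s in sites)
```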

Seems like lots of bikeshare stations were suggested in close proximity to subway entrances.  If the actual bikeshare sites will be near subway entrances, which entrances should we pick?

(An aside: since just over a third of suggested bikeshare sites were located relatively far away from subway entrances, we can also evaluate these patterns.  The hypothesis would be that if people are picking up bikes at subway stations, they’re using them to travel to destinations farther away from subway stops.  Therefore some of the bikeshare sites will need to be located in these “destination” areas, and DOT will need some spatial criteria for locating them.  I’ll save this for a follow-up blog post.  Thanks to Kristen Grady for suggesting it.)

One way to visualize the bikeshare/subway entrance relationships is with the following map, showing the subway stations in blue (just a center-point representing the middle of the station) and the bikeshare sites color-coded by proximity (I’ve limited the display of bikeshare sites to only those within 500 feet of a subway entrance so the map wasn’t too cluttered):

This map might be helpful, but you have to visually decide which clusters of close-by bikeshare sites are the most concentrated in order to prioritize which subway stations to focus on.  The map also omits the rating values.

If we incorporate ratings, the map below is an example of the result.  It only shows bikeshare sites very close to subway entrances (within 50 feet) and scales each symbol’s size by the site’s rating.  (We could just as easily pick another distance threshold, or display several maps each using a different distance threshold.)

This helps us focus on which subway stations might be best for a nearby bikeshare station, based on suggested bikeshare sites nearby that are ranked highest.
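Selecting and sizing the sites for a map like this takes only a filter and a sort. Here is a Python sketch with hypothetical field names. The square-root sizing is one common proportional-symbol convention, not necessarily the classification used for the map above. Changing `max_feet` produces the alternative threshold maps mentioned in the parenthetical.

```python
import math


def sites_near_subway(sites, max_feet=50):
    """Keep sites within max_feet of an entrance, highest rated first.

    Each site is a dict with hypothetical keys "near_dist" (feet, e.g. from Near)
    and "rating" (the number of support clicks).
    """
    close = [s for s in sites if s["near_dist"] <= max_feet]
    return sorted(close, key=lambda s: s["rating"], reverse=True)


def symbol_radius(rating, max_rating, max_radius=20.0):
    """Proportional symbol: make symbol *area* proportional to rating,
    so the radius grows with the square root of the rating."""
    return max_radius * math.sqrt(rating / max_rating)
```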

But we can use GIS to be more precise.  Another approach would be to visualize the pattern of the subway entrances themselves, based on average rating of each entrance’s closest bikeshare sites.  In other words, I’d like to use the ratings given to each suggested bikeshare site and assign those ratings to their closest subway entrances.  This will have the effect of combining subway proximity with bikeshare rating, and the resulting map will integrate these patterns.

Here’s an example of the result, with the rated subway entrances juxtaposed with the density map of rated bikeshare sites from earlier in this post:

This map says, “If you want to put bikeshare stations near subway entrances, these are the entrances you’d pick based on the average rating of the closest stations suggested by ‘the crowd’.”  It’s a way of prioritizing the bikeshare station siting process.  These subway entrances are the ones you’d likely start with, based on the preferences of the (bike)riding public who contributed to the DOT/OpenPlans map.

It looks like many subway entrances follow the overall pattern of bikeshare sites with the highest ratings. But there are some interesting differences in the above map. A couple of sites are completely outside the Phase 1 area (one outlier each in the Bronx and Queens), and only two subway entrances with high average ratings are in Brooklyn. The rest are in lower Manhattan. But only one of the Manhattan sites is near the highest rated area centered around NYU:

Here’s another view of this area, with the rated subway entrances overlain on a Bing street map:

In order to create the rated subway entrance map, I used the Voronoi polygon technique, also known as Thiessen polygons (Voronoi was a Russian mathematician, Thiessen an American meteorologist).  Voronoi polygons are enclosed areas surrounding each point (subway entrance).  They are drawn so that every other location within a polygon (in this case, every bikeshare site) is closer to that polygon’s subway entrance than to any other entrance.  The subway entrance Voronoi polygons look like this:

Here’s a close up, with the subway entrances displayed as pink stars, and the suggested bikeshare stations as blue dots:

The blue dots (bikeshare sites) within a polygon are closer to that particular polygon’s subway entrance than to any other entrance in the city.  Other GIS techniques include creating a buffer around each subway entrance, or even using the “Near” calculations I described earlier in this post.  Neither would as directly group every site with its single closest entrance, automatically and all at once.
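The definition can be checked directly: a point lies in an entrance’s Voronoi polygon exactly when that entrance is its nearest one. The Python sketch below assigns points to polygons that way. It does not build the polygon geometry itself; ArcGIS’s Create Thiessen Polygons tool, or scipy.spatial.Voronoi, would do that. Coordinates are assumed to be projected, and `math.dist` needs Python 3.8 or later.

```python
import math


def voronoi_cell(point, generators):
    """Index of the generator whose Voronoi (Thiessen) polygon contains `point`.

    By definition that is the nearest generator; exact ties go to the lower index.
    """
    return min(range(len(generators)), key=lambda i: math.dist(point, generators[i]))


def group_by_cell(points, generators):
    """Map each generator index to the points that fall in its polygon."""
    cells = {i: [] for i in range(len(generators))}
    for p in points:
        cells[voronoi_cell(p, generators)].append(p)
    return cells
```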

The other nice thing about creating Voronoi polygons is that the attributes of the reference points are transferred to the polygons (the polygons end up with more than just a random ID number; in this case, they include all the corresponding subway entrance attributes).  From there I did a spatial join in ArcGIS, joining the bikeshare sites to the polygons.  This automatically calculates the count of all points in each polygon, as well as statistics such as average and sum for any numeric attributes in the point file.  In this case, each subway entrance Voronoi polygon gets a count of the bikeshare sites within it (i.e., the ones that are closest to that entrance) as well as the summed rating and average rating.
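Once each site knows which polygon it falls in, the statistics the spatial join produces are a simple group-by. Here is a minimal Python sketch. It assumes the polygon assignment has already been done, for example by finding each site’s nearest entrance, and that ratings are numeric.

```python
from collections import defaultdict


def summarize_by_entrance(assignments):
    """Count, sum and average the ratings of the sites in each entrance's polygon.

    `assignments` holds one (entrance_id, rating) pair per bikeshare site, where
    entrance_id identifies the Voronoi polygon (closest entrance) the site falls in.
    Entrances with no sites do not appear in the result.
    """
    totals = defaultdict(lambda: [0, 0])  # entrance_id -> [count, rating sum]
    for entrance_id, rating in assignments:
        totals[entrance_id][0] += 1
        totals[entrance_id][1] += rating
    return {
        eid: {"count": n, "sum": s, "avg": s / n}
        for eid, (n, s) in totals.items()
    }
```

The choice between `avg` and `sum` matters. An average rewards entrances whose nearby suggestions are highly rated. A sum would instead favour entrances surrounded by many modestly rated sites.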

From there we could create a choropleth map of the Voronoi polygons. But since we’re interested in the entrance locations rather than an aggregated area around them, I chose to create a graduated symbol map of the actual subway entrances. So I did an attribute join between the Voronoi polygons and the entrances using the shapefile ID field.  That enabled me to make the “Average rating by subway entrance” map above.

Limitations

One limitation to the Voronoi approach is that closeness is measured “as the crow flies.” There are other techniques that measure proximity using “Manhattan distance” (i.e., distance along streets rather than a straight line), such as ESRI’s Network Analyst extension for ArcGIS, but I’ll leave that to the DOT analysts who are going to decide on the actual bike share sites.
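For a rough sense of the difference, straight-line distance can be compared with a grid-based “Manhattan distance.” The Python sketch below is only an approximation under two assumptions: projected coordinates with y pointing north, and a single uniform grid tilted about 29 degrees, the commonly cited tilt of Manhattan’s street grid. True network distance needs the actual street network, which is what tools like Network Analyst work with.

```python
import math

# Assumption: Manhattan's street grid is tilted roughly 29 degrees east of true north.
GRID_TILT_DEG = 29.0


def straight_line(a, b):
    """'As the crow flies' distance between two projected (x, y) points."""
    return math.dist(a, b)


def grid_distance(a, b, tilt_deg=GRID_TILT_DEG):
    """Manhattan (L1) distance measured along a rotated street grid.

    The offset is projected onto the avenue and street directions and the two
    legs are added. This ignores the real street network (one-way streets, parks,
    irregular blocks), which network analysis would account for.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = math.radians(tilt_deg)
    along_avenue = dx * math.sin(t) + dy * math.cos(t)
    along_street = dx * math.cos(t) - dy * math.sin(t)
    return abs(along_avenue) + abs(along_street)
```

Rotating the axes doesn’t change straight-line distance, so the grid distance is never shorter. For a trip running diagonally to the grid, it can be up to about 41 percent longer (a factor of √2).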

Other limitations of this approach have to do with the data themselves.  The bikeshare data from the DOT/OpenPlans website has issues such as:

  • entries accompanied by fictitious names (some examples from the Sept. 25 data include “Andy Warhol”, “George Costanza”, “Holden Caulfield”, “Lady Liberty”, and “United States”.  One or more people using the “United States” pseudonym submitted 51 entries throughout Manhattan, Brooklyn, and Queens, plus a single entry in the Bronx); and
  • multiple entries submitted by a single person. Someone – or some people – named Ryan submitted 143 entries.  Someone named Andrew Watanabe submitted 85 entries. Ryan and Andrew were the top two submitters.  After them and “United States”, there were 4 others who submitted 40 or more entries. It’s possible that these were all sincere. But some seem to be pretty goofy. Of Watanabe’s 85 suggested sites, for example, several included the following reasons:
    • “When whales accidentally swim into the Gowanus, they will be able to ride bike share bikes back out to sea.” (site on the Gowanus Canal)
    • “This will keep drunk booksellers from passing out on the sidewalk.” (site near the Bedford Ave L train stop in Williamsburg)
    • “When the zombie apocalypse comes, they will be riding bicycles. BRAAAAAINS!” (site in the middle of Mt. Laurel Cemetery in Queens)

Multiple entries might be fine, but if someone started plunking down markers on the map just for fun, this doesn’t really help us with meaningful location criteria.
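Tallies like these are easy to reproduce from the downloaded records. Here is a minimal Python sketch, assuming the records have been parsed from the JSON into dictionaries. The default threshold of 40 mirrors the cutoff mentioned above.

```python
from collections import Counter


def frequent_submitters(records, threshold=40):
    """Return (name, count) pairs for names attached to at least `threshold` entries.

    `records` are dicts with the "user_name" key seen in the site's JSON. A shared
    name is not proof of a single person (many people are named Ryan), so this
    only flags names for a closer look.
    """
    counts = Counter(r.get("user_name", "").strip() for r in records)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]
```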

There’s another concern about the crowdsourced data – the squeaky wheel problem.  The first map below shows the bikeshare suggestion pattern as of September 19; the second map below shows the patterns as of September 25.  The more recent map shows a new concentration of sites at the northern tip of Roosevelt Island (as well as a greater concentration in lower Manhattan and downtown Brooklyn, areas that already were very dense):

 

Sept. 19 patterns

 

Sept. 25 patterns

Why did northern Roosevelt Island all of a sudden become such a bikeshare hotspot?  I can’t say for certain.  But in a blog post on September 14 at the Roosevelt Islander, residents were urged to add sites to the DOT/OpenPlans map.  The post ended with the pitch:

So here’s what you can do to bring bike sharing to Roosevelt Island. Click on this link and say you want a bike sharing station on Roosevelt Island – do it now – please [emphasis added]

I don’t think making a pitch like this is a bad thing. (Far from it! It seems to have succeeded in getting attention on bikeshare sites on the island).  But whoever will be analyzing the sites from the DOT/OpenPlans map will need to decide if (and how) they should discount these crowdsourced lobbying efforts so the squeaky wheels don’t skew the map.
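One simple way to spot a burst like Roosevelt Island’s is to compare an area’s share of all suggestions between two downloads. Here is a Python sketch. The bounding box is a hypothetical input you would draw around the area of interest. A rising share is only a prompt for a closer look, not proof of organized lobbying.

```python
def share_in_box(points, box):
    """Fraction of (lat, lon) points inside box = (min_lat, min_lon, max_lat, max_lon)."""
    if not points:
        return 0.0
    min_lat, min_lon, max_lat, max_lon = box
    inside = sum(
        1 for lat, lon in points
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    )
    return inside / len(points)


def share_change(before, after, box):
    """Change in the area's share of suggestions between two snapshots."""
    return share_in_box(after, box) - share_in_box(before, box)
```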

Making sense of it all

My analysis in this post is more for illustration than for actually determining best locations for bikeshare stations. A more rigorous analysis would need to deal with the data limitations I mentioned above, and also factor in other criteria.

But it was a fun exploration of the data and the techniques, and hopefully provides some useful ideas if readers are thinking of other spatial analysis projects involving proximity (especially the “closest” criteria).  I’m indebted to DOT & OpenPlans for enabling the creation of an interesting data set — the suggested bikeshare sites — for me to brush up on my spatial analysis skills.

Does my initial exploration shed any light on the wisdom of the crowd? It’s probably too early to tell (or my analysis was too limited to meaningfully evaluate the suggested sites).  But even so, I think the techniques I’ve described are helpful for prioritizing sites and for quantifying the results.  In that respect, the crowd’s input is a good thing.

Data issues, as always

Here are the steps we used to download the suggested bikeshare sites from the DOT/OpenPlans website in order to map and analyze the data:

  1. We used Fiddler to figure out that the suggested station locations were being maintained in a text file (in JSON format) available via http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points (Dave Burgoon ferreted this out).
  2. The JSON data looks like this:
{
"id":"4830",
"lat":"40.742031",
"lon":"-73.777397",
"neighborhood":"Fresh Meadows",
"user_name":"David",
"user_avatar_url":"",
"user_zip":"11355",
"reason":"There is no public transportation from Brooklyn-Queens greenway (Underhill Ave) to Flushing Meadows. By placing bike stations from Cunningham Park thru Kissena Park to Flushing Meadows will allow residents enjoy the parks more.",
"ck_rating_up":"1",
"voted":false
}

I don’t know of a straightforward way to read a JSON file into a desktop GIS package, so I needed to restructure the file into rows & columns.  I chose to do that with a series of Find/Replace statements in MS Word (perhaps there’s a better/more efficient way, but this approach worked), then added a row of field names, and saved the result as a .TXT file (one row of which is shown below):

id,lat,lon,neighborho,username,avatar,zipcode,reason,rating,voted
4830,40.742031,-73.777397,Fresh Meadows,David,,11355,There is no public transportation from Brooklyn-Queens greenway (Underhill Ave) to Flushing Meadows. By placing bike stations from Cunningham Park thru Kissena Park to Flushing Meadows will allow residents enjoy the parks more.,1,FALSE
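Python’s standard library can do the same restructuring in a few lines. Its csv module also quotes any free-text reason that contains a comma, which a Find/Replace approach has to handle by hand. This sketch assumes the download is a JSON array of records shaped like the example above.

```python
import csv
import io
import json

# JSON keys paired with the short (10-character-or-less, shapefile-friendly) column names above.
FIELDS = [
    ("id", "id"), ("lat", "lat"), ("lon", "lon"), ("neighborhood", "neighborho"),
    ("user_name", "username"), ("user_avatar_url", "avatar"), ("user_zip", "zipcode"),
    ("reason", "reason"), ("ck_rating_up", "rating"), ("voted", "voted"),
]


def json_to_csv_text(json_text):
    """Convert the downloaded JSON (assumed to be an array of records like the
    one above) into CSV text with a header row."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out)  # quotes any field containing a comma, e.g. free-text reasons
    writer.writerow([column for _, column in FIELDS])
    for record in records:
        row = []
        for key, _ in FIELDS:
            value = record.get(key, "")
            if isinstance(value, bool):
                value = "TRUE" if value else "FALSE"  # match the FALSE shown above
            row.append(value)
        writer.writerow(row)
    return out.getvalue()
```

Writing the result to a .txt or .csv file gives the same rows-and-columns layout as the Word approach, ready for the X/Y step.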
  3. We’re primarily an ESRI shop at the CUNY Center for Urban Research (with periodic forays into open source, as well as a longstanding reliance on MapInfo for some key tasks).  So my next step was to convert this to a shapefile, which I did by using ArcGIS’s “Display X/Y Points” tool to create a point file based on the lat/lon values.
  4. Just in case there were multiple points at the same location, I ran the ArcGIS script called “Collect Events”, which aggregates point data based on location and creates a new shapefile of each unique location with a count of all the points at each location.
    • I downloaded the JSON file a couple of times between Sept 19 and 25.  In the latest one (September 25, downloaded at 11pm) there were 55 points at latitude 40.7259, longitude -73.99 (a location at the intersection of E. 3rd Street and Second Ave in Manhattan).  But the user-supplied ZIP Codes and comments for most of these points indicated that they should have been all over the city.
    • Turns out this location is the center point of the Google map that’s displayed at the DOT bikeshare website.  If you zoom in on the DOT/OpenPlans map you’ll bring the map center into close view — and you can see the heavy map marker shadow due to all the points placed at that spot:

    • Presumably what happened here is that when you click the “Suggest Station” button, a marker is put at this spot by default.  The marker is accompanied by a note that says “DRAG ME! Then click ‘Confirm Station.’”  But I’m assuming that 55 people didn’t drag the marker and just left it there after they had entered their information.  (I guess that’s not too bad: only about 1 percent of the people using the site didn’t follow directions.)
    • Earlier this week (9/27) it looks like these sites were removed from the live map.  For my purposes, I removed those points from the shapefile, since otherwise they would skew the analysis.  I could have put them somewhere in the ZIP Code that was entered with each spot, but I couldn’t be sure of the precise location (the reasons were vague regarding location), and I didn’t want to skew the analysis the other way.
  5. Other data notes:
    • There were 8 locations outside the immediate New York City area – some as far away as Montreal and Portland, Oregon.
    • The reason provided for the Portland location was: “Even though it’s a whole continent from NYC it always seems to me like our cultures admire one another. I think NYC would enjoy all the benefits of positioning one of their Bike Share stations in Portland as sign of goodwill and mutual admiration.”
    • There were also 53 points with lat/lon = 0, which I assumed were just data entry/processing errors.
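The Collect Events step amounts to counting points per unique coordinate pair. Sorting those counts is what surfaces a stack like the 55 points at the map center. Here is a Python sketch. Collect Events itself writes a new shapefile with a count field; this version returns a Counter instead.

```python
from collections import Counter


def collect_events(points, decimals=6):
    """Count how many (lat, lon) points share each location, in the spirit of
    ArcGIS's Collect Events. Coordinates are rounded to `decimals` places (an
    assumption) so tiny floating-point differences don't split a location."""
    return Counter((round(lat, decimals), round(lon, decimals)) for lat, lon in points)
```

`collect_events(points).most_common(5)` lists the most heavily stacked locations first.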

Out of the 5,973 points as of 11pm September 25, I removed the 55 points stacked at the map center, the 8 points outside the immediate New York City area, and the 53 with lat/lon = 0.  That left me with 5,857 points.
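The three filters account exactly for the difference: 5,973 - 55 - 8 - 53 = 5,857. Here is a Python sketch of the same filtering. The map-center coordinates come from the data notes above. The New York City bounding box is a rough assumption, not the extent actually used.

```python
# Hypothetical values: the default marker position described above, and a rough
# bounding box around New York City (an assumption, not the exact extent used).
MAP_CENTER = (40.7259, -73.99)
NYC_BOX = (40.45, -74.30, 40.95, -73.65)  # (min_lat, min_lon, max_lat, max_lon)


def clean_points(points, center=MAP_CENTER, box=NYC_BOX, tol=1e-6):
    """Split (lat, lon) points into kept points and a tally of why others were dropped."""
    kept, dropped = [], {"zero": 0, "map_center": 0, "outside": 0}
    for lat, lon in points:
        if lat == 0 and lon == 0:
            dropped["zero"] += 1  # lat/lon = 0: data entry or processing error
        elif abs(lat - center[0]) < tol and abs(lon - center[1]) < tol:
            dropped["map_center"] += 1  # marker never dragged from its default spot
        elif not (box[0] <= lat <= box[2] and box[1] <= lon <= box[3]):
            dropped["outside"] += 1  # e.g. Montreal or Portland, Oregon
        else:
            kept.append((lat, lon))
    return kept, dropped
```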


9 Responses

  1. [...] Finally, a city than has a very sense of public space and the means for the commons. Thanks to Steven Romalewski for [...]

  2. [...] Innovation of the day: Crowdsourcing the locations of NYC’s bikeshare program. You can suggest to NYC DOT where to locate the new bike sharing kiosks! The program will have over 10,000 bikes with 600 stations. The second image shows the GIS results from a total FTW geek-out by cartographer/blogger Steven Romalewski.  [...]

  3. To parse that JSON data without Microsoft Word, I recommend using Google Refine. It’s wonderful at manipulating tabular data in multiple formats.

    Wonderful analysis. I am covering bike sharing in Chicago (we’re kind of neck and neck with New York City!) on my blog, Grid Chicago.

    http://gridchicago.com/tag/bike-sharing/

    I hope the planners in Chicago are as smart as you when it comes to figuring out where stations should go.

    • Thanks Steve. Glad you like the analysis. I’ve been thinking about some other ways of looking at the patterns, but just haven’t had the time to blog about them. Hope what I’ve done so far, though, is helpful.

      Thanks for the tip about Google Refine. I’ve heard others talk about it; I should take a closer look.

    • I just looked at Google Refine, and it’s great! It looks like it will convert a JSON file to Excel in a snap. And then cleaning it up is just as easy. Thanks!

      • You’re welcome.

        Basically anything that you think you should do (or would have done) with CSV or Excel files (any tabular data), say to yourself, “Wait, Google Refine can do this faster”.

        The latest version is a bit more user friendly, especially when importing data. If you are importing a lot of data, then you should adjust in the configuration how much memory the underlying Java app is allowed to take.

  4. [...] Last year I examined the thousands of bike share kiosks suggested by “the crowd” to see how closely they were located to subway entrances.  I determined that, as of late September 2011 based on almost 6,000 suggestions, one-third of the suggested sites were within 500 feet (actually, if I had used 750 feet — the average distance between avenues in Manhattan – it would’ve been 45% of the suggested sites located that distance or closer to a subway entrance).  You can still see the crowdsourced locations here. [...]

  5. Hello, I wrote a simple set of ruby scripts to grab the lat/lon and location details data from the city’s draft station map. The results can be downloaded in .shp and .kml at the repository below:

    https://github.com/louiedog98/bikescrape
