I wasn’t able to attend this week’s “MTA Unconference for Developers,” but it sounds like it was a great event. My colleague Dave Burgoon sat in, and I followed the Twitter stream and read several of the follow-up posts.
The shift in attitude and action at MTA to open up access to their data and invite developers and others to work with them to use it is heartening. I hope it spurs other agencies to do the same. (In the wake of MTA’s new-found openness, it’s especially mind-boggling that, for example, the NYC Dept of City Planning still requires a license fee for its real property data, and Nassau County and the Suffolk County Real Property office do the same.)
The raw data
The first thing I did as I was reading the conference tweets was to look at the new data the MTA has released. (MTA will keep you updated on data changes via email if you subscribe here.) Most of it is in the GTFS format (formerly the “Google Transit Feed Specification”, now the “General Transit Feed Specification” to get away from any corporation-specific connotations). At first that was disconcerting: where were the GIS files? I wanted pre-set shapefiles or KML files, but nothing was listed.
Of course, someone had already thought of that 🙂. Data in GTFS format includes latitude/longitude, so that was encouraging. And after digging a bit further, I found open source tools for exporting GTFS data into KML format, which can then be imported into other programs (such as ArcGIS) for mapping and spatial analysis. Now that I’ve worked with the data a bit, it all makes sense: the GTFS format gives you flexibility to import the data and analyze it however you’d like, with whatever software you’d like. And of course it’s structured to facilitate mapping. Silly me, Google wouldn’t create a data format that couldn’t be easily integrated with Google Maps, etc.
But I’m more familiar with ArcGIS than the TransitFeedDistribution tools that convert to KML etc. So instead I’ve created shapefiles in ArcGIS of some of MTA’s key data sets. I’ve posted links to the shapefiles below — feel free to use them however you’d like. I’ve added some notes on that process. And here’s a map of one of those shapefiles — bus routes in Brooklyn:
To put this in some context, it’s amazing to me that this data is now publicly and easily available. I’ve been using GIS professionally for almost 20 years, and I think it’s safe to say that those of us working with GIS in New York have grown weary of fighting to obtain data that you’d think would be commonplace — such as bus routes, subway routes, commuter rail lines, and related usage and performance statistics. When I directed the Community Mapping Assistance Project at NYPIRG, or more recently with the CUNY Graduate Center, clients and project partners would ask us to add bus routes to their maps, or to analyze bus transit options, and we’d always have the same answer: the MTA refuses to provide access to the data, so you’re out of luck. (Or, maybe I was able to find someone years ago who “unofficially” slipped me a floppy disk with bus routes in TransCad format, but now it’s out of date and I can’t get a newer version.) Of course I’m not the only one who wants to map bus routes and other transit data, so the MTA’s new data access is great news for many people and institutions — not to mention the riding public.
To create the shapefiles, here’s what I did:
- downloaded the .txt files from MTA’s website;
- opened these in Notepad (or Excel or SPSS, for the larger files) to get a sense of the file content and relational structure; and
- then added them to an ArcGIS data frame.
- For the points I used the “Display XY Data” function to create a point representation of the stops (see screenshot).
- I assigned the “North American Datum of 1983” (NAD83) for each file’s spatial reference, but did not project the data. That way anyone accessing the shapefiles can project them as needed. (One exception is the NYCT bus data — I projected the stops and routes files using the New York State Plane Long Island (feet) coordinate system. If anyone needs these files unprojected, just let me know.)
- The “stops.txt” files do not include route information, only “stop IDs” that can be associated with other MTA files to obtain route names and descriptions. Any given stop can be associated with multiple routes, so I decided not to join the route data to the stops – that can be done in your application as needed.
- To finish, I exported each stops file and renamed it with the category and date (such as “nycbusstops_100401”).
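For anyone without ArcGIS, the “Display XY Data” step can be roughed out in plain Python. This is only a sketch and not the workflow described above: it assumes a stops file with the standard GTFS columns (stop_id, stop_name, stop_lat, stop_lon), the function name `stops_to_points` is made up for illustration, and the sample rows are invented rather than taken from MTA’s data.

```python
import csv
import io


def stops_to_points(stops_file):
    """Turn the rows of a GTFS stops.txt into GeoJSON-style point features."""
    features = []
    for row in csv.DictReader(stops_file):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is (longitude, latitude).
                "coordinates": [float(row["stop_lon"]), float(row["stop_lat"])],
            },
            # Only stop-level attributes; route names live in other GTFS files.
            "properties": {
                "stop_id": row["stop_id"],
                "stop_name": row["stop_name"],
            },
        })
    return features


# Invented sample rows standing in for a real stops.txt.
sample = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Example Av & 1 St,40.6501,-73.9496\n"
    "S2,Example Av & 2 St,40.6512,-73.9480\n"
)
points = stops_to_points(sample)
print(len(points), points[0]["geometry"]["coordinates"])
```

To run it against a downloaded file, pass an open file handle instead of the sample, for example `open("stops.txt", newline="", encoding="utf-8-sig")`. As with the shapefiles, the coordinates are left unprojected.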
Here are the shapefiles for the stops (provided in zip file format):
For the routes, the methodology was slightly different:
- Although the GTFS includes a “routes” file, this just includes the route ID, route name, and other descriptive information — no geometry. Instead, there’s a separate “shapes” file that includes the latitude/longitude for each point, or node, along the route. In ArcGIS, the trick is to create a point representation of these nodes and then literally “connect the dots” to create the corresponding line representations.
- I used the “Display XY Data” function to create the points, and then used the nifty “ET GeoWizards” toolkit to convert the points to polylines. (ET GeoWizards is a “collection of powerful data manipulation and topology creation functions for ArcGIS”. It’s really great, and many of the tools are free, with additional functionality for a modest fee.)
- In the conversion process, ET GeoWizards uses the “shape ID” field, the lat/lon fields, and a “shape_pt_sequence” field to determine which points are connected together to draw the lines properly.
- I exported each “shapes” file to native shapefile format, but I wasn’t done. The “shapes” files don’t include any route information. The “trips” files contain “shape ID” and “route ID” fields, which provide the linkage between the “shapes” files and the “routes” files. (For the LI Bus routes, I also copied the route info and URLs from the MTA website and reformatted that to join to the LI Bus shapes file.)
- After joining the data, a last step was needed. The shapes file provides a separate shape, or line, for each type of route (such as “inbound” and “outbound”) for each actual route (the B3, the M103, etc.). To create a GIS layer with a single feature for each route, I needed to collapse the data. The LI Bus file, for example, includes 2,048 discrete shapes but only 104 routes, and even that count includes duplicates; once it’s pared down, there are only 59 actual LI Bus routes.
- I used the “Dissolve” tool in ArcToolbox to collapse the shapes files with the joined route information. Then I exported these files and renamed them with the category and date (such as “libusroutes_100308”).
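The same three ideas (connect the dots, link shapes to routes through trips, collapse to one feature per route) can be sketched without ArcGIS or ET GeoWizards. The code below is an illustration under assumptions, not a reproduction of those tools: it relies on the standard GTFS column names, the function names are hypothetical, the sample rows are invented, and the “dissolve” here simply gathers each route’s lines into one multi-part list rather than merging geometry the way ArcToolbox does.

```python
import csv
import io
from collections import defaultdict


def build_lines(shapes_file):
    """Group shapes.txt nodes by shape_id and order them by shape_pt_sequence."""
    nodes = defaultdict(list)
    for row in csv.DictReader(shapes_file):
        nodes[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lon"]),
            float(row["shape_pt_lat"]),
        ))
    # Sorting the tuples orders each shape's nodes by sequence number.
    return {sid: [(lon, lat) for _, lon, lat in sorted(pts)]
            for sid, pts in nodes.items()}


def shape_to_route(trips_file):
    """Use trips.txt as the link table from shape_id to route_id."""
    link = {}
    for row in csv.DictReader(trips_file):
        if row.get("shape_id"):  # skip trips with no shape assigned
            link[row["shape_id"]] = row["route_id"]
    return link


def dissolve_by_route(lines, link):
    """Collect every line belonging to a route into one multi-part feature."""
    routes = defaultdict(list)
    for sid, coords in sorted(lines.items()):
        if sid in link:
            routes[link[sid]].append(coords)
    return dict(routes)


# Invented sample data; note the nodes are deliberately out of order.
shapes = io.StringIO(
    "shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence\n"
    "B3_out,40.60,-73.98,2\n"
    "B3_out,40.59,-73.99,1\n"
    "B3_in,40.60,-73.98,1\n"
    "B3_in,40.59,-73.99,2\n"
    "M103_out,40.71,-74.00,1\n"
    "M103_out,40.72,-73.99,2\n"
)
trips = io.StringIO(
    "route_id,service_id,trip_id,shape_id\n"
    "B3,WKD,t1,B3_out\n"
    "B3,WKD,t2,B3_in\n"
    "B3,WKD,t3,B3_out\n"
    "M103,WKD,t4,M103_out\n"
)
lines = build_lines(shapes)
routes = dissolve_by_route(lines, shape_to_route(trips))
print({route: len(parts) for route, parts in routes.items()})
```

Three shapes collapse into two routes here, which is the same kind of reduction as going from 2,048 LI Bus shapes down to the actual routes.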
Perhaps there’s an easier way of doing all this with the TransitFeedDistribution tools, but for my purposes the ArcGIS tools worked just fine. Here are the shapefiles of the routes (also in zip file format):
- Long Island Bus
- LIRR (Important Note: the MTA didn’t include a “shapes.txt” file for the Long Island Rail Road, so I couldn’t create a shapefile of train routes. But our mapping service at CUNY already has that. Therefore, the LIRR link is not from the MTA data, but from an earlier shapefile directly from LIRR.)
- Metro North
- NYCT Bus (I provide two files for NYCT bus routes – one called “grouped” which includes 248 features, one for each bus route; and a second called “tripheadinfo” which includes “trip headsign” text — this includes duplicate line features [for a total of 733 features] because buses on a given route may be travelling to different end points.)
Note that I have not provided route files for NYCT subways or “bus company” routes. The shape_ids in the trips.txt file for the subways were mostly NULL, so I wasn’t able to link the subway shapes.txt file via the trips.txt and routes.txt files to add the route names. But, I already have a shapefile of subway routes, which you can download here.
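A quick way to check for this kind of gap before attempting the join is to count how many trips lack a usable shape ID. The sketch below assumes a trips file with the standard GTFS shape_id column and treats both empty values and the literal text “NULL” as missing; the function name and the sample rows are invented for illustration.

```python
import csv
import io


def count_missing_shape_ids(trips_file):
    """Return (missing, total) trips whose shape_id is empty or 'NULL'."""
    total = 0
    missing = 0
    for row in csv.DictReader(trips_file):
        total += 1
        # A short row or an absent column yields None, so fall back to "".
        value = (row.get("shape_id") or "").strip()
        if value == "" or value.upper() == "NULL":
            missing += 1
    return missing, total


# Invented sample rows standing in for a real trips.txt.
sample = io.StringIO(
    "route_id,service_id,trip_id,shape_id\n"
    "1,WKD,t1,\n"
    "1,WKD,t2,NULL\n"
    "A,WKD,t3,A_north\n"
    "A,WKD,t4,\n"
)
missing, total = count_missing_shape_ids(sample)
print(f"{missing} of {total} trips have no shape_id")
```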
Note also that the shapefiles of routes include only the route geometry with some basic attributes (route names, and maybe MTA URLs). I did not attach any of the scheduling or performance data that MTA has also provided. That data is part of the rest of the MTA’s GTFS data feed if you want to link it yourself to the route shapefiles.
Some other issues I encountered
Long Island bus systems: as far as I can tell, MTA’s data does not include bus stop or route information for buses in the City of Long Beach (Nassau County), the Huntington Area Rapid Transit (HART) system in Suffolk County, or the Suffolk County bus system itself. So the LI Bus files above do not provide a comprehensive set of GIS files for bus stops and routes throughout Long Island. In the past we’ve cobbled this together from various sources, but if anyone has up-to-date files for these areas, I’d love to hear about them.
“Landmarks”. The MTA’s “Bus Company” files include a “landmarks.txt” file. Presumably this represents easily recognized local features, but it’s impossible to tell for sure without a description from MTA. Also, I’m not sure why the file is included with the “bus company” data and not the other categories (or on its own). The file includes a “Type” field but no description of what the type codes mean. Some of them seem obvious (ES=elementary school?), but others are opaque (the landmark called “2 Bay Club Dr” has a type code of “AH” – what does that mean?). But in case you want to use it, here it is (at your own risk!).
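One way to start decoding the type codes is to tally how often each one appears and then eyeball a few landmarks per code. The sketch below assumes the file is comma-delimited with a header row; only the “Type” field is known from the file itself, so the “Name” column, the function name, and all sample rows other than “2 Bay Club Dr” are hypothetical.

```python
import csv
import io
from collections import Counter


def tally_types(landmarks_file, type_field="Type"):
    """Count how many landmarks carry each type code."""
    return Counter(
        row[type_field].strip() for row in csv.DictReader(landmarks_file)
    )


# Mostly invented rows; the "Name" header is an assumption.
sample = io.StringIO(
    "Name,Type\n"
    "2 Bay Club Dr,AH\n"
    "Example Elementary,ES\n"
    "Another Elementary,ES\n"
)
counts = tally_types(sample)
for code, n in counts.most_common():
    print(code, n)
```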
Lack of metadata. Although the GTFS website provides general descriptions of field names and data types, I wish there were better metadata from MTA directly — for example, it would be helpful to know how the lat/lon data were generated (i.e., what basemap was used, what scale is the data best viewed at). And what’s the difference between the stops in the “Bus Company” file and the “New York City Transit – Bus” file?
But these are minor things. Overall, it’s a huge step that MTA has opened its data doors. Kudos to the new MTA leadership and everyone else who nudged (or aggressively pushed) them along the way. I’m looking forward to great apps coming out of this, and to other agencies following suit.