New Subway Station in NYC: Hudson Yards

In September 2015, the MTA opened its first new subway station in NYC in decades. I’ve added the new station and extension of the 7 line to the Center for Urban Research’s (CUR’s) maps and underlying GIS data, and we’re making this updated data freely available.

Here’s the post at CUR’s website, and here are the links below with the data:

If you use the data (which I hope you do), please let me know how it works out.  If you use the files, please reference the “Center for Urban Research at the Graduate Center/CUNY”, especially if you use the layer symbology in any printed maps or online applications.  Thanks!


When it rains it pours: NYC GIS data floodgates opened

Lately NYC agencies have started to step up the pace in producing an impressive amount of publicly accessible GIS (and other) data.  It’s a very good direction (and hopefully one that all agencies will soon follow).

This summer, the big news was that MapPLUTO was all of a sudden available for free.  Then ACRIS was opened up (not geospatial, but key to analyzing spatial patterns of property transactions).  And before that, HPD had posted a large amount of housing data (albeit in a wacky XML format, but it was a lot of data, it was freely available, and it was updated regularly).

But today there’s even more…

… Historical MapPLUTO!

The latest news – spotted by eagle-eyed GIS star Jessie Braden – is that historical versions of PLUTO and MapPLUTO are now freely available, going back to 2002.  Really great.

And City Planning included an important but bittersweet note at the historical download page: sweet because all of us who had to sign licenses to obtain PLUTO data are now absolved from the license restrictions, but bitter because there was no mention of the thousands of dollars each of us has had to spend unnecessarily over the years to obtain data that is now online for free.  Sigh.  Here’s the note:

Note to Licensees:
DCP releases all licensees of PLUTO and MapPLUTO versions 02a through 12v2 from all license restrictions.

One caution about the historical PLUTO data: be careful if you’re hoping to compare and analyze parcels year to year. Our team at the CUNY Graduate Center tried that a few years ago, and it was painful. So many data inconsistencies and related issues. The best we were able to do was display historical land use patterns via the website (for example, look at the disappearance of industrial land use in Williamsburg from 2003 to 2010). I’d be glad to explain in more detail if anyone is interested.
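If you do attempt a year-to-year comparison, a sensible first step is simply to see which parcels even line up between two releases. Here’s a rough sketch of that check, assuming each vintage has been exported to a list of records keyed by a "BBL" field with a "LandUse" attribute (both field names are assumptions; check the data dictionary for each release). The sample records are made up. It only surfaces candidates for the kinds of inconsistencies mentioned above; it doesn’t resolve them.

```python
def compare_vintages(old_rows, new_rows, field="LandUse"):
    """Compare two PLUTO releases by BBL (hypothetical field names).

    Returns BBLs found only in the old release (e.g., retired by a merger),
    BBLs found only in the new release, and BBLs whose `field` changed.
    """
    old = {row["BBL"]: row for row in old_rows}
    new = {row["BBL"]: row for row in new_rows}
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = sorted(
        bbl for bbl in old.keys() & new.keys()
        if old[bbl].get(field) != new[bbl].get(field)
    )
    return only_old, only_new, changed


# Made-up records standing in for two releases.
old_rows = [{"BBL": "3023800001", "LandUse": "06"},
            {"BBL": "3023800002", "LandUse": "06"}]
new_rows = [{"BBL": "3023800001", "LandUse": "04"},
            {"BBL": "3023800050", "LandUse": "04"}]

retired, created, changed = compare_vintages(old_rows, new_rows)
print(retired, created, changed)
# ['3023800002'] ['3023800050'] ['3023800001']
```

Parcels that show up in only one list are the ones where a simple attribute join between years silently drops data.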

Building footprint data too

Other good news for all of us who use the city’s GIS data is that building footprints seem to be updated on a more regular basis, and more attribute information is being added (hat tip to Pratt’s Fred Wolf for discovering it).  The latest building footprint data is dated September 2013 and includes new attributes such as building height and type, along with a supplemental data set on “historic” buildings (i.e., ones that have been demolished, with date of demolition).

Agency data web portals are a beautiful thing

Thankfully the NYC Dept of City Planning staff are continuing to maintain the Bytes of the Big Apple website, where the PLUTO data is available along with many other spatial and non-spatial planning-related data sets.  The Bytes pages provide essential metadata about each data set, easily accessible contact information, and context about the data sets.

All of that is missing from the city’s open data portal, which I think is a major failure of the city’s open data practices.  (Someone even commented on the buildings data set noted above, asking great questions about how the building heights were calculated, and about the source of these calculations: essential information that is too often missing from data sets available through the portal, though usually included when you download the data from the agencies directly.)

As long as the data portal doesn’t undermine invaluable agency websites like Bytes of the Big Apple, and more data keeps getting freed and made accessible on these agency sites, that’s a great thing.  And hopefully more agencies will either continue to maintain their own online data repositories (such as the departments of Buildings, Finance, HPD, Health, and others) or launch new ones (as the MTA did a couple of years ago).

Happy holidays …

… and big kudos to the City Planning department for explicitly posting the historical PLUTO data sets!

NYC’s MapPLUTO is free!

Download away!

This essential corpus of public data is now (finally!) freely accessible.  According to the metadata:

Access Constraints: MapPLUTO is freely available to all New York City agencies and the public.

Thanks to:

  • Jessie Braden at Pratt Institute for pointing it out,
  • 596 Acres for pressing the FOIL case with the city (and a similar effort from Muck Rock),
  • The New York World for highlighting the irony of charging fees and licenses for this data,
  • the NYC Transparency Working Group for pressing the city on all things opendata,
  • anyone and everyone in city government who pressed for this change from the inside, and
  • everyone else who helped shine a light on this ongoing failure of the city’s open data efforts, which has now been turned around!

Who Represents You in NYC


NY1, New York City’s 24-hour cable news channel, featured the maps in a segment of its Thursday, June 20 “Road to City Hall” broadcast. We’ve posted a link to the video below:

NY1 Road to City Hall 6/20/13 segment on Who Represents Me

The Graduate Center also posted a news release about the project.

JUNE 10, 2013

Today our Center for Urban Research at the Graduate Center / CUNY joined with the League of Women Voters to launch an online service so anyone can identify their elected officials in New York City.

The idea behind this “Who Represents Me” service is not new (in fact, my old team at NYPIRG’s Community Mapping Assistance Project pioneered it more than a decade ago).  But now that redistricting has changed all the legislative boundaries in the city (and the City Council lines will all be new by January 2014) it seemed like the perfect time for a reprise of our Who Represents Me service from 2000, updated with new data and new technology.


How it works

Anyone can enter a street address at the “Who Represents Me” website, or if they’re using a mobile device they can tap the Use My Current Location link. The site displays a list of all city, state, and federal elected representatives (as well as the NYC Community Board), an interactive map of the district and all districts nearby, contact information for local offices, and links for more information such as email addresses, individual websites, Twitter feeds, and Facebook pages of elected officials.
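The post doesn’t describe the lookup beyond address matching with the Google Maps API and district boundaries hosted at cartoDB, so what follows is only a rough sketch of the general idea, not the service’s actual code. Once an address or a tapped location becomes a coordinate, finding the representatives comes down to a point-in-polygon test against each district layer. This uses the standard even-odd (ray-casting) rule. The district names and square boundaries are made up, and real district files have multiple parts and holes that this ignores.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: cast a ray to the right and count edge crossings."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def districts_for(x, y, districts):
    """Names of all districts (name -> single ring of (x, y) vertices) containing the point."""
    return sorted(name for name, ring in districts.items()
                  if point_in_polygon(x, y, ring))


# Hypothetical, overlapping districts from different layers.
districts = {
    "Council District A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "Assembly District B": [(2, 2), (6, 2), (6, 6), (2, 6)],
    "Senate District C": [(5, 5), (8, 5), (8, 8), (5, 8)],
}
print(districts_for(3, 3, districts))
# ['Assembly District B', 'Council District A']
```

Because every layer (Council, Assembly, Senate, Congress, Community Board) is tested independently, one coordinate yields one district per layer, which is exactly the list of representatives the site shows.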

Users can also link to candidate information using the League’s interactive voter guide. And we provide district-specific links to’s candidate summaries.

According to Mary Lou Urban, Co-President of the League of Women Voters of the City of New York,

resources like are what it takes to make participation in government appealingly simple and is a logical approach to increasing voter participation.

New data

We believe our Who Represents Me service will be even more popular and helpful than it was over a decade ago.

First, the League of Women Voters is providing up-to-date info for all elected officials across the city.  The League keeps this information current through ongoing contact with all officials at all levels of government.  Initially the League collected this data for its 2013 They Represent You brochure (which you can order here).  And they’ll be providing new info periodically for the online service.

We supplemented the League’s info with data from the Sunlight Foundation, the Open States project, and local websites with contact information and photographs of City Council members, state legislators, congressional representatives, and executive branch officials.


New features

One of the best features of the service is that Who Represents Me can be embedded in anyone’s website, blog, etc. So all the advocacy groups, elected officials, media outlets, and others who use the service can widely share it and make it their own.

Anyone can use the service, Tweet about it, post it to Facebook, and/or create and share a location-specific link to the list of representatives.  Just click the “LINK / EMBED” option at the top of the page, and a link like the one below will automatically display the list of officials for that location:

District maps

We used a combination of cartoDB, Google Maps API, and the Twitter Bootstrap framework to add a flexible and helpful interactive map overlay to the service.  Just click a thumbnail map of any district, and a new window is displayed that shows all the district boundaries for that location.  Hover over the list of districts and each one is highlighted on the map.  Double-click on a district in the list, and the map zooms to its extent.
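“Zooms to its extent” means fitting the map to the district’s bounding box. Here’s a small sketch of that calculation, assuming a district is stored as a list of rings of (lon, lat) pairs so multi-part districts work too; the padding factor is an arbitrary choice for illustration. In the Google Maps JavaScript API, a box like this would typically be handed to the map’s `fitBounds` method, but this is not the site’s code.

```python
def extent(rings):
    """Bounding box (west, south, east, north) of all vertices in all rings."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def padded(bbox, fraction=0.05):
    """Grow the box by `fraction` of its width/height on each side,
    so the district boundary isn't flush with the map edge."""
    west, south, east, north = bbox
    dx = (east - west) * fraction
    dy = (north - south) * fraction
    return (west - dx, south - dy, east + dx, north + dy)


# A made-up two-part district in simple planar units.
district = [
    [(0, 0), (2, 0), (2, 1)],
    [(5, 3), (6, 3), (6, 4)],
]
box = extent(district)
print(box)                  # (0, 0, 6, 4)
print(padded(box, 0.5))     # (-3.0, -2.0, 9.0, 6.0)
```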


Most important, you can click anywhere on the map and new districts are highlighted for that location.  And the list of representatives is automatically updated when you close the map window.

So the maps — combined with the address search and current location feature — enable you to determine elected representatives literally for any and every location in the city.


“Who Represents Me: NYC” has been developed with the generous support of the New York Community Trust.

Geographic data sources for the service include:

The geographic data representing district boundaries is hosted at cartoDB. The overall site design relies on the Twitter Bootstrap framework. We use the Google Maps API for address matching, “typeahead” address search, and basemaps.

A Modest Victory regarding NYC Tax Parcel Data

After I blogged this morning about the frustrations of the City Planning Department’s restrictions on mapped tax parcel data, I learned that the foundation of their “MapPLUTO” product is now available for free online.

This is a partial – but very important – victory for anyone who has been impacted by the city’s burdensome fees and license restrictions associated with MapPLUTO.

The good news is that the Department of Finance has decided to post its “Digital Tax Map” in GIS format online for free download.  (Thanks to Colin Reilly for alerting me to the online data.)  Here’s the link:

Start mapping real property data!

To some extent, this pulls the rug out from under City Planning’s efforts to restrict access to tax parcel data, enabling anyone now to analyze and map the spatial patterns of land use, real property tax assessment, ownership, and more across the five boroughs.  Here’s what you’ll need to do:

  1. Download the Digital Tax Map file from the city’s Open Data Portal (when you unzip this file, you’ll actually receive a collection of GIS shapefiles and data tables);
  2. Download the assessment roll file (also for free online).  The Finance Dept makes this available in Microsoft Access format.  You’ll want to download the separate files for “Tax Class 1” and “Tax Classes 2, 3, and 4”.  The “condensed” version of the file only has a limited number of fields, nowhere near what MapPLUTO has;
  3. Combine the two “Tax Class” downloads into a single file;
  4. Join this combined file with the “DTM_1212_Tax_Lot_Polygon” shapefile using a robust GIS package such as ArcGIS or QGIS; and
  5. Map away!
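Steps 3 and 4 boil down to stacking two tables and joining them to the polygons on a borough-block-lot key. The sketch below shows that logic outside a GIS, assuming each Access table has been exported to CSV with columns named BORO, BLOCK, and LOT, and that the tax lot polygons carry a 10-digit BBL attribute (1-digit borough, 5-digit block, 4-digit lot). Those column names are assumptions, so check the actual field names in the files you download. The sample rows are made up. Rows that don’t match any polygon are reported rather than dropped so you can inspect them; the condo caveat below may be one source of those.

```python
import csv
import io


def make_bbl(boro, block, lot):
    """10-digit key: 1-digit borough, 5-digit block, 4-digit lot."""
    return f"{int(boro)}{int(block):05d}{int(lot):04d}"


def combine_rolls(*csv_files):
    """Step 3: stack the 'Tax Class 1' and 'Tax Classes 2, 3, and 4' rows."""
    rows = []
    for f in csv_files:
        rows.extend(csv.DictReader(f))
    return rows


def join_to_lots(roll_rows, lot_bbls):
    """Step 4: match roll rows to tax lot polygons by BBL.

    Returns {bbl: row} for matches (a later duplicate overwrites an earlier
    one) and a list of BBLs with no matching polygon.
    """
    lot_bbls = set(lot_bbls)
    matched, unmatched = {}, []
    for row in roll_rows:
        bbl = make_bbl(row["BORO"], row["BLOCK"], row["LOT"])
        if bbl in lot_bbls:
            matched[bbl] = row
        else:
            unmatched.append(bbl)
    return matched, unmatched


# Made-up stand-ins for the two exported tables and the polygon BBLs.
class1 = io.StringIO("BORO,BLOCK,LOT,OWNER\n3,2380,1,SMITH\n")
class234 = io.StringIO("BORO,BLOCK,LOT,OWNER\n1,1,10,CITY\n1,5,1001,DOE\n")
polygon_bbls = ["3023800001", "1000010010"]

rows = combine_rolls(class1, class234)
matched, unmatched = join_to_lots(rows, polygon_bbls)
print(sorted(matched), unmatched)
# ['1000010010', '3023800001'] ['1000051001']
```

In ArcGIS or QGIS the equivalent is an attribute join on that key, after adding a computed BBL field to the combined table.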

Not a total MapPLUTO replacement, though

The “Tax Class 1” and “Tax Classes 2, 3, and 4” assessment roll files contain most, though not all, of what the City Planning Department packages as part of its MapPLUTO product.  Some missing items include:

  • Detailed parcel-level zoning characteristics;
  • Floor Area Ratio (FAR), a critical factor in making parcel-specific land use decisions;
  • Land use characteristics (though this can be calculated based on the assessment roll’s “building class” codes and a formula published by City Planning);
  • The various tract and district IDs for each parcel (but this can be calculated using GIS);
  • Parcel-specific easements;
  • Whether the property is a designated NYC landmark; and
  • There may be other differences that I haven’t noticed.

While some of these characteristics can be calculated, others cannot without City Planning’s involvement (since they maintain the data on parcel-by-parcel zoning and FAR, for example).
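For the tract and district IDs, a common GIS shortcut is to take each tax lot’s centroid and find which tract or district polygon contains it. Here’s a minimal sketch of that approach, using the area-weighted (shoelace) centroid and a standard ray-casting test so the block runs on its own. The tract IDs and geometries are made up, and polygons are single rings without holes. One caveat: the centroid of an irregular (say, L-shaped) lot can fall outside the lot itself, which is why some workflows use an interior point instead.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    area2 = 0.0          # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(x, y, ring):
    """Even-odd (ray-casting) containment test."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def tract_for_lot(lot_ring, tracts):
    """ID of the tract containing the lot's centroid, or None."""
    x, y = polygon_centroid(lot_ring)
    for tract_id, ring in tracts.items():
        if point_in_polygon(x, y, ring):
            return tract_id
    return None


tracts = {
    "Tract 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Tract 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
lot = [(12, 2), (14, 2), (14, 4), (12, 4)]
print(polygon_centroid(lot))        # (13.0, 3.0)
print(tract_for_lot(lot, tracts))   # Tract 2
```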

Also, there can be some confusion over linking assessment roll tabular data to tax parcel boundaries for tax lots that are condos.  I can discuss this in a separate blog post, or perhaps others can weigh in on this topic.

So the Digital Tax Map plus the assessment roll files are not a complete replacement for MapPLUTO.  That’s one reason this is only a partial victory.  I would imagine that the information in these combined files will enable many groups and individuals to avoid using MapPLUTO completely.  But other organizations that rely on characteristics such as FAR, detailed zoning, easements, etc., will still need the more complete MapPLUTO package.

We still need to Free MapPLUTO

But the availability of the Digital Tax Map shapefiles greatly undercuts City Planning’s ability to levy fees and impose license restrictions on the public for this data so essential to understanding our city.  It underscores how unnecessary it was for City Planning to be involved in selling the data in the first place.  For several years now the tax parcel boundaries have been maintained by the Dept of Finance, and the assessment roll data that provides the bulk of PLUTO is created and maintained by Finance too.  So why has City Planning been selling data from other agencies as its own?

It also raises the question: now that the Digital Tax Map and the assessment roll data are free online, why is City Planning still selling/licensing MapPLUTO?  Is this an oversight on their part?  Or does City Planning think that an unaware public will still come to them for MapPLUTO so they can extract more fees?  Either way the fees for MapPLUTO should end immediately, even if it requires the Mayor’s office to step in and require his agencies to comply with Local Law 11.

And it would be more than a nice gesture if the city refunded the past decade’s worth of license fees the city has collected on the backs of local community groups, academic institutions, students, and others who’ve had to pay City Planning in order to access mapped tax parcel data.

Communication to help bridge the data gap?

Btw, while it’s wonderful that the Digital Tax Map files are now available online, I wonder why it took my blog post to reveal that they were available.

I’ve known for some time that the Dept of Information Technology and Telecommunications (DoITT) maintains an interactive map displaying the tax parcel boundaries.  But there’s no download option at that mapping site for the boundary data.

Also, I regularly check the Finance Department’s website where you can download the assessment roll data.  Even today, there’s no mention that the Digital Tax Map is available for free online.  Nor does the Finance Department’s web page explaining the Digital Tax Map project mention anything about a download option.

I also regularly search the city’s Open Data Portal, but I hadn’t come across the Digital Tax Map file until Colin posted a comment today at my blog.  If you sort the Open Data list by “Newest” or “Recently Updated”, the Digital Tax Map doesn’t show up in the first several pages.

I think this speaks to the need for communication between the agencies that create the data, and the various constituencies of groups that use (or might hope to use) the city’s data sets.  Simply posting something to the portal is not enough.  If the city truly wants to foster innovation by making its data files more open, it would help if either the agencies or the Mayor’s office or some entity within city government provided regular communication about data that’s available, how to use it, what it shouldn’t be used for, etc.

Nonetheless, the city has taken an important step in opening up access to tax parcel information with the Digital Tax Map.  Looking forward to more to come!

A Modest Proposal for NYC Tax Parcel Data

On behalf of all the urban planning students, local nonprofits, neighborhood groups, Community Boards, journalists, and others who’ve paid cold hard cash to the NYC Department of City Planning for the “privilege” of having license-restricted access to the city’s tax parcel data, I’d like to make a modest proposal:

New York City’s Planning Department should refund the fees they’ve collected for the past decade from all of MapPLUTO’s licensees, and MapPLUTO should be posted online for free downloading.

We’re talking real money for many local groups

The MapPLUTO database was conceived by City Planning circa 2003 as the successor to earlier efforts to license and sell tax parcel boundaries.

Based on an article this week from The New York World, City Planning has collected up to $80,000 a year from the sale of MapPLUTO data.  Over a decade, that’s as much as $800,000.  According to a response from City Planning to a Freedom of Information Law (FOIL) request by 596 Acres for a list of all PLUTO licensees from 2003 to 2012, there have been almost 400 licensees (including several dozen city agencies, which I’ll discuss separately below).

It’s hard to say the exact amounts that each group has paid to City Planning; as far as I know, City Planning has never released a full accounting of the fees they’ve received from MapPLUTO licenses.  In this era of transparent government, we should be able to find this out.  But this information is hidden behind City Planning’s walls.  Even a search for “MapPLUTO” or “PLUTO” at CheckbookNYC reveals nothing.

I do know that my organization, the Center for Urban Research at The Graduate Center / CUNY, has spent $7,500 in MapPLUTO license fees since 2006.  Before that, the mapping project I co-founded at NYPIRG also licensed MapPLUTO and paid City Planning several thousand dollars over several years.

That’s real money, especially to a nonprofit group and modest academic research center.  And it’s money I think we – and all the other MapPLUTO licensees – never should have had to pay.

(Note: my critique shouldn’t detract from the great work that the Dept of City Planning does in so many other areas, including the other data sets that the agency makes available for free online.)

Why does the Dept of City Planning restrict access to such an important database?

This week’s New York World article highlights the absurdity of the city’s efforts to charge fees for the data.  MapPLUTO is based on data that City Planning obtains from other city agencies. It’s not new data. It’s not data that has been created so that City Planning can sell it.  It’s data that’s been compiled using taxpayer dollars, for the purposes of land use analysis and planning.  The data has already been paid for by the public, and the Planning Department isn’t justified in charging extra for it.

City Planning has worked hard to keep a lock on the fees they receive (though as I understand it, the Planning Department doesn’t even receive the fees directly – the funds are put in the city’s general fund):

  • The Planning Department requires MapPLUTO users to sign a license agreement that prohibits any kind of sharing or reuse of the data.
  • According to the agreement, “unlicensed third parties” cannot have access to the data.
  • There’s an additional prohibition on distributing the “geographic coordinates” contained in MapPLUTO, i.e., the GIS representation of tax parcel boundaries that you need to map the data and analyze it spatially.
  • Using the data for a product that will be resold (such as a mobile app) is prohibited.
  • And MapPLUTO “or any of its components” cannot be “place[d]… on the Internet.”

Now that the city’s Open Data Law requires agencies to post data online, City Planning is falling back on the argument that since the MapPLUTO data comes from other agencies, they don’t have to post it (an exemption in the law).  It’s up to the other agencies to do so.  But as Dominic Mauro from the Transparency Working Group puts it:

If you’re getting paid for this data, I don’t see how they can reasonably claim that this is not their data.

In other words, City Planning can’t have it both ways.

City Planning also claims copyright over the MapPLUTO data.  I’m all for giving credit where credit is due – City Planning should be cited whenever MapPLUTO data is used (and for that matter, all the individual agencies from whom City Planning gets the data should be cited as well).  But why control what can be done with the data?  Why limit its use?  This just stifles innovation and entrepreneurship, not to mention any local community planning work that might be prohibited by the license or by copyright.

Indeed, allowing app developers, realtors, architectural firms, consulting groups, and any other for-profit entity to use the city’s tax parcel data at no cost and with no restrictions can only help the city.  Removing these restrictions opens up business opportunities, and with business growth comes job creation and tax revenue, precisely the kinds of things that our current Mayor has been keen on promoting.

And removing barriers to MapPLUTO makes it easier for nonprofits, academic institutions, and students to engage in local planning efforts on a level playing field.  If only groups that can afford the data can use it, the rest of us are at a disadvantage.

Will the city actually enforce its restrictive practices?

What if you decide to ignore the license or copyright restrictions? City Planning reserves its right to come after you.  According to the New York World article, the Department of City Planning says that “Any such use without a license could give rise to an enforcement action.”

An “enforcement action”?  Really?  When Mayor Bloomberg signed Local Law 11, he said “If we’re going to continue leading the country in innovation and transparency, we’re going to have to make sure that all New Yorkers have access to the data that drives our City.”   I don’t think there’s any dispute that real estate is one of the key drivers of the city.  So I wonder if Mayor Bloomberg’s planning agency would go after a NYC BigApps entrant who uses MapPLUTO data in a web-based app?  Would the Mayor approve of City Planning suing a local nonprofit group that posts the data online? Would he let City Planning take enforcement action against Cornell University if Cornell’s new technology campus developed a profitable product that relied on MapPLUTO data?

And from the perspective of investing tax dollars, would city funds be best spent on lawsuits, or on facilitating innovation?

You’re not alone: other city agencies have paid for MapPLUTO, too!

City Planning’s efforts to control access to MapPLUTO data haven’t been reserved only for those outside city government.  The Planning Department has even imposed its restrictions and fees on other city agencies.

According to the list of MapPLUTO licensees uncovered by 596 Acres, City Planning has issued licenses to the Mayor’s Office, the Dept of Information Technology and Telecommunications (DoITT), the Office of Emergency Management (OEM), the Police Department, the Law Department (presumably they’re the ones who need to review the license in the first place!), the City Council, and several Community Boards.

What possible reason could City Planning have for wanting or needing to know how and why these agencies are using MapPLUTO data?  Why should city agencies need to license data from another city agency?  And some of these agencies – especially Dept of Finance, but also Parks and Recreation, the Dept of Citywide Administrative Services (DCAS), and the Landmarks Commission – are the very agencies that City Planning gets the data from to create MapPLUTO in the first place!

Not to pile on (but it’s so easy to do with such an absurd situation): City Planning historically has not only required licenses from other city agencies, it has also required them to pay a fee to obtain tax parcel boundary files.  In 2000, for example, not only were the fees for tax parcel data files higher ($1,150 per borough, rather than the current $300/borough fee), but the fees were “$750 per borough for New York City agencies”.

That must’ve made for some interesting discussions among agency heads during budget time.  As far as I know, that practice ended soon thereafter.  But it’s evidence of City Planning’s inexplicable and ongoing effort to control access to tax parcel data, and to try to profit from it, even from their own colleagues in city government.
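To put those per-borough fees in citywide terms, here’s the simple arithmetic, assuming a user who needs the whole city licenses all five boroughs at the per-borough rates quoted above.

```python
BOROUGHS = 5

# Per-borough fees for tax parcel boundary files, as quoted in the post.
FEES_PER_BOROUGH = {
    "public, 2000": 1150,
    "city agencies, 2000": 750,
    "current": 300,
}


def citywide_cost(per_borough, boroughs=BOROUGHS):
    """Cost of covering every borough at a flat per-borough fee."""
    return per_borough * boroughs


for label, fee in FEES_PER_BOROUGH.items():
    print(f"{label}: ${citywide_cost(fee):,}")
# public, 2000: $5,750
# city agencies, 2000: $3,750
# current: $1,500
```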

Tear down the paywall (and offer some payback while you’re at it)

Now that the city has a law requiring data to be freely available online, there’s strong justification for removing the fees and the license requirements and copyright restrictions.  But frankly, we’ve already had a law requiring data such as MapPLUTO to be made available with no restrictions and for no more than the cost of distribution (such as what it could cost to copy the files to a DVD or to post them online).  That’s the New York State Freedom of Information Law, in effect since the mid-1970s.

What’s especially curious – and frustrating – about City Planning’s persistence in restricting access to tax parcel data is that the agency has made great strides in opening up access to other data sets it maintains.

A decade ago City Planning was charging fees to the public and other agencies for data as simple as a GIS file representing borough boundaries, or Census tract boundaries, or Community Boards.  One by one the agency has removed these fees and developed what I consider a model website for making agency data publicly accessible: the “Bytes of the Big Apple” website (overly cute name for a very useful site).

Even as recently as Fall 2012, the Planning Department removed the fee it had been charging for its “Geosupport Desktop Edition”, a software and data package that takes a list of street addresses and returns information about each address’s building ID, tax parcel ID, and more.  City Planning previously was selling this package for $2,500 a year (and more if you wanted more frequent updates).  Judging by the time it takes City Planning to create this application, maintain it, and keep it updated, I would imagine it’s worth much more than the effort to update the MapPLUTO data.  Yet “Geosupport” is now free, but we still have to pay for MapPLUTO.  I don’t get it.

Free MapPLUTO!

The time is right for the Department of City Planning to change its ways regarding MapPLUTO – the one remaining major data set it licenses for a fee.  I think City Planning should take two simple steps:

  1. immediately remove any fees and restrictions on MapPLUTO (and post the data online for anyone to download it); and
  2. refund all its MapPLUTO fees from the last 10 years.   The city should think of this as a relatively small but important investment in innovation and entrepreneurship.  And it would be a way of apologizing to all the groups and individuals who’ve effectively paid taxes twice on this information that’s so essential to understanding land use and real estate – arguably the lifeblood of the city.

Such a sensible proposal!

But what if City Planning continues to dig in its heels?  Perhaps you can try the Freedom of Information route and request the data via FOIL.  That’s what 596 Acres did, and they received MapPLUTO for a mere $5 fee!  City Planning still claimed copyright restrictions, but maybe if City Planning receives enough FOIL requests they’ll be persuaded that there’s no point in maintaining MapPLUTO’s high fees and restrictive licenses.  Here’s the link:

Mapping NYC stop and frisks: some cartographic observations

WNYC’s map of stop and frisk data last week got a lot of attention from other media outlets, bloggers, and of course the Twittersphere.  (The social media editor at Fast Company even said it was “easily one of 2012’s most important visualizations”.)

I looked at the map with a critical eye, and it seemed like a good opportunity to highlight some issues with spatial data analysis, cartographic techniques, and map interpretation – hence this post. New York’s stop and frisk program is such a high profile and charged issue: maps could be helpful in illuminating the controversy, or they could further confuse things if not done right. In my view the WNYC map falls into the latter category, and I offer some critical perspectives below.


It’s a long post 🙂. Here’s the summary:

  • WNYC’s map seems to show an inverse relationship between stop and frisks and gun recovery, and you can infer that perhaps the program is working (it’s acting as a deterrent to guns) or it’s not (as WNYC argues, “police aren’t finding guns where they’re looking the hardest”). But as a map, I don’t think it holds up well, and with a closer look at the data and a reworking of the map, the spatial patterns of gun recovery and stop and frisks appear to overlap.
  • That said, the data on gun recovery is so slim that it’s hard to develop a map that reveals meaningful relationships. Other visualizations make the point much better; the map just risks obscuring the issue. When we’re dealing with such an important — and controversial — issue, obscuring things is not what we want. Clarity is paramount.
  • I also make some other points about cartographic techniques (diverging vs. sequential color schemes, black light poster graphics vs. more traditional map displays). And I note that there’s so much more to the stop and frisk data that simply overlaying gun recovery locations compared with annual counts of stop and frisks seems like it will miss all sorts of interesting, and perhaps revealing, patterns.

As for the map itself, here’s a visual summary comparing the WNYC map with other approaches.  I show three maps below (each one zoomed in on parts of Manhattan and the Bronx with stop and frisk hot spots):

  • the first reproduces WNYC’s map, with its arbitrary and narrow depiction of “hot spots” (I explain why I think it’s arbitrary and narrow later in the post);

WNYC map

  • the second map uses WNYC’s colors but the shading reflects the underlying data patterns (it uses a threshold that represents 10% of the city’s Census blocks and 70% of the stop and frisks); and

Modified hot spots (10% blocks representing 70% stop and frisks)

  • the third uses a density grid technique that ignores artificial Census block boundaries and highlights the general areas with concentrated stop and frisk activity, overlain with gun recoveries to show that the spatial patterns are similar.

Density grid
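The density grid idea can be sketched very simply: lay a fixed grid over the city, count stop and frisks per cell regardless of block boundaries, flag the busiest cells, then check what share of gun recoveries falls inside them. This is a plain fixed-grid count, assuming projected x/y coordinates (for example, a State Plane system in feet) rather than lat/lon. The map above may well have used a smoother kernel-density surface in a GIS package, which this does not reproduce. Cell size, threshold, and points are all made up for illustration.

```python
from collections import Counter


def cell_of(x, y, cell_size):
    """(column, row) of the grid cell containing a projected point."""
    return (int(x // cell_size), int(y // cell_size))


def density_grid(points, cell_size):
    """Count points per square cell, ignoring Census block boundaries."""
    return Counter(cell_of(x, y, cell_size) for x, y in points)


def hot_cells(grid, min_count):
    """Cells with at least `min_count` points."""
    return {cell for cell, n in grid.items() if n >= min_count}


def share_in_cells(points, cells, cell_size):
    """Fraction of points (e.g., gun recoveries) that fall inside `cells`."""
    if not points:
        return 0.0
    inside = sum(cell_of(x, y, cell_size) in cells for x, y in points)
    return inside / len(points)


stops = [(10, 10), (20, 30), (90, 90), (150, 20), (160, 40), (170, 30), (120, 150)]
guns = [(50, 50), (130, 10), (500, 500)]

grid = density_grid(stops, cell_size=100)
hot = hot_cells(grid, min_count=3)
print(sorted(hot))                                # [(0, 0), (1, 0)]
print(round(share_in_cells(guns, hot, 100), 2))   # 0.67
```

Because cells are uniform in size, a cell count is directly comparable across neighborhoods, unlike block counts, which depend on how big each block happens to be.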

What WNYC’s map seems to show

The article accompanying the map says:

We located all the “hot spots” where stop and frisks are concentrated in the city, and found that most guns were recovered on people outside those hot spots—meaning police aren’t finding guns where they’re looking the hardest.

The map uses a fluorescent color scheme to show the pattern, by Census block, of the number of stop and frisk incidents in 2011 compared with point locations mapped in fluorescent green to show the number of stop and frisks that resulted in gun recovery.

The map is striking, no question. And at first glance it appears to support the article’s point that guns are being recovered in different locations from the “hot spots” of stop, question, and frisk incidents.

But let’s dig a bit deeper.

Do the data justify a map?

This is a situation where I don’t think I would’ve made a map in the first place. The overall point – that the number of guns recovered by stop and frisks in New York is infinitesimally small compared to the number of stop and frisk incidents, putting the whole program into question – is important. But precisely because the number of gun recovery incidents is so small (fewer than 800 in 2011 vs. more than 685,000 stop and frisks), it’s unlikely that we’ll see a meaningful spatial pattern, especially at the very local level (in this case, the Census blocks that form the basis of WNYC’s map).

And the point about extremely low levels of gun recovery compared with the overwhelming number of stop and frisk incidents has already been presented effectively with bar charts and simple numeric comparisons, or even infographics like this one from NYCLU’s latest report:

If we made a map, how would we represent the data?

For the point of this blog post, though, let’s assume the data is worth mapping.

WNYC’s map uses the choropleth technique (color shading varies in intensity corresponding to the intensity of the underlying data), and they use an “equal interval” approach to classify the data. They determined the number of stop and frisk incidents by Census block and assigned colors to the map by dividing the number of stop and frisks per block into equal-width categories: 1 to 100, 100 to 200, 200 to 300, 300 to 400, and 400 and above.
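For anyone who wants to try this, here’s a minimal Python sketch of an equal-interval classification like the one described above. The 100-wide bins and the cap at 400 come from WNYC’s legend; how they assigned counts that sit exactly on a boundary (100, 200, and so on) is my assumption, and the function name is just for illustration.

```python
def equal_interval_class(count, width=100, top=400):
    """Return an equal-interval class label for a per-block stop and frisk count.

    Assumes classes of 1-100, 101-200, 201-300, 301-400 and >400; how
    WNYC assigned counts sitting exactly on a boundary is a guess.
    """
    if count <= 0:
        return None  # blocks with no incidents stay unclassified
    if count > top:
        return f">{top}"
    upper = -(-count // width) * width  # round up to the next multiple of width
    return f"{upper - width + 1}-{upper}"


for c in [1, 100, 101, 250, 400, 401, 1200]:
    print(c, equal_interval_class(c))
```

The bins are the same width no matter how the counts are actually distributed, which is exactly why the breaks don’t have to line up with anything in the data.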

(Later in this post I comment on the color pattern itself – diverging, rather than sequential – and also about the fluorescent colors on a black background.)

Although they don’t define “hot spot,” it appears that a hot spot on WNYC’s map is any block with more than 200, 300, or 400 stop and frisks (the pink-to-hot pink blocks on their map). If we take the middle value (300 stop and frisks per block), then the article’s conclusion that “most guns were recovered on people outside those hot spots” is correct:

  • there are a mere 260 Census blocks with a stop and frisk count above 300, and in these blocks there were only 81 stop and frisk incidents during which guns were recovered;
  • this accounts for only 10% of the 779 stop and frisks that resulted in gun recoveries in that year.

But you could argue that the WNYC definition of a “hot spot” is not only arbitrary but also very narrow. Their “hot spot” blocks accounted for about 129,000 stop and frisks, or only 19% of the incidents that had location coordinates (665,377 stop and frisks in 2011). These blocks also represent less than 1% (just 0.66%) of the 39,148 Census blocks in the city, so these are extreme hot spots.
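These percentages are simple ratios. Here’s a quick Python check using the figures reported in this post; the 129,000 stop count is rounded, so that share is approximate.

```python
def hot_spot_summary(hot_blocks, total_blocks, hot_stops, total_stops,
                     hot_guns, total_guns):
    """Shares of blocks, stops, and gun recoveries that fall in 'hot' blocks."""
    return {
        "blocks": hot_blocks / total_blocks,
        "stops": hot_stops / total_stops,
        "guns": hot_guns / total_guns,
    }


# Figures for WNYC's 300-per-block threshold, as reported in this post.
summary = hot_spot_summary(260, 39_148, 129_000, 665_377, 81, 779)
for name, share in summary.items():
    print(f"{name}: {share:.2%}")
```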

The underlying data do not show any obvious reason to use 300 (or 200 or 400) as the threshold for a hot spot – there’s no “natural break” in the data at 300 stop and frisks per block, for example, and choosing the top “0.66%” of blocks rather than just 1%, or 5%, or 10% of blocks doesn’t seem to fit any statistical rationale or spatial pattern.
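For comparison, the “natural breaks” (Jenks) option found in most GIS packages picks class boundaries that minimize the variation within each class. Below is a minimal Python sketch of the Fisher-Jenks dynamic-programming version of that idea. It illustrates the technique; it isn’t the exact routine ArcGIS uses, and the function name is mine.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class using Fisher-Jenks optimal breaks.

    Chooses contiguous classes over the sorted values that minimize the
    total within-class sum of squared deviations.
    """
    data = sorted(values)
    n = len(data)
    if n_classes >= n:
        return data
    # Prefix sums give the sum of squared deviations of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations of data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest cost of splitting data[:j] into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i
    # Walk back through the stored split points to recover class upper bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))
```

This brute-force version is fine for a few hundred values but far too slow for all 39,148 blocks in pure Python; for the real data a library such as mapclassify (its FisherJenks classifier) is a better bet.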

If we think of hot spots as areas (not individual Census blocks) where most of the stop and frisk activity is taking place, while also being relatively concentrated geographically, a different picture emerges and WNYC’s conclusion doesn’t hold up.

[A note on my methodology: In order to replicate WNYC’s map and data analysis, I used the stop and frisk data directly from the NYPD, and used ArcGIS to create a shapefile of incidents based on the geographic coordinates in the NYPD file. I joined this with the Census Bureau’s shapefile of 2010 Census blocks. I counted the stop and frisks that resulted in gun recovery slightly differently from WNYC: they only included stop and frisks that recovered a pistol, rifle, or machine gun, but the NYPD data also include a variable for the recovery of an assault weapon, and I included that in my totals.]
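Here’s a rough Python sketch of the counting step described in that note, written against plain dictionaries rather than ArcGIS. The field names (pistol, riflshot, machgun, asltweap, with “Y” meaning yes) are how I recall the 2011 NYPD file, so treat them as assumptions and check the data dictionary. The block_id field is hypothetical; it stands in for whatever Census block identifier the spatial join attaches to each incident.

```python
from collections import Counter

# Field names as I recall them from the NYPD's 2011 stop and frisk file
# ("Y" = yes). These are assumptions; verify them against the data dictionary.
WNYC_GUN_FIELDS = ("pistol", "riflshot", "machgun")
ALL_GUN_FIELDS = WNYC_GUN_FIELDS + ("asltweap",)


def gun_recovered(record, fields=ALL_GUN_FIELDS):
    """True if any of the given weapon fields is flagged 'Y'."""
    return any(record.get(f) == "Y" for f in fields)


def counts_by_block(records, fields=ALL_GUN_FIELDS):
    """Tally stop and frisks and gun recoveries per Census block.

    Each record is assumed to carry a hypothetical 'block_id' attached by a
    point-in-polygon join against the 2010 Census block shapefile.
    """
    stops, guns = Counter(), Counter()
    for r in records:
        stops[r["block_id"]] += 1
        if gun_recovered(r, fields):
            guns[r["block_id"]] += 1
    return stops, guns


sample = [
    {"block_id": "A", "pistol": "N", "asltweap": "Y"},
    {"block_id": "A", "pistol": "N"},
    {"block_id": "B", "pistol": "Y"},
]
print(counts_by_block(sample))                   # my definition
print(counts_by_block(sample, WNYC_GUN_FIELDS))  # WNYC's narrower definition
```

Passing WNYC_GUN_FIELDS drops the assault weapon flag, which is the only difference between the two definitions.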

Choropleth maps: it’s all in the thresholds

Creating a meaningful choropleth map involves a balancing act: choosing thresholds, or range breaks, that follow breaks in the data and also reveal interesting spatial patterns (geographic concentration, dispersion, etc.) while remaining easy for your map readers to comprehend.

If we look at the frequency distribution of stop and frisks in 2011 by Census block, we start to see the following data pattern (the excerpt below is the first 40 or so rows of the full spreadsheet, which is available here: sqf_2011_byblock_freq):


The frequency distribution shows that most blocks on a citywide basis have very few stop and frisks:

  • Almost a third have no incidents.
  • 70% of blocks have fewer than 9 incidents each, while the remaining 30% of blocks account for almost 610,000 incidents (92%).
  • 80% of blocks have fewer than 17 stop and frisks each, while the remaining 20% account for 560,000 incidents (almost 85%).
  • 90% of the blocks have 38 or fewer incidents, while the remaining 10% account for 460,000 incidents (just under 70% of all stop and frisks).
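Each threshold in that list comes from sorting the blocks by their stop and frisk count and cutting the sorted list at 70%, 80%, 90%, and so on. A minimal Python sketch of that calculation follows; the function name is hypothetical, the list must include the zero-count blocks, and blocks tied at the cut-off value can land on either side of the cut.

```python
def threshold_for_block_share(block_counts, share):
    """Cut the blocks, sorted by stop count, at the given share.

    Returns the count at the cut-off and the fraction of all incidents in
    the blocks above it. block_counts must include the zero-count blocks.
    """
    counts = sorted(block_counts)
    # Number of blocks in the "cool" group, rounded to a whole block.
    cut = min(len(counts), max(1, round(share * len(counts))))
    threshold = counts[cut - 1]
    hot_share = sum(counts[cut:]) / sum(counts)
    return threshold, hot_share


toy = [0, 0, 0, 0, 0, 1, 2, 3, 50, 100]
print(threshold_for_block_share(toy, 0.9))
```

Run over all 39,148 block counts with share values of 0.7, 0.8, 0.9, and 0.95, this is one way to reproduce the spreadsheet figures, assuming the same block counts.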

It’s a very concentrated distribution. And it’s concentrated geographically as well. The following maps use WNYC’s color scheme, modified so that there’s one blue color band for the blocks with the smallest number of stop and frisks, and then pink-to-hot pink for the relatively few blocks with the greatest number of stop and frisks. The maps below vary based on the threshold values identified in the spreadsheet above:

30% of blocks are “hot”, accounting for 92% of stop and frisks

20% of blocks are “hot”, accounting for 84% of stop and frisks

10% of blocks are “hot”, accounting for 70% of stop and frisks

In the choropleth balancing act, I would say that a threshold of 9 or 17 stop and frisks per block is too low and results in too many blocks color-coded as “hot”. A threshold of 38 reveals the geographic concentrations, follows a natural break in the data, and uses an easily understood construct: 10% of the blocks accounting for 70% of the stop and frisks.

We could take this a step further and use the threshold corresponding to the top 5% of blocks. Here’s an excerpt from the spreadsheet that identifies the number of stop and frisks per block that we would use for the range break (74):


And here’s the resulting map:

But this goes perhaps too far. The top 5% of blocks account for only half of the stop and frisks, and the geographic “footprint” of the highlighted blocks becomes too isolated: it loses some of the area around the bright pink blocks that represent areas of heightened stop and frisk activity. (Even so, the threshold of 74 stop and frisks per block is better than the arbitrary value of 300 in WNYC’s map.)

The two maps below compare WNYC’s map with this modified approach that uses 38 stop and frisks per block as the “hot spot” threshold (for map readability purposes I rounded up to 40). The maps are zoomed in on two areas of the city with substantial concentrations of stop and frisk activity – upper Manhattan and what would loosely be called the “South Bronx”:

WNYC map

Modified thresholds: 1-40, 41-100, 101-400, 400+

To me, the second map is more meaningful:

  • it’s based on a methodology that follows the data;
  • visually, it shows that the green dots are located generally within the pink-to-hot pink areas, which I think is probably more in line with how the Police Department views its policing techniques — they certainly focus on specific locations, but community policing is undertaken on an area-wide basis; and
  • quantitatively the second map reveals that most gun recoveries in 2011 were in Census blocks where most of the stop and frisks took place (the opposite of WNYC’s conclusion). The pink-to-hot pink blocks in the second map account for 433 recovered guns, or 56% of the total in 2011.
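The 56% figure is the share of gun recoveries that fall in blocks above the threshold. A short Python sketch of that calculation, assuming per-block stop and gun counts keyed by a Census block ID (the data shapes and the function name are my assumptions):

```python
def gun_share_in_hot_blocks(stops_by_block, guns_by_block, threshold):
    """Fraction of gun recoveries located in blocks with more than
    `threshold` stop and frisks."""
    hot = {b for b, n in stops_by_block.items() if n > threshold}
    total = sum(guns_by_block.values())
    in_hot = sum(n for b, n in guns_by_block.items() if b in hot)
    return in_hot / total if total else 0.0


stops = {"A": 120, "B": 45, "C": 12, "D": 3}
guns = {"A": 2, "B": 1, "C": 1}
print(gun_share_in_hot_blocks(stops, guns, 40))  # A and B are "hot"
```

With a threshold of 40 and the full 2011 counts, this should come out near 433 / 779 ≈ 0.56, provided the block counts match mine; with 300 it gives WNYC’s much lower share.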

The following two maps show this overlap on a citywide basis, and zoomed in on the Brooklyn-Queens border:

Modified thresholds, citywide, with gun recovery incidents

Modified thresholds, along Brooklyn-Queens border, with gun recovery incidents

I’m not defending the NYPD’s use of stop and frisks; I’m simply noting that a change in the way a map is constructed (and in this case, changed to more closely reflect the underlying data patterns) can substantially alter the conclusion you would make based on the spatial relationships.

Hot spot rasters: removing artificial boundaries

If I wanted to compare the stop and frisk incidents to population density, then I’d use Census blocks. But that’s not necessarily relevant here (stop and frisks may have more to do with where people shop, work, or recreate than where they live).

It might be more appropriate to aggregate and map the number of stop and frisks by neighborhood (if your theory is to understand the neighborhood dynamics that may relate to this policing technique), or perhaps by Community Board (if there are land use planning issues at stake), or by Police Precinct (since that’s how the NYPD organizes their activities).

But each of these approaches runs into the problem of artificial boundaries constraining the analysis. If we are going to aggregate stop and frisks up to a geographic unit such as blocks, we need to know a few things that aren’t apparent in the data or the NYPD’s data dictionaries:

  • Were the stop and frisks organized geographically by Census block in the first place? Or were they conducted along a street (which might form the boundary between two Census blocks), or perhaps within a neighborhood, circling over time around a specific location in the hope of finding suspects believed to be concealing weapons, so that a single gun recovery was preceded by many area-wide stop and frisks? In other words, I think it’s arbitrary to assume that a gun recovery is related only to the stop and frisks within the same Census block.
  • Also, we need to know more about the NYPD’s geocoding process. For example, how were stop and frisks at street intersections assigned latitude/longitude coordinates? If the intersection is a common node for four Census blocks, were the stop and frisks allocated to one of those blocks, or dispersed among all four? If the non-gun recovery stop and frisks were assigned to one block but the gun recovery stop and frisk was assigned to an immediately adjacent block, is the gun recovery unrelated to the other incidents?
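To make the intersection question concrete, here’s a toy Python sketch contrasting the two allocation choices for incidents geocoded to a corner. The node and block IDs are hypothetical, and I don’t know which rule (if either) the NYPD’s geocoding follows.

```python
from collections import Counter


def allocate_node_incidents(node_counts, node_blocks, split=True):
    """Allocate incidents geocoded to street intersections to Census blocks.

    node_counts: {node_id: number of incidents geocoded to that intersection}
    node_blocks: {node_id: list of block IDs that meet at that intersection}
    split=True spreads each node's incidents evenly over its blocks;
    split=False gives them all to the first block listed.
    """
    totals = Counter()
    for node, n in node_counts.items():
        blocks = node_blocks[node]
        if split:
            for b in blocks:
                totals[b] += n / len(blocks)
        else:
            totals[blocks[0]] += n
    return totals


nodes = {"corner_1": 8}
touching = {"corner_1": ["b1", "b2", "b3", "b4"]}
print(allocate_node_incidents(nodes, touching))               # 2.0 each
print(allocate_node_incidents(nodes, touching, split=False))  # all 8 in b1
```

Under the second rule, one block next to a busy corner can look like a hot spot while its neighbors look quiet; under the first, the same activity is spread thin across all four.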

As I’ve noted above, the meager number of gun recoveries makes it challenging to develop meaningful spatial theories. But if I were mapping this data, I’d probably use a hot spot technique that ignored Census geography and followed the overall contours of the stop and frisk patterns.

A hot spot is really more than individual Census blocks with the highest number of stop and frisk incidents. It also makes sense to look at the Census blocks that are adjacent to, and perhaps near, the individual blocks with the most stop and frisks. That’s typically what a hot spot analysis is all about, as Brian Abelson, one of the commenters on the WNYC article, pointed out. He referred to census tracts instead of blocks, but he noted that:

A census tract is a highly arbitrary and non-uniform boundary which has no administrative significance. If we are truly interested in where stops occur the most, we would not like those locations to be a product of an oddly shaped census tract (this is especially a problem because census tracts are drawn along major streets where stops tend to happen). So a hot spot is only a hot spot when the surrounding census tracts are also hot, or at least “warm.”

Census block boundaries are less arbitrary than tracts, but the principle applies to blocks as well. A hot spot covers an area not constrained by artificial administrative boundaries. The National Institute of Justice notes that “hot spot” maps often use a density grid to reveal a more organic view of concentrated activity:

Density maps, for example, show where crimes occur without dividing a map into regions or blocks; areas with high concentrations of crime stand out.

If we create a density grid and plot the general areas where a concentration of stop and frisks has taken place, using the “natural breaks” algorithm to determine category thresholds (modified slightly to add categories in the lower values to better filter out areas with low levels of incidence), we get a map that looks like this:

There were so many stop and frisks in 2011 that the density numbers are high. And of course, the density grid is an interpolation of the specific locations – so it shows a continuous surface instead of discrete points (in effect, predicting where stop and frisks would take place given the other incidents in the vicinity). But it highlights the areas where stop and frisk activity was the most prevalent – the hot spots – regardless of Census geography or any other boundaries.
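ArcGIS did the density calculation for these maps, but the idea is simple enough to sketch. Here’s a minimal Python version of one common approach, a quartic-kernel density estimate on a regular grid. The cell size and search radius are parameters you’d have to choose, and this isn’t necessarily the exact algorithm or settings behind my map.

```python
import math


def kernel_density_grid(points, xmin, ymin, ncols, nrows, cell, radius):
    """Quartic-kernel density surface on a regular grid.

    points: iterable of (x, y) in projected units (e.g. feet).
    Returns nrows lists of ncols densities (incidents per square unit);
    row 0 is the southernmost row, starting at ymin.
    """
    pts = list(points)
    r2 = radius ** 2
    norm = 3.0 / (math.pi * r2)  # makes each point's kernel integrate to 1
    grid = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell  # cell-center y
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell  # cell-center x
            total = 0.0
            for px, py in pts:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < r2:  # only points within the search radius count
                    total += norm * (1 - d2 / r2) ** 2
            grid[r][c] = total
    return grid


g = kernel_density_grid([(5.0, 5.0)], 0.0, 0.0, 5, 1, 10.0, 20.0)
print([round(v, 6) for v in g[0]])
```

The brute-force loop checks every point for every cell, which is fine for a demo but too slow for 665,000 incidents; a GIS tool or a spatially indexed implementation is the practical route.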

Plotting the individual gun recovery locations against these hot spots produces the following map:

The spatial pattern of gun recoveries generally matches the hot spots.
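One way to put a number on “generally matches” (I haven’t computed one for this post) would be to count the share of gun recoveries that land in grid cells above some density threshold. A minimal Python sketch, assuming the density surface is stored as a list of rows with row 0 at the southern edge (ymin) and square cells of size cell; the threshold is up to the analyst.

```python
def share_in_dense_cells(grid, xmin, ymin, cell, points, min_density):
    """Fraction of points whose grid cell has density >= min_density.

    Points falling outside the grid count as misses.
    """
    nrows, ncols = len(grid), len(grid[0])
    hits = total = 0
    for x, y in points:
        c = int((x - xmin) // cell)
        r = int((y - ymin) // cell)
        total += 1
        if 0 <= r < nrows and 0 <= c < ncols and grid[r][c] >= min_density:
            hits += 1
    return hits / total if total else 0.0


density = [[5.0, 0.1],
           [0.2, 9.0]]
guns_xy = [(1, 1), (15, 15), (15, 1), (100, 100)]
print(share_in_dense_cells(density, 0, 0, 10, guns_xy, 1.0))
```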

Nonetheless, even this density map may be too generalized. There are additional analyses we can do on the stop and frisk data that might result in a more precise mapping of the hot spots: techniques such as natural neighbor, kriging, and others; controlling the density surface by introducing boundaries between one concentration of incidents and others (such as highways, parks, etc.); and filtering the stop and frisk data using other variables in the data set (more on that below). There are lots of resources available online and off to explore, and many spatial analysts who are much more expert at these techniques than I am.

Other map concerns

I replicated WNYC’s diverging color scheme for my modified maps above. But diverging isn’t really appropriate for data that go from a low number of stop and frisks per Census block to a high number. A sequential color pattern is probably better, though I think that would’ve made it harder to use the fluorescent colors chosen by WNYC (a completely pink-to-hot pink map may have been overwhelming). As ColorBrewer notes, a diverging color scheme:

puts equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues.

With this data, there’s no need for a “critical break” in the middle, and the low values don’t need emphasis, only the high ones. The following example map offers an easier-to-read visualization of the patterns than the fluorescent colors: the low-value areas fade into the background and the high-value “hot spots” are much more prominent:

This map might be a bit boring compared to the WNYC version 🙂 but to me it’s more analytically useful. I know that recently the terrific team at MapBox put together some maps using fluorescent colors on a black background that were highly praised on Twitter and in the blogs. To me, they look neat, but they’re less useful as maps. The WNYC fluorescent colors were jarring, and the hot pink plus dark blue on the black background made the map hard to read if you’re trying to find out where things are. It’s a powerful visual statement, but I don’t think it adds any explanatory value.
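As a rough illustration of what “sequential” means in practice, here’s a small Python sketch that steps from a light color to a dark one across n classes, so the low classes fade toward the background. The function name and the example colors are arbitrary choices of mine, not WNYC’s palette or a ColorBrewer scheme.

```python
def sequential_ramp(light, dark, n):
    """n hex colors stepping linearly in RGB from `light` to `dark`."""
    def to_rgb(h):
        h = h.lstrip("#")
        return [int(h[i:i + 2], 16) for i in (0, 2, 4)]

    a, b = to_rgb(light), to_rgb(dark)
    out = []
    for k in range(n):
        t = k / (n - 1) if n > 1 else 1.0  # 0 = lightest class, 1 = darkest
        rgb = [round(x + (y - x) * t) for x, y in zip(a, b)]
        out.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return out


print(sequential_ramp("#fee0d2", "#a50f15", 5))
```

Straight-line interpolation in RGB isn’t perceptually even, which is why ColorBrewer’s hand-tuned sequential palettes are usually the better choice for a real map.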

Other data considerations

The stop and frisk databases from NYPD include an incredible amount of information: all sorts of characteristics of each stop and frisk, including the date and time each one took place. And the data go back to 2003. If you’d like to develop an in-depth analysis of the data, spatially or temporally, you’ve got a lot to work with. So I think a quick and not very thorough mapping of gun recovery compared with number of stop and frisks doesn’t really do justice to what’s possible with the information. I’m sure others are trying to mine the data for all sorts of patterns, and I look forward to seeing the spatial relationships they find.

The takeaway

No question that a massive number of stop and frisks have been taking place in the last few years with very few resulting in gun recovery. But simply mapping the two data sets without accounting for underlying data patterns, temporal trends, and actual hot spots rather than artificial block boundaries risks jumping to conclusions that may be unwarranted. When you’re dealing with an issue as serious as individual civil rights and public safety, a simplified approach may not be enough.

The WNYC map leverages a recent fad in online maps: fluorescent colors on a black background. It’s quite striking, perhaps even pretty (and I’m sure it helped draw lots of eyeballs to WNYC’s website). I think experimenting with colors and visual displays is good. But in this case I think it clouds the picture.