
Mapping NYC stop and frisks: some cartographic observations

WNYC’s map of stop and frisk data last week got a lot of attention from other media outlets, bloggers, and of course the Twittersphere.  (The social media editor at Fast Company even said it was “easily one of 2012’s most important visualizations”.)

I looked at the map with a critical eye, and it seemed like a good opportunity to highlight some issues with spatial data analysis, cartographic techniques, and map interpretation – hence this post. New York’s stop and frisk program is such a high profile and charged issue: maps could be helpful in illuminating the controversy, or they could further confuse things if not done right. In my view the WNYC map falls into the latter category, and I offer some critical perspectives below.

TL; DR

It’s a long post 🙂. Here’s the summary:

  • WNYC’s map seems to show an inverse relationship between stop and frisks and gun recovery, and you can infer that perhaps the program is working (it’s acting as a deterrent to guns) or it’s not (as WNYC argues, “police aren’t finding guns where they’re looking the hardest”). But as a map, I don’t think it holds up well, and with a closer look at the data and a reworking of the map, the spatial patterns of gun recovery and stop and frisks appear to overlap.
  • That said, the data on gun recovery is so slim that it’s hard to develop a map that reveals meaningful relationships. Other visualizations make the point much better; the map just risks obscuring the issue. When we’re dealing with such an important — and controversial — issue, obscuring things is not what we want. Clarity is paramount.
  • I also make some other points about cartographic techniques (diverging vs. sequential color schemes, black light poster graphics vs. more traditional map displays). And I note that there’s so much more to the stop and frisk data that simply overlaying gun recovery locations compared with annual counts of stop and frisks seems like it will miss all sorts of interesting, and perhaps revealing, patterns.

As for the map itself, here’s a visual summary comparing the WNYC map with other approaches.  I show three maps below (each one zoomed in on parts of Manhattan and the Bronx with stop and frisk hot spots):

  • the first reproduces WNYC’s map, with its arbitrary and narrow depiction of “hot spots” (I explain why I think it’s arbitrary and narrow later in the post);

WNYC map

  • the second map uses WNYC’s colors but the shading reflects the underlying data patterns (it uses a threshold that represents 10% of the city’s Census blocks and 70% of the stop and frisks); and

Modified hot spots (10% blocks representing 70% stop and frisks)

  • the third uses a density grid technique that ignores artificial Census block boundaries and highlights the general areas with concentrated stop and frisk activity, overlain with gun recoveries to show that the spatial patterns are similar.

Density grid


What WNYC’s map seems to show

The article accompanying the map says:

We located all the “hot spots” where stop and frisks are concentrated in the city, and found that most guns were recovered on people outside those hot spots—meaning police aren’t finding guns where they’re looking the hardest.

The map uses a fluorescent color scheme to show the pattern of stop and frisk incidents in 2011 by Census block, with point locations in fluorescent green marking the stop and frisks that resulted in gun recovery.

The map is striking, no question. And at first glance it appears to support the article’s point that guns are being recovered in different locations from the “hot spots” of stop, question, and frisk incidents.

But let’s dig a bit deeper.

Do the data justify a map?

This is a situation where I don’t think I would’ve made a map in the first place. The overall point – that the number of guns recovered by stop and frisks in New York is infinitesimally small compared to the number of stop and frisk incidents, putting the whole program into question – is important. But precisely because the number of gun recovery incidents is so small (fewer than 800 in 2011 vs. more than 685,000 stop and frisks), it’s unlikely that we’ll see a meaningful spatial pattern, especially at the very local level (in this case, the Census blocks that form the basis of WNYC’s map).

And the point about extremely low levels of gun recovery compared with the overwhelming number of stop and frisk incidents has already been presented effectively with bar charts and simple numeric comparisons, or even infographics like this one from NYCLU’s latest report:

If we made a map, how would we represent the data?

For the point of this blog post, though, let’s assume the data is worth mapping.

WNYC’s map uses the choropleth technique (color shading varies in intensity corresponding to the intensity of the underlying data), and they use an “equal interval” approach to classify the data. They determined the number of stop and frisk incidents by Census block and assigned colors to the map by dividing the number of stop and frisks per block into equal categories: 1 to 100, 101 to 200, 201 to 300, 301 to 400, and more than 400.
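To make the classification step concrete, here’s a minimal sketch of equal-interval binning in Python (the data array is made up, and WNYC presumably did this inside their mapping tool rather than in code):

import numpy as np

# Hypothetical per-block stop and frisk counts (one value per Census block)
counts = np.array([0, 12, 250, 87, 430, 199, 310])

# Equal-interval breaks in the spirit of WNYC's classes: 1-100, 101-200, 201-300, 301-400, 400+
bins = [1, 101, 201, 301, 401]
classes = np.digitize(counts, bins)  # 0 = no stops; 1 through 5 = shading classes
print(classes)  # [0 1 3 1 5 2 4]

Each class number would then get one of the five colors on the map.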

(Later in this post I comment on the color pattern itself – diverging, rather than sequential – and also about the fluorescent colors on a black background.)

Although they don’t define “hot spot,” it appears that a hot spot on WNYC’s map is any block with more than either 200, 300, or 400 stop and frisks (the pink-to-hot pink blocks on their map).  If we take the middle value (300 stop and frisks per block), then the article’s conclusion that “most guns were recovered on people outside those hot spots” is correct:

  • there are a mere 260 Census blocks with a stop and frisk count above 300, and in these blocks there were only 81 stop and frisk incidents during which guns were recovered;
  • this accounts for only 10% of the 779 stop and frisks that resulted in gun recoveries in that year.

But you could argue that not only is the WNYC definition of a “hot spot” arbitrary, but it’s very narrow. Their “hot spot” blocks accounted for about 129,000 stop and frisks, or only 19% of the incidents that had location coordinates (665,377 stop and frisks in 2011). These blocks also represent less than 1% (just 0.66%) of the 39,148 Census blocks in the city, so these are extreme hot spots.

The underlying data do not show any obvious reason to use 300 (or 200 or 400) as the threshold for a hot spot – there’s no “natural break” in the data at 300 stop and frisks per block, for example, and choosing the top “0.66%” of blocks rather than just 1%, or 5%, or 10% of blocks doesn’t seem to fit any statistical rationale or spatial pattern.

If we think of hot spots as areas (not individual Census blocks) where most of the stop and frisk activity is taking place, while also being relatively concentrated geographically, a different picture emerges and WNYC’s conclusion doesn’t hold up.

[A note on my methodology: In order to replicate WNYC’s map and data analysis, I used the stop and frisk data directly from the NYPD, and used ArcGIS to create a shapefile of incidents based on the geographic coordinates in the NYPD file. I joined this with the Census Bureau’s shapefile of 2010 Census blocks. I determined the number of stop and frisks that resulted in gun recovery slightly differently from WNYC: they only included stop and frisks that recovered a pistol, rifle, or machine gun. But the NYPD data also includes a variable for the recovery of an assault weapon; I included that in my totals.]
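For readers who’d rather script this step than point-and-click it, here’s a rough geopandas equivalent of the point-to-block join described above. The file names, column names, and the gun_recovered flag are assumptions for illustration, not the actual field names in the NYPD file:

import geopandas as gpd
import pandas as pd

# NYPD 2011 stop and frisk records; xcoord/ycoord are assumed to be
# NY State Plane (feet) coordinates
sqf = pd.read_csv("sqf_2011.csv")
points = gpd.GeoDataFrame(
    sqf,
    geometry=gpd.points_from_xy(sqf["xcoord"], sqf["ycoord"]),
    crs="EPSG:2263",
)

# 2010 Census blocks for New York State, reprojected to match the points
blocks = gpd.read_file("tl_2010_36_tabblock10.shp").to_crs(points.crs)

# Point-in-polygon join, then per-block counts of stops and gun recoveries
joined = gpd.sjoin(points, blocks[["GEOID10", "geometry"]], predicate="within")
per_block = joined.groupby("GEOID10").agg(
    stops=("GEOID10", "size"),
    guns=("gun_recovered", "sum"),  # hypothetical 0/1 flag built from the weapon fields
)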

Choropleth maps: it’s all in the thresholds

Creating a meaningful choropleth map involves a balancing act of choosing thresholds, or range breaks, that follow breaks in the data and also reveal interesting spatial patterns (geographic concentration, dispersion, etc) while being easy to comprehend by your map readers.

If we look at the frequency distribution of stop and frisks in 2011 by Census block, we start to see the following data pattern (the excerpt below is the first 40 or so rows of the full spreadsheet, which is available here: sqf_2011_byblock_freq):


The frequency distribution shows that most blocks on a citywide basis have very few stop and frisks (a short code sketch after this list shows how these break points can be derived):

  • Almost a third have no incidents.
  • 70% of blocks have fewer than 9 incidents each, while the remaining 30% of blocks account for almost 610,000 incidents (92%).
  • 80% of blocks have fewer than 17 stop and frisks each, while the remaining 20% account for 560,000 incidents (almost 85%).
  • 90% of the blocks have 38 or fewer incidents, while the remaining 10% account for 460,000 incidents (just under 70% of all stop and frisks).
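Here’s the sketch: given one count per block (zeros included), rank the blocks from busiest to least busy and compute two cumulative shares. Reading across them gives threshold pairs like “10% of blocks = 70% of stops”. The input array is hypothetical:

import numpy as np

# stops_per_block: one 2011 count per Census block, zeros included (hypothetical input)
ranked = np.sort(stops_per_block)[::-1]                  # busiest blocks first
share_of_blocks = np.arange(1, ranked.size + 1) / ranked.size
share_of_stops = np.cumsum(ranked) / ranked.sum()

i = np.searchsorted(share_of_blocks, 0.10)               # end of the top 10% of blocks
print(ranked[i], share_of_stops[i])                      # ~38 stops/block, ~0.70 of all stops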

It’s a very concentrated distribution. And it’s concentrated geographically as well. The following maps use WNYC’s color scheme, modified so that there’s one blue color band for the blocks with the smallest number of stop and frisks, and then pink-to-hot pink for the relatively few blocks with the greatest number of stop and frisks. The maps below vary based on the threshold values identified in the spreadsheet above:

30% of blocks are “hot”, accounting for 92% of stop and frisks

20% of blocks are “hot”, accounting for 84% of stop and frisks

10% of blocks are “hot”, accounting for 70% of stop and frisks

In the choropleth balancing act, I would say that a threshold of 9 or 17 stop and frisks per block is low, and results in too many blocks color-coded as “hot”. A threshold of 38 reveals the geographic concentrations, follows a natural break in the data, and uses an easily understood construct: 10% of the blocks accounting for 70% of the stop and frisks.

We could take this a step further and use the threshold corresponding to the top 5% of blocks, and it would look like the following — here’s an excerpt from the spreadsheet that identifies the number of stop and frisks per block that we would use for the range break (74):


And here’s the resulting map:

But this goes perhaps too far – the top 5% of blocks only account for half of the stop and frisks, and the geographic “footprint” of the highlighted blocks becomes too isolated – they lose some of the area around the bright pink blocks that represent areas of heightened stop and frisk activity. (Although even the 74 stop and frisks per block threshold is better than the arbitrary value of 300 in WNYC’s map.)

The two maps below compare WNYC’s map with this modified approach that uses 38 stop and frisks per block as the “hot spot” threshold (for map readability purposes I rounded up to 40). The maps are zoomed in on two areas of the city with substantial concentrations of stop and frisk activity – upper Manhattan and what would loosely be called the “South Bronx”:

WNYC map

Modified thresholds: 1-40, 41-100, 101-400, 400+

To me, the second map is more meaningful:

  • it’s based on a methodology that follows the data;
  • visually, it shows that the green dots are located generally within the pink-to-hot pink areas, which I think is probably more in line with how the Police Department views its policing techniques — they certainly focus on specific locations, but community policing is undertaken on an area-wide basis; and
  • quantitatively the second map reveals that most gun recoveries in 2011 were in Census blocks where most of the stop and frisks took place (the opposite of WNYC’s conclusion). The pink-to-hot pink blocks in the second map account for 433 recovered guns, or 56% of the total in 2011.

The following two maps show this overlap on a citywide basis, and zoomed in on the Brooklyn-Queens border:

Modified thresholds, citywide, with gun recovery incidents

Modified thresholds, along Brooklyn-Queens border, with gun recovery incidents

I’m not defending the NYPD’s use of stop and frisks; I’m simply noting that a change in the way a map is constructed (and in this case, changed to more closely reflect the underlying data patterns) can substantially alter the conclusion you would make based on the spatial relationships.

Hot spot rasters: removing artificial boundaries

If I wanted to compare the stop and frisk incidents to population density, then I’d use Census blocks. But that’s not necessarily relevant here (stop and frisks may have more to do with where people shop, work, or recreate than where they live).

It might be more appropriate to aggregate and map the number of stop and frisks by neighborhood (if your theory is to understand the neighborhood dynamics that may relate to this policing technique), or perhaps by Community Board (if there are land use planning issues at stake), or by Police Precinct (since that’s how the NYPD organizes their activities).

But each of these approaches runs into the problem of artificial boundaries constraining the analysis. If we are going to aggregate stop and frisks up to a geographic unit such as blocks, we need to know a few things that aren’t apparent in the data or the NYPD’s data dictionaries:

  • Were the stop and frisks organized geographically by Census block in the first place? Or were they conducted along a street (which might be split between two Census blocks), or perhaps circling over time around a specific location within a neighborhood, in the hope of targeting suspects believed to be concealing weapons, so that a single gun recovery was preceded by many area-wide stop and frisks? In other words, I’m concerned that it’s arbitrary to argue that a gun recovery has to be located within a Census block to be related only to the stop and frisks within that same block.
  • Also, we need to know more about the NYPD’s geocoding process. For example, how were stop and frisks at street intersections assigned latitude/longitude coordinates? If the intersection is a common node for four Census blocks, were the stop and frisks allocated to one of those blocks, or dispersed among all four? If the non-gun recovery stop and frisks were assigned to one block but the gun recovery stop and frisk was assigned to an immediately adjacent block, is the gun recovery unrelated to the other incidents?

As I’ve noted above, the meager number of gun recoveries makes it challenging to develop meaningful spatial theories. But if I were mapping this data, I’d probably use a hot spot technique that ignored Census geography and followed the overall contours of the stop and frisk patterns.

A hot spot is really more than the individual Census blocks with the highest number of stop and frisk incidents. It also makes sense to look at the Census blocks that are adjacent to, and perhaps near, the individual blocks with the most stop and frisks. That’s typically what a hot spot analysis is all about, as one of the commenters on the WNYC article, Brian Abelson, pointed out. He referred to census tracts instead of blocks, but he noted that:

A census tract is a highly arbitrary and non-uniform boundary which has no administrative significance. If we are truly interested in where stops occur the most, we would not like those locations to be a product of an oddly shaped census tract (this is especially a problem because census tracts are drawn along major streets where stops tend to happen). So a hot spot is only a hot spot when the surrounding census tracts are also hot, or at least “warm.”

Census block boundaries are less arbitrary than tracts, but the principle applies to blocks as well. A hot spot covers an area not constrained by artificial administrative boundaries. The National Institute of Justice notes that “hot spot” maps often use a density grid to reveal a more organic view of concentrated activity:

Density maps, for example, show where crimes occur without dividing a map into regions or blocks; areas with high concentrations of crime stand out.

If we create a density grid and plot the general areas where a concentration of stop and frisks has taken place, using the “natural breaks” algorithm to determine category thresholds (modified slightly to add categories in the lower values to better filter out areas with low levels of incidence), we get a map that looks like this:

There were so many stop and frisks in 2011 that the density numbers are high. And of course, the density grid is an interpolation of the specific locations – so it shows a continuous surface instead of discrete points (in effect, predicting where stop and frisks would take place given the other incidents in the vicinity). But it highlights the areas where stop and frisk activity was the most prevalent – the hot spots – regardless of Census geography or any other boundaries.
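ArcGIS’s density tools do the interpolation for you, but the core idea can be sketched in a few lines: bin the projected incident coordinates into a regular grid and smooth, so concentrations emerge regardless of block boundaries. The cell size and smoothing radius below are made-up parameters, not the ones behind the map above:

import numpy as np
from scipy import ndimage

# x, y: projected stop and frisk coordinates in feet (hypothetical arrays)
cell = 250.0
xbins = np.arange(x.min(), x.max() + cell, cell)
ybins = np.arange(y.min(), y.max() + cell, cell)
grid, _, _ = np.histogram2d(x, y, bins=[xbins, ybins])

# Smooth so each cell reflects its surroundings, a rough analogue
# of a kernel density search radius (sigma is in cells, so ~750 ft here)
density = ndimage.gaussian_filter(grid, sigma=3)

The smoothed surface can then be classified (natural breaks or otherwise) and mapped.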

Plotting the individual gun recovery locations against these hot spots produces the following map:

The spatial pattern of gun recoveries generally matches the hot spots.

Nonetheless, perhaps even this density map is too generalized. There are additional analyses we can do on the stop and frisk data that might result in a more precise mapping of the hot spots – techniques such as natural neighbor, kriging, and others; controlling the density surface by introducing boundaries between one concentration of incidents and others (such as highways, parks, etc); and filtering the stop and frisk data using other variables in the data set (more on that below). There are lots of resources available online and off to explore, and many spatial analysts who are much more expert at these techniques than I am.

Other map concerns

I replicated WNYC’s diverging color scheme for my modified maps above. But a diverging scheme isn’t really appropriate for data that simply run from a low number of stop and frisks per Census block to a high number. A sequential color pattern is probably better, though I think that would’ve made it harder to use the fluorescent colors chosen by WNYC (a completely pink-to-hot pink map may have been overwhelming). As ColorBrewer notes, a diverging color scheme:

puts equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues.

With this data, there’s no need for a “critical break” in the middle, and the low values don’t need emphasis, just the high ones. The following example map offers an easier-to-read visualization of the patterns than the fluorescent colors, with the low value areas fading into the background and the high value “hot spots” much more prominent:

This map might be a bit boring compared to the WNYC version 🙂 but to me it’s more analytically useful. I know that recently the terrific team at MapBox put together some maps using fluorescent colors on a black background that were highly praised on Twitter and in the blogs. To me, they look neat, but they’re less useful as maps. The WNYC fluorescent colors were jarring, and the hot pink plus dark blue on the black background made the map hard to read if you’re trying to find out where things are. It’s a powerful visual statement, but I don’t think it adds any explanatory value.
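If you want to see the difference for yourself, here’s a toy matplotlib comparison of a sequential ramp against a diverging one on the same made-up surface (colormap choices are mine, not WNYC’s):

import numpy as np
import matplotlib.pyplot as plt

values = np.random.default_rng(0).poisson(30, size=(20, 20))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(values, cmap="PuRd")      # sequential: only the high end stands out
ax1.set_title("Sequential (PuRd)")
ax2.imshow(values, cmap="RdYlBu_r")  # diverging: mid-range values get emphasis too
ax2.set_title("Diverging (RdYlBu_r)")
plt.show()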

Other data considerations

The stop and frisk databases from NYPD include an incredible amount of information. All sorts of characteristics of each stop and frisk are included: the time each one took place, the date, and so on. And the data go back to 2003. If you’d like to develop an in-depth analysis of the data – spatially, temporally – you’ve got a lot to work with. So I think a quick and not very thorough mapping of gun recovery compared with number of stop and frisks doesn’t really do justice to what’s possible with the information. I’m sure others are trying to mine the data for all sorts of patterns. I look forward to seeing the spatial relationships.

The takeaway

No question that a massive number of stop and frisks have been taking place in the last few years with very few resulting in gun recovery. But simply mapping the two data sets without accounting for underlying data patterns, temporal trends, and actual hot spots rather than artificial block boundaries risks jumping to conclusions that may be unwarranted. When you’re dealing with an issue as serious as individual civil rights and public safety, a simplified approach may not be enough.

The WNYC map leverages a recent fad in online maps: fluorescent colors on a black background. It’s quite striking, perhaps even pretty (and I’m sure it helped draw lots of eyeballs to WNYC’s website). I think experimenting with colors and visual displays is good. But in this case I think it clouds the picture.


Citi Bike NYC: the first and last mile quantified

The NYC Department of Transportation revealed last week where they’d like to place 400 or so bike share stations in Manhattan and parts of Brooklyn and Queens, as the next step in the city’s new bikeshare program starting this summer.  (By next spring the city plans to locate a total of 600 bike share kiosks for 10,000 bikes.)

Several blogs and news reports have criticized the cost of the program as too expensive for relatively long bike trips (more than 45 minutes). But the program is really designed primarily for the “first and last mile” of local commutes and tourist trips to and from their destinations.  Now that the city’s map is out, we can evaluate how likely it is that the locations will meet this goal.

Subway and bus proximity

Last year I examined the thousands of bike share kiosks suggested by “the crowd” to see how closely they were located to subway entrances.  I determined that, as of late September 2011 and based on almost 6,000 suggestions, one-third of the suggested sites were within 500 feet (actually, if I had used 750 feet — the average distance between avenues in Manhattan — it would’ve been 45% of the suggested sites located that distance or closer to a subway entrance).  You can still see the crowdsourced locations here.

So about half of “the crowd’s” suggestions were close to public transit, and the other half further away.  That seems reasonable — perhaps half the suggesters were thinking of how to link bike share with the subway system, and the other half was thinking about linking bike share to destination sites further away from mass transit.

Here’s my map from last year of the subway entrances symbolized based on the ratings of the closest suggested bike share kiosks.  This map says, “If you want to put bikeshare stations near subway entrances, these are the entrances you’d pick based on the average rating of the closest stations suggested by the crowd”:

I had suggested this as a way of prioritizing the bikeshare station siting process.  These subway entrances are the ones you’d likely start with, based on the preferences of the (bike)riding public who contributed to the DOT/OpenPlans map.

But now that the bikeshare station siting process is pretty much done, I’ve examined whether the proposed kiosks are close enough to subway and bus stops to actually facilitate their use by the intended audiences.

How do the actual proposed locations measure up?

For me, the city’s proposed bike share program is a great deal — if the kiosks are near my home and my office.  I live on Manhattan’s west side and work in midtown.  Since I live near my office I’m lucky to have a pretty easy commute.  But usually that involves a good amount of walking: my trip uptown is just one subway stop, and then going crosstown involves either a bus (luckily the M34 Select Bus is pretty reliable) or a schlep of several avenues.  Don’t get me wrong — walking is great exercise.  But if I could shorten the walk and save money, I’m all in.

According to DOT’s map [PDF], there’s a bike share kiosk proposed down the block from my apartment, and another one a block from my office.  Nice!  I could actually replace the subway/bus combo with a bike ride for a fraction of the cost.  But what about the rest of the Phase 1 area?  Are the kiosk locations designed to easily extend subway and bus trips for the “last mile”?

Here’s what I found: most of the proposed bikeshare locations are relatively close to subway entrances, and even more are closer to bus stops.  At least regarding the locations, the system seems right on track to meet its goals of facilitating New York’s commuter and tourist trips.

Here’s what I measured

The DOT bike share website displays the proposed kiosks on a Google Map.  But a separate URL lists the lat/lons of each site (in JSON format).  There are 414 bike share lat/lons at this URL (not the 420 that all the news accounts referenced), and one location has a lat/lon of zero (ID 12405), so I deleted it, leaving me with 413 locations.  (I used Google Refine to convert the JSON file to CSV and imported it to ArcGIS to analyze the locations.)

But this data just shows the locations. It omits information about each site (such as “North side of East 47th Street near Madison Avenue”), and the number of bike “docks” at each proposed kiosk.  Separately, Brian Abelson wrote a script to access this information from DOT’s website, based on a URL that looks like this:

http://a841-tfpweb.nyc.gov/bikeshare/get_point_info?point=12127

(His R script is here: https://gist.github.com/2690803 .  With this data I was able to map the kiosks based on number of docks at each one; see map below.  Big thanks to Brian!)
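If you’d rather replicate the download in Python than R, here’s a rough sketch. It assumes the kiosk list comes from the get_bikeshare_points feed mentioned later in this post and that both endpoints return JSON; I haven’t verified the detail response’s schema:

import requests
import pandas as pd

# Proposed-kiosk list: id/lat/lon per site
points = pd.DataFrame(
    requests.get("http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points").json()
)
points = points[points["lat"].astype(float) != 0]  # drop the lat/lon-of-zero record

# Per-site details (location text, dock count), one request per ID,
# mirroring the approach of Brian's R script
details = []
for pid in points["id"]:
    r = requests.get(
        "http://a841-tfpweb.nyc.gov/bikeshare/get_point_info", params={"point": pid}
    )
    details.append(r.json())  # the response schema here is an assumption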

Here’s an interactive version (thanks to cartoDB), and here are links if you’d like to download the file in GIS format:

Proximity to subways

Here’s the map of proposed kiosks in relation to the closest subway entrances (based on the latest data from MTA on subway entrances/exits); I used ArcGIS’s “Near” function to calculate the distance:

Here are the stats:

  • 89 locations (22%) between 14 and 250 feet (length of a typical Manhattan block);
  • 117 kiosks (28%) between 250 and 750 feet (the average distance between Manhattan avenues);
  • 97 kiosks (24%) between 750 and 1,320 ft (a quarter mile);
  • 89 kiosks (22%) between 1,320 and 2,640 ft (a half mile); and
  • 21 kiosks (5%) further than 2,640 feet.

(The percentages do not equal 100% due to rounding.)
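Outside of ArcGIS, the same “Near” calculation can be sketched with a k-d tree, assuming both point sets have already been projected to a feet-based system such as NY State Plane (EPSG:2263); the input arrays are hypothetical:

import numpy as np
from scipy.spatial import cKDTree

# kiosk_xy, entrance_xy: N x 2 arrays of projected coordinates in feet (hypothetical)
tree = cKDTree(entrance_xy)
dist_ft, nearest_idx = tree.query(kiosk_xy)  # straight-line distance to closest entrance

bands = [250, 750, 1320, 2640]  # block, avenue, quarter mile, half mile
labels = np.digitize(dist_ft, bands)
for band in range(len(bands) + 1):
    n = int((labels == band).sum())
    print(band, n, f"{n / len(dist_ft):.0%}")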

Closest/furthest:

  • The proposed kiosk closest to a subway entrance is in lower Manhattan, on the west side of Greenwich St near Rector St (ID 12364), 14 feet from the Rector St entrance to the 1 train.
  • The kiosk furthest from a subway entrance is on Manhattan’s west side, in the Hudson River Greenway near West 40th Street (at the West Midtown Ferry Terminal; ID 12092), almost three-quarters of a mile (3,742 feet) from the 40th St entrance to the 42nd St/Port Authority Bus Terminal station.

In other words, half of the proposed kiosks are within an avenue of a subway entrance, one-quarter are within two avenues, and the rest are further away.

So I guess it depends on your level of optimism (glass half full or half empty), and/or how far you’re willing to walk between your destination and a bike rack to participate in the Citi Bike program.  But in general it seems that the proposed kiosks match the overall location patterns of the crowdsourced suggestions, and also support the goal of facilitating first/last mile transportation.

Proximity to buses

Here’s the map of proposed kiosks in relation to the closest bus stops (based on the latest data from MTA / ZIP file).  Note that I didn’t differentiate between local, limited, or express bus stops.  As with subway entrances, I used ArcGIS’s “Near” function to calculate the distance:

For bus riders, the proposed locations are even better positioned to help them go the last mile than they are for subway riders:

  • 55 proposed kiosks (13%) between 27 and 100 feet (less than a typical Manhattan block);
  • a whopping 199 kiosks (48%) between 100 and 250 feet (length of a typical block);
  • 139 kiosks (34%) between 250 and 750 ft (typical distance between Manhattan avenues);
  • 16 kiosks (4%) between 750 and 1,320 ft (quarter mile); and
  • only 4 kiosks (1%) further than 1,320 ft — and none further than 1,652 feet away (about a third of a mile).

So for bus riders, almost two-thirds of the proposed kiosks are within a block of a bus stop, and almost all of them (95%) are within an avenue.  Pretty good odds that bus riders will have extremely convenient access to the Citi Bike program.

I was skeptical of the program at first (and I’m still a bit wary of so many more bikes on the road all of a sudden — I walk in fear when I cross a city street, because of cars and bikes).  But now that the Citi Bike program is moving closer to reality and the numbers look so good, I’m looking forward to trying it out.

NYC bikeshare maps & spatial analysis: an exploration of techniques

UPDATE (Feb. 2012)

  1. Reader Steve Vance suggests in the comments below that I could use Google Refine to parse the JSON file and convert it to Excel without relying on the tedious Microsoft Word editing process I summarize below.  He’s right.  Google Refine is amazing. It converted the JSON file to rows/columns in about a second.  And it has powerful editing/cleaning capabilities built-in.  Thanks Google!
  2. Alas, I had hoped to test Google Refine on the latest list of user-suggested bikeshare stations.  But when I checked in mid-February, the link at http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points no longer returns all the detailed info about each suggested site.  It only returns an ID and lat/lon for each site.  There’s another link I found that returns the details (http://a841-tfpweb.nyc.gov/bikeshare/get_point_info?point=1), but it seems to be just one at a time (change the “point=1” value).  Sigh.  If someone wanted to replicate what I’ve done with the latest data, perhaps either NYC DOT or OpenPlans could provide the file directly.

Original Post (Sept. 2011)

Two weeks ago New York City announced an ambitious bikeshare program, designed to provide 10,000 bikes at 600 bike-sharing stations in Manhattan and parts of Brooklyn by next summer.  I had two immediate thoughts:

  1. I wondered if all 10,000 new bikers will ride like delivery staff and further terrorize me and my pedestrian 5-year-olds; and
  2. safe or not, the bike stations would be put somewhere, and maps can likely help figure out where.

I’m a cartographer, so I’ll focus on the second issue for the purpose of this blog post. My maps and analysis below don’t provide any definitive answers — they’re more of an exploration of spatial analysis techniques using the bikeshare data as an example.  I don’t know if this will be helpful to DOT, but if it is, then that’s great.  If not, hopefully at least they’ll be of interest to GIS and biking geeks alike.

NYC’s bikeshare stations: crowdsourcing suggestions

To help figure out where the bikeshare stations might be located, the city’s Dept of Transportation partnered with OpenPlans to provide an interactive map where anyone could suggest a location and provide a reason why they thought it was a good spot. If someone has already picked your favorite spot on the map, you can select that marker and click a “♥ Support Station!” button to register your approval.  Added up, these supporting clicks can provide a “rating” of how many people like each location.

It’s a great, easy to use app. Within just a few days several thousand people had posted their suggestions.  According to DOT,

As of September 20 at 3:30pm [just 6 days after the suggest-a-site went live], we have received 5,566 individual station nominations and 32,887 support clicks.

(via OpenPlans)

But the map looked overwhelmed! Manhattan was covered, as was most of downtown Brooklyn.  It seemed like almost everyone wanted a bikeshare station on their block.  New York Magazine put it this way:

As you can see in the map above, New Yorkers have spoken: The best spots for bike stations are … everywhere/wherever is right next to them.

I wondered how useful this crowdsourced data actually would be for identifying the best sites for bikesharing stations.  NYC DOT says it will be conducting “an intensive community process” to involve multiple stakeholders in helping decide where the 600 stations will go.  Presumably several factors will determine station locations, but it seemed like the crowdsourced data could play a key role — hopefully the website was more than just a PR ploy.

Given all those “dots on a map,” it seemed like a good opportunity to examine how spatial analysis tools could be used — first to see if the crowdsourced location patterns meant anything, but then to see if there’s any value to using them in siting analysis.  Had “the crowd” told us something new and useful, or was it something we already knew and would be better determined through DOT’s public process?

Spatial patterns

Luckily OpenPlans (and DOT) designed the suggest-a-station website so all those dots on the map could be scooped up via a simple HTTP request and converted to GIS format.  At the end of this post I describe how we got the data and put it into a mappable format.  Once we did, we were able to analyze it spatially.  I’ll post the shapefile, as well as a version at Google’s Fusion Tables, shortly.

A few days after the program was announced, DOT produced a “heat map” that “illustrated the number of suggestions and supports per square mile as of September 19” (map at right).

Our version of a “heat map” using the September 19 data (based on the results as of 9am that day) is shown below.  (Our map uses the same rating scale as the DOT map, but its slightly different patterns could be due to different model specifications to create the map.  We used ArcGIS’s “Kernel Density” function to develop our map — DOT may have used a different method. Even if we both used kernel estimation, this technique can result in different surface patterns based on different inputs such as cell size and search radius.)
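One way to see how much the model specification matters: run a kernel density estimate on the same points with two bandwidths and compare the surfaces. This is a scipy sketch with hypothetical inputs; gaussian_kde’s bw_method plays roughly the role of ArcGIS’s search radius:

import numpy as np
from scipy.stats import gaussian_kde

# pts: 2 x N array of suggestion coordinates (hypothetical)
xx, yy = np.mgrid[pts[0].min():pts[0].max():200j, pts[1].min():pts[1].max():200j]
grid = np.vstack([xx.ravel(), yy.ravel()])

narrow = gaussian_kde(pts, bw_method=0.1)(grid).reshape(xx.shape)
wide = gaussian_kde(pts, bw_method=0.5)(grid).reshape(xx.shape)
# The two surfaces can show noticeably different hot spot shapes and extents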

But do these maps really tell us anything useful? Some people tweeted that the concentration of suggested bikeshare sites matched New York’s “hipster” population.  Others said that the patterns were “almost perfectly congruent with race/class/culture divides” in the city.

I disagree — I don’t think the suggested bikeshare patterns match any obvious demographic characteristics, whether it’s race/ethnicity or “hipsterism”.  (This may be worth pursuing further, but for now I leave that to others.)

I think a more likely relationship is based on where people work.  The orange-to-red areas on both maps — indicating a high concentration of suggested bikeshare sites with high ratings from website visitors — match the locations of the city’s commercial areas: Manhattan below 59th Street and downtown Brooklyn.

Another possibility, though, is that people who suggested bikeshare locations were just following DOT’s preferences – a spatial version of survey response bias.  In its bikeshare FAQ, DOT says that phase 1 of the program will focus on the following areas:

Manhattan’s Central Business District and nearby residential areas, including Brooklyn neighborhoods of DUMBO, Downtown, Fort Greene, Bedford-Stuyvesant, Williamsburg, Greenpoint and Park Slope

The NYC Planning Department produced a map of these areas in a Spring 2009 report [PDF] as follows:

Superimposing the Phase 1 area on the rating density map above shows that there’s almost an exact match between Phase 1 (outlined in dark pink) and the highest concentrations of suggested/supported sites (the dark orange and red areas on the map):

So based on these density maps (“heat maps”), it’s not clear whether the overall patterns tell us anything interesting about the wisdom of the crowd, or anything useful about where to put bikesharing stations.

Digging deeper

But whether the overall patterns mean anything or not, maybe the suggested locations could be analyzed to see if they have value as criteria for local siting decisions. In other words, within the patterns, maybe we can use the crowd’s suggestions as a key piece of analytic information, providing quantifiable indicators about where the stations should go.

More than 2,700 bikeshare locations (as of Sept. 25) were suggested within the Phase 1 area — four and a half times the 600 sites that will eventually be sited.  Perhaps they covered every possible bikeshare site. But perhaps there’s also a pattern (or patterns) to the suggestions that will help with the decision to whittle 2,700 down to 600.

For simplicity’s sake I evaluated the suggested station locations against one criterion — proximity to subway station entrances.  Obviously there are other factors to examine (threshold bikeshare station density, proximity to specific residential or employment centers, terrain, etc).  But several people have noted that a bikeshare program can extend the reach of subways — transit riders could ride to a distant subway more easily, cheaply, and quickly than a bus or a cab, or when they reach their subway stop they could pick up a bike and ride to their final destination without the hassles of a cab, etc.  So my assumption is that proximity to subway stations will be a key factor in determining bikeshare station locations.

But how do the suggested locations from the DOT/OpenPlans map compare with that hypothesis?  Are the highly rated bikeshare sites near subway stops?  About 10% of the suggested sites included reasons that mentioned subways.  Did website visitors suggest enough bikeshare sites near subways to make it easier for DOT to pick and choose which ones are best?

(Btw, this same type of analysis can be applied to bike routes, for example.  I just wanted to focus on one component for now.)

Spatial analytics

I used several spatial analysis techniques available through ArcGIS’s toolbox to shed some light on these questions.  The tools are powerful, and ESRI has made them easy to use and interpret.  The tools also underscore the power of GIS beyond making maps — extracting information based on the spatial relationships of multiple geo-referenced data sets.

In order to compare suggested bikeshare sites with subway stations, I used the file of subway entrances/exits available from MTA (current as of July 19, 2011).  The file provides the latitude/longitude of 1,866 entrances and exits, identifies the station name for each one, and lists the routes that serve these stations.  It provides a more precise spatial measure of access to the subways than a single point representing the center of each station (which is how stations are shown on most interactive and print subway maps).

With this file, we can determine how close each suggested bikeshare site is to the actual spots where people exit and enter the subway system.

To calculate proximity, I used the “Near” feature in the ArcGIS Toolbox, which “[d]etermines the distance from each feature in the input features to the nearest feature in the near features.”  I analyzed 5,857 suggested bikeshare sites based on the DOT/OpenPlans map as of Sept. 25 (see data discussion at the end of this post). Here are some statistics:

  • 92 sites were within 25 feet of a subway entrance;
  • fully one-third (1,954 suggested sites) were between 25 and 500 feet from a subway entrance (the length from one Manhattan avenue to the next is usually about 600 feet);
  • another quarter of the sites (1,677) were between 500 and 1,250 feet (1,250 ft being roughly a quarter mile, the rule-of-thumb distance that people will walk for public transportation); and
  • the remaining 2,134 were more than a quarter mile from a subway entrance.

Seems like lots of bikeshare stations were suggested in close proximity to subway entrances. If the actual bikeshare sites will be near subways entrances, which entrances should we pick?

(An aside: since just over a third of suggested bikeshare sites were located relatively far away from subway entrances, we can also evaluate these patterns.  The hypothesis would be that if people are picking up bikes at subway stations, they’re using them to travel to destinations further away from subway stops.  Therefore some of the bikeshare sites will need to be located in these “destination” areas, and DOT will need some spatial criteria for locating them.  I’ll save this for a follow up blog post.  Thanks to Kristen Grady for suggesting it.)

One way to visualize the bikeshare/subway entrance relationships is with the following map, showing the subway stations in blue (just a center-point representing the middle of the station) and the bikeshare sites color-coded by proximity (I’ve limited the display of bikeshare sites to only those within 500 feet of a subway entrance so the map wasn’t too cluttered):

This map might be helpful, but you have to visually decide which clusters of close-by bikeshare sites are the most concentrated in order to prioritize which subway stations to focus on.  The map also omits the rating values.

If we incorporate ratings, the map below is an example of the result.  It only shows bikeshare sites very close to subway entrances — within 50 feet — and ranks the symbol size based on rating.  (We could just as easily pick another distance threshold, or display several maps each using a different distance threshold.)

This helps us focus on which subway stations might be best for a nearby bikeshare station, based on suggested bikeshare sites nearby that are ranked highest.

But we can use GIS to be more precise.  Another approach would be to visualize the pattern of the subway entrances themselves, based on average rating of each entrance’s closest bikeshare sites.  In other words, I’d like to use the ratings given to each suggested bikeshare site and assign those ratings to their closest subway entrances.  This will have the effect of combining subway proximity with bikeshare rating, and the resulting map will integrate these patterns.

Here’s an example of the result, with the rated subway entrances juxtaposed with the density map of rated bikeshare sites from earlier in this post:

This map says, “If you want to put bikeshare stations near subway entrances, these are the entrances you’d pick based on the average rating of the closest stations suggested by ‘the crowd’.”  It’s a way of prioritizing the bikeshare station siting process.  These subway entrances are the ones you’d likely start with, based on the preferences of the (bike)riding public who contributed to the DOT/OpenPlans map.

It looks like many subway entrances follow the overall pattern of bikeshare sites with the highest ratings. But there are some interesting differences in the above map. A couple of sites are completely outside the Phase 1 area (an outlier each in the Bronx and Queens), and only two subway entrances with average high ratings are in Brooklyn. The rest are in lower Manhattan. But only one of the Manhattan sites is near the highest rated area centered around NYU:

Here’s another view of this area, with the rated subway entrances overlain on a Bing street map:

In order to create the rated subway entrance map, I used the Voronoi polygon technique, also known as Thiessen polygons (Voronoi was a Russian mathematician; Thiessen was an American meteorologist).  Voronoi polygons are enclosed areas surrounding each point (in this case, each subway entrance), such that all the other locations (here, bikeshare sites) within a polygon are closer to its enclosed subway entrance than to any other entrance.  The subway entrance Voronoi polygons look like this:

Here’s a close up, with the subway entrances displayed as pink stars, and the suggested bikeshare stations as blue dots:

The blue dots (bikeshare sites) within a polygon are closer to that particular polygon’s subway entrance than to any other entrance in the city. Other GIS techniques, such as creating a buffer around each subway entrance, or even using the “Near” calculations I described earlier in this post, wouldn’t apply the “closest” criterion to all the points automatically and at once.

The other nice thing about creating Voronoi polygons is that the attributes of the reference points are transferred to the polygons (the polygons end up with more than just a random ID number; in this case, they include all the corresponding subway entrance attributes).  From there I did a spatial join in ArcGIS, joining the bikeshare sites to the polygons.  This automatically calculates the count of all points in each polygon, as well as statistics such as average and sum for any numeric attributes in the point file.  In this case, each subway entrance Voronoi polygon gets a count of the bikeshare sites within it (i.e., the ones that are closest to that entrance) as well as the summed rating and average rating.

From there we could create a choropleth map of the Voronoi polygons. But since we’re interested in the entrance locations rather than an aggregated area around them, I chose to create a graduated symbol map of the actual subway entrances. So I did an attribute join between the Voronoi polygons and the entrances using the shapefile ID field.  That enabled me to make the “Average rating by subway entrance” map above.
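Since a point falls inside a given Voronoi polygon exactly when that polygon’s seed is its nearest seed, the same count/average statistics can be sketched without constructing the polygons at all, via a nearest-neighbor query and a groupby. Column names here are assumptions:

import pandas as pd
from scipy.spatial import cKDTree

# entrances_xy: M x 2 projected coordinates; sites: DataFrame with x, y, rating columns
tree = cKDTree(entrances_xy)
_, nearest = tree.query(sites[["x", "y"]].to_numpy())

# Equivalent to the Voronoi spatial join: each site is grouped under the
# entrance whose polygon would contain it
stats = (
    sites.assign(entrance=nearest)
    .groupby("entrance")["rating"]
    .agg(site_count="size", avg_rating="mean", total_rating="sum")
)

The polygon route is still handy when you want the polygons themselves on a map; this shortcut just gets you the per-entrance numbers.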

Limitations

One limitation to the Voronoi approach is that closeness is measured “as the crow flies.” There are other techniques that measure proximity using “Manhattan distance” (i.e., distance along streets rather than a straight line), such as ESRI’s Network Analyst extension for ArcGIS, but I’ll leave that to the DOT analysts who are going to decide on the actual bike share sites.

Other limitations of this approach have to do with the data themselves.  The bikeshare data from the DOT/OpenPlans website has issues such as:

  • entries accompanied by fictitious names (some examples from the Sept. 25 data include “Andy Warhol”, “George Costanza”, “Holden Caulfield”, “Lady Liberty”, and “United States”.  One or more people using the “United States” pseudonym submitted 51 entries throughout Manhattan, Brooklyn, and Queens, plus a single entry in the Bronx); and
  • multiple entries submitted by a single person. Someone – or some people – named Ryan submitted 143 entries.  Someone named Andrew Watanabe submitted 85 entries. Ryan and Andrew were the top two submitters.  After them and “United States”, there were 4 others who submitted 40 or more entries. It’s possible that these were all sincere. But some seem to be pretty goofy. Of Watanabe’s 85 suggested sites, for example, several included the following reasons:
    • “When whales accidentally swim into the Gowanus, they will be able to ride bike share bikes back out to sea.” (site on the Gowanus Canal)
    • “This will keep drunk booksellers from passing out on the sidewalk.” (site near the Bedford Ave L train stop in Williamsburg)
    • “When the zombie apocalypse comes, they will be riding bicycles. BRAAAAAINS!” (site in the middle of Mt. Laurel Cemetery in Queens)

Multiple entries might be fine, but if someone started plunking down markers on the map just for fun, this doesn’t really help us with meaningful location criteria. (A simple screen for heavy submitters is sketched below.)
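A minimal version of that screen, assuming the suggestions have been parsed into a table with the user_name field shown in the JSON at the end of this post; the cutoff of 40 echoes the counts above and is otherwise arbitrary:

import pandas as pd

# The parsed JSON feed of suggestions (file name is hypothetical)
suggestions = pd.read_json("bikeshare_points.json")

by_user = suggestions["user_name"].value_counts()
heavy = by_user[by_user >= 40]  # flag anyone with 40+ entries for manual review
flagged = suggestions[suggestions["user_name"].isin(heavy.index)]
# These could be down-weighted, rather than dropped, in any siting score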

There’s another concern about the crowdsourced data – the squeaky wheel problem.  The first map below shows the bikeshare suggestion pattern as of September 19; the second map below shows the patterns as of September 25.  The more recent map shows a new concentration of sites at the northern tip of Roosevelt Island (as well as a greater concentration in lower Manhattan and downtown Brooklyn, areas that already were very dense):

Sept. 19 patterns

Sept. 25 patterns

Why did northern Roosevelt Island all of a sudden become such a bikeshare hotspot?  I can’t say for certain.  But in a blog post on September 14 at the Roosevelt Islander, residents were urged to add sites to the DOT/OpenPlans map.  The post ended with the pitch:

So here’s what you can do to bring bike sharing to Roosevelt Island. Click on this link and say you want a bike sharing station on Roosevelt Island – do it now – please [emphasis added]

I don’t think making a pitch like this is a bad thing. (Far from it! It seems to have succeeded in getting attention on bikeshare sites on the island).  But whoever will be analyzing the sites from the DOT/OpenPlans map will need to decide if (and how) they should discount these crowdsourced lobbying efforts so the squeaky wheels don’t skew the map.

Making sense of it all

My analysis in this post is more for illustration than for actually determining best locations for bikeshare stations. A more rigorous analysis would need to deal with the data limitations I mentioned above, and also factor in other criteria.

But it was a fun exploration of the data and the techniques, and hopefully provides some useful ideas if readers are thinking of other spatial analysis projects involving proximity (especially the “closest” criteria).  I’m indebted to DOT & OpenPlans for enabling the creation of an interesting data set — the suggested bikeshare sites — for me to brush up on my spatial analysis skills.

Does my initial exploration shed any light on the wisdom of the crowd? It’s probably too early to tell (or my analysis was too limited to meaningfully evaluate the suggested sites).  But even so, I think the techniques I’ve described are helpful for prioritizing sites and for quantifying the results.  In that respect, the crowd’s input is a good thing.

Data issues, as always

Here are the steps we used to download the suggested bikeshare sites from the DOT/OpenPlans website in order to map and analyze the data:

  1. We used Fiddler to figure out that the suggested station locations were being maintained in a text file (in JSON format) available via http://a841-tfpweb.nyc.gov/bikeshare/get_bikeshare_points (Dave Burgoon ferreted this out).
  2. The JSON data looks like this:
{
"id":"4830",
"lat":"40.742031",
"lon":"-73.777397",
"neighborhood":"Fresh Meadows",
"user_name":"David",
"user_avatar_url":"",
"user_zip":"11355",
"reason":"There is no public transportation from Brooklyn-Queens greenway (Underhill Ave) to Flushing Meadows. By placing bike stations from Cunningham Park thru Kissena Park to Flushing Meadows will allow residents enjoy the parks more.",
"ck_rating_up":"1",
"voted":false
}

I don’t know of a straightforward way to read a JSON file into a desktop GIS package, so I needed to restructure the file into rows & columns.  I chose to do that with a series of Find/Replace statements in MS Word (perhaps there’s a better/more efficient way, like the script sketched at the end of this section, but this approach worked), then added a row of field names, and saved the result as a .TXT file (one row of which is shown below):

id,lat,lon,neighborho,username,avatar,zipcode,reason,rating,voted
4830,40.742031,-73.777397,Fresh Meadows,David,,11355,There is no public transportation from Brooklyn-Queens greenway (Underhill Ave) to Flushing Meadows. By placing bike stations from Cunningham Park thru Kissena Park to Flushing Meadows will allow residents enjoy the parks more.,1,FALSE
  3. We’re primarily an ESRI shop at the CUNY Center for Urban Research (with periodic forays into open source, as well as a longstanding reliance on MapInfo for some key tasks).  So my next step was to convert this to a shapefile — which I did by using ArcGIS’s “Display X/Y Points” tool to create a point file based on the lat/lon values.
  4. Just in case there were multiple points at the same location, I ran the ArcGIS script called “Collect Events”, which aggregates point data based on location, and creates a new shapefile of each unique location with a count of all the points at each location (a scripted equivalent appears at the end of this section).
    • I downloaded the JSON file a couple of times between Sept 19 and 25.  In the latest one (September 25, downloaded at 11pm) there were 55 points at latitude 40.7259, longitude -73.99 (a location at the intersection of E. 3rd Street and Second Ave in Manhattan).  But the user-supplied ZIP Codes and comments for most of these points indicated that they should have been all over the city.
    • Turns out this location is the center point of the Google map that’s displayed at the DOT bikeshare website.  If you zoom in on the DOT/OpenPlans map you’ll bring the map center into close view — and you can see the heavy map marker shadow due to all the points placed at that spot:

    • Presumably what happened here is that when you click the “Suggest Station” button, a marker is put at this spot by default. The marker is accompanied by a note that says “DRAG ME! Then click ‘Confirm Station.’”  But I’m assuming that 55 people didn’t drag the marker, and just left it there after they had entered their information. (I guess that’s not too bad — only 1 percent of the people using the site didn’t follow directions.)
    • Earlier this week (9/27) it looks like these sites were removed from the live map.  For my purposes, I removed those points from the shapefile, otherwise it would skew the analysis.  I could have put them somewhere in the ZIP Code that was entered with each spot, but I couldn’t be sure of the precise location (the reasons were vague regarding location), and I didn’t want to skew the analysis the other way.
  5. Other data notes:
    • There were 8 locations outside the immediate New York City area – some as far away as Montreal and Portland, Oregon.
    • The reason provided for the Portland location was: “Even though it’s a whole continent from NYC it always seems to me like our cultures admire one another. I think NYC would enjoy all the benefits of positioning one of their Bike Share stations in Portland as sign of goodwill and mutual admiration.”
  6. There were also 53 points with lat/lon = 0, which I assumed was just a data entry/processing error.

Out of the 5,973 points as of 11pm September 25, after I removed the 55 locations and zoomed in on the points in or immediately near New York City (and omitting the 8 outside the city and the 53 with lat/lon=0), I ended up with 5,857 points.
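Today, much of the pipeline above (restructuring the JSON, dropping the lat/lon = 0 records, and the Collect Events aggregation) could be scripted end to end. Here’s a sketch, assuming a saved copy of the feed as a JSON array like the excerpt shown earlier; the groupby stands in for ArcGIS’s “Collect Events”:

import json
import pandas as pd

with open("get_bikeshare_points.json") as f:  # saved feed (hypothetical file name)
    sites = pd.DataFrame(json.load(f))

sites["lat"] = sites["lat"].astype(float)
sites["lon"] = sites["lon"].astype(float)
sites = sites[(sites["lat"] != 0) & (sites["lon"] != 0)]  # drop the lat/lon = 0 records

# "Collect Events" equivalent: one row per unique location, with a count
collected = (
    sites.groupby(["lat", "lon"])
    .size()
    .reset_index(name="icount")
    .sort_values("icount", ascending=False)
)
# The 55-point default-marker location would appear at the top of this list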

GIS and Census participation

It’s been too long since my last blog post. I’ve been quite busy with work, and even though Twitter is a microblogging service, sending a tweet now and then really isn’t a substitute for keeping up my actual blog.

One of the projects keeping me (very) busy is our work to help boost participation in the 2010 Census. I thought I’d write about some of our interactive mapping and participation rate analysis along these lines.

In August I described how our team at CUNY’s Center for Urban Research was creating metro-scale maps showing where hard-to-count communities were located so local census advocates could target their outreach. Then in late January we launched our interactive version of those maps at www.CensusHardToCountMaps.org. Originally we designed the site to show hard-to-count areas, but those only indicate where low census participation was expected. Then, on March 23, the Census Bureau started publishing the actual participation rates on a local and national basis. So a week later we updated our site to emphasize the latest participation rates (this link zooms in to Manhattan showing tract-level participation overlaid on a map of hard-to-count tracts).

Though the Census Bureau’s Take 10 map (and a related Google Earth application) displays the daily participation rates nationwide, we decided to provide several features that the Census Bureau doesn’t. At our site you can:

  • type in a county and highlight the tracts below a certain participation rate (you can enter whatever rate you want);
  • sort the resulting list so you can see at a glance the highest and lowest performing tracts (this also will be highlighted on the map so you can see how concentrated they are); and
  • compare the 2010 rate map with the 2000 rate map (click the “More…” tab and check the box for “Participation Rate in 2000”).

(Of course, you can also click on any spot on the map to display the latest participation rate for that area — state, county, or tract — depending on how close in or out you’ve zoomed.)

These are the types of data analysis and spatial visualization tools that were requested by census advocates, so they can use the maps to focus on areas that need their help the most.

In order to provide some context for the interactive map, our Center also posted an analysis of the first week’s participation rate. It was a combination of basic statistical analysis and mapping. We examined the correlation between participation rate and hard-to-count scores at the tract level nationwide, and not surprisingly found that rates tended to be lower in hard to count areas. This should help bolster the work of groups who’ve been working in these communities, confirming that they’re focused on areas that need support the most if we want to achieve a 100% count.

We also examined county-level statistics on race/ethnicity using the Census Bureau’s latest population estimates from 2008. (The American Community Survey would provide a richer set of characteristics to examine, but the ACS suppresses data for areas with populations below 20,000, which accounts for about 1,300 of the nation’s 3,200 counties.)

The county-level data indicate that race/ethnicity is strongly correlated with census participation (at least in the first week): participation rates tended to be higher in counties with a greater percentage of whites, and lower in counties with a greater percentage of blacks and Latinos. Because we didn’t have other socio-economic data to evaluate, we weren’t able to disentangle the effects of other characteristics, such as low educational attainment, poverty, and housing conditions, that may have a stronger correlation while cutting across racial and ethnic categories. That’s an opportunity for further research (see the sketch below for one approach). As a next step we may also examine county-level unemployment rates alongside participation rates, as well as evaluate how well the first week’s analysis holds up as time goes on.
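If those socio-economic variables were in hand, one standard way to start disentangling the effects would be a multiple regression, so each coefficient reflects an association net of the others. A sketch of that approach (hypothetical file and column names; this is not an analysis we’ve run):

```python
# Sketch: regress county participation rates on race/ethnicity shares
# *and* other socio-economic characteristics, so each coefficient is
# interpreted holding the others constant. All names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

counties = pd.read_csv("county_rates_demographics.csv")

model = smf.ols(
    "rate ~ pct_black + pct_latino + pct_poverty + pct_no_hs_diploma",
    data=counties,
).fit()
print(model.summary())
```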

Friday (April 2) we added another feature: information about the areas that will receive a second census questionnaire.  (The Funders Census Initiative sent out a news advisory highlighting this service on April 5.)  Now when you click on the www.CensusHardToCountMaps.org map or type in your street address, you’ll see a popup window that (among other things) tells you whether households in your tract will be receiving replacement Census forms. We think this will help minimize confusion over people receiving another census form (even if they’ve already mailed theirs in!). This is a “just in case” measure from the Bureau: it’s mailing another form to all households in historically low response areas, and another form to households in moderately low response areas that haven’t yet sent theirs in. But the geographic scope of the “blanket” and “target” replacement mailing areas is pretty extensive in most cities (see maps at CUR’s website), so lots of people may be confused. Our mapping site provides a simple way of clearing the air.

We’ve also mapped those second mailing areas. When you visit www.CensusHardToCountMaps.org, select the “More…” tab and zoom in to your area of interest. For example, here’s Boston, MA. Click either or both check boxes in the “April 2010 Replacement Questionnaires” section to map the tracts receiving replacement census forms.

Our hard-to-count mapping site still has its original functionality — such as visualizing the demographic characteristics that will make it difficult to achieve a complete Census count; overlaying ZIP Codes, mail return rates from 2000, recent foreclosure risks by tract; and seeing who’s tweeting about the Census in your area.

But we’re also planning for next steps, thinking of the mapping application as a platform for future Census-related efforts (tracking how successful census advocates were, displaying the 2010 results, enabling the general public to get involved in a meaningful way in the redistricting process). Any ideas? We’d love to hear them.

Manhattan’s daytime population: map source found!

This post starts with an exploration of what happens when bloggers don’t source their material (in this case, maps), but ends with a cool discovery of a resource for all you dasymetric mappers out there (you know who you are).

Earlier this month my wife forwarded me a link to a Gothamist Map of the Day — see below.

[image: “newyorkdaynight” – Gothamist’s Map of the Day]

The map is interesting (though somewhat problematic) in several ways.  But what struck me about it was this: 1) I had definitely seen it before, yet 2) there was no attribution at Gothamist – no indication of what time period the map covered, who made it, the source of the data, etc.  So: a cool map, but zero context and no way to verify it or really understand what it was telling us.

Sure, there was a link to the “source,” but this just took me to Buzzfeed, where someone had posted the map, also with zero attribution or any other context.  (One of the commenters at Buzzfeed said that “this is the most frustrating thing about buzzfeed— they fail to attribute a lot of stuff properly.”  Someone responded to that commenter, noting that “it is annoying sometimes [providing no attribution], especially with stuff like this chart which is supposedly based on some facts” (my emphasis).)

Another commenter at Buzzfeed had a link to doobybrain.com, which claimed the map (or “infographic,” as doobybrain described it) was from a 2007 issue of Time Magazine, though the source for the doobybrain item was yet another blog post.  (In the world of web anonymity, the doobybrain source is misterstarfish.typepad.com, from haj718(at)mac(dot)com.)  Alas, misterstarfish/haj718 offers no attribution or context either.

I looked around a bit on Google and Bing to see what I could find, but didn’t turn up any other useful references (I found a few other blogs and graphics sites that had re-posted the map, but no details).  I even commented on the Buzzfeed piece, but no one responded with more info.

Then I happened to have a conversation last week with the director of Urban Omnibus, an online project facilitating a conversation about New York’s architecture, planning, development, and all things urban.  He mentioned in passing the “map” tag on his site, I clicked it, and struck gold.  Earlier this year (March 2009), Urban Omnibus published an interview with Joe Lertola titled “Let’s Talk About Maps 2” (the first installment of Let’s Talk About … being an intro to the column).  The interview highlighted Lertola’s work at several publications, including Time Magazine, and of course included the “Day and Night Population” map along with a brief description from Joe.  Mystery solved! … mostly.

I needed to visit Lertola’s website directly to find out more, and it turns out he did create this map for Time in 2007 (the Nov. 26, 2007 issue, to be exact – click the “City Population Shift” tab).  But it was part of an overall layout that highlighted portions of several cities across the US, and the “NYC” graphic was just an inset of a larger graphic — which is why it only focuses on lower Manhattan (not all of New York City, as several blog commenters pointed out).

But even Lertola’s website and Time itself didn’t provide more precise details on the data source.  The original Time piece lists the sources as:

Census Bureau, Bureau of Labor Statistics; Texas Transportation Institute; Oak Ridge National Laboratory/UT-Battelle LLC.

Lertola’s website goes a bit further, noting that:

the Geographic Information Science and Technology group at Oak Ridge National Laboratory has developed LandScan USA, the most detailed population model available.  By integrating Census data with extensive information on other daily activities, LandScan can predict the population of any U.S. location at any time of day.

Aha, a searchable term – LandScan!  It didn’t tell me the data vintage or anything like that, but at least I could go to the source.  And the results are intriguing.  The LandScan website states that

LandScan USA is more spatially refined than the resolution of block-level census data and includes demographic attributes (age, sex, race). The model includes development of an “ambient population” (average over 24 hours) for global LandScan and development of spatial distributions for “residential or nighttime population” as well as for “daytime population” as part of LandScan USA. Locating daytime populations requires not only census data, but data on places of work, journey to work, and other mobility factors. The combination of both residential and daytime populations will provide the best estimate of who is potentially exposed to ambient pollutants.

In other words, LandScan claims to address two interesting GIS issues related to demographic analysis: dasymetric mapping (modeling population patterns for smaller areas than typical Census geography, which allocates population across an entire tract, say, regardless of where people actually live within that tract) and daytime population (Census population data correspond spatially to where people live, i.e., the population at night, rather than where people work or shop, i.e., the population during the day).  Each of these issues is compelling for a variety of policy areas, for spatial analysis theory and practice, and for creating cool maps.
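To make the dasymetric idea concrete, here’s a toy illustration in Python. It is not LandScan’s actual model (which also folds in workplace, mobility, and other data); it just contrasts spreading a tract’s population uniformly with allocating it in proportion to an ancillary weight, using made-up numbers:

```python
# Toy dasymetric allocation: distribute a tract's population according
# to a "habitability" weight (e.g., residential land cover) rather than
# evenly across its area. All numbers here are made up.
tract_population = 4000

# Four equal-area cells in the tract, with hypothetical residential
# weights (0 = park/water, 1 = fully residential).
cell_weights = [0.0, 0.2, 0.8, 1.0]

# Uniform (choropleth-style) allocation: population / number of cells.
uniform = [tract_population / len(cell_weights)] * len(cell_weights)

# Dasymetric allocation: population in proportion to the weights.
total_weight = sum(cell_weights)
dasymetric = [tract_population * w / total_weight for w in cell_weights]

print(uniform)     # [1000.0, 1000.0, 1000.0, 1000.0]
print(dasymetric)  # [0.0, 400.0, 1600.0, 2000.0]
```

Note how the dasymetric version puts no one in the park/water cell and concentrates people where the residential signal is strongest, while both versions still sum to the tract total.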

Needless to say, I’ll be emailing the keepers of the data (Oak Ridge National Lab, at LandScanTechnical@ornl.gov) to find out more.  If anyone has any other leads, please let me know.