To read about the results of our Oakland 100 Block Plans study and why we published it, read this article. Below is the methodology for our research.

The Data

To build our model of the 100 most violent blocks in Oakland, we used ArcGIS (a desktop geographic information system) to determine the intensity of crime for every city block. With crime report data, however, there are some key assumptions and limitations to bear in mind:

  • Crime reports provided by Oakland Police reflect only a portion of the actual crimes in Oakland; an unknown number of crimes go unreported.
  • Recorded locations of reported crimes are typically only 70–80 percent accurate.
    • There is no way to know which side of the street or which end of a block a crime occurred on, e.g. "1800 block Grand Ave"
    • No precise location is recorded, e.g. "across the street from Bank of America ATM"

Because the City has not publicly released which years its model was based on, we developed our model using both 2011 alone and the five-year period 2007–2011 for comparative purposes. What is clear from the City's plan is that the two crime types used are homicides and shootings. For our model we used the following data sets:

  • Homicide & shooting reports from 2007–2011 and from 2011 alone (crime reports, geocoded and cleaned by Urban Strategies Council)
    • Filtered to shootings at individuals
    • PC245 A2, PC245 B, PC245 D, PC664/187 A, and PC245 C (all shooting crimes)
  • US Census 2010 Blocks
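The filtering step above can be sketched as a simple set-membership test. This is an illustrative sketch, not our production script: the field name `penal_code` and the record layout are assumptions, and the real reports carry many more attributes.

```python
# Sketch: filtering crime reports down to the shooting-related penal codes
# listed above. The "penal_code" field name is illustrative only.

SHOOTING_CODES = {"PC245 A2", "PC245 B", "PC245 D", "PC664/187 A", "PC245 C"}

def filter_shootings(reports):
    """Keep only reports whose penal code is one of the shooting offenses."""
    return [r for r in reports if r["penal_code"] in SHOOTING_CODES]

reports = [
    {"id": 1, "penal_code": "PC245 A2"},    # assault with firearm -> kept
    {"id": 2, "penal_code": "PC459"},       # burglary -> dropped
    {"id": 3, "penal_code": "PC664/187 A"}, # attempted homicide -> kept
]
print([r["id"] for r in filter_shootings(reports)])  # [1, 3]
```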

Because crime report locations are imprecise at the best of times (until our police force records a precise GPS location for every event), simply summing all the crimes on each city block would produce erroneous, unreliable results. A crime wrongly listed as 2500 Broadway will be assigned to the even-numbered block face on the right side of the street, when it may well have occurred in front of 2515 Broadway.
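The misassignment problem comes from the standard geocoding convention of splitting each hundred-block by street-number parity. A minimal sketch (the helper `block_face` is hypothetical, written only to illustrate the convention):

```python
# Sketch of why address-based assignment misfires: geocoders put even street
# numbers on one block face and odd numbers on the opposite face, so a report
# logged as 2500 Broadway lands on the even side even if the incident actually
# happened outside 2515 Broadway, across the street.

def block_face(street_number):
    """Return the (hundred-block, side) a typical geocoder would assign."""
    side = "even" if street_number % 2 == 0 else "odd"
    return (street_number // 100 * 100, side)

print(block_face(2500))  # (2500, 'even')
print(block_face(2515))  # (2500, 'odd') -- same block, opposite face
```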

Building a Realistic Model

To deal with the problems of variable-quality address data and the issues arising from assigning a crime to the wrong city block, we instead chose a model that incorporates all crime within a reasonable distance of each city block. We used census blocks for our model; these polygons are formed along the street centerlines of every city block and are a useful boundary type for analysis such as this. We calculated a buffer zone for every city block, using a radius of 292 feet around each block (this radius was determined by averaging the width of a sample of typical city blocks). This buffer ensures that all crimes located across the street from a particular block, and those that occur along the side streets adjoining each block, get included in the score for that block. This model assumes that crimes happening within view of the corner of a block contribute to the violence perceived on that block and should influence its score.
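The buffer idea can be sketched with simple geometry. This is only a toy illustration, assuming axis-aligned rectangles in place of the real census-block polygons (the actual analysis was run on ArcGIS geometries):

```python
# A minimal sketch of the 292-foot buffer model: expand each block's footprint
# by the buffer radius, then count the crime points that fall inside the
# expanded shape. Rectangles stand in for real census-block polygons.

BUFFER_FT = 292

def buffer_rect(rect, dist=BUFFER_FT):
    """Expand a block rectangle (xmin, ymin, xmax, ymax) by the buffer radius."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - dist, ymin - dist, xmax + dist, ymax + dist)

def crimes_in(rect, points):
    """Count crime points falling inside a rectangle."""
    xmin, ymin, xmax, ymax = rect
    return sum(1 for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax)

block = (0, 0, 400, 600)                        # a notional block, in feet
crimes = [(200, 300), (450, 100), (900, 900)]   # last point is beyond the buffer

print(crimes_in(block, crimes))                 # 1: inside the block itself
print(crimes_in(buffer_rect(block), crimes))    # 2: within the 292 ft buffer
```

The across-the-street point at (450, 100) is missed by the raw block but captured by the buffer, which is exactly the misassignment the model is designed to absorb.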

This map shows an example of the buffer used to capture crimes in each block to compensate for inaccuracies in the crime reporting process.



(The rectangular red polygon is the 292-foot buffer and the blue polygon is the actual census block.)

This method does count each crime multiple times, so the total scores are somewhat arbitrary, yet they yield a reliable indication of the violence associated with each city block. This type of model is often referred to as smoothing; in this case we are smoothing out both errors in the original data and the boundaries of problem areas influenced by violent crime.

The Process

1. First we created the buffer layer: every census block received a buffer of 292 feet.

2. Using a spatial join, we counted every shooting and homicide within these buffered polygons (once using just the 2011 data and once using the five-year 2007–2011 data).
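The spatial join in step 2 can be sketched as a point-in-polygon count per buffered block. Again this is a toy version, assuming rectangular buffers and made-up block IDs (formatted like Census GEOIDs for flavor only); the real join ran on ArcGIS polygon geometries:

```python
# Sketch of step 2: for each buffered block, count the crime points falling
# inside it. Because neighboring buffers overlap, a single crime can be
# counted toward several blocks -- the deliberate "smoothing" described above.

def spatial_join_counts(buffers, crimes):
    """Return {block_id: number of crime points inside that block's buffer}."""
    counts = {}
    for block_id, (xmin, ymin, xmax, ymax) in buffers.items():
        counts[block_id] = sum(
            1 for x, y in crimes if xmin <= x <= xmax and ymin <= y <= ymax)
    return counts

buffers = {
    "060014062001000": (-292, -292, 692, 892),   # two overlapping buffers
    "060014062001001": (408, -292, 1392, 892),
}
crimes = [(200, 300), (450, 100), (2000, 50)]
print(spatial_join_counts(buffers, crimes))
# {'060014062001000': 2, '060014062001001': 1}
```

Note that the crime at (450, 100) sits in the overlap and is credited to both blocks, which is why the raw totals are "somewhat arbitrary" while the relative ranking remains meaningful.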

3. We rank-sorted the counts of homicides and shootings and selected the 100 blocks with the highest counts (for both the single-year and multi-year data).

4. When we rank-sorted the 2011 data set, we noticed that the top 100 included many blocks with a homicide/shooting count of 6, which was the lowest count in the ranking. Including ALL the blocks with a count of 6 gave 131 blocks; removing them left only 72. To resolve this tie we added a second ranking of all shooting types (based on the actual block boundary, NOT the buffer this time) and then selected the top 100.

5. After adding the second ranking of all shooting types, we removed the census blocks that had a buffered count of 6 and zero shootings of any type on the discrete block (not the buffer). This left us with 100 blocks for the 2011 data set.
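The tie-breaking in steps 4 and 5 can be sketched as a two-key sort plus a filter. The field names (`buffer_count`, `block_shootings`) and the sample values are invented for illustration; only the logic mirrors the steps above:

```python
# Sketch of steps 4-5: rank blocks primarily by their buffered
# homicide/shooting count, break ties with the count of all shooting types on
# the discrete block itself, then drop tied blocks at the cutoff count of 6
# that have no on-block shootings at all.

def rank_blocks(blocks, n=100):
    """Return the top-n blocks after tie-breaking and filtering."""
    ranked = sorted(
        blocks,
        key=lambda b: (b["buffer_count"], b["block_shootings"]),
        reverse=True)
    ranked = [b for b in ranked
              if not (b["buffer_count"] == 6 and b["block_shootings"] == 0)]
    return ranked[:n]

blocks = [
    {"id": "A", "buffer_count": 9, "block_shootings": 4},
    {"id": "B", "buffer_count": 6, "block_shootings": 2},
    {"id": "C", "buffer_count": 6, "block_shootings": 0},  # dropped in step 5
]
print([b["id"] for b in rank_blocks(blocks, n=3)])  # ['A', 'B']
```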

Penal Codes Used:


Download the following files used in our research & analysis:


The prior download link contains the following:

  • Our model – Python script
  • Our model – description
  • Our model – model layout image
  • Assault reports
  • Homicide & Shooting reports
  • Metadata file
  • List of all Census Blocks with buffer scores for 2011
  • List of all Census Blocks with buffer scores for 2007-2011
  • Shapefile of all Oakland Census Blocks with crime buffer counts for 2011
  • Shapefile of all Oakland Census Blocks with crime buffer counts for 2007-2011
  • Final 100 Blocks Shapefile for 2011
  • Final 100 Blocks Shapefile for 2007-2011
  • Plain text description of the 100 blocks using bounding streets for 2011
  • Plain text description of the 100 blocks using bounding streets for 2007-2011


Get Data in Alternate Formats or Through an API: