{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Replicating ProPublica's COMPAS Audit" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Why COMPAS?\n", "\n", "ProPublica started the COMPAS debate with the article [Machine Bias](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencin). Alongside the article, they released details of their methodology and their [data and code](https://github.com/propublica/compas-analysis). This gives researchers a real data set for studying how data is used in a criminal justice setting without having to file their own requests for information, so it has been reused many times.\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scipy\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import itertools\n", "from sklearn.metrics import roc_curve\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "tags": [] }, "outputs": [], "source": [ "propublica_data_url = 'https://github.com/propublica/compas-analysis/raw/master/compas-scores-two-years.csv'\n", "df_pp = pd.read_csv(propublica_data_url,\n", " header=0).set_index('id')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>first</th><th>last</th><th>compas_screening_date</th><th>sex</th><th>dob</th><th>age</th><th>age_cat</th><th>race</th><th>juv_fel_count</th><th>...</th><th>v_decile_score</th><th>v_score_text</th><th>v_screening_date</th><th>in_custody</th><th>out_custody</th><th>priors_count.1</th><th>start</th><th>end</th><th>event</th><th>two_year_recid</th></tr>\n",
"    <tr><th>id</th><th colspan=\"21\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>miguel hernandez</td><td>miguel</td><td>hernandez</td><td>2013-08-14</td><td>Male</td><td>1947-04-18</td><td>69</td><td>Greater than 45</td><td>Other</td><td>0</td><td>...</td><td>1</td><td>Low</td><td>2013-08-14</td><td>2014-07-07</td><td>2014-07-14</td><td>0</td><td>0</td><td>327</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>kevon dixon</td><td>kevon</td><td>dixon</td><td>2013-01-27</td><td>Male</td><td>1982-01-22</td><td>34</td><td>25 - 45</td><td>African-American</td><td>0</td><td>...</td><td>1</td><td>Low</td><td>2013-01-27</td><td>2013-01-26</td><td>2013-02-05</td><td>0</td><td>9</td><td>159</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>ed philo</td><td>ed</td><td>philo</td><td>2013-04-14</td><td>Male</td><td>1991-05-14</td><td>24</td><td>Less than 25</td><td>African-American</td><td>0</td><td>...</td><td>3</td><td>Low</td><td>2013-04-14</td><td>2013-06-16</td><td>2013-06-16</td><td>4</td><td>0</td><td>63</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>marcu brown</td><td>marcu</td><td>brown</td><td>2013-01-13</td><td>Male</td><td>1993-01-21</td><td>23</td><td>Less than 25</td><td>African-American</td><td>0</td><td>...</td><td>6</td><td>Medium</td><td>2013-01-13</td><td>NaN</td><td>NaN</td><td>1</td><td>0</td><td>1174</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>6</th><td>bouthy pierrelouis</td><td>bouthy</td><td>pierrelouis</td><td>2013-03-26</td><td>Male</td><td>1973-01-22</td><td>43</td><td>25 - 45</td><td>Other</td><td>0</td><td>...</td><td>1</td><td>Low</td><td>2013-03-26</td><td>NaN</td><td>NaN</td><td>2</td><td>0</td><td>1102</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 52 columns</p>\n",
"</div>
" ], "text/plain": [ " name first last compas_screening_date sex \\\n", "id \n", "1 miguel hernandez miguel hernandez 2013-08-14 Male \n", "3 kevon dixon kevon dixon 2013-01-27 Male \n", "4 ed philo ed philo 2013-04-14 Male \n", "5 marcu brown marcu brown 2013-01-13 Male \n", "6 bouthy pierrelouis bouthy pierrelouis 2013-03-26 Male \n", "\n", " dob age age_cat race juv_fel_count ... \\\n", "id ... \n", "1 1947-04-18 69 Greater than 45 Other 0 ... \n", "3 1982-01-22 34 25 - 45 African-American 0 ... \n", "4 1991-05-14 24 Less than 25 African-American 0 ... \n", "5 1993-01-21 23 Less than 25 African-American 0 ... \n", "6 1973-01-22 43 25 - 45 Other 0 ... \n", "\n", " v_decile_score v_score_text v_screening_date in_custody out_custody \\\n", "id \n", "1 1 Low 2013-08-14 2014-07-07 2014-07-14 \n", "3 1 Low 2013-01-27 2013-01-26 2013-02-05 \n", "4 3 Low 2013-04-14 2013-06-16 2013-06-16 \n", "5 6 Medium 2013-01-13 NaN NaN \n", "6 1 Low 2013-03-26 NaN NaN \n", "\n", " priors_count.1 start end event two_year_recid \n", "id \n", "1 0 0 327 0 0 \n", "3 0 9 159 1 1 \n", "4 4 0 63 0 1 \n", "5 1 0 1174 0 0 \n", "6 2 0 1102 0 0 \n", "\n", "[5 rows x 52 columns]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_pp.head()  # first 5 rows\n", "# Lines starting with # are comments: plain English notes inside the code" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>first</th><th>last</th><th>compas_screening_date</th><th>sex</th><th>dob</th><th>age</th><th>age_cat</th><th>race</th><th>juv_fel_count</th><th>...</th><th>v_decile_score</th><th>v_score_text</th><th>v_screening_date</th><th>in_custody</th><th>out_custody</th><th>priors_count.1</th><th>start</th><th>end</th><th>event</th><th>two_year_recid</th></tr>\n",
"    <tr><th>id</th><th colspan=\"21\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>10996</th><td>steven butler</td><td>steven</td><td>butler</td><td>2013-11-23</td><td>Male</td><td>1992-07-17</td><td>23</td><td>Less than 25</td><td>African-American</td><td>0</td><td>...</td><td>5</td><td>Medium</td><td>2013-11-23</td><td>2013-11-22</td><td>2013-11-24</td><td>0</td><td>1</td><td>860</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>10997</th><td>malcolm simmons</td><td>malcolm</td><td>simmons</td><td>2014-02-01</td><td>Male</td><td>1993-03-25</td><td>23</td><td>Less than 25</td><td>African-American</td><td>0</td><td>...</td><td>5</td><td>Medium</td><td>2014-02-01</td><td>2014-01-31</td><td>2014-02-02</td><td>0</td><td>1</td><td>790</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>10999</th><td>winston gregory</td><td>winston</td><td>gregory</td><td>2014-01-14</td><td>Male</td><td>1958-10-01</td><td>57</td><td>Greater than 45</td><td>Other</td><td>0</td><td>...</td><td>1</td><td>Low</td><td>2014-01-14</td><td>2014-01-13</td><td>2014-01-14</td><td>0</td><td>0</td><td>808</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>11000</th><td>farrah jean</td><td>farrah</td><td>jean</td><td>2014-03-09</td><td>Female</td><td>1982-11-17</td><td>33</td><td>25 - 45</td><td>African-American</td><td>0</td><td>...</td><td>2</td><td>Low</td><td>2014-03-09</td><td>2014-03-08</td><td>2014-03-09</td><td>3</td><td>0</td><td>754</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>11001</th><td>florencia sanmartin</td><td>florencia</td><td>sanmartin</td><td>2014-06-30</td><td>Female</td><td>1992-12-18</td><td>23</td><td>Less than 25</td><td>Hispanic</td><td>0</td><td>...</td><td>4</td><td>Low</td><td>2014-06-30</td><td>2015-03-15</td><td>2015-03-15</td><td>2</td><td>0</td><td>258</td><td>0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 52 columns</p>\n",
"</div>
" ], "text/plain": [ " name first last compas_screening_date \\\n", "id \n", "10996 steven butler steven butler 2013-11-23 \n", "10997 malcolm simmons malcolm simmons 2014-02-01 \n", "10999 winston gregory winston gregory 2014-01-14 \n", "11000 farrah jean farrah jean 2014-03-09 \n", "11001 florencia sanmartin florencia sanmartin 2014-06-30 \n", "\n", " sex dob age age_cat race \\\n", "id \n", "10996 Male 1992-07-17 23 Less than 25 African-American \n", "10997 Male 1993-03-25 23 Less than 25 African-American \n", "10999 Male 1958-10-01 57 Greater than 45 Other \n", "11000 Female 1982-11-17 33 25 - 45 African-American \n", "11001 Female 1992-12-18 23 Less than 25 Hispanic \n", "\n", " juv_fel_count ... v_decile_score v_score_text v_screening_date \\\n", "id ... \n", "10996 0 ... 5 Medium 2013-11-23 \n", "10997 0 ... 5 Medium 2014-02-01 \n", "10999 0 ... 1 Low 2014-01-14 \n", "11000 0 ... 2 Low 2014-03-09 \n", "11001 0 ... 4 Low 2014-06-30 \n", "\n", " in_custody out_custody priors_count.1 start end event two_year_recid \n", "id \n", "10996 2013-11-22 2013-11-24 0 1 860 0 0 \n", "10997 2014-01-31 2014-02-02 0 1 790 0 0 \n", "10999 2014-01-13 2014-01-14 0 0 808 0 0 \n", "11000 2014-03-08 2014-03-09 3 0 754 0 0 \n", "11001 2015-03-15 2015-03-15 2 0 258 0 1 \n", "\n", "[5 rows x 52 columns]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_pp.tail()  # last 5 rows\n", "# A comment can hold any note; Python ignores it" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## We can get help by pressing Shift + Tab inside the parentheses of a function call" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7214, 52)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_pp.shape  # (number of rows, number of columns)\n" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "df_pp.head  # no parentheses: this shows the method itself instead of calling it" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Data Cleaning" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "clean_data_url = 'https://raw.githubusercontent.com/ml4sts/outreach-compas/main/data/compas_c.csv'" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(clean_data_url,\n", "                 header=0).set_index('id')\n", "# Same call as before, but with clean_data_url instead of propublica_data_url, so it loads the cleaned data" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7214, 52)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_pp.shape\n", "# shape of the original ProPublica table: (rows, columns)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5278, 14)" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape\n", "# shape of the cleaned table: (rows, columns)\n", "\n", "# The cleaned table is smaller: fewer rows (people) and fewer columns" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "race_counts = df['race'].value_counts()  # number of rows for each value of race" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Columns in the cleaned data:\n", "\n", "* `age`: defendant's age\n", "* `c_charge_degree`: degree charged (Misdemeanor or Felony)\n", "* `race`: defendant's race\n", "* `age_cat`: defendant's age binned into \"Less than 25\", \"25 - 45\", or \"Greater than 45\"\n", "* `score_text`: COMPAS score category: 'Low' (1 to 4), 'Medium' (5 to 7), or 'High' (8 to 10)\n", "* `sex`: defendant's gender\n", "* `priors_count`: number of prior charges\n", "* `days_b_screening_arrest`: number of days between the defendant's arrest and the COMPAS screening\n", "* `decile_score`: COMPAS score from 1 to 10 (low risk to high risk)\n", "* `is_recid`: whether the defendant recidivated\n", "* 
`two_year_recid`: whether the defendant recidivated within two years\n", "* `c_jail_in`: date the defendant entered jail\n", "* `c_jail_out`: date the defendant was released from jail\n", "* `length_of_stay`: length of the jail stay" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "African-American 3175\n", "Caucasian 2103\n", "Name: race, dtype: int64" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "race_counts\n", "# Tip: press Tab while typing a variable name to auto-complete it" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "race_counts.plot(kind='pie')  # pie chart of the counts per race\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 4 }