This video presents our analysis of the NYPD Stop-and-Frisk dataset, focusing on fairness, data quality, and predictive modeling. We examine arrest outcomes across demographic groups, clean and standardize the 2017 dataset, and build a baseline logistic regression model, supported by chi-square tests of independence and Cramér’s V as a measure of association strength. Our goal is to understand whether the observed disparities still reflect meaningful differences in outcomes once the data is cleaned and analyzed with formal statistical methods.
Team Members:
Kayla Johnson
Maria Thomas
Tools Used:
Python (Pandas, NumPy, scikit-learn), Google Colab, Matplotlib, SciPy
Topics Covered:
*Data cleaning and schema alignment
*Exploratory analysis (arrest rates by race and sex)
*Logistic regression performance
*Fairness evaluation
*Chi-square test and Cramér’s V
*Interpretation, limitations, and future work
This presentation is part of the final deliverables for CS 620 at Old Dominion University.
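For viewers who want to try the chi-square and Cramér’s V step themselves, here is a minimal sketch in Python using Pandas and SciPy. It is not the exact code from the video: the column names (group, arrested) and the toy counts are made up for illustration, and the real 2017 dataset uses different field names.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(table):
    """Return (chi2, p_value, V) for an r x c contingency table of counts."""
    observed = np.asarray(table, dtype=float)
    # correction=False turns off the Yates continuity correction for 2x2 tables,
    # so V is computed from the plain Pearson chi-square statistic.
    chi2, p_value, _, _ = chi2_contingency(observed, correction=False)
    n = observed.sum()
    k = min(observed.shape) - 1  # smaller table dimension minus one
    return chi2, p_value, np.sqrt(chi2 / (n * k))


# Toy data, NOT real NYPD records: 100 stops split across two hypothetical groups.
stops = pd.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "arrested": [1] * 10 + [0] * 40 + [1] * 20 + [0] * 30,
})

# Cross-tabulate group against arrest outcome, then test for independence.
table = pd.crosstab(stops["group"], stops["arrested"])
chi2, p_value, v = cramers_v(table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, Cramer's V={v:.3f}")
```

A small p-value only says the two variables are unlikely to be independent. Cramér’s V (0 to 1) says how strong the association is, which is why the two are reported together.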
References
Data Source
New York City Police Department. (2012–2022). Stop, Question and Frisk Database.
Retrieved from https://www.nyc.gov/site/nypd/stats/report...
Educational and Conceptual Resources
Skew the Script. (n.d.). 9.2 – The Chi-Square Test for Independence.
Retrieved from https://skewthescript.org/9-2
Skew the Script. (2020, February 10). Chi-Square Test for Independence — Skew the Script AP Stats Unit 9 Lesson 2. [Video]. YouTube.
• AP Stats 9.2 - Chi-Square Tests for Two-Wa...
(Used for conceptual guidance on Chi-Square testing and as inspiration for framing fairness-based data analysis in educational settings.)
Statistical and Computational Tools
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011).
Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
McKinney, W. (2010). Data Structures for Statistical Computing in Python.
Proceedings of the 9th Python in Science Conference, 51–56. (Pandas Library)
Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment.
Computing in Science & Engineering, 9(3), 90–95.
Analytical Framework
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
(Original source of the Cramér’s V association measure.)