Auditing Algorithms Lab | DSC 290 | UC San Diego | Fall 2021
Details
- Class: DSC 290
- Instructor: R. Stuart Geiger
- Time: 5-6:50pm on Tuesdays
- Place: SOLIS 109
- Units: 2
Description
This seminar is for students interested in empirically investigating the outputs of real-world algorithmic systems of all kinds, particularly those where the code and/or training data are not publicly available. The first few weeks of the class will include more readings and lectures, in which we cover the history of auditing and the legal/ethical issues it raises, ranging from classic audits of non-algorithmic decision systems (e.g. equal opportunity hiring investigations) to contemporary issues around the Computer Fraud and Abuse Act and the IRB. We will learn various approaches to investigating such systems, including auditing via training datasets, code, user reports, API scraping, sockpuppet accounts, and headless browsers. We will read and discuss various algorithmic audits by researchers and regulators, in a mix of selected readings and readings students choose. The second half of the class will be more discussion- and activity-based, as we perform audits on several real-world models whose developers have encouraged public auditing (e.g. Wikipedia’s content moderation classifiers). Students will work towards a final project, in which they will conduct their own audits and develop strategies for how systems can be designed for auditability.
Prerequisites
There are no official prerequisites to register, and students from all departments are welcome to enroll. The class will generally assume knowledge of:
- Introductory Statistics: The math and statistics of basic auditing are not as complex as those used in developing machine learning algorithms, but they do involve statistics at the undergraduate level: correlations, hypothesis testing, linear regressions, and analysis of variance (ANOVA) (e.g. what is taught in COGS 14B). You don’t need to know how to mathematically derive these tests; you just need to know enough to be a well-informed user of them (the first sketch after this list shows the kind of test a basic audit turns on). However, some metrics of fairness, bias, causal inference, and related concepts do involve more advanced math and statistics, and students who want to work with such metrics will be able to do so.
- Introductory programming for data collection and analysis: A working knowledge of a scripting language like python or R is highly recommended (e.g. what is taught in CSE 8A or this coursera class). Some audit methods involve automated data collection; we will learn how to query APIs and run headless browsers using standard libraries (see the second sketch after this list). Jupyter Notebooks will be used for literate programming. This class will be registered for UCSD datahub, which provides a web/cloud-based Jupyter environment in python and R.
- All students must take and pass the UCSD/CITI IRB Human Subject Protection Training online course (Social and Behavioral Basic Course), by the end of week 2 of the class. This takes about 2-3 hours total and can be taken at any time, even during the summer. Register at citiprogram.org (no SSO available) and affiliate with UCSD. See more info at this video of me registering for the proper course. If you have passed the course in the past 3 years, your certificate is still valid; if not, you must take the shorter refresher course.
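For a sense of what the statistics prerequisite covers, here is a minimal sketch (in python, with invented numbers) of the kind of undergraduate-level test a basic audit often turns on: comparing outcome rates between two groups with a chi-squared test.

```python
# Toy audit question: did group A receive callbacks at a different rate
# than group B? All numbers below are invented for illustration.
from scipy.stats import chi2_contingency

#        callbacks, no callbacks
table = [
    [157, 843],  # group A (15.7% callback rate)
    [103, 897],  # group B (10.3% callback rate)
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value means a gap this large would be unlikely if both groups
# were treated the same; it does not, by itself, tell you why the gap exists.
```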
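And here is a minimal sketch of automated data collection via an API, using python’s requests library against Wikipedia’s public ORES scoring service (one of the content moderation classifiers mentioned above). The endpoint, parameters, and response layout follow ORES’s public documentation, but treat them as assumptions and check the current docs; the revision IDs are arbitrary examples.

```python
# Ask ORES for "damaging edit" scores on two English Wikipedia revisions.
import requests

revids = [1003334678, 992092669]  # arbitrary example revision IDs

resp = requests.get(
    "https://ores.wikimedia.org/v3/scores/enwiki",
    params={"models": "damaging", "revids": "|".join(str(r) for r in revids)},
    headers={"User-Agent": "DSC290-auditing-lab-example"},  # courtesy header for Wikimedia APIs
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

# Expected layout (per the ORES docs; may change):
# enwiki -> scores -> <revid> -> damaging -> score -> {prediction, probability}
for revid, models in data["enwiki"]["scores"].items():
    score = models["damaging"]["score"]
    print(revid, score["prediction"], round(score["probability"]["true"], 3))
```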
Please get in touch if you have any doubts or concerns about the prerequisites.
Potential readings
Note that the final reading list and schedule have not yet been finalized. Please reach out to Stuart Geiger if you have any suggestions or ideas. And thanks to auditingalgorithms.science for many of these!
What is an audit?
- Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Proc. of FAT*. https://dl.acm.org/doi/pdf/10.1145/3351095.3372873
- Sandvig, Christian, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. forthcoming. “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms.” Computational Culture. http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf
- Saltman, J. (1975). Implementing Local Housing Laws Through Social Action. Journal of Applied Behavioral Science, 11(1): 39-61. https://doi.org/10.1177%2F002188637501100105
Classic auditing in non-algorithmic systems
- National Research Council Panel on Measuring Racial Discrimination. (2004). Measuring Racial Discrimination. Washington, DC: National Academies Press. http://www.nap.edu/catalog/10887/measuring-racial-discrimination
- Schulman KA, Berlin JA, Harless W, Kerner JF, Sistrunk S, et al. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N. Engl. J. Med. 1999;340(8):618–626. https://doi.org/10.1056/NEJM199902253400806
- Saltman, J. (1975). Implementing Local Housing Laws Through Social Action. Journal of Applied Behavioral Science, 11(1): 39-61. https://doi.org/10.1177%2F002188637501100105
- Ayres, I. & Siegelman, P. (1995). Race and Gender Discrimination in Bargaining for a New Car. American Economic Review 85(3): 304-321. https://inequality.stanford.edu/sites/default/files/media/_media/pdf/Reference%20Media/Ayres_Siegelman_1995_Discrimination.pdf
Algorithmic auditing frameworks and introductions
- Sandvig, C., Hamilton, K., Karahalios, K., & Langbort, C. (2014). “An Algorithm Audit.” In: Seeta Peña Gangadharan (ed.), Data and Discrimination: Collected Essays, pp. 6-10. Washington, DC: New America Foundation. http://www-personal.umich.edu/~csandvig/research/An%20Algorithm%20Audit.pdf
- Sandvig, Christian, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. forthcoming. “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms.” Computational Culture. http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf
- Diakopoulos, Nicholas. 2015. “Algorithmic Accountability.” Digital Journalism 3 (3): 398-415. http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/algorithmic_accountability_final.pdf
- Data & Society Research Institute. 2014 “Workshop Primer: Algorithmic Accountability” http://www.datasociety.net/pubs/2014-0317/AlgorithmicAccountabilityPrimer.pdf
- Bandy, J. (2021). Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 1-34. https://dl.acm.org/doi/pdf/10.1145/3449148
The legal and ethical issues of conducting audits (and violating Terms of Service to do so)
- Raji, I. D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., & Denton, E. (2020, February). “Saving face: Investigating the ethical concerns of facial recognition auditing.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 145-151). https://dl.acm.org/doi/pdf/10.1145/3375627.3375820
- Vaccaro, K., Karahalios, K., Sandvig, C., Hamilton, K., & Langbort, C. (2015). “Agree or cancel? research and terms of service compliance.” In ACM CSCW Ethics Workshop: Ethics for Studying Sociotechnical Systems in a Big Data World. http://www-personal.umich.edu/~csandvig/research/Vaccaro-CSCW-Ethics-2015.pdf
- Bruckman, A. (2016, February 26). “Do Researchers Need to Abide by Terms of Service (TOS)? An Answer.” The Next Bison: Social Computing and Culture. https://nextbison.wordpress.com/2016/02/26/tos/
- Sandvig v. Barr ruling by the U.S. District Court for D.C.: https://www.aclu.org/sites/default/files/field_document/sandvig_opinion.pdf. Read pages 1-2 and 15-28 (part 2 beginning with “II. Interpreting the CFAA”)
- But even standing legal precedent does not automatically constrain the decisions of law enforcement; see https://enwp.org/United_States_v._Swartz (content warning: suicide)
- Wendler, David; Miller, Franklin G. “Deception in Research.” The Oxford Textbook of Clinical Research Ethics. Oxford University Press, New York. (2008) pgs. 315-324. https://auditlab.stuartgeiger.com/static/deception-ethics-wendler-miller.pdf
The UMN Linux kernel security audit
- Clark, M. 2021. The Verge. “University of Minnesota banned from contributing to Linux kernel.” https://www.theverge.com/2021/4/22/22398156/university-minnesota-linux-kernal-ban-research
- Linux Foundation Technical Advisory Board. 2021. “An emergency re-review of kernel commits authored by members of the University of Minnesota, due to the Hypocrite Commits research paper.” https://lore.kernel.org/lkml/202105051005.49BFABCE@keescook/
- Lu, K., Wu, Q., and Pakki, A. “An open letter to the Linux community - April 24, 2021” https://cse.umn.edu/cs/open-letter-linux-community-april-24-2021
- Hacker News Thread. 2021. “They introduce kernel bugs on purpose.” https://news.ycombinator.com/item?id=26887670
On legal challenges to algorithms used by government agencies
- Citron, Danielle Keats, and Frank A. Pasquale. 2014. “The Scored Society: Due Process for Automated Predictions.” Washington Law Review 89. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2376209
- Citron, Danielle Keats, Technological Due Process. Washington University Law Review, Vol. 85, pp. 1249-1313, 2007. Read § I.A., II.B.2, III.C.2. http://ssrn.com/abstract=1012360
- Stuart, G., “Databases, Felons, and Voting: Errors and Bias in the Florida Felons Exclusion List in the 2000 Presidential Elections” (September 2002). KSG Working Paper Series RWP 02-041. Read pp. 22-40. http://ssrn.com/abstract=336540
Metrics of fairness, bias, and related concepts
(A big TBD here!)
- Barocas, S., Hardt, M., & Narayanan, A. (2017). Fairness and Machine Learning: Limitations and Opportunities. https://fairmlbook.org
- Narayanan, A. (2021). 21 Fairness Definitions and Their Politics. In Tutorial presented at the ACM Conf. on Fairness, Accountability, and Transparency. https://www.youtube.com/embed/jIXIuYdnyyk
- Jacobs, A. Z., & Wallach, H. (2021). Measurement and fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 375-385). https://dl.acm.org/doi/abs/10.1145/3442188.3445901 (and this tutorial video: https://www.youtube.com/watch?v=va1lIfTCQ-E)
- Keyes, O., Hutson, J., & Durbin, M. (2019). A mulching proposal: Analysing and improving an algorithmic system for turning the elderly into high-nutrient slurry. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-11). https://ironholds.org/resources/papers/mulching.pdf
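As a concrete taste of what these readings formalize, here is a minimal sketch (toy data, deliberately simplified definitions) of two common group-fairness metrics: demographic parity, which compares positive-prediction rates across groups, and the true-positive-rate half of equalized odds. All arrays are invented for illustration.

```python
# Compare two toy groups on positive-prediction rate and true positive rate.
import numpy as np

group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # ground-truth labels
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # model predictions

for g in ["a", "b"]:
    mask = group == g
    positive_rate = y_pred[mask].mean()        # demographic parity compares these
    tpr = y_pred[mask & (y_true == 1)].mean()  # equalized odds compares these (and FPRs)
    print(f"group {g}: positive rate = {positive_rate:.2f}, TPR = {tpr:.2f}")
```

The readings above explain why no single metric suffices, and why the choice among metrics is a normative question, not just a technical one.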
Cases of algorithmic audits
Online advertising and pricing
- Sweeney, L. (2013). Discrimination in Online Ad Delivery. CACM 56(5): 44-54. https://cacm.acm.org/magazines/2013/5/163753-discrimination-in-online-ad-delivery/abstract
- Hannak, A., Soeller, G., Lazer, D., Mislove, A., Wilson, C. (2014). Measuring Price Discrimination and Steering on E-commerce Web Sites. (IMC ’14). http://personalization.ccs.neu.edu/papers/price_discrimination.pdf
- Amit Datta, Michael Carl Tschantz, and Anupam Datta. (2015). Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. PoPETs 2015: 1. https://www.degruyter.com/view/j/popets.2015.1.issue-1/popets-2015-0007/popets-2015-0007.xml
- Giridhari Venkatadri, Piotr Sapiezynski, Elissa Redmiles, Alan Mislove, Oana Goga, Michelle Mazurek, and Krishna P. Gummadi. 2019. Auditing Offline Data Brokers via Facebook’s Advertising Platform. The Web Conference 2019. https://doi.org/10.1145/3308558.3313666
- Datta, A., Datta, A., Makagon, J., Mulligan, D.K. & Tschantz, M.C. (2018). “Discrimination in Online Advertising: A Multidisciplinary Inquiry.” Proceedings of the 1st Conference on Fairness, Accountability and Transparency, in PMLR 81:20-34. http://proceedings.mlr.press/v81/datta18a/datta18a.pdf
- Ali, M., Sapiezynski, P., Bogen, M., Korolova, A., Mislove, A., & Rieke, A. (2019). “Discrimination through optimization: How Facebook’s Ad delivery can lead to biased outcomes.” Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1-30. https://dl.acm.org/doi/pdf/10.1145/3359301
- Le Chen, Alan Mislove, and Christo Wilson. 2015. Peeking Beneath the Hood of Uber. In Proceedings of the 2015 Internet Measurement Conference (IMC ‘15). Association for Computing Machinery, New York, NY, USA, 495–508. https://doi.org/10.1145/2815675.2815681
Facial, biometric, speech recognition
- Buolamwini, J., & Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77-91). PMLR. http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
- Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J.R., Jurafsky, D. and Goel, S., 2020. “Racial disparities in automated speech recognition.” Proceedings of the National Academy of Sciences, 117(14), pp.7684-7689. https://www.pnas.org/content/117/14/7684
- Tatman, R. (2017, April). Gender and dialect bias in YouTube’s automatic captions. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing (pp. 53-59). https://www.aclweb.org/anthology/W17-1606.pdf
- Tatman, R., & Kasten, C. (2017, August). Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. In Interspeech (pp. 934-938). https://www.isca-speech.org/archive/Interspeech_2017/pdfs/1746.PDF
- Raji, I. D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., & Denton, E. (2020, February). Saving face: Investigating the ethical concerns of facial recognition auditing. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 145-151). https://dl.acm.org/doi/pdf/10.1145/3375627.3375820
Recommender systems and search engine rankings
- Marc Faddoul, Guillaume Chaslot, and Hany Farid. 2020. A longitudinal analysis of YouTube’s promotion of conspiracy videos. arxiv:2003.03318 [cs.CY] https://arxiv.org/abs/2003.03318
- Ulloa, R., Makhortykh, M., & Urman, A. (2021). “Algorithm Auditing at a Large-Scale: Insights from Search Engine Audits.” arXiv preprint arXiv:2106.05831. https://arxiv.org/abs/2106.05831
- Noble, S. U. (2018). Algorithms of oppression. New York University Press.
Social media and user-generated content (mostly NLP / sentiment analysis)
- M. Eslami, K. Vaccaro, K. Karahalios, and K. Hamilton. “Be careful; things can be worse than they appear”: Understanding Biased Algorithms and Users’ Behavior around Them in Rating Platforms (ICWSM 2017). http://social.cs.uiuc.edu/papers/ICWSM17-PrePrint.pdf
- King, G., Pan, J., & Roberts, M. E. (2014). Reverse-engineering censorship in China: Randomized experimentation and participant observation. Science, 345(6199). https://science.sciencemag.org/content/345/6199/1251722.abstract
- Blodgett, S. L., & O’Connor, B. (2017). Racial disparity in natural language processing: A case study of social media African-American English. arXiv preprint arXiv:1707.00061. https://arxiv.org/pdf/1707.00061.pdf
- Kiritchenko, S., & Mohammad, S. M. (2018). Examining gender and race bias in two hundred sentiment analysis systems. arXiv preprint arXiv:1805.04508. https://arxiv.org/abs/1805.04508
- Rios, A. (2020, April). FuzzE: Fuzzy fairness evaluation of offensive language classifiers on African-American English. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 01, pp. 881-889). https://doi.org/10.1609/aaai.v34i01.5434
- Davidson, T., Bhattacharya, D., & Weber, I. (2019). Racial bias in hate speech and abusive language detection datasets. arXiv preprint arXiv:1905.12516. https://arxiv.org/pdf/1905.12516
Hiring, admissions, and other social sorting
- Wilson, Christo, Avijit Ghosh, Shan Jiang, Alan Mislove, Lewis Baker, Janelle Szary, Kelly Trindel, and Frida Polli. (2021). “Building and auditing fair algorithms: A case study in candidate screening.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 666-677. https://evijit.github.io/docs/pymetrics_audit_FAccT.pdf
- See also their very open documentation about the legal agreement they signed, their budget, and other info: https://cbw.sh/audits.html
- Qiu, H. and Du, W. “A True Lie about Reed College: U.S. News Ranking” https://raw.githubusercontent.com/huayingq1996/Reed-College-Ranking/master/paper.pdf
The ProPublica vs. Northpointe debate on COMPAS criminal risk scores
- Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica (2016). https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc. https://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf
- Angwin, J., & Larson, J. (2016). Technical Response to Northpointe. ProPublica. https://www.propublica.org/article/technical-response-to-northpointe (and their annotations of the Dieterich et al response: https://www.documentcloud.org/documents/3248777-Lowenkamp-Fedprobation-sept2016-0.html)
- Flores, A. W., Bechtel, K., & Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: rejoinder to machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. Federal Probation, 80(2), 38-46. https://www.crj.org/assets/2017/07/9_Machine_bias_rejoinder.pdf
- Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science advances, 4(1), eaao5580. https://advances.sciencemag.org/content/advances/4/1/eaao5580.full.pdf
- Washington, A. L. (2018). How to argue with an algorithm: Lessons from the COMPAS-ProPublica debate. Colo. Tech. LJ, 17, 131. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3357874
- Coalition for Critical Technology. “Abolish the #TechToPrisonPipeline.” Medium. https://medium.com/@CoalitionForCriticalTechnology/abolish-the-techtoprisonpipeline-9b5b14366b16 (the footnotes go into much more detail and are basically an academic review article)
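The statistical crux of this debate: when base rates differ across groups, a risk score generally cannot equalize false positive rates (ProPublica’s headline disparity) and positive predictive value (Northpointe’s “predictive parity”) at the same time. Here is a minimal sketch with invented confusion-matrix counts and a hypothetical helper function:

```python
# Two toy groups with different base rates of reoffending (numbers invented).
# Calibrating to equal PPV across groups leaves unequal FPRs, and vice versa;
# this arithmetic is the core of the ProPublica/Northpointe disagreement.

def rates(tp, fp, fn, tn):
    ppv = tp / (tp + fp)  # of those flagged high-risk, the share who reoffended
    fpr = fp / (fp + tn)  # of those who did not reoffend, the share flagged high-risk
    return ppv, fpr

ppv1, fpr1 = rates(tp=400, fp=100, fn=200, tn=300)  # group 1: 60% base rate
ppv2, fpr2 = rates(tp=160, fp=40, fn=140, tn=660)   # group 2: 30% base rate

print(f"group 1: PPV = {ppv1:.2f}, FPR = {fpr1:.2f}")  # PPV = 0.80, FPR = 0.25
print(f"group 2: PPV = {ppv2:.2f}, FPR = {fpr2:.2f}")  # PPV = 0.80, FPR = 0.06
```

Both sides’ numbers can be simultaneously correct; they are measuring different things.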
Other
- Stuart, G., “Databases, Felons, and Voting: Errors and Bias in the Florida Felons Exclusion List in the 2000 Presidential Elections” (September 2002). KSG Working Paper Series RWP 02-041. Read pp. 22-40. http://ssrn.com/abstract=336540