Auditing Algorithms Lab | DSC 290 | UC San Diego | Fall 2021
Details
- Class: DSC 290
- Instructor: R. Stuart Geiger
- Time: 5-6:50pm on Tuesdays
- Place: SOLIS 109
- Units: 2
Description
This seminar is for students interested in empirically investigating the outputs of real-world algorithmic systems of all kinds, particularly those where the code and/or training data are not publicly available. The first few weeks of the class will include more readings and lectures, in which we cover the history of auditing and the legal/ethical issues it raises, ranging from classic audits of non-algorithmic decision systems (e.g. equal opportunity hiring investigations) to contemporary issues around the Computer Fraud and Abuse Act and the IRB. We will learn various approaches to investigating such systems, including auditing via training datasets, code, user reports, API scraping, sockpuppet accounts, and headless browsers. We will read and discuss various algorithmic audits by researchers and regulators, in a mix of selected readings and readings students choose. The second half of the class will be more discussion- and activity-based, as we perform audits on several real-world models whose developers have encouraged public auditing (e.g. Wikipedia’s content moderation classifiers). Students will work towards a final project, in which they will conduct their own audits and develop strategies for how systems can be designed for auditability.
Prerequisites
There are no official prerequisites to register, and students from all departments are welcome to enroll. The class will generally assume knowledge of:
- Introductory Statistics: The math and statistics of basic auditing are not as complex as those used in developing machine learning algorithms, but they do involve statistics at the undergraduate level: correlations, hypothesis testing, linear regressions, and analysis of variance (ANOVA) (e.g. what is taught in COGS 14B). You don’t need to know how to mathematically derive these tests; you just need to know enough to be a well-informed user of them (the first sketch after this list shows the kind of test a basic audit turns on). However, some metrics of fairness, bias, causal inference, and related concepts do involve more advanced math and statistics, and students who want to work with such metrics will be able to do so.
- Introductory programming for data collection and analysis: A working knowledge of a scripting language like python or R is highly recommended (e.g. what is taught in CSE 8A or this coursera class). Some audit methods involve automated data collection; we will learn how to query APIs and run headless browsers using standard libraries (see the second sketch after this list). Jupyter Notebooks will be used for literate programming. This class will be registered for UCSD datahub, which provides a web/cloud-based Jupyter environment in python and R.
- All students must take and pass the UCSD/CITI IRB Human Subject Protection Training online course (Social and Behavioral Basic Course), by the end of week 2 of the class. This takes about 2-3 hours total and can be taken at any time, even during the summer. Register at citiprogram.org (no SSO available) and affiliate with UCSD. See more info at this video of me registering for the proper course. If you have passed the course in the past 3 years, your certificate is still valid; if not, you must take the shorter refresher course.
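For a sense of what the statistics prerequisite covers, here is a minimal sketch (in python, with invented numbers) of the kind of undergraduate-level test a basic audit often turns on: comparing outcome rates between two groups with a chi-squared test.

```python
# Toy audit question: did group A receive callbacks at a different rate
# than group B? All numbers below are invented for illustration.
from scipy.stats import chi2_contingency

#        callbacks, no callbacks
table = [
    [157, 843],  # group A (15.7% callback rate)
    [103, 897],  # group B (10.3% callback rate)
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value means a gap this large would be unlikely if both groups
# were treated the same; it does not, by itself, tell you why the gap exists.
```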
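And here is a minimal sketch of automated data collection via an API, using python’s requests library against Wikipedia’s public ORES scoring service (one of the content moderation classifiers mentioned above). The endpoint, parameters, and response layout follow ORES’s public documentation, but treat them as assumptions and check the current docs; the revision IDs are arbitrary examples.

```python
# Ask ORES for "damaging edit" scores on two English Wikipedia revisions.
import requests

revids = [1003334678, 992092669]  # arbitrary example revision IDs

resp = requests.get(
    "https://ores.wikimedia.org/v3/scores/enwiki",
    params={"models": "damaging", "revids": "|".join(str(r) for r in revids)},
    headers={"User-Agent": "DSC290-auditing-lab-example"},  # courtesy header for Wikimedia APIs
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

# Expected layout (per the ORES docs; may change):
# enwiki -> scores -> <revid> -> damaging -> score -> {prediction, probability}
for revid, models in data["enwiki"]["scores"].items():
    score = models["damaging"]["score"]
    print(revid, score["prediction"], round(score["probability"]["true"], 3))
```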
Please get in touch if you have any doubts or concerns about the prerequisites.
Potential readings
Note that the final reading list and schedule have not yet been finalized. Please reach out to Stuart Geiger if you have any suggestions or ideas. And thanks to auditingalgorithms.science for many of these!
What is an audit?
- Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Proc. of FAT*. https://dl.acm.org/doi/pdf/10.1145/3351095.3372873
- Sandvig, Christian, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. forthcoming. “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms.” Computational Culture. http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf
- Saltman, J. (1975). Implementing Local Housing Laws Through Social Action. Journal of Applied Behavioral Science, 11(1): 39-61. https://doi.org/10.1177%2F002188637501100105
Classic auditing in non-algorithmic systems
- National Research Council Panel on Measuring Racial Discrimination. (2004). Measuring Racial Discrimination. Washington, DC: National Academies Press. http://www.nap.edu/catalog/10887/measuring-racial-discrimination
- Schulman KA, Berlin JA, Harless W, Kerner JF, Sistrunk S, et al. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N. Engl. J. Med. 1999;340(8):618–626. https://doi.org/10.1056/NEJM199902253400806
- Saltman, J. (1975). Implementing Local Housing Laws Through Social Action. Journal of Applied Behavioral Science, 11(1): 39-61. https://doi.org/10.1177%2F002188637501100105
- Ayres, I. & Siegelman, P. (1995). Race and Gender Discrimination in Bargaining for a New Car. American Economic Review 85(3): 304-321. https://inequality.stanford.edu/sites/default/files/media/_media/pdf/Reference%20Media/Ayres_Siegelman_1995_Discrimination.pdf
Algorithmic auditing frameworks and introductions
- Sandvig, C., Hamilton, K., Karahalios, K., & Langbort, C. (2014). “An Algorithm Audit.” In: Seeta Peña Gangadharan (ed.), Data and Discrimination: Collected Essays, pp. 6-10. Washington, DC: New America Foundation. http://www-personal.umich.edu/~csandvig/research/An%20Algorithm%20Audit.pdf
- Sandvig, Christian, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. forthcoming. “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms.” Computational Culture. http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf
- Diakopoulos, Nicholas. 2015. “Algorithmic Accountability.” Digital Journalism 3 (3): 398-415. http://www.nickdiakopoulos.com/wp-content/uploads/2011/07/algorithmic_accountability_final.pdf
- Data & Society Research Institute. 2014 “Workshop Primer: Algorithmic Accountability” http://www.datasociety.net/pubs/2014-0317/AlgorithmicAccountabilityPrimer.pdf
- Bandy, J. (2021). Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 1-34. https://dl.acm.org/doi/pdf/10.1145/3449148
The legal and ethical issues of conducting audits (and violating Terms of Service to do so)
- Raji, I. D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., & Denton, E. (2020, February). “Saving face: Investigating the ethical concerns of facial recognition auditing.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 145-151). https://dl.acm.org/doi/pdf/10.1145/3375627.3375820
- Vaccaro, K., Karahalios, K., Sandvig, C., Hamilton, K., & Langbort, C. (2015). “Agree or cancel? research and terms of service compliance.” In ACM CSCW Ethics Workshop: Ethics for Studying Sociotechnical Systems in a Big Data World. http://www-personal.umich.edu/~csandvig/research/Vaccaro-CSCW-Ethics-2015.pdf
- Bruckman, A. (2016, February 26). “Do Researchers Need to Abide by Terms of Service (TOS)? An Answer.” The Next Bison: Social Computing and Culture. https://nextbison.wordpress.com/2016/02/26/tos/
- Sandvig v. Barr ruling by the U.S. District Court for D.C.: https://www.aclu.org/sites/default/files/field_document/sandvig_opinion.pdf. Read pages 1-2 and 15-28 (part 2 beginning with “II. Interpreting the CFAA”)
- But even standing legal precedent does not automatically constrain the decisions of law enforcement; see https://enwp.org/United_States_v._Swartz (content warning: suicide)
- Wendler, David; Miller, Franklin G. “Deception in Research.” The Oxford Textbook of Clinical Research Ethics. Oxford University Press, New York. (2008) pgs. 315-324. https://auditlab.stuartgeiger.com/static/deception-ethics-wendler-miller.pdf
The UMN Linux kernel security audit
- Clark, M. 2021. The Verge. “University of Minnesota banned from contributing to Linux kernel.” https://www.theverge.com/2021/4/22/22398156/university-minnesota-linux-kernal-ban-research
- Linux Foundation Technical Advisory Board. 2021. “An emergency re-review of kernel commits authored by members of the University of Minnesota, due to the Hypocrite Commits research paper.” https://lore.kernel.org/lkml/202105051005.49BFABCE@keescook/
- Lu, K., Wu, Q., and Pakki, A. “An open letter to the Linux community - April 24, 2021” https://cse.umn.edu/cs/open-letter-linux-community-april-24-2021
- Hacker News Thread. 2021. “They introduce kernel bugs on purpose.” https://news.ycombinator.com/item?id=26887670
On legal challenges to algorithms used by government agencies
- Citron, Danielle Keats, and Frank A. Pasquale. 2014. “The Scored Society: Due Process for Automated Predictions.” Washington Law Review 89. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2376209
- Citron, Danielle Keats, Technological Due Process. Washington University Law Review, Vol. 85, pp. 1249-1313, 2007. Read § I.A., II.B.2, III.C.2. http://ssrn.com/abstract=1012360
- Stuart, G., “Databases, Felons, and Voting: Errors and Bias in the Florida Felons Exclusion List in the 2000 Presidential Elections” (September 2002). KSG Working Paper Series RWP 02-041. Read pp. 22-40. http://ssrn.com/abstract=336540
Metrics of fairness, bias, and related concepts
(A big TBD here!)
- Barocas, S., Hardt, M., & Narayanan, A. (2017). Fairness and Machine Learning: Limitations and Opportunities. https://fairmlbook.org
- Narayanan, A. (2021). 21 Fairness Definitions and Their Politics. In Tutorial presented at the ACM Conf. on Fairness, Accountability, and Transparency. https://www.youtube.com/embed/jIXIuYdnyyk
- Jacobs, A. Z., & Wallach, H. (2021). Measurement and fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 375-385). https://dl.acm.org/doi/abs/10.1145/3442188.3445901 (and this tutorial video: https://www.youtube.com/watch?v=va1lIfTCQ-E)
- Keyes, O., Hutson, J., & Durbin, M. (2019). A mulching proposal: Analysing and improving an algorithmic system for turning the elderly into high-nutrient slurry. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-11). https://ironholds.org/resources/papers/mulching.pdf
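As a concrete taste of what these readings formalize, here is a minimal sketch (toy data, deliberately simplified definitions) of two common group-fairness metrics: demographic parity, which compares positive-prediction rates across groups, and the true-positive-rate half of equalized odds. All arrays are invented for illustration.

```python
# Compare two toy groups on positive-prediction rate and true positive rate.
import numpy as np

group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # ground-truth labels
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # model predictions

for g in ["a", "b"]:
    mask = group == g
    positive_rate = y_pred[mask].mean()        # demographic parity compares these
    tpr = y_pred[mask & (y_true == 1)].mean()  # equalized odds compares these (and FPRs)
    print(f"group {g}: positive rate = {positive_rate:.2f}, TPR = {tpr:.2f}")
```

The readings above explain why no single metric suffices, and why the choice among metrics is a normative question, not just a technical one.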
Cases of algorithmic audits
Online advertising and pricing
- Sweeney, L. (2013). Discrimination in Online Ad Delivery. CACM 56(5): 44-54. https://cacm.acm.org/magazines/2013/5/163753-discrimination-in-online-ad-delivery/abstract
- Hannak, A., Soeller, G., Lazer, D., Mislove, A., Wilson, C. (2014). Measuring Price Discrimination and Steering on E-commerce Web Sites. (IMC ’14). http://personalization.ccs.neu.edu/papers/price_discrimination.pdf
- Amit Datta, Michael Carl Tschantz, and Anupam Datta. (2015). Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. PoPETs 2015: 1. https://www.degruyter.com/view/j/popets.2015.1.issue-1/popets-2015-0007/popets-2015-0007.xml
- Giridhari Venkatadri, Piotr Sapiezynski, Elissa Redmiles, Alan Mislove, Oana Goga, Michelle Mazurek, and Krishna P. Gummadi. 2019. Auditing Offline Data Brokers via Facebook’s Advertising Platform. The Web Conference 2019. https://doi.org/10.1145/3308558.3313666
- Datta, A., Datta, A., Makagon, J., Mulligan, D.K. & Tschantz, M.C. (2018). “Discrimination in Online Advertising: A Multidisciplinary Inquiry.” Proceedings of the 1st Conference on Fairness, Accountability and Transparency, in PMLR 81:20-34. http://proceedings.mlr.press/v81/datta18a/datta18a.pdf
- Ali, M., Sapiezynski, P., Bogen, M., Korolova, A., Mislove, A., & Rieke, A. (2019). “Discrimination through optimization: How Facebook’s Ad delivery can lead to biased outcomes.” Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1-30. https://dl.acm.org/doi/pdf/10.1145/3359301
- Le Chen, Alan Mislove, and Christo Wilson. 2015. Peeking Beneath the Hood of Uber. In Proceedings of the 2015 Internet Measurement Conference (IMC ‘15). Association for Computing Machinery, New York, NY, USA, 495–508. https://doi.org/10.1145/2815675.2815681
Facial, biometric, speech recognition
- Buolamwini, J., & Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77-91). PMLR. http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
- Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J.R., Jurafsky, D. and Goel, S., 2020. “Racial disparities in automated speech recognition.” Proceedings of the National Academy of Sciences, 117(14), pp.7684-7689. https://www.pnas.org/content/117/14/7684
- Tatman, R. (2017, April). Gender and dialect bias in YouTube’s automatic captions. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing (pp. 53-59). https://www.aclweb.org/anthology/W17-1606.pdf
- Tatman, R., & Kasten, C. (2017, August). Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. In Interspeech (pp. 934-938). https://www.isca-speech.org/archive/Interspeech_2017/pdfs/1746.PDF
- Raji, I. D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., & Denton, E. (2020, February). Saving face: Investigating the ethical concerns of facial recognition auditing. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 145-151). https://dl.acm.org/doi/pdf/10.1145/3375627.3375820
Recommender systems and search engine rankings
- Marc Faddoul, Guillaume Chaslot, and Hany Farid. 2020. A longitudinal analysis of YouTube’s promotion of conspiracy videos. arxiv:2003.03318 [cs.CY] https://arxiv.org/abs/2003.03318
- Ulloa, R., Makhortykh, M., & Urman, A. (2021). “Algorithm Auditing at a Large-Scale: Insights from Search Engine Audits.” arXiv preprint arXiv:2106.05831. https://arxiv.org/abs/2106.05831
- Noble, S. U. (2018). Algorithms of oppression. New York University Press.
Social media and user-generated content (mostly NLP / sentiment analysis)
- M. Eslami, K. Vaccaro, K. Karahalios, and K. Hamilton. “Be careful; things can be worse than they appear”: Understanding Biased Algorithms and Users’ Behavior around Them in Rating Platforms (ICWSM 2017). http://social.cs.uiuc.edu/papers/ICWSM17-PrePrint.pdf
- King, G., Pan, J., & Roberts, M. E. (2014). Reverse-engineering censorship in China: Randomized experimentation and participant observation. Science, 345(6199). https://science.sciencemag.org/content/345/6199/1251722.abstract
- Blodgett, S. L., & O’Connor, B. (2017). Racial disparity in natural language processing: A case study of social media African-American English. arXiv preprint arXiv:1707.00061. https://arxiv.org/pdf/1707.00061.pdf
- Kiritchenko, S., & Mohammad, S. M. (2018). Examining gender and race bias in two hundred sentiment analysis systems. arXiv preprint arXiv:1805.04508. https://arxiv.org/abs/1805.04508
- Rios, A. (2020, April). FuzzE: Fuzzy fairness evaluation of offensive language classifiers on African-American English. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 01, pp. 881-889). https://doi.org/10.1609/aaai.v34i01.5434
- Davidson, T., Bhattacharya, D., & Weber, I. (2019). Racial bias in hate speech and abusive language detection datasets. arXiv preprint arXiv:1905.12516. https://arxiv.org/pdf/1905.12516
Hiring, admissions, and other social sorting
- Wilson, Christo, Avijit Ghosh, Shan Jiang, Alan Mislove, Lewis Baker, Janelle Szary, Kelly Trindel, and Frida Polli. (2021). “Building and auditing fair algorithms: A case study in candidate screening.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 666-677. https://evijit.github.io/docs/pymetrics_audit_FAccT.pdf
- See also their very open documentation about the legal agreement they signed, their budget, and other info: https://cbw.sh/audits.html
- Qiu, H. and Du, W. “A True Lie about Reed College: U.S. News Ranking” https://raw.githubusercontent.com/huayingq1996/Reed-College-Ranking/master/paper.pdf
The ProPublica vs. Northpointe debate on COMPAS criminal risk scores
- Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica (2016). https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc. https://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf
- Angwin, J., & Larson, J. (2016). Technical Response to Northpointe. ProPublica. https://www.propublica.org/article/technical-response-to-northpointe (and their annotations of the Dieterich et al response: https://www.documentcloud.org/documents/3248777-Lowenkamp-Fedprobation-sept2016-0.html)
- Flores, A. W., Bechtel, K., & Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: rejoinder to machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. Federal Probation, 80(2), 38-46. https://www.crj.org/assets/2017/07/9_Machine_bias_rejoinder.pdf
- Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science advances, 4(1), eaao5580. https://advances.sciencemag.org/content/advances/4/1/eaao5580.full.pdf
- Washington, A. L. (2018). How to argue with an algorithm: Lessons from the COMPAS-ProPublica debate. Colo. Tech. LJ, 17, 131. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3357874
- Coalition for Critical Technology. “Abolish the #TechToPrisonPipeline.” Medium. https://medium.com/@CoalitionForCriticalTechnology/abolish-the-techtoprisonpipeline-9b5b14366b16 (the footnotes go into much more detail and are basically an academic review article)
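The statistical crux of this debate: when base rates differ across groups, a risk score generally cannot equalize false positive rates (ProPublica’s headline disparity) and positive predictive value (Northpointe’s “predictive parity”) at the same time. Here is a minimal sketch with invented confusion-matrix counts and a hypothetical helper function:

```python
# Two toy groups with different base rates of reoffending (numbers invented).
# Calibrating to equal PPV across groups leaves unequal FPRs, and vice versa;
# this arithmetic is the core of the ProPublica/Northpointe disagreement.

def rates(tp, fp, fn, tn):
    ppv = tp / (tp + fp)  # of those flagged high-risk, the share who reoffended
    fpr = fp / (fp + tn)  # of those who did not reoffend, the share flagged high-risk
    return ppv, fpr

ppv1, fpr1 = rates(tp=400, fp=100, fn=200, tn=300)  # group 1: 60% base rate
ppv2, fpr2 = rates(tp=160, fp=40, fn=140, tn=660)   # group 2: 30% base rate

print(f"group 1: PPV = {ppv1:.2f}, FPR = {fpr1:.2f}")  # PPV = 0.80, FPR = 0.25
print(f"group 2: PPV = {ppv2:.2f}, FPR = {fpr2:.2f}")  # PPV = 0.80, FPR = 0.06
```

Both sides’ numbers can be simultaneously correct; they are measuring different things.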
Other
- Stuart, G., “Databases, Felons, and Voting: Errors and Bias in the Florida Felons Exclusion List in the 2000 Presidential Elections” (September 2002). KSG Working Paper Series RWP 02-041. Read pp. 22-40. http://ssrn.com/abstract=336540