Minor bugs in 13.md #243

Open. Wants to merge 1 commit into base: master

144 changes: 72 additions & 72 deletions slides/13/13.md
@@ -37,22 +37,22 @@ which seems morally good or neutral…

# Why ML Ethics? (2)

* There are high-stake applications of ML: decision making (in public and business administration)
* There are high-stake applications of ML: decision making (in public and business administration).

<br />

~~~
* People tend to consider algorithm outputs as objective
* People tend to consider algorithm outputs as objective.

<br />

~~~
* There are bad actors (military use, massive surveillance, disinformation, ...)
* There are bad actors (military use, massive surveillance, disinformation, ...).

<br />

~~~
* Inherent problems of ML: Good intentions might lead to unintended harm (biases, lack of explainability)
* Inherent problems of ML: Good intentions might lead to unintended harm (biases, lack of explainability).


---
@@ -63,36 +63,36 @@ class: section
---
# What is Ethics

Ethics or moral philosophy is a branch of philosophy that "involves
Ethics or moral philosophy is a branch of philosophy that involves
systematizing, defending, and recommending concepts of right and wrong
behavior".
behavior.

<br />

~~~
* Simply taken: Study of what is **right and wrong**
* Simply taken: Study of what is **right and wrong**.

~~~
* Several major theoretical frameworks:

* **Deontological** ethics – rule/principle-driven, good action follows rules
* **Deontological** ethics – rule/principle-driven, good action follows rules.

* **Consequentialist** ethics / utilitarianism – consequences matter, good action has good consequences
* **Consequentialist** ethics / utilitarianism – consequences matter, good action has good consequences.

* **Virtue** ethics (encourage curiosity, creativity, solidarity, etc.) – good action is what a virtuous person would do
* **Virtue** ethics (encourage curiosity, creativity, solidarity, etc.) – good action is what a virtuous person would do.

* **Contract** ethics (everything is a social contract) – good action is what people implicitly agreed upon
* **Contract** ethics (everything is a social contract) – good action is what people implicitly agreed upon.

* **Care ethics** – relationships and responsibilities matter, good action nurtures care and empathy for others
* **Care ethics** – relationships and responsibilities matter, good action nurtures care and empathy for others.

---
# Deontological Ethics

* Focuses on the **inherent nature of actions** rather than their consequences
* Focuses on the **inherent nature of actions** rather than their consequences.

* Involves adhering to predefined **rules and principles** (e.g., the Universal Declaration of Human Rights, the Ten Commandments, Kant's Categorical Imperative)
* Involves adhering to predefined **rules and principles** (e.g., the Universal Declaration of Human Rights, the Ten Commandments, Kant's Categorical Imperative).

* In the ML context, it typically means principles like: beneficence, non-malevolence, privacy, non-discrimination, autonomy + informed consent
* In the ML context, it typically means principles like: beneficence, non-malevolence, privacy, non-discrimination, autonomy + informed consent.

<br />
~~~
@@ -125,9 +125,9 @@ behavior".
---
# Utilitarian Ethics

* An ethical theory that emphasizes the maximization of **overall happiness or well-being** or minimizing harm
* An ethical theory that emphasizes the maximization of **overall happiness or well-being** or minimizing harm.

* It focuses on **consequences of actions**
* It focuses on **consequences of actions**.

* Good decisions = decisions that lead to the greatest overall positive impact.

@@ -154,9 +154,9 @@ behavior".
~~~
<br />

* Implicitly behind most work on ML ethics that focuses on harmful consequences for various social groups
* Implicitly behind most work on ML ethics that focuses on harmful consequences for various social groups.

* Gets tricky once we consider very low-probability events with very high impact (special case longtermism)
* Gets tricky once we consider very low-probability events with very high impact (special case longtermism).

---
# Different Theories, Different Ideas
@@ -188,17 +188,17 @@ class: section
---
# Stages of ML development

Ethical problems might emerge in all stages of ML system development
Ethical problems might emerge in all stages of ML system development.

* **Problem definition** – some tasks are inherently problematic
* **Problem definition** – some tasks are inherently problematic.

* **Data collection** – biases in data, unethical collection
* **Data collection** – biases in data, unethical collection.

* **Model development** – design choices (i.e., most of this course)
* **Model development** – design choices (i.e., most of this course).

* **Model evaluation** – metrics do not cover important things
* **Model evaluation** – metrics do not cover important things.

* **Model deployment** – use outside of original scope, feedback loops
* **Model deployment** – use outside of original scope, feedback loops.

---
# Problem Definition
@@ -218,12 +218,12 @@ Many schools in the US use automatic essay scoring for Graduate Record Examinati
![w=30%,f=right](essays.png)
![w=30%,f=right](essay_map.png)

* **Lack of transparency**: students have the right to know why they were accepted
* **Lack of transparency**: students have the right to know why they were accepted.

* Allows **metric gaming** if you guess what the features might be <small>(so even the utility is low)</small>
* Allows **metric gaming** if you guess what the features might be <small>(so even the utility is low)</small>.

---
# Recidivism prediction: COMPAS
# Recidivism Prediction: COMPAS

![w=30%,f=right](compas.png)

@@ -242,31 +242,31 @@ Many schools in the US use automatic essay scoring for Graduate Record Examinati
~~~

**Deontology:** violates the right to a fair trial, equality before the law, lack of transparency
⇒ *Morally bad*
⇒ *Morally bad*.

~~~

**Utilitarianism:** the benefits <small>(the state saves money that can be used elsewhere)</small> are smaller than the harm <small>(lack of justice)</small>
⇒ *Morally bad* <br /> <small>(But what would be the suitable metric to compare money and fairness?)</small>
⇒ *Morally bad*. <br /> <small>(But what would be the suitable metric to compare money and fairness?)</small>

---
# Data Collection & Biases in Data

![w=30%,f=right](gendershades.png)

* Representation bias: The data might not be representative of the population
(missing minorities, poor people, ...)
(missing minorities, poor people, ...).

~~~
* Data (and especially text) from the Internet does not represent the world as
it is (only those who have access and are loud) and the world as it should be
it is (only those who have access and are loud) and the world as it should be.

![w=30%,f=right](imagenet.png)

~~~
* Historical bias: Inequalities from the past when the data was created are preserved in the datasets
* Historical bias: Inequalities from the past when the data was created are preserved in the datasets.

* Copyright issues, especially with generative models
* Copyright issues, especially with generative models.

![w=100%,h=center](female_doctor.png)

@@ -275,54 +275,54 @@ Many schools in the US use automatic essay scoring for Graduate Record Examinati

### Crowdsourcing

* People are hired to do the job the ML model will do
* People are hired to do the job the ML model will do.

* Not well paid (often in third world countries), monotonous work, occasionally causing psychological harm
* Not well paid (often in third world countries), monotonous work, occasionally causing psychological harm.

* Gig economy: what was originally meant as earning extra cash becomes a full-time job without labor protection
* Gig economy: what was originally meant as earning extra cash becomes a full-time job without labor protection.

<small>Crawford, Kate. The atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press, 2021. Chapter 2.</small>

~~~
### Log mining and user data collection

* Training data is collected from users that have no other choice than to provide data by using services <small>(not using them and keeping social/work/political life at the same is impossible)</small>
* Training data is collected from users that have no other choice than to provide data by using services <small>(not using them and keeping social/work/political life at the same time is impossible)</small>.

* Nontransparent transaction: user gets service (for free or paid) and provides data
* Nontransparent transaction: user gets service (for free or paid) and provides data.

<small>Couldry, Nick, and Ulises A. Mejias. The costs of connection: How data is colonizing human life and appropriating it for capitalism. Stanford University Press, 2020.</small>

---
# Model Development

* Discretization of outputs might lead to bias amplification <br />
* Discretization of outputs might lead to bias amplification. <br />
<small>Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints (Zhao et al., EMNLP 2017)</small>

~~~

* Larger models are more prone to overfitting: might lead to memorization of very specific patterns (privacy issues, remembering particular names) <br />
* Larger models are more prone to overfitting: might lead to memorization of very specific patterns (privacy issues, remembering particular names). <br />
<small>Are Large Pre-Trained Language Models Leaking Your Personal Information? (Huang et al., Findings EMNLP 2022)</small>

~~~

* Distilled models are prone to stereotyping <br />
* Distilled models are prone to stereotyping. <br />
<small>Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT (Ahn et al., GeBNLP 2022)</small>

~~~

* Models might learn protected attributes by proxies (e.g., ethnicity from name, school) <br /> <small>Adversarial Removal of Demographic Attributes Revisited (Barrett et al., EMNLP-IJCNLP 2019)</small>
* Models might learn protected attributes by proxies (e.g., ethnicity from name, school). <br /> <small>Adversarial Removal of Demographic Attributes Revisited (Barrett et al., EMNLP-IJCNLP 2019)</small>

---
# Model Evaluation

* Metrics might not capture everything we need
* Metrics might not capture everything we need.

* E.g., translation fluency does not capture gender bias
* E.g., translation fluency does not capture gender bias.

* Micro-averaging might hide bad performance for specific user groups (typically minorities)
* Micro-averaging might hide bad performance for specific user groups (typically minorities).
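
The micro-averaging point above can be illustrated with a minimal sketch. The function name `accuracy_by_group` and the toy data are hypothetical and not part of the slides; the example only assumes that each prediction comes with a group label.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (micro accuracy, per-group accuracies, macro average over groups)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    per_group = {g: correct[g] / total[g] for g in total}
    # Micro average: every example counts equally, so large groups dominate.
    micro = sum(correct.values()) / sum(total.values())
    # Macro average: every group counts equally, regardless of its size.
    macro = sum(per_group.values()) / len(per_group)
    return micro, per_group, macro


# Hypothetical toy data: 90 majority examples (all predicted correctly)
# and 10 minority examples (only 4 predicted correctly).
y_true = [1] * 100
y_pred = [1] * 90 + [1] * 4 + [0] * 6
groups = ["majority"] * 90 + ["minority"] * 10

micro, per_group, macro = accuracy_by_group(y_true, y_pred, groups)
print(micro, per_group, macro)
```

In this toy setup the micro-averaged accuracy stays high even though the minority group is served much worse, which is why reporting per-group numbers alongside the aggregate can be useful.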

~~~
* Human Resources: employment recommendation based on CV
* Human Resources: employment recommendation based on CV.

* Precision: The business implies optimizing for precision – you only
recommend few candidates and they need to be the good ones.
@@ -331,7 +331,7 @@ Many schools in the US use automatic essay scoring for Graduate Record Examinati
against gender, age, ethnicity, etc.

---
# Proxy metrics optimizing something else
# Proxy Metrics Optimizing Something Else

![w=30%,f=right](hal9000.jpg)

@@ -347,21 +347,21 @@ Many schools in the US use automatic essay scoring for Graduate Record Examinati

~~~
* Platforms like YouTube use watch time as a proxy for content quality (and
btw. more watch time brings them more money)
btw. more watch time brings them more money).

* Non-profit [algotransparency.org](https://www.algotransparency.org) monitors
stats on YouTube recommendations: 2016-2018 most recommended videos
supporting alternative narrative on political events (US and French
elections, mass shootings)
elections, mass shootings).

* Presumably, this was the type of content maximizing the watch time
* Presumably, this was the type of content maximizing the watch time.

<br />

<small>https://guillaumechaslot.medium.com/how-algorithms-can-learn-to-discredit-the-media-d1360157c4fa</small>

---
# Use of model in practice & Feedback loops
# Use of Model in Practice & Feedback Loops

### Mismatch of train/test data and use in practice

@@ -399,68 +399,68 @@ class: section
# Review of the Semester

---
# Theoretical concepts
# Theoretical Concepts

### Basic statistics

Bernoulli distribution, Categorical distribution, Normal
Bernoulli distribution, categorical distribution, Normal
Distribution, descriptive statistics (mean, variance, correlation), Maximum
Likelihood Estimation, Bayes Theorem
Likelihood Estimation, Bayes' Theorem.

### Information theory basics

Entropy, Conditional Entropy, Cross-Entropy, KL-Divergence, Mutual Information
Entropy, Conditional Entropy, Cross-Entropy, KL-Divergence, Mutual Information.

* Training = minimize how suprised we are from the data
* Maximum entropy principle = a view on generalization: do not bring in additional assumptions
* Training = minimize how surprised we are by the data.
* Maximum entropy principle = a view on generalization: do not bring in additional assumptions.
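
As a rough illustration of the "training = minimizing surprise" view, the following sketch checks numerically that cross-entropy decomposes into entropy plus KL divergence, so minimizing cross-entropy with respect to the model distribution is the same as minimizing the KL divergence. The function names and the two example distributions are made up for this sketch, and it assumes `q[i] > 0` wherever `p[i] > 0`.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log p_i (natural log, terms with p_i = 0 skipped)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i); same assumption on q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical data distribution p and model distribution q over three classes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# H(p, q) = H(p) + D_KL(p || q); H(p) does not depend on the model q.
lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)
```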

### Optimization

Set derivative to zero, Lagrange multipliers for additional constraints, numerical optimization with SGD
and second-order methods
and second-order methods.

---
# Machine learning methodology
# Machine Learning Methodology

### Working with data

* **Data annotation**: inter-annotator agreement (correlation, Cohen's alpha)
* **Data annotation**: inter-annotator agreement (correlation, Cohen's kappa).

* **Features**: numerical/categorical features, polynomial features, TF-IDF, use pre-trained embeddings, representation learning
* **Features**: numerical/categorical features, polynomial features, TF-IDF, use pre-trained embeddings, representation learning.

* **Normalization**: min-max scaling, standardization, whitening
* **Normalization**: min-max scaling, standardization, whitening.
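
For the inter-annotator agreement item above, a minimal sketch of Cohen's kappa for two annotators may help. The function name `cohens_kappa` and the labels are hypothetical; the formula used is the standard $\kappa = (p_o - p_e) / (1 - p_e)$, with $p_o$ the observed agreement and $p_e$ the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[l] * counts_b[l] for l in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators always use the same single label.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of 10 items by two annotators.
annotator_a = ["pos"] * 6 + ["neg"] * 4
annotator_b = ["pos"] * 5 + ["neg"] * 4 + ["pos"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(kappa)
```

Here the raw agreement is 8 out of 10, but kappa is noticeably lower because part of that agreement would be expected by chance given how often each annotator uses each label.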

### Training and Evaluation

* **Data splits**: optimization for unseen data, train, validation, test split
* **Data splits**: optimization for unseen data, train, validation, test split.

* **Overfitting and regularization**: early stopping + reading learning curves, $L^2$-regularization, dropout in MLP, prior in Bayesian models
* **Overfitting and regularization**: early stopping + reading learning curves, $L^2$-regularization, dropout in MLP, prior in Bayesian models.

* **Evaluation metrics**: accuracy, mean squared error, precision/recall/F-score, correlation, hypotheses testing
* **Evaluation metrics**: accuracy, mean squared error, precision/recall/F-score, correlation, hypothesis testing.

---
# Machine learning models
# Machine Learning Models

### Geometric intuition

Linear regression, Perceptron, Nearest Neighbors Classification and Regression, SVD, $k$-Means clustering
Linear regression, Perceptron, Nearest Neighbors Classification and Regression, SVD, $k$-Means clustering.

### Probabilistic intuition

Linear regression, logistic regression, Multi-layer Perceptron, Naive Bayes, PCA
Linear regression, logistic regression, Multilayer Perceptron, Naive Bayes, PCA.

### Decision trees

Random forest, Gradient boosted decision trees
Random forest, gradient-boosted decision trees.

---
# Course Objectives: What you hopefully learned
# Course Objectives: What You Hopefully Learned

After this course you should…

- Be able to reason about tasks/problems **suitable for ML**
- Know when to use classification, regression and clustering
- Be able to choose from this method Linear and Logistic Regression,
- Be able to choose from these methods: Linear and Logistic Regression,
Multilayer Perceptron, Nearest Neighbors, Naive Bayes, Gradient Boosted Decision
Trees, $k$-means clustering
