{
"cells": [
{
"cell_type": "markdown",
"id": "ead0356d",
"metadata": {},
"source": [
"# Classification Metrics "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f694ef3c-96cb-472c-80c4-0409222fc4ac",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"from langfair.metrics.classification import ClassificationMetrics\n"
]
},
{
"cell_type": "markdown",
"id": "4b634110-1aa9-413d-908a-6ba61cde007e",
"metadata": {},
"source": [
"## 1. Introduction\n",
" "
]
},
{
"cell_type": "markdown",
"id": "7a159622-3c80-4efc-854e-d89aa1cf4d84",
"metadata": {},
"source": [
"Large language models (LLMs) used in classification use cases should be assessed for group fairness (if applicable). Similar to traditional person-level classification challenges in machine learning, these use cases present the risk of allocational harms. LangFair offers the following classification fairness metrics from the LLM fairness literature:\n",
"\n",
"* Predicted Prevalence Rate Disparity ([Feldman et al., 2015](https://arxiv.org/abs/1412.3756); [Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))\n",
"* False Negative Rate Disparity ([Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))\n",
"* False Omission Rate Disparity ([Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))\n",
"* False Positive Rate Disparity ([Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))\n",
"* False Discovery Rate Disparity ([Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))"
]
},
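{
"cell_type": "markdown",
"id": "metric-defs-sketch-md",
"metadata": {},
"source": [
"To make these definitions concrete, the cell below computes two of the underlying rates by hand with NumPy on a tiny toy example: the predicted prevalence rate (share of positive predictions within a group) and the false negative rate (share of actual positives predicted negative within a group), along with their between-group differences. This is an illustrative sketch only; the toy arrays and helper functions are hypothetical and independent of LangFair's implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "metric-defs-sketch-code",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Toy example: group indicator, binary predictions, and binary labels (hypothetical values)\n",
"g = np.array([0, 0, 0, 0, 1, 1, 1, 1])\n",
"pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])\n",
"true = np.array([1, 0, 0, 0, 1, 0, 1, 0])\n",
"\n",
"def ppr(pred):\n",
"    # Predicted prevalence rate: share of positive predictions, P(y_pred = 1)\n",
"    return pred.mean()\n",
"\n",
"def fnr(pred, true):\n",
"    # False negative rate: share of actual positives predicted negative, P(y_pred = 0 | y_true = 1)\n",
"    return ((pred == 0) & (true == 1)).sum() / (true == 1).sum()\n",
"\n",
"mask0, mask1 = (g == 0), (g == 1)\n",
"print(\"PPR disparity (difference):\", abs(ppr(pred[mask0]) - ppr(pred[mask1])))\n",
"print(\"FNR disparity (difference):\", abs(fnr(pred[mask0], true[mask0]) - fnr(pred[mask1], true[mask1])))"
]
},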
{
"cell_type": "markdown",
"id": "0a7059f0-cf44-437e-b0b9-12e33b6872ad",
"metadata": {},
"source": [
"## 2. Assessment\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "820e7afb-e66b-4716-bdbf-a53ffba4c4ae",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Simulate dataset for this example. In practice, users should replace this data with predicted classes generated by the LLM,\n",
"# corresponding ground truth values, and corresponding protected attribute group data.\n",
"sample_size = 10000\n",
"groups = np.random.binomial(n=1, p=0.5, size=sample_size)\n",
"y_pred = np.random.binomial(n=1, p=0.3, size=sample_size)\n",
"y_true = np.random.binomial(n=1, p=0.3, size=sample_size)"
]
},
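{
"cell_type": "markdown",
"id": "llm-output-mapping-md",
"metadata": {},
"source": [
"In a real assessment, `y_pred` would come from the LLM rather than from a random draw. The sketch below shows one way that step might look: it assumes a hypothetical list of raw LLM text responses (`responses`) to a yes/no classification prompt and maps them to 1/0. The response format and mapping rule are assumptions for illustration; adapt them to however your prompt asks the model to answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "llm-output-mapping-code",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Hypothetical raw LLM responses to a yes/no classification prompt\n",
"responses = [\"Yes\", \"no\", \"Yes.\", \"No\", \"yes\"]\n",
"\n",
"# Map each response to a binary prediction (1 = positive class, 0 = negative class)\n",
"y_pred_from_llm = np.array([1 if r.strip().lower().startswith(\"yes\") else 0 for r in responses])\n",
"y_pred_from_llm"
]
},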
{
"cell_type": "markdown",
"id": "42f834ce-792c-44b2-a0ee-5c8343365697",
"metadata": {},
"source": [
"## Classification Metrics\n",
"***\n",
"`ClassificationMetrics()` - Pairwise classification fairness metrics (class)\n",
"\n",
"**Class parameters:**\n",
"- `metric_type` - (**{'all', 'assistive', 'punitive', 'representation'}, default='all'**) Specifies which metrics to use.\n",
"\n",
"**Methods:**\n",
"1. `evaluate` - Returns min, max, range, and standard deviation of metrics across protected attribute groups.\n",
"\n",
" **Method Parameters:**\n",
" - `groups` - (**array-like**) Group indicators. Must contain exactly two unique values.\n",
" - `y_pred` - (**array-like**) Binary model predictions. Positive and negative predictions must be 1 and 0, respectively.\n",
" - `y_true` - (**array-like**) Binary labels (ground truth values). Positive and negative labels must be 1 and 0, respectively.\n",
" - `ratio` - (**boolean**) Indicates whether to compute the metric as a difference or a ratio\n",
"\n",
" Returns:\n",
" - Dictionary containing fairness metric values (**Dictionary**)."
]
},
{
"cell_type": "markdown",
"id": "133aaee2",
"metadata": {},
"source": [
"Generate an instance of class `ClassificationMetrics` using default `metric_type='all'`, which includes \"assistive\", \"punitive\", and \"representation\" metrics."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "727a49a4-3067-4e7d-9de7-adfd29f4f6a8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"cm = ClassificationMetrics(metric_type='all')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "33e61ded-b56f-42f3-897a-0df80d03b626",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'FalseNegativeRateParity': 0.9683960547735326,\n",
" 'FalseOmissionRateParity': 0.9682772917805723,\n",
" 'FalsePositiveRateParity': 0.9832027144990514,\n",
" 'FalseDiscoveryRateParity': 0.9750294817188464,\n",
" 'PredictedPrevalenceRateParity': 1.010318056277584}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Metrics expressed as ratios (target value of 1)\n",
"cm.evaluate(groups=groups, y_pred=y_pred, y_true=y_true, ratio=True)"
]
},
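{
"cell_type": "markdown",
"id": "ratio-sanity-check-md",
"metadata": {},
"source": [
"As an optional sanity check (independent of LangFair's implementation), the predicted prevalence rate parity ratio above can be recomputed directly: take the share of positive predictions within each group and divide one by the other. Depending on which group LangFair treats as the reference, the hand-computed ratio may be the reciprocal of the value reported above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ratio-sanity-check-code",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Predicted prevalence rate (share of positive predictions) within each group\n",
"ppr_group0 = y_pred[groups == 0].mean()\n",
"ppr_group1 = y_pred[groups == 1].mean()\n",
"\n",
"# Both orderings, since the choice of reference group is an implementation detail\n",
"ppr_group0 / ppr_group1, ppr_group1 / ppr_group0"
]
},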
{
"cell_type": "code",
"execution_count": 5,
"id": "366c5853-ffb7-49c2-a9c5-37d38ba365e5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'FalseNegativeRateParity': 0.022435421009698087,\n",
" 'FalseOmissionRateParity': 0.009568167034658404,\n",
" 'FalsePositiveRateParity': 0.005089653684952955,\n",
" 'FalseDiscoveryRateParity': 0.01776013575685509,\n",
" 'PredictedPrevalenceRateParity': 0.0030867911196673647}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Metrics expressed as differences (target value of 0)\n",
"cm.evaluate(groups=groups, y_pred=y_pred, y_true=y_true, ratio=False)"
]
},
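{
"cell_type": "markdown",
"id": "metric-type-subset-md",
"metadata": {},
"source": [
"Finally, the `metric_type` parameter can restrict the computation to a single family of metrics. The sketch below, which assumes the same constructor and `evaluate` signature documented above, uses `metric_type='representation'`; `'assistive'` or `'punitive'` can be substituted to restrict to those families instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "metric-type-subset-code",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Restrict to the 'representation' family of metrics\n",
"cm_repr = ClassificationMetrics(metric_type='representation')\n",
"cm_repr.evaluate(groups=groups, y_pred=y_pred, y_true=y_true, ratio=True)"
]
}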
],
"metadata": {
"environment": {
"kernel": "langfair",
"name": "workbench-notebooks.m125",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m125"
},
"kernelspec": {
"display_name": "langfair-ZgpfWZGz-py3.9",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}