langfair.metrics.recommendation.recommendation.RecommendationMetrics#

class langfair.metrics.recommendation.recommendation.RecommendationMetrics(metrics=['Jaccard', 'PRAG', 'SERP'])#

Bases: object

__init__(metrics=['Jaccard', 'PRAG', 'SERP'])#

Class for LLM recommendation fairness metrics. Computes FaiRLLM (Fairness of Recommendation via LLM) metrics. This class enables calculation of Jaccard Similarity, Search Result Page Misinformation Score (SERP), and Pairwise Ranking Accuracy Gap (PRAG) across protected attribute groups.

For more information on these metrics, refer to Zhang et al. (2023): https://arxiv.org/abs/2305.07609
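
As an illustration of the kind of list comparison these metrics perform, the sketch below computes Jaccard similarity between two recommendation lists. This is a generic illustration of the similarity notion only, not langfair's internal implementation; SERP and PRAG involve analogous rank-aware comparisons described in the paper.

    # Illustrative only: Jaccard similarity between two recommendation lists,
    # i.e., |intersection| / |union| of the recommended items.
    def jaccard_similarity(list1, list2):
        set1, set2 = set(list1), set(list2)
        return len(set1 & set2) / len(set1 | set2)

    jaccard_similarity(["A", "B", "C"], ["B", "C", "D"])  # 0.5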

Parameters:

metrics (list of strings/objects, default=["Jaccard", "PRAG", "SERP"]) – A list containing the names or class objects of the metrics to compute.
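
A minimal instantiation sketch; the import path follows the module path shown above, and selecting a subset of metrics by name is assumed to work per the metrics parameter description:

    from langfair.metrics.recommendation.recommendation import RecommendationMetrics

    # Default computes Jaccard, PRAG, and SERP
    rm = RecommendationMetrics()

    # Assumed: restrict computation to a subset of metrics by name
    rm_subset = RecommendationMetrics(metrics=["Jaccard", "PRAG"])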

Methods

__init__([metrics])

Class for LLM recommendation fairness metrics.

evaluate_against_neutral(group_dict_list, ...)

Returns min, max, range, and standard deviation of SERP, Jaccard, and PRAG similarity metrics across protected attribute groups.

evaluate_pairwise(rec_lists1, rec_lists2)

Returns pairwise values of SERP, Jaccard, and PRAG similarity metrics for two protected attribute groups.

evaluate_against_neutral(group_dict_list, neutral_dict)#

Returns min, max, range, and standard deviation of SERP, Jaccard, and PRAG similarity metrics across protected attribute groups. Metrics are consistent with those provided by https://arxiv.org/pdf/2305.07609.pdf

Parameters:
  • neutral_dict (dictionary of lists) – Each value in the dictionary is a recommendation list. For example:

        neutral_dict = {
            "TS": [
                "Love Story", "You Belong with Me", "Blank Space", "Shake It Off", "Bad Blood",
                "Style", "Wildest Dreams", "Delicate", "ME!", "Cardigan"
            ],
            "ES": [
                "The A Team", "Thinking Out Loud", "Shape of You", "Castle on the Hill", "Perfect",
                "Photograph", "Dive", "Galway Girl", "Happier", "Lego House"
            ]
        }

  • group_dict_list (list of dictionaries of lists) – Each element of the list corresponds to a protected attribute group. The values of each interior dictionary are recommendation lists in the format of neutral_dict. For example:

        group_dict_list = [
            {
                "TS": [
                    "Love Story", "Shake It Off", "Blank Space", "You Belong with Me", "Bad Blood",
                    "Style", "Wildest Dreams", "Delicate", "Look What You Made Me Do",
                    "We Are Never Ever Getting Back Together"
                ],
                "ES": [
                    "The A Team", "Thinking Out Loud", "Shape of You", "Castle on the Hill", "Perfect",
                    "Photograph", "Dive", "Sing", "Galway Girl", "I Don't Care (with Justin Bieber)"
                ]
            },
            {
                "TS": [
                    "Love Story", "You Belong with Me", "Blank Space", "Shake It Off", "Style",
                    "Wildest Dreams", "Delicate", "ME!", "Cardigan", "Folklore"
                ],
                "ES": [
                    "Castle on the Hill", "Perfect", "Shape of You", "Thinking Out Loud", "Photograph",
                    "Galway Girl", "Dive", "Happier", "Lego House", "Give Me Love"
                ]
            }
        ]

Returns:

Dictionary containing min, max, range, and standard deviation of Jaccard, SERP, and PRAG similarity metrics across protected attribute groups

Return type:

dict
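
A short usage sketch; the dictionary keys ('TS', 'ES') and song titles are illustrative, and the recommendation lists are truncated to five items for brevity:

    from langfair.metrics.recommendation.recommendation import RecommendationMetrics

    rm = RecommendationMetrics()

    neutral_dict = {
        "TS": ["Love Story", "You Belong with Me", "Blank Space", "Shake It Off", "Style"],
        "ES": ["The A Team", "Thinking Out Loud", "Shape of You", "Perfect", "Photograph"],
    }
    group_dict_list = [
        {  # recommendations from prompts mentioning the first protected attribute group
            "TS": ["Love Story", "Shake It Off", "Blank Space", "You Belong with Me", "Bad Blood"],
            "ES": ["The A Team", "Thinking Out Loud", "Shape of You", "Castle on the Hill", "Perfect"],
        },
        {  # recommendations from prompts mentioning the second protected attribute group
            "TS": ["Love Story", "You Belong with Me", "Blank Space", "Shake It Off", "Wildest Dreams"],
            "ES": ["Castle on the Hill", "Perfect", "Shape of You", "Thinking Out Loud", "Photograph"],
        },
    ]

    results = rm.evaluate_against_neutral(
        group_dict_list=group_dict_list, neutral_dict=neutral_dict
    )
    print(results)  # dict of min, max, range, and standard deviation per metric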

evaluate_pairwise(rec_lists1, rec_lists2)#

Returns pairwise values of SERP, Jaccard, and PRAG similarity metrics for two protected attribute groups. Metrics are pairwise analogs of those provided by https://arxiv.org/pdf/2305.07609.pdf

Parameters:
  • rec_lists1 (list of lists of strings) – A list of recommendation lists, each of length K, generated from prompts containing mentions of the same protected attribute group.

  • rec_lists2 (list of lists of strings) – A list of recommendation lists, each of length K, generated from prompts containing mentions of the same protected attribute group. Prompts should be identical to those used to generate rec_lists1 except they should mention a different protected attribute group.

Returns:

Dictionary containing pairwise metric values of SERP, Jaccard, and PRAG

Return type:

dict
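
A short usage sketch for the pairwise comparison; the titles are illustrative, and each inner list plays the role of one length-K recommendation list:

    from langfair.metrics.recommendation.recommendation import RecommendationMetrics

    rm = RecommendationMetrics()

    # Lists generated from prompts mentioning the first protected attribute group
    rec_lists1 = [
        ["Love Story", "Shake It Off", "Blank Space", "You Belong with Me", "Bad Blood"],
        ["The A Team", "Thinking Out Loud", "Shape of You", "Castle on the Hill", "Perfect"],
    ]
    # Same prompts, but mentioning a different protected attribute group
    rec_lists2 = [
        ["Love Story", "You Belong with Me", "Blank Space", "Shake It Off", "Style"],
        ["Castle on the Hill", "Perfect", "Shape of You", "Thinking Out Loud", "Photograph"],
    ]

    pairwise = rm.evaluate_pairwise(rec_lists1, rec_lists2)
    print(pairwise)  # dict with pairwise SERP, Jaccard, and PRAG values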