Our paper about training Characteristic Functions in Reinforcement Learning was accepted as a long-form talk at ICML 2022.

Characteristic functions are a concept from cooperative game theory. Given a set of players, a characteristic function assigns a pay-off value to every subset of players. The goal is to distribute the pay-off of the whole set among the individual players. Shapley values were introduced as an attribution method for this purpose, with useful mathematical properties.
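To make the definition concrete, here is a minimal sketch of the standard Shapley formula applied to a tiny hand-made game. The names `shapley_values` and `toy_game` and the three players are illustrative assumptions, not taken from the paper, and the exact enumeration over all subsets is only feasible for a handful of players.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(frozenset) -> float."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition s.
                total += weight * (v(s | {p}) - v(s))
        values[p] = total
    return values


def toy_game(coalition):
    # Hypothetical game: the pay-off is 1 only if "a" and "b" cooperate.
    return 1.0 if {"a", "b"} <= coalition else 0.0


phi = shapley_values(["a", "b", "c"], toy_game)
print(phi)
```

In this toy game the pay-off of the full set is shared equally between "a" and "b", while "c", who never changes the outcome, receives nothing. The individual values add up to the pay-off of the whole set.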

We cannot straightforwardly use this for model interpretability, since classification or regression models are not characteristic functions: their output is not defined when only a subset of the input values is given. In the literature this is often remedied by blacking out the missing parts of the input. However, as we argue in our paper, this is not a neutral operation and can lead to artefacts.
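The following toy sketch illustrates why filling in missing inputs is not neutral. Everything in it is a made-up assumption for illustration (the `toy_model`, the three-value input, and the two baselines); it is not the experiment from the paper. The model's score for a subset of features is obtained by replacing all other features with a baseline value, and the result depends heavily on which baseline is chosen.

```python
def masked_value(model, x, subset, baseline):
    """Evaluate the model on x with every feature outside `subset` set to `baseline`."""
    masked = [x[i] if i in subset else baseline for i in range(len(x))]
    return model(masked)


def toy_model(x):
    # Hypothetical score: the number of completely dark inputs (value 0.0).
    return sum(1.0 for xi in x if xi == 0.0)


x = [0.0, 1.0, 1.0]   # one dark and two bright "pixels"
subset = {1}          # only feature 1 is shown to the model

black = masked_value(toy_model, x, subset, baseline=0.0)  # black out the rest
grey = masked_value(toy_model, x, subset, baseline=0.5)   # grey out the rest
print(black, grey)
```

With the black baseline the hidden features themselves look like evidence to the model, so the same subset receives a different value than with the grey baseline. Any attribution computed from these values inherits that choice.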

To compare different feature importance attribution methods, we directly trained a characteristic function on inputs with missing data, using the game of Connect Four.

img1


Figure 1. The setup for comparing two interpretability methods. Connect Four is transformed to a two-vs-two player game, where the first player in each team selects a part of the board for which colour information is visible and the second bases their move on this information. Restricting only colour information allows the second player to be trained to always play legal moves.
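As a rough sketch of the masking described in the caption, the snippet below hides colour information while keeping occupancy visible. The board encoding (`EMPTY`, `RED`, `YELLOW`, `HIDDEN`) and the helper names are assumptions made for illustration and need not match the representation used in the paper.

```python
ROWS, COLS = 6, 7
EMPTY, RED, YELLOW, HIDDEN = 0, 1, 2, 3  # HIDDEN: occupied, colour not revealed


def mask_colours(board, visible):
    """Hide the colour of every occupied cell not in `visible`; keep occupancy."""
    return [
        [cell if cell == EMPTY or (r, c) in visible else HIDDEN
         for c, cell in enumerate(row)]
        for r, row in enumerate(board)
    ]


def legal_moves(board):
    """A column is playable while its top cell (row 0) is empty."""
    return [c for c in range(COLS) if board[0][c] == EMPTY]


board = [[EMPTY] * COLS for _ in range(ROWS)]
board[5][3] = RED       # bottom of column 3
board[4][3] = YELLOW    # stacked on top of it
for r in range(ROWS):   # column 0 is completely full
    board[r][0] = RED if r % 2 else YELLOW

# The first player reveals the colour of a single cell.
observed = mask_colours(board, visible={(5, 3)})
print(legal_moves(observed))
```

Because occupancy stays visible, the set of legal moves computed from the masked board is the same as for the full board, which is what lets the second player always play legally.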