$\newcommand{\kl}{\mathopen{}\left( #1 \right)\mathclose{}} \newcommand{\ekl}{\mathopen{}\left[ #1 \right]\mathclose{}} \newcommand{\skl}{\mathopen{}\left\{ #1 \right\}\mathclose{}} \newcommand{\bkl}{\mathopen{}\left| #1 \right|\mathclose{}} \newcommand{\nkl}{\mathopen{}\left\| #1 \right\|\mathclose{}} \newcommand{\bfa}{\mathbf{a}} \newcommand{\bfb}{\mathbf{b}} \newcommand{\bfc}{\mathbf{c}} \newcommand{\bfd}{\mathbf{d}} \newcommand{\bfe}{\mathbf{e}} \newcommand{\bff}{\mathbf{f}} \newcommand{\bfg}{\mathbf{g}} \newcommand{\bfh}{\mathbf{h}} \newcommand{\bfi}{\mathbf{i}} \newcommand{\bfj}{\mathbf{j}} \newcommand{\bfk}{\mathbf{k}} \newcommand{\bfl}{\mathbf{l}} \newcommand{\bfm}{\mathbf{m}} \newcommand{\bfn}{\mathbf{n}} \newcommand{\bfo}{\mathbf{o}} \newcommand{\bfp}{\mathbf{p}} \newcommand{\bfq}{\mathbf{q}} \newcommand{\bfr}{\mathbf{r}} \newcommand{\bfs}{\mathbf{s}} \newcommand{\bft}{\mathbf{t}} \newcommand{\bfu}{\mathbf{u}} \newcommand{\bfv}{\mathbf{v}} \newcommand{\bfw}{\mathbf{w}} \newcommand{\bfx}{\mathbf{x}} \newcommand{\bfy}{\mathbf{y}} \newcommand{\bfz}{\mathbf{z}} \newcommand{\bfA}{\mathbf{A}} \newcommand{\bfB}{\mathbf{B}} \newcommand{\bfC}{\mathbf{C}} \newcommand{\bfD}{\mathbf{D}} \newcommand{\bfE}{\mathbf{E}} \newcommand{\bfF}{\mathbf{F}} \newcommand{\bfG}{\mathbf{G}} \newcommand{\bfH}{\mathbf{H}} \newcommand{\bfI}{\mathbf{I}} \newcommand{\bfJ}{\mathbf{J}} \newcommand{\bfK}{\mathbf{K}} \newcommand{\bfL}{\mathbf{L}} \newcommand{\bfM}{\mathbf{M}} \newcommand{\bfN}{\mathbf{N}} \newcommand{\bfO}{\mathbf{O}} \newcommand{\bfP}{\mathbf{P}} \newcommand{\bfQ}{\mathbf{Q}} \newcommand{\bfR}{\mathbf{R}} \newcommand{\bfS}{\mathbf{S}} \newcommand{\bfT}{\mathbf{T}} \newcommand{\bfU}{\mathbf{U}} \newcommand{\bfV}{\mathbf{V}} \newcommand{\bfW}{\mathbf{W}} \newcommand{\bfX}{\mathbf{X}} \newcommand{\bfY}{\mathbf{Y}} \newcommand{\bfZ}{\mathbf{Z}} \newcommand{\bfone}{\mathbf{1}} \newcommand{\bfzero}{\mathbf{0}} \newcommand{\E}{\mathbb{E}} \newcommand{\R}{\mathbb{R}} 
\renewcommand{\P}{\mathbb{P}} \newcommand{\bfmu}{\bm{\mu}} \newcommand{\bfsigma}{\bm{\sigma}} \newcommand{\bfdelta}{\boldsymbol{\delta}} \newcommand{\bfSigma}{\bm{\Sigma}} \newcommand{\bfLambda}{\bm{\Lambda}} \newcommand{\bfeta}{\bm{\eta}} \newcommand{\bftheta}{\bm{\theta}} \newcommand{\CA}{\mathcal{A}} \newcommand{\CB}{\mathcal{B}} \newcommand{\CC}{\mathcal{C}} \newcommand{\CD}{\mathcal{D}} \newcommand{\CE}{\mathcal{E}} \newcommand{\CF}{\mathcal{F}} \newcommand{\CG}{\mathcal{G}} \newcommand{\CH}{\mathcal{H}} \newcommand{\CI}{\mathcal{I}} \newcommand{\CJ}{\mathcal{J}} \newcommand{\CK}{\mathcal{K}} \newcommand{\CL}{\mathcal{L}} \newcommand{\CM}{\mathcal{M}} \newcommand{\CN}{\mathcal{N}} \newcommand{\CO}{\mathcal{O}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CQ}{\mathcal{Q}} \newcommand{\CR}{\mathcal{R}} \newcommand{\CS}{\mathcal{S}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CU}{\mathcal{U}} \newcommand{\CV}{\mathcal{V}} \newcommand{\CW}{\mathcal{W}} \newcommand{\CX}{\mathcal{X}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\CZ}{\mathcal{Z}} \newcommand{\frA}{\mathfrak{A}} \newcommand{\frB}{\mathfrak{B}} \newcommand{\frC}{\mathfrak{C}} \newcommand{\frD}{\mathfrak{D}} \newcommand{\frE}{\mathfrak{E}} \newcommand{\frF}{\mathfrak{F}} \newcommand{\frG}{\mathfrak{G}} \newcommand{\frH}{\mathfrak{H}} \newcommand{\frI}{\mathfrak{I}} \newcommand{\frJ}{\mathfrak{J}} \newcommand{\frK}{\mathfrak{K}} \newcommand{\frL}{\mathfrak{L}} \newcommand{\frM}{\mathfrak{M}} \newcommand{\frN}{\mathfrak{N}} \newcommand{\frO}{\mathfrak{O}} \newcommand{\frP}{\mathfrak{P}} \newcommand{\frQ}{\mathfrak{Q}} \newcommand{\frR}{\mathfrak{R}} \newcommand{\frS}{\mathfrak{S}} \newcommand{\frT}{\mathfrak{T}} \newcommand{\frU}{\mathfrak{U}} \newcommand{\frV}{\mathfrak{V}} \newcommand{\frW}{\mathfrak{W}} \newcommand{\frX}{\mathfrak{X}} \newcommand{\frY}{\mathfrak{Y}} \newcommand{\frZ}{\mathfrak{Z}} \newcommand{\CNP}{\mathcal{NP}} \newcommand{\CPP}{\mathcal{PP}} \newcommand{\SP}{\mathsf{P}} 
\newcommand{\SPP}{\mathsf{PP}} \newcommand{\SSP}{\mathsf{\#P}} \newcommand{\SNP}{\mathsf{NP}} \newcommand{\SBPP}{\mathsf{BPP}} \newcommand{\ScoNP}{\mathsf{coNP}} \newcommand{\bbone}{\mathbbm{1}} \newcommand{\ord}{\mathrm{ord}} \newcommand{\odr}{\vee} \newcommand{\und}{\wedge} \newcommand{\Odr}{\bigvee} \newcommand{\Und}{\bigwedge} \newcommand{\xor}{\oplus} \newcommand{\Xor}{\bigoplus} \newcommand{\bmat}{\begin{bmatrix} #1 \end{bmatrix}} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax}$

This post is part 5 of my series on Interactive Classification.

*TL;DR:* We explain the computational complexity of interpreting neural network classifiers.

1. Finding small precise features is a hard task even if done only approximately.
2. We can use heuristics as long as we can verify that the heuristic has succeeded a posteriori.
3. To audit a Merlin-Arthur classifier, the auditor needs computational resources comparable to those of the designer of the classifier.

## Computational Complexity of Finding Features with High Precision

On its face, it is not surprising that finding small features that have high precision is an NP-hard task, since it involves a combinatorial search over sets of input variables.

We have shown this explicitly in (Wäldchen et al., 2021). That paper still uses the term $\delta$-relevant features, which is equivalent to a feature with a precision of $\delta$. We also show the stronger result that the size of the smallest set of features with precision $\delta$ cannot be approximated within a factor of $d^{1-\alpha}$ unless $\SP=\SNP$, where $d$ is the input dimension and $\alpha>0$ is arbitrary. Note that for $\alpha=0$ we get the trivial approximation of simply taking the whole input as a feature with perfect precision. This means that, unless $\SP=\SNP$, one cannot prove for any efficient procedure that it systematically finds small precise features should they exist. This holds even for two-layer neural networks.
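The statement can be written out as follows. This is a sketch under assumptions that are not spelled out in this post: binary inputs $\mathbf{x}\in\{0,1\}^d$, a classifier $\Phi$, and completions $\mathbf{y}$ drawn uniformly at random. The symbols $\Phi$, $S$, $S^*$ and $\mathbf{y}$ are notation introduced here for illustration. A set of input variables $S\subseteq\{1,\dots,d\}$ is a feature with precision $\delta$ for $\mathbf{x}$ if

$$
\mathbb{P}_{\mathbf{y}}\left[\Phi(\mathbf{y}) = \Phi(\mathbf{x}) \,\middle|\, \mathbf{y}_S = \mathbf{x}_S\right] \geq \delta .
$$

Let $S^*$ be a smallest such set. The hardness result then says that, unless $\SP=\SNP$, no polynomial-time algorithm is guaranteed to return a set $S$ with precision $\delta$ and

$$
|S| \leq d^{1-\alpha}\,|S^*| \quad \text{for some fixed } \alpha > 0 .
$$

For $\alpha = 0$ the bound becomes $|S| \leq d\,|S^*|$, which the full input $S=\{1,\dots,d\}$ satisfies whenever $S^*$ is non-empty. This is the trivial approximation mentioned above.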

Instead of selecting the smallest set (cardinality-minimal), one can relax the question to selecting a set that cannot be made smaller by omitting any elements from it (inclusion-minimal). For monotone classifiers, this makes the problem straightforward to solve, as shown in (Shih et al., 2018): one can simply omit input variables from a feature one after another until any further omission would reduce the precision below $\delta$. The authors additionally show that this is efficiently possible for classifiers represented as Ordered Binary Decision Diagrams (OBDDs).
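The successive-omission idea can be sketched in a few lines of Python. This is a toy illustration and not the procedure from (Shih et al., 2018): it assumes binary inputs with uniformly random completions and computes the precision by brute-force enumeration, so it only works for very small $d$. The function names `precision` and `inclusion_minimal_feature` are made up for this example.

```python
from itertools import product


def precision(f, x, S):
    """Exact precision of the feature set S for input x.

    Assumes binary inputs and uniformly random values for all variables
    outside S. Enumerates all 2^(d - |S|) completions (tiny d only).
    """
    d = len(x)
    free = [i for i in range(d) if i not in S]
    target = f(x)
    hits = 0
    for bits in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        hits += f(tuple(y)) == target
    return hits / 2 ** len(free)


def inclusion_minimal_feature(f, x, delta):
    """Drop variables one at a time while the precision stays >= delta.

    Passes are repeated until no single variable can be removed, so the
    returned set is inclusion-minimal (not necessarily the smallest).
    """
    S = set(range(len(x)))
    changed = True
    while changed:
        changed = False
        for i in sorted(S):
            if precision(f, x, S - {i}) >= delta:
                S.discard(i)
                changed = True
    return S


# Toy monotone classifier: AND of the first two input variables.
def and01(y):
    return int(y[0] and y[1])


x = (1, 1, 0, 1)
print(inclusion_minimal_feature(and01, x, delta=1.0))  # {0, 1}
```

With `delta=1.0` the sketch keeps exactly the two variables that the classifier depends on. Lowering `delta` allows smaller sets, and the result then depends on the order in which variables are tried.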

## The Result by Blanc et al.

While we have shown that there are networks and inputs for which finding small precise features is a hard task, a surprising result by (Blanc et al., 2021) shows that for a random input it is feasible in polynomial time with high probability. The caveat is the size of the feature that is found, which is polynomial in the size of the smallest precise feature. In fact, it grows so quickly that the method is unusable in practice unless there exists a precise feature that is orders of magnitude smaller than the whole input dimension. Nevertheless, this is an impressive result connecting interesting topics, such as stabiliser trees and implicit representations.

## How to overcome the computational barrier?

In practice, the worst-case hardness does not have to stop us. Instead of solving the problem exactly, we can use a heuristic method to select a feature and confirm its high precision afterwards. In this regard the Merlin-Arthur framework is a heuristic as well: we are not guaranteed to converge to a setup with high completeness and soundness, but we can easily check whether this has been achieved.

This is similar to the training process of neural networks. Designing a classifier with high accuracy is a computationally hard task. But SGD is a method that reliably succeeds in practice, and one can confirm success via the test set evaluation. Completeness and soundness thus take the role of the test accuracy and confirm not only good performance but also interpretability.
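The "select heuristically, verify afterwards" pattern can be sketched for the plain precision check as follows. This is a minimal illustration under strong assumptions, not the Merlin-Arthur check itself: the names `estimate_precision` and `sample_input` are hypothetical, and overwriting the selected coordinates with those of `x` only amounts to conditioning if the input coordinates are independent.

```python
import random


def estimate_precision(f, x, S, sample_input, n_samples=2000, seed=0):
    """Monte Carlo estimate of the precision of feature set S for input x.

    sample_input(rng) is a hypothetical sampler for the input distribution.
    The coordinates in S are overwritten with the values from x, which is
    only a valid way to condition if the coordinates are independent.
    """
    rng = random.Random(seed)
    target = f(x)
    hits = 0
    for _ in range(n_samples):
        y = list(sample_input(rng))
        for i in S:
            y[i] = x[i]
        hits += f(tuple(y)) == target
    return hits / n_samples


# Toy setup: uniform binary inputs and an AND of the first two variables.
def uniform_bits(rng, d=6):
    return [rng.randint(0, 1) for _ in range(d)]


def and01(y):
    return int(y[0] and y[1])


x = (1, 1, 0, 1, 0, 0)
candidate = {0, 1}  # stands in for the output of any heuristic
delta = 0.95
if estimate_precision(and01, x, candidate, uniform_bits) >= delta:
    print("heuristic feature accepted")
else:
    print("heuristic feature rejected")
```

The heuristic that proposes `candidate` can be arbitrary, since only the check afterwards decides whether the feature is accepted. The estimate carries sampling error, so in a real setting one would attach a confidence bound before comparing it with `delta`.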

## Use of Merlin-Arthur Classifiers as Explainable Classifiers

Let us come back to the main reason we introduced the Merlin-Arthur classifier for formal interpretability. We want a setup that is provably explainable, especially for the case when the designer of the classifier wants to hide the true reasoning of the classifier. This is important for commercial classifiers, e.g. for hiring decisions. An auditor would want to check that the reason a candidate was hired or rejected was not based on protected features like gender or race.

We have proved that a sound and complete Merlin-Arthur classifier has to exchange informative features. An auditor could confirm the soundness with their own Morgana to make sure that the setup is actually sound.

In our theorems we have seen that the precision bound depends on the relative success rate of Merlin and Morgana. This means that the scheme is successful as long as the company designing the classifier and the auditor have comparable computational resources. This is also reflected in the AFC, since we have shown that determining the size of the AFC is as hard as exploiting it. On the other hand, the auditor does not need to model the data manifold that the classifier operates on. Modelling the manifold is potentially much harder than designing a good search routine for Morgana, and it would have to be redone for every new classification task.


1. Wäldchen, S., Macdonald, J., Hauch, S., & Kutyniok, G. (2021). The computational complexity of understanding binary classifier decisions. Journal of Artificial Intelligence Research, 70, 351–387.
2. Shih, A., Choi, A., & Darwiche, A. (2018). A symbolic approach to explaining Bayesian network classifiers. arXiv preprint arXiv:1805.03364.
3. Blanc, G., Lange, J., & Tan, L.-Y. (2021). Provably efficient, succinct, and precise explanations. Advances in Neural Information Processing Systems, 34, 6129–6141.