Expanding Merlin-Arthur Classifiers

The idea is to expand Merlin-Arthur Classification setup in the following ways:

Larger Datasets
- Streetview Housenumbers, where we can compare with bounding boxes
- Medicine Data, where Explainability Methods had been challenged in the past *
Text-based Explanations
- Have agents explain their rationale via text
- Train Arthur as LLM to play a simple game (e.g. Connect Four) in some text based fashion
- Merlin and Morgana are implemented as LLMs reasoners explaining their reasoning for the next move and Arhtur has to decide with which move he goes.