Use ABL-Package Step by Step
============================

In a typical Abductive Learning process, as illustrated below,
data inputs are first mapped to pseudo labels by a machine learning model.
These pseudo labels then pass through a knowledge base :math:`\mathcal{KB}`
to obtain a logical result by deductive reasoning. During training,
alongside this forward flow (i.e., prediction followed by deductive reasoning),
there is also a reverse flow, which starts from the logical result and
uses abductive reasoning to generate revised pseudo labels that minimize
their inconsistency with the knowledge base. These revised labels are then
fed back into the machine learning model for further training.
For example, in the MNIST Add task, two digit images predicted as 1 and 9
deduce a sum of 10; if the ground-truth sum is 8, abductive reasoning may
revise the pseudo labels to 1 and 7.
To implement this process, the following four steps are necessary
(a condensed sketch of how they fit together follows the list):

.. image:: img/ABL-Package.jpg

1. Prepare datasets

   Prepare the data's input, ground truth for pseudo labels (optional), and ground truth for logical results.

2. Build the machine learning part

   Build a model that defines how to map input to pseudo labels,
   and use ``ABLModel`` to encapsulate the model.

3. Build the reasoning part

   Build a knowledge base by creating a subclass of ``KBBase``,
   and instantiate a ``ReasonerBase`` for minimizing inconsistencies
   between the knowledge base and pseudo labels.

4. Bridge machine learning and reasoning so as to train and test

   Use ``SimpleBridge`` to bridge the machine learning and reasoning parts
   for integrated training and testing. Before training or testing, we also
   have to define the metrics for measuring accuracy by inheriting from ``BaseMetric``.
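
Putting the four steps together, the overall workflow looks roughly as follows. This is only a condensed outline: ``base_model``, ``MyKB``, ``pseudo_label_list``, and ``metric_list`` are placeholders defined in the sections below, while ``ABLModel``, ``KBBase``, ``ReasonerBase``, and ``SimpleBridge`` are the ABL-Package classes introduced in this guide.

.. code:: python

    # 1. Prepare datasets: tuples of (X, gt_pseudo_label, Y)
    train_data = ...
    test_data = ...

    # 2. Machine learning part: wrap a base model in ABLModel
    model = ABLModel(base_model)

    # 3. Reasoning part: a KBBase subclass plus a reasoner
    kb = MyKB(pseudo_label_list)
    reasoner = ReasonerBase(kb, dist_func="hamming")

    # 4. Bridge the two parts, then train and test
    bridge = SimpleBridge(model, reasoner, metric_list)
    bridge.train(train_data)
    bridge.test(test_data)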

Build the machine learning part
--------------------------------

First, we build the machine learning part, which needs to be wrapped in the ``ABLModel`` class. We can create an instance of ``ABLModel`` either from a scikit-learn model or from a PyTorch-based neural network.

- For a scikit-learn model, we can directly use the model to create an instance of ``ABLModel``. For example, we can customize our machine learning model by

  .. code:: python

      import sklearn.neighbors

      # Load a scikit-learn model
      base_model = sklearn.neighbors.KNeighborsClassifier(n_neighbors=3)

      model = ABLModel(base_model)

- For a PyTorch-based neural network, we first need to encapsulate it within a ``BasicNN`` object, which gives the network a scikit-learn-style interface, and then use this object to create an instance of ``ABLModel``. For example, we can customize our machine learning model by

  .. code:: python

      import torch
      import torchvision

      # Load a PyTorch-based neural network
      cls = torchvision.models.resnet18(pretrained=True)

      # criterion and optimizer are used for training
      criterion = torch.nn.CrossEntropyLoss()
      optimizer = torch.optim.Adam(cls.parameters())

      base_model = BasicNN(cls, criterion, optimizer)
      model = ABLModel(base_model)

In the MNIST Add example, the machine learning model looks like

.. code:: python

    # LeNet5 is a PyTorch network for digit classification,
    # defined elsewhere in the example code
    cls = LeNet5(num_classes=10)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(cls.parameters(), lr=0.001, betas=(0.9, 0.99))

    base_model = BasicNN(
        cls,
        criterion,
        optimizer,
        device=device,
        batch_size=32,
        num_epochs=1,
    )
    model = ABLModel(base_model)

Build the reasoning part
------------------------

Next, we build the reasoning part. In ABL-Package, the reasoning part is wrapped in the ``ReasonerBase`` class. To create an instance of this class, we first need to inherit from ``KBBase`` to customize our knowledge base. The ``__init__`` method of the knowledge base should take at least the argument ``pseudo_label_list``, which is a list of all possible pseudo labels. The ``logic_forward`` method of ``KBBase`` is an abstract method, and we need to implement it in our subclass to give the knowledge base the ability of deduction. In general, we can customize our knowledge base by

.. code:: python

    class MyKB(KBBase):
        def __init__(self, pseudo_label_list):
            super().__init__(pseudo_label_list)

        def logic_forward(self, *args, **kwargs):
            # Deduction implementation...
            return deduction_result

Aside from the knowledge base, instantiating ``ReasonerBase`` also requires an extra argument called ``dist_func``, the consistency measure used to select the best candidate from all abduced candidates. In general, we can instantiate our reasoner by

.. code:: python

    kb = MyKB(pseudo_label_list)
    reasoner = ReasonerBase(kb, dist_func="hamming")

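To see what ``dist_func`` does, consider ``dist_func="hamming"``: among all candidates whose deduction matches the logical result, the reasoner prefers the one that differs from the model's current predictions in the fewest positions. The following standalone sketch only illustrates this selection rule; it is not the actual ``ReasonerBase`` implementation:

.. code:: python

    # Standalone illustration of Hamming-distance-based candidate
    # selection (not the actual ReasonerBase code).
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    pred = [1, 9]                          # the model's pseudo labels
    candidates = [[0, 8], [1, 7], [2, 6]]  # candidates whose deduction is 8
    best = min(candidates, key=lambda c: hamming(c, pred))
    print(best)  # [1, 7]: it differs from pred in only one position
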
In the MNIST Add example, the reasoner looks like

.. code:: python

    class AddKB(KBBase):
        def __init__(self, pseudo_label_list):
            super().__init__(pseudo_label_list)

        # Implement the deduction function
        def logic_forward(self, nums):
            return sum(nums)

    kb = AddKB(pseudo_label_list=list(range(10)))
    reasoner = ReasonerBase(kb, dist_func="confidence")

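With this knowledge base, deduction is simply the sum of the pseudo labels; for instance:

.. code:: python

    kb.logic_forward([1, 7])  # deduces the logical result 8
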
Build datasets and evaluation metrics
-------------------------------------

Next, we need to build datasets and evaluation metrics for training and validation. ABL-Package assumes data to be in the form of ``(X, gt_pseudo_label, Y)``, where ``X`` is the input of the machine learning model, ``gt_pseudo_label`` is the ground-truth label of each element in ``X``, and ``Y`` is the ground truth of the reasoning result. ``X`` should be of type ``List[List[Any]]``, ``gt_pseudo_label`` can be ``None`` or of type ``List[List[Any]]``, and ``Y`` should be of type ``List[Any]``.

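For instance, a toy dataset in this format might look as follows (the values are made up purely to illustrate the expected types):

.. code:: python

    X = [["img_1a", "img_1b"], ["img_2a", "img_2b"]]  # List[List[Any]]: inputs of two examples
    gt_pseudo_label = [[1, 7], [3, 4]]                # List[List[Any]] or None
    Y = [8, 7]                                        # List[Any]: ground-truth reasoning results
    toy_data = (X, gt_pseudo_label, Y)
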
In the MNIST Add example, the data loading looks like

.. code:: python

    # train_data and test_data are tuples consisting of X, gt_pseudo_label, and Y
    train_data = get_mnist_add(train=True, get_pseudo_label=True)
    test_data = get_mnist_add(train=False, get_pseudo_label=True)

To validate and test the model, we need to inherit from ``BaseMetric`` to define metrics, implementing the ``process`` and ``compute_metrics`` methods. The ``process`` method accepts a batch of outputs; after processing this batch, we save the resulting information to the ``self.results`` property. The ``compute_metrics`` method receives all the information saved by ``process`` and uses it to calculate and return a dict that holds the results of the evaluation.

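As a minimal sketch of a custom metric, suppose each processed batch exposes predicted and ground-truth labels; the field names ``pred`` and ``gt`` below are hypothetical, and the actual structure of a batch depends on your pipeline:

.. code:: python

    class MyAccuracyMetric(BaseMetric):
        def process(self, data_samples):
            # Compare predictions against ground truth for one batch and
            # store per-sample outcomes ("pred"/"gt" are hypothetical fields).
            for pred, gt in zip(data_samples["pred"], data_samples["gt"]):
                self.results.append(pred == gt)

        def compute_metrics(self, results):
            # `results` holds everything accumulated by `process`.
            return {"accuracy": sum(results) / len(results)}
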
We provide two basic metrics, namely ``SymbolMetric`` and ``SemanticsMetric``, which are used to evaluate the accuracy of the machine learning model's predictions and the accuracy of the ``logic_forward`` results, respectively.

In the MNIST Add example, the metric definition looks like

.. code:: python

    metric_list = [SymbolMetric(prefix="mnist_add"), SemanticsMetric(kb=kb, prefix="mnist_add")]

Bridge the machine learning and reasoning parts
-----------------------------------------------

We next need to bridge the machine learning and reasoning parts. In ABL-Package, the ``BaseBridge`` class defines the abstract interfaces needed to bridge the two parts, and ``SimpleBridge`` provides a basic implementation.
We build a bridge with the previously defined ``model``, ``reasoner``, and ``metric_list`` as follows:

.. code:: python

    bridge = SimpleBridge(model, reasoner, metric_list)

In the MNIST Add example, the bridge creation looks the same.

Use ``Bridge.train`` and ``Bridge.test`` to train and test
----------------------------------------------------------

``BaseBridge.train`` and ``BaseBridge.test`` trigger the training and testing processes, respectively.

The two methods take the previously prepared ``train_data`` and ``test_data`` as input.

.. code:: python

    bridge.train(train_data)
    bridge.test(test_data)

Aside from data, ``BaseBridge.train`` can also take several other training configurations, as shown below:

.. code:: python

    bridge.train(
        # training data
        train_data,
        # number of Abductive Learning loops
        loops=5,
        # data will be divided into segments of this size, and each segment
        # will be used to train the model iteratively
        segment_size=10000,
        # evaluate the model every eval_interval loops
        eval_interval=1,
        # save the model every save_interval loops
        save_interval=1,
        # directory in which to save the model
        save_dir='./save_dir',
    )

In the MNIST Add example, the code to train and test looks like

.. code:: python

    bridge.train(train_data, loops=5, segment_size=10000, save_interval=1, save_dir=weights_dir)
    bridge.test(test_data)