.. _ensemble:

Ensemble
========

We currently support voting and stacking methods.

Voting
------

A voter essentially constructs a weighted sum of the predictions of base learners. Given an evaluation metric, the weights of the base learners are chosen to maximize the validation score.

We adopt Rich Caruana's method for weight specification. This method first finds a collection of (possibly redundant) base learners with equal weights via a greedy search, then sets each base learner's weight in the voter to its number of occurrences in that collection.

You can customize your own weight specification method by overriding the ``_specify_weights`` method.
.. code-block:: python

    # An example: use equal weights for all base learners.
    class EqualWeightVoting(Voting):
        def _specify_weights(self, predictions, label, feval):
            # allocate the same weight to each base learner
            return np.ones(self.n_models) / self.n_models
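The greedy weight search described above can be sketched as follows. This is a minimal illustration of Caruana's selection idea, not AutoGL's implementation; ``score_fn`` and ``n_rounds`` are assumed names introduced for this sketch:

```python
import numpy as np

def greedy_ensemble_selection(predictions, label, score_fn, n_rounds=20):
    """Caruana-style greedy selection: repeatedly add (with replacement) the
    base learner that most improves the ensemble's validation score, then
    turn occurrence counts into voter weights."""
    n_models = len(predictions)
    counts = np.zeros(n_models, dtype=int)             # occurrences per learner
    running_sum = np.zeros_like(predictions[0], dtype=float)
    for _ in range(n_rounds):
        best_i, best_score = 0, -np.inf
        for i in range(n_models):
            # average prediction if learner i were added one more time
            candidate = (running_sum + predictions[i]) / (counts.sum() + 1)
            score = score_fn(candidate, label)
            if score > best_score:
                best_i, best_score = i, score
        counts[best_i] += 1
        running_sum += predictions[best_i]
    return counts / counts.sum()                       # final voter weights
```

Because learners are added with replacement, a strong learner can be picked many times, which is exactly what makes the resulting occurrence counts usable as weights.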
Stacking
--------

A stacker trains a meta-model with the predictions of base learners as input to find an optimal combination of these base learners.

Currently we support generalized linear models (GLM) and gradient boosting models (GBM) as the meta-model.
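As a minimal illustration of the stacking idea (not AutoGL's meta-model API), a linear meta-model can be fit on the base learners' validation predictions by least squares; ``fit_linear_stacker`` and ``stack_predict`` are hypothetical helpers introduced for this sketch:

```python
import numpy as np

def fit_linear_stacker(predictions, label):
    """Fit a least-squares linear meta-model on stacked base-learner
    predictions (each entry of `predictions` is a 1-D validation vector)."""
    X = np.column_stack(predictions + [np.ones(len(label))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, label, rcond=None)
    return coef

def stack_predict(coef, predictions):
    """Combine new base-learner predictions with the learned meta-weights."""
    X = np.column_stack(predictions + [np.ones(len(predictions[0]))])
    return X @ coef
```

A GBM meta-model follows the same pattern: the stacked prediction matrix ``X`` is the training input, and the validation labels are the target.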
Create a New Ensembler
----------------------

You can create your own ensembler by inheriting from the base ensembler and overriding the ``fit`` and ``ensemble`` methods.
.. code-block:: python

    # An example: use the currently available best model.
    from autogl.module.ensemble.base import BaseEnsembler
    import numpy as np

    class BestModel(BaseEnsembler):
        def fit(self, predictions, label, identifiers, feval):
            if not isinstance(feval, list):
                feval = [feval]
            sign = 1 if feval[0].is_higher_better else -1
            scores = np.array([feval[0].evaluate(pred, label) for pred in predictions]) * sign
            # record the validation score of each base learner
            self.scores = dict(zip(identifiers, scores))
            ensemble_pred = predictions[np.argmax(scores)]
            return [fx.evaluate(ensemble_pred, label) for fx in feval]

        def ensemble(self, predictions, identifiers):
            # choose the currently best model among the given identifiers
            best_idx = np.argmax([self.scores[model_name] for model_name in identifiers])
            return predictions[best_idx]
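To show how such an ensembler is used, here is a self-contained usage sketch. ``Acc`` is a hypothetical evaluator standing in for AutoGL's ``feval`` objects, and ``BestModelDemo`` repeats the logic above without the AutoGL base class only so the snippet runs on its own:

```python
import numpy as np

class Acc:
    """Hypothetical evaluator: accuracy over class-probability predictions."""
    is_higher_better = True

    @staticmethod
    def evaluate(pred, label):
        return float(np.mean(np.argmax(pred, axis=1) == label))

class BestModelDemo:
    """Same logic as the BestModel example above, minus the AutoGL base class."""
    def fit(self, predictions, label, identifiers, feval):
        if not isinstance(feval, list):
            feval = [feval]
        sign = 1 if feval[0].is_higher_better else -1
        scores = np.array([feval[0].evaluate(p, label) for p in predictions]) * sign
        self.scores = dict(zip(identifiers, scores))  # validation score per learner
        best = predictions[np.argmax(scores)]
        return [fx.evaluate(best, label) for fx in feval]

    def ensemble(self, predictions, identifiers):
        best_idx = np.argmax([self.scores[name] for name in identifiers])
        return predictions[best_idx]

label = np.array([0, 1, 1])
preds = [
    np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]),  # 2 of 3 correct
    np.array([[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]),  # 3 of 3 correct
]
ens = BestModelDemo()
val_scores = ens.fit(preds, label, identifiers=["model_a", "model_b"], feval=Acc)
final_pred = ens.ensemble(preds, identifiers=["model_a", "model_b"])
```

Note that ``fit`` records per-learner scores keyed by identifier, so ``ensemble`` can later be called with any subset or reordering of the identifiers seen during fitting.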