.. _degradation-overview:

Detecting Performance Changes
=============================

For every new minor version of project (or every project release), developers should usually
generate new batch of performance profiles with the same concrete configuration of resource
collection (i.e. the set of collectors and postprocessors run on the same commands).These profiles
are then assigned to the minor version to preserve the history of the project performance. However,
every change of the project, and every new minor version, can cause a performance degradation of
the project. And manual evaluation whether the degradation has happened is hard.

Perun allows one to automatically check the performance degradation between various minor versions
within the history and protect the project against potential degradation introduced by new minor
versions. One can employ multiple strategies for different configurations of profiles, each
suitable for concrete types of degradation or performance bugs. Potential changes of performance
are then reported for pairs of profiles, together with more precise information, such as the
location, the rate or the confidence of the detected change. These information then help developer
to evaluate whether the detected changes are real or spurious. The spurious warnings can naturally
happen, since the collection of data is based on dynamic analysis and real runs of the program; and
both of them can be influenced heavily by environment or other various aspects, such as higher
processor utilization.

The detection of performance change is always checked between two profiles with the same
configuration (i.e collected by same collectors, postprocessed using same postprocessors, and
collected for the same combination of command, arguments and workload). These profiles correspond
to some minor version (so called target) and its parents (so called baseline). But baseline
profiles do not have to be necessarily the direct predecessor (i.e. the old head) of the target
minor version, and can be found deeper in the version hierarchy (e.g. the root of the project or
minor version from two days ago, etc.). During the check of degradation of one profile
corresponding to the target, we find the nearest baseline profile in the history. Then for one pair
of target and baseline profiles we can use multiple methods and these methods can then report
multiple performance changes (such as optimizations and degradations).

.. image:: /../figs/diff-analysis.*
    :align: center
    :width: 100%

.. _degradation-output:

Results of Detection
--------------------

Between the pair of target and baseline profile one can use multiple methods, each suitable for
specific type of change. Each such method can then yield multiple reports about detected
performance changes (however, some of these can be spurious). Each degradation report can contain
the following details:

  1. **Type of the change**---the overall general classification of the performance change, which
     can be one of the following six values representing both certain and uncertain answers:

     ``No Change``:

       Represents that the performance of the given uniquely identified resource group was not
       changed in any way and it stayed the same (within some bound of error). By default these
       changes are not reported in the standard output, but can be made visible by increasing the
       verbosity of the command line interface (see :doc:`cli` how to increase the verbosity of the
       output).

     ``Total Degradation`` or ``Total Optimization``:

       Represents an overall program degradation or optimization. The overall degradation or
       optimization report may actually be further divided into per-binary or per-file reports
       (e.g., a standalone report for ``mybin`` and its library ``mylib`` as done by
       :ref:`degradation-method-eto`).

     ``Not in Baseline`` or ``Not in Target``:

       Represents a performance change caused by new or deleted resources, e.g., functions that
       are newly introduced (resp newly missing) in the new project version. Reporting these
       changes is useful since even a simple function refactoring may introduce serious performance
       slowdown or speedup.

     ``Severe Degradation`` or ``Severe Optimization``:

       Represents that the performance of resource group has severely degraded (resp optimized),
       i.e., got severely worse (resp better) with a high confidence. Each report also usually
       shows the confidence of this report, e.g. by the value of coefficient of determination (see
       :ref:`postprocessors-regression-analysis`), which quantifies how the prediction or
       regression models of both versions were fitting the data.

     ``Degradation`` or ``Optimization``:

       Represents that the performance of resource group has degraded (resp optimized), i.e. got
       worse (resp got better) with a fairly high confidence. Each report also usually shows the
       confidence of this report, e.g. by the value of coefficient of determination (see
       :ref:`postprocessors-regression-analysis`), which quantifies how the prediction or
       regression models of both versions were fitting the data.


     ``Maybe Degradation`` or ``Maybe Optimization``:

       Represents detected performance change which is either unverified or with a low confidence
       (so the change can be either false positive or false negative). This classification of
       changes allows methods to provide more broader evaluation of performance change.

     ``Unknown``:

      Represents that the given method could not determine anything at all.

  2. **Subtype of the change**---the description of the type of the change in more details, such as
     that the change was in `complexity order` (e.g. the performance model degraded from linear
     model to power model) or `ratio` (e.g. the average speed degraded two times)

  3. **Confidence**---an indication how likely the degradation is real and not spurious or caused
     by badly collected data. The actual form of confidence is dependent on the underlying
     detection method. E.g. for methods based on :ref:`postprocessors-regression-analysis` this
     can correspond to the coefficient of determination which shows the fitness of the function
     models to the actually measured values.

  4. **Location**---the unique identification of the group of resources, such as the name of the
     function, the precise chunk of the code or line in code.

If the underlying method does not detect any change between two profiles, by default nothing is
reported at all. However, this behaviour can be changed by increasing the verbosity of the output
(see :doc:`cli` how to increase the verbosity of the output)

.. _degradation-methods:

Detection Methods
-----------------

Currently we support three simple strategies for detection of the performance changes:

  1. :ref:`degradation-method-bmoe` which is based on results of
     :ref:`postprocessors-regression-analysis` and only checks for each uniquely identified group
     of resources, whether the best performance (or prediction) model has changed (considering
     lexicographic ordering of model types), e.g. that the best model changed from `linear` to
     `quadratic`.

  2. :ref:`degradation-method-aat` which computes averages as a representation of the performance
     for each uniquely identified group of resources. Each average of the target is then compared
     with the average of the baseline and if the their ration exceeds a certain threshold interval,
     the method reports the change.

  3. :ref:`degradation-method-eto` which identifies outliers within the function exclusive time
     deltas. The outliers are identified using three different statistical techniques, resulting in
     three different change severity categories based on which technique discovered the outlier.

Refer to :ref:`degradation-custom` to create your own detection method.

.. _degradation-method-bmoe:

Best Model Order Equality
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.check.methods.best_model_order_equality

.. _degradation-method-aat:

Average Amount Threshold
~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.check.methods.average_amount_threshold

.. _degradation-method-eto:

Exclusive Time Outliers
~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.check.methods.exclusive_time_outliers

.. _degradation-fast-check:

Fast Check
~~~~~~~~~~

.. automodule:: perun.check.methods.fast_check

.. _degradation-lreg:

Linear Regression
~~~~~~~~~~~~~~~~~

.. automodule:: perun.check.methods.linear_regression

.. _degradation-preg:

Polynomial Regression
~~~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.check.methods.polynomial_regression

.. _degradation-config:

Configuring Degradation Detection
---------------------------------

We apply concrete methods of performance change detection to concrete pairs of profiles according
to the specified `rules` based on profile collection configuration. By `configuration` we mean the
tuple of `(command, arguments, workload, collector, postprocessors)` which represent how the data
were collected for the given minor version. This way for each new version of project, it is
meaningful to collect new data using the same config and then compare the results. The actual rules
are specified in configuration files by :ckey:`degradation.strategies`. The strategies are
specified as an ordered list, and all of the applicable rules are collected through all of the
configurations (starting from the runtime configuration, through local ones, up to the global
configuration). This yields a `list of rules` (each rule represented as key-value dictionary)
ordered by the priority of their application. So for each pair of tested profiles, we iterate
through this ordered list and find either the first that is applicable according to the set rules
(by setting the :ckey:`degradation.apply` key to value ``first``) or all applicable rules (by
setting the :ckey:`degradation.apply` key to value ``all``).

The example of configuration snippet that sets rules and strategies for one project can be as
follows:

  .. code-block:: yaml

      degradation:
        apply: first
        strategies:
          - type: mixed
            postprocessor: regression_analysis
            method: bmoe
          - cmd: mybin
            type: memory
            method: bmoe
          - method: aat

The following list of strategies will first try to apply the :ref:`degradation-method-bmoe` method
to either mixed profiles postprocessed by :ref:`postprocessors-regression-analysis` or to memory
profiles collected from command ``mybin``. All of the other profiles will be checked using
:ref:`degradation-method-aat`. Note that applied methods can either be specified by their full name
or using the short strings by taking the first letters of each word of the name of the method, so
e.g. `BMOE` stands for :ref:`degradation-method-bmoe`.

.. _degradation-custom:

Create Your Own Degradation Checker
-----------------------------------

New performance change checkers can be registered within Perun in several steps. The checkers have
just small requirements and have to `yield` the reports about degradation as a instances of
``DegradationInfo`` objects specified as follows:

.. currentmodule: perun.utils.structs
.. autoclass::  perun.utils.structs.common_structs.DegradationInfo
   :members:

.. autoclass::  perun.utils.structs.common_structs.PerformanceChange
   :members:

You can register your new performance change checker as follows:

    1. Run ``perun utils create check my_degradation_checker`` to generate a new modules in
       ``perun.check.methods`` directory with the following structure. The command takes a predefined
       templates for new degradation checkers and creates ``my_degradation_checker.py`` according
       to the supplied command line arguments (see :ref:`cli-utils-ref` for more information about
       interface of ``perun utils create`` command)::

        /perun
        |-- /check
            |-- __init__.py
            |-- average_amount_threshold.py
            |-- my_degradation_checker.py

    2. Implement the ``my_degradation_checker.py`` file, including the module docstring with brief
       description of the change check with the following structure:

      .. literalinclude:: /_static/templates/degradation_api.py
          :language: python
          :linenos:

    3. Next, in the ``__init__.py`` module register the short string for your new method as
       follows:

      .. literalinclude:: /_static/templates/degradation_init_new_check.py
          :language: python
          :linenos:
          :diff: /_static/templates/degradation_init.py

    4. Preferably, verify that registering did not break anything in the Perun and if you are not
       using developer instalation, then reinstall Perun::

        make test
        make install

    5. At this point you can start using your check using ``perun check head``, ``perun check
       all`` or ``perun check profiles``.

    6. If you think your collector could help others, please, consider making `Pull Request`_.

.. _Pull Request: https://github.com/Perfexionists/perun/pull/new/develop

.. _degradation-cli:

Degradation CLI
---------------

:doc:`cli` contains group of two commands for running the checks in the current project---``perun
check head`` (for running the check for one minor version of the project; e.g. the current `head`)
and ``perun check all`` for iterative application of the degradation check for all minor versions
of the project. The first command is mostly meant to run as a hook after each new commit (obviously
after successfull run o f``perun run matrix`` generating the new batch of profiles), while the
latter is meant to be used for new projects, after crawling through the whole history of the
project and collecting the profiles. Additionally ``perun check profiles`` can be used for an
isolate comparison of two standalone profiles (either registered in index or as a standalone file).

.. click:: perun.cli_groups.check_cli:check_head
   :prog: perun check head

.. click:: perun.cli_groups.check_cli:check_all
   :prog: perun check all

.. click:: perun.cli_groups.check_cli:check_profiles
   :prog: perun check profiles