.. _postprocessors-overview:

Postprocessors Overview
=======================

.. automodule:: perun.postprocess

.. _postprocessors-list:

Supported Postprocessors
------------------------

Perun's tool suite currently contains the following four postprocessors:

1. :ref:`postprocessors-regression-analysis` (authored by **Jirka Pavela**) performs regression analysis by finding a model that best fits a dependent variable based on an independent one. Currently the postprocessor focuses on finding a well-suited model (linear, quadratic, logarithmic, etc.) for the duration of a function depending on the size of the data structure it operates on.

2. :ref:`postprocessors-regressogram` (authored by **Simon Stupinsky**), also known as the binning approach, is the simplest non-parametric estimator. This method fits models through the data by dividing the interval into N equal-width buckets, where the resulting value in each bucket equals the result of a selected statistical aggregation function (mean/median) applied to the values falling into that bucket. In short, the regressogram can be described as a step function (i.e. a piecewise constant function).

3. :ref:`postprocessors-moving-average` (authored by **Simon Stupinsky**), also known as the rolling average or running average, is a statistical analysis belonging to the non-parametric approaches. This method analyses the given data points by creating a series of values derived from a specific aggregation function, most often the average or possibly the median, applied to different subsets of the full data set. We currently support the two main methods of this approach: the **Simple** Moving Average and the **Exponential** Moving Average. The first method offers a choice between two aggregation functions: **mean** or **median**.

4. :ref:`postprocessors-kernel-regression` (authored by **Simon Stupinsky**) is a non-parametric approach to estimating the conditional expectation of a random variable. Generally, the main goal of this approach is to find a non-parametric relation between a pair of random variables X and Y. Unlike parametric techniques (e.g. linear regression), kernel regression does not assume any underlying distribution (e.g. linear, exponential, etc.) when estimating the regression function. The main idea of kernel regression is to place a **kernel**, which plays the role of a weighting function, at each observation point in the dataset. The kernel then assigns a weight to every point depending on its distance from the current data point. The kernel formula depends only on the *bandwidth* between the current ('local') data point X and a set of neighbouring data points X.

All of the listed postprocessors can be run from the command line. For more information about the command line interface of individual postprocessors refer to :ref:`cli-postprocess-units-ref`.

Postprocessor modules are implementation independent and only require a simple Python interface registered within Perun. For a brief tutorial on how to create and register your own postprocessors refer to :ref:`postprocessors-custom`.

.. _postprocessors-regression-analysis:

Regression Analysis
~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.postprocess.regression_analysis
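As an illustration of the model fitting described above, the following is a minimal sketch (not the actual implementation; the data, the use of ``numpy`` and the restriction to three model types are assumptions) that fits models of the form :math:`y = b_0 + b_1 \cdot f(x)` by least squares and compares them by the coefficient of determination :math:`R^2`:

.. code-block:: python

    import numpy as np

    # Hypothetical measurements: structure size (x) and measured durations (y).
    x = np.array([10, 20, 40, 80, 160], dtype=float)
    y = np.array([1.1, 2.3, 3.9, 8.2, 15.8])

    # Candidate transformations f(x) for models of the form y = b0 + b1 * f(x).
    models = {
        "linear": lambda v: v,
        "logarithmic": np.log,
        "quadratic": lambda v: v ** 2,
    }

    for name, transform in models.items():
        fx = transform(x)
        # Least-squares fit of y = b0 + b1 * f(x).
        b1, b0 = np.polyfit(fx, y, 1)
        predicted = b0 + b1 * fx
        # Coefficient of determination R^2 measures how well the model fits.
        ss_res = np.sum((y - predicted) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        print(f"{name}: b0={b0:.3f}, b1={b1:.3f}, R^2={1 - ss_res / ss_tot:.3f}")

The actual postprocessor supports further model types and computation strategies, such as the `initial guess` strategy shown in the examples below.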
.. _postprocessors-regression-analysis-cli:

Command Line Interface
""""""""""""""""""""""

.. click:: perun.postprocess.regression_analysis.run:regression_analysis
    :prog: perun postprocessby regression_analysis

.. _postprocessors-regression-analysis-examples:

Examples
""""""""

.. literalinclude:: /../examples/complexity-sll-short-with-ra.perf
    :language: json
    :linenos:
    :emphasize-lines: 44-153

The profile above shows the complexity profile taken from :ref:`collectors-trace-examples` and postprocessed using the full method. The highlighted part shows all of the fully computed models of the form :math:`y = b_0 + b_1 \cdot f(x)`, represented by their types (e.g. `linear`, `quadratic`, etc.), the concrete coefficients :math:`b_0` and :math:`b_1` that were found, and the coefficient of determination :math:`R^2` that measures how well the model fits.

.. image:: /../examples/complexity-scatter-with-models-full.*

The :ref:`views-scatter` above shows the interpreted models of the complexity example, computed using the **full computation** method. In the picture, one can see that the dependency of the running time on the structure size is best fitted by `linear` models.

.. image:: /../examples/complexity-scatter-with-models-initial-guess.*

The next `scatter plot` displays the same data as the previous one, but regressed using the `initial guess` strategy. This strategy first computes all of the models on a small sample of data points. Such a computation yields an initial estimate of how well each model fits (the initial sample is selected at random). The best fitting model is then chosen and fully computed on the rest of the data points. The picture shows only one model, namely the `linear` one, which was fully computed to best fit the given data points. The rest of the models had worse estimates and hence were not computed at all.

.. _postprocessors-regressogram:

Regressogram method
~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.postprocess.regressogram

.. _postprocessors-regressogram-cli:

Command Line Interface
""""""""""""""""""""""

.. click:: perun.postprocess.regressogram.run:regressogram
    :prog: perun postprocessby regressogram

.. _postprocessors-regressogram-examples:

Examples
""""""""

.. code-block:: json

    {
        "bucket_stats": [
            13.0,
            25.5
        ],
        "uid": "linear::test2",
        "bucket_method": "doane",
        "method": "regressogram",
        "r_square": 0.7575757575757576,
        "x_end": 9.0,
        "statistic_function": "mean",
        "x_start": 0.0
    }

.. _Doanes: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges

The listing above shows an excerpt of a profile post-processed by the regressogram method. Each such model shows the values computed in the individual buckets, represented by *bucket_stats*. The next value in this example is *statistic_function*, which represents the statistic used to compute the value in each bucket. The model further contains the name of the method (*bucket_method*) used to calculate the optimal number of buckets, in this case computed with the Doanes_ formula, and the *coefficient of determination* (:math:`R^2`) for measuring how well the model fits. Each such model can be used in the further interpretation of the profile (either by :ref:`views-scatter` or :ref:`degradation-method-aat`).

.. image:: /../examples/exp_data_regressogram.*

The :ref:`views-scatter` above shows the interpreted model, computed using the **regressogram** method. In the picture, one can see that the dependency of the running time on the structure size is best fitted by `exponential` models.
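As an illustration of the binning idea, the following is a minimal sketch (not the Perun implementation; the data and the fixed number of buckets are assumptions) that computes a regressogram with the mean as the aggregation function:

.. code-block:: python

    import numpy as np

    # Hypothetical measurements: structure size (x) and measured durations (y).
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
    y = np.array([1.0, 1.5, 2.5, 4.0, 6.5, 10.0, 16.0, 25.0, 40.0, 64.0])

    # A fixed number of equal-width buckets is assumed here; the postprocessor
    # can also derive the number of buckets, e.g. with the Doane formula.
    buckets = 4
    edges = np.linspace(x.min(), x.max(), buckets + 1)

    # Assign each x to a bucket and aggregate the y values falling into it.
    indices = np.clip(np.digitize(x, edges) - 1, 0, buckets - 1)
    bucket_stats = [y[indices == b].mean() for b in range(buckets)]
    print(bucket_stats)  # one constant value per bucket, i.e. a step function

Replacing ``mean`` with ``median`` yields the other supported aggregation function.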
.. _postprocessors-moving-average:

Moving Average Methods
~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.postprocess.moving_average

.. _postprocessors-moving-average-cli:

Command Line Interface
""""""""""""""""""""""

.. click:: perun.postprocess.moving_average.run:moving_average
    :prog: perun postprocessby moving_average

.. click:: perun.postprocess.moving_average.run:simple_moving_average
    :prog: perun postprocessby moving_average sma

.. click:: perun.postprocess.moving_average.run:simple_moving_median
    :prog: perun postprocessby moving_average smm

.. click:: perun.postprocess.moving_average.run:exponential_moving_average
    :prog: perun postprocessby moving_average ema

.. _postprocessors-moving-average-examples:

Examples
""""""""

.. code-block:: json

    {
        "bucket_stats": [
            0.0,
            3.0,
            24.0,
            81.0,
            192.0,
            375.0
        ],
        "per_key": "structure-unit-size",
        "uid": "pow::test3",
        "x_end": 5,
        "r_square": 1.0,
        "method": "moving_average",
        "moving_method": "sma",
        "x_start": 0,
        "window_width": 1
    }

The listing above shows an excerpt of a profile post-processed by the moving average postprocessor. Each moving average model shows the computed values, represented by *bucket_stats*. An important role is played by the *moving_method* value, which represents the method used to create the model; it contains one of the shortcuts *SMA*, *SMM*, or *EMA*, corresponding to the methods described above. The *r_square* value serves to assess the suitability of the model and represents the *coefficient of determination* (:math:`R^2`). Another significant value in the context of moving average models is *window_width*, which represents the width of the window used when creating the model. Each model can be used in the further interpretation (either by :ref:`views-scatter` or :ref:`degradation-method-aat`); the remaining values are auxiliary and serve various purposes during that interpretation. Additional values containing the information about the postprocessing parameters can be found in the whole profile, specifically in the part about the used post-processors.

.. image:: /../examples/exp_data_ema.*

The :ref:`views-scatter` above shows the interpreted model, computed using the **exponential moving average** method with default parameter values. In the picture, one can see that the dependency of the running time on the structure size is best fitted by `exponential` models.
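To illustrate the difference between the two supported variants, the following is a minimal sketch (not the Perun implementation; the data, the use of ``pandas``, the window width and the smoothing factor are assumptions) that computes both a simple and an exponential moving average:

.. code-block:: python

    import pandas as pd

    # Hypothetical measurements of durations ordered by structure size.
    y = pd.Series([0.0, 3.0, 24.0, 81.0, 192.0, 375.0])

    # Simple moving average: mean over a sliding window
    # (the median variant works analogously).
    sma = y.rolling(window=3, min_periods=1, center=True).mean()

    # Exponential moving average: closer points get exponentially larger weights.
    ema = y.ewm(alpha=0.5).mean()

    print(sma.tolist())
    print(ema.tolist())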
.. _postprocessors-kernel-regression:

Kernel Regression Methods
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: perun.postprocess.kernel_regression

.. _postprocessors-kernel_regression-cli:

Command Line Interface
""""""""""""""""""""""

.. click:: perun.postprocess.kernel_regression.run:kernel_regression
    :prog: perun postprocessby kernel-regression

.. _postprocessors-kernel-regression-estimator_settings:

.. click:: perun.postprocess.kernel_regression.run:estimator_settings
    :prog: perun postprocessby kernel-regression estimator-settings

.. _postprocessors-kernel-regression-user_selection:

.. click:: perun.postprocess.kernel_regression.run:user_selection
    :prog: perun postprocessby kernel-regression user-selection

.. _postprocessors-kernel-regression-method_selection:

.. click:: perun.postprocess.kernel_regression.run:method_selection
    :prog: perun postprocessby kernel-regression method-selection

.. _postprocessors-kernel-regression-kernel_smoothing:

.. click:: perun.postprocess.kernel_regression.run:kernel_smoothing
    :prog: perun postprocessby kernel-regression kernel-smoothing

.. _postprocessors-kernel-regression-kernel_ridge:

.. click:: perun.postprocess.kernel_regression.run:kernel_ridge
    :prog: perun postprocessby kernel-regression kernel-ridge

.. _postprocessors-kernel-regression-examples:

Examples
""""""""

.. code-block:: json

    {
        "per_key": "structure-unit-size",
        "uid": "quad::test1",
        "kernel_mode": "estimator",
        "r_square": 0.9990518378010778,
        "method": "kernel_regression",
        "x_start": 10,
        "bandwidth": 2.672754640321602,
        "x_end": 64,
        "kernel_stats": [
            115.6085941489687,
            155.95838478107163,
            190.27598428091824,
            219.36576520977312,
            252.80699243117965,
            268.4600214673941,
            283.3744716372719,
            282.7535719770607,
            276.27153279181573,
            269.69580474542016,
            244.451017529157,
            226.98819185034756,
            180.72465187812492
        ]
    }

The listing above shows an excerpt of a profile post-processed by *kernel regression*. Each kernel model shows the values of the resulting kernel estimate, stored in the *kernel_stats* list. Another relevant value is stored in the *kernel_mode* field and denotes the mode in which the *kernel regression* was executed over this model; it contains one of the names of the individual modes of the kernel regression postprocessor. The *r_square* value serves to assess the suitability of the kernel model and represents the *coefficient of determination* (:math:`R^2`). The *bandwidth* field represents the kernel bandwidth of the current kernel model and is important when comparing kernel estimates, since it decreases or increases the resulting accuracy. Each model can be used in the further interpretation (either by :ref:`views-scatter` or :ref:`degradation-method-aat`); the remaining values are auxiliary and serve various purposes during that interpretation. Additional values containing the information about the parameters selected for the kernel regression postprocessor and its modes can be found in the whole profile, specifically in the part about the used post-processors.

.. image:: /../examples/example_kernel.*

The :ref:`views-scatter` above shows the interpreted model, computed using the *kernel regression* postprocessor, concretely with default parameter values in the **estimator-settings** mode. The picture shows the dependency of the running time on the structure size.
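As an illustration of the kernel idea (placing a weighting function at each observation and averaging its neighbourhood), the following is a minimal sketch (not the Perun implementation; the data, the Gaussian kernel and the fixed bandwidth are assumptions) that computes a Nadaraya-Watson kernel estimate:

.. code-block:: python

    import numpy as np

    # Hypothetical measurements: structure size (x) and measured durations (y).
    x = np.array([10, 14, 19, 23, 28, 32, 37, 41, 46, 50, 55, 59, 64], dtype=float)
    y = np.array([115.0, 156.0, 190.0, 219.0, 253.0, 268.0, 283.0, 283.0,
                  276.0, 270.0, 244.0, 227.0, 181.0])

    # A fixed bandwidth is assumed for this sketch.
    bandwidth = 2.7

    def kernel_estimate(x_query):
        """Kernel-weighted average of the observed y values (Nadaraya-Watson)."""
        # Gaussian kernel weights based on the distance from every observation.
        distances = (x_query[:, None] - x[None, :]) / bandwidth
        weights = np.exp(-0.5 * distances ** 2)
        return (weights * y).sum(axis=1) / weights.sum(axis=1)

    print(kernel_estimate(x))  # smoothed values at the observed points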
.. _postprocessors-custom:

Creating your own Postprocessor
-------------------------------

New postprocessors can be registered within Perun in several steps. Internally they can be implemented in any programming language; in order to work with Perun, one to three phases have to be specified as given in :ref:`postprocessors-overview`---``before()``, ``postprocess()`` and ``after()``. Each new postprocessor requires an interface module ``run.py``, which contains these three functions and, moreover, a CLI function for the Click_ framework. You can register your new postprocessor as follows:

1. Run ``perun utils create postprocess mypostprocessor`` to generate a new module in the ``perun/postprocess`` directory with the following structure. The command uses predefined templates for new postprocessors and creates ``__init__.py`` and ``run.py`` according to the supplied command line arguments (see :ref:`cli-utils-ref` for more information about the interface of the ``perun utils create`` command)::

    /perun
    |-- /postprocess
        |-- /mypostprocessor
            |-- __init__.py
            |-- run.py
        |-- /kernel_regression
        |-- /moving_average
        |-- /regression_analysis
        |-- /regressogram
        |-- __init__.py

2. First, implement the ``__init__.py`` file, including the module docstring with a brief postprocessor description and definitions of constants that are used for internal checks; it has the following structure:

   .. literalinclude:: /_static/templates/postprocess_init.py
      :language: python
      :linenos:

3. Next, implement the ``run.py`` module with the ``postprocess()`` function (and optionally with the ``before()`` and ``after()`` functions). The ``postprocess()`` function should do the actual postprocessing of the profile. Each function should return the integer status of the phase, the status message (used in case of an error) and a dictionary including params passed to the additional phases, with the key 'profile' holding the dictionary w.r.t. :ref:`profile-spec` (see also the illustrative sketch below this list).

   .. literalinclude:: /_static/templates/postprocess_run.py
      :language: python
      :linenos:

4. Additionally, implement the command line interface function in ``run.py``, named the same as your postprocessor. This function will be called from the command line as ``perun postprocessby mypostprocessor`` and is based on the Click_ library.

   .. literalinclude:: _static/templates/postprocess_run_api.py
      :language: python
      :linenos:
      :diff: _static/templates/postprocess_run.py

5. Finally register your newly created module in ``get_supported_module_names`` located in ``perun.utils.common.cli_kit.py``:

   .. literalinclude:: _static/templates/supported_module_names_postprocess.py
      :language: python
      :linenos:
      :diff: _static/templates/supported_module_names.py

6. Preferably, verify that the registration did not break anything in Perun and, if you are not using the developer installation, reinstall Perun::

    make test
    make install

7. At this point you can start using your postprocessor either through ``perun postprocessby`` or by using the following commands to set up the job matrix and run the batch collection of profiles::

    perun config --edit
    perun run matrix

8. If you think your postprocessor could help others, please consider making a `Pull Request`_.

.. _Pull Request: https://github.com/Perfexionists/perun/pull/new/develop

.. _Click: https://click.palletsprojects.com/en/latest/
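To complement the templates referenced above, the following is a minimal, illustrative sketch of what the ``postprocess()`` function of the hypothetical ``mypostprocessor`` could look like (the status constant, the assumed profile layout and the rounding of amounts are illustrative assumptions; the generated ``run.py`` template is the authoritative starting point):

.. code-block:: python

    # Hypothetical status code; Perun defines its own notion of a phase status.
    OK_STATUS = 0


    def postprocess(profile, **configuration):
        """Sketch of the postprocess phase of the hypothetical 'mypostprocessor'.

        Returns the integer status of the phase, the status message (used in
        case of an error) and a dictionary of params passed to the additional
        phases, including the resulting 'profile'.
        """
        # Assumed profile layout: a flat list of resources with an 'amount' key;
        # consult the profile specification for the authoritative format.
        for resource in profile.get("resources", []):
            resource["amount"] = round(resource["amount"], 2)
        return OK_STATUS, "", {"profile": profile}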