accasim.utils.plot_factory module

MIT License

Copyright (c) 2017 cgalleguillosm, AlessioNetti

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

class accasim.utils.plot_factory.PlotFactory(plot_class, sim_params_fname=None, config=None, resource=None, workload_parser=None, debug=False)

Bases: object

A class for plot production and schedule files pre-processing.

In this class, some basic algorithms are implemented for pre-processing the schedule files produced through simulation, and for producing some common evaluation plots.

BENCHMARK_CLASS = 'benchmark'
EFFICIENCY_PLOT = 'efficiency'
LOAD_RATIO_PLOT = 'load_ratio'
PLOT_TYPES = {'benchmark': ['scalability', 'sim_time', 'sim_memory'], 'schedule': ['slowdown', 'queue_size', 'load_ratio', 'efficiency']}
QUEUE_SIZE_PLOT = 'queue_size'
SCALABILITY_PLOT = 'scalability'
SCHEDULE_CLASS = 'schedule'
SIMULAION_MEMORY_PLOT = 'sim_memory'
SIMULATION_TIME_PLOT = 'sim_time'
SLOWDOWN_PLOT = 'slowdown'
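
The constants above pair each plot class with the plot types it supports. As a minimal sketch, a plot type could be checked against its class as follows. The helper name `plot_class_for` is hypothetical and not part of accasim; the dictionary is copied from the documented value of `PLOT_TYPES`.

```python
# Copied from the documented value of PlotFactory.PLOT_TYPES.
PLOT_TYPES = {
    'benchmark': ['scalability', 'sim_time', 'sim_memory'],
    'schedule': ['slowdown', 'queue_size', 'load_ratio', 'efficiency'],
}


def plot_class_for(plot_type):
    """Hypothetical helper: return the plot class that lists plot_type."""
    for plot_class, plot_types in PLOT_TYPES.items():
        if plot_type in plot_types:
            return plot_class
    raise ValueError('unknown plot type: {!r}'.format(plot_type))
```

A check of this kind makes it easy to see why, for example, a 'slowdown' plot only makes sense on files of the 'schedule' class.
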
box_plot(data, title='', ylabel='', scale='linear', figsize=(7, 5), meansonly=False, output='Output.pdf', groups=1, **kwargs)

Produces a box-and-whiskers plot for the input data’s distributions.

Parameters:
  • data – the input data; must be a list, in which each element is again a list containing all of the data regarding a certain test instance; the ordering must be that of the labels;
  • title – the title of the plot;
  • ylabel – the Y-axis label;
  • scale – the scale of the plot;
  • figsize – the size of the figure, is a tuple;
  • meansonly – if True only the mean values for each distribution are depicted;
  • output – the path to the output file;
  • **kwargs – optional keyword arguments:
    • fig_format – a dictionary of the form {'format': 'eps' or 'pdf', 'dpi': an integer};
    • xlim – the left-right axis boundaries, is a tuple;
    • ylim – the bottom-top axis boundaries, is a tuple;
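
The nested-list layout expected for data can be illustrated with a small sketch. The labels and sample values below are made up, and `distribution_means` is a hypothetical helper that only mirrors what the meansonly option is described as depicting; it is not the library's implementation.

```python
from statistics import mean

# Hypothetical samples for two test instances, listed in label order.
labels = ['FIFO', 'SJF']
data = [
    [1.0, 2.5, 4.0],  # all values for the first label
    [1.0, 1.5, 2.0],  # all values for the second label
]


def distribution_means(data):
    """Return the mean of each inner list, one value per test instance."""
    return [mean(values) for values in data]


means = distribution_means(data)
```

Each inner list becomes one box in the plot; with meansonly=True, only one value per inner list (its mean) would be shown.
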

box_plot_memory(data, title='', scale='linear', xlim=(None, None), ylim=(None, None), figsize=(7, 5), legend=True, output='Output.pdf')

Produces a bar plot for the memory usage in the simulations, across test instances.

The bars depict average and maximum memory usage in the simulation.

Parameters:
  • data – the data for memory usage in each simulation step; a list in which each element is itself a list containing the data for a certain test instance;
  • title – the title of the plot;
  • scale – the scale of the plot;
  • xlim – the left-right boundaries for the plot, is a tuple;
  • ylim – the bottom-top boundaries for the plot, is a tuple;
  • figsize – the size of the figure, is a tuple;
  • legend – enables or disables visualization of the legend;
  • output – the path to the output file;
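
As a rough sketch of the two quantities the bars depict, the average and maximum memory usage can be computed per test instance as shown below. The helper `memory_summary` and the sample values are assumptions for illustration, not part of accasim.

```python
def memory_summary(data):
    """Return an (average, maximum) pair for each test instance."""
    return [(sum(values) / len(values), max(values)) for values in data]


# Hypothetical memory samples, one inner list per test instance.
data = [
    [100, 120, 140],
    [200, 260],
]
summary = memory_summary(data)
```
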
box_plot_times(dataman, datasched, title='', scale='linear', xlim=(None, None), ylim=(None, None), figsize=(7, 5), legend=True, output='Output.pdf')

Produces a bar plot for the timings in the simulations, across test instances.

The bars depict the average time required in each simulation step to perform dispatching, and the time required to perform simulation-related tasks.

Parameters:
  • dataman – the data for the time required in each step to perform simulation-related tasks; a list in which each element is itself a list containing the data for a certain test instance;
  • datasched – the data for the time required in each step to perform dispatching; a list in which each element is itself a list containing the data for a certain test instance;
  • title – the title of the plot;
  • scale – the scale of the plot;
  • xlim – the left-right boundaries for the plot, is a tuple;
  • ylim – the bottom-top boundaries for the plot, is a tuple;
  • figsize – the size of the figure, is a tuple;
  • legend – enables or disables visualization of the legend;
  • output – the path to the output file;
distribution_scatter_plot(xdata, ydata, title='', scale='linear', xlim=(0, 1.05), ylim=(0, 1.05), figsize=(7, 5), alpha=0.005, output='Output.pdf')

Creates a distribution scatter plot for the system’s resource efficiency.

The X values represent the fraction of used nodes in a certain time step, while the Y values represent the fraction of used resources in those nodes. Darker areas of the plot represent values with higher frequency. The method automatically creates one plot per test instance.

Parameters:
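  • xdata – the X data, containing the load ratios (fraction of used nodes) for all time steps; a list with one element per test instance;
  • ydata – the Y data, containing the load ratios (fraction of used resources) for all time steps; a list with one element per test instance;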
  • alpha – the alpha to be used for each dot in the plot;
  • title – the title of the plot;
  • scale – the scale of the plot;
  • xlim – the left-right boundaries for the plot, is a tuple;
  • ylim – the bottom-top boundaries for the plot, is a tuple;
  • figsize – the size of the figure, is a tuple;
  • output – the path to the output files: the label for each test instance will be automatically added for each file;
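
The small default alpha is what makes frequent values appear darker: many nearly transparent dots stack up in the same spot. The sketch below assumes standard "over" alpha compositing of identical dots, which is an assumption about the rendering backend rather than something stated here; `stacked_opacity` is a hypothetical helper.

```python
def stacked_opacity(alpha, n_points):
    """Opacity of n_points identical dots drawn on top of each other.

    Each dot lets a fraction (1 - alpha) of the background through, so
    n_points dots let (1 - alpha) ** n_points through in total.
    """
    return 1.0 - (1.0 - alpha) ** n_points


# With the default alpha of 0.005, a single dot is almost invisible,
# while 100 overlapping dots reach an opacity of about 0.39.
single = stacked_opacity(0.005, 1)
dense = stacked_opacity(0.005, 100)
```

For workloads with few time steps, a larger alpha may be needed for the dots to be visible at all.
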
get_preprocessed_benchmark_data()

Returns all of the pre-processed benchmark-related data.

A tuple is returned; each element of the tuple is related to a specific kind of metric that was processed. Also, each element of the tuple is a list, with as many entries as the files that were processed, in the same order. Each element of these lists then contains the data related to a specific metric, for a specific test instance. All data is stored in standard Python lists.

Returns: a tuple in which every element is a list containing, in each element, a specific kind of data regarding one of the test instances. The tuple contains, in this order:
  • the resource usage statistics’ dictionaries;
  • the lists of dispatching times for each time step;
  • the lists of management times for each time step;
  • the lists of memory usage values for each time step;
  • the X scalability data containing the queue size for each test instance;
  • the Y scalability data containing the average dispatching times for each test instance;
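
The ordering above can be relied on to unpack the tuple by position. The sketch below uses a stand-in tuple with made-up values for two test instances in place of a real call to get_preprocessed_benchmark_data(); only the ordering follows the documentation, and the labels 'A' and 'B' are hypothetical.

```python
# Stand-in for the returned tuple; all values are made up.
benchmark_data = (
    [{}, {}],                      # resource usage statistics dictionaries
    [[0.02, 0.03], [0.05, 0.04]],  # dispatching times for each time step
    [[0.01, 0.01], [0.02, 0.01]],  # management times for each time step
    [[110, 115], [210, 220]],      # memory usage values for each time step
    [[1, 2], [1, 2]],              # X scalability data (queue sizes)
    [[0.02, 0.03], [0.04, 0.05]],  # Y scalability data (avg dispatching times)
)

(res_stats, disp_times, man_times, mem_usage,
 scale_x, scale_y) = benchmark_data

# Entry i of every list refers to the i-th processed file.
labels = ['A', 'B']
per_instance = {
    label: {'dispatching': disp_times[i], 'memory': mem_usage[i]}
    for i, label in enumerate(labels)
}
```
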
get_preprocessed_schedule_data()

Returns all of the pre-processed schedule-related data.

A tuple is returned; each element of the tuple is related to a specific kind of metric that was processed. Also, each element of the tuple is a list, with as many entries as the files that were processed, in the same order. Each element of these lists then contains the data related to a specific metric, for a specific test instance. All data is stored in standard Python lists.

Returns: a tuple in which every element is a list containing, in each element, the data regarding one of the test instances. The tuple contains, in this order:
  • the slowdown values for jobs;
  • the queue sizes for all time steps;
  • the resource allocation efficiencies for all jobs;
  • the X data regarding the load ratios (fraction of used nodes) for all time steps;
  • the Y data regarding the load ratios (fraction of used resources) for all time steps;
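
To make the two load ratio quantities concrete, the sketch below computes one (X, Y) pair for a single time step. The input layout (a list of (used, capacity) pairs, one per node) and the helper `load_ratio` are assumptions made for illustration; how accasim represents node state internally is not described here.

```python
def load_ratio(nodes):
    """Return (x, y) for one time step.

    nodes is a list of (used, capacity) pairs, one per node (assumed layout).
    x is the fraction of nodes with at least one resource in use;
    y is the fraction of resources in use within those nodes.
    """
    busy = [(used, cap) for used, cap in nodes if used > 0]
    if not busy:
        return 0.0, 0.0
    x = len(busy) / len(nodes)
    y = sum(used for used, _ in busy) / sum(cap for _, cap in busy)
    return x, y


# Four nodes with 8 resource units each; two of them are in use.
step = [(4, 8), (8, 8), (0, 8), (0, 8)]
x, y = load_ratio(step)
```

Here half of the nodes are in use (x is 0.5) and three quarters of the resources on those nodes are allocated (y is 0.75).
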
pre_process(trimSlowdown=True, trimQueueSize=False)

Performs pre-processing on all specified files, according to their type.

If the files are of the schedule type, a meta-simulation is run for each of them, computing data such as slowdown, queue size, and load ratios. If the files are of the benchmark type, they are simply parsed and their information stored.

Parameters:
  • trimSlowdown – boolean flag; if True, slowdown values equal to 1 are discarded. Default is True;
  • trimQueueSize – boolean flag; if True, queue size values equal to 0 are discarded. Default is False;
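
The effect of the two trimming flags can be sketched on plain lists. The helper `trim` and the sample values are hypothetical; the sketch only mirrors the documented behaviour of discarding slowdown values equal to 1 and queue size values equal to 0.

```python
def trim(values, discard):
    """Return values without the entries equal to discard."""
    return [v for v in values if v != discard]


# Hypothetical raw data for one test instance.
slowdowns = [1, 1, 2.5, 1, 4.0]
queue_sizes = [0, 3, 0, 5]

# The documented defaults of pre_process().
trim_slowdown, trim_queue_size = True, False

if trim_slowdown:
    slowdowns = trim(slowdowns, 1)
if trim_queue_size:
    queue_sizes = trim(queue_sizes, 0)
```

With the defaults, jobs that never waited (slowdown equal to 1) no longer dominate the slowdown distribution, while time steps with an empty queue are still counted.
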
produce_plot(type, title='', scale='linear', xlim=(None, None), ylim=(None, None), legend=True, figsize=(7, 5), meansonly=False, alpha=0.005, smooth=30, output='Output.pdf', groups=1, **kwargs)

Produces a single plot on the pre-processed files.

The user can produce plots among the available types. These are:
  • slowdown: a box plot of the slowdown value distributions across test instances;
  • queue_size: a box plot of the queue size in the simulation across test instances;
  • load_ratio: a distribution scatter plot of the load ratio as a function of the number of used nodes, produced separately for each test instance;
  • efficiency: a box plot of the resource allocation efficiency across test instances;
  • scalability: a scalability plot for dispatching methods across test instances;
  • sim_time: a bar plot of the simulation timings across test instances;
  • sim_memory: a bar plot of the memory usage across test instances.
Parameters:
  • type – the type of the plot, must be one of the above;
  • title – the title of the plot;
  • scale – the scale of the plot (see matplotlib documentation);
  • xlim – the left-right bounds for axis scaling, is a tuple;
  • ylim – the bottom-top bounds for axis scaling, is a tuple;
  • legend – activates the legend, is a boolean;
  • figsize – the size of the figure, is a tuple;
  • meansonly – triggers the plot of mean values alone in box-plots, is a boolean;
  • alpha – the alpha of certain features in plots, in particular for distribution scatter plots;
  • smooth – smoothing factor used for the Savitzky-Golay filter in the scalability plot. The lower the number, the higher the smoothing;
  • output – path of the output PDF file;
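
Putting the documented methods together, a complete run might look like the sketch below. It uses only the signatures listed on this page, but the argument values are assumptions: in particular, config_path is assumed to be the system configuration file used in the simulations, and whether it is required may depend on the installed version. The wrapper function `plot_slowdown` is hypothetical.

```python
def plot_slowdown(schedule_paths, labels, config_path, output='slowdown.pdf'):
    """Sketch: pre-process schedule files and produce a slowdown plot.

    schedule_paths and labels must have the same length; config_path is an
    assumed path to the system configuration file.
    """
    # Imported lazily so this sketch can be loaded without accasim installed.
    from accasim.utils.plot_factory import PlotFactory

    factory = PlotFactory(PlotFactory.SCHEDULE_CLASS, config=config_path)
    factory.set_files(schedule_paths, labels)
    factory.pre_process(trimSlowdown=True)
    factory.produce_plot(type=PlotFactory.SLOWDOWN_PLOT,
                         title='Slowdown', output=output)
    return factory
```

The order matters in this sketch: the files are set first, then pre-processed, and only then plotted, since produce_plot works on the pre-processed files.
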
scalability_plot(xdata, ydata, title='', scale='linear', xlim=(None, None), ylim=(None, None), figsize=(7, 5), legend=True, smooth=30, linestyles=None, markers=None, output='Output.pdf')

Creates a scalability plot for all test instances, where X represents the queue size, and Y the average time required by each dispatching method in the instances.

Parameters:
  • xdata – the X data, containing the queue sizes for each test instance; is a list, where each element contains a list with the data for each test instance;
  • ydata – the Y data, containing the average times required to perform dispatching in each test instance; is a list, where each element contains a list with the data for each test instance;
  • title – the title of the plot;
  • scale – the scale of the plot;
  • xlim – the left-right boundaries for the plot, is a tuple;
  • ylim – the bottom-top boundaries for the plot, is a tuple;
  • figsize – the size of the figure, is a tuple;
  • legend – enables or disables visualization of the legend;
  • smooth – smoothing factor for the Savitzky-Golay filter. The lower the number, the higher the smoothing;
  • output – the path of the output file;
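
One plausible way to obtain X and Y data of this shape for a single test instance is to average the dispatching times observed at each queue size. The sketch below is an assumption about how such data could be derived, not a description of accasim's internals; the helper name and sample values are hypothetical.

```python
from collections import defaultdict


def average_time_per_queue_size(queue_sizes, dispatch_times):
    """Group per-step dispatching times by queue size and average them.

    Returns (xs, ys): the sorted distinct queue sizes and, for each of
    them, the mean dispatching time observed at that queue size.
    """
    totals = defaultdict(lambda: [0.0, 0])  # size -> [sum of times, count]
    for size, elapsed in zip(queue_sizes, dispatch_times):
        totals[size][0] += elapsed
        totals[size][1] += 1
    xs = sorted(totals)
    ys = [totals[x][0] / totals[x][1] for x in xs]
    return xs, ys


# Hypothetical per-step samples for one test instance.
xs, ys = average_time_per_queue_size([1, 2, 1, 3], [0.5, 1.0, 1.5, 2.0])
```

Repeating this for every test instance would give the list-of-lists layout that xdata and ydata are described as having.
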
set_files(paths, labels)

Sets the paths and labels of the files to be analyzed.

Parameters:
  • paths – a list of paths to the files to be analyzed;
  • labels – the labels associated with each file, used in the plots; must have the same length as paths;