Class HistogramDataAggregator<C,O>

java.lang.Object
com.amalgamasimulation.utils.HistogramDataAggregator<C,O>
Type Parameters:
C - type of category. In the example with people's ages, this could be a gender enumeration. If no categories are needed, Object can be specified as the type argument.
O - type of objects associated with data points. In the example with people's ages, this could be the Person class. If the outliers functionality is not needed, Object can be specified as the type argument.
Direct Known Subclasses:
DynamicHistogramDataAggregator, FixedHistogramDataAggregator

public abstract class HistogramDataAggregator<C,O> extends Object
Abstract class for storing distribution data to be shown on histograms. The distribution data is collected and stored by category.

For example, if we want to collect the distribution of people's ages, gender can be the category. Every column of a histogram with such categories is split into two parts (e.g. of different colors for males and females) that show the fraction of each gender in each bucket.

This class can also store the objects associated with several of the biggest and smallest arguments. In our example with people, these can be the 5 oldest and the 5 youngest persons. This feature can be used for outlier analysis in many applications.
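A minimal sketch of the per-category split, assuming a fixed bucket size, buckets aligned at zero and a hypothetical Gender enum (none of the names below belong to the library): each bucket keeps one running total per category, and the fraction shown for a category is its total divided by the bucket's overall total.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical category type for the people's-ages example.
enum Gender { MALE, FEMALE }

/**
 * Sketch of category-split buckets (not the library's implementation):
 * every bucket stores one accumulated contribution per category.
 */
class CategorySplitSketch {
    private final double groupSize;
    // bucket index -> (category -> accumulated contribution)
    private final Map<Integer, EnumMap<Gender, Double>> buckets = new TreeMap<>();

    CategorySplitSketch(double groupSize) {
        this.groupSize = groupSize;
    }

    void add(double argument, Gender category, double contribution) {
        int index = (int) Math.floor(argument / groupSize);
        buckets.computeIfAbsent(index, i -> new EnumMap<>(Gender.class))
               .merge(category, contribution, Double::sum);
    }

    /** Fraction of the given category in the bucket containing the argument, or 0. */
    double fraction(double argument, Gender category) {
        int index = (int) Math.floor(argument / groupSize);
        EnumMap<Gender, Double> parts = buckets.get(index);
        if (parts == null) {
            return 0;
        }
        double total = 0;
        for (double v : parts.values()) {
            total += v;
        }
        return parts.getOrDefault(category, 0.0) / total;
    }
}
```

For instance, with groupSize = 10, one male aged 36 and three females aged 34 share the bucket [30, 40), so in this model that column is split 25% / 75%.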

Author:
Aleksey Kirillov
  • Field Details

    • argumentObjectsCount

      protected int argumentObjectsCount
    • lastBucketIncludesRightBorder

      protected boolean lastBucketIncludesRightBorder
    • biggestArgumentObjects

      protected List<O> biggestArgumentObjects
    • biggestArgumentObjectsMap

      protected Map<O,Accumulator<C>> biggestArgumentObjectsMap
    • smallestArgumentObjects

      protected List<O> smallestArgumentObjects
    • smallestArgumentObjectsMap

      protected Map<O,Accumulator<C>> smallestArgumentObjectsMap
    • maxArgumentDescriptor

      protected HistogramDataAggregator<C,O>.MaxArgumentDescriptor maxArgumentDescriptor
    • groupSize

      protected double groupSize
      Size of a single bucket
    • groupSizeModifier

      protected int groupSizeModifier
      Current modifier of bucket size
    • maxGroupsCount

      protected int maxGroupsCount
      Current max buckets count
    • count

      protected double count
    • dataGroups

      List of buckets (i.e. data groups) that is kept up to date
    • biggestGroup

      protected HistogramDataAggregator<C,O>.HistogramDataGroup biggestGroup
      Bucket / group with the biggest sum of values
    • sum

      protected double sum
      Sum of values in all buckets
    • argumentsSum

      protected double argumentsSum
    • createInitialZeroGroup

      protected boolean createInitialZeroGroup
      Flag showing whether the first bucket must begin at zero
    • initialGroupLeftBound

      protected double initialGroupLeftBound
    • minGroupSizeAsPowerOf10

      protected int minGroupSizeAsPowerOf10
    • categories

      protected Set<C> categories
      Set of all categories used so far
  • Constructor Details

    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      groupSize - size of buckets
      argumentObjectsCount - number of outlier objects to be stored
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
      groupSize - size of buckets
      argumentObjectsCount - number of outlier objects to be stored
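A sketch of the membership rule this flag appears to control, assuming ordinary buckets are half-open intervals [left, left + groupSize) and that the flag closes only the last bucket on the right. The helper name and signature are hypothetical. Such a rule matters, for example, when arguments are percentages in [0, 100] and a value of exactly 100 should still be counted.

```java
/**
 * Hypothetical membership test for one bucket [left, left + size).
 * Assumption: lastBucketIncludesRightBorder turns only the last bucket
 * into the closed interval [left, left + size].
 */
class BucketBorderSketch {
    static boolean contains(double left, double size, double argument,
                            boolean isLastBucket, boolean lastBucketIncludesRightBorder) {
        double right = left + size;
        if (argument < left) {
            return false;
        }
        if (argument < right) {
            return true; // inside the half-open part
        }
        // argument >= right: only a closed last bucket accepts its exact right border
        return isLastBucket && lastBucketIncludesRightBorder && argument == right;
    }
}
```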
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount, double initialGroupLeftBound)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      groupSize - size of buckets
      argumentObjectsCount - number of outlier objects to be stored
      initialGroupLeftBound - value where the first bucket is forced to begin
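A sketch of how a forced left bound can anchor the bucket grid, assuming bucket i covers [initialGroupLeftBound + i * groupSize, initialGroupLeftBound + (i + 1) * groupSize). The helper names are hypothetical; with initialGroupLeftBound = 0 the grid is the same as one whose first bucket starts at zero.

```java
/**
 * Hypothetical helpers: bucket i covers
 * [initialGroupLeftBound + i * groupSize, initialGroupLeftBound + (i + 1) * groupSize).
 */
class LeftBoundSketch {
    static long bucketIndex(double argument, double initialGroupLeftBound, double groupSize) {
        // floor, not truncation, so arguments left of the bound get negative indices
        return (long) Math.floor((argument - initialGroupLeftBound) / groupSize);
    }

    static double bucketLeft(long index, double initialGroupLeftBound, double groupSize) {
        return initialGroupLeftBound + index * groupSize;
    }
}
```

With initialGroupLeftBound = 5 and groupSize = 10, an argument of 37 lands in bucket 3, i.e. [35, 45).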
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount, double initialGroupLeftBound)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
      groupSize - size of buckets
      argumentObjectsCount - number of outlier objects to be stored
      initialGroupLeftBound - value where the first bucket is forced to begin
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount, boolean createInitialZeroGroup)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      groupSize - size of buckets
      argumentObjectsCount - number of outlier objects to be stored
      createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount, boolean createInitialZeroGroup)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
      groupSize - size of buckets
      argumentObjectsCount - number of outlier objects to be stored
      createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, boolean createInitialZeroGroup, int maxGroupsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      initialGroupSize - initial size of buckets, can be adjusted later when more data is added
      argumentObjectsCount - number of outlier objects to be stored
      createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
      maxGroupsCount - maximum number of groups that can be created
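When the bucket size is adjustable, a cap on the number of buckets forces wider buckets as the argument range grows. A sketch of one such rule, assuming the size simply doubles until the observed range fits into maxGroupsCount zero-aligned buckets; the aggregator's actual adjustment rule is not described on this page and may differ, and the class name is hypothetical.

```java
/**
 * Hypothetical bucket-size rule: double the size until the argument range
 * [minArgument, maxArgument] fits into at most maxGroupsCount buckets
 * aligned at zero.
 */
class GroupSizeSketch {
    static double fitGroupSize(double minArgument, double maxArgument,
                               double initialGroupSize, int maxGroupsCount) {
        // A range straddling zero always needs at least two zero-aligned buckets,
        // so fewer than two would never terminate properly.
        if (initialGroupSize <= 0 || maxGroupsCount < 2) {
            throw new IllegalArgumentException("need a positive size and at least two buckets");
        }
        double groupSize = initialGroupSize;
        while (bucketsNeeded(minArgument, maxArgument, groupSize) > maxGroupsCount) {
            groupSize *= 2; // coarser buckets, fewer of them
        }
        return groupSize;
    }

    /** Number of zero-aligned buckets of the given size spanned by the range. */
    static long bucketsNeeded(double minArgument, double maxArgument, double groupSize) {
        long first = (long) Math.floor(minArgument / groupSize);
        long last = (long) Math.floor(maxArgument / groupSize);
        return last - first + 1;
    }
}
```

For arguments from 0 to 95 with an initial size of 1 and a cap of 10 buckets, the size goes 1, 2, 4, 8, 16 and stops at 16, which needs 6 buckets.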
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, boolean createInitialZeroGroup, int maxGroupsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
      initialGroupSize - initial size of buckets, can be adjusted later when more data is added
      argumentObjectsCount - number of outlier objects to be stored
      createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
      maxGroupsCount - maximum number of groups that can be created
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      initialGroupSize - initial size of buckets, can be adjusted later when more data is added
      argumentObjectsCount - number of outlier objects to be stored
      initialGroupLeftBound - value where the first bucket is forced to begin
      maxGroupsCount - maximum number of groups that can be created
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
      initialGroupSize - initial size of buckets, can be adjusted later when more data is added
      argumentObjectsCount - number of outlier objects to be stored
      initialGroupLeftBound - value where the first bucket is forced to begin
      maxGroupsCount - maximum number of groups that can be created
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount, int initialGroupsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      initialGroupSize - initial size of buckets, can be adjusted later when more data is added
      argumentObjectsCount - number of outlier objects to be stored
      initialGroupLeftBound - value where the first bucket is forced to begin
      maxGroupsCount - maximum number of groups that can be created
      initialGroupsCount - number of initially created data groups
    • HistogramDataAggregator

      protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount, int initialGroupsCount)
      Creates a new instance of data aggregator.
      Parameters:
      categories - collection of categories. If no categories are needed, can contain a single element
      lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
      initialGroupSize - initial size of buckets, can be adjusted later when more data is added
      argumentObjectsCount - number of outlier objects to be stored
      initialGroupLeftBound - value where the first bucket is forced to begin
      maxGroupsCount - maximum number of groups that can be created
      initialGroupsCount - number of initially created data groups
  • Method Details

    • adjustDataGroups

      protected abstract void adjustDataGroups()
    • addValueInternal

      protected abstract void addValueInternal(double argument, C category, double value, O obj)
    • addValue

      public void addValue(double argument, C category, double contribution)
      Adds a new data point to this aggregator. The data point is defined by argument, category and contribution.

      In the example with people's ages (see HistogramDataAggregator), the data points can be:

      • one male person of age 36: argument = 36, category = MALE, contribution = 1
      • three female persons of age 15: argument = 15, category = FEMALE, contribution = 3
      Parameters:
      argument - determines which bucket the value goes to. In the example with people's ages, this is the age of one or several persons.
      category - determines the category of the data point. In the example with people's ages, this is the gender of the persons.
      contribution - determines the contribution to the bucket. In most cases, the contribution is equal to 1. In the example with people's ages, this is the number of persons.
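The two example data points above can be traced through a simplified model that ignores categories, assuming a fixed bucket size and buckets aligned at zero (class and method names are hypothetical): the argument selects the bucket and the contribution is added to that bucket's sum, so in this model a contribution of 3 has the same effect on a bucket sum as three data points with contribution 1.

```java
import java.util.TreeMap;

/**
 * Simplified model of addValue(argument, category, contribution) with the
 * category ignored (hypothetical, not the library's code).
 */
class ContributionSketch {
    private final double groupSize;
    // bucket index -> sum of contributions in that bucket
    private final TreeMap<Long, Double> bucketSums = new TreeMap<>();

    ContributionSketch(double groupSize) {
        this.groupSize = groupSize;
    }

    void addValue(double argument, double contribution) {
        long bucket = (long) Math.floor(argument / groupSize);
        bucketSums.merge(bucket, contribution, Double::sum);
    }

    /** Sum of contributions in the bucket that contains the argument, or 0. */
    double bucketSum(double argument) {
        long bucket = (long) Math.floor(argument / groupSize);
        return bucketSums.getOrDefault(bucket, 0.0);
    }
}
```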
    • addValue

      public void addValue(double argument, C category, double contribution, O obj)
      Adds a new data point to this aggregator. The data point is defined by argument, category and contribution.

      In the example with people's ages (see HistogramDataAggregator), the data points can be:

      • one male person of age 36: argument = 36, category = MALE, contribution = 1
      • three female persons of age 15: argument = 15, category = FEMALE, contribution = 3
      Parameters:
      argument - determines which bucket the value goes to. In the example with people's ages, this is the age of one or several persons.
      category - determines the category of the data point. In the example with people's ages, this is the gender of the persons.
      contribution - determines the contribution to the bucket. In most cases, the contribution is equal to 1. In the example with people's ages, this is the number of persons.
      obj - object associated with the data point. In the example with people's ages, this can be an instance of the Person class. Used by the outliers functionality.
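The outliers bookkeeping can be sketched as a bounded min-heap that keeps the k objects with the biggest arguments, where k plays the role of argumentObjectsCount. This is an assumed technique for illustration with a hypothetical class name; the aggregator keeps biggestArgumentObjects and biggestArgumentObjectsMap internally and may update them differently. The smallest-argument list would be the mirror image, using a max-heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Keeps the k objects with the biggest arguments (hypothetical helper).
 * The heap root is the smallest argument among the current top k, so a new
 * object enters only if its argument is bigger than the root's.
 */
class TopArgumentObjects<O> {
    private static final class Entry<T> {
        final T obj;
        final double argument;

        Entry(T obj, double argument) {
            this.obj = obj;
            this.argument = argument;
        }
    }

    private final int k;
    private final PriorityQueue<Entry<O>> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Entry<O> e) -> e.argument));

    TopArgumentObjects(int k) {
        this.k = k;
    }

    void add(double argument, O obj) {
        if (k <= 0) {
            return; // outlier tracking disabled
        }
        if (heap.size() < k) {
            heap.add(new Entry<>(obj, argument));
        } else if (argument > heap.peek().argument) {
            heap.poll(); // evict the smallest of the current top k
            heap.add(new Entry<>(obj, argument));
        }
    }

    /** Unmodifiable snapshot, biggest argument first. */
    List<O> biggest() {
        List<Entry<O>> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingDouble((Entry<O> e) -> e.argument).reversed());
        List<O> result = new ArrayList<>();
        for (Entry<O> e : sorted) {
            result.add(e.obj);
        }
        return Collections.unmodifiableList(result);
    }
}
```

Each insertion costs O(log k), so tracking a handful of outliers adds little overhead per data point.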
    • clear

      public void clear()
      Clears this aggregator. Deletes all the buckets and references to outlier objects from it.
    • getMaxGroupsCount

      public int getMaxGroupsCount()
      Returns the current limit on the number of buckets.
      Returns:
      current limit on the number of buckets
    • recalculateBiggestGroup

      protected void recalculateBiggestGroup()
      Recalculates the bucket with the largest sum of values.
    • getDataGroups

      Returns an unmodifiable list of buckets currently existing inside this aggregator.
      Returns:
      unmodifiable list of buckets
    • getDataGroup

      public HistogramDataAggregator<C,O>.HistogramDataGroup getDataGroup(double argument)
      Returns the bucket containing the specified argument, or null if there is no bucket containing the argument.
      Parameters:
      argument - specified argument
      Returns:
      bucket containing the specified argument, or null
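A sketch of a lookup that honours the null contract, assuming half-open buckets [left, right) keyed by their left border in a TreeMap (types and names are hypothetical). floorEntry finds the closest bucket starting at or below the argument, and a bounds check rejects arguments that fall in a gap or beyond the last bucket.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bucket store supporting "containing bucket or null" lookups. */
class BucketLookupSketch {
    static final class Bucket {
        final double left;
        final double right;

        Bucket(double left, double right) {
            this.left = left;
            this.right = right;
        }
    }

    private final TreeMap<Double, Bucket> byLeft = new TreeMap<>();

    void addBucket(double left, double right) {
        byLeft.put(left, new Bucket(left, right));
    }

    /** Bucket containing the argument, or null if there is none. */
    Bucket find(double argument) {
        Map.Entry<Double, Bucket> e = byLeft.floorEntry(argument);
        if (e == null || argument >= e.getValue().right) {
            return null; // left of all buckets, in a gap, or past the last bucket
        }
        return e.getValue();
    }
}
```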
    • getBiggestGroup

      public HistogramDataAggregator<C,O>.HistogramDataGroup getBiggestGroup()
      Returns the bucket with the largest sum of values.
      Returns:
      bucket with the largest sum of values
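A sketch of how the biggest bucket can be tracked, assuming non-negative contributions (names are hypothetical): adding to a bucket can only make that bucket the new maximum, so one comparison per addition suffices, and a full rescan, in the spirit of recalculateBiggestGroup(), is needed only when bucket sums change in other ways, e.g. after buckets are rebuilt.

```java
/** Hypothetical tracker of the bucket with the largest sum. */
class BiggestBucketSketch {
    final double[] sums;
    int biggest = -1; // -1 while nothing has been added

    BiggestBucketSketch(int bucketCount) {
        sums = new double[bucketCount];
    }

    /** Incremental update; valid only for non-negative contributions. */
    void add(int bucket, double contribution) {
        sums[bucket] += contribution;
        if (biggest < 0 || sums[bucket] > sums[biggest]) {
            biggest = bucket;
        }
    }

    /** Full rescan, needed after sums were changed by anything other than add(). */
    void recalculate() {
        biggest = -1;
        for (int i = 0; i < sums.length; i++) {
            if (biggest < 0 || sums[i] > sums[biggest]) {
                biggest = i;
            }
        }
    }
}
```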
    • getBiggestArgumentObjects

      public List<O> getBiggestArgumentObjects()
      Returns an unmodifiable list of the biggest outlier objects, i.e. objects associated with the biggest arguments. The number of stored objects is defined by the argumentObjectsCount constructor parameter.

      For example, returns a list of the 5 oldest persons.

      Returns:
      unmodifiable list of top outlier objects
    • getBiggestArgumentObjectValue

      public double getBiggestArgumentObjectValue(O obj)
      Returns the argument associated with the specified outlier object. If the specified object is not in the list of top outliers (see getBiggestArgumentObjects()), returns 0.

      For example, for the given instance of Person class returns the person's age.

      Parameters:
      obj - specified outlier object
      Returns:
      argument associated with the specified outlier object, or 0
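A sketch of this lookup contract, with a plain HashMap standing in for the aggregator's internal map (names are hypothetical). Because 0 is returned for untracked objects, a result of 0 is ambiguous whenever 0 is a valid argument (a newborn's age, say); checking membership in getBiggestArgumentObjects() first removes the ambiguity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in: argument of a tracked outlier, or 0 for any other object. */
class OutlierValueSketch<O> {
    private final Map<O, Double> argumentByObject = new HashMap<>();

    void track(O obj, double argument) {
        argumentByObject.put(obj, argument);
    }

    double argumentOf(O obj) {
        return argumentByObject.getOrDefault(obj, 0.0);
    }
}
```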
    • getSmallestArgumentObjects

      public List<O> getSmallestArgumentObjects()
      Returns an unmodifiable list of the smallest outlier objects, i.e. objects associated with the smallest arguments. The number of stored objects is defined by the argumentObjectsCount constructor parameter.

      For example, returns a list of the 5 youngest persons.

      Returns:
      unmodifiable list of smallest outlier objects
    • getSmallestArgumentObjectValue

      public double getSmallestArgumentObjectValue(O obj)
      Returns the argument associated with the specified outlier object. If the specified object is not in the list of smallest outliers (see getSmallestArgumentObjects()), returns 0.

      For example, for the given instance of Person class returns the person's age.

      Parameters:
      obj - specified outlier object
      Returns:
      argument associated with the specified outlier object, or 0
    • getCount

      public double getCount()
      Returns the total number of data points contained in this aggregator.
      Returns:
      the total number of data points contained in this aggregator
    • getCategories

      public List<C> getCategories()
      Returns the list of all categories currently contained in this aggregator. Each category appears only once in the returned list.
      Returns:
      list of all categories currently contained in this aggregator
    • getSum

      public double getSum()
      Returns the sum of all values in all buckets of this aggregator.
      Returns:
      sum of all values in all buckets
    • getAverage

      public double getAverage()
      Returns the average value of the distribution contained in this aggregator, or 0 if this aggregator is empty.
      Returns:
      average value of the distribution, or 0
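A sketch of one plausible reading of "average value of the distribution": the contribution-weighted mean of the arguments, sum(argument * contribution) / sum(contribution), with 0 for an empty aggregator. This interpretation and the class name are assumptions, not confirmed by this page.

```java
/** Hypothetical contribution-weighted average of arguments; 0 while empty. */
class AverageSketch {
    private double weightedArgumentSum;
    private double contributionSum;

    void add(double argument, double contribution) {
        weightedArgumentSum += argument * contribution;
        contributionSum += contribution;
    }

    double average() {
        return contributionSum == 0 ? 0 : weightedArgumentSum / contributionSum;
    }
}
```

With the data points from the addValue example (age 36 with contribution 1, age 15 with contribution 3), this gives (36 + 45) / 4 = 20.25.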
    • getMinArgument

      public double getMinArgument()
      Returns the minimum argument of the distribution stored in this aggregator, or 0 if this aggregator is empty.

      For example, returns the smallest age of all persons if this aggregator stores a distribution of people's ages.

      Returns:
      minimum argument of the distribution, or 0
    • getMaxArgument

      public double getMaxArgument()
      Returns the maximum argument of the distribution stored in this aggregator, or 0 if this aggregator is empty.

      For example, returns the age of the oldest person if this aggregator stores a distribution of people's ages.

      Returns:
      maximum argument of the distribution, or 0
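A sketch of running minimum and maximum arguments that report 0 while nothing has been added, matching the contract above (the class name is hypothetical). An explicit empty flag keeps the sentinel infinities from leaking out of the getters.

```java
/** Hypothetical running min/max of arguments; both getters return 0 while empty. */
class ArgumentRangeSketch {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private boolean empty = true;

    void add(double argument) {
        min = Math.min(min, argument);
        max = Math.max(max, argument);
        empty = false;
    }

    double getMin() {
        return empty ? 0 : min;
    }

    double getMax() {
        return empty ? 0 : max;
    }
}
```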
    • getGroupSize

      public double getGroupSize()
      Returns the current bucket size.
      Returns:
      current bucket size
    • getMinGroupSizeAsPowerOf10

      public int getMinGroupSizeAsPowerOf10()
    • setMinGroupSizeAsPowerOf10

      public void setMinGroupSizeAsPowerOf10(int minGroupSizeAsPowerOf10)
    • addDataGroup

      protected void addDataGroup(HistogramDataAggregator<C,O>.HistogramDataGroup histogramDataGroup, boolean handleCallbacks)
    • addDataGroup

      protected void addDataGroup(int injectionIndex, HistogramDataAggregator<C,O>.HistogramDataGroup histogramDataGroup, boolean handleCallbacks)
    • updateMaxArgumentDescriptor

      protected void updateMaxArgumentDescriptor()
    • onDataGroupsClear

      public void onDataGroupsClear()
    • onDataGroupsAdjust

      public void onDataGroupsAdjust()
    • onDataGroupAdd

      public void onDataGroupAdd(HistogramDataAggregator<C,O>.HistogramDataGroup histogramDataGroup, boolean addToListEnd)
    • toString

      public String toString()
      Overrides:
      toString in class Object