Class HistogramDataAggregator<C,O>
- Type Parameters:
C - type of category. In the example with people's ages, this could be something like a gender enumeration. If no categories are needed, Object can be specified as the type parameter.
O - type of objects associated with data points. In the example with people's ages, this could be a Person class. If no outlier functionality is needed, Object can be specified as the type parameter.
- Direct Known Subclasses:
DynamicHistogramDataAggregator,FixedHistogramDataAggregator
For example, if we want to collect the distribution of people's ages, gender can be the category. Every column of a histogram with such categories is split into two parts (e.g. drawn in different colors for males and females) that show the fraction of each gender in each bucket.
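The category split can be pictured with a small standalone sketch. It does not use this library: the class `CategoryBucketsSketch`, its methods, and the fixed bucket size of 10 are hypothetical names and values chosen only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the library's implementation.
class CategoryBucketsSketch {
    enum Gender { MALE, FEMALE }

    // bucket index -> (category -> summed contribution)
    private final TreeMap<Long, Map<Gender, Double>> buckets = new TreeMap<>();
    private final double bucketSize;

    CategoryBucketsSketch(double bucketSize) {
        this.bucketSize = bucketSize;
    }

    void add(double age, Gender gender, double contribution) {
        long index = (long) Math.floor(age / bucketSize);
        buckets.computeIfAbsent(index, i -> new EnumMap<>(Gender.class))
               .merge(gender, contribution, Double::sum);
    }

    // Fraction of the given category within the bucket that contains `age`,
    // or 0 if that bucket does not exist or is empty.
    double fraction(double age, Gender gender) {
        Map<Gender, Double> bucket = buckets.get((long) Math.floor(age / bucketSize));
        if (bucket == null) {
            return 0.0;
        }
        double total = 0.0;
        for (double v : bucket.values()) {
            total += v;
        }
        return total == 0.0 ? 0.0 : bucket.getOrDefault(gender, 0.0) / total;
    }

    public static void main(String[] args) {
        CategoryBucketsSketch h = new CategoryBucketsSketch(10.0);
        h.add(36, Gender.MALE, 1);
        h.add(31, Gender.FEMALE, 3);
        // One male out of four persons in the bucket covering ages 30 to 40.
        System.out.println(h.fraction(35, Gender.MALE)); // prints 0.25
    }
}
```

A renderer would draw each column with one segment per category, sized by these fractions.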
This class can also store the objects associated with the several largest and smallest arguments. In our example with people, these can be the 5 oldest and the 5 youngest persons. This feature can be used for outlier analysis in many applications.
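A minimal sketch of such outlier tracking follows. It is an illustration only: `OutlierTrackerSketch` and its methods are hypothetical, and how this class actually maintains its outlier lists is not documented on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: keeps the `limit` objects with the smallest arguments
// and the `limit` objects with the biggest arguments.
class OutlierTrackerSketch<O> {
    private static final class Entry<T> {
        final double argument;
        final T obj;

        Entry(double argument, T obj) {
            this.argument = argument;
            this.obj = obj;
        }
    }

    private final int limit;
    private final List<Entry<O>> smallest = new ArrayList<>(); // ascending by argument
    private final List<Entry<O>> biggest = new ArrayList<>();  // descending by argument
    private final Comparator<Entry<O>> ascending =
            Comparator.comparingDouble((Entry<O> e) -> e.argument);

    OutlierTrackerSketch(int limit) {
        this.limit = limit;
    }

    void offer(double argument, O obj) {
        insert(smallest, new Entry<>(argument, obj), ascending);
        insert(biggest, new Entry<>(argument, obj), ascending.reversed());
    }

    // Adds the entry, restores the order, and drops the entry that no longer fits.
    private void insert(List<Entry<O>> list, Entry<O> entry, Comparator<Entry<O>> order) {
        list.add(entry);
        list.sort(order);
        if (list.size() > limit) {
            list.remove(list.size() - 1);
        }
    }

    List<O> smallestObjects() {
        return objects(smallest);
    }

    List<O> biggestObjects() {
        return objects(biggest);
    }

    private List<O> objects(List<Entry<O>> entries) {
        List<O> result = new ArrayList<>();
        for (Entry<O> e : entries) {
            result.add(e.obj);
        }
        return result;
    }

    public static void main(String[] args) {
        OutlierTrackerSketch<String> t = new OutlierTrackerSketch<>(2);
        t.offer(36, "Alice");
        t.offer(15, "Bob");
        t.offer(72, "Carol");
        t.offer(4, "Dan");
        System.out.println(t.smallestObjects()); // prints [Dan, Bob]
        System.out.println(t.biggestObjects());  // prints [Carol, Alice]
    }
}
```

Sorting a short list on every insert keeps the sketch simple; for large limits a bounded heap would be a more economical choice.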
- Author:
- Aleksey Kirillov
-
Nested Class Summary
Nested Classes
class HistogramDataAggregator<C,O>.HistogramDataGroup - Class representing a bucket of a distribution stored inside a HistogramDataAggregator.
class HistogramDataAggregator<C,O>.MaxArgumentDescriptor
-
Field Summary
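Fields
protected int argumentObjectsCount
protected double argumentsSum
biggestArgumentObjects
protected Map<O, Accumulator<C>> biggestArgumentObjectsMap
protected HistogramDataAggregator<C,O>.HistogramDataGroup biggestGroup - Bucket / group with the biggest sum of values
categories - List of all categories used so far
protected double count
protected boolean createInitialZeroGroup - Flag showing whether the first bucket must begin with zero
protected List<HistogramDataAggregator<C, O>.HistogramDataGroup> dataGroups - List of buckets (i.e. data groups), kept up to date
protected double groupSize - Size of a single bucket
protected int groupSizeModifier - Current modifier of bucket size
protected double initialGroupLeftBound
protected boolean lastBucketIncludesRightBorder
protected HistogramDataAggregator<C,O>.MaxArgumentDescriptor maxArgumentDescriptor
protected int maxGroupsCount - Current max buckets count
protected int minGroupSizeAsPowerOf10
smallestArgumentObjects
protected Map<O, Accumulator<C>> smallestArgumentObjectsMap
protected double sum - Sum of values in all buckets
-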
Constructor Summary
Constructors
Each constructor creates a new instance of data aggregator.
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount)
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount, boolean createInitialZeroGroup)
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, boolean createInitialZeroGroup, int maxGroupsCount)
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount, double initialGroupLeftBound)
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount)
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount, int initialGroupsCount)
protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount)
protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount, boolean createInitialZeroGroup)
protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, boolean createInitialZeroGroup, int maxGroupsCount)
protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount, double initialGroupLeftBound)
protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount)
protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount, int initialGroupsCount)
-
Method Summary
Methods
protected void addDataGroup(int injectionIndex, HistogramDataAggregator<C, O>.HistogramDataGroup histogramDataGroup, boolean handleCallbacks)
protected void addDataGroup(HistogramDataAggregator<C, O>.HistogramDataGroup histogramDataGroup, boolean handleCallbacks)
void addValue - Adds a new data point to this aggregator.
void addValue - Adds a new data point to this aggregator.
protected abstract void addValueInternal(double argument, C category, double value, O obj)
protected abstract void adjustDataGroups()
void clear() - Clears this aggregator.
double getAverage() - Returns the average value of the distribution contained in this aggregator, or 0 if this aggregator is empty.
getBiggestArgumentObjects - Returns an unmodifiable list of biggest outlier objects, i.e. objects associated with the biggest arguments.
double getBiggestArgumentObjectValue - Returns the argument associated with the specified outlier object.
getBiggestGroup - Returns the bucket with the largest sum of values.
getCategories - Returns the list of all categories currently contained in this aggregator.
double getCount() - Returns the total number of data points contained in this aggregator.
getDataGroup(double argument) - Returns the bucket containing the specified argument, or null if there is no bucket containing the argument.
getDataGroups - Returns an unmodifiable list of buckets currently existing inside this aggregator.
double getGroupSize() - Returns the current size of buckets.
double getMaxArgument() - Returns the maximum argument of the distribution stored in this aggregator, or 0 if this aggregator is empty.
int getMaxGroupsCount() - Returns the current limit on the number of buckets.
double getMinArgument() - Returns the minimum argument of the distribution stored in this aggregator, or 0 if this aggregator is empty.
int getMinGroupSizeAsPowerOf10()
getSmallestArgumentObjects - Returns an unmodifiable list of smallest outlier objects, i.e. objects associated with the smallest arguments.
double getSmallestArgumentObjectValue - Returns the argument associated with the specified outlier object.
double getSum() - Returns the sum of all values in all buckets of this aggregator.
void onDataGroupAdd(HistogramDataAggregator<C, O>.HistogramDataGroup histogramDataGroup, boolean addToListEnd)
void onDataGroupsAdjust()
void onDataGroupsClear()
protected void recalculateBiggestGroup() - Recalculates the bucket with the largest sum of values.
void setMinGroupSizeAsPowerOf10(int minGroupSizeAsPowerOf10)
toString()
protected void updateMaxArgumentDescriptor()
-
Field Details
-
argumentObjectsCount
protected int argumentObjectsCount -
lastBucketIncludesRightBorder
protected boolean lastBucketIncludesRightBorder -
biggestArgumentObjects
-
biggestArgumentObjectsMap
-
smallestArgumentObjects
-
smallestArgumentObjectsMap
-
maxArgumentDescriptor
-
groupSize
protected double groupSize
Size of a single bucket
-
groupSizeModifier
protected int groupSizeModifier
Current modifier of bucket size
-
maxGroupsCount
protected int maxGroupsCount
Current max buckets count
-
count
protected double count -
dataGroups
List of buckets (i.e. data groups), kept up to date
-
biggestGroup
Bucket / group with the biggest sum of values -
sum
protected double sum
Sum of values in all buckets
-
argumentsSum
protected double argumentsSum -
createInitialZeroGroup
protected boolean createInitialZeroGroup
Flag showing whether the first bucket must begin with zero
-
initialGroupLeftBound
protected double initialGroupLeftBound -
minGroupSizeAsPowerOf10
protected int minGroupSizeAsPowerOf10 -
categories
List of all categories used so far
-
-
Constructor Details
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
groupSize - size of buckets
argumentObjectsCount - number of outlier objects to be stored
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
groupSize - size of buckets
argumentObjectsCount - number of outlier objects to be stored
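This page does not spell out the exact semantics of lastBucketIncludesRightBorder. The sketch below shows one plausible reading, assuming buckets are half-open intervals [left, right) and that the flag turns only the last bucket into a closed interval. The class `RightBorderSketch` and its `bucketIndex` method are hypothetical.

```java
// Hypothetical sketch of one possible meaning of the right-border flag.
class RightBorderSketch {
    /**
     * Returns the index of the bucket containing x, or -1 if x falls outside all buckets.
     * Buckets are [left, right); when lastIncludesRightBorder is true, the last bucket
     * is treated as the closed interval [left, right].
     */
    static int bucketIndex(double x, double leftBound, double bucketSize,
                           int bucketCount, boolean lastIncludesRightBorder) {
        double rightBound = leftBound + bucketSize * bucketCount;
        if (x < leftBound || x > rightBound) {
            return -1;
        }
        if (x == rightBound) {
            return lastIncludesRightBorder ? bucketCount - 1 : -1;
        }
        int index = (int) Math.floor((x - leftBound) / bucketSize);
        // Guard against rounding pushing a value just below the border out of range.
        return Math.min(index, bucketCount - 1);
    }

    public static void main(String[] args) {
        // Ten buckets of size 10 covering 0 to 100.
        System.out.println(bucketIndex(100, 0, 10, 10, true));  // prints 9
        System.out.println(bucketIndex(100, 0, 10, 10, false)); // prints -1
    }
}
```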
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount, double initialGroupLeftBound)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
groupSize - size of buckets
argumentObjectsCount - number of outlier objects to be stored
initialGroupLeftBound - value where the first bucket is forced to begin
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount, double initialGroupLeftBound)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
groupSize - size of buckets
argumentObjectsCount - number of outlier objects to be stored
initialGroupLeftBound - value where the first bucket is forced to begin
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, double groupSize, int argumentObjectsCount, boolean createInitialZeroGroup)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
groupSize - size of buckets
argumentObjectsCount - number of outlier objects to be stored
createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double groupSize, int argumentObjectsCount, boolean createInitialZeroGroup)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
groupSize - size of buckets
argumentObjectsCount - number of outlier objects to be stored
createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, boolean createInitialZeroGroup, int maxGroupsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
initialGroupSize - initial size of buckets; it can be adjusted later when more data is added
argumentObjectsCount - number of outlier objects to be stored
createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
maxGroupsCount - maximum number of groups that can be created
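How the bucket size is adjusted once maxGroupsCount is exceeded is not described on this page. The sketch below illustrates one possible strategy, assuming the bucket size is doubled and neighbouring buckets are merged until the histogram fits the limit again. `AdaptiveBucketsSketch` and all of its members are hypothetical, and the real subclasses may follow a different rule (for instance one based on powers of 10).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: widen buckets when more than maxBuckets would be needed.
class AdaptiveBucketsSketch {
    private double bucketSize;
    private final int maxBuckets;
    private TreeMap<Long, Double> sums = new TreeMap<>(); // bucket index -> summed contribution

    AdaptiveBucketsSketch(double initialBucketSize, int maxBuckets) {
        if (initialBucketSize <= 0 || maxBuckets < 2) {
            throw new IllegalArgumentException("bucket size must be positive and maxBuckets at least 2");
        }
        this.bucketSize = initialBucketSize;
        this.maxBuckets = maxBuckets;
    }

    void add(double argument, double contribution) {
        sums.merge((long) Math.floor(argument / bucketSize), contribution, Double::sum);
        while (span() > maxBuckets) {
            widen();
        }
    }

    // Number of buckets needed to cover the range from the first to the last non-empty bucket.
    private long span() {
        return sums.isEmpty() ? 0 : sums.lastKey() - sums.firstKey() + 1;
    }

    // Doubles the bucket size and merges each pair of neighbouring buckets.
    private void widen() {
        TreeMap<Long, Double> merged = new TreeMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            merged.merge(Math.floorDiv(e.getKey(), 2L), e.getValue(), Double::sum);
        }
        sums = merged;
        bucketSize *= 2;
    }

    double bucketSize() {
        return bucketSize;
    }

    double sumAt(double argument) {
        return sums.getOrDefault((long) Math.floor(argument / bucketSize), 0.0);
    }

    public static void main(String[] args) {
        AdaptiveBucketsSketch h = new AdaptiveBucketsSketch(1.0, 4);
        for (double x : new double[] {0.5, 1.5, 2.5, 3.5, 7.5}) {
            h.add(x, 1.0);
        }
        // The value 7.5 would need an eighth bucket of size 1, so the size is doubled once.
        System.out.println(h.bucketSize()); // prints 2.0
    }
}
```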
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, boolean createInitialZeroGroup, int maxGroupsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
initialGroupSize - initial size of buckets; it can be adjusted later when more data is added
argumentObjectsCount - number of outlier objects to be stored
createInitialZeroGroup - flag specifying whether to create the initial bucket that starts at zero
maxGroupsCount - maximum number of groups that can be created
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
initialGroupSize - initial size of buckets; it can be adjusted later when more data is added
argumentObjectsCount - number of outlier objects to be stored
initialGroupLeftBound - value where the first bucket is forced to begin
maxGroupsCount - maximum number of groups that can be created
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
initialGroupSize - initial size of buckets; it can be adjusted later when more data is added
argumentObjectsCount - number of outlier objects to be stored
initialGroupLeftBound - value where the first bucket is forced to begin
maxGroupsCount - maximum number of groups that can be created
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount, int initialGroupsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
initialGroupSize - initial size of buckets; it can be adjusted later when more data is added
argumentObjectsCount - number of outlier objects to be stored
initialGroupLeftBound - value where the first bucket is forced to begin
maxGroupsCount - maximum number of groups that can be created
initialGroupsCount - number of initially created data groups
-
HistogramDataAggregator
protected HistogramDataAggregator(Collection<C> categories, boolean lastBucketIncludesRightBorder, double initialGroupSize, int argumentObjectsCount, double initialGroupLeftBound, int maxGroupsCount, int initialGroupsCount)
Creates a new instance of data aggregator.
- Parameters:
categories - collection of categories. If no categories are needed, it can contain a single element
lastBucketIncludesRightBorder - flag specifying whether the last bucket includes its right border
initialGroupSize - initial size of buckets; it can be adjusted later when more data is added
argumentObjectsCount - number of outlier objects to be stored
initialGroupLeftBound - value where the first bucket is forced to begin
maxGroupsCount - maximum number of groups that can be created
initialGroupsCount - number of initially created data groups
-
-
Method Details
-
adjustDataGroups
protected abstract void adjustDataGroups() -
addValueInternal
-
addValue
Adds a new data point to this aggregator. The data point is defined by argument, category and contribution.
In the example with people's ages (see HistogramDataAggregator), the data points can be:
- one male person of age 36: argument = 36, category = MALE, contribution = 1
- three female persons of age 15: argument = 15, category = FEMALE, contribution = 3
- Parameters:
argument - determines which bucket the value goes to. In the example with people's ages, this is the age of one or several persons.
category - determines the category of the data point. In the example with people's ages, this is the gender of the people.
contribution - determines the contribution to the bucket. In most cases, the contribution is equal to 1. In the example with people's ages, this is the number of persons.
-
addValue
Adds a new data point to this aggregator. The data point is defined by argument, category and contribution.
In the example with people's ages (see HistogramDataAggregator), the data points can be:
- one male person of age 36: argument = 36, category = MALE, contribution = 1
- three female persons of age 15: argument = 15, category = FEMALE, contribution = 3
- Parameters:
argument - determines which bucket the value goes to. In the example with people's ages, this is the age of one or several persons.
category - determines the category of the data point. In the example with people's ages, this is the gender of the people.
contribution - determines the contribution to the bucket. In most cases, the contribution is equal to 1. In the example with people's ages, this is the number of persons.
obj - object that is associated with the data point. In the example with people's ages, this can be an instance of the Person class. Used for outlier functionality.
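The two data points from the example can be traced through a small standalone model of the two overloads. This is a sketch under assumptions, not the library's code: `AddValueSketch` is hypothetical, the overload without an object is assumed to behave like the full one with no associated object, and buckets are assumed to start at zero with a fixed size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors the shape of the two addValue overloads.
class AddValueSketch<C, O> {
    private final double bucketSize;
    // bucket index -> (category -> summed contribution)
    private final Map<Long, Map<C, Double>> buckets = new HashMap<>();

    AddValueSketch(double bucketSize) {
        this.bucketSize = bucketSize;
    }

    // Overload without an associated object.
    void addValue(double argument, C category, double contribution) {
        addValue(argument, category, contribution, null);
    }

    // The argument selects the bucket, the contribution is added to the
    // category's share of that bucket. A real aggregator would also pass
    // obj to its outlier tracking; that part is omitted here.
    void addValue(double argument, C category, double contribution, O obj) {
        long index = (long) Math.floor(argument / bucketSize);
        buckets.computeIfAbsent(index, i -> new HashMap<>())
               .merge(category, contribution, Double::sum);
    }

    double sum(double argument, C category) {
        Map<C, Double> bucket = buckets.get((long) Math.floor(argument / bucketSize));
        return bucket == null ? 0.0 : bucket.getOrDefault(category, 0.0);
    }

    public static void main(String[] args) {
        AddValueSketch<String, Object> h = new AddValueSketch<>(10.0);
        h.addValue(36, "MALE", 1);   // one male person of age 36
        h.addValue(15, "FEMALE", 3); // three female persons of age 15
        System.out.println(h.sum(15, "FEMALE")); // prints 3.0
    }
}
```

Under this model, a single call with contribution 3 leaves the bucket in the same state as three calls with contribution 1.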
-
clear
public void clear()
Clears this aggregator. Deletes all the buckets and references to outlier objects from it.
-
getMaxGroupsCount
public int getMaxGroupsCount()
Returns the current limit on the number of buckets.
- Returns:
- current limit on the number of buckets
-
recalculateBiggestGroup
protected void recalculateBiggestGroup()
Recalculates the bucket with the largest sum of values.
-
getDataGroups
Returns an unmodifiable list of buckets currently existing inside this aggregator.
- Returns:
- unmodifiable list of buckets
-
getDataGroup
Returns the bucket containing the specified argument, or null if there is no bucket containing the argument.
- Parameters:
argument - specified argument
- Returns:
- bucket containing the specified argument, or null
-
getBiggestGroup
Returns the bucket with the largest sum of values.
- Returns:
- bucket with the largest sum of values
-
getBiggestArgumentObjects
Returns an unmodifiable list of biggest outlier objects, i.e. objects associated with the biggest arguments. The number of stored objects is defined by a constructor parameter of this class.
For example, returns a list of the 5 oldest persons.
- Returns:
- unmodifiable list of top outlier objects
-
getBiggestArgumentObjectValue
Returns the argument associated with the specified outlier object. If the specified object is not contained in the top outliers list (see getBiggestArgumentObjects()), returns 0.
For example, for the given instance of the Person class, returns the person's age.
- Parameters:
obj - specified outlier object
- Returns:
- argument associated with the specified outlier object, or 0
-
getSmallestArgumentObjects
Returns an unmodifiable list of smallest outlier objects, i.e. objects associated with the smallest arguments. The number of stored objects is defined by a constructor parameter of this class.
For example, returns a list of the 5 youngest persons.
- Returns:
- unmodifiable list of smallest outlier objects
-
getSmallestArgumentObjectValue
Returns the argument associated with the specified outlier object. If the specified object is not contained in the smallest outliers list (see getSmallestArgumentObjects()), returns 0.
For example, for the given instance of the Person class, returns the person's age.
- Parameters:
obj - specified outlier object
- Returns:
- argument associated with the specified outlier object, or 0
-
getCount
public double getCount()
Returns the total number of data points contained in this aggregator.
- Returns:
- the total number of data points contained in this aggregator
-
getCategories
Returns the list of all categories currently contained in this aggregator. Each category appears only once in the returned list.
- Returns:
- list of all categories currently contained in this aggregator
-
getSum
public double getSum()
Returns the sum of all values in all buckets of this aggregator.
- Returns:
- sum of all values in all buckets
-
getAverage
public double getAverage()
Returns the average value of the distribution contained in this aggregator, or 0 if this aggregator is empty.
- Returns:
- average value of the distribution, or 0
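The formula behind the average is not given on this page. The sketch below shows one plausible reading, assuming the average is the mean of the arguments weighted by their contributions, with the documented fallback to 0 for an empty aggregator. `AverageSketch` and its fields are hypothetical.

```java
// Hypothetical sketch: contribution-weighted mean of arguments, 0 when empty.
class AverageSketch {
    private double weightedArgumentsSum; // sum of argument * contribution
    private double contributionsSum;     // sum of contributions

    void add(double argument, double contribution) {
        weightedArgumentsSum += argument * contribution;
        contributionsSum += contribution;
    }

    double average() {
        // Avoid dividing by zero when nothing has been added yet.
        return contributionsSum == 0.0 ? 0.0 : weightedArgumentsSum / contributionsSum;
    }

    public static void main(String[] args) {
        AverageSketch a = new AverageSketch();
        System.out.println(a.average()); // prints 0.0
        a.add(36, 1); // one person of age 36
        a.add(15, 3); // three persons of age 15
        System.out.println(a.average()); // prints 20.25
    }
}
```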
-
getMinArgument
public double getMinArgument()
Returns the minimum argument of the distribution stored in this aggregator, or 0 if this aggregator is empty.
For example, returns the lowest age of all persons if this aggregator stores a distribution of people's ages.
- Returns:
- minimum argument of the distribution, or 0
-
getMaxArgument
public double getMaxArgument()
Returns the maximum argument of the distribution stored in this aggregator, or 0 if this aggregator is empty.
For example, returns the highest age of all persons if this aggregator stores a distribution of people's ages.
- Returns:
- maximum argument of the distribution, or 0
-
getGroupSize
public double getGroupSize()
Returns the current size of buckets.
- Returns:
- current size of buckets
-
getMinGroupSizeAsPowerOf10
public int getMinGroupSizeAsPowerOf10() -
setMinGroupSizeAsPowerOf10
public void setMinGroupSizeAsPowerOf10(int minGroupSizeAsPowerOf10) -
addDataGroup
protected void addDataGroup(HistogramDataAggregator<C, O>.HistogramDataGroup histogramDataGroup, boolean handleCallbacks) -
addDataGroup
protected void addDataGroup(int injectionIndex, HistogramDataAggregator<C, O>.HistogramDataGroup histogramDataGroup, boolean handleCallbacks) -
updateMaxArgumentDescriptor
protected void updateMaxArgumentDescriptor() -
onDataGroupsClear
public void onDataGroupsClear() -
onDataGroupsAdjust
public void onDataGroupsAdjust() -
onDataGroupAdd
public void onDataGroupAdd(HistogramDataAggregator<C, O>.HistogramDataGroup histogramDataGroup, boolean addToListEnd) -
toString
-