WO2017087003A1 - Segments of data entries - Google Patents

Info

Publication number
WO2017087003A1
Authority
WO
WIPO (PCT)
Prior art keywords
segment
sub
segments
share
attributes
Prior art date
Application number
PCT/US2015/061998
Other languages
French (fr)
Inventor
Renato Keshet
Alina Maor
Ron Maurer
Alexander MAYDANIK
Reuth Vexler
Olga SHAIN
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/061998
Publication of WO2017087003A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • FIG. 1 is a block diagram of an example computing device
  • FIG. 2 illustrates an example plurality of data entries
  • FIG. 3 illustrates an example segmentation engine, an example plurality of entries, an example set of segments, and an example set of sub-segments
  • FIG. 4 illustrates an example display
  • FIG. 5 shows a flowchart of an example method
  • FIG. 6 is a block diagram of an example computing device.
  • identifying relevant information from large amounts of unprocessed data can be a very difficult computational task.
  • Combining, filtering, or sorting the data may facilitate its analysis, but may not help the user detect correlations between various data parameters, determine whether and how these correlations change over time, or identify changes having a particular significance to a particular user.
  • each data entry includes a plurality of values corresponding to a plurality of attributes, and for identifying among all the data entries at least one segment (e.g., a set of data entries sharing at least one attribute) and at least one sub-segment within that segment (e.g., a subset of data entries sharing at least one other attribute) whose relative share in that segment has significantly changed at a particular point in time.
  • IT security professionals could use such methods and systems to process numerous (e.g., millions, billions, or more) data entries describing network access attempts, and to determine whether a particular type of access (e.g., HTTP request) from a particular country has significantly and unexpectedly increased relative to all other types of access from the same country, and/or whether there has been a significant and unexpected increase of HTTP requests from a particular country.
  • a business analyst may wish to know whether there has been a significant increase in market share of certain products, by certain vendors, in certain countries, etc.
  • the significance of the change may be determined based on various factors such as whether the new share is substantially different from its predicted statistical distribution (e.g., based on historical data), whether the particular sub-segment and the particular segment are of particular interest to the user (e.g., based on the user's past interactions with the computing device), etc.
  • the computing device may include, among other things, a segmentation engine, a segment analyzer, and a graphical user interface (GUI) engine.
  • the segmentation engine may, among other things, divide a plurality of data entries into a set of segments, where each segment is associated with a different set of attribute values, and divide each segment in the set of segments into a set of sub-segments based on at least one attribute value other than the set of attribute values associated with the segment.
  • the segment analyzer may, among other things, determine, for each sub-segment of each segment, a first share of the sub-segment in the segment at a first time and a second share of the sub-segment in the segment at a second time, and calculate, for each sub-segment of each segment, a significance value based at least on the first share and the second share.
  • the GUI engine may, among other things, display information about a plurality of sub-segments and segments associated with a plurality of highest significance values.
  • FIG. 1 is a block diagram of an example computing device 100.
  • Computing device 100 may include a smartphone, cell phone, tablet, laptop, desktop, server, application-specific computing device, or any other processing device or equipment.
  • computing device 100 may also include any combination of one or more computing devices of the same type or of different types.
  • computing device 100 may include at least a server device communicatively coupled to a client device.
  • computing device 100 may include a segmentation engine 112, a segment analyzer 113, a graphical user interface (GUI) engine 114, a memory 116, and a processor 117.
  • Engines 112, 113, and 114 may each generally represent any combination of hardware and programming that may be embedded in computing device 100 or communicatively coupled thereto. Engines 112, 113, and 114 may correspond to separate modules or be a part of the same module.
  • Memory 116 may also be embedded in computing device 100 or communicatively coupled thereto, and may include any type of volatile or non-volatile memory, such as a random-access memory (RAM), flash memory, hard drive, memristor-based memory, and so forth.
  • Processor 117 may include, for example, one or multiple processors (e.g., central processing units (CPUs), semiconductor-based microprocessors, graphics processing units (GPUs), field-programmable gate arrays (FPGAs) configured to retrieve and execute instructions, or other electronic circuitry), which may be integrated in a single device or distributed across devices.
  • computing device 100 may also be communicatively coupled (e.g., through GUI engine 114) to display 118, which may or may not be embedded in computing device 100.
  • Display 118 may be implemented using any suitable technology, such as LCD, LED, OLED, TFT, Plasma, etc. In some implementations, display 118 may be a touch-sensitive display.
  • Segmentation engine 112 may obtain a plurality of data entries.
  • the plurality of data entries may be stored in a memory of computing device 100 (e.g., memory 116) and/or in a memory of another device that is communicatively coupled to computing device 100, e.g., via one or more networks, such as the Internet.
  • the plurality of data entries may be stored in a single database or file or in multiple databases or files, and may be organized in a single data table, in multiple data tables, or in any other type of data structure(s).
  • Each data entry 210 may describe, for example, an event, a state, a status, and so forth, or a summary of events, states, status, etc.
  • each data entry 210 describes a summary of sales of a particular type of product by a particular vendor, in a particular country, during a particular quarter.
  • each data entry may describe a network access attempt, indicating the source, the target, the manner, the time, and other attributes associated with the attempt.
  • data entries may describe blood samples of patients, climate measurements, crime statistics, or any other type of quantifiable data that may change over time.
  • Each data entry 210 may include or be associated with temporal information that may describe, for example, a point in time or a period of time (e.g., 2015-Q1) corresponding to the particular event, state, status, etc., described by the particular data entry.
  • the temporal information may be included in each data entry; in other examples, each data entry may be associated with temporal information associated with the data table that includes the data entry.
  • data entries 210 may be stored in a plurality of data tables, where each data table is associated with a different point or period of time.
  • Each data entry may also include or be associated with a plurality of attribute values corresponding to a plurality of attributes.
  • data entry 210-1 has attribute values "1.1M," "750K," "A," "USA," "Tower server," "Intel," "1," etc., corresponding to attributes Revenue, Units, Vendor, Country, Product Type, Processor Type, Max Processors, etc., respectively.
  • the attribute values may be numeric, alphabetic, alphanumeric, or of any other type.
  • segmentation engine 112 may group data entries into a set of one or more segments, some of which may overlap, meaning that some data entries may be included in more than one segment.
  • Each segment may include data entries corresponding to the same time and sharing at least one attribute value of at least one attribute.
  • each segment may be defined by a particular time and a set of one or more attribute values corresponding to a set of one or more attributes.
  • a segment may include all data entries 210 from quarter 2015-Q1 whose Vendor attribute is set to "A" (e.g., 210-1, 210-2, and 210-5).
  • a segment may include all data entries 210 from quarter 2015-Q1 having a Vendor attribute set to "A" and a Country attribute set to "USA" (e.g., 210-1 and 210-2).
  • the set of segments may also include a "global" segment, i.e., a segment that includes all data entries associated with a particular time.
  • segmentation engine 112 may determine all possible segments for data entries of the same time, i.e., all possible combinations of one or more attribute values that would yield a segment that includes at least one data entry 210.
  • segmentation engine 112 may obtain (e.g., from GUI engine 114) a user input indicating a set of attributes of interest selected by the user from the plurality of attributes.
  • segmentation engine 112 may determine the set of segments based only on attribute values corresponding to the attributes of interest. For example, segmentation engine 112 may determine the set of segments by determining all possible combinations of attribute values of attributes of interest that would yield a segment comprising at least one data entry.
  • engine 112 may determine the set of segments such that each segment includes data entries that either have the same vendor or the same country, or both.
  • the user may further reduce the number and/or the size of segments by using various filters.
  • Reducing the number and/or the size of the segments and sub-segments, while maintaining the ability to identify significant data changes that were previously unnoticed, can greatly improve the performance of computing device 100 (e.g., by reducing its processing time, memory consumption, power consumption, etc.) while also providing great improvements to the field of data analytics.
  • each segment may include a number of sub-segments, where each sub-segment may be defined by at least one value of at least one additional attribute, i.e., an attribute that is not used to define the segment itself.
  • FIG. 3 shows an example set of segments 305 and sub-segments (e.g., 310-1, 310-2, 310-3, etc.) that may be determined by segmentation engine 112 based on data entries of 2015-Q1.
  • segmentation engine 112 may determine all possible sub-segments that can be defined for that segment as described above.
  • the set of sub-segments may include only sub-segments defined by attributes of interest, which, as mentioned above, may be predefined and/or selected by the user.
  • both the segments and the sub-segments may be defined by various (in some examples, by all) combinations of values of attributes of interest, and segmentation engine 112 may disregard the values of other attributes when determining the sets of segments and sub-segments.
  • the user may further reduce the number and/or the size of sub-segments by using various filters that need to be matched by all data entries included in the sub-segments.
  • one of the attributes of data entries may be predefined and/or selected by the user to be the quantifying attribute based on which the sizes of segments and sub-segments are to be calculated, as discussed below.
  • the quantifying attribute can be "Revenue" or "Units," for example.
  • segmentation engine 112 may disregard the values of the quantifying attribute when determining the set of segments and their respective sub-segments.
  • segmentation engine 112 may identify and disregard any segments or sub-segments that are only associated with one time. In other examples, if a segment or a sub-segment is only associated with one time, segmentation engine 112 may assume that the segment or sub-segment has at least one virtual entry associated with at least one other time, the virtual entry having all its values set to zero.
  • segment analyzer 113 may analyze the plurality of data entries in accordance with those determinations. In some examples, segment analyzer 113 may determine a share of each sub-segment in its respective segment. For example, segment analyzer 113 may determine a share of sub-segment x in a segment y by calculating an empiric conditional probability $p_t(x \mid y)$.
  • segment analyzer 113 may calculate a sum of quantifying attribute values of all data entries within sub-segment x and segment y, respectively. For example, referring to FIG.
  • segment analyzer 113 may determine a significance value for each sub-segment's share in its segment at a certain time (e.g., $t_2$).
  • the significance value may generally represent the extent to which a particular sub-segment's share and/or the change of the share is likely to be of interest to the particular user.
  • the significance value may be determined based on the Mahalanobis distance between the sub-segment's share at time $t_2$ and the predicted distribution of the sub-segment's share at time $t_2$.
  • the distribution of the sub-segment's share in a given segment is a Gaussian distribution, in which case the Mahalanobis distance may be expressed as $\frac{|p_{t_2}(x \mid y) - \mu|}{\sigma}$, where $\mu$ and $\sigma$ are the mean and the standard deviation of the sub-segment's share in the segment.
  • the Mahalanobis distance may be assumed to be a distance to a zero-order Gaussian prediction of $p_{t_2}(x \mid y)$, and may be calculated as follows:
  • segment analyzer 113 may use Mahalanobis distance to other types of predicted distributions to determine the significance value. For example, segment analyzer 113 may analyze historical data entries using deep learning techniques or other machine learning methods to determine the prediction for $p_{t_2}(x \mid y)$. Such methods can take into account various additional factors, such as trends, seasonality, sub-segment similarities, segment similarities, and so forth.
  • the significance value may also be determined by segment analyzer 113 based on various relevance factors, i.e., factors indicating or predicting the extent to which the change in the particular sub-segment's share is relevant to the particular user.
  • factors may include, for example, the size of the segment (e.g., at time $t_2$). For example, if a user is more interested in changes occurring in larger segments, segment analyzer 113 may increase the significance value as the segment size increases, and decrease the significance value as the segment size decreases. In some examples, segment analyzer 113 may also change the significance value based on one or more user inputs.
  • segment analyzer 113 may determine based on one or more historical user inputs (e.g., using machine learning or other types of methods) that some types of segments or segment share changes are more relevant or interesting to the user than others.
  • the relevance factors may in some examples include an adjustable weight value that may be initially set to a default value (e.g., 1) and then dynamically increased and/or decreased by segment analyzer 113 based on user inputs, as further discussed below.
  • Relevance factors may also include or be associated with trends, seasonality, sub-segment similarities, segment similarities, and various other factors.
  • where the left-hand side is the significance value of a share change of sub-segment x in segment y between times $t_1$ and $t_2$; the distance term is the Mahalanobis distance (e.g., to a zero-order Gaussian prediction of $p_{t_2}(x \mid y)$); the size term is the size of segment y at time $t_2$; and $W_{x,y}$ is the adjustable weight value associated with sub-segment x and segment y, as discussed above.
  • GUI engine 114 can provide for display (e.g., on display 118) information about sub-segments and segments whose share changes are associated with highest significance values, as illustrated in the example of FIG. 4.
  • the information may be presented in a descending order of significance values, and may indicate, for each sub-segment/segment pair, the attribute values defining the sub-segment (e.g., 411) and the segment (e.g., 412), the new share (e.g., 414), and the previous share (e.g., 413) of the sub-segment in the segment.
  • GUI engine 114 may also provide for display any additional information (not shown for brevity) describing or associated with the sub-segment, its segment, and the change in the share. In some examples, the additional information may be displayed in a graphical and/or textual manner, upon obtaining a user input (e.g., a touch or a click) associated with a particular sub-segment, segment, or share change. In some examples, GUI engine 114 may also provide visual indicators allowing the user to quickly determine the nature of the most significant share changes. For example, GUI engine 114 may display, for each share change, a shape and/or an arrow (e.g., 410) indicating whether the change was positive or negative, where the color and/or saturation of the shape may indicate the significance of the change.
  • GUI engine 114 may also display a list of all attributes (e.g., 405) associated with the data entries, allowing the user to select attributes of interest; a text window 415 to collect user input indicating one or more filters to be applied during the segment/sub-segment determination; one or more selection widgets 420 to enable the user to select at least two times (e.g., time periods) to be compared; and a selection widget 425 to enable the user to select a quantifying attribute.
  • GUI engine 114 may also, upon receiving a user input (e.g., a touch or a click) selecting a particular share change, display a set of one or more additional share changes associated with the selected share change.
  • additional share changes may include, for example, the most significant share changes of the same sub-segment as that of the selected share change in segments other than that of the selected share change.
  • GUI engine 114 may also collect various inputs by the user based on which segment analyzer 113 may adjust weights associated with various sub-segment/segment pairs. For example, GUI engine 114 may determine which sub-segment/segment pairs are more interesting to the user based on which sub-segment/segment pairs are selected by the user, based on how long the user examines them, etc. In some examples, the user may explicitly indicate which pairs the user is interested in and/or which pairs the user is not interested in, e.g., by using one or more graphical widgets (e.g., "likes" and/or "dislikes") next to each displayed pair.
  • segment analyzer 113 may increase (or decrease) the weights associated with the corresponding pairs, increasing (or decreasing) the significance values associated with these pairs, thereby increasing (or decreasing) the likelihood that these pairs would be displayed to the user in the future.
  • engines 112, 113, and 114 were described as any combinations of hardware and programming. Such components may be implemented in a number of fashions.
  • the programming may be processor executable instructions stored on a tangible, non-transitory computer-readable medium and the hardware may include a processing resource for executing those instructions.
  • the processing resource may include one or multiple processors (e.g., central processing units (CPUs), semiconductor-based microprocessors, graphics processing units (GPUs), field-programmable gate arrays (FPGAs) configured to retrieve and execute instructions, or other electronic circuitry), which may be integrated in a single device or distributed across devices.
  • the computer-readable medium can be said to store program instructions that when executed by the processor resource implement the functionality of the respective component.
  • the computer-readable medium may be integrated in the same device as the processor resource or it may be separate but accessible to that device and the processor resource.
  • the program instructions can be part of an installation package that when installed can be executed by the processor resource to implement the corresponding component. In this case, the computer-readable medium may be a portable medium such as a CD, DVD, or flash drive or a memory maintained by a server from which the installation package can be downloaded and installed.
  • the program instructions may be part of an application or applications already installed, and the computer-readable medium may include integrated memory such as a hard drive, solid state drive, or the like.
  • FIG. 5 is a flowchart of an example method 500.
  • Method 500 may be described below as being executed or performed by a system or by a computing device such as computing device 100 of FIG. 1. Other suitable systems and/or computing devices may be used as well.
  • Method 500 may be implemented in the form of executable instructions stored on at least one non-transitory machine-readable storage medium of the system and executed by at least one processor of the system.
  • method 500 may be implemented in the form of electronic circuitry (e.g., hardware).
  • one or more blocks of method 500 may be executed substantially concurrently or in a different order than shown in FIG. 5.
  • method 500 may include more or fewer blocks than are shown in FIG. 5.
  • one or more of the blocks of method 500 may, at certain times, be ongoing and/or may repeat.
  • method 500 may obtain a plurality of data entries stored in a memory, each data entry including a plurality of attribute values of a plurality of attributes.
  • the method may determine (e.g., by the processor) a set of segments, each segment being defined by a set of attribute values of a set of attributes.
  • the method may determine (e.g., by the processor) for each segment a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment.
  • the method may compute (e.g., by the processor), for each sub-segment of each segment, a significance value associated with a change in the sub-segment's share within the segment.
  • the method may determine a set of selected sub-segments based on the significance value computed for each sub-segment of each segment.
  • the method may provide for display (e.g., on display 118) a visual representation of the set of selected sub-segments. As discussed above, in some examples, the method may include fewer blocks or additional blocks not shown in FIG. 5 for brevity.
  • FIG. 6 is a block diagram of an example computing device 600.
  • Computing device 600 may be similar to computing device 100 of FIG. 1.
  • computing device 600 includes a processor 610 and a non-transitory machine-readable storage medium 620.
  • the instructions may be distributed (e.g., stored) across multiple machine-readable storage mediums and the instructions may be distributed (e.g., executed by) across multiple processors.
  • Processor 610 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in non-transitory machine-readable storage medium 620.
  • processor 610 may fetch, decode, and execute instructions 622, 624, 626, 628, 630, or any other instructions (not shown for brevity).
  • processor 610 may include one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of the instructions in machine-readable storage medium 620.
  • executable instruction representations e.g., boxes
  • executable instructions and/or electronic circuits included within one box may, in alternate examples, be included in a different box shown in the figures or in a different box not shown.
  • Non-transitory machine-readable storage medium 620 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions.
  • medium 620 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like.
  • Medium 620 may be disposed within computing device 600, as shown in FIG. 6. In this situation, the executable instructions may be "installed" on computing device 600.
  • medium 620 may be a portable, external or remote storage medium, for example, that allows computing device 600 to download the instructions from the portable/external/remote storage medium. In this situation, the executable instructions may be part of an "installation package". As described herein, medium 620 may be encoded with executable instructions.
  • Instructions 622, when executed by a processor (e.g., 610), may cause a computing device (e.g., 600) to obtain a plurality of data entries, each data entry comprising a plurality of attribute values of a plurality of attributes.
  • Instructions 624, when executed by the processor, may cause the computing device to determine a set of segments, each segment being defined by a set of attribute values of a set of attributes.
  • Instructions 626, when executed by the processor, may cause the computing device to determine for each segment a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment.
  • Instructions 628, when executed by the processor, may cause the computing device to compute, for each sub-segment of each segment, a distance between the sub-segment's share in the segment and a predicted distribution of the sub-segment's share in the segment. Instructions 630, when executed by the processor, may cause the computing device to select (and in some examples, provide for display) at least one sub-segment of at least one segment based at least on the computed distance.
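
The share and distance computations described in the list above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `matches`, `share`, and `mahalanobis_1d` are hypothetical, and fitting the Gaussian prediction from the mean and sample standard deviation of historical shares is an assumption made here for simplicity (the text leaves the prediction method open).

```python
from statistics import mean, stdev


def matches(entry, definition):
    """True if the entry has every attribute value in the definition."""
    return all(entry.get(attr) == value for attr, value in definition.items())


def share(entries, segment, sub_segment, quantifier):
    """Share of a sub-segment in its segment, as a ratio of summed quantifier values."""
    in_segment = [e for e in entries if matches(e, segment)]
    segment_total = sum(e[quantifier] for e in in_segment)
    if segment_total == 0:
        return 0.0  # assumption: an empty segment gives a zero share
    sub_total = sum(e[quantifier] for e in in_segment if matches(e, sub_segment))
    return sub_total / segment_total


def mahalanobis_1d(new_share, historical_shares):
    """|new_share - mu| / sigma for a Gaussian fitted to historical shares.

    Assumption: mu and sigma are the mean and sample standard deviation of
    the historical shares; at least two historical values are required.
    """
    mu = mean(historical_shares)
    sigma = stdev(historical_shares)
    if sigma == 0:
        return 0.0 if new_share == mu else float("inf")
    return abs(new_share - mu) / sigma
```

For instance, with quarterly entries that each carry a `Revenue` value, `share(entries, {"Vendor": "A"}, {"Country": "USA"}, "Revenue")` would give the fraction of vendor A's revenue that comes from the USA, and `mahalanobis_1d` would measure how far the latest such share lies from its history, in standard deviations.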

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Examples disclosed herein relate, among other things, to a method. The method may obtain a plurality of data entries stored in a memory, each data entry comprising a plurality of attribute values of a plurality of attributes; determine a set of segments, each segment being defined by a set of attribute values of a set of attributes; for each segment, determine a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment; for each sub-segment of each segment, compute a significance value associated with a change in the sub-segment's share within the segment; and determine a set of selected sub-segments and segments based on the significance value computed for each sub-segment of each segment.

Description

SEGMENTS OF DATA ENTRIES
BACKGROUND
[0001] Computing devices today possess great storage and computational capacities, allowing researchers and analysts to obtain, store, and process vast amounts of data. However, as the amount of processed data increases, determining which data has particular significance to a particular user and presenting such data to the user in a meaningful manner becomes an increasingly challenging task.
BRIEF DESCRIPTION OF THE DRAWINGS
The following detailed description references the drawings, wherein:
FIG. 1 is a block diagram of an example computing device;
FIG. 2 illustrates an example plurality of data entries;
FIG. 3 illustrates an example segmentation engine, an example plurality of entries, an example set of segments, and an example set of sub-segments;
FIG. 4 illustrates an example display;
FIG. 5 shows a flowchart of an example method; and
FIG. 6 is a block diagram of an example computing device.
DETAILED DESCRIPTION
[0009] As mentioned above, identifying relevant information from large amounts of unprocessed data can be a very difficult computational task. Combining, filtering, or sorting the data may facilitate its analysis, but may not help the user detect correlations between various data parameters, determine whether and how these correlations change over time, or identify changes having a particular significance to a particular user.
[0010] These types of analyses, however, can be very important in a variety of fields and applications. Specifically, it would be beneficial to have methods and systems for analyzing a plurality of data entries where each data entry includes a plurality of values corresponding to a plurality of attributes, and for identifying among all the data entries at least one segment (e.g., a set of data entries sharing at least one attribute) and at least one sub-segment within that segment (e.g., a subset of data entries sharing at least one other attribute) whose relative share in that segment has significantly changed at a particular point in time. For example, IT security professionals could use such methods and systems to process numerous (e.g., millions, billions, or more) data entries describing network access attempts, and to determine whether a particular type of access (e.g., HTTP request) from a particular country has significantly and unexpectedly increased relative to all other types of access from the same country, and/or whether there has been a significant and unexpected increase of HTTP requests from a particular country. As another example, a business analyst may wish to know whether there has been a significant increase in market share of certain products, by certain vendors, in certain countries, etc.
[0011] It is appreciated that the number of possible segment/sub-segment combinations into which data can be divided can grow exponentially as a function of the number of attributes describing the data and the number of different values those attributes have in a particular data set. The number of possible combinations is further increased if the segments and sub-segments can be defined by more than one attribute, as will be illustrated in some examples below. Accordingly, having a computing device analyze the thousands, millions, or even billions of segment/sub-segment combinations and identify those sub-segments whose share in the segment changed significantly may provide data analysts with many valuable insights. The significance of the change may be determined based on various factors such as whether the new share is substantially different from its predicted statistical distribution (e.g., based on historical data), whether the particular sub-segment and the particular segment are of particular interest to the user (e.g., based on the user's past interactions with the computing device), etc.
[0012] Examples disclosed herein describe, among other things, a computing device. The computing device may include, among other things, a segmentation engine, a segment analyzer, and a graphical user interface (GUI) engine. The segmentation engine may, among other things, divide a plurality of data entries into a set of segments, where each segment is associated with a different set of attribute values, and divide each segment in the set of segments into a set of sub-segments based on at least one attribute value other than the set of attribute values associated with the segment. The segment analyzer may, among other things, determine, for each sub-segment of each segment, a first share of the sub-segment in the segment at a first time and a second share of the sub-segment in the segment at a second time, and calculate, for each sub-segment of each segment, a significance value based at least on the first share and the second share. The GUI engine may, among other things, display information about a plurality of sub-segments and segments associated with a plurality of highest significance values.
[0013] FIG. 1 is a block diagram of an example computing device 100. Computing device 100 may include a smartphone, cell phone, tablet, laptop, desktop, server, application-specific computing device, or any other processing device or equipment. In some examples, computing device 100 may also include any combination of one or more computing devices of the same type or of different types. For example, computing device 100 may include at least a server device communicatively coupled to a client device. As illustrated in FIG. 1, computing device 100 may include a segmentation engine 112, a segment analyzer 113, a graphical user interface (GUI) engine 114, a memory 116, and a processor 117.
[0014] Engines 112, 113, and 114 may each generally represent any combination of hardware and programming that may be embedded in computing device 100 or communicatively coupled thereto. Engines 112, 113, and 114 may correspond to separate modules or be a part of the same module. Memory 116 may also be embedded in computing device 100 or communicatively coupled thereto, and may include any type of volatile or non-volatile memory, such as a random-access memory (RAM), flash memory, hard drive, memristor-based memory, and so forth. Processor 117 may include, for example, one or multiple processors (e.g., central processing units (CPUs), semiconductor-based microprocessors, graphics processing units (GPUs), field-programmable gate arrays (FPGAs) configured to retrieve and execute instructions, or other electronic circuitry), which may be integrated in a single device or distributed across devices. As illustrated in FIG. 1, computing device 100 may also be communicatively coupled (e.g., through GUI engine 114) to display 118, which may or may not be embedded in computing device 100. Display 118 may be implemented using any suitable technology, such as LCD, LED, OLED, TFT, Plasma, etc. In some implementations, display 118 may be a touch-sensitive display.
[0015] Segmentation engine 112 may obtain a plurality of data entries. The plurality of data entries may be stored in a memory of computing device 100 (e.g., memory 116) and/or in a memory of another device that is communicatively coupled to computing device 100, e.g., via one or more networks, such as the Internet. The plurality of data entries may be stored in a single database or file or in multiple databases or files, and may be organized in a single data table, in multiple data tables, or in any other type of data structure(s).
[0016] An example plurality of data entries 210 (e.g., 210-1, 210-2, etc.) is shown in FIG. 2. Each data entry 210 may describe, for example, an event, a state, a status, and so forth, or a summary of events, states, statuses, etc. For instance, in the example illustrated in FIG. 2, each data entry 210 describes a summary of sales of a particular type of product by a particular vendor, in a particular country, during a particular quarter. As another example, each data entry may describe a network access attempt, indicating the source, the target, the manner, the time, and other attributes associated with the attempt. As additional examples, data entries may describe blood samples of patients, climate measurements, crime statistics, or any other type of quantifiable data that may change over time.
[0017] Each data entry 210 may include or be associated with temporal information that may describe, for example, a point in time or a period of time (e.g., 2015-Q1) corresponding to the particular event, state, status, etc., described by the particular data entry. In some examples, the temporal information may be included in each data entry; in other examples, each data entry may be associated with temporal information associated with the data table that includes the data entry. For example, as illustrated in FIG. 2, data entries 210 may be stored in a plurality of data tables, where each data table is associated with a different point or period of time.
[0018] Each data entry may also include or be associated with a plurality of attribute values corresponding to a plurality of attributes. For example, in FIG. 2, data entry 210-1 has attribute values "1.1M," "750K," "A," "USA," "Tower server," "Intel," "1," etc., corresponding to attributes Revenue, Units, Vendor, Country, Product Type, Processor Type, Max Processors, etc., respectively. The attribute values may be numeric, alphabetic, alphanumeric, or of any other type.
[0019] After obtaining the plurality of data entries, segmentation engine 112 may group data entries into a set of one or more segments, some of which may overlap, meaning that some data entries may be included in more than one segment. Each segment may include data entries corresponding to the same time and sharing at least one attribute value of at least one attribute. Put differently, each segment may be defined by a particular time and a set of one or more attribute values corresponding to a set of one or more attributes. For example, a segment may include all data entries 210 from quarter 2015-Q1 whose Vendor attribute is set to "A" (e.g., 210-1, 210-2, and 210-5). As another example, a segment may include all data entries 210 from quarter 2015-Q1 having a Vendor attribute set to "A" and a Country attribute set to "USA" (e.g., 210-1 and 210-2). In some examples, in addition to the segments defined by various combinations of attribute values, the set of segments may also include a "global" segment, i.e., a segment that includes all data entries associated with a particular time.
[0020] In some examples, segmentation engine 112 may determine all possible segments for data entries of the same time, i.e., all possible combinations of one or more attribute values that would yield a segment that includes at least one data entry 210. In other examples, segmentation engine 112 may obtain (e.g., from GUI engine 114) a user input indicating a set of attributes of interest selected by the user from the plurality of attributes. In these examples, segmentation engine 112 may determine the set of segments based only on attribute values corresponding to the attributes of interest. For example, segmentation engine 112 may determine the set of segments by determining all possible combinations of attribute values of attributes of interest that would yield a segment comprising at least one data entry. For example, if the user selects only "Vendor" and "Country" as attributes of interest, then engine 112 may determine the set of segments such that each segment includes data entries that either have the same vendor or the same country, or both. Such segments may include, for example, segment "vendor=A" (comprising at least data entries 210-1, 210-2, and 210-5), segment "country=Germany" (comprising at least data entries 210-6 and 210-7), segment "vendor=B, country=Russia" (comprising at least data entry 210-4), and so forth. By selecting a particular set of attributes of interest from the entire plurality of attributes, the user can reduce the number of segments, and focus on particular types of segments.
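The enumeration of segments over attributes of interest described above can be sketched as follows. This is a minimal illustration only, assuming data entries are represented as Python dictionaries for a single time period; the function name `enumerate_segments`, the dictionary layout, and the toy entries are hypothetical and are not the contents of FIG. 2.

```python
from itertools import combinations

def enumerate_segments(entries, attributes_of_interest):
    """Map each segment key (a frozenset of (attribute, value) pairs)
    to the indices of the data entries that belong to it."""
    segments = {}
    for idx, entry in enumerate(entries):
        pairs = [(a, entry[a]) for a in attributes_of_interest if a in entry]
        # Every non-empty subset of this entry's attribute values defines
        # a segment that contains the entry, so only non-empty segments arise.
        for r in range(1, len(pairs) + 1):
            for combo in combinations(pairs, r):
                segments.setdefault(frozenset(combo), []).append(idx)
    return segments

# Hypothetical toy entries for one time period (assumed values).
entries = [
    {"Vendor": "A", "Country": "USA", "Revenue": 10.0},
    {"Vendor": "A", "Country": "Germany", "Revenue": 5.0},
    {"Vendor": "B", "Country": "USA", "Revenue": 7.0},
]
segments = enumerate_segments(entries, ["Vendor", "Country"])
```

Because segments are generated from the entries themselves, a combination of values that matches no entry never appears, which is consistent with the requirement that each segment include at least one data entry.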
[0021] In some examples, the user may further reduce the number and/or the size of segments by using various filters. For example, the user may input one or more values or regular expressions (e.g., "Country=Brazil" OR "Country=Argentina") that need to be found in or matched by a data entry for that data entry to be included in a segment (or a sub-segment, as discussed below). Reducing the number and/or the size of the segments and sub-segments, while maintaining the ability to identify significant data changes that were previously unnoticed, can greatly improve the performance of computing device 100 (e.g., by reducing its processing time, memory consumption, power consumption, etc.) while also providing great improvements to the field of data analytics.
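One possible way to apply such filters is sketched below. The "Attribute=regex" pattern syntax, the OR semantics across patterns, and the helper name `matches_any` are assumptions made for illustration; the text does not prescribe a filter format.

```python
import re

def matches_any(entry, patterns):
    """Return True if any 'Attribute=regex' pattern fully matches
    the entry's value for that attribute (OR semantics)."""
    for pattern in patterns:
        attribute, _, value_regex = pattern.partition("=")
        if attribute in entry and re.fullmatch(value_regex, str(entry[attribute])):
            return True
    return False

# Hypothetical entries and filter patterns.
entries = [
    {"Country": "Brazil", "Vendor": "A"},
    {"Country": "Argentina", "Vendor": "B"},
    {"Country": "USA", "Vendor": "A"},
]
patterns = ["Country=Brazil", "Country=Argentina"]
kept = [e for e in entries if matches_any(e, patterns)]
```

Filtering before segments are enumerated keeps the number of candidate segment/sub-segment pairs smaller, which is the performance benefit the paragraph above refers to.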
[0022] After segmentation engine 112 determines the set of segments for data entries of a particular time, segmentation engine 112 may determine (or "select") a set of one or more sub-segments for each segment. For example, segment "vendor=A, country=USA" (comprising entries 210-1, 210-2, 210-5) may have the following sub-segments: "vendor=A, country=USA, product type=tower server" (comprising entries 210-1 and 210-2); "vendor=A, country=USA, product type=blade server" (comprising entries 210-5); "vendor=A, country=USA, max processors=1" (comprising entries 210-1 and 210-2); "vendor=A, country=USA, processor type=AMD, max processors=1" (comprising entries 210-2); and so on. Thus, each segment may include a number of sub-segments, where each sub-segment may be defined by at least one value of at least one additional attribute, i.e., an attribute that is not used to define the segment itself. As illustrated above, while some sub-segments may be defined by only one additional value (of one additional attribute), other sub-segments may be defined by two or more additional values (of two or more additional attributes). The examples discussed above are further illustrated in FIG. 3. FIG. 3 shows an example set of segments 305 and sub-segments (e.g., 310-1, 310-2, 310-3, etc.) that may be determined by segmentation engine 112 based on data entries of 2015-Q1.
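The derivation of sub-segments from additional attributes can be sketched in the same style. This is an illustrative sketch under assumptions: the function name `enumerate_sub_segments`, the dictionary representation of a segment, and the toy entries are hypothetical.

```python
from itertools import combinations

def enumerate_sub_segments(entries, segment, candidate_attributes):
    """Return {sub_segment_key: [entry indices]} for one segment.

    `segment` maps attribute -> value and defines the segment; each
    sub-segment key is a frozenset of additional (attribute, value) pairs.
    """
    # Only attributes NOT used to define the segment may define sub-segments.
    extra = [a for a in candidate_attributes if a not in segment]
    members = [i for i, e in enumerate(entries)
               if all(e.get(a) == v for a, v in segment.items())]
    subs = {}
    for i in members:
        pairs = [(a, entries[i][a]) for a in extra if a in entries[i]]
        for r in range(1, len(pairs) + 1):
            for combo in combinations(pairs, r):
                subs.setdefault(frozenset(combo), []).append(i)
    return subs

# Hypothetical toy entries (assumed values, not those of FIG. 2).
entries = [
    {"Vendor": "A", "Country": "USA", "Product Type": "Tower server", "Max Processors": 1},
    {"Vendor": "A", "Country": "USA", "Product Type": "Tower server", "Max Processors": 2},
    {"Vendor": "A", "Country": "USA", "Product Type": "Blade server", "Max Processors": 1},
    {"Vendor": "B", "Country": "USA", "Product Type": "Tower server", "Max Processors": 1},
]
segment = {"Vendor": "A", "Country": "USA"}
subs = enumerate_sub_segments(
    entries, segment, ["Vendor", "Country", "Product Type", "Max Processors"])
```

Here a sub-segment may be defined by one additional value (e.g., a product type) or by several (e.g., a product type together with a maximum processor count), mirroring the two cases described above.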
[0023] With continued reference to FIG. 1, in some examples, segmentation engine 112 may determine all possible sub-segments that can be defined for that segment as described above. In other examples, the set of sub-segments may include only sub-segments defined by attributes of interest, which, as mentioned above, may be predefined and/or selected by the user. Thus, in some examples, both the segments and the sub-segments may be defined by various (in some examples, by all) combinations of values of attributes of interest, and segmentation engine 112 may disregard the values of other attributes when determining the sets of segments and sub-segments. Also, as described above, the user may further reduce the number and/or the size of sub-segments by using various filters that need to be matched by all data entries included in the sub-segments.
[0024] In some examples, one of the attributes of data entries may be predefined and/or selected by the user to be the quantifying attribute based on which the sizes of segments and sub-segments are to be calculated, as discussed below. In the example of FIG. 2, the quantifying attribute can be "Revenue" or "Units," for example. In some examples, segmentation engine 112 may disregard the values of the quantifying attribute when determining the set of segments and their respective sub-segments.
[0025] In some examples, for every segment or sub-segment associated with a particular time, segmentation engine 112 may find the same segment or sub-segment associated with a different time, that is, a segment or sub-segment defined by the same set of attribute values but associated with a different time. For example, segmentation engine 112 may identify a segment "vendor=A, country=USA" associated with 2015-Q1 (i.e., comprising data entries corresponding to 2015-Q1), and identify the same segment "vendor=A, country=USA" associated with 2014-Q4 (i.e., comprising data entries corresponding to 2014-Q4). In some examples, segmentation engine 112 may identify and disregard any segments or sub-segments that are only associated with one time. In other examples, if a segment or a sub-segment is only associated with one time, segmentation engine 112 may assume that the segment or sub-segment has at least one virtual entry associated with at least one other time, the virtual entry having all its values set to zero.
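The two ways of handling segments that appear at only one time can be sketched as follows, assuming the sizes of segments (or sub-segments) at each time have already been computed and stored in dictionaries keyed by segment definition. The function name `align_sizes` and the `use_virtual_entries` flag are hypothetical.

```python
def align_sizes(sizes_t1, sizes_t2, use_virtual_entries=True):
    """Pair up the size of each segment at two times.

    With use_virtual_entries=True, a segment seen at only one time is
    treated as having a zero-valued virtual entry at the other time;
    otherwise such segments are disregarded.
    """
    if use_virtual_entries:
        keys = set(sizes_t1) | set(sizes_t2)
    else:
        keys = set(sizes_t1) & set(sizes_t2)
    return {k: (sizes_t1.get(k, 0.0), sizes_t2.get(k, 0.0)) for k in keys}

# Hypothetical sizes for two quarters (assumed values).
sizes_2014_q4 = {"vendor=A, country=USA": 3.0, "vendor=B, country=USA": 1.0}
sizes_2015_q1 = {"vendor=A, country=USA": 4.0, "vendor=C, country=USA": 2.0}

with_virtual = align_sizes(sizes_2014_q4, sizes_2015_q1)
only_common = align_sizes(sizes_2014_q4, sizes_2015_q1, use_virtual_entries=False)
```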
[0026] After segmentation engine 112 determines the various segments and sub-segments for the plurality of data entries, as discussed above, segment analyzer 113 may analyze the plurality of data entries in accordance with those determinations. In some examples, segment analyzer 113 may determine a share of each sub-segment in its respective segment. For example, segment analyzer 113 may determine a share of sub-segment x in a segment y by calculating an empirical conditional probability $p_t(x \mid y)$ using the following formula:
[0027] $p_t(x \mid y) = S_t(x) / S_t(y)$
where $S_t(x)$ and $S_t(y)$ represent the size of sub-segment x and the size of segment y, respectively, at time t. In some examples, to determine the sizes $S_t(x)$ and $S_t(y)$, segment analyzer 113 may calculate a sum of quantifying attribute values of all data entries within sub-segment x and segment y, respectively. For example, referring to FIG. 2, if the quantifying attribute is "Revenue" then the size of a segment (or a sub-segment) defined as "vendor=A, Country=USA" for time period 2015-Q1 may be calculated as a sum of revenue values of data entries 210-1 (1.1M), 210-2 (718M), 210-5 (2.2M), and any other entries in that segment/sub-segment (not shown in FIG. 2 for brevity).
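The share computation above can be illustrated with a short sketch. It assumes entries are dictionaries for a single time t and that a sub-segment is expressed as the segment definition plus additional attribute values; the helper names `size` and `share` and the toy revenue figures are hypothetical.

```python
def size(entries, definition, quantifying_attribute):
    """S_t: sum of the quantifying attribute over entries matching `definition`."""
    return sum(e[quantifying_attribute] for e in entries
               if all(e.get(a) == v for a, v in definition.items()))

def share(entries, segment, additional, quantifying_attribute):
    """p_t(x|y) = S_t(x) / S_t(y), where x is `segment` refined by `additional`."""
    s_y = size(entries, segment, quantifying_attribute)
    s_x = size(entries, {**segment, **additional}, quantifying_attribute)
    # An empty (zero-size) segment is given a share of 0.0 by convention here.
    return s_x / s_y if s_y else 0.0

# Hypothetical entries for one time period (assumed revenue values).
entries = [
    {"Vendor": "A", "Country": "USA", "Product Type": "Tower server", "Revenue": 2.0},
    {"Vendor": "A", "Country": "USA", "Product Type": "Tower server", "Revenue": 1.0},
    {"Vendor": "A", "Country": "USA", "Product Type": "Blade server", "Revenue": 1.0},
    {"Vendor": "B", "Country": "USA", "Product Type": "Tower server", "Revenue": 9.0},
]
p = share(entries, {"Vendor": "A", "Country": "USA"},
          {"Product Type": "Tower server"}, "Revenue")
```

In this toy data the segment size is 4.0 and the sub-segment size is 3.0, so the tower-server share of vendor A's USA revenue is 0.75.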
[0028] In some examples, segment analyzer 113 may determine a significance value for each sub-segment's share in its segment at a certain time (e.g., $t_2$). As will be illustrated in the following examples, the significance value may generally represent the extent to which a particular sub-segment's share and/or the change of the share is likely to be of interest to the particular user.
[0029] In some examples, the significance value may be determined based on the Mahalanobis distance between the sub-segment's share at time $t_2$ and the predicted distribution of the sub-segment's share at time $t_2$. In some examples, it may be assumed that the distribution of the sub-segment's share in a given segment is a Gaussian distribution, in which case the Mahalanobis distance may be expressed as $|p_{t_2}(x \mid y) - \mu| / \sigma$, where $\mu$ and $\sigma$ are the mean and the standard deviation of the sub-segment's share in the segment. In some examples, the sub-segment's share at time $t_2$ may be predicted to be identical to its share at an earlier (e.g., the closest earlier) time $t_1$, meaning that $\mu = p_{t_1}(x \mid y)$. In addition, the standard deviation of the segment at time $t_2$ may be predicted to be identical to its standard deviation at time $t_1$, meaning that $\sigma = \sigma_{t_1}(y)$. Accordingly, in some examples, the Mahalanobis distance may be assumed to be a distance to a zero-order Gaussian prediction of $p_{t_2}(x \mid y)$, and may be calculated as follows:
$D_{t_1,t_2}(x, y) = \dfrac{|p_{t_2}(x \mid y) - p_{t_1}(x \mid y)|}{\sigma_{t_1}(y)}$
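A minimal sketch of the distance to a zero-order Gaussian prediction follows. It assumes the one-dimensional form in which the distance is the absolute difference between the new share and the previous share, divided by a standard deviation carried over from the earlier time. How that standard deviation is estimated is not specified here, so it is passed in as a parameter; the function name is hypothetical.

```python
import math

def zero_order_distance(share_t2, share_t1, sigma_t1):
    """Distance of the share at t2 from a Gaussian prediction centered on
    the share at t1, with the standard deviation taken from t1."""
    if sigma_t1 <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(share_t2 - share_t1) / sigma_t1

# Hypothetical shares: 20% at t1, 30% at t2, with an assumed sigma of 0.05.
d = zero_order_distance(0.30, 0.20, 0.05)
```

With these assumed numbers, the share moved by about two standard deviations, which is the sense in which the distance measures how unexpected the new share is.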
[0030] In other examples, instead of a zero-order Gaussian prediction, segment analyzer 113 may use Mahalanobis distance to other types of predicted distributions to determine the significance value. For example, segment analyzer 113 may analyze historical data entries using deep learning techniques or other machine learning methods to determine the prediction for $p_{t_2}(x \mid y)$. Such methods can take into account various additional factors, such as trends, seasonality, sub-segment similarities, segment similarities, and so forth.
[0031] In some examples, the significance value may also be determined by segment analyzer 113 based on various relevance factors, i.e., factors indicating or predicting the extent to which the change in the particular sub-segment's share is relevant to the particular user. Such factors may include, for example, the size of the segment (e.g., at time $t_2$). For example, if a user is more interested in changes occurring in larger segments, segment analyzer 113 may increase the significance value as the segment size increases, and decrease the significance value as the segment size decreases. In some examples, segment analyzer 113 may also change the significance value based on one or more user inputs. For example, segment analyzer 113 may determine based on one or more historical user inputs (e.g., using machine learning or other types of methods) that some types of segments or segment share changes are more relevant or interesting to the user than others. Accordingly, the relevance factors may in some examples include an adjustable weight value that may be initially set to a default value (e.g., 1) and then dynamically increased and/or decreased by segment analyzer 113 based on user inputs, as further discussed below. Relevance factors may also include or be associated with trends, seasonality, sub-segment similarities, segment similarities, and various other factors.
[0032] To illustrate the examples discussed above, the significance value may be calculated by segment analyzer 113, for example, using the following formula: $\rho_{t_1,t_2}(x, y) = [D_{t_1,t_2}(x, y)]^2 \, S_{t_2}(y) \, W_{x,y}$, where $\rho_{t_1,t_2}(x, y)$ is the significance value of a share change of sub-segment x in segment y between times $t_1$ and $t_2$, $D_{t_1,t_2}(x, y)$ is the Mahalanobis distance (e.g., to a zero-order Gaussian prediction of $p_{t_2}(x \mid y)$), $S_{t_2}(y)$ is the size of segment y at time $t_2$, and $W_{x,y}$ is the adjustable weight value associated with sub-segment x and segment y, as discussed above. It is appreciated that the example formula provided above is illustrative only, and that segment analyzer 113 may calculate the significance value using any other formula that takes into account fewer or more factors and that is consistent with the various examples discussed above.
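The combination of distance, segment size, and adjustable weight can be sketched as below. The sketch assumes one reading of the example formula, in which the significance value is the squared distance multiplied by the segment size at the later time and by the weight; the squared exponent in particular is an assumption, and the function name is hypothetical.

```python
def significance(distance, segment_size_t2, weight=1.0):
    """Significance of a share change: squared distance, scaled by the
    segment's size at the later time and by an adjustable weight.
    (The squared form is an assumed reading of the example formula.)"""
    return distance ** 2 * segment_size_t2 * weight

# Hypothetical inputs: distance 2.0, segment size 100.0, default weight 1.0.
rho = significance(2.0, 100.0)
rho_boosted = significance(2.0, 100.0, weight=1.5)
```

Multiplying by the segment size makes an equally surprising change rank higher in a larger segment, and the weight lets user feedback raise or lower a particular sub-segment/segment pair.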
[0033] After segment analyzer 113 calculates significance values for share changes of all sub-segments in their respective segments, GUI engine 114 can provide for display (e.g., on display 118) information about sub-segments and segments whose share changes are associated with highest significance values, as illustrated in the example of FIG. 4. In some examples, the information (e.g., 410) may be presented in a descending order of significance values, and may indicate, for each sub-segment/segment pair, the attribute values defining the sub-segment (e.g., 411) and the segment (e.g., 412), the new share (e.g., 414), and the previous share (e.g., 413) of the sub-segment in the segment. GUI engine 114 may also provide for display any additional information (not shown for brevity) describing or associated with the sub-segment, its segment, and the change in the share. In some examples, the additional information may be displayed in a graphical and/or textual manner, upon obtaining a user input (e.g., a touch or a click) associated with a particular sub-segment, segment, or share change. In some examples, GUI engine 114 may also provide visual indicators allowing the user to quickly determine the nature of the most significant share changes. For example, GUI engine 114 may display, for each share change, a shape and/or an arrow (e.g., 410) indicating whether the change was positive or negative, where the color and/or saturation of the shape may indicate the significance of the change.
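Selecting the share changes to display can be sketched as a sort by significance followed by truncation. The `ShareChange` record, its field names, and the helper functions below are hypothetical; they only illustrate descending ordering and a direction indicator, not any particular GUI.

```python
from dataclasses import dataclass

@dataclass
class ShareChange:
    sub_segment: str
    segment: str
    previous_share: float
    new_share: float
    significance: float

def top_changes(changes, k):
    """Return the k share changes with the highest significance, descending."""
    return sorted(changes, key=lambda c: c.significance, reverse=True)[:k]

def direction(change):
    """Indicator of whether the share rose, fell, or stayed flat."""
    if change.new_share > change.previous_share:
        return "up"
    if change.new_share < change.previous_share:
        return "down"
    return "flat"

# Hypothetical computed share changes (assumed values).
changes = [
    ShareChange("product type=tower server", "vendor=A", 0.20, 0.30, 400.0),
    ShareChange("country=USA", "vendor=B", 0.50, 0.45, 120.0),
    ShareChange("max processors=1", "country=Germany", 0.10, 0.40, 900.0),
]
ranked = top_changes(changes, 2)
```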
[0034] As illustrated in FIG. 4, GUI engine 114 may also display a list of all attributes (e.g., 405) associated with the data entries, allowing the user to select attributes of interest; a text window 415 to collect user input indicating one or more filters to be applied during the segment/sub-segment determination; one or more selection widgets 420 to enable the user to select at least two times (e.g., time periods) to be compared; and a selection widget 425 to enable the user to select a quantifying attribute.
[0035] As also illustrated in FIG. 4, GUI engine 114 may also, upon receiving a user input (e.g., a touch or a click) selecting a particular share change, display a set of one or more additional share changes associated with the selected share change. Such additional share changes may include, for example, the most significant share changes of the same sub-segment as that of the selected share change in segments other than that of the selected share change.
[0036] As discussed above, in some examples (not illustrated in FIG. 4 for brevity), GUI engine 114 may also collect various inputs by the user based on which segment analyzer 113 may adjust weights associated with various sub-segment/segment pairs. For example, GUI engine 114 may determine which sub-segment/segment pairs are more interesting to the user based on which sub-segment/segment pairs are selected by the user, based on how long the user examines them, etc. In some examples, the user may explicitly indicate which pairs the user is interested in and/or which pairs the user is not interested in, e.g., by using one or more graphical widgets (e.g., "likes" and/or "dislikes") next to each displayed pair. Upon receiving such implicit or explicit indications from the user, segment analyzer 113 may increase (or decrease) the weights associated with the corresponding pairs, increasing (or decreasing) the significance values associated with these pairs, thereby increasing (or decreasing) the likelihood that these pairs would be displayed to the user in the future.
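One simple way such weight adjustments could work is sketched below. The multiplicative update rule, the `step` parameter, and the function name `adjust_weight` are assumptions chosen for illustration; the text does not specify how weights are updated, and a learned model could be used instead.

```python
import math

def adjust_weight(weights, pair, liked, step=0.5, default=1.0):
    """Raise the weight of a (sub-segment, segment) pair on positive
    feedback and lower it on negative feedback (multiplicative update)."""
    current = weights.get(pair, default)
    factor = 1.0 + step
    weights[pair] = current * factor if liked else current / factor
    return weights[pair]

# Hypothetical feedback on two pairs, starting from the default weight of 1.
weights = {}
liked_pair = ("product type=tower server", "vendor=A")
disliked_pair = ("country=USA", "vendor=B")
adjust_weight(weights, liked_pair, liked=True)
adjust_weight(weights, disliked_pair, liked=False)
```

A multiplicative rule keeps weights positive, so a pair's significance value is scaled up or down but never changes sign.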
[0037] In the foregoing discussion, engines 112, 113, and 114 were described as any combinations of hardware and programming. Such components may be implemented in a number of fashions. The programming may be processor executable instructions stored on a tangible, non-transitory computer-readable medium and the hardware may include a processing resource for executing those instructions. The processing resource, for example, may include one or multiple processors (e.g., central processing units (CPUs), semiconductor-based microprocessors, graphics processing units (GPUs), field-programmable gate arrays (FPGAs) configured to retrieve and execute instructions, or other electronic circuitry), which may be integrated in a single device or distributed across devices. The computer-readable medium can be said to store program instructions that when executed by the processor resource implement the functionality of the respective component. The computer-readable medium may be integrated in the same device as the processor resource or it may be separate but accessible to that device and the processor resource. In one example, the program instructions can be part of an installation package that when installed can be executed by the processor resource to implement the corresponding component. In this case, the computer-readable medium may be a portable medium such as a CD, DVD, or flash drive or a memory maintained by a server from which the installation package can be downloaded and installed. In another example, the program instructions may be part of an application or applications already installed, and the computer-readable medium may include integrated memory such as a hard drive, solid state drive, or the like.
[0038] FIG. 5 is a flowchart of an example method 500. Method 500 may be described below as being executed or performed by a system or by a computing device such as computing device 100 of FIG. 1. Other suitable systems and/or computing devices may be used as well. Method 500 may be implemented in the form of executable instructions stored on at least one non-transitory machine-readable storage medium of the system and executed by at least one processor of the system. Alternatively or in addition, method 500 may be implemented in the form of electronic circuitry (e.g., hardware). In alternate examples of the present disclosure, one or more blocks of method 500 may be executed substantially concurrently or in a different order than shown in FIG. 5. In alternate examples of the present disclosure, method 500 may include more or fewer blocks than are shown in FIG. 5. In some examples, one or more of the blocks of method 500 may, at certain times, be ongoing and/or may repeat.
[0039] At block 505, method 500 may obtain a plurality of data entries stored in a memory, each data entry including a plurality of attribute values of a plurality of attributes. At block 510, the method may determine (e.g., by the processor) a set of segments, each segment being defined by a set of attribute values of a set of attributes. At block 515, the method may determine (e.g., by the processor) for each segment a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment. At block 520, the method may compute (e.g., by the processor), for each sub-segment of each segment, a significance value associated with a change in the sub-segment's share within the segment. At block 525, the method may determine a set of selected sub-segments based on the significance value computed for each sub-segment of each segment. At block 530, the method may provide for display (e.g., on display 118) a visual representation of the set of selected sub-segments. As discussed above, in some examples, the method may include fewer blocks or additional blocks not shown in FIG. 5 for brevity.
[0040] FIG. 6 is a block diagram of an example computing device 600. Computing device 600 may be similar to computing device 100 of FIG. 1. In the example of FIG. 6, computing device 600 includes a processor 610 and a non-transitory machine-readable storage medium 620. Although the following descriptions refer to a single processor and a single machine-readable storage medium, it is appreciated that multiple processors and multiple machine-readable storage mediums may be anticipated in other examples. In such other examples, the instructions may be distributed (e.g., stored) across multiple machine-readable storage mediums and the instructions may be distributed (e.g., executed) across multiple processors.
[0041] Processor 610 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in non-transitory machine-readable storage medium 620. In the particular example shown in FIG. 6, processor 610 may fetch, decode, and execute instructions 622, 624, 626, 628, 630, or any other instructions (not shown for brevity). As an alternative or in addition to retrieving and executing instructions, processor 610 may include one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of the instructions in machine-readable storage medium 620. With respect to the executable instruction representations (e.g., boxes) described and shown herein, it should be understood that part or all of the executable instructions and/or electronic circuits included within one box may, in alternate examples, be included in a different box shown in the figures or in a different box not shown.
[0042] Non-transitory machine-readable storage medium 620 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, medium 620 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like. Medium 620 may be disposed within computing device 600, as shown in FIG. 6. In this situation, the executable instructions may be "installed" on computing device 600. Alternatively, medium 620 may be a portable, external or remote storage medium, for example, that allows computing device 600 to download the instructions from the portable/external/remote storage medium. In this situation, the executable instructions may be part of an "installation package". As described herein, medium 620 may be encoded with executable instructions.
[0043] Referring to FIG. 6, instructions 622, when executed by a processor (e.g., 610), may cause a computing device (e.g., 600) to obtain a plurality of data entries, each data entry comprising a plurality of attribute values of a plurality of attributes. Instructions 624, when executed by the processor, may cause the computing device to determine a set of segments, each segment being defined by a set of attribute values of a set of attributes. Instructions 626, when executed by the processor, may cause the computing device to determine for each segment a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment. Instructions 628, when executed by the processor, may cause the computing device to compute, for each sub-segment of each segment, a distance between the sub-segment's share in the segment and a predicted distribution of the sub-segment's share in the segment. Instructions 630, when executed by the processor, may cause the computing device to select (and in some examples, provide for display) at least one sub-segment of at least one segment based at least on the computed distance.

Claims

CLAIMS
1. A method comprising:
obtaining a plurality of data entries stored in a memory, each data entry comprising a plurality of attribute values of a plurality of attributes;
determining, by a processor, a set of segments, each segment being defined by a set of attribute values of a set of attributes;
for each segment, determining, by the processor, a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment;
for each sub-segment of each segment, computing, by the processor, a significance value associated with a change in the sub-segment's share within the segment;
determining a set of selected sub-segments and segments based on the significance value computed for each sub-segment of each segment; and
providing for display a visual representation of the set of selected sub-segments and segments.
2. The method of claim 1, further comprising receiving a user input selecting a set of attributes of interest from the plurality of attributes, wherein the set of segments determined by the processor comprises at least one segment for every subset of the set of attributes of interest.
3. The method of claim 1, wherein the significance value is computed for each sub-segment in each segment based at least on a distance between the sub-segment's share within the segment and a predicted distribution of the sub-segment's share within the segment.
4. The method of claim 3, wherein the distance comprises a Mahalonobis distance.
5. The method of claim 1 , wherein the significance value is computed for each sub-segment in each segment based at least on a difference between the sub-segment's share within the segment at a first time and the sub-segment's share within the segment at a second time.
8. The method of claim 5, wherein the significance value is computed further based at least one of:
a standard deviation of the segment;
a size of the segment; and
an adjustable weight associated with the sub-segment and the segment, wherein the adjustable weight is adjustable based on user inputs.
7. The method of claim 1, further comprising:
receiving a user input comprising a filter value; and
filtering the set of segments based on the filter value.
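As an illustration of claims 2 and 7, the following Python sketch enumerates every subset of a set of attributes of interest, builds the segments each subset induces, and filters them. The function names, the exclusion of the empty subset, and the reading of the filter value as an attribute value that a segment must contain are assumptions, not details taken from the claims.

```python
from collections import defaultdict
from itertools import combinations


def attribute_subsets(attributes):
    """Yield every non-empty subset of the attributes of interest."""
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)


def determine_segments(entries, attributes_of_interest):
    """Map (attribute subset, attribute values) -> entries in that segment."""
    segments = defaultdict(list)
    for subset in attribute_subsets(attributes_of_interest):
        for entry in entries:
            values = tuple(entry[a] for a in subset)
            segments[subset, values].append(entry)
    return dict(segments)


def filter_segments(segments, filter_value):
    """Keep segments whose defining values include the filter value
    (one possible reading of a filter; an assumption)."""
    return {key: rows for key, rows in segments.items() if filter_value in key[1]}


# Hypothetical sample data.
entries = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "DE", "device": "mobile"},
]
segments = determine_segments(entries, ["country", "device"])
us_only = filter_segments(segments, "US")
```

Because the number of subsets grows exponentially with the number of attributes of interest, a sketch like this is only practical for a small user-selected set.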
8. A computing device comprising:
a segmentation engine to:
divide a plurality of data entries into a set of segments, where each segment is associated with a different set of attribute values, and
divide each segment in the set of segments into a set of sub-segments based on at least one attribute value other than the set of attribute values associated with the segment;
a segment analyzer to:
determine, for each sub-segment of each segment, a first share of the sub-segment in the segment at a first time and a second share of the sub-segment in the segment at a second time,
for each sub-segment of each segment, calculate a significance value based at least on the first share and the second share; and
a graphical user interface (GUI) engine to:
display information about a plurality of sub-segments and segments associated with a plurality of highest significance values.
9. The computing device of claim 8, wherein the segment analyzer is to determine the first share of the sub-segment in the segment at the first time based at least on a sum of attribute values of a quantifying attribute of all data entries, in the sub-segment, that are associated with the first time.
10. The computing device of claim 9, wherein the GUI engine is to obtain a user input selecting the quantifying attribute from a plurality of attributes associated with the plurality of data entries.
11. The computing device of claim 8, wherein the segment analyzer is to determine the significance value also based on at least one of:
a standard deviation of the segment at the first time;
a size of the segment at the second time; and
an adjustable weight associated with the sub-segment and the segment.
12. The computing device of claim 11, wherein the GUI engine is further to obtain a user input associated with a first sub-segment of a first segment, and wherein the segment analyzer is to adjust the adjustable weight associated with the first sub-segment and the first segment based on the user input.
13. The computing device of claim 11, wherein the GUI engine is further to obtain a user input associated with a first sub-segment of a first segment, and in response to the user input, display additional information associated with at least one of the first segment and a second segment comprising the first sub-segment.
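The share computation of claim 9 and the significance value of claims 8 and 11 might be sketched in Python as follows. The claims do not give a formula, so everything here is an assumption: the "revenue" quantifying attribute, the "period" time attribute, the division by the segment total, and the scoring rule (weighted share change scaled by the square root of the segment size and divided by a standard deviation, in the style of a test statistic).

```python
from math import sqrt


def share(entries, segment, sub_segment, time, quantity="revenue", time_attr="period"):
    """Share of the sub-segment in the segment at `time`, measured by the sum
    of a quantifying attribute over the matching entries."""
    def matches(entry, conditions):
        return all(entry[a] == v for a, v in conditions.items())

    in_segment = [e for e in entries if e[time_attr] == time and matches(e, segment)]
    total = sum(e[quantity] for e in in_segment)
    part = sum(e[quantity] for e in in_segment if matches(e, sub_segment))
    return part / total if total else 0.0


def significance(first_share, second_share, std, size, weight=1.0):
    """Hypothetical score: weighted share change, larger for big segments and
    smaller for noisy ones. Not a formula from the claims."""
    return weight * abs(second_share - first_share) * sqrt(size) / max(std, 1e-9)


# Hypothetical sample data.
entries = [
    {"country": "US", "device": "mobile", "period": 1, "revenue": 10},
    {"country": "US", "device": "mobile", "period": 1, "revenue": 20},
    {"country": "US", "device": "desktop", "period": 1, "revenue": 70},
    {"country": "US", "device": "mobile", "period": 2, "revenue": 60},
    {"country": "US", "device": "desktop", "period": 2, "revenue": 40},
    {"country": "DE", "device": "mobile", "period": 2, "revenue": 500},
]
first = share(entries, {"country": "US"}, {"device": "mobile"}, time=1)
second = share(entries, {"country": "US"}, {"device": "mobile"}, time=2)
score = significance(first, second, std=0.1, size=4, weight=1.0)
```

Raising or lowering the weight argument in response to user feedback is one way the adjustable weight of claim 12 could enter such a score.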
14. A non-transitory machine-readable storage medium encoded with instructions executable by a processor of a computing device to cause the computing device to:
obtain a plurality of data entries, each data entry comprising a plurality of attribute values of a plurality of attributes;
determine a set of segments, each segment being defined by a set of attribute values of a set of attributes;
for each segment, determine a set of sub-segments, each sub-segment being defined by at least one additional attribute value of at least one additional attribute not in the set of attributes associated with the segment;
for each sub-segment of each segment, computing a distance between the sub-segment's share in the segment and a predicted distribution of the sub-segment's share in the segment; and
selecting at least one sub-segment of at least one segment based at least on the computed distance.
15. The non-transitory machine-readable storage medium of claim 14, wherein the instructions further cause the computing device to select the at least one sub-segment of the at least one segment based further on at least one of:
a standard deviation of the at least one segment;
a size of the at least one segment; and
an adjustable weight associated with the at least one sub-segment and the at least one segment.
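For reference, the distance between a share and a predicted distribution recited in claims 3 and 14 can be instantiated with the Mahalanobis distance named in claim 4. The standard definition is given below; the symbols are notation introduced here and are not taken from the claims: x is an observed vector, mu and Sigma are the mean and covariance of the predicted distribution, and in the scalar case s is the observed share with predicted mean mu and standard deviation sigma.

```latex
d_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})},
\qquad
d_M(s) = \frac{\lvert s - \mu \rvert}{\sigma} \quad \text{(scalar share } s\text{)}
```

In the scalar case the distance is simply the number of standard deviations by which the observed share departs from its predicted mean.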

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/061998 WO2017087003A1 (en) 2015-11-20 2015-11-20 Segments of data entries

Publications (1)

Publication Number Publication Date
WO2017087003A1 true WO2017087003A1 (en) 2017-05-26

Family

ID=58717631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/061998 WO2017087003A1 (en) 2015-11-20 2015-11-20 Segments of data entries

Country Status (1)

Country Link
WO (1) WO2017087003A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030212691A1 (en) * 2002-05-10 2003-11-13 Pavani Kuntala Data mining model building using attribute importance
JP2005203895A (en) * 2004-01-13 2005-07-28 Fuji Xerox Co Ltd Data importance evaluation apparatus and method
JP2008287698A (en) * 2007-05-16 2008-11-27 Fuji Xerox Co Ltd Indexing system and indexing program
US20090100454A1 (en) * 2006-04-25 2009-04-16 Frank Elmo Weber Character-based automated media summarization
EP1073272B1 (en) * 1999-02-15 2011-09-07 Sony Corporation Signal processing method and video/audio processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15908987

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15908987

Country of ref document: EP

Kind code of ref document: A1