CN111723122A - Method, device and equipment for determining association rule between data and readable storage medium - Google Patents

Publication number
CN111723122A
Authority
CN
China
Prior art keywords
data
determining
key pair
support
data key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910208892.7A
Other languages
Chinese (zh)
Inventor
屠志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910208892.7A
Publication of CN111723122A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method, a device, equipment and a readable storage medium for determining association rules among data. The method comprises the following steps: determining a data key pair according to a data source, and determining the support degree corresponding to the data key pair; determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source; and determining association rules between the data key pairs according to the support labels, and constructing a data tree according to the association rules. Because the data in the data source is stored in the format of data key pairs, the field content of the data can be conveniently expanded during data processing. In addition, the support degree of the data is determined before the association relationship is determined, and the support degree is converted into a discrete form, namely the support label, so that even if no shared parameter exists in the data, the association rules among the data can still be determined by means of the support labels.

Description

Method, device and equipment for determining association rule between data and readable storage medium
Technical Field
The present disclosure relates to data processing technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for determining an association rule between data.
Background
Currently, with the development of network technology, online shopping has become a shopping mode commonly adopted by users. There are also a large number of cyber malls on the web where users can shop.
In the prior art, in order to find potential operational risks and data anomalies, users' shopping data is generally analyzed to determine the correlations among the data, and anomalous data is then searched for based on those correlations. The Apriori algorithm is generally used to analyze the data generated while users shop, so as to find common characteristics in the shopping data.
However, the Apriori algorithm can only perform its calculation when the data sources have the same parameters. An existing online mall has numerous data sources, such as a mobile phone end and a computer end, and may also involve various media, such as web page data and client data. Different data sources therefore have different parameters, so the prior-art method cannot meet the requirements.
Disclosure of Invention
The disclosure provides a method, a device and equipment for determining association rules among data and a readable storage medium, which are used for solving the problem that shopping data in multiple data sources cannot be analyzed simultaneously in the prior art.
A first aspect of the present disclosure provides a method for determining an association rule between data, including:
determining a data key pair according to a data source, and determining the corresponding support degree of the data key pair;
determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source;
and determining association rules between the data key pairs according to the support tags, and constructing a data tree according to the association rules.
Another aspect of the present disclosure is to provide an apparatus for determining an association rule between data, including:
the conversion module is used for determining a data key pair according to a data source;
the support degree determining module is used for determining the support degree corresponding to the data key pair;
the label determining module is used for determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source;
and the rule determining module is used for determining association rules between the data key pairs according to the support labels and constructing a data tree according to the association rules.
Still another aspect of the present disclosure is to provide an apparatus for determining an association rule between data, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method for determining the association rule between data as described in the first aspect above.
Yet another aspect of the present disclosure is to provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method for determining an association rule between data as described in the above first aspect.
The method, the device, the equipment and the readable storage medium for determining the association rule among the data have the technical effects that:
The method provided by the disclosure comprises the following steps: determining a data key pair according to a data source, and determining the support degree corresponding to the data key pair; determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source; and determining association rules between the data key pairs according to the support labels, and constructing a data tree according to the association rules. Because the data in the data source is stored in the format of data key pairs, the field content of the data can be conveniently expanded during data processing. In addition, the support degree of the data is determined before the association relationship is determined, and the support degree is converted into a discrete form, namely the support label, so that even if no shared parameter exists in the data, the association rules among the data can still be determined by means of the support labels.
Drawings
Fig. 1 is a flowchart illustrating a method for determining an association rule between data according to an exemplary embodiment of the present invention;
Fig. 2 is a flowchart illustrating a method for determining an association rule between data according to another exemplary embodiment of the present invention;
Fig. 3 is a block diagram illustrating an apparatus for determining an association rule between data according to an exemplary embodiment of the present invention;
Fig. 4 is a block diagram illustrating an apparatus for determining an association rule between data according to another exemplary embodiment of the present invention;
Fig. 5 is a block diagram illustrating an apparatus for determining an association rule between data according to an exemplary embodiment of the present invention.
Detailed Description
At present, in order to increase sales in an online shopping mall, users' daily shopping data is usually analyzed to find shopping characteristics, for example which goods users are likely to purchase at the same time, or the relationship between the order quantity and the turnover. These characteristics can serve as a basis for an administrator's decisions. In the prior art, the Apriori algorithm is usually adopted to mine the association relationships in the shopping data, and the algorithm has certain limitations when there are many types of data sources.
In the Apriori algorithm, the data sets with the least content are constructed first, for example one set for orders that include commodity A and another for orders that include commodity B. The support degree of each set is then calculated, and the sets that do not meet the support threshold are filtered out (it is possible that all the sets meet the threshold). The sets that meet the support threshold are spliced to obtain new data sets, for example orders that include commodities A and B at the same time; the support degree of each newly obtained set is calculated, and the sets are filtered again based on the support threshold. These steps are repeated until the stop condition is satisfied. This approach requires the data sources to share parameters so that the sets can be spliced into new data sets. When there are many types of data sources, they cannot be analyzed simultaneously.
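The level-by-level procedure described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function name `apriori`, the sample `orders` and the 0.5 threshold are assumptions chosen for illustration, and the sketch omits the usual subset-pruning optimization.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the set.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: the smallest sets, one commodity each.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while current:
        # Filter the current level by the support threshold.
        kept = {s: support(s) for s in current}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        # Splice surviving sets into sets that are one item larger.
        current = {a | b for a, b in combinations(kept, 2)
                   if len(a | b) == len(a) + 1}
    return frequent

# Hypothetical orders, each listing the commodities it contains.
orders = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
frequent_sets = apriori(orders, min_support=0.5)
```

The splicing step only makes sense when all sets are drawn from one shared vocabulary of items, which is the shared-parameter requirement noted above.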
In the scheme provided by the embodiment of the invention, data in a data source is converted into a form of a data key pair (key-value), the support degree of the data key pair in the data source is calculated in advance, and a support label of the data key pair is determined according to the support degree.
Fig. 1 is a flowchart illustrating a method for determining an association rule between data according to an exemplary embodiment of the present invention.
As shown in fig. 1, the method for determining an association rule between data provided in this embodiment includes:
Step 101, determining a data key pair according to a data source, and determining the support degree corresponding to the data key pair.
The method provided by the embodiment can be executed by an electronic device with computing capability. The electronic device can obtain a data source. For example, the electronic device may be connected to a database so that the data source therein can be read, or the data source may be directly stored in the electronic device.
Specifically, the data source may be a data source completed based on big data statistics, and may be a data source obtained by processing the user shopping data. For example, the number of orders per day during a period of time, the sales per day during the period of time, and the information on the goods included in each order during the period of time may be included.
Further, the data source may include a plurality of pieces of data, and may specifically be in the form of a data table. For example, if the data source is the daily order size over a period of time, each date can correspond to a piece of data.
In practical application, the electronic device may convert a plurality of pieces of data in the data source into a data key pair format, i.e., a key-value format. The key refers to a data identifier corresponding to a piece of data, and the value refers to the content of the piece of data. In the conversion process, the key is a unique identifier of the data.
For example, according to the sequence in the data report, the identifier of the first row of data may be set to "flag 1", and the identifier of the second row of data may be set to "flag 2". For another example, the data identifiers may be determined one by one in the order of 1, 2, 3, and 4. And directly taking the content in the data report as the data content in the data key pair. For example, the report may include a plurality of field contents, and the field contents corresponding to each piece of data may be directly used as values in the data key pair.
Specifically, the support degree corresponding to each piece of data may be determined. The support may be a data field in the data, for example, the current day sales included in the data. The support degree can also be calculated according to the data content in the data, for example, the ratio of the current day sales to a base number, and the base number can be preset, for example, the average sales in a period of time.
When a plurality of data sources exist, the data in each data source can be converted into a data key pair format, and can also be placed in the same report. The data in the data source is set into the form of data key pairs, so that the number of fields in the data source is not limited, the field content is convenient to expand, and the data can be used more flexibly.
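As a rough illustration of step 101, the following Python sketch converts table rows into key-value pairs and computes a support degree as the ratio of a field to a preset base. The helper names `to_key_pairs` and `support_degree`, the key format, the field names and the sample figures are all assumptions made for this example.

```python
def to_key_pairs(rows, source_id):
    """Map each row to a key-value pair; the key is a unique row identifier."""
    return {
        f"{source_id}-{index}": {**row, "source": source_id}
        for index, row in enumerate(rows, start=1)
    }

def support_degree(value, field, base):
    """Support degree as the ratio of one field to a preset base number."""
    return value[field] / base

# Hypothetical daily sales report with two rows.
daily_sales = [
    {"date": "2019-03-01", "sales": 40000},
    {"date": "2019-03-02", "sales": 60000},
]
pairs = to_key_pairs(daily_sales, source_id="sales")

# The base here is the average sales over the period covered by the report.
base = sum(row["sales"] for row in daily_sales) / len(daily_sales)
supports = {key: support_degree(value, "sales", base)
            for key, value in pairs.items()}
```

Because each value is an ordinary dictionary, further fields can be added later without affecting how the pair is looked up by its key.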
Step 102, determining a support label corresponding to the data key pair according to the support degree and the threshold corresponding to the data source.
Further, the thresholds corresponding to different data sources can be predetermined. Because the data sources differ, their thresholds differ as well. For example, when the data source is the number of orders per day over a period of time, the threshold may be a value corresponding to the number of orders, such as 500. When the data source is the daily sales amount over a period of time, the threshold may be a value corresponding to the sales amount, such as 50,000 dollars.
In practical application, the method provided by this embodiment may convert the numerical index of the support degree into a label index, namely the support label. Specifically, the support degree of each piece of data may be compared with the threshold: if the support degree is less than the threshold, the support label may be determined as "no", and if the support degree is greater than or equal to the threshold, the support label may be determined as "yes". The support label may also take values such as "higher" and "lower", and can be set according to requirements.
When a plurality of data sources of different types need to be analyzed simultaneously, the association relationship among the data cannot be judged directly from the support degrees, because their values are not comparable across sources. The numerical support degree is therefore converted into discrete data, the support label, so that the same kind of support index is used even when the data sources differ.
Specifically, the support label may be bound to the data key pair, for example by writing it into the data content of the data key pair, i.e., the value. A field "support label" may be added to the value, and the determined support label is written into that field.
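The conversion and binding just described might look as follows in Python. This is a sketch only: the function name `attach_support_label` and the field names are hypothetical, while the thresholds 500 and 50,000 echo the examples given for step 102.

```python
def attach_support_label(value, support, threshold):
    """Return a copy of the value with a discrete "support_label" field added."""
    label = "yes" if support >= threshold else "no"
    return {**value, "support": support, "support_label": label}

# One row from an order-count source and one from a sales source (sample values).
order_row = attach_support_label(
    {"date": "2019-03-01", "orders": 480}, support=480, threshold=500)
sales_row = attach_support_label(
    {"date": "2019-03-01", "sales": 62000}, support=62000, threshold=50000)
```

Although 480 and 62000 are not comparable as numbers, the resulting labels "no" and "yes" are, which is what allows rows from different sources to be matched later.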
Step 103, determining association rules between the data key pairs according to the support labels, and constructing a data tree according to the association rules.
Further, in the method provided by this embodiment, association rules between the data key pairs may also be determined. Specifically, a rule may be set; if two data key pairs satisfy the rule, they are determined to have an association relationship, and otherwise they are determined not to have one. For example, two data key pairs may be considered to have an association relationship if certain field contents are the same and their support labels are the same.
In actual application, the data key pairs can be processed one by one from top to bottom according to their order in the report. For example, for the first row of data, the data key pairs in the report that satisfy the rule together with it may be determined, and these data key pairs are considered to have an association relationship with the data key pair of the first row.
In this case, the child node that best matches the data key pair of the first row may be determined among the selected data key pairs. For example, an algorithm may be set to sort the selected data key pairs, and the child node is then selected according to the sorting result. The first row of data and the selected child node then have an association relationship.
The child node may then be taken as a new root node, and its own child node may be determined in the same way. Specifically, the next child node may be searched for among the data that follows the current child node in the report, until the last node of the chain is determined.
Further, other nodes with undetermined association relation can be found out from the report, and are used as root nodes to continuously find child nodes. In this way, a plurality of associations between data in the entire report can be obtained, and the associations have pointing information, such as data 1-data 3-data 5. Data 1 points to data 3 and data 3 points to data 5.
In actual application, the data tree can be constructed based on the obtained association relationship. Multiple nodes are included in the data tree, and a node may be both a parent node and a child node. For example, data 3 in the above example is a child node of data 1, and data 3 is a parent node of data 5. Through the structure of the data tree, the user can clearly see the relationship among the data, so that the user can determine the marketing means by utilizing the relationship more conveniently.
The method provided by the embodiment is used for determining the association relationship between data, and is executed by a device provided with the method provided by the embodiment, and the device is generally implemented in a hardware and/or software manner.
The method for determining the association rule between data provided by the embodiment includes: determining a data key pair according to a data source, and determining the corresponding support degree of the data key pair; determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source; and determining association rules between the data key pairs according to the support tags, and constructing a data tree according to the association rules. In the method provided by the embodiment, the data in the data source is stored in the format of the data key pair, and the field content of the data is conveniently expanded in the data processing process. In addition, in the method provided by this embodiment, the support degree of the data is determined before the association relationship is determined, and the support degree is converted into a discrete data form of the support tag, so that even if no shared parameter exists in the data, the association rule between the data can be determined due to the existence of the support tag.
Fig. 2 is a flowchart illustrating a method for determining an association rule between data according to another exemplary embodiment of the present invention.
As shown in fig. 2, the method for determining an association rule between data provided in this embodiment includes:
step 201, determining a data identifier corresponding to each piece of data in the data source according to a preset rule, and determining a data key pair according to the data identifier and the data.
Before determining the association rule between the data, the electronic device needs to be enabled to acquire the corresponding data. In particular, a data source comprising data may be stored in a database, from which the electronic device may read the data source. A data source comprising data may also be stored directly on a magnetic disk of the electronic device so that it can be read by a processor of the electronic device. The data source may include a variety of data sources.
In particular, the electronic device may convert data in the data source into a format of a data key pair. A data identifier corresponding to each piece of data may be determined, and the data itself may be used as the data content corresponding to the data identifier, thereby forming a key pair. The data identification can be used as the unique identification of the data, so that the processing processes of reading, analyzing and the like of the data cannot be influenced when fields are added in the data and operations such as modification and the like are carried out.
Further, a general data source may be stored in a table form, and the content of a row is a piece of data. At this time, the corresponding data identifier may be determined according to the row identifier of the data. For example, the row of the first row of data is identified as "label 1", the row of the second row of data is identified as "label 2", and so on.
If a plurality of data sources of different types exist, the data in the data sources can be stored in the same table and sorted according to a certain rule. For example, the data identifiers may be sorted according to the data production time included in each piece of data, or sorted according to the number of fields included in the data, and then generated based on the sorting of each piece of data. In practical application, even if data in a plurality of data sources are rearranged, the data source information corresponding to each data can still be marked.
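One way to picture the merging of several data sources into a single table is the sketch below. It assumes, as one of the options mentioned above, that rows are sorted by their number of fields; the function name `merge_sources`, the sample rows and the "label N" identifiers are illustrative assumptions.

```python
def merge_sources(sources):
    """Merge rows from several sources into one ordered table of key pairs.

    `sources` maps a data source identifier to a list of row dictionaries.
    """
    # Mark every row with the data source it came from.
    tagged = [
        {**row, "source": source_id}
        for source_id, rows in sources.items()
        for row in rows
    ]
    # Sort by number of fields; len() of a dict is its field count.
    tagged.sort(key=len)
    # Generate the data identifiers from the sorted order.
    return {f"label {i}": row for i, row in enumerate(tagged, start=1)}

sources = {
    "A": [{"date": "2019-03-01", "orders": 520, "channel": "app"}],
    "B": [{"date": "2019-03-01", "sales": 48000}],
}
merged = merge_sources(sources)
```

The "source" field keeps each row traceable to its origin even after the rows have been rearranged.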
In addition, if the data association relationship between the multiple data sources does not need to be analyzed, and only the data association relationship in the same data source needs to be analyzed, the data in the multiple data sources can be processed at the same time, and the method provided by the embodiment is executed.
Step 202, determining the corresponding support degree of the data key pair.
The specific principle and implementation of step 202 are similar to those of determining the support degree in step 101, and are not described herein again.
Step 203, acquiring a data source threshold corresponding to the data key pair.
Wherein, the data source identification corresponding to the data key pair can be determined first. For example, the data key pair is converted from data a, the data a belongs to a data source A, and the data source is identified as A. Each data key pair is obtained by converting one piece of data, and each piece of data has the data source to which the data key pair belongs, so that the corresponding data source can be determined for each data key pair.
Specifically, a support degree dictionary may be preset, in which thresholds corresponding to different data sources are stored. The data sources may be encoded to determine an identification for each data source, and a matching threshold may be assigned to each data source identification. The data source identification may also be used to indicate when determining the data source to which the data key pair belongs.
Further, a data source identifier corresponding to the data key pair may be obtained, and then a data source threshold corresponding to the identifier may be obtained in the preset support dictionary.
Step 204, comparing the support degree with a data source threshold value.
Step 205, if the support degree is greater than or equal to the data source threshold, determining that the support label corresponding to the data key pair is "yes"; if the support degree is smaller than the data source threshold, determining that the support label corresponding to the data key pair is "no".
In actual application, the support degree corresponding to the data key pair and the data source threshold corresponding to the data key pair can be compared. For example, if the support of a data key pair is 500 and the data source threshold corresponding to the data key pair is 560, the support is less than the data source threshold.
By comparing the support degree with the data source threshold value, the support degree can be converted into a discrete support degree label. Therefore, in the case of different data source types, the association relationship among the data included therein can be determined.
The specific conversion manner may be as follows: if the support degree is greater than or equal to the data source threshold, the support label corresponding to the data key pair is determined to be a first label; if the support degree is smaller than the data source threshold, the support label corresponding to the data key pair is determined to be a second label. The specific content of the first label and the second label can be set according to requirements. For example, the first label can be "yes" and the second label "no"; alternatively, the first label can be "higher" and the second label "lower".
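Steps 203 to 205 can be sketched in Python as a dictionary lookup followed by a comparison. The name `SUPPORT_DICTIONARY`, the source identifiers "A" and "B" and the field names are assumptions; the values 500 and 560 follow the example given above.

```python
# Hypothetical support dictionary: data source identifier -> threshold.
SUPPORT_DICTIONARY = {"A": 560, "B": 50000}

def support_label(pair_value, first_label="yes", second_label="no"):
    """Look up the threshold of the pair's data source and pick a label."""
    threshold = SUPPORT_DICTIONARY[pair_value["source"]]
    if pair_value["support"] >= threshold:
        return first_label
    return second_label

pair_value = {"source": "A", "support": 500}
default_label = support_label(pair_value)
custom_label = support_label(pair_value, first_label="higher", second_label="lower")
```

With a support degree of 500 against a threshold of 560, the pair receives the second label in both cases. Keeping the thresholds in a dictionary means a new data source only needs a new entry, not a change to the comparison logic.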
Step 206, determining the current parent node according to the row identifier of the data key pair.
Specifically, the method provided by this embodiment may determine the association relationship between data in the data source. And the association relationship has direction information.
First, the current parent node may be determined among the data key pairs. Specifically, the data key pair in the first row of the data report may be used as the current parent node. When data from a plurality of data sources is summarized, all the data can be sorted according to a certain rule; for example, data with fewer fields can be placed nearer the front, so that data with coarser granularity is used as a parent node.
Further, the data key pair of the first row may be directly determined as the current parent node. And then determining child nodes of the current parent node in other data key pairs.
Step 207, determining the alternative child nodes of the current parent node among the data key pairs according to the preset fields and the support labels.
In practical application, the alternative child nodes that match the current parent node can be determined among the other data key pairs. In the method provided by this embodiment, some fields may be preset; when the contents of the preset fields of a data key pair are the same as those of the parent node and the support labels are also the same, the data key pair may be considered an alternative child node.
Wherein the preset field comprises at least one of the following: date, time, index name.
Specifically, the support tag may be added to the data content of the data key pair as the field content, and at this time, the preset field may further include the support tag.
Further, the preset field can also be determined based on the specific field content in the data source.
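The matching in step 207 might be sketched as below. The names `PRESET_FIELDS` and `candidate_children` and the sample rows are hypothetical; the sketch also assumes that only rows after the parent in the report are searched, and that the support label is treated as one of the preset fields.

```python
# Hypothetical preset fields; the support label is included as a field.
PRESET_FIELDS = ("index_name", "support_label")

def candidate_children(parent_key, pairs, assigned=()):
    """Keys of later rows whose preset fields all equal those of the parent."""
    parent = pairs[parent_key]
    keys = list(pairs)
    later_keys = keys[keys.index(parent_key) + 1:]
    return [
        key for key in later_keys
        if key not in assigned
        and all(pairs[key].get(f) == parent.get(f) for f in PRESET_FIELDS)
    ]

pairs = {
    "row 1": {"index_name": "order quantity", "support_label": "yes"},
    "row 2": {"index_name": "order quantity", "support_label": "yes"},
    "row 3": {"index_name": "sales amount", "support_label": "yes"},
    "row 4": {"index_name": "order quantity", "support_label": "no"},
}
candidates = candidate_children("row 1", pairs)
```

Row 3 is excluded because its index name differs, and row 4 because its support label differs, leaving row 2 as the only alternative child node.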
Step 208, sorting the alternative child nodes, and determining the child node corresponding to the current parent node according to the sorting result.
In practice, there may be multiple candidate child nodes for a current parent node. At this time, a child node of the current parent node may be determined among the plurality of candidate child nodes.
The alternative child nodes may be sorted. For example, a ranking result may be calculated for each alternative child node based on a preset function, and the child node is then determined based on the ranking results.
Specifically, in one embodiment, the ranking result may be calculated as

$$\text{ranking result} = \frac{(\text{value} - \text{7-day average})^2}{\text{7-day average}}$$

The value may be the specific content of one field in the data key pair, for example the sales amount or the order quantity. The 7-day average is the 7-day average of that field, for example the average sales amount or the average order quantity over the last 7 days.
Further, the ranking result corresponding to each alternative child node may be calculated according to the formula, and the alternative child node ranked first may be determined as the child node of the current parent node.
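A small Python sketch of this ranking step follows. It assumes the ranking result is the squared deviation from the 7-day average divided by that average, and that a larger result ranks first; the function name `ranking_result` and the sample values are invented for illustration.

```python
def ranking_result(value, seven_day_average):
    """Squared deviation from the 7-day average, scaled by that average."""
    return (value - seven_day_average) ** 2 / seven_day_average

# Hypothetical field values of two alternative child nodes.
seven_day_average = 500
candidate_values = {"row 2": 650, "row 3": 800}

results = {key: ranking_result(value, seven_day_average)
           for key, value in candidate_values.items()}
# The alternative child node with the largest result is ranked first.
child = max(results, key=results.get)
```

Here the results are 45.0 for row 2 and 180.0 for row 3, so row 3 would be chosen. Dividing by the average keeps the result from being dominated purely by the scale of the field.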
Step 209, determine the child node as the current parent node.
In actual application, after the child node is determined, the association relationship between the data key pair may be continuously determined based on the child node. Specifically, the current child node may be taken as the current parent node, and the step 207 is continued.
The above loop steps may be executed multiple times until there is no matching child node on the basis of the current parent node. At this time, the data key pairs with undetermined association relationship can be searched by traversing in the report in sequence, and the data key pairs are used as the current father nodes to continue executing the steps. In the process of determining the child node, the child node may be determined only in the data key pair for which the association relationship is not determined.
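The loop over steps 207 to 209 can be sketched as follows. This is an illustrative reading rather than the disclosed implementation: `build_chains`, the field names and the sample rows are assumptions, and each row carries a precomputed "score" field standing in for the ranking function.

```python
def build_chains(pairs, preset_fields, score):
    """Walk the report in order and link each parent to its best child."""
    keys = list(pairs)
    assigned = set()
    chains = []
    for root in keys:
        if root in assigned:
            continue  # association relationship already determined
        chain, parent = [root], root
        assigned.add(root)
        while True:
            later_keys = keys[keys.index(parent) + 1:]
            candidates = [
                k for k in later_keys
                if k not in assigned
                and all(pairs[k][f] == pairs[parent][f] for f in preset_fields)
            ]
            if not candidates:
                break  # no matching child node for the current parent
            child = max(candidates, key=lambda k: score(pairs[k]))
            chain.append(child)
            assigned.add(child)
            parent = child  # the child becomes the current parent node
        chains.append(chain)
    return chains

# Hypothetical rows; "score" is a precomputed ranking result.
rows = {
    "row 1": {"index_name": "orders", "support_label": "yes", "score": 0},
    "row 2": {"index_name": "orders", "support_label": "yes", "score": 450},
    "row 3": {"index_name": "orders", "support_label": "yes", "score": 2400},
    "row 4": {"index_name": "sales", "support_label": "no", "score": 10},
    "row 5": {"index_name": "orders", "support_label": "yes", "score": 300},
}
chains = build_chains(rows, ("index_name", "support_label"),
                      score=lambda value: value["score"])
```

With these sample rows the first chain is row 1, row 3, row 5, and rows 2 and 4 each end up as single-node chains because no unassigned matching row follows them.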
Step 210, constructing a data tree according to the relationship between the parent nodes and the child nodes.
Specifically, the data tree may be constructed according to the determined parent nodes and child nodes. For example, each child node may be treated as the left subtree of its parent node, thereby constructing a left-skewed tree. The data tree may also have other structures, such as a Huffman tree or a full binary tree.
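As a minimal sketch of the left-skewed variant, the Python below turns one chain of associated data into a tree in which every child hangs off the left side of its parent. The `Node` class and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: str
    left: Optional["Node"] = None  # the child node, stored as the left subtree

def left_skewed_tree(chain):
    """Build a left-skewed tree from a chain ordered from root to leaf."""
    root = None
    for key in reversed(chain):
        root = Node(key, left=root)
    return root

def render(node):
    """Render the tree as a pointing sequence from root to leaf."""
    parts = []
    while node is not None:
        parts.append(node.key)
        node = node.left
    return " -> ".join(parts)

tree = left_skewed_tree(["data 1", "data 3", "data 5"])
rendered = render(tree)
```

In this tree, data 3 is at once the child node of data 1 and the parent node of data 5, and `rendered` reads "data 1 -> data 3 -> data 5".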
Further, if the types of the data sources are more, according to the method provided by this embodiment, a plurality of data trees can be obtained.
The scheme provided by the present embodiment is explained below by a simple example.
For example, there is a batch of data lines:
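[Table of example data rows; provided as an image (Figure BDA0001999869650000101) in the original publication and not recoverable here.]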
First, data row 1 is used as the current parent node to find child nodes. According to the association rule, the index names and the support labels must be the same and, taking data row 1 as the basis and adding 1, the alternative child nodes are determined to be data row 2 and data row 3. The ranking result of data row 2 is calculated to be 450 and that of data row 3 to be 2400. After sorting, data row 3 ranks higher, so data row 3 is the child node of data row 1. Iterating the calculation, the child node of data row 3 is data row 5. Data row 5 has no next-level child node, so the drill-down iteration for data row 1 stops. At output time, the entire tree is presented: data row 1 -> data row 3 -> data row 5.
Fig. 3 is a block diagram illustrating an apparatus for determining an association rule between data according to an exemplary embodiment of the present invention.
As shown in fig. 3, the apparatus for determining an association rule between data provided in this embodiment includes:
a conversion module 31, configured to determine a data key pair according to a data source;
a support degree determining module 32, configured to determine a support degree corresponding to the data key pair;
a tag determining module 33, configured to determine, according to the support degree and the threshold corresponding to the data source, a support tag corresponding to the data key pair;
a rule determining module 34, configured to determine association rules between the data key pairs according to the support labels, and construct a data tree according to the association rules.
The apparatus for determining an association rule between data provided in this embodiment includes a conversion module, a support degree determining module, a tag determining module, and a rule determining module. The conversion module is configured to determine a data key pair according to a data source; the support degree determining module is configured to determine the support degree corresponding to the data key pair; the tag determining module is configured to determine a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source; and the rule determining module is configured to determine association rules between the data key pairs according to the support labels and to construct a data tree according to the association rules. In the apparatus provided in this embodiment, data in the data source is stored in the data key pair format, so the field content of the data itself can be conveniently expanded during data processing. In addition, the support degree of the data is determined before the association relationship is determined, and the support degree is converted into a discrete form, the support label, so that even if the data share no common parameter, the association rule between the data can still be determined by means of the support label.
The specific principle and implementation of the apparatus for determining an association rule between data provided in this embodiment are similar to those of the embodiment shown in fig. 1, and are not described herein again.
Fig. 4 is a block diagram illustrating an apparatus for determining an association rule between data according to another exemplary embodiment of the present invention.
As shown in fig. 4, on the basis of the foregoing embodiment, in the apparatus for determining an association rule between data provided in this embodiment, optionally, the conversion module 31 is specifically configured to:
determine a data identifier corresponding to each piece of data in the data source according to a preset rule, and determine the data key pair according to the data identifier and the data.
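A minimal sketch of this conversion is given below. The preset rule is not specified in this passage, so the rule used here (source name plus row number) is purely an assumption, as are the function name `make_key_pairs` and the record fields.

```python
from typing import Any, Dict, Iterable, List, Tuple

KeyPair = Tuple[str, Dict[str, Any]]  # (data identifier, data)


def make_key_pairs(source_name: str, records: Iterable[Dict[str, Any]]) -> List[KeyPair]:
    """Pair each record with an identifier derived from a preset rule."""
    pairs: List[KeyPair] = []
    for row_number, record in enumerate(records, start=1):
        # Hypothetical preset rule: "<source name>:<row number>".
        identifier = f"{source_name}:{row_number}"
        # Copy the record so later field additions do not touch the source.
        pairs.append((identifier, dict(record)))
    return pairs
```

Because the value of each pair is a plain mapping, further fields (for example a support degree or a support label) can be added to it later without changing the identifier.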
Optionally, the tag determining module 33 includes:
an obtaining unit 331, configured to obtain a data source threshold corresponding to the data key pair;
a comparing unit 332, configured to compare the support degree with the data source threshold;
a determining unit 333, configured to determine that the support label corresponding to the data key pair is a first label if the support degree is greater than or equal to the data source threshold.
Optionally, the determining unit 333 is further configured to:
determine that the support label corresponding to the data key pair is a second label if the support degree is less than the data source threshold.
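The comparison performed by units 331 to 333 can be sketched as follows. The label values `"first"` and `"second"`, the function names, and the per-source threshold dictionary are hypothetical; the text only fixes the rule that a support degree greater than or equal to the data source threshold yields the first label and a smaller one yields the second label.

```python
from typing import Dict, List, Tuple

FIRST_LABEL = "first"    # hypothetical value for the first label
SECOND_LABEL = "second"  # hypothetical value for the second label


def support_label(support: float, source_threshold: float) -> str:
    """Return the first label when support >= threshold, else the second."""
    return FIRST_LABEL if support >= source_threshold else SECOND_LABEL


def label_key_pairs(
    supports: List[Tuple[str, str, float]],
    thresholds: Dict[str, float],
) -> Dict[str, str]:
    """Label each (identifier, source, support) triple using its source's threshold."""
    return {
        identifier: support_label(support, thresholds[source])
        for identifier, source, support in supports
    }
```

Since the threshold is looked up per data source, the same support degree can receive different labels for data key pairs that come from different sources.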
Optionally, the rule determining module 34 is specifically configured to:
determine a current parent node according to the row identifier of the data key pair;
determine alternative child nodes of the current parent node among the data key pairs according to a preset field and the support label;
sort the alternative child nodes, and determine the child node corresponding to the current parent node according to the sorting result;
take the child node as the current parent node, and continue to execute the step of determining alternative child nodes.
Optionally, the preset field includes at least one of the following:
date, time, index name.
Optionally, the data source includes multiple types of data sources.
Optionally, the rule determining module 34 is further configured to:
construct the data tree according to the relationship between the parent node and the child node.
The specific principle and implementation of the apparatus for determining the association rule between data provided in this embodiment are similar to those of the embodiment shown in fig. 2, and are not described herein again.
Fig. 5 is a block diagram illustrating an apparatus for determining an association rule between data according to an exemplary embodiment of the present invention.
As shown in fig. 5, the apparatus for determining an association rule between data provided in this embodiment includes:
a memory 51;
a processor 52; and
a computer program;
wherein said computer program is stored in said memory 51 and configured to be executed by said processor 52 to implement any of the above-mentioned determination methods of association rules between data.
This embodiment also provides a computer-readable storage medium having a computer program stored thereon,
wherein the computer program is executed by a processor to implement any of the above-described methods for determining an association rule between data.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be performed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for determining association rules between data is characterized by comprising the following steps:
determining a data key pair according to a data source, and determining the corresponding support degree of the data key pair;
determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source;
and determining association rules between the data key pairs according to the support tags, and constructing a data tree according to the association rules.
2. The method of claim 1, wherein determining the data key pair from the data source comprises:
and determining a data identifier corresponding to each piece of data in the data source according to a preset rule, and determining the data key pair according to the data identifier and the data.
3. The method of claim 1, wherein the determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source comprises:
acquiring a data source threshold corresponding to the data key pair;
comparing the support degree with the data source threshold value;
and if the support degree is greater than or equal to the data source threshold value, determining that the support label corresponding to the data key pair is a first label.
4. The method of claim 3, wherein if the support is less than the data source threshold, determining that the support label corresponding to the data key pair is a second label.
5. The method of claim 1, wherein determining association rules between the data key pairs according to the support labels comprises:
determining a current parent node according to the row identifier of the data key pair;
determining alternative child nodes of the current parent node among the data key pairs according to a preset field and the support label;
sorting the alternative child nodes, and determining the child node corresponding to the current parent node according to the sorting result;
and taking the child node as the current parent node, and continuing to execute the step of determining alternative child nodes.
6. The method of claim 5, wherein the preset field comprises at least one of:
date, time, index name.
7. The method of any one of claims 1-6, wherein the data sources include a plurality of types of data sources.
8. The method of claim 5, wherein said building a data tree according to said association rule comprises:
and constructing the data tree according to the relationship between the parent node and the child node.
9. An apparatus for determining association rules between data, comprising:
the conversion module is used for determining a data key pair according to a data source;
the support degree determining module is used for determining the support degree corresponding to the data key pair;
the label determining module is used for determining a support label corresponding to the data key pair according to the support degree and a threshold corresponding to the data source;
and the rule determining module is used for determining association rules between the data key pairs according to the support labels and constructing a data tree according to the association rules.
10. An apparatus for determining association rules between data, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-8.
11. A computer-readable storage medium, having stored thereon a computer program,
the computer program is executed by a processor to implement the method according to any one of claims 1 to 8.
CN201910208892.7A 2019-03-19 2019-03-19 Method, device and equipment for determining association rule between data and readable storage medium Pending CN111723122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910208892.7A CN111723122A (en) 2019-03-19 2019-03-19 Method, device and equipment for determining association rule between data and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910208892.7A CN111723122A (en) 2019-03-19 2019-03-19 Method, device and equipment for determining association rule between data and readable storage medium

Publications (1)

Publication Number Publication Date
CN111723122A true CN111723122A (en) 2020-09-29

Family

ID=72562386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910208892.7A Pending CN111723122A (en) 2019-03-19 2019-03-19 Method, device and equipment for determining association rule between data and readable storage medium

Country Status (1)

Country Link
CN (1) CN111723122A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112491991A (en) * 2020-11-17 2021-03-12 上海企翔智能科技有限公司 Industrial big data processing method and device and computer equipment
CN112491991B (en) * 2020-11-17 2023-01-06 上海企翔智能科技有限公司 Industrial big data processing method and device and computer equipment
CN112527834A (en) * 2020-12-04 2021-03-19 海南大学 Cross-modal essence calculation and inference-oriented content query method and component

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination