CN110442623B - Big data mining method and device and data mining server - Google Patents

Publication number
CN110442623B
Authority
CN
China
Prior art keywords
mining
service
data
service candidate
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910728338.1A
Other languages
Chinese (zh)
Other versions
CN110442623A (en
Inventor
罗茂锐
陈泉鑫
陈少海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Jiu Ling Creative Technology Ltd
Original Assignee
Xiamen Jiu Ling Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Jiu Ling Creative Technology Ltd
Priority to CN201910728338.1A
Publication of CN110442623A
Application granted
Publication of CN110442623B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application provide a big data mining method, a big data mining device, and a data mining server. Different interestingness measurement dimensions are considered together: all business big data of each dimension is clustered, so that the different interestingness measurement dimensions perform more uniformly across the application scenarios of different data mining items, which improves big data mining capability. A plurality of data mining items of the service to be mined, and the data dimension to be mined corresponding to each data mining item, are determined dynamically from the feature information of each cluster, and subsequent data mining is carried out according to those data mining items. This avoids the poor mining effect or low accuracy of data mining results caused in the prior art by the use of fixed mining data dimensions.

Description

Big data mining method and device and data mining server
Technical Field
The application relates to the technical field of big data, in particular to a big data mining method and device and a data mining server.
Background
At present, most big data mining schemes for online services (for example, order behavior services and browsing behavior services) use only one interestingness measurement dimension. Although some schemes study the attributes and behavior of different interestingness measurement dimensions, for a given service to be mined the different dimensions perform differently across the application scenarios of different data mining items, and this limitation restricts big data mining capability. Moreover, most schemes mine with fixed mining data dimensions throughout the data mining process. When the data to be mined has many dimensions, the fixed mining data dimensions may not achieve a good mining effect; when it has few, the fixed mining data dimensions may exceed what is actually needed, which wastes computational resources and also makes inaccurate data mining results more likely.
Disclosure of Invention
To overcome at least the above disadvantages of the prior art, one objective of the present application is to provide a big data mining method, a big data mining device, and a data mining server. By considering different interestingness measurement dimensions together and clustering all business big data of each dimension, the application can make the different interestingness measurement dimensions perform more uniformly across the application scenarios of different data mining items, which improves big data mining capability. It can also dynamically determine, from the feature information of each cluster, a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item, and perform subsequent data mining according to the determined data mining items, thereby avoiding the poor mining effect or low accuracy of data mining results caused in the prior art by the use of fixed mining data dimensions.
In a first aspect, the present application provides a big data mining method, which is applied to a data mining server in communication connection with each service server corresponding to a service to be mined, where the method includes:
acquiring service big data of multiple dimensions from each service server and, for each dimension, clustering all the service big data of that dimension to obtain a cluster of each dimension;
extracting feature information of the clustering cluster of each dimension, and determining a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the clustering cluster of each dimension;
according to the plurality of data mining items of the service to be mined and the dimensionality of the data to be mined corresponding to each data mining item, respectively acquiring business process data corresponding to the dimensionality of the data to be mined under each data mining item;
and obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item.
In a possible design of the first aspect, the step of obtaining a big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined obtained under each data mining item includes:
obtaining a plurality of first mining service candidate item sets according to business process data corresponding to the data dimension to be mined under each data mining item, wherein each first mining service candidate item set comprises a plurality of mining service candidate items;
determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates in the table, and using them as target mining service candidates of the plurality of first mining service candidate sets, wherein the preset mining service candidate retrieval table comprises a plurality of preset mining service candidates, a first association relation identifier for identifying every two preset mining service candidates that have a first association relation, and a frequent item service level of each preset mining service candidate that has a second association relation, and the first association relation and the second association relation respectively represent a strong association relation between frequent items and a weak association relation between frequent items;
determining, according to the first association relation identifier contained in the preset mining service candidate retrieval table, a second mining service candidate set made up of the preset mining service candidates that have the second association relation in each first mining service candidate set that contains preset mining service candidates;
for each second mining service candidate item set, selecting one preset mining service candidate item as a parent mining service candidate item and other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table;
and obtaining a big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate item.
In a possible design of the first aspect, the step of selecting one preset mining service candidate as a parent mining service candidate and the other preset mining service candidates as child mining service candidates according to a frequent item service level of each preset mining service candidate in the second mining service candidate set in the preset mining service candidate retrieval table includes:
and selecting the preset mining service candidate item with the frequent item service level greater than other preset mining service candidate items as a parent mining service candidate item and taking other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table.
In a possible design of the first aspect, the step of obtaining a big data mining result of the service to be mined according to the parent mining service candidate and the child mining service candidate includes:
adding the parent mining service candidate into a specified mining item set, wherein the specified mining item set comprises mining strategies matched with the parent mining service candidate;
randomly generating a plurality of target sub mining service candidates from the plurality of sub mining service candidates according to the mining strategy matched with the parent mining service candidate and the proportion of the sub mining service candidates;
calculating the relevancy of each target sub mining service candidate item in the plurality of target sub mining service candidate items and the parent mining service candidate item;
according to the calculated degree of correlation between each target sub-mining service candidate item and the parent mining service candidate item, taking the target sub-mining service candidate item with the largest degree of correlation as a comparison sub-mining service candidate item, and selecting the remaining target sub-mining service candidate items in the plurality of target sub-mining service candidate items to obtain a selected target sub-mining service candidate item set;
carrying out crossover and mutation genetic operations on the selected target sub-mining service candidate item set to obtain a new target sub-mining service candidate item set;
calculating the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set, judging whether the new target sub mining service candidate item set meets a preset condition according to the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set and the relevance of the comparison sub mining service candidate items, and if so, outputting a big data mining result of the service to be mined, which corresponds to the new target sub mining service candidate item set and the parent mining service candidate item, according to the mining strategy.
In a possible design of the first aspect, the step of determining, according to feature information of a cluster of each dimension, a plurality of data mining items of the service to be mined and a dimension of the data to be mined corresponding to each data mining item includes:
analyzing the feature information of the clustering cluster of each dimension to obtain a high contribution value feature and a low contribution value feature;
calculating a first proportion of the high-contribution-value features in the feature information of the clustering cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the clustering cluster of each dimension;
determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion;
and determining the data dimension to be mined corresponding to each data mining project according to the plurality of data mining projects of the service to be mined and the contribution value of the service to be mined and a preset data dimension corresponding relation.
In one possible design of the first aspect, the step of determining the plurality of data mining items of the service to be mined according to the first proportion and the second proportion includes:
respectively determining a first mining coefficient of the high-contribution-value features and a second mining coefficient of the low-contribution-value features according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining a first item proportion of data mining items corresponding to the high-contribution-value features and a second item proportion of data mining items corresponding to the low-contribution-value features according to the first mining coefficient and the second mining coefficient;
and determining the plurality of data mining items of the service to be mined according to the first item proportion and the second item proportion.
In a second aspect, the present application further provides a big data mining apparatus, which is applied to a data mining server in communication connection with each service server corresponding to a service to be mined, where the apparatus includes:
the acquisition clustering module is used for acquiring the service big data of a plurality of dimensions from each service server, and clustering all the service big data of the dimension aiming at each dimension to obtain a clustering cluster of each dimension;
the extraction determining module is used for extracting the characteristic information of the clustering cluster of each dimension and determining a plurality of data mining items of the service to be mined and the dimension of the data to be mined corresponding to each data mining item according to the characteristic information of the clustering cluster of each dimension;
the data acquisition module is used for acquiring business process data corresponding to the dimensionality of the data to be mined under each data mining item according to the data mining items of the service to be mined and the dimensionality of the data to be mined corresponding to each data mining item;
and the data mining module is used for obtaining a big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined acquired under each data mining item.
In a third aspect, an embodiment of the present application provides a data mining server, including a processor, a memory, and a network interface. The processor, the memory, and the network interface may be connected through a bus system. The network interface is configured to receive messages, the memory is configured to store a program, instructions, or code, and the processor is configured to execute the program, instructions, or code in the memory to perform the operations of the first aspect or any possible design of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method of the first aspect or any possible design manner of the first aspect.
Based on any one of the above aspects, the present application considers different interestingness measurement dimensions together and clusters all the business big data of each dimension, so that the different interestingness measurement dimensions perform more uniformly across the application scenarios of different data mining items, which improves big data mining capability. A plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item can be determined dynamically from the feature information of each cluster, and subsequent data mining is performed according to those data mining items, which avoids the poor mining effect or low accuracy of data mining results caused in the prior art by the use of fixed mining data dimensions.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic view of an application scenario of a big data mining method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a big data mining method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of the sub-steps included in step S120 shown in Fig. 2, in one possible implementation;
Fig. 4 is a schematic flow chart of the sub-steps included in step S140 shown in Fig. 2, in one possible implementation;
Fig. 5 is a functional module schematic diagram of a big data mining device according to an embodiment of the present application;
Fig. 6 is a schematic structural block diagram of a data mining server for executing the above big data mining method according to an embodiment of the present application.
Detailed Description
The present application is described in detail below with reference to the drawings. The specific operations in the method embodiments may also be applied to the apparatus embodiments or the system embodiments. In the description of the present application, "at least one" includes one or more unless otherwise specified, and "plurality" means two or more. For example, "at least one of A, B and C" includes: A alone, B alone, C alone, A and B in combination, A and C in combination, B and C in combination, and A, B and C in combination. In this application, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
Fig. 1 is a schematic view of an application scenario of a big data mining method according to an embodiment of the present application. In this embodiment, the application scenario may include a data mining server 100 and a plurality of service servers 200 communicatively connected to the data mining server 100. Wherein, the data mining server 100 can provide data mining services for a plurality of business servers 200. Each service server 200 may be a server that individually performs various online services, such as order services, transaction services, and the like.
Fig. 2 is a schematic flow chart of a big data mining method according to an embodiment of the present disclosure. In this embodiment, the big data mining method may be executed by the data mining server 100 shown in fig. 1, and the big data mining method will be described in detail below.
Step S110, obtaining the service big data of multiple dimensions from each service server 200 and, for each dimension, clustering all the service big data of that dimension to obtain a cluster of each dimension.
In this embodiment, the service to be mined may be a mining service determined according to actual user requirements. Specifically, the service servers 200 associated with the service to be mined may be selected according to user settings; the service big data of a plurality of dimensions is then obtained from those service servers 200 and, for each dimension, all the service big data of that dimension is clustered to obtain a cluster of each dimension.
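As an illustration of step S110, the per-dimension clustering can be sketched as follows. The source does not name a clustering algorithm, so this minimal sketch assumes a simple one-dimensional k-means; the function names `kmeans_1d` and `cluster_by_dimension`, the dictionary layout, and the sample dimension names are all hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means; a stand-in for whatever clustering the server uses."""
    ordered = sorted(values)
    # Spread the initial centroids evenly over the sorted values.
    centroids = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters

def cluster_by_dimension(big_data, k=2):
    """big_data maps a dimension name to all values collected for that dimension."""
    return {dim: kmeans_1d(vals, k) for dim, vals in big_data.items()}

# Hypothetical data gathered from the service servers, keyed by dimension.
big_data = {
    "order_amount": [1, 2, 3, 100, 101, 102],
    "page_views": [5, 6, 50, 52],
}
clusters = cluster_by_dimension(big_data)
```

Each dimension is clustered independently, which mirrors the "for each dimension" wording of the step; any other clustering method could be substituted for `kmeans_1d`.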
Step S120, extracting the feature information of the cluster of each dimension, and determining a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension.
In this embodiment, for example, the cluster of each dimension may be windowed, and the cluster of each dimension within the window may be input into the candid covariance-free incremental principal component analysis (CCIPCA) algorithm to calculate the feature information of the cluster of each dimension.
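The CCIPCA computation on one window can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: it uses the standard CCIPCA update without the amnesic parameter, centres the samples on the window mean rather than on a running mean, and treats each record in the window as a numeric vector. The names `ccipca` and `window` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ccipca(window, k=1):
    """Estimate the first k principal directions of the vectors in one window.

    Returns a list of (eigenvalue_estimate, unit_direction) pairs.
    """
    # Simplification: centre on the window mean instead of a running mean.
    centre = [sum(col) / len(window) for col in zip(*window)]
    components = []  # unnormalised estimates v_i; the norm of v_i tracks the eigenvalue
    for n, sample in enumerate(window, start=1):
        u = [x - c for x, c in zip(sample, centre)]
        for i in range(min(k, n)):
            if i == len(components):
                components.append(list(u))  # initialise v_i with the current residual
                break
            v = components[i]
            norm = math.sqrt(dot(v, v)) or 1.0
            proj = dot(u, v) / norm
            # v(n) = (n-1)/n * v(n-1) + 1/n * u * (u . v(n-1)) / ||v(n-1)||
            v = [((n - 1) / n) * vj + (1 / n) * proj * uj for vj, uj in zip(v, u)]
            components[i] = v
            # Deflate: remove the part of u explained by the updated direction.
            new_norm = math.sqrt(dot(v, v)) or 1.0
            unit = [vj / new_norm for vj in v]
            coeff = dot(u, unit)
            u = [uj - coeff * ej for uj, ej in zip(u, unit)]
    result = []
    for v in components:
        norm = math.sqrt(dot(v, v)) or 1.0
        result.append((norm, [vj / norm for vj in v]))
    return result

# Hypothetical window of two-feature records lying close to the line y = x.
window = [(1, 1), (-1, -1), (2, 2.1), (-2, -1.9), (3, 3), (-3, -3.2)]
features = ccipca(window, k=1)
```

The returned directions and eigenvalue estimates could then serve as the feature information of the cluster; how the embodiment actually encodes that information is not stated in the source.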
Step S130, according to the plurality of data mining items of the service to be mined and the dimensionality of the data to be mined corresponding to each data mining item, business process data corresponding to the dimensionality of the data to be mined is acquired under each data mining item.
In this embodiment, the business process data may include, but is not limited to, historical business data in the data dimension to be mined and business data currently being generated in real time.
Step S140, obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined acquired under each data mining item.
Based on the above steps, this embodiment considers different interestingness measurement dimensions together and clusters all the business big data of each dimension, so that the different interestingness measurement dimensions perform more uniformly across the application scenarios of different data mining items, which improves big data mining capability. A plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item can be determined dynamically from the feature information of each cluster, and subsequent data mining is performed according to those data mining items, which avoids the poor mining effect or low accuracy of data mining results caused in the prior art by the use of fixed mining data dimensions.
In one possible design, referring to Fig. 3, step S120 may specifically include the following sub-steps:
Sub-step S121, analyzing the feature information of the cluster of each dimension to obtain high-contribution-value features and low-contribution-value features.
Sub-step S122, calculating a first proportion of the high-contribution-value features in the feature information of the cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the cluster of each dimension.
Sub-step S123, determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion.
Sub-step S124, determining the data dimension to be mined corresponding to each data mining item according to the plurality of data mining items of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
In a possible implementation, for sub-step S123, a first mining coefficient of the high-contribution-value features and a second mining coefficient of the low-contribution-value features are first determined according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value. Then, a first item proportion of data mining items corresponding to the high-contribution-value features and a second item proportion of data mining items corresponding to the low-contribution-value features are determined according to the first mining coefficient and the second mining coefficient. Finally, the plurality of data mining items of the service to be mined are determined according to the first item proportion and the second item proportion.
Based on the above steps, the present embodiment further considers the ratio of the high contribution value feature and the low contribution value feature in the feature information of the cluster of each dimension, so as to determine a plurality of data mining items of the service to be mined. Moreover, the subjective influence of fixed mining data dimension selection can be effectively reduced, and the mining error rate is reduced.
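Sub-steps S121 to S123 can be illustrated with the following sketch. The source gives no formulas, so every rule here is an assumption made only for illustration: features are split by a contribution threshold, each mining coefficient is taken as 1 plus the difference between the proportion and its set value, and the shares of data mining items are the normalised coefficients. The function name `plan_mining_items` and all parameters are hypothetical; sub-step S124 (the lookup in the preset data dimension correspondence) is not covered.

```python
def plan_mining_items(contributions, threshold=0.5, first_set=0.5,
                      second_set=0.5, total_items=10):
    """Split a budget of data mining items between high- and low-contribution features."""
    # S121: separate high- and low-contribution-value features (assumed threshold rule).
    high = [name for name, c in contributions.items() if c >= threshold]
    low = [name for name in contributions if name not in high]
    # S122: proportion of each group among all features.
    first_proportion = len(high) / len(contributions)
    second_proportion = len(low) / len(contributions)
    # S123: assumed rule, coefficient = 1 + (proportion - set value), floored at 0.
    first_coeff = max(0.0, 1.0 + (first_proportion - first_set))
    second_coeff = max(0.0, 1.0 + (second_proportion - second_set))
    total = first_coeff + second_coeff
    first_share = first_coeff / total if total else 0.5
    high_items = round(total_items * first_share)
    return {"high_items": high_items, "low_items": total_items - high_items}

# Hypothetical contribution values of four features.
plan = plan_mining_items({"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1})
```

With three of four features above the threshold, the high-contribution group receives the larger share of the item budget, which is the kind of dynamic adjustment the sub-steps describe.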
In one possible design, referring to Fig. 4, step S140 may specifically include the following sub-steps:
Sub-step S141, obtaining a plurality of first mining service candidate sets according to the business process data corresponding to the data dimension to be mined acquired under each data mining item.
In this embodiment, each first mining service candidate set may include a plurality of mining service candidates. In this sub-step, a plurality of first mining service candidate sets may be obtained by matching the business process data with the reference process data of each mining service candidate.
Sub-step S142, determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates included in the table, and using them as target mining service candidates of the plurality of first mining service candidate sets.
In this embodiment, the preset mining service candidate retrieval table may include a plurality of preset mining service candidates, a first association relationship identifier for identifying every two preset mining service candidates having a first association relationship, and a frequent item service level of each preset mining service candidate having a second association relationship, where the first association relationship and the second association relationship are respectively used to represent a strong association relationship between frequent items and a weak association relationship between frequent items. Optionally, the strong association relationship may indicate that the two preset mining service candidates have association in a business context order, and the weak association relationship may indicate that the two preset mining service candidates do not have association in a business context order.
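One possible in-memory form of the retrieval table, together with the matching of sub-step S142, is sketched below. The source describes only what the table contains, so the class `RetrievalTable`, its field names, and the sample candidate names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalTable:
    """Hypothetical in-memory form of the preset candidate retrieval table."""
    presets: set
    # First association relation identifiers: unordered pairs marked as strongly associated.
    strong_pairs: set = field(default_factory=set)
    # Frequent item service level of each preset candidate.
    service_level: dict = field(default_factory=dict)

    def is_strong(self, a, b):
        """True when the pair carries a first (strong) association identifier."""
        return frozenset((a, b)) in self.strong_pairs

def target_candidates(first_sets, table):
    """Sub-step S142: keep the candidates that also appear in the table."""
    return [set(candidates) & table.presets for candidates in first_sets]

table = RetrievalTable(
    presets={"browse", "order", "pay"},
    strong_pairs={frozenset(("order", "pay"))},
    service_level={"browse": 1, "order": 3, "pay": 2},
)
targets = target_candidates([{"browse", "refund"}, {"order", "pay"}], table)
```

Storing each strong pair as a `frozenset` makes the lookup independent of the order in which the two candidates are given.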
Sub-step S143, determining, according to the first association relation identifier included in the preset mining service candidate retrieval table, a second mining service candidate set made up of the preset mining service candidates that have the second association relation in each first mining service candidate set that contains preset mining service candidates.
Sub-step S144, for each second mining service candidate set, selecting one preset mining service candidate as a parent mining service candidate and the other preset mining service candidates as child mining service candidates according to the frequent item service level of each preset mining service candidate of the second mining service candidate set in the preset mining service candidate retrieval table.
Sub-step S145, obtaining a big data mining result of the service to be mined according to the parent mining service candidate and the child mining service candidates.
Based on the above steps, the embodiment further considers the strong association relationship between the frequent items and the weak association relationship between the frequent items, and performs retrieval and mining based on the strong association relationship and the weak association relationship, so that the situation that the mining result deviates from the dimension of the data to be mined due to the misassociation mining of the data in the process of mining the frequent items can be avoided, and the mining accuracy is further improved.
As an optional implementation, for sub-step S144, the frequent item service level of each preset mining service candidate in the second mining service candidate set is looked up in the preset mining service candidate retrieval table; the preset mining service candidate whose frequent item service level is greater than that of the other preset mining service candidates is selected as the parent mining service candidate, and the other preset mining service candidates are used as child mining service candidates.
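The parent and child selection of sub-step S144 amounts to taking the maximum over the frequent item service levels. A minimal sketch, assuming the levels are numeric and held in a dictionary (the function name and sample data are hypothetical):

```python
def split_parent_children(candidate_set, service_level):
    """Pick the candidate with the highest frequent item service level as parent."""
    # Sorting first makes the choice deterministic if two levels are equal.
    ordered = sorted(candidate_set)
    parent = max(ordered, key=lambda c: service_level[c])
    children = [c for c in ordered if c != parent]
    return parent, children

service_level = {"browse": 1, "order": 3, "pay": 2}
parent, children = split_parent_children({"browse", "order", "pay"}, service_level)
```

The source does not say how ties are broken; the alphabetical tie-break used here is an assumption.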
As an optional implementation, for sub-step S145, in order to adapt the mining to different mining service candidates, to strengthen the mining of big data when the data volume is small, and to improve mining efficiency, this embodiment may add the parent mining service candidate to a specified mining item set, where the specified mining item set includes a mining strategy matched with the parent mining service candidate. Then, according to the mining strategy matched with the parent mining service candidate and the proportion of the child mining service candidates, a plurality of target child mining service candidates are randomly generated from the child mining service candidates, and the relevance between each of the plurality of target child mining service candidates and the parent mining service candidate is calculated.
On this basis, according to the calculated relevance between each target child mining service candidate and the parent mining service candidate, the target child mining service candidate with the largest relevance may be used as a comparison child mining service candidate, and the remaining target child mining service candidates may be selected to obtain a selected target child mining service candidate set. Crossover and mutation genetic operations are then performed on the selected target child mining service candidate set to obtain a new target child mining service candidate set. Next, the relevance of each target child mining service candidate in the new set is calculated, and whether the new set meets a preset condition is judged according to these relevance values and the relevance of the comparison child mining service candidate. If the condition is met, the big data mining result of the service to be mined that corresponds to the new target child mining service candidate set and the parent mining service candidate is output according to the mining strategy.
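One round of the sampling, selection, crossover, and mutation flow of sub-step S145 might look like the following Python sketch. Everything concrete here is an assumption, since the text fixes none of it: candidates are represented as numeric feature vectors, relevance is computed as cosine similarity, crossover is single-point, mutation adds Gaussian noise, and the "preset condition" is taken to be that the best new candidate is at least as relevant as the comparison candidate. The names `relevance`, `crossover`, `mutate`, and `evolve_once` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def relevance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, an assumed stand-in for the relevance measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def crossover(a: Vector, b: Vector, rng: random.Random) -> Vector:
    """Single-point crossover; vectors need at least two components."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(v: Vector, rng: random.Random, rate: float = 0.1, scale: float = 0.05) -> Vector:
    """Perturb each component with probability `rate` (assumed operator)."""
    return tuple(x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in v)


def evolve_once(
    parent: Vector, children: List[Vector], proportion: float, rng: random.Random
) -> Tuple[List[Vector], Vector, bool]:
    if len(children) < 3:
        raise ValueError("need at least three child candidates")
    # Randomly draw target child candidates according to the given proportion.
    k = min(len(children), max(3, round(len(children) * proportion)))
    targets = rng.sample(children, k)
    # The most relevant target becomes the comparison candidate; the rest are selected.
    ranked = sorted(targets, key=lambda c: relevance(c, parent), reverse=True)
    comparison, selected = ranked[0], ranked[1:]
    # Crossover and mutation produce the new target child candidate set.
    new_set: List[Vector] = []
    for _ in selected:
        a, b = rng.sample(selected, 2)
        new_set.append(mutate(crossover(a, b, rng), rng))
    # Assumed preset condition: best new candidate matches or beats the comparison.
    baseline = relevance(comparison, parent)
    accepted = max(relevance(c, parent) for c in new_set) >= baseline
    return new_set, comparison, accepted


rng = random.Random(0)
parent = (1.0, 0.0, 1.0)
children = [(rng.random(), rng.random(), rng.random()) for _ in range(10)]
new_set, comparison, accepted = evolve_once(parent, children, 0.6, rng)
```

If `accepted` is false, a caller could repeat `evolve_once` on the new set; whether and how the procedure iterates is not stated in the text.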
Based on the foregoing description, the data mining server 100 may send the big data mining result of the service to be mined to each corresponding business server 200.
Fig. 5 is a schematic diagram of functional modules of a big data mining apparatus 300 according to an embodiment of the present application, and in this embodiment, the big data mining apparatus 300 may be divided into the functional modules according to the foregoing method embodiment. For example, the functional blocks may be divided for the respective functions, or two or more functions may be integrated into one processing block. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the present application is schematic, and is only a logical function division, and there may be another division manner in actual implementation. For example, in the case of dividing each function module according to each function, the big data mining apparatus 300 shown in fig. 5 is only an apparatus diagram. The big data mining apparatus 300 may include an obtaining clustering module 310, an extraction determining module 320, a data obtaining module 330, and a data mining module 340, and the functions of the functional modules of the big data mining apparatus 300 are described in detail below.
The obtaining clustering module 310 is configured to obtain service big data of multiple dimensions from each business server 200 and, for each dimension, cluster all service big data of that dimension to obtain a cluster of each dimension.
The extraction determining module 320 is configured to extract feature information of the cluster of each dimension, and determine a plurality of data mining items of the service to be mined and a data dimension to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension.
The data obtaining module 330 is configured to obtain, according to the multiple data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item, business process data corresponding to the data dimension to be mined under each data mining item.
The data mining module 340 is configured to obtain a big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item.
In a possible design, the data mining module 340 may obtain the big data mining result of the service to be mined by:
obtaining a plurality of first mining service candidate item sets according to business process data corresponding to the data dimension to be mined under each data mining item, wherein each first mining service candidate item set comprises a plurality of mining service candidate items;
determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates in the preset mining service candidate retrieval table, and using these mining service candidates as target mining service candidates of the plurality of first mining service candidate sets, wherein the preset mining service candidate retrieval table comprises a plurality of preset mining service candidates, a first association relation identifier identifying every two preset mining service candidates that have a first association relation, and the frequent item service levels of the preset mining service candidates that have a second association relation, and the first association relation and the second association relation respectively represent a strong association relation between frequent items and a weak association relation between frequent items;
according to the first association relation identifier contained in the preset mining service candidate item retrieval table, determining, in each first mining service candidate item set that contains the preset mining service candidate items, a second mining service candidate item set of the preset mining service candidate items having a second association relation;
for each second mining service candidate item set, selecting one preset mining service candidate item as a parent mining service candidate item and other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table;
and obtaining a big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate item.
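The retrieval-table lookup in the steps above can be sketched in Python under one possible reading. The `RetrievalTable` layout is hypothetical, and so is the rule used to derive the second set: candidates that appear in no strong-association pair but have a recorded frequent item service level are treated as weakly associated. The text does not state how the first association relation identifier determines the second set, so this is only an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass
class RetrievalTable:
    """Hypothetical layout of the preset mining service candidate retrieval table."""
    preset: Set[str]                    # preset mining service candidates
    strong_pairs: Set[FrozenSet[str]]   # first (strong) association identifiers
    service_level: Dict[str, int]       # levels of weakly associated candidates


def target_candidates(first_set: List[str], table: RetrievalTable) -> List[str]:
    """Keep the candidates of a first set that also appear in the table."""
    return [c for c in first_set if c in table.preset]


def second_candidate_set(targets: List[str], table: RetrievalTable) -> List[str]:
    """Assumed rule: not in any strong pair, but with a recorded service level."""
    strongly_linked = {c for pair in table.strong_pairs for c in pair}
    return [c for c in targets if c not in strongly_linked and c in table.service_level]


table = RetrievalTable(
    preset={"a", "b", "c", "d"},
    strong_pairs={frozenset({"a", "b"})},
    service_level={"c": 2, "d": 5},
)
targets = target_candidates(["a", "c", "x", "d"], table)
second = second_candidate_set(targets, table)
```

Here `x` is dropped because it is not a preset candidate, and `a` is excluded from the second set because it belongs to a strong pair.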
In a possible design, the data mining module 340 may specifically select one preset mining service candidate as a parent mining service candidate and other preset mining service candidates as child mining service candidates by:
selecting, according to the frequent item service level that corresponds, in the preset mining service candidate item retrieval table, to each preset mining service candidate item in the second mining service candidate item set, the preset mining service candidate item whose frequent item service level is greater than that of the other preset mining service candidate items as the parent mining service candidate item, and taking the other preset mining service candidate items as child mining service candidate items.
In a possible design, the data mining module 340 may obtain the big data mining result of the service to be mined by:
adding the parent mining service candidate into a specified mining item set, wherein the specified mining item set comprises mining strategies matched with the parent mining service candidate;
randomly generating a plurality of target sub mining service candidates from the plurality of sub mining service candidates according to the mining strategy matched with the parent mining service candidate and the proportion of the sub mining service candidates;
calculating the relevancy of each target sub mining service candidate item in the plurality of target sub mining service candidate items and the parent mining service candidate item;
according to the calculated degree of correlation between each target sub-mining service candidate item and the parent mining service candidate item, taking the target sub-mining service candidate item with the largest degree of correlation as a comparison sub-mining service candidate item, and selecting the remaining target sub-mining service candidate items in the plurality of target sub-mining service candidate items to obtain a selected target sub-mining service candidate item set;
carrying out crossover and mutation genetic operations on the selected target sub-mining service candidate item set to obtain a new target sub-mining service candidate item set;
calculating the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set, judging whether the new target sub mining service candidate item set meets a preset condition according to the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set and the relevance of the comparison sub mining service candidate items, and if so, outputting a big data mining result of the service to be mined, which corresponds to the new target sub mining service candidate item set and the parent mining service candidate item, according to the mining strategy.
In one possible design, the extraction determining module 320 may specifically determine the plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item by:
analyzing the feature information of the clustering cluster of each dimension to obtain a high contribution value feature and a low contribution value feature;
calculating a first proportion of the high-contribution-value features in the feature information of the clustering cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the clustering cluster of each dimension;
determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion;
and determining the data dimension to be mined corresponding to each data mining item according to the plurality of data mining items of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
In one possible design, the extraction determining module 320 may specifically determine the plurality of data mining items of the service to be mined by:
respectively determining a first mining coefficient of a high-contribution-value feature and a second mining coefficient of a low-contribution-value feature according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining a first proportion of data mining items corresponding to the high-contribution-value features and a second proportion of data mining items corresponding to the low-contribution-value features according to the first mining coefficient and the second mining coefficient;
and determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion.
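The proportion and coefficient steps above can be made concrete with a small Python sketch. The text gives no formulas, so every rule here is an assumption: features are split into high and low contribution by a threshold, each mining coefficient is `1 + (proportion - set value)` floored at zero, and the data mining items are divided between the two feature groups in proportion to the coefficients. The names `contribution_shares`, `mining_coefficients`, and `item_split` are hypothetical.

```python
from typing import Dict, Tuple


def contribution_shares(contributions: Dict[str, float], threshold: float) -> Tuple[float, float]:
    """Return the shares of high- and low-contribution features (assumed threshold rule)."""
    if not contributions:
        raise ValueError("no features given")
    total = len(contributions)
    high = sum(1 for v in contributions.values() if v >= threshold)
    return high / total, (total - high) / total


def mining_coefficients(
    first_share: float, second_share: float,
    first_set_value: float = 0.5, second_set_value: float = 0.5,
) -> Tuple[float, float]:
    """Assumed rule: coefficient grows with the difference between share and set value."""
    first_coeff = max(0.0, 1.0 + (first_share - first_set_value))
    second_coeff = max(0.0, 1.0 + (second_share - second_set_value))
    return first_coeff, second_coeff


def item_split(total_items: int, first_coeff: float, second_coeff: float) -> Tuple[int, int]:
    """Divide the data mining items between high- and low-contribution features."""
    denom = first_coeff + second_coeff
    if denom == 0:
        raise ValueError("both coefficients are zero")
    high_items = round(total_items * first_coeff / denom)
    return high_items, total_items - high_items


# Hypothetical feature contribution values for one cluster.
features = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
shares = contribution_shares(features, threshold=0.5)
coeffs = mining_coefficients(*shares)
split = item_split(8, *coeffs)
```

With three of four features above the threshold, the shares are 0.75 and 0.25, the coefficients 1.25 and 0.75, and eight data mining items are split five to three.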
Fig. 6 is a schematic structural diagram of a data mining server 100 for performing the above big data mining method according to an embodiment of the present disclosure, and as shown in fig. 6, the data mining server 100 may include a network interface 110, a machine-readable storage medium 120, a processor 130, and a bus 140. The number of the processors 130 may be one or more, and one processor 130 is taken as an example in fig. 6; the network interface 110, the machine-readable storage medium 120, and the processor 130 may be connected by a bus 140 or otherwise, as exemplified by the connection by the bus 140 in fig. 6.
The machine-readable storage medium 120 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the big data mining method in the embodiment of the present application (for example, the obtaining clustering module 310, the extraction determining module 320, the data obtaining module 330, and the data mining module 340 in the big data mining apparatus 300 shown in fig. 5). The processor 130 executes various functional applications and data processing of the terminal device by running the software programs, instructions and modules stored in the machine-readable storage medium 120, that is, implements the above big data mining method, which is not described herein again.
The machine-readable storage medium 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the terminal, and the like. Further, the machine-readable storage medium 120 may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static Random Access Memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous DRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory. In some examples, the machine-readable storage medium 120 may further include memory located remotely from the processor 130, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 130 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 130. The processor 130 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 130 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
The data mining server 100 may interact with other devices (e.g., the business server 200) via the network interface 110. The network interface 110 may be a circuit, bus, transceiver, or any other device that may be used to exchange information. The processor 130 may send and receive information using the network interface 110.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (8)

1. A big data mining method, applied to a data mining server that is in communication connection with each business server corresponding to a service to be mined, the method comprising the following steps:
acquiring service big data of multiple dimensions from each business server and, for each dimension, clustering all the service big data of that dimension to obtain a cluster of each dimension;
extracting feature information of the cluster of each dimension, and determining a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension;
according to the plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item, respectively acquiring business process data corresponding to the data dimension to be mined under each data mining item;
obtaining a big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item;
wherein the step of obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item comprises:
obtaining a plurality of first mining service candidate item sets according to business process data corresponding to the data dimension to be mined under each data mining item, wherein each first mining service candidate item set comprises a plurality of mining service candidate items;
determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates in the preset mining service candidate retrieval table, and using these mining service candidates as target mining service candidates of the plurality of first mining service candidate sets, wherein the preset mining service candidate retrieval table comprises a plurality of preset mining service candidates, a first association relation identifier identifying every two preset mining service candidates that have a first association relation, and the frequent item service levels of the preset mining service candidates that have a second association relation, and the first association relation and the second association relation respectively represent a strong association relation between frequent items and a weak association relation between frequent items;
according to the first association relation identifier contained in the preset mining service candidate item retrieval table, determining, in each first mining service candidate item set that contains the preset mining service candidate items, a second mining service candidate item set of the preset mining service candidate items having a second association relation;
for each second mining service candidate item set, selecting one preset mining service candidate item as a parent mining service candidate item and other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table;
and obtaining a big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate item.
2. The big data mining method according to claim 1, wherein the step of selecting one of the preset mining service candidates as a parent mining service candidate and the other preset mining service candidates as child mining service candidates according to the frequent item service level that corresponds, in the preset mining service candidate retrieval table, to each of the preset mining service candidates in the second mining service candidate set comprises:
selecting, according to the frequent item service level that corresponds, in the preset mining service candidate item retrieval table, to each preset mining service candidate item in the second mining service candidate item set, the preset mining service candidate item whose frequent item service level is greater than that of the other preset mining service candidate items as the parent mining service candidate item, and taking the other preset mining service candidate items as child mining service candidate items.
3. The big data mining method according to claim 1, wherein the step of obtaining the big data mining result of the service to be mined according to the parent mining service candidate and the child mining service candidate comprises:
adding the parent mining service candidate into a specified mining item set, wherein the specified mining item set comprises mining strategies matched with the parent mining service candidate;
randomly generating a plurality of target sub mining service candidates from the plurality of sub mining service candidates according to the mining strategy matched with the parent mining service candidate and the proportion of the sub mining service candidates;
calculating the relevancy of each target sub mining service candidate item in the plurality of target sub mining service candidate items and the parent mining service candidate item;
according to the calculated degree of correlation between each target sub-mining service candidate item and the parent mining service candidate item, taking the target sub-mining service candidate item with the largest degree of correlation as a comparison sub-mining service candidate item, and selecting the remaining target sub-mining service candidate items in the plurality of target sub-mining service candidate items to obtain a selected target sub-mining service candidate item set;
carrying out crossover and mutation genetic operations on the selected target sub-mining service candidate item set to obtain a new target sub-mining service candidate item set;
calculating the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set, judging whether the new target sub mining service candidate item set meets a preset condition according to the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set and the relevance of the comparison sub mining service candidate items, and if so, outputting a big data mining result of the service to be mined, which corresponds to the new target sub mining service candidate item set and the parent mining service candidate item, according to the mining strategy.
4. The big data mining method according to claim 1, wherein the step of determining the plurality of data mining items of the service to be mined and the dimension of the data to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension includes:
analyzing the feature information of the clustering cluster of each dimension to obtain a high contribution value feature and a low contribution value feature;
calculating a first proportion of the high-contribution-value features in the feature information of the clustering cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the clustering cluster of each dimension;
determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion;
and determining the data dimension to be mined corresponding to each data mining item according to the plurality of data mining items of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
5. The big data mining method according to claim 4, wherein the step of determining the plurality of data mining items of the service to be mined according to the first proportion and the second proportion comprises:
respectively determining a first mining coefficient of a high-contribution-value feature and a second mining coefficient of a low-contribution-value feature according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining a first proportion of data mining items corresponding to the high-contribution-value features and a second proportion of data mining items corresponding to the low-contribution-value features according to the first mining coefficient and the second mining coefficient;
and determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion.
6. A big data mining device, applied to a data mining server that is in communication connection with each business server corresponding to a service to be mined, the device comprising:
an acquisition clustering module, configured to acquire service big data of multiple dimensions from each business server and, for each dimension, cluster all the service big data of that dimension to obtain a cluster of each dimension;
an extraction determining module, configured to extract feature information of the cluster of each dimension, and determine a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension;
a data acquisition module, configured to acquire business process data corresponding to the data dimension to be mined under each data mining item according to the plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item;
a data mining module, configured to obtain a big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item;
wherein the data mining module obtains the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item in the following manner:
obtaining a plurality of first mining service candidate item sets according to business process data corresponding to the data dimension to be mined under each data mining item, wherein each first mining service candidate item set comprises a plurality of mining service candidate items;
determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates in the preset mining service candidate retrieval table, and using these mining service candidates as target mining service candidates of the plurality of first mining service candidate sets, wherein the preset mining service candidate retrieval table comprises a plurality of preset mining service candidates, a first association relation identifier identifying every two preset mining service candidates that have a first association relation, and the frequent item service levels of the preset mining service candidates that have a second association relation, and the first association relation and the second association relation respectively represent a strong association relation between frequent items and a weak association relation between frequent items;
according to the first association relation identifier contained in the preset mining service candidate item retrieval table, determining, in each first mining service candidate item set that contains the preset mining service candidate items, a second mining service candidate item set of the preset mining service candidate items having a second association relation;
for each second mining service candidate item set, selecting one preset mining service candidate item as a parent mining service candidate item and other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table;
and obtaining a big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate item.
7. A data mining server, comprising a machine-readable storage medium storing machine-executable instructions and a processor, wherein the processor, when executing the machine-executable instructions, implements the big data mining method according to any one of claims 1 to 5.
8. A readable storage medium having stored therein machine-executable instructions which, when executed, perform the big data mining method according to any one of claims 1 to 5.
CN201910728338.1A 2019-08-08 2019-08-08 Big data mining method and device and data mining server Active CN110442623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910728338.1A CN110442623B (en) 2019-08-08 2019-08-08 Big data mining method and device and data mining server

Publications (2)

Publication Number Publication Date
CN110442623A CN110442623A (en) 2019-11-12
CN110442623B true CN110442623B (en) 2021-08-27

Family

ID=68433851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910728338.1A Active CN110442623B (en) 2019-08-08 2019-08-08 Big data mining method and device and data mining server

Country Status (1)

Country Link
CN (1) CN110442623B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163625B (en) * 2020-10-06 2021-06-25 西安石油大学 Big data mining method based on artificial intelligence and cloud computing and cloud service center
CN113742472B (en) * 2021-09-15 2022-05-27 达而观科技(北京)有限公司 Data mining method and device based on customer service marketing scene
CN114780606B (en) * 2022-03-30 2022-10-14 上海必盈特软件系统有限公司 Big data mining method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699755A (en) * 2015-01-07 2015-06-10 中国电子科技集团公司第三十研究所 Intelligent multi-target comprehensive identification method based on data mining
CN105005570A (en) * 2014-04-23 2015-10-28 国家电网公司 Method and apparatus for mining massive intelligent power consumption data based on cloud computing
CN106033424A (en) * 2015-03-11 2016-10-19 哈尔滨工业大学深圳研究生院 A data mining method and device
CN106484844A (en) * 2016-09-30 2017-03-08 广州特道信息科技有限公司 Big data method for digging and system
CN107291734A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of method for digging of frequent item set, apparatus and system
CN108920609A (en) * 2018-06-28 2018-11-30 南方电网科学研究院有限责任公司 Electric power experimental data method for digging based on multi dimensional analysis
CN109446319A (en) * 2018-09-29 2019-03-08 昆明理工大学 A kind of biological medicine patent clustering method based on K-means

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7627620B2 (en) * 2004-12-16 2009-12-01 Oracle International Corporation Data-centric automatic data mining

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005570A (en) * 2014-04-23 2015-10-28 国家电网公司 Method and apparatus for mining massive intelligent power consumption data based on cloud computing
CN104699755A (en) * 2015-01-07 2015-06-10 中国电子科技集团公司第三十研究所 Intelligent multi-target comprehensive identification method based on data mining
CN106033424A (en) * 2015-03-11 2016-10-19 哈尔滨工业大学深圳研究生院 A data mining method and device
CN107291734A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of method for digging of frequent item set, apparatus and system
CN106484844A (en) * 2016-09-30 2017-03-08 广州特道信息科技有限公司 Big data method for digging and system
CN108920609A (en) * 2018-06-28 2018-11-30 南方电网科学研究院有限责任公司 Electric power experimental data method for digging based on multi dimensional analysis
CN109446319A (en) * 2018-09-29 2019-03-08 昆明理工大学 A kind of biological medicine patent clustering method based on K-means

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Analysis of Customer Segmentation Dimensions Based on Data Mining; Fan Wenting; 《商讯》 (Shangxun); 20190325; full text *

Also Published As

Publication number Publication date
CN110442623A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110442762B (en) Big data processing method based on cloud platform big data
CN110442623B (en) Big data mining method and device and data mining server
KR102079860B1 (en) Text address processing method and device
CN109992601B (en) To-do information pushing method and device and computer equipment
CN108959370B (en) Community discovery method and device based on entity similarity in knowledge graph
CN108846749B (en) Partitioned transaction execution system and method based on block chain technology
EP3432157A1 (en) Data table joining mode processing method and apparatus
CN108304935B (en) Machine learning model training method and device and computer equipment
US20120303624A1 (en) Dynamic rule reordering for message classification
CN109361628B (en) Message assembling method and device, computer equipment and storage medium
JP2015526800A (en) Push business objects
CN113411404A (en) File downloading method, device, server and storage medium
CN110674182A (en) Big data analysis method and data analysis server
US9600251B1 (en) Enhancing API service schemes
CN108111591B (en) Method and device for pushing message and computer readable storage medium
CN110599278B (en) Method, apparatus, and computer storage medium for aggregating device identifiers
CN111198961A (en) Commodity searching method and device and server
CN111814052A (en) Mobile internet user management method, device, server and readable storage medium
CN115361295B (en) TOPSIS-based resource backup method, device, equipment and medium
CN114003648B (en) Identification method and device for risk transaction group partner, electronic equipment and storage medium
WO2023050670A1 (en) False information detection method and system, computer device, and readable storage medium
CN105045664A (en) Information processing device and information processing method
CN110555158A (en) Mutually exclusive data processing method and system, and computer readable storage medium
CN113411364B (en) Resource acquisition method and device and server
CN112055076A (en) Multifunctional intelligent monitoring method and device based on Internet and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant