CN106528795B

CN106528795B - Data mining method and device

Info

Publication number: CN106528795B
Application number: CN201610991856.9A
Authority: CN
Inventors: 陈萌 (Chen Meng); 杜锐 (Du Rui); 赵焕芳 (Zhao Huanfang); 杨声钢 (Yang Shenggang); 苑洪林 (Yuan Honglin); 吴洋 (Wu Yang)
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2016-11-10
Filing date: 2016-11-10
Publication date: 2023-10-13
Anticipated expiration: 2036-11-10
Also published as: CN106528795A

Abstract

The application discloses a data mining method and device suitable for a data mining system. The data mining system comprises a first cluster and a second cluster. The first cluster comprises a plurality of first servers configured with a first mining model based on the ILog rule engine. The second cluster comprises a plurality of second servers configured with a second mining model based on SAS. The method comprises the following steps:

- receiving a data mining request that includes a request type;
- classifying the data mining request;
- transmitting a data mining request whose request type is the quick response type to the first cluster, where a first server performs data mining on the data using the first mining model to obtain a first mining result;
- transmitting a data mining request whose request type is not the quick response type to the second cluster, where a second server performs data mining on the data using the second mining model to obtain a second mining result.

Description

Data mining method and device

Technical Field

The present application relates to the field of data mining technologies, and in particular, to a data mining method and apparatus.

Background

Science and technology keep developing, business intelligence is advancing rapidly, and big data technology changes from day to day. As a result, the value of big data is becoming increasingly important. A banking system in particular accumulates massive amounts of business data in its daily operations. Mining this data produces results that can be widely applied to customer marketing, product optimization, risk management and control, and other fields. This is of great significance for improving core competitiveness.

Therefore, there is a need for a way to mine data effectively and in real time.

Disclosure of Invention

Accordingly, the present application provides a data mining method and apparatus. They solve the technical problem in the prior art that data cannot be mined effectively in real time.

The application provides a data mining method applicable to a data mining system. The data mining system comprises a first cluster and a second cluster. The first cluster comprises a plurality of first servers, and the second cluster comprises a plurality of second servers. The first servers are configured with a first mining model based on the ILog rule engine, and the second servers are configured with a second mining model based on SAS. The method comprises the following steps:

receiving at least one data mining request, wherein the data mining request at least comprises a request type;

classifying the data mining requests based on request types thereof;

transmitting each data mining request whose request type is the quick response type to the first cluster, where a first server in the first cluster performs data mining on data in a data source based on the data mining request using the first mining model to obtain a first mining result;

transmitting each data mining request whose request type is not the quick response type to the second cluster, where a second server in the second cluster performs data mining on data in the data source based on the data mining request using the second mining model to obtain a second mining result.
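The classify-then-route steps above can be sketched as follows. This is a minimal illustration, not the application's implementation. The request type names, the `MiningRequest` fields, and the `classify`/`dispatch` helpers are hypothetical. The two clusters are stood in for by plain callables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class RequestType(Enum):
    # Hypothetical type names; the application only distinguishes
    # "quick response" from "not quick response".
    QUICK_RESPONSE = "quick_response"
    COMPLEX_ANALYSIS = "complex_analysis"


@dataclass
class MiningRequest:
    request_id: str
    request_type: RequestType
    query: str  # hypothetical description of what should be mined


Handler = Callable[[MiningRequest], str]


def classify(requests: List[MiningRequest]) -> Dict[str, List[MiningRequest]]:
    """Split requests by request type into the two routing groups."""
    groups: Dict[str, List[MiningRequest]] = {"first": [], "second": []}
    for req in requests:
        key = "first" if req.request_type is RequestType.QUICK_RESPONSE else "second"
        groups[key].append(req)
    return groups


def dispatch(requests: List[MiningRequest],
             first_cluster: Handler,
             second_cluster: Handler) -> Dict[str, str]:
    """Send quick-response requests to the first cluster, all others to the second."""
    groups = classify(requests)
    results: Dict[str, str] = {}
    for req in groups["first"]:
        results[req.request_id] = first_cluster(req)   # first mining result
    for req in groups["second"]:
        results[req.request_id] = second_cluster(req)  # second mining result
    return results


if __name__ == "__main__":
    reqs = [
        MiningRequest("r1", RequestType.QUICK_RESPONSE, "flag large transfers"),
        MiningRequest("r2", RequestType.COMPLEX_ANALYSIS, "customer churn model"),
    ]
    print(dispatch(reqs, lambda r: "ilog:" + r.request_id, lambda r: "sas:" + r.request_id))
    # {'r1': 'ilog:r1', 'r2': 'sas:r2'}
```

Any type other than the quick response type falls through to the second cluster, which matches the "not a quick response type" wording.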

In the above method, preferably, after the first mining result and the second mining result are obtained, the method further includes one or more of the following:

returning the first mining result and the second mining result;

storing the first mining result and the second mining result;

transmitting the first mining result and the second mining result to the second cluster, where a second server in the second cluster cross-verifies the first mining result and the second mining result using the second mining model.

The above method, preferably, further comprises:

transmitting the first mining model to the second cluster, where a second server in the second cluster performs model training and verification using the second mining model.

The application also provides a data mining device connected to a data mining system. The data mining system comprises a first cluster and a second cluster. The first cluster comprises a plurality of first servers, and the second cluster comprises a plurality of second servers. The first servers are configured with a first mining model based on the ILog rule engine, and the second servers are configured with a second mining model based on SAS (Statistical Analysis System). The device comprises:

a request receiving unit, configured to receive at least one data mining request, where the data mining request at least includes a request type;

a request classification unit, configured to classify the data mining request based on a request type thereof;

a first transmission unit, configured to transmit a data mining request whose request type is the quick response type to the first cluster, where a first server in the first cluster performs data mining on data in a data source based on the data mining request using the first mining model to obtain a first mining result;

a second transmission unit, configured to transmit a data mining request whose request type is not the quick response type to the second cluster, where a second server in the second cluster performs data mining on data in the data source based on the data mining request using the second mining model to obtain a second mining result.

The above device, preferably, further comprises:

a result returning unit, configured to return the first mining result and the second mining result after they are obtained.

The above device, preferably, further comprises:

a result storage unit, configured to store the first mining result and the second mining result.

The above device, preferably, further comprises:

a third transmission unit, configured to transmit the first mining result and the second mining result to the second cluster after they are obtained, where a second server in the second cluster cross-verifies the two results using the second mining model.

The above device, preferably, further comprises:

a fourth transmission unit, configured to transmit the first mining model to the second cluster, where a second server in the second cluster performs model training and verification using the second mining model.

In the data mining method and device provided by the application, an ILog cluster and an SAS cluster are configured in the same system. When a data mining request is received, its request type determines whether the ILog mining mode or the SAS mining mode is used. The method and device therefore combine two sets of capabilities on the basis of the same data source:

- the expert-model-based characteristics of ILog;
- the mining functions of SAS, such as mining and verification of data models.

This can greatly improve response efficiency for tasks with different response-time requirements and different mining complexity. It does so without affecting the processing capacity for the original data mining tasks.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flowchart of a data mining method according to the first embodiment of the present application;

FIG. 2 is a diagram illustrating an application example of an embodiment of the present application;

FIG. 3 is a flowchart of a data mining method according to the second embodiment of the present application;

FIG. 4 is a flowchart of a data mining method according to the third embodiment of the present application;

FIG. 5 is a flowchart of a data mining method according to the fourth embodiment of the present application;

FIG. 6 is a partial flowchart of a data mining method according to the fifth embodiment of the present application;

FIG. 7 is a schematic structural diagram of a data mining apparatus according to the sixth embodiment of the present application;

FIG. 8 is a schematic structural diagram of a data mining apparatus according to the seventh embodiment of the present application;

FIG. 9 is a schematic structural diagram of a data mining apparatus according to the eighth embodiment of the present application;

FIG. 10 is a schematic structural diagram of a data mining apparatus according to the ninth embodiment of the present application;

FIG. 11 is a schematic structural diagram of a data mining apparatus according to the tenth embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

FIG. 1 is a flowchart of a data mining method according to the first embodiment of the present application. The method is applicable to the data mining system shown in FIG. 2, which is connected to an access terminal and a data source.

The data mining system may include a first cluster and a second cluster.

The first cluster may include a plurality of first servers configured with a first mining model based on the ILog rule engine. The first mining model is an expert model that can mine data and respond quickly. The first servers can therefore rapidly respond to and deploy users' data mining requirements.

The second cluster may include a plurality of second servers configured with a second mining model based on SAS (Statistical Analysis System). The second mining model is a data model. The second servers can therefore handle data mining tasks of higher complexity.
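To give a feel for what an "expert model" on a rule engine does, here is a minimal condition/conclusion rule evaluator in plain Python. It does not use the ILog (IBM ODM) API. The `Rule` type, the rule names, and the thresholds are all hypothetical and illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Record], bool]  # "if" part of the rule
    conclusion: str                      # "then" part of the rule


def evaluate(rules: List[Rule], record: Record) -> List[str]:
    """Fire every rule whose condition holds; return conclusions in rule order."""
    return [rule.conclusion for rule in rules if rule.condition(record)]


# Hypothetical expert rules; thresholds are illustrative, not from the application.
RULES = [
    Rule("large_amount", lambda r: r["amount"] > 50_000, "review_transaction"),
    Rule("new_account", lambda r: r["account_age_days"] < 30, "watch_account"),
]

if __name__ == "__main__":
    print(evaluate(RULES, {"amount": 80_000, "account_age_days": 10}))
    # ['review_transaction', 'watch_account']
```

Rules like these need no training step. This is why such a model can be deployed and answer quickly. It is also why, later in the description, the expert model is sent to the SAS side for training and verification.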

In this embodiment, the method may include the following steps to implement data mining:

Step 101: at least one data mining request is received.

The data mining requests are generated and sent by the access terminal and represent the users' mining requirements. Each data mining request includes at least a request type that characterizes the user's requirement. Examples include a type requiring a quick response, or a type involving large-scale statistics or high complexity on big data.

Step 102: the data mining requests are classified based on their request type.

In this embodiment, classifying a data mining request means analyzing it to separate out the user's requirements. The user generates the data mining request through the access terminal, and the request indicates which mode should be used to mine the data in the data source.

Step 103: a data mining request whose request type is the quick response type is transmitted to the first cluster. A first server in the first cluster performs data mining on data in a data source based on the data mining request, using the first mining model, to obtain a first mining result.

After the data mining request is transmitted to the first cluster, the first cluster can determine one or more first servers to perform data mining according to the current load of each first server, so that load balancing scheduling of data mining is realized.

Step 104: a data mining request whose request type is not the quick response type is transmitted to the second cluster. A second server in the second cluster performs data mining on the data in the data source based on the data mining request, using the second mining model, to obtain a second mining result.

After the data mining request is transmitted to the second cluster, the second cluster can select one or more second servers to perform the data mining according to the current load of each second server. This achieves load-balanced scheduling of data mining.
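The description does not say how a cluster measures load or chooses servers. One simple reading of "according to the current load of each server" is to pick the least-loaded servers. The sketch below assumes each server reports a single numeric load figure. The server names and load values are hypothetical.

```python
import heapq
from typing import Dict, List


def pick_servers(loads: Dict[str, float], count: int = 1) -> List[str]:
    """Return the `count` least-loaded servers; ties are broken by server name."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # (load, name) tuples sort by load first, then by name.
    ranked = heapq.nsmallest(count, ((load, name) for name, load in loads.items()))
    return [name for _, name in ranked]


if __name__ == "__main__":
    loads = {"sas-1": 0.72, "sas-2": 0.15, "sas-3": 0.40}  # hypothetical utilization
    print(pick_servers(loads, count=2))  # ['sas-2', 'sas-3']
```

The same selection applies unchanged to the first cluster. A production scheduler would also refresh the load figures as work is assigned. This sketch leaves that out.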

That is, in this embodiment, the data mining requests are split by request type and each type is processed differently. For example:

- a request that needs a quick response is transmitted to the first cluster for more timely data mining;
- a request involving a large amount of data or higher complexity is transmitted to the second cluster for more thorough, deeper data mining.

It should be noted that the data source here may be of various types, such as a relational database, a Hadoop database, or a set of data files.
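Supporting several kinds of data source usually means putting them behind one interface, so that both clusters read rows the same way. The sketch assumes a hypothetical `DataSource` protocol with two example backends: SQLite standing in for a relational database, and CSV standing in for data files. A Hadoop backend is not shown.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Protocol, TextIO

Row = Dict[str, str]


class DataSource(Protocol):
    """Anything the clusters can read rows from (hypothetical interface)."""

    def rows(self) -> Iterator[Row]: ...


class SqliteSource:
    """Relational source; the table name is assumed to come from trusted config."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def rows(self) -> Iterator[Row]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        names = [col[0] for col in cur.description]
        for values in cur:
            # Normalize to strings so every source yields the same row shape.
            yield {n: str(v) for n, v in zip(names, values)}


class CsvSource:
    """Data-file source read with the standard csv module."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def rows(self) -> Iterator[Row]:
        yield from csv.DictReader(self.stream)


def count_rows(source: DataSource) -> int:
    return sum(1 for _ in source.rows())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (id INTEGER, amount INTEGER)")
    conn.execute("INSERT INTO txn VALUES (1, 500)")
    print(list(SqliteSource(conn, "txn").rows()))                     # [{'id': '1', 'amount': '500'}]
    print(list(CsvSource(io.StringIO("id,amount\n1,500\n")).rows()))  # [{'id': '1', 'amount': '500'}]
```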

With the above scheme, an ILog cluster and an SAS cluster are configured in the same system. When a data mining request is received, its request type determines whether the ILog mining mode or the SAS mining mode is adopted. The method thus combines two capabilities on the basis of the same data source:

- ILog's ability to respond quickly to data mining;
- SAS's mining functions, such as mining and verification of data models.

This greatly improves response efficiency for tasks with different response times and different mining complexity. It does so without affecting the processing capacity for the original data mining tasks.

In practical applications, program code implementing the method of this embodiment may run in an application server cluster. The cluster may include a plurality of application servers that respond to data mining requests and forward them to the corresponding first cluster or second cluster.

To achieve load balancing, the data mining request generated by the user's access terminal may first be sent to a load-balancing server connected to the application server cluster. The load-balancing server schedules the request and forwards it to a suitable application server in the application server cluster, which then carries out the data mining.
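The application leaves the load balancer's scheduling policy open. Round-robin is one common choice, and the sketch below uses it purely as an assumed stand-in. The class name and the server names are hypothetical.

```python
import itertools
from typing import Iterable, List


class RoundRobinBalancer:
    """Hands successive requests to application servers in a fixed rotation."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers: List[str] = list(servers)
        if not self._servers:
            raise ValueError("at least one application server is required")
        self._cycle = itertools.cycle(self._servers)

    def next_server(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical server names
    print([lb.next_server() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
```

Round-robin ignores current load. A load-aware policy, like the one the clusters use to choose mining servers, could be substituted where application servers differ in capacity.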

In one implementation, the method may further include the following step after step 103 and step 104, as shown in FIG. 3, which is a flowchart of a data mining method according to the second embodiment of the present application:

Step 105: the first mining result and the second mining result are returned.

Specifically, in this embodiment, the first mining result and the second mining result may be returned to the user's access terminal.

In one implementation, the method may further include the following step after step 103 and step 104, as shown in FIG. 4, which is a flowchart of a data mining method according to the third embodiment of the present application:

Step 106: the first mining result and the second mining result are stored.

Specifically, in this embodiment, the first mining result and the second mining result may be stored in a storage system such as a database connected to the first cluster and the second cluster.
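As one possible shape for that storage, the sketch keeps both clusters' results in a single table tagged by origin. SQLite from the Python standard library stands in for "a storage system such as a database". The table name, columns, and helper functions are all hypothetical.

```python
import sqlite3
from typing import List, Tuple


def init_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding results from both clusters."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mining_result (
               request_id     TEXT PRIMARY KEY,
               source_cluster TEXT NOT NULL CHECK (source_cluster IN ('first', 'second')),
               result         TEXT NOT NULL
           )"""
    )


def save_result(conn: sqlite3.Connection, request_id: str,
                source_cluster: str, result: str) -> None:
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT OR REPLACE INTO mining_result VALUES (?, ?, ?)",
            (request_id, source_cluster, result),
        )


def load_results(conn: sqlite3.Connection, source_cluster: str) -> List[Tuple[str, str]]:
    cur = conn.execute(
        "SELECT request_id, result FROM mining_result "
        "WHERE source_cluster = ? ORDER BY request_id",
        (source_cluster,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    save_result(conn, "r1", "first", "review_transaction")
    save_result(conn, "r2", "second", "churn_score=0.31")
    print(load_results(conn, "first"))  # [('r1', 'review_transaction')]
```

Parameterized queries (`?` placeholders) keep request identifiers from being interpreted as SQL. `INSERT OR REPLACE` lets a re-run overwrite an earlier result for the same request.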

In one implementation, the method may further include the following step after step 103 and step 104, as shown in FIG. 5, which is a flowchart of a data mining method according to the fourth embodiment of the present application:

Step 107: the first mining result and the second mining result are transmitted to the second cluster, where a second server in the second cluster cross-verifies them using the second mining model.

That is, the second servers in the second cluster are configured with the SAS-based second mining model, so they can mine, train, and verify data models. In this embodiment, once the first mining result (such as an expert model result) and the second mining result (such as a data model result) are obtained, the two can be cross-verified.

In one implementation, the method may further include the following step, as shown in FIG. 6, which is a partial flowchart of a data mining method according to the fifth embodiment of the present application:

Step 108: the first mining model is transmitted to the second cluster, where a second server in the second cluster performs model training and verification using the second mining model.

That is, the second servers in the second cluster are configured with the SAS-based second mining model, so they can mine, train, and verify data models. In this embodiment, the first mining model of the first servers, such as an expert model, can therefore be placed in the second cluster for training and verification. The second cluster can then feed the training results back to the first cluster for processing such as model refinement.
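The application does not describe how the SAS side trains or verifies an expert model. As an assumed, simplified stand-in, the sketch treats one expert rule as "flag if value > threshold". "Training" means choosing the best threshold on labeled training data. "Verification" means measuring that threshold on held-out data. The refined threshold is what would be fed back to the first cluster. All names and data are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, bool]  # (feature value, actual outcome)


def accuracy(threshold: float, samples: Sequence[Sample]) -> float:
    """Share of samples where the rule `value > threshold` matches the actual outcome."""
    if not samples:
        raise ValueError("no samples to evaluate")
    hits = sum((value > threshold) == actual for value, actual in samples)
    return hits / len(samples)


def refine_threshold(candidates: Sequence[float],
                     train: Sequence[Sample],
                     holdout: Sequence[Sample]) -> Tuple[float, float]:
    """Pick the candidate that fits the training data best, then verify it on held-out data."""
    best = max(candidates, key=lambda t: accuracy(t, train))  # first best wins on ties
    return best, accuracy(best, holdout)


if __name__ == "__main__":
    train = [(10, False), (20, False), (60, True), (90, True)]
    holdout = [(15, False), (70, True), (40, False)]
    best, holdout_acc = refine_threshold([5, 30, 50, 80], train, holdout)
    print(best, round(holdout_acc, 3))  # 30 0.667
```

Separating training data from verification data is what makes the held-out score a fair check of the refined rule.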

FIG. 7 is a schematic structural diagram of a data mining apparatus according to the sixth embodiment of the present application. The apparatus is connected to the data mining system shown in FIG. 2 and to an access terminal, and the data mining system is connected to a data source.

In this embodiment, the apparatus may include the following structures to implement data mining:

a request receiving unit 701, configured to receive at least one data mining request, where the data mining request includes at least a request type.

It should be noted that, the request receiving unit 701 may be implemented by using an interface capable of performing data transmission, so as to receive a data mining request sent by an access terminal.

A request classification unit 702, configured to classify the data mining request based on a request type thereof.

It should be noted that the request classification unit 702 may be implemented using a classifier that classifies the data mining requests based on their request types.

A first transmission unit 703, configured to transmit a data mining request whose request type is the quick response type to the first cluster. A first server in the first cluster performs data mining on data in a data source based on the data mining request, using the first mining model, to obtain a first mining result.

It should be noted that, the first transmission unit 703 may be implemented using an interface capable of data transmission, so as to transmit the data mining request to the first cluster.

A second transmission unit 704, configured to transmit a data mining request whose request type is not the quick response type to the second cluster. A second server in the second cluster performs data mining on data in the data source based on the data mining request, using the second mining model, to obtain a second mining result.

It should be noted that the second transmission unit 704 may be implemented using an interface capable of data transmission, so as to transmit the data mining request to the second cluster.

As can be seen from the above solution, the data mining apparatus provided in the sixth embodiment of the present application configures an ILog cluster and an SAS cluster in the same system. When a data mining request is received, the request type determines whether the ILog mining mode or the SAS mining mode is used. The apparatus thus combines two sets of data mining characteristics on the basis of the same data source:

- ILog's ability to respond quickly to data mining;
- SAS's ability to mine and verify data models.

This greatly improves response efficiency for tasks with different response times and different mining complexity. It does so without affecting the processing capacity for the original data mining tasks.

FIG. 8 is a schematic structural diagram of a data mining apparatus according to the seventh embodiment of the present application. The apparatus may further include the following structure:

a result returning unit 705, connected to the first cluster and the second cluster. It is configured to return the first mining result and the second mining result after the first cluster obtains the first mining result and the second cluster obtains the second mining result.

Specifically, the result returning unit 705 may use the same transmission interface as the first transmission unit 703 and the second transmission unit 704, so as to return the first mining result and the second mining result to the access terminal.

FIG. 9 is a schematic structural diagram of a data mining apparatus according to the eighth embodiment of the present application. The apparatus may further include the following structure:

a result storage unit 706, connected to the first cluster and the second cluster, both of which are connected to a data storage system. The result storage unit 706 is configured to store the first mining result obtained by the first cluster and the second mining result obtained by the second cluster.

The result storage unit 706 may be a data interface that transmits the first mining result and the second mining result to a data storage system, such as any of various types of databases.

FIG. 10 is a schematic structural diagram of a data mining apparatus according to the ninth embodiment of the present application. The apparatus may further include the following structure:

a third transmission unit 707, connected to the second cluster. It is configured to transmit the first mining result and the second mining result to the second cluster after they are obtained, where a second server in the second cluster cross-validates the two results using the second mining model.

It should be noted that the third transmission unit 707 may be implemented using an interface capable of data transmission. It transmits the first mining result and the second mining result to the second cluster for cross-validation by a second server there. For example, the first mining result represents the modeling result of the expert model, and the second mining result represents the modeling result of the data model. The second server cross-validates the results of the two models against actual data. The problems and defects it finds in either model serve as a basis for optimizing both models, thereby improving their accuracy.
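The application does not specify the cross-validation procedure. The sketch assumes that both models produce a yes/no prediction per record. It reads "cross-validation against actual data" as two checks: scoring each model against the actual outcomes, and listing the records where the models disagree. This is not k-fold cross-validation. The names and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrossCheck:
    expert_accuracy: float      # first mining result vs. actual outcomes
    data_model_accuracy: float  # second mining result vs. actual outcomes
    disagreements: List[str]    # record ids where the two models differ


def cross_check(expert: Dict[str, bool],
                data_model: Dict[str, bool],
                actual: Dict[str, bool]) -> CrossCheck:
    ids = sorted(actual)
    if not ids:
        raise ValueError("no actual outcomes to check against")
    missing = [i for i in ids if i not in expert or i not in data_model]
    if missing:
        raise KeyError(f"predictions missing for {missing}")
    expert_hits = sum(expert[i] == actual[i] for i in ids)
    model_hits = sum(data_model[i] == actual[i] for i in ids)
    # Disagreements point at records where at least one model is wrong,
    # which is where the "problems and defects" would be investigated.
    diffs = [i for i in ids if expert[i] != data_model[i]]
    return CrossCheck(expert_hits / len(ids), model_hits / len(ids), diffs)


if __name__ == "__main__":
    actual = {"a": True, "b": False, "c": True, "d": False}
    expert = {"a": True, "b": True, "c": True, "d": False}
    model = {"a": True, "b": False, "c": False, "d": False}
    print(cross_check(expert, model, actual))
    # CrossCheck(expert_accuracy=0.75, data_model_accuracy=0.75, disagreements=['b', 'c'])
```

Two models with equal accuracy can still fail on different records, as in the example. The disagreement list is therefore more informative for optimization than either score alone.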

FIG. 11 is a schematic structural diagram of a data mining apparatus according to the tenth embodiment of the present application. The apparatus may further include the following structure:

a fourth transmission unit 708, connected between the first cluster and the second cluster. It is configured to transmit the first mining model to the second cluster, where a second server in the second cluster performs model training and verification using the second mining model.

It should be noted that the fourth transmission unit 708 may be implemented using an interface capable of data transmission. The first mining model in the first cluster, such as an expert model, is transmitted to the second cluster, where a second server performs model training and verification. For example, the first mining model (ILog) can only develop expert models quickly and has no model training or verification functions of its own. The second mining model (SAS) does have these functions, so the first mining model can be placed in the second mining model for training and verification.

It should be noted that the embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for the parts they share.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing has outlined the detailed description of the data mining method and apparatus provided by the present application to enable those skilled in the art to make or use the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data mining method, characterized by being applied to a data mining system, the data mining system comprising a first cluster and a second cluster, the first cluster comprising a plurality of first servers, the second cluster comprising a plurality of second servers, the first servers being configured with a first mining model based on the ILog rule engine, the second servers being configured with a second mining model based on SAS (Statistical Analysis System), the method comprising:

classifying the data mining requests based on request types thereof;

2. The method of claim 1, wherein after obtaining the first mining result and the second mining result, the method further comprises:

returning the first mining result and the second mining result.

3. The method of claim 1, wherein after obtaining the first mining result and the second mining result, the method further comprises:

storing the first mining result and the second mining result.

4. The method of claim 1, wherein after obtaining the first mining result and the second mining result, the method further comprises:

5. The method as recited in claim 1, further comprising: