CN115292388B

CN115292388B - Automatic scheme mining system based on historical data

Info

Publication number: CN115292388B
Application number: CN202211194582.2A
Authority: CN
Inventors: 丁家奎; 张慧宙; 魏烈龙
Original assignee: Guangzhou Tiancom Information Technology Co ltd
Current assignee: Guangzhou Tiancom Information Technology Co ltd
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2023-01-24
Anticipated expiration: 2042-09-29
Also published as: CN115292388A

Abstract

The invention provides an automatic scheme mining system based on historical data, comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module. The case storage module stores actual case data; the case analysis module decomposes the case data into a plurality of factors; the factor analysis module establishes a factor topology network based on the factor relations in the cases; and the scheme reconstruction module receives an initial case and reconstructs it into a new scheme framework based on the factor topology network. The system builds the factor topology network from large volumes of historical data and mines the factors of an initial scheme against that network to obtain further factors that combine depth with a high degree of connection, thereby forming a new scheme framework.

Description

Automatic scheme mining system based on historical data

Technical Field

The invention relates to the field of electronic digital data processing, and in particular to an automatic scheme mining system based on historical data.

Background

Data mining is the nontrivial process of revealing implicit, previously unknown and potentially valuable information from large amounts of data in a database, and it is an active research topic in the fields of artificial intelligence and databases. As a decision support process, data mining draws mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases and visualization technologies. It analyzes enterprise data in a highly automated manner, performs inductive reasoning and mines latent patterns from the data, helping decision makers adjust market strategies, reduce risk and make sound decisions.

The foregoing discussion of the background art is intended to facilitate an understanding of the present invention only. This discussion is not an acknowledgement or admission that any of the material referred to is part of the common general knowledge.

A number of data mining systems have been developed. An extensive search of the prior art found an existing mining system disclosed in CN114168691B, which includes a mining module, an analysis module and a management module. The mining module comprises a character mining unit, a time mining unit and a frequency mining unit. The character mining unit counts the different account numbers in the acquired mining data set to obtain a character data set. The time mining unit counts and marks the conversation time points in the character data set to obtain a time point set, derives the conversation durations from the conversation time points, and counts and marks them to obtain a duration set; the time point set and the duration set together constitute a time data set. However, that system mines character relationships, that is, relationships between people, which are more tangible and concrete than the relationships between ideas in a scheme, so it cannot satisfy the requirements of relevance and concealment at the same time when mining schemes.

Disclosure of Invention

The invention aims to provide an automatic scheme mining system based on historical data that addresses the above shortcomings.

The invention adopts the following technical scheme:

an automatic scheme mining system based on historical data comprises a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module, wherein the case storage module is used for storing actual case data, the case analysis module is used for decomposing the case data into a plurality of factors, the factor analysis module establishes a factor topology network based on the factor relations in the cases, and the scheme reconstruction module is used for receiving an initial case and reconstructing it into a new scheme framework based on the factor topology network;

the case analysis module is internally preset with a plurality of factor databases, each factor database comprises a plurality of keywords, each factor database corresponds to one factor, the case analysis module compares case data with the keywords to obtain a factor set, and the factor set is sent to the factor analysis module;

the factor analysis module performs statistics on the received factor sets to obtain the degree of association of any two factors: each time two factors appear in the same factor set, the degree of association of those two factors is increased by one; based on the degrees of association, the factor analysis module divides the factor topology network into trunks and branches;

after receiving an initial case, the scheme reconstruction module processes the initial case through the case analysis module to obtain initial factors and finds a plurality of target trunks in the factor topology network according to the initial factors, the two factors of each target trunk both being initial factors; the scheme reconstruction module then finds primary factors in the factor topology network, the relationship between a primary factor and each of the two initial factors of a target trunk being a branch relationship, and calculates a mining index P of each primary factor according to the following formula:

[formula for P not reproduced in the source text]

wherein the terms of the formula are the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk;

the scheme reconstruction module takes the primary factor with the minimum mining index as a target factor of a target trunk, and takes the initial factor and the target factor as a frame of a new scheme;

furthermore, a segmentation database is preset in the case analysis module, the segmentation database comprises a plurality of segmentation words, the case analysis module segments the case content into a plurality of target paragraphs through the segmentation words, and each target paragraph is matched to obtain a factor;

further, the case analysis module calculates the matching degree Q between the factor database and the target paragraph according to the following formula:

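[formula for Q not reproduced in the source text]

wherein m is the number of keywords of the factor database that appear in the target paragraph, and the remaining terms are the numbers of occurrences of each such keyword in the target paragraph, arranged in descending order;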

the case analysis module selects the factor corresponding to the factor database with the highest matching degree as the factor matched with the target paragraph;

further, the factor analysis module divides the relationship between any two factors into a strong association relationship, a middle association relationship and a weak association relationship according to the degree of association, the strong association relationship forms a main trunk of the factor topology network, and the middle association relationship forms a branch of the factor topology network;

further, the factor analysis module determines two demarcation points according to the following inequality:

[inequality not reproduced in the source text]

wherein the terms of the inequality are the total association value of all factors, the number of two-factor relations having each association value r, the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient.

The beneficial effects obtained by the invention are as follows:

the system establishes a factor topology network through big data, wherein the factor topology network comprises a trunk and a branch, the trunk represents the relationship between two factors with obvious relevance, and the branch represents the relationship between two factors with relevance but not easy to be found.

For a better understanding of the features and technical content of the present invention, reference is made to the following detailed description of the invention and accompanying drawings, which are provided for purposes of illustration and description only and are not intended to limit the invention.

Drawings

FIG. 1 is a schematic view of the overall structural framework of the present invention;

FIG. 2 is a schematic diagram illustrating the matching process between a target paragraph and a factor according to the present invention;

FIG. 3 is a schematic diagram illustrating the factor relationship division process of the present invention;

FIG. 4 is a schematic flow chart of the target trunk determination according to the present invention;

FIG. 5 is a schematic diagram of a process for determining a target factor according to the present invention.

Detailed Description

The following is a description of embodiments of the present invention with reference to specific embodiments, and those skilled in the art will understand the advantages and effects of the present invention from the disclosure of the present specification. The invention is capable of other and different embodiments and its several details are capable of modifications and various changes in detail without departing from the spirit and scope of the present invention. The drawings of the present invention are for illustrative purposes only and are not intended to be drawn to scale. The following embodiments will further explain the related art of the present invention in detail, but the disclosure is not intended to limit the scope of the present invention.

Example one

With reference to FIG. 1, this embodiment provides an automatic scheme mining system based on historical data, comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module, wherein the case storage module is used for storing actual case data, the case analysis module is used for decomposing the case data into a plurality of factors, the factor analysis module establishes a factor topology network based on the factor relations in the cases, and the scheme reconstruction module is used for receiving an initial case and reconstructing it into a new scheme framework based on the factor topology network;

the factor analysis module performs statistics on the received factor sets to obtain the degree of association of any two factors: each time two factors appear in the same factor set, the degree of association of those two factors is increased by one; based on the degrees of association, the factor analysis module divides the factor topology network into trunks and branches;

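the scheme reconstruction module calculates the mining index of a primary factor from the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk [formula not reproduced in the source text];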

a segmentation database is preset in the case analysis module, the segmentation database comprises a plurality of segmentation words, the case analysis module segments the case content into a plurality of target paragraphs through the segmentation words, and each target paragraph is matched to obtain a factor;

the case analysis module calculates the matching degree Q of the factor database and the target paragraph according to the following formula:

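[formula for Q not reproduced in the source text]

wherein m is the number of keywords of the factor database that appear in the target paragraph, and the remaining terms are the numbers of occurrences of each such keyword in the target paragraph, arranged in descending order;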

the factor analysis module divides the relationship between any two factors into a strong association relationship, a middle association relationship and a weak association relationship according to the degree of association, the strong association relationship forms a main trunk of the factor topology network, and the middle association relationship forms a branch trunk of the factor topology network;

the factor analysis module determines two demarcation points according to the following inequality:

[inequality not reproduced in the source text]

wherein the terms of the inequality are the total association value of all factors, the number of two-factor relations having each association value r, the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient.

Example two

This embodiment includes all the contents of the first embodiment and provides an automatic scheme mining system based on historical data, comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module, wherein the case storage module is used for inputting and storing actual case data, the case analysis module is used for decomposing each item of actual case data into a plurality of factors, the factor analysis module is used for analyzing the factors to obtain a factor topology network, and the scheme reconstruction module is used for automatically mining a scheme based on the factor topology network;

the case analysis module is internally preset with a plurality of factor databases and a segmentation database, the factor databases comprise a plurality of keywords, the segmentation database comprises a plurality of segmentation words, the case analysis module segments the case content into a plurality of target paragraphs through the segmentation words, the case analysis module compares each target paragraph with the keywords in the factor databases to obtain the most appropriate factor through matching, and the case analysis module sends all the obtained factors to the factor analysis module as a set;
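The segmentation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_paragraphs` is hypothetical, and the text does not say whether a segmentation word stays in the paragraph it delimits, so this sketch simply drops it.

```python
import re


def split_into_paragraphs(case_text, segmentation_words):
    """Split case text into target paragraphs at every segmentation word.

    Assumption: segmentation words act as delimiters and are discarded.
    """
    # Try longer words first so a long word is not shadowed by its own prefix.
    ordered = sorted(segmentation_words, key=len, reverse=True)
    pattern = "|".join(re.escape(word) for word in ordered)
    parts = re.split(pattern, case_text) if pattern else [case_text]
    # Remove surrounding whitespace and drop empty fragments.
    return [part.strip() for part in parts if part.strip()]


paragraphs = split_into_paragraphs(
    "budget is tight. Next, the venue is small. Finally, staff are few.",
    ["Next,", "Finally,"],
)
```

Each resulting paragraph would then be matched against the factor databases to obtain one factor.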

the factor analysis module counts the factors in each set: whenever two factors appear in the same set, the degree of association of those two factors is increased by one; once the factor analysis module has counted the factor sets of all actual case data, the degree of association between any two factors x and y is obtained;
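The co-occurrence counting described above can be sketched in a few lines. The function name `count_associations` and the sample factor names are hypothetical; storing each pair as an alphabetically sorted tuple is a choice made here so that (x, y) and (y, x) share one counter.

```python
from collections import Counter
from itertools import combinations


def count_associations(factor_sets):
    """Return the degree of association for every pair of co-occurring factors."""
    assoc = Counter()
    for factors in factor_sets:
        # Every unordered pair inside one case's factor set gains one.
        for x, y in combinations(sorted(set(factors)), 2):
            assoc[(x, y)] += 1
    return assoc


assoc = count_associations([
    {"budget", "venue", "staff"},
    {"budget", "venue"},
    {"venue", "staff"},
])
```

Pairs that never co-occur are simply absent from the counter, which is equivalent to a degree of association of zero.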

the factor analysis module divides the relationship between any two factors into a strong association relationship, a medium association relationship and a weak association relationship according to the degree of association; the strong association relationships form the trunks of the factor topology network, and the medium association relationships form the branches of the factor topology network;

the scheme reconstruction module determines a plurality of initial factors based on requirements, then determines a plurality of target trunks in the factor topology network according to the initial factors, determines a plurality of target branches based on the target trunks, the factors on the branches are target factors, and the scheme reconstruction module takes the initial factors and the target factors as a new scheme frame after mining;

with reference to fig. 2, the process of matching the target paragraphs with the factors by the case analysis module includes the following steps:

S1, the case analysis module selects a factor database and counts the number m of its keywords that appear in the target paragraph, together with the number of occurrences of each such keyword in the target paragraph; the occurrence counts are indexed by i and arranged in descending order;

S2, the case analysis module calculates the matching degree Q between the factor database and the target paragraph from m and the occurrence counts obtained in step S1 [formula for Q not reproduced in the source text];

s3, repeating the step S1 and the step S2, calculating the matching degrees of all the factor databases and the target paragraph, and selecting the factor corresponding to the factor database with the highest matching degree as the factor matched with the target paragraph;
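Steps S1 to S3 can be sketched as below. The formula for Q is not available in this text, so the scoring function is passed in by the caller; `total_occurrences` is a stand-in used only to make the example run and is NOT the patent's formula. The names `keyword_counts` and `match_factor`, and the use of plain substring counting, are likewise assumptions.

```python
def keyword_counts(paragraph, keywords):
    """S1: occurrence counts of the keywords present in the paragraph, descending."""
    counts = [paragraph.count(keyword) for keyword in keywords]
    return sorted((c for c in counts if c > 0), reverse=True)


def match_factor(paragraph, factor_databases, score):
    """S2 and S3: score every factor database and return the best-matching factor.

    `score` maps the descending count list to a matching degree Q.
    Returns None when no keyword of any database appears in the paragraph.
    """
    best_factor, best_q = None, None
    for factor, keywords in factor_databases.items():
        counts = keyword_counts(paragraph, keywords)
        if not counts:
            continue
        q = score(counts)
        if best_q is None or q > best_q:
            best_factor, best_q = factor, q
    return best_factor


def total_occurrences(counts):
    """Hypothetical stand-in for Q; the real formula is not reproduced here."""
    return sum(counts)


databases = {"budget": ["cost", "price"], "venue": ["hall", "room"]}
paragraph = "the hall is large but the hall cost is high"
matched = match_factor(paragraph, databases, total_occurrences)
```

Keeping the scoring function as a parameter means the selection logic of step S3 stays the same whichever formula for Q is eventually plugged in.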

with reference to fig. 3, the relationship division process between the factors includes the following steps:

S21, calculating the total association value of all factors [formula not reproduced in the source text];

S22, counting, for each association value r, the number of two-factor relations having that value; these counts satisfy two relations [not reproduced in the source text] that involve the number of factor databases in the case analysis module;

S23, determining two demarcation points according to an inequality [not reproduced in the source text] that involves a medium-weak boundary proportion coefficient and a medium-strong boundary proportion coefficient, both coefficients being set according to experience;

S24, the relationship between two factors whose degree of association is greater than the upper demarcation point is a strong association relationship, the relationship between two factors whose degree of association is less than the lower demarcation point is a weak association relationship, and the relationship between two factors whose degree of association lies between the two demarcation points is a medium association relationship;
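The classification in step S24 can be sketched as follows, taking the two demarcation points as given inputs because the inequality that produces them is not available in this text. The names `classify_relation` and `split_trunks_and_branches` are hypothetical, and treating a degree exactly equal to a demarcation point as medium is an assumption.

```python
def classify_relation(degree, lower_point, upper_point):
    """Label one two-factor relation as strong, medium or weak."""
    if degree > upper_point:
        return "strong"
    if degree < lower_point:
        return "weak"
    return "medium"  # assumption: boundary values count as medium


def split_trunks_and_branches(assoc, lower_point, upper_point):
    """Build the factor topology network: strong relations become trunks,
    medium relations become branches, weak relations are left out."""
    trunks, branches = {}, {}
    for pair, degree in assoc.items():
        kind = classify_relation(degree, lower_point, upper_point)
        if kind == "strong":
            trunks[pair] = degree
        elif kind == "medium":
            branches[pair] = degree
    return trunks, branches


trunks, branches = split_trunks_and_branches(
    {("a", "b"): 10, ("a", "c"): 5, ("b", "c"): 1},
    lower_point=3,
    upper_point=8,
)
```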

with reference to fig. 4, the process by which the scheme reconstruction module determines the target trunks according to the initial factors includes the following steps:

S31, finding, in the factor topology network, all trunks whose two factors are both initial factors and putting these trunks into a target pool;

S32, selecting the trunk with the highest degree of association in the target pool, taking it as a target trunk, and putting its two initial factors into a factor pool;

S33, selecting, from the target pool, the trunk with the highest degree of association among those having one initial factor in the factor pool and the other initial factor not in the factor pool;

S34, taking the trunk selected in step S33 as a target trunk, and putting its initial factor that is not yet in the factor pool into the factor pool;

S35, repeating step S33 and step S34 until all the initial factors have been placed in the factor pool;
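Steps S31 to S35 amount to greedily growing a connected set of trunks over the initial factors, always taking the strongest trunk that reaches one new factor. A minimal sketch follows; the function name `select_target_trunks` is hypothetical, and stopping early when no remaining trunk reaches a new initial factor is an assumption, since the text does not cover that case.

```python
def select_target_trunks(trunks, initial_factors):
    """Pick target trunks that link the initial factors (steps S31 to S35).

    `trunks` maps a factor pair to its degree of association.
    """
    initial = set(initial_factors)
    # S31: the target pool holds trunks whose two factors are both initial.
    pool = {
        pair: degree
        for pair, degree in trunks.items()
        if pair[0] in initial and pair[1] in initial
    }
    if not pool:
        return []
    # S32: start from the strongest trunk in the pool.
    first = max(pool, key=pool.get)
    targets = [first]
    factor_pool = set(first)
    # S33 to S35: repeatedly add the strongest trunk with exactly one
    # factor already in the factor pool.
    while factor_pool != initial:
        frontier = [
            pair for pair in pool
            if (pair[0] in factor_pool) != (pair[1] in factor_pool)
        ]
        if not frontier:
            break  # assumption: unreachable initial factors are skipped
        best = max(frontier, key=pool.get)
        targets.append(best)
        factor_pool.update(best)
    return targets


targets = select_target_trunks(
    {("a", "b"): 9, ("b", "c"): 7, ("a", "c"): 8, ("c", "d"): 6, ("a", "e"): 10},
    ["a", "b", "c", "d"],
)
```

In the sample data the trunk ("a", "e") is the strongest overall but is ignored because "e" is not an initial factor.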

with reference to fig. 5, the process by which the scheme reconstruction module determines the target factor according to a target trunk includes the following steps:

S41, selecting, from the factor topology network, factors which have an association relation with both initial factors on the target trunk and which are not themselves initial factors, the selected factors being called primary factors;

S42, calculating the mining index of each primary factor from the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk [formula not reproduced in the source text];

S43, taking the primary factor with the minimum mining index as the target factor.
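Steps S41 to S43 can be sketched as below. The formula for the mining index is not available in this text, so it is supplied by the caller; `spread_over_trunk` is a stand-in chosen only so that the example runs and is NOT the patent's formula. The function name `find_target_factor` is hypothetical, and the sketch follows the summary of the invention in requiring a branch relationship between a primary factor and both initial factors of the target trunk.

```python
def find_target_factor(target_trunk, trunk_degree, branches, initial_factors, mining_index):
    """Return the primary factor with the smallest mining index, or None.

    `branches` maps a factor pair to its degree of association.
    `mining_index(trunk_degree, degree_a, degree_b)` returns the index P.
    """
    a, b = target_trunk
    initial = set(initial_factors)

    def branch_degree(x, y):
        # Pairs may be stored in either order.
        return branches.get((x, y), branches.get((y, x)))

    # S41: candidates are non-initial factors that appear on some branch.
    candidates = {factor for pair in branches for factor in pair} - initial
    best_factor, best_p = None, None
    for factor in sorted(candidates):
        degree_a = branch_degree(factor, a)
        degree_b = branch_degree(factor, b)
        if degree_a is None or degree_b is None:
            continue  # not linked by a branch to both initial factors
        # S42: compute the mining index with the supplied formula.
        p = mining_index(trunk_degree, degree_a, degree_b)
        # S43: keep the minimum.
        if best_p is None or p < best_p:
            best_factor, best_p = factor, p
    return best_factor


def spread_over_trunk(trunk_degree, degree_a, degree_b):
    """Hypothetical stand-in for P; the real formula is not reproduced here."""
    return abs(degree_a - degree_b) / trunk_degree


target = find_target_factor(
    ("a", "b"),
    9,
    {("a", "x"): 5, ("b", "x"): 4, ("a", "y"): 6, ("b", "y"): 2, ("a", "z"): 3},
    ["a", "b"],
    spread_over_trunk,
)
```

Here "z" is never considered because it has a branch to only one of the two initial factors.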

The disclosure is only a preferred embodiment of the invention, and is not intended to limit the scope of the invention, so that all equivalent technical changes made by using the contents of the specification and the drawings are included in the scope of the invention, and further, the elements thereof can be updated as the technology develops.

Claims

1. An automatic scheme mining system based on historical data, characterized by comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module, wherein the case storage module is used for storing actual case data, the case analysis module is used for decomposing the case data into a plurality of factors, the factor analysis module establishes a factor topology network based on the factor relations in the cases, and the scheme reconstruction module is used for receiving an initial case and reconstructing it into a new scheme framework based on the factor topology network;

after receiving an initial case, the scheme reconstruction module processes the initial case through the case analysis module to obtain initial factors and finds a plurality of target trunks in the factor topology network according to the initial factors, the two factors of each target trunk both being initial factors; the scheme reconstruction module then finds primary factors in the factor topology network, the relationship between a primary factor and each of the two initial factors of a target trunk being a branch relationship, and calculates a mining index P of each primary factor according to the following formula:

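[formula for P not reproduced in the source text]

wherein the terms of the formula are the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk;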

the scheme reconstruction module takes the primary factor with the minimum mining index as the target factor of a target trunk, and takes the initial factors and the target factors as the framework of a new scheme.

2. The automatic scheme mining system based on historical data as claimed in claim 1, wherein a segmentation database is preset in the case analysis module, the segmentation database includes a plurality of segmentation words, the case analysis module segments the case content into a plurality of target paragraphs through the segmentation words, and each target paragraph is matched to obtain a factor.

3. The system of claim 2, wherein the case analysis module calculates the matching degree Q between the factor database and the target paragraph according to the following formula:

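[formula for Q not reproduced in the source text]

wherein m is the number of keywords of the factor database that appear in the target paragraph, and the remaining terms are the numbers of occurrences of each such keyword in the target paragraph, arranged in descending order;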

and the case analysis module selects the factor corresponding to the factor database with the highest matching degree as the factor matched with the target paragraph.

4. The automatic scheme mining system based on historical data of claim 3, wherein the factor analysis module divides the relationship between any two factors into a strong association relationship, a medium association relationship and a weak association relationship according to the degree of association, the strong association relationships form the trunks of the factor topology network, and the medium association relationships form the branches of the factor topology network.

5. The system of claim 4, wherein the factor analysis module determines two demarcation points according to the following inequality:

[inequality not reproduced in the source text]

wherein the terms of the inequality are the total association value of all factors, the number of two-factor relations having each association value r, the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient.