CN115292388A - Automatic scheme mining system based on historical data - Google Patents
Automatic scheme mining system based on historical data
- Publication number
- CN115292388A (application number CN202211194582.2A)
- Authority
- CN
- China
- Prior art keywords
- factor
- case
- factors
- analysis module
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Abstract
The invention provides an automatic scheme mining system based on historical data, comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module. The case storage module stores actual case data; the case analysis module decomposes the case data into a plurality of factors; the factor analysis module builds a factor topology network from the factor relationships in the cases; and the scheme reconstruction module receives an initial case and reconstructs a new scheme framework based on the factor topology network. The system exploits big data to build the factor topology network, and mines the factors of an initial scheme against that network to obtain further factors that are both non-obvious and closely related, thereby forming a new scheme framework.
Description
Technical Field
The invention relates to the field of electric digital data processing, and in particular to an automatic scheme mining system based on historical data.
Background
Data mining is the nontrivial process of revealing implicit, previously unknown and potentially valuable information from large amounts of data in a database, and it is an active research topic in the fields of artificial intelligence and databases. It is a decision support process that draws mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases and visualization technologies to analyze enterprise data in a highly automated manner, perform inductive reasoning and mine potential patterns from the data, helping decision makers adjust market strategies, reduce risk and make sound decisions.
The foregoing discussion of the background art is intended only to facilitate an understanding of the present invention. This discussion is not an acknowledgement or admission that any of the material referred to is part of the common general knowledge.
Many data mining systems have been developed. A search of the prior art found an existing mining system disclosed in CN114168691B, which generally comprises a mining module, an analysis module and a management module. The mining module comprises a character mining unit, a time mining unit and a frequency mining unit. The character mining unit counts the different account numbers in the acquired mining data set to obtain a character data set. The time mining unit counts and marks the conversation time points in the character data set to obtain a time point set, derives the conversation durations from the conversation time points, and counts and marks them to obtain a duration set; the time point set and the duration set constitute a time data set. However, that system mines relationships between characters (people), which are more vivid and concrete than the relationships between ideas in a scheme, so applying the same approach to scheme mining cannot satisfy the requirements of relevance and concealment at the same time.
Disclosure of Invention
The invention aims to provide an automatic scheme mining system based on historical data that addresses the above shortcomings.
The invention adopts the following technical scheme:
An automatic scheme mining system based on historical data comprises a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module. The case storage module stores actual case data; the case analysis module decomposes the case data into a plurality of factors; the factor analysis module builds a factor topology network from the factor relationships in the cases; and the scheme reconstruction module receives an initial case and reconstructs a new scheme framework based on the factor topology network.
A plurality of factor databases are preset in the case analysis module. Each factor database contains a plurality of keywords and corresponds to one factor. The case analysis module compares the case data with the keywords to obtain a factor set and sends the factor set to the factor analysis module.
The factor analysis module performs statistics on the received factor sets to obtain the degree of association of any two factors: each time two factors appear together in one factor set, their degree of association is increased by one. Based on the degrees of association, the factor analysis module divides the factor topology network into trunks and branches.
After receiving an initial case, the scheme reconstruction module has the case analysis module process it to obtain the initial factors, and finds a plurality of target trunks in the factor topology network according to the initial factors, where both factors of each target trunk are initial factors. The scheme reconstruction module then finds primary factors in the factor topology network, a primary factor being a factor whose relationships with both initial factors of a target trunk are branch relationships, and calculates the mining index P of each primary factor according to the following formula:
[Formula for P not reproduced in this text.]
The formula is defined in terms of the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk.
The scheme reconstruction module takes the primary factor with the smallest mining index as the target factor of the target trunk, and takes the initial factors and the target factors as the framework of the new scheme.
Further, a segmentation database containing a plurality of segmentation words is preset in the case analysis module. The case analysis module splits the case content into a plurality of target paragraphs at the segmentation words, and each target paragraph is matched to one factor.
Further, the case analysis module calculates the matching degree Q between a factor database and a target paragraph according to the following formula:
[Formula for Q not reproduced in this text.]
Here m is the number of keywords of the factor database, and the formula also uses the occurrence frequency of each keyword, with the frequencies arranged in descending order.
The case analysis module selects the factor corresponding to the factor database with the highest matching degree as the factor matched with the target paragraph.
Further, the factor analysis module divides the relationship between any two factors into a strong association relationship, a medium association relationship or a weak association relationship according to the degree of association. The strong association relationships form the trunks of the factor topology network, and the medium association relationships form its branches.
Further, the factor analysis module determines the two demarcation points according to the following inequality:
[Inequality not reproduced in this text.]
The inequality is defined in terms of the total association value of all the factors, the number of two-factor relationships having each association value r, the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient.
The beneficial effects obtained by the invention are as follows:
The system builds a factor topology network from big data. The network comprises trunks and branches: a trunk represents a relationship between two factors whose association is obvious, and a branch represents a relationship between two factors that are associated but whose association is not easily discovered.
For a better understanding of the features and technical content of the present invention, reference should be made to the following detailed description of the invention and accompanying drawings, which are provided for purposes of illustration and description only and are not intended to limit the invention.
Drawings
FIG. 1 is a schematic diagram of the overall structural framework of the present invention;
FIG. 2 is a schematic flow chart of matching a target paragraph with a factor according to the present invention;
FIG. 3 is a schematic flow chart of the factor relationship division according to the present invention;
FIG. 4 is a schematic flow chart of determining the target trunks according to the present invention;
FIG. 5 is a schematic flow chart of determining the target factor according to the present invention.
Detailed Description
The following describes implementations of the present invention through specific embodiments, and those skilled in the art will understand the advantages and effects of the present invention from the disclosure of this specification. The invention is capable of other and different embodiments, and its details may be modified in various respects, all without departing from the spirit and scope of the present invention. The drawings are for illustrative purposes only and are not drawn to scale. The following embodiments explain the related technical matters in further detail, but the disclosure is not intended to limit the scope of the present invention.
Example one
With reference to FIG. 1, this embodiment provides an automatic scheme mining system based on historical data, comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module. The case storage module stores actual case data; the case analysis module decomposes the case data into a plurality of factors; the factor analysis module builds a factor topology network from the factor relationships in the cases; and the scheme reconstruction module receives an initial case and reconstructs a new scheme framework based on the factor topology network.
A plurality of factor databases are preset in the case analysis module. Each factor database contains a plurality of keywords and corresponds to one factor. The case analysis module compares the case data with the keywords to obtain a factor set and sends the factor set to the factor analysis module.
The factor analysis module performs statistics on the received factor sets to obtain the degree of association of any two factors: each time two factors appear together in one factor set, their degree of association is increased by one. Based on the degrees of association, the factor analysis module divides the factor topology network into trunks and branches.
After receiving an initial case, the scheme reconstruction module has the case analysis module process it to obtain the initial factors, and finds a plurality of target trunks in the factor topology network according to the initial factors, where both factors of each target trunk are initial factors. The scheme reconstruction module then finds primary factors in the factor topology network, a primary factor being a factor whose relationships with both initial factors of a target trunk are branch relationships, and calculates the mining index P of each primary factor according to the following formula:
[Formula for P not reproduced in this text.]
The formula is defined in terms of the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk.
The scheme reconstruction module takes the primary factor with the smallest mining index as the target factor of the target trunk, and takes the initial factors and the target factors as the framework of the new scheme.
A segmentation database containing a plurality of segmentation words is preset in the case analysis module. The case analysis module splits the case content into a plurality of target paragraphs at the segmentation words, and each target paragraph is matched to one factor.
The case analysis module calculates the matching degree Q between a factor database and a target paragraph according to the following formula:
[Formula for Q not reproduced in this text.]
Here m is the number of keywords of the factor database, and the formula also uses the occurrence frequency of each keyword, with the frequencies arranged in descending order.
The case analysis module selects the factor corresponding to the factor database with the highest matching degree as the factor matched with the target paragraph.
The factor analysis module divides the relationship between any two factors into a strong association relationship, a medium association relationship or a weak association relationship according to the degree of association. The strong association relationships form the trunks of the factor topology network, and the medium association relationships form its branches.
The factor analysis module determines the two demarcation points from an inequality [not reproduced in this text] that is defined in terms of the total association value of all the factors, the number of two-factor relationships having each association value r, the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient.
Example two
This embodiment includes the entire content of Example one and provides an automatic scheme mining system based on historical data, comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module. The case storage module is used for inputting and storing actual case data; the case analysis module decomposes each item of actual case data into a plurality of factors; the factor analysis module analyzes the factors to obtain a factor topology network; and the scheme reconstruction module automatically mines a scheme based on the factor topology network.
A plurality of factor databases and a segmentation database are preset in the case analysis module. Each factor database contains a plurality of keywords, and the segmentation database contains a plurality of segmentation words. The case analysis module splits the case content into a plurality of target paragraphs at the segmentation words, compares each target paragraph with the keywords in the factor databases to find the best-matching factor, and sends all the factors obtained to the factor analysis module as one set.
The factor analysis module counts the factors in each set: whenever two factors appear in the same set, their degree of association is increased by one. After the factor analysis module has processed the factor sets of all the actual case data, the degree of association between any two factors is known. The degree of association between two factors x and y is denoted by a symbol indexed by x and y [symbol not reproduced in this text].
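The counting rule above (add one to a pair's degree of association each time both factors occur in the same factor set) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `association_degrees` and the sample factor names are hypothetical, and each pair is stored under a sorted tuple so that (x, y) and (y, x) share one entry.

```python
from collections import Counter
from itertools import combinations

def association_degrees(factor_sets):
    """Count, for every unordered pair of factors, how many factor sets contain both."""
    degrees = Counter()
    for factors in factor_sets:
        # set() guards against repeated factors; sorted() gives each pair one canonical key.
        for x, y in combinations(sorted(set(factors)), 2):
            degrees[(x, y)] += 1
    return degrees

# Hypothetical factor sets, one per stored case.
cases = [
    {"budget", "schedule", "venue"},
    {"budget", "venue"},
    {"schedule", "staffing"},
]
degrees = association_degrees(cases)
print(degrees[("budget", "venue")])
```

Pairs that never occur together are simply absent from the counter, which reads back as a degree of association of zero.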
The factor analysis module divides the relationship between any two factors into a strong association relationship, a medium association relationship or a weak association relationship according to the degree of association. The strong association relationships form the trunks of the factor topology network, and the medium association relationships form its branches.
The scheme reconstruction module determines a plurality of initial factors based on the requirements, then determines a plurality of target trunks in the factor topology network according to the initial factors, and determines a plurality of target branches based on the target trunks; the factors on those branches are the target factors. The scheme reconstruction module takes the initial factors and the target factors as the new scheme framework obtained by mining.
With reference to FIG. 2, the process by which the case analysis module matches a target paragraph with a factor comprises the following steps:
S1. The case analysis module selects a factor database and counts the number m of keywords of that factor database contained in the target paragraph, together with the occurrence frequency of each such keyword in the target paragraph. The frequencies are indexed by i and arranged in descending order. [The frequency symbol and the value range of i are not reproduced in this text.]
S2. The case analysis module calculates the matching degree Q between the factor database and the target paragraph according to the following formula:
[Formula for Q not reproduced in this text.]
S3. Steps S1 and S2 are repeated to calculate the matching degree between every factor database and the target paragraph, and the factor corresponding to the factor database with the highest matching degree is selected as the factor matched with the target paragraph.
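Steps S1 to S3, together with the splitting of case content at segmentation words, can be sketched as below. Because the formula for Q is not available in this text, the sketch takes the scoring function as a parameter and uses a clearly labeled stand-in (the sum of the keyword frequencies); that stand-in is an assumption for illustration only and is not the patent's matching degree. All function names and sample data are hypothetical.

```python
import re

def split_paragraphs(text, segmentation_words):
    """Split case text into target paragraphs at every segmentation word."""
    if not segmentation_words:
        return [text.strip()] if text.strip() else []
    pattern = "|".join(re.escape(word) for word in segmentation_words)
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

def keyword_frequencies(paragraph, keywords):
    """Step S1: occurrence counts of the keywords present in the paragraph, descending."""
    counts = (paragraph.count(keyword) for keyword in keywords)
    return sorted((c for c in counts if c > 0), reverse=True)

def placeholder_score(frequencies):
    # ASSUMPTION: total keyword occurrences, standing in for the matching degree Q,
    # whose actual formula is not reproduced in the source text.
    return sum(frequencies)

def match_factor(paragraph, factor_databases, score=placeholder_score):
    """Steps S2 and S3: score every factor database and return the best-scoring factor."""
    return max(
        factor_databases,
        key=lambda factor: score(keyword_frequencies(paragraph, factor_databases[factor])),
    )

# Hypothetical factor databases: factor name -> keyword list.
factor_databases = {"cost": ["budget", "price"], "time": ["deadline", "schedule"]}
text = "The budget is tight and the price is high. Next, the deadline is fixed."
paragraphs = split_paragraphs(text, ["Next,"])
matched = [match_factor(p, factor_databases) for p in paragraphs]
print(matched)
```

Passing the score as a parameter keeps the sketch usable once the real formula for Q is substituted; only `placeholder_score` would need to change.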
With reference to FIG. 3, the process of dividing the relationships between factors comprises the following steps:
S22. Count the number of two-factor relationships having each association value r. [The symbol for this count, the value range of r and the relationship that the counts satisfy are not reproduced in this text.]
That relationship involves the number of factor databases in the case analysis module.
S23. Determine the two demarcation points according to an inequality [not reproduced in this text] that uses the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient, both of which are set empirically.
S24. A relationship between two factors whose degree of association is greater than the upper demarcation point is a strong association relationship; one whose degree of association is less than the lower demarcation point is a weak association relationship; and one whose degree of association lies between the two demarcation points is a medium association relationship.
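The classification rule of step S24 can be sketched as follows, taking the two demarcation points as inputs because the inequality that produces them is not available in this text. The names `classify` and `build_network` and the sample values are hypothetical, and treating a degree exactly equal to a demarcation point as medium is an assumption.

```python
def classify(degree, weak_bound, strong_bound):
    """Step S24: label one degree of association given the two demarcation points."""
    if degree > strong_bound:
        return "strong"
    if degree < weak_bound:
        return "weak"
    # ASSUMPTION: values equal to either demarcation point count as medium.
    return "medium"

def build_network(degrees, weak_bound, strong_bound):
    """Strong relationships become trunks, medium relationships become branches."""
    trunks, branches = {}, {}
    for pair, degree in degrees.items():
        kind = classify(degree, weak_bound, strong_bound)
        if kind == "strong":
            trunks[pair] = degree
        elif kind == "medium":
            branches[pair] = degree
    return trunks, branches

# Hypothetical degrees of association and demarcation points.
degrees = {("a", "b"): 9, ("a", "c"): 4, ("b", "c"): 1}
trunks, branches = build_network(degrees, weak_bound=2, strong_bound=6)
print(trunks, branches)
```

Weak relationships are dropped from the network in this sketch, since the text assigns only strong and medium relationships to trunks and branches.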
With reference to FIG. 4, the process by which the scheme reconstruction module determines the target trunks according to the initial factors comprises the following steps:
S31. Find, in the factor topology network, all trunks whose two factors are both initial factors, and put them into a target pool.
S32. Select from the target pool the trunk with the highest degree of association, take it as a target trunk, and put its two initial factors into a factor pool.
S33. Select from the target pool the trunk with the highest degree of association among those having one initial factor in the factor pool and the other initial factor not in the factor pool.
S34. Take the trunk selected in step S33 as a target trunk, and put its initial factor that is not yet in the factor pool into the factor pool.
S35. Repeat steps S33 and S34 until all the initial factors have been put into the factor pool.
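Steps S31 to S35 amount to a greedy growth from the strongest trunk, always adding the strongest trunk that connects a new initial factor to those already collected. A minimal sketch follows; the function name and sample data are hypothetical, and stopping early when no connecting trunk exists is an assumption, since the text does not say what happens in that case.

```python
def select_target_trunks(trunks, initial_factors):
    """Greedy selection of target trunks following steps S31 to S35."""
    initial = set(initial_factors)
    # S31: the target pool holds the trunks whose two factors are both initial factors.
    pool = {pair: d for pair, d in trunks.items() if pair[0] in initial and pair[1] in initial}
    if not pool:
        return []
    # S32: start from the trunk with the highest degree of association.
    first = max(pool, key=pool.get)
    targets = [first]
    factor_pool = set(first)
    # S35: repeat S33 and S34 until every initial factor is in the factor pool.
    while factor_pool != initial:
        # S33: trunks with exactly one factor already in the factor pool.
        candidates = [p for p in pool if (p[0] in factor_pool) != (p[1] in factor_pool)]
        if not candidates:
            break  # ASSUMPTION: stop if the remaining initial factors are unreachable.
        best = max(candidates, key=pool.get)
        # S34: the selected trunk becomes a target trunk and its new factor joins the pool.
        targets.append(best)
        factor_pool.update(best)
    return targets

# Hypothetical trunks (pair -> degree of association) and initial factors.
trunks = {("a", "b"): 10, ("b", "c"): 8, ("a", "c"): 7, ("c", "d"): 9, ("a", "e"): 20}
targets = select_target_trunks(trunks, {"a", "b", "c", "d"})
print(targets)
```

In the example the trunk ("a", "e") is ignored despite its high degree because "e" is not an initial factor, and ("a", "c") is skipped because both of its factors are already in the factor pool when it becomes eligible.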
With reference to FIG. 5, the process by which the scheme reconstruction module determines the target factor of a target trunk comprises the following steps:
S41. Select from the factor topology network the factors that have an association relationship with both initial factors on the target trunk and are not themselves initial factors; the selected factors are called primary factors.
S42. Calculate the mining index P of each primary factor according to a formula [not reproduced in this text] defined in terms of the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk.
S43. Take the primary factor with the smallest mining index as the target factor.
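Steps S41 to S43 can be sketched as below. The formula for the mining index P is not available in this text, so the sketch accepts it as a callable taking the trunk's degree of association and the two degrees linking the candidate to the trunk's factors. The `placeholder_index` used in the example is an arbitrary stand-in chosen only so the code runs; it is an assumption and not the patent's formula. All names and data are hypothetical, and the candidate links are looked up among the branches, following the earlier definition of a primary factor.

```python
def degree_of(table, x, y):
    """Look up a pair's degree of association regardless of key order."""
    return table.get((x, y), table.get((y, x)))

def neighbours(branches, factor):
    """Factors linked to the given factor by a branch."""
    result = set()
    for x, y in branches:
        if x == factor:
            result.add(y)
        elif y == factor:
            result.add(x)
    return result

def primary_factors(branches, trunk, initial_factors):
    """Step S41: non-initial factors linked by branches to both factors of the trunk."""
    a, b = trunk
    return (neighbours(branches, a) & neighbours(branches, b)) - set(initial_factors)

def select_target_factor(trunks, branches, trunk, initial_factors, mining_index):
    """Steps S42 and S43: return the primary factor with the smallest mining index."""
    a, b = trunk
    candidates = primary_factors(branches, trunk, initial_factors)
    if not candidates:
        return None

    def index(factor):
        return mining_index(
            degree_of(trunks, a, b),
            degree_of(branches, a, factor),
            degree_of(branches, b, factor),
        )

    # sorted() makes tie-breaking deterministic.
    return min(sorted(candidates), key=index)

def placeholder_index(trunk_degree, degree_1, degree_2):
    # ASSUMPTION: arbitrary stand-in, NOT the mining index P defined in the patent.
    return abs(degree_1 - degree_2)

trunks = {("a", "b"): 10}
branches = {("a", "x"): 4, ("b", "x"): 5, ("a", "y"): 3, ("b", "y"): 6, ("a", "z"): 4}
target = select_target_factor(trunks, branches, ("a", "b"), {"a", "b"}, placeholder_index)
print(target)
```

Factor "z" is excluded in the example because it is linked to only one of the two initial factors on the trunk.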
The above disclosure describes only preferred embodiments of the present invention and is not intended to limit its scope. All equivalent technical changes made on the basis of this disclosure and the drawings fall within the scope of the present invention, and elements of the invention may be updated as the technology develops.
Claims (5)
1. An automatic scheme mining system based on historical data, characterized by comprising a case storage module, a case analysis module, a factor analysis module and a scheme reconstruction module, wherein the case storage module is used for storing actual case data, the case analysis module is used for decomposing the case data into a plurality of factors, the factor analysis module builds a factor topology network based on the factor relationships in the cases, and the scheme reconstruction module is used for receiving an initial case and reconstructing a new scheme framework based on the factor topology network;
a plurality of factor databases are preset in the case analysis module, each factor database comprises a plurality of keywords and corresponds to one factor, and the case analysis module compares the case data with the keywords to obtain a factor set and sends the factor set to the factor analysis module;
the factor analysis module performs statistics on the received factor sets to obtain the degree of association of any two factors, the degree of association of two factors being increased by one each time the two factors appear in one factor set, and the factor analysis module divides the factor topology network into trunks and branches based on the degrees of association;
after receiving an initial case, the scheme reconstruction module has the case analysis module process the initial case to obtain initial factors, and finds a plurality of target trunks in the factor topology network according to the initial factors, both factors of each target trunk being initial factors; the scheme reconstruction module finds primary factors in the factor topology network, the relationships between a primary factor and the two initial factors of a target trunk being branch relationships, and calculates the mining index P of each primary factor according to the following formula:
[Formula for P not reproduced in this text.]
wherein the formula is defined in terms of the degree of association corresponding to the target trunk and the degrees of association between the primary factor and each of the two initial factors on the target trunk;
the scheme reconstruction module takes the primary factor with the smallest mining index as the target factor of the target trunk, and takes the initial factors and the target factors as the framework of the new scheme.
2. The automatic scheme mining system based on historical data of claim 1, wherein a segmentation database is preset in the case analysis module, the segmentation database comprises a plurality of segmentation words, the case analysis module splits the case content into a plurality of target paragraphs at the segmentation words, and each target paragraph is matched to one factor.
3. The automatic scheme mining system based on historical data of claim 2, wherein the case analysis module calculates the matching degree Q between a factor database and a target paragraph according to the following formula:
[Formula for Q not reproduced in this text.]
wherein m is the number of keywords of the factor database, and the formula also uses the occurrence frequency of each keyword, the frequencies being arranged in descending order;
and the case analysis module selects the factor corresponding to the factor database with the highest matching degree as the factor matched with the target paragraph.
4. The automatic scheme mining system based on historical data of claim 3, wherein the factor analysis module divides the relationship between any two factors into a strong association relationship, a medium association relationship or a weak association relationship according to the degree of association, the strong association relationships form the trunks of the factor topology network, and the medium association relationships form the branches of the factor topology network.
5. The automatic scheme mining system based on historical data of claim 4, wherein the factor analysis module determines the two demarcation points according to the following inequality:
[Inequality not reproduced in this text.]
wherein the inequality is defined in terms of the total association value of all the factors, the number of two-factor relationships having each association value r, the medium-weak boundary proportion coefficient and the medium-strong boundary proportion coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211194582.2A CN115292388B (en) | 2022-09-29 | 2022-09-29 | Automatic scheme mining system based on historical data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211194582.2A CN115292388B (en) | 2022-09-29 | 2022-09-29 | Automatic scheme mining system based on historical data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115292388A (en) | 2022-11-04 |
CN115292388B (en) | 2023-01-24 |
Family
ID=83834916
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211194582.2A Active CN115292388B (en) | 2022-09-29 | 2022-09-29 | Automatic scheme mining system based on historical data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115292388B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120029957A1 (en) * | 2010-08-02 | 2012-02-02 | National Tsing Hua University | Factor analysis system and analysis method thereof |
CN107688653A (en) * | 2017-09-01 | 2018-02-13 | 武汉倚天剑科技有限公司 | User behavior data digging system and its method based on network shallow-layer data |
CN109118079A (en) * | 2018-08-07 | 2019-01-01 | 山东纬横数据科技有限公司 | A kind of manufacturing industry product quality data relation analysis method |
EP3543874A1 (en) * | 2018-03-23 | 2019-09-25 | Servicenow, Inc. | Automated intent mining, clustering and classification |
CN112434104A (en) * | 2020-12-04 | 2021-03-02 | 东北大学 | Redundant rule screening method and device for association rule mining |
CN113486086A (en) * | 2021-07-01 | 2021-10-08 | 上海硕恩网络科技股份有限公司 | Data mining method and system based on feature engineering |
CN114548646A (en) * | 2021-12-15 | 2022-05-27 | 中南大学 | Epidemic situation risk factor identification method and system based on association rule |
CN114861655A (en) * | 2022-04-02 | 2022-08-05 | 渤海银行股份有限公司 | Data mining processing method, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115292388B (en) | 2023-01-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||