CN115619192B

CN115619192B - Mixed relation extraction method oriented to demand planning rules

Info

Publication number: CN115619192B
Application number: CN202211408137.1A
Authority: CN
Inventors: 刘嫣然; 汪亦星; 许璐; 倪颖; 梅杰; 杨阳
Original assignee: Materials Branch of State Grid Jiangsu Electric Power Co Ltd
Current assignee: Materials Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2022-11-10
Filing date: 2022-11-10
Publication date: 2023-10-03
Anticipated expiration: 2042-11-10
Also published as: CN115619192A

Abstract

The invention discloses a mixed relation extraction method oriented to demand planning rules. The method comprises four stages: a parameter pre-extraction stage on general data sets, an autonomous learning stage, an active learning stage, and an application deployment stage. Building on existing artificial intelligence technology and relation extraction theory, and using semi-supervised learning as its basis, the method can extract relations from a large number of previously unseen demand planning rules, so it retains relation extraction capability under weak supervision. It mitigates the heavy noise that arises during extraction because the personnel who write demand plans differ in skill. After relation extraction, the method can identify places that may be unreasonable and prompt human users to review them.

Description

Mixed relation extraction method oriented to demand planning rules

Technical Field

The invention belongs to the field of electric power and relates to a mixed relation extraction method oriented to demand planning rules.

Background

As the source of material management, demand planning is the basis of intelligent purchasing. In recent years the power grid has steadily raised its requirements for lean plan management and for timely plan reporting, and conventional manual auditing of demand plans has proved inefficient and error-prone. Plan management involves a long service chain, many levels and many kinds of demand plans, and the rule system covers numerous constraints such as service coding, business calculation and space-time range, so full-time planning staff and audit experts have to check each plan manually against the rules. Moreover, because demand plans involve many batches, there are many audit points, the associations between them are complex, and the rules are difficult to grasp accurately. A new intelligent technology for demand plan information is therefore needed: one that uses machine learning, deep learning and data science tools to extract the complex logic in the existing audit points as machine rules that support fast, efficient and standardized auditing. This improves the efficiency and accuracy of demand plan auditing and lays a solid foundation for efficient purchasing.

Relation extraction methods based on deep learning fall mainly into two types, remote supervision and supervised learning. Supervised learning uses a manually labeled data set during training, whereas the remote supervision method labels the corpus automatically by aligning it with a remote knowledge base. The performance of either kind of model depends on the labeling quality of the training set, and the automatic labeling used for remote-supervision data sets introduces a large amount of noise. Reducing the influence of this noise on the model is therefore an important research problem in remote-supervision relation extraction.

For tasks oriented to demand planning rules, the problems to be solved are:

1. relations must be extracted from a large number of unknown demand planning rules, which requires relation extraction capability under weak supervision;

2. during extraction, the personnel who write demand plans differ in skill, so the noise may be large;

3. after the relations are extracted, they must be approved by human users, and prompts for suspicious relations should be provided as required; that is, an unknown demand plan may contain unreasonable places, and the machine needs to point these out to the human users.

For developing a method oriented to demand planning rules, the resources available are:

1. some general relations can be learned from existing public general-purpose data sets;

2. many established methods exist, so several algorithms can be trained, and the hardware resources can support comparative learning across multiple algorithms;

3. in the initial period of use, the user's approval process is also a relation auditing process for unknown samples, so it can supply new samples;

4. for the demand planning rules of a large group, a general template can be defined; when writing important sentences of the core content, the user fills semantic content into the core sentence, so that the relation expressed by the core sentence is determined automatically.

It is therefore easy to see that, to make full use of the above resources in solving these problems, a hybrid relation extraction algorithm can be designed as follows: 1. a semi-supervised scheme learns a large number of relation extraction results and passes them to fully supervised learning; 2. several fully supervised schemes are trained on the existing general data sets together with the sample data accumulated over time, and a voting scheme over them supervises the relation extraction results of the semi-supervised scheme; 3. stability is used to check whether accuracy has drifted too far; 4. finally, because multiple learning schemes are used and the deviation between them is known, voting can be applied, and the content with the largest deviation can be presented to the user as suspect or unreasonable, prompting the user to analyze and approve it. In summary, the learning model resembles a teacher-student scheme with several students, in which the roles of teacher and student are continually exchanged.

Disclosure of Invention

To overcome the problems of the prior art, the invention aims to provide a mixed relation extraction method oriented to demand planning rules. Building on existing artificial intelligence technology and relation extraction theory, it performs relation extraction on standard samples on the basis of semi-supervised learning, compares the results against the standard samples, prompts the user to review content that appears unreasonable, and achieves meta-learning capability through active learning.

The aim of the invention is achieved by the following technical scheme:

A mixed relation extraction method oriented to demand planning rules, characterized by comprising the following steps:

Step 1, parameter pre-extraction stage on general data sets

Step 1-1, a general template is defined. When writing an important sentence of the core content, the user fills semantic content into the template, so that the relation expressed by that core sentence is determined automatically. This provides a small rule base R1 of general relations, which is initially empty; later its content can be obtained through simple text recognition and position lookup rather than through a complex relation extraction scheme. In addition, because the template format requirements are fixed, the length of each entry can be controlled, which gives the value of the distance length lf used for remote-supervision relation extraction under the general template.

Step 1-2, on each of the general, non-professional, fully labeled data sets DatasetG(1) to DatasetG(n), 90% of the data is used to train one semi-supervised learning algorithm MH and m fully supervised learning algorithms MF(1) to MF(m), giving the trained models MHt and MFt(1) to MFt(m). Their performance is then tested on the remaining 10% of the samples. Specifically, for each of the n data sets taken separately, the difference in relation extraction between the one semi-supervised model and the voting result of the m fully supervised models is calculated. The concrete work is as follows:

For each data set, statistics are collected over all relation extraction results RsG(mi) (mi = 1 to mimax) on the remaining 10% of fully labeled samples, where mimax is the total number of triples and the results of all learning algorithms are pooled together. The part found by the semi-supervised learning algorithm is RsGh(mih) (mih = 1 to mihmax), and the part found by the fully supervised learning algorithms is RsGf(mif) (mif = 1 to mifmax); the two parts overlap. The pooled set RsG(mi) also records how many times each relation was identified by semi-supervised and by fully supervised learning. The triple representation of relations is well known in the art: a triple consists of A, B and C, where A and C are objects and B is the relation, for example Nanjing (A) belongs to (B) Jiangsu (C). In the present invention, each triple result also carries a discovery array Timde1(RsG, deth1, detf1, realout), where RsG indicates that the result comes from pre-training on the general data sets; deth1 is the result of the semi-supervised learning algorithm and is either 1 or 0; detf1 is the number of the m fully supervised algorithms that found the relation, ranging from 0 to m; and realout indicates whether the relation is manually labeled as correct in the fully labeled sample, either 1 (correct) or 0 (incorrect). For example, for the rm-th relation "Nanjing belongs to Jiangsu", Timde1(RsG, 1, 2, 1) means that the relation was extracted by the semi-supervised model, was extracted by 2 of the fully supervised models, and is labeled correct.
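The discovery array described above can be pictured as a small record per triple. The following is a minimal sketch, not the patented implementation: the class name `TripleRecord`, the helper `build_records`, and the representation of each model's output as a Python set of triples are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (object A, relation B, object C)


@dataclass
class TripleRecord:
    """One pooled triple plus its discovery array (hypothetical layout)."""
    triple: Triple
    source: str                    # result-set tag, e.g. "RsG"
    deth1: int                     # 1 if the semi-supervised model found it, else 0
    detf1: int                     # number of fully supervised models that found it
    realout: Optional[int] = None  # 1 correct, 0 incorrect, None when unlabeled


def build_records(semi_found: Set[Triple],
                  full_found: List[Set[Triple]],
                  gold: Optional[Set[Triple]] = None,
                  source: str = "RsG") -> List[TripleRecord]:
    """Pool the triples found by all models and count who found each one."""
    pooled = set(semi_found).union(*full_found)
    return [
        TripleRecord(
            triple=t,
            source=source,
            deth1=int(t in semi_found),
            detf1=sum(t in found for found in full_found),
            realout=None if gold is None else int(t in gold),
        )
        for t in sorted(pooled)
    ]


# Example from the text: found by the semi-supervised model and by 2 full models.
nanjing = ("Nanjing", "belongs to", "Jiangsu")
records = build_records({nanjing}, [{nanjing}, {nanjing}, set()], gold={nanjing})
```

Here `records[0]` carries `deth1 = 1`, `detf1 = 2` and `realout = 1`, which corresponds to Timde1(RsG, 1, 2, 1) in the example.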

The fully supervised voting method sets a variable votenum ranging from 1 to m. The models MFt(1) to MFt(m) vote on the results in the 10% test samples: if a relation is found by votenum or more of the models, it is judged true (1) from the fully supervised point of view. For example, if m is 10 and votenum is 2, the rm-th relation "Nanjing belongs to Jiangsu" above, with Timde1(RsG, deth1, detf1, realout) = (RsG, 1, 2, 1), becomes Timde2(RsG, deth2, detf2, realout) = (RsG, 1, 1, 1). Here deth2 is the same as deth1, and detf2, like deth1 and deth2, is either 1 or 0 and indicates whether the relation is confirmed: 1 means that the m fully supervised models, voting with threshold votenum, confirm the relation. In this example the semi-supervised result agrees with the fully supervised result, and both match the human label realout.
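The thresholded vote can be written in a few lines. This is a sketch under the reading that a relation is confirmed when at least votenum of the m fully supervised models found it; the function name `vote` is hypothetical.

```python
def vote(detf1: int, votenum: int) -> int:
    """Return detf2: 1 if at least `votenum` fully supervised models found the relation."""
    return int(detf1 >= votenum)


# Worked example from the text: m = 10, votenum = 2, Timde1 = (RsG, 1, 2, 1).
deth1, detf1, realout = 1, 2, 1
deth2 = deth1                    # the semi-supervised flag is carried over unchanged
detf2 = vote(detf1, votenum=2)   # the count of 2 meets the threshold of 2
timde2 = ("RsG", deth2, detf2, realout)
```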

Using traversal, votenum is set to each value from 1 to m in turn, and for each single data set the value is found at which detf2 equals realout for the highest proportion of results. This yields votenumbest(ni) for the n data sets, ni = 1 to n. These votenumbest(ni) values are then averaged to obtain votoop, the voting threshold used by the voting method in later stages.
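The traversal over thresholds can be sketched as follows. The toy (detf1, realout) pairs are invented for illustration, and two details are assumptions because the text does not specify them: ties are resolved in favour of the smaller threshold, and the averaged value is rounded to an integer.

```python
from statistics import mean


def best_votenum(pairs, m):
    """pairs: (detf1, realout) for every pooled triple in one data set's 10% test split."""
    best, best_rate = 1, -1.0
    for votenum in range(1, m + 1):
        agree = sum(int(detf1 >= votenum) == realout for detf1, realout in pairs)
        rate = agree / len(pairs)
        if rate > best_rate:  # strict comparison keeps the smaller threshold on ties
            best, best_rate = votenum, rate
    return best


# Invented toy results for three data sets, with m = 10 fully supervised models.
datasets = [
    [(5, 1), (1, 0), (3, 1), (2, 0)],
    [(4, 1), (3, 0), (1, 0), (6, 1)],
    [(2, 1), (1, 0), (3, 1)],
]
votenumbest = [best_votenum(pairs, m=10) for pairs in datasets]
votoop = round(mean(votenumbest))  # rounding is an assumption
```

With these toy inputs the per-data-set optima are 3, 4 and 2, so the averaged threshold is 3.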

Using the fully supervised voting method above with threshold votoop, the accuracy rates Pre(MFop(1)) to Pre(MFop(n)) on the n data sets are obtained. Their average avg(MFop) and variance var(MFop) are then calculated for later use.

In addition, the accuracy of the semi-supervised method on each individual data set, Pre(Mht(1)) to Pre(Mht(n)), is calculated. Their average avg(Mht) and variance var(Mht) are then calculated for later use.
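The baseline statistics kept for later use amount to a mean and a variance over the n per-data-set accuracies. In the sketch below the accuracy values are invented, and the use of the population variance is an assumption, since the text does not say whether the population or the sample variance is meant.

```python
from statistics import mean, pvariance


def summarize(accuracies):
    """Return (average, variance) of per-data-set accuracy rates."""
    return mean(accuracies), pvariance(accuracies)


# Invented accuracies on n = 5 general data sets.
pre_mfop = [0.82, 0.79, 0.85, 0.80, 0.84]  # voted fully supervised models
pre_mht = [0.74, 0.76, 0.75, 0.73, 0.77]   # semi-supervised model

avg_mfop, var_mfop = summarize(pre_mfop)
avg_mht, var_mht = summarize(pre_mht)
```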

Step 2, autonomous learning stage on the professional but weakly labeled data set

Step 2-1, the weakly labeled data set DatasetS is learned with the semi-supervised learning algorithm MH. The value of the distance length lf for remote-supervision relation extraction under the general template, obtained in step 1-1, is provided to MH as a parameter. Meanwhile, the initially empty small rule base R1 of general relations from step 1-1 is used as an evaluation index: the autonomously learned results are required to cover the whole of the small rule base R1. The model MHt2 is trained.

Step 2-2, the trained semi-supervised model MHt2 is applied to DatasetS to obtain the knowledge results of all triple sets. Taking the sentence as the unit, these are listed as a new fully supervised data sample DatasetSF(1), which is provided together with DatasetG(1) to DatasetG(n) to the m fully supervised learning algorithms MF(1) to MF(m) for learning, training the respective models MFt2(1) to MFt2(m).

Step 3, active learning stage

From this stage on, the algorithm works with the approver: the content on which the results of the multiple learning methods differ most is submitted to the approver, who is asked to label it.

Step 3-1, for each new sample, relation extraction is performed with the semi-supervised model MHt2 and the fully supervised models MFt2(1) to MFt2(m) trained in step 2, giving a relation set Rsf2(mj), mj = 1 to mjmax. As in step 1-2, the relation set contains a group of triples, each with a discovery array Timede1(Rsf2, deth1, detf1, realout). Because there is no label at this point, realout is Null. Two cases are then identified: deth1 is 0 and detf1 is greater than 7; or deth1 is 1 and detf1 is less than 2. These are the cases in which the semi-supervised and fully supervised results differ greatly. They are provided to the approver, who is asked to label them together, that is, to set realout.
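The selection rule of step 3-1 can be sketched as a simple predicate. The bounds 7 and 2 are taken from the text; the function name `needs_label`, its parameter names and the toy candidate list are hypothetical.

```python
def needs_label(deth1: int, detf1: int, high: int = 7, low: int = 2) -> bool:
    """True when the semi-supervised and fully supervised results differ greatly."""
    missed_by_semi = deth1 == 0 and detf1 > high  # most full models found it, MHt2 did not
    only_semi = deth1 == 1 and detf1 < low        # MHt2 found it, almost no full model did
    return missed_by_semi or only_semi


# Invented candidates: (relation id, deth1, detf1) with m = 10.
candidates = [("r1", 1, 9), ("r2", 0, 8), ("r3", 1, 1), ("r4", 0, 3)]
to_label = [rid for rid, deth1, detf1 in candidates if needs_label(deth1, detf1)]
```

In this toy run only `r2` and `r3` are sent to the approver; `r1` and `r4` show no large disagreement.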

Step 3-2, the manually labeled content, i.e. the knowledge results of the triple set, is listed with the sentence as the unit as a new data sample DatasetSF(2) and added to the data set of step 2-2; the model MHt2 is then trained with reference to step 2-2.

Step 3-3, with reference to step 2-2, MHt2 is used to analyze the new sample of step 3-1, the triples extracted from it are given to all the fully supervised learning algorithms, and the respective models MFt2(1) to MFt2(m) are trained.

Step 3-4, steps 3-1 to 3-3 are repeated until at least 3 consecutive samples have been analyzed without either of the two large-difference cases of step 3-1 occurring. Because the knowledge samples accumulate, the number of such large-difference cases should in theory keep shrinking. Meanwhile, if the user wants to speed up the process, labels can be actively added to the unlabeled triples of step 3-1 to enrich the learning samples.
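The stopping rule of step 3-4 can be sketched as a check over the per-sample count of large-difference cases. The function name `active_learning_done` and the idea of keeping these counts in a list are assumptions made for illustration.

```python
def active_learning_done(flag_counts, patience=3):
    """flag_counts[i] is the number of large-difference relations found in sample i.

    Returns True once the most recent `patience` samples all produced none.
    """
    if len(flag_counts) < patience:
        return False
    return all(count == 0 for count in flag_counts[-patience:])


history = [4, 2, 0, 0, 0]  # invented counts for five consecutive samples
finished = active_learning_done(history)
```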

Step 4, application deployment phase

In this stage, when facing a new task, the method no longer actively asks the approver to label disputed positions. Instead it passively accepts the approval result of each task and shows the user the content on which the fully supervised models and the semi-supervised model disagree, reminding the user that something unreasonable may be present there. This addresses problem 3 in the Background.

Step 4-1, for each new sample, relation extraction is performed with the semi-supervised model MHt2 and the fully supervised models MFt2(1) to MFt2(m) trained in step 3, giving a relation set Rsf3(mk), mk = 1 to mkmax. As in step 1-2, the relation set contains a group of triples, each with a discovery array Timkde1(Rsf2, detf1, realout). Because there is no label at this point, realout is Null. With reference to step 1-2, the threshold votoop obtained there is applied in this step: when detf1 is greater than or equal to votoop, detf2 in Timkde2(Rsf2, deth2, detf2, realout) is 1; otherwise it is 0. All content for which deth2 and detf2 are inconsistent is displayed to the user as a reminder that something unreasonable may be present there.
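The deployment-time check of step 4-1 can be sketched as follows, assuming deth2 is simply the semi-supervised flag and detf2 is the thresholded vote. The function name `flag_inconsistent` and the example triples are invented for illustration.

```python
def flag_inconsistent(results, votoop):
    """results: (triple, deth1, detf1) per extracted relation.

    Returns the triples on which the semi-supervised flag and the
    thresholded fully supervised vote disagree.
    """
    flagged = []
    for triple, deth1, detf1 in results:
        deth2 = deth1
        detf2 = int(detf1 >= votoop)
        if deth2 != detf2:
            flagged.append(triple)
    return flagged


# Invented extraction results for one new sample, m = 10, votoop = 3.
results = [
    (("Plan A", "belongs to", "Batch 1"), 1, 5),   # both agree: confirmed
    (("Plan B", "costs", "Fee X"), 1, 1),          # only the semi-supervised model
    (("Plan C", "assigned to", "Unit Y"), 0, 4),   # only the voted full models
    (("Plan D", "follows", "Plan A"), 0, 0),       # both agree: not confirmed
]
suspicious = flag_inconsistent(results, votoop=3)
```

Only the "Plan B" and "Plan C" triples would be shown to the user in this toy run.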

Step 4-2, after the user approves the sample of step 4-1, a result of fail (0) or pass (1) is obtained. This result is used as a weak label, and the sample is added to the data set of step 2. Every NumT1 samples, learning is carried out in a batch and the models are updated according to step 2, giving a newly trained semi-supervised model MHt2 and fully supervised models MFt2(1) to MFt2(m).

Step 4-3, every NumT2 samples (where NumT2 > NumT1), statistics are learned in a batch according to step 1-2. Using 90% of the data accumulated in step 4-2, newly trained models are obtained: the semi-supervised model MHt2' and the fully supervised models MFt'(1) to MFt'(m). The average avg'(MFop) and variance var'(MFop) of the fully supervised models are calculated; the accuracy of the semi-supervised method is also measured and its average avg'(Mht) calculated.

Condition 1: check whether the accuracy range B1 around the average of the fully supervised models in step 3 lies within the three-variance accuracy range B0 obtained in step 1-2.

B0:(avg(MFop)-3*var(MFop))～(avg(MFop)+3*var(MFop))

B1:(avg’(MFop)-var’(MFop))～(avg’(MFop)+var’(MFop))

Condition 2: check whether the average avg'(Mht) of the semi-supervised model in step 3 lies within the three-variance accuracy range C0 obtained in step 1-2.

C0:(avg(Mht)-3*var’(Mht))～(avg(Mht)+3*var’(Mht))

If either condition 1 or condition 2 fails, too much noise has been learned; step 3 must be re-entered, and the method is deployed again only after accurate labels have been learned.
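The two stability conditions can be sketched directly from the ranges B0, B1 and C0 as written, including the use of var'(Mht) in C0. The function and parameter names are hypothetical, and the numeric values in the example are invented.

```python
def within(inner, outer):
    """True when the interval `inner` lies entirely inside the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_stable(avg_mfop, var_mfop, avg_mfop_new, var_mfop_new,
              avg_mht, var_mht_new, avg_mht_new):
    b0 = (avg_mfop - 3 * var_mfop, avg_mfop + 3 * var_mfop)             # baseline range B0
    b1 = (avg_mfop_new - var_mfop_new, avg_mfop_new + var_mfop_new)     # current range B1
    c0 = (avg_mht - 3 * var_mht_new, avg_mht + 3 * var_mht_new)         # range C0, as written
    condition_1 = within(b1, b0)
    condition_2 = c0[0] <= avg_mht_new <= c0[1]
    return condition_1 and condition_2


# Invented numbers: a small drift passes, a large drop in accuracy does not.
ok = is_stable(0.82, 0.01, 0.81, 0.005, 0.75, 0.01, 0.74)
drifted = is_stable(0.82, 0.01, 0.70, 0.005, 0.75, 0.01, 0.74)
```

When `is_stable` returns False, the procedure above calls for returning to step 3 before deploying again.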

In terms of models, the invention mainly comprises one semi-supervised learning algorithm MH and several fully supervised learning algorithms MF(1) to MF(m). The data comprise several general but non-professional fully labeled data sets DatasetG(1) to DatasetG(n), and a professional sample set DatasetS of demand planning rules in which each sample is labeled only as correct or incorrect as a whole; these samples receive their weak labels at each approval. The learning process comprises four stages: a parameter pre-extraction stage on general data sets, an autonomous learning stage, an active learning stage, and an application deployment stage. In addition, the invention can define a general template: when writing important sentences of the core content, the user fills in semantic content, and the relation expressed by each core sentence is determined automatically from its position in the text through simple semantic recognition, which provides a small rule base R1 of general relations.

The beneficial effects of the invention are as follows:

By implementing this scheme, the three problems of relation extraction oriented to demand planning rules can be solved. The method can extract relations from a large number of previously unseen demand planning rules and retains relation extraction capability under weak supervision. It mitigates the heavy noise caused by differences in skill among the personnel who write demand plans. After relation extraction, the method can identify places that may be unreasonable and prompt human users to review them.

Drawings

FIG. 1 is a flow chart of an overall method according to an embodiment of the present invention.

Detailed Description of Embodiments

The present invention is further described below in conjunction with FIG. 1.

Building on existing artificial intelligence technology and relation extraction theory, the invention provides a relation extraction method for the demand planning rules of large enterprises. It performs relation extraction on standard samples on the basis of semi-supervised learning, compares the results against the standard samples, prompts users to review content that appears unreasonable, and achieves meta-learning capability through active learning.

In terms of models, the invention mainly comprises one semi-supervised learning algorithm MH and several fully supervised learning algorithms MF(1) to MF(m). The data comprise several general but non-professional fully labeled data sets DatasetG(1) to DatasetG(n), and a professional sample set DatasetS of demand planning rules in which each sample is labeled only as correct or incorrect as a whole; these samples receive their weak labels at each approval. The learning process comprises four stages: a parameter pre-extraction stage on general data sets, an autonomous learning stage, an active learning stage, and an application deployment stage. In addition, the invention can define a general template: when writing important sentences of the core content, the user fills semantic content into them, and the relation expressed by each core sentence is determined automatically, which provides a small rule base R1 of general relations.

In a specific embodiment of the present invention, the semi-supervised learning algorithm MH adopts the TMNN algorithm from the literature (Ni Jun. Study of relation extraction method based on weakly supervised learning [D]. University of even-though university). Ten fully supervised learning algorithms are used, i.e. m = 10:

[1] Multi-head: treats the relation extraction task as a multi-head selection problem and can extract multiple relation types between entity pairs.

[2] Multi-head + AT: applies adversarial training (AT) on top of the Multi-head model to extract entity relations.

[3] SciIE: introduces a multi-task setup for identifying entities, relations and coreference clusters in scientific articles. The model can exploit cross-sentence relations through coreference links and reduces cascading errors.

[4] Relation-Metric: combines metric learning with a convolutional neural network to perform relation extraction.

[5] Biaffine Attention: extends the BiLSTM-CRF model and learns second-order interactions of hidden states with a deep biaffine attention layer.

[6] Multi-turn QA: frames entity relation extraction as a multi-turn question answering task and obtains state-of-the-art (SOTA) results among sequence labeling methods.

[7] DyGIE++: further extends the span-based DyGIE model and introduces coreference resolution to enhance entity and relation feature representations.

[8] SpERT: a span-based attention model for entity relation extraction, which achieved the best results on the CoNLL04 and ADE data sets.

[9] Hierarchical Attention: proposes an auxiliary training objective based on a language model and uses a hierarchical multi-head attention mechanism to capture the most important semantic information, enhancing relation extraction capability.

[10] CasRel: first identifies all possible head entities, then uses relation-specific taggers to identify the relations and tail entities corresponding to each head entity.

In this embodiment, the general but non-professional fully labeled data sets are the TACRED data set, the SemEval-2010 Task 8 data set, the SCIERC data set, the CoNLL04 data set and the ADE data set, i.e. n = 5. The professional sample set DatasetS of demand planning rules, which is labeled only as correct or incorrect as a whole, is self-built.

The method comprises the following steps:

Step 1, parameter pre-extraction stage on general data sets

In this embodiment, R1 is a self-built set of standard triple semantic rules covering several classes, such as subunits of the first party, fee relations and task relations. As long as content is filled in at the exact position, 78 standard triple semantic rules can be parsed automatically through text recognition instead of relation extraction. In addition, because a standard length is prescribed, lf is limited to within 80 Chinese characters.

Step 1-2, on each of the general, non-professional, fully labeled data sets DatasetG(1) to DatasetG(n), 90% of the data is used to train one semi-supervised learning algorithm MH and m fully supervised learning algorithms MF(1) to MF(m), giving the trained models MHt and MFt(1) to MFt(m). Their performance is then tested on the remaining 10% of the samples. Specifically, for each of the n data sets taken separately, the difference in relation extraction between the one semi-supervised model and the voting result of the m fully supervised models is calculated. In this embodiment, m = 10 and n = 5. The concrete work is as described above.

In this particular embodiment, votoop=3.

Step 2, autonomous learning stage on the professional but weakly labeled data set

In this embodiment, R1 is parsed automatically from text semantics rather than by relation extraction: the semantics are mapped onto the 78 known rules according to the position at which they were written, which simplifies the analysis. These 78 rules are then provided to MH for learning.

Step 3, active learning stage

Step 3-1, for each new sample, relation extraction is performed with the semi-supervised model MHt2 and the fully supervised models MFt2(1) to MFt2(m) trained in step 2, giving a relation set Rsf2(mj), mj = 1 to mjmax. As in step 1-2, the relation set contains a group of triples, each with a discovery array Timede1(Rsf2, deth1, detf1, realout). Because there is no label at this point, realout is Null. Two cases are then identified: deth1 is 0 and detf1 is greater than 7; or deth1 is 1 and detf1 is less than 2. These are the cases in which the semi-supervised and fully supervised results differ greatly. They are provided to the approver, who is asked to label them together, that is, to set realout.

Step 3-2, the manually labeled content, i.e. the knowledge results of the triple set, is listed with the sentence as the unit as a new data sample DatasetSF(2) and added to the data set of step 2-2; the model MHt2 is then trained with reference to step 2-1.

Step 4, application deployment stage

Step 4-2, after the user approves the sample of step 4-1, a result of fail (0) or pass (1) is obtained. This result is used as a weak label, and the sample is added to the data set of step 2. Every NumT1 samples, learning is carried out in a batch and the models are updated according to step 2, giving a newly trained semi-supervised model MHt2 and fully supervised models MFt2(1) to MFt2(m). In this embodiment, NumT1 = 100.

Step 4-3, every NumT2 samples (where NumT2 > NumT1), statistics are learned in a batch according to step 1-2. Using 90% of the data accumulated in step 4-2, newly trained models are obtained: the semi-supervised model MHt2' and the fully supervised models MFt'(1) to MFt'(m). The average avg'(MFop) and variance var'(MFop) of the fully supervised models are calculated; the accuracy of the semi-supervised method is also measured and its average avg'(Mht) calculated. In this embodiment, NumT2 = 300.

Condition 1: check whether the accuracy range B1 around the average of the existing fully supervised models in step 3 lies within the three-variance accuracy range B0 obtained in step 1-2.

B0:(avg(MFop)-3*var(MFop))～(avg(MFop)+3*var(MFop))

B1:(avg’(MFop)-var’(MFop))～(avg’(MFop)+var’(MFop))

C0:(avg(Mht)-3*var’(Mht))～(avg(Mht)+3*var’(Mht))
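The two update intervals of steps 4-2 and 4-3 can be sketched as a counter check on the number of approved samples, reading the intervals of this embodiment as 100 samples for the model update and 300 samples for the stability statistics (the text requires NumT2 > NumT1). The function name `scheduled_actions` and the action labels are hypothetical.

```python
NUMT1 = 100  # samples between model updates (step 4-2)
NUMT2 = 300  # samples between stability statistics (step 4-3); must exceed NUMT1


def scheduled_actions(approved_count: int) -> list:
    """Return the batch jobs due after `approved_count` samples have been approved."""
    actions = []
    if approved_count > 0 and approved_count % NUMT1 == 0:
        actions.append("retrain_models")    # update MHt2 and the fully supervised models
    if approved_count > 0 and approved_count % NUMT2 == 0:
        actions.append("stability_check")   # recompute averages and test conditions 1 and 2
    return actions


due_at_300 = scheduled_actions(300)
```

Because 300 is a multiple of 100, both jobs fall due together at every 300th sample in this configuration.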

Claims

1. A mixed relation extraction method oriented to demand planning rules, characterized by comprising the following steps:

step 1, parameter pre-extraction stage on general data sets

step 1-1, setting the format requirements of a general template and controlling the length of each entry, thereby obtaining the value of the distance length lf for remote-supervision relation extraction under the general template;

step 1-2, on general, non-professional, fully labeled data sets DatasetG(1) to DatasetG(n), using 90% of the data as a training set and the remaining 10% as a test set, obtaining a voting threshold votoop, and determining the average and variance of the accuracy rates obtained with the voting threshold votoop on the n general data sets, thereby determining the stable range used in later stages;

step 2, autonomous learning stage on the professional but weakly labeled data set

step 2-1, learning the weakly labeled data set DatasetS with a semi-supervised learning algorithm MH; providing the value of the distance length lf for remote-supervision relation extraction under the general template, obtained in step 1-1, to MH as a parameter; meanwhile, using the initially empty small rule base R1 of general relations from step 1-1 as an evaluation index, wherein the autonomously learned data is required to cover the whole of the small rule base R1, and training a semi-supervised model MHt2;

step 2-2, applying the trained semi-supervised model MHt2 to DatasetS to obtain the knowledge results of all triple sets, listing them, with the sentence as the unit, as a new fully supervised data sample DatasetSF(1), providing DatasetSF(1) together with DatasetG(1) to DatasetG(n) to m fully supervised learning algorithms MF(1) to MF(m) for learning, and training the respective models MFt2(1) to MFt2(m);

step 3, active learning stage

step 3-1, for each new sample, performing relation extraction with the semi-supervised model MHt2 and the fully supervised models MFt2(1) to MFt2(m) trained in step 2 to obtain a relation set Rsf2(mj), mj = 1 to mjmax, wherein, as in step 1-2, the relation set comprises a group of triple results, each comprising a discovery array Timede1(Rsf2, deth1, detf1, realout); at this point, since there is no label, realout is Null; identifying the case in which deth1 is 0 and detf1 is greater than 7, and the case in which deth1 is 1 and detf1 is less than 2, namely the cases in which the semi-supervised and fully supervised results differ greatly; labeling these cases together, namely setting realout;

step 3-2, listing the manually labeled content, namely the knowledge results of the triple set, with the sentence as the unit, as a new data sample DatasetSF(2), adding it to the data set of step 2-2, and training the semi-supervised model MHt2 with reference to step 2-2;

step 3-4, repeating step 3-1 to step 3-3 until at least 3 consecutive samples have been analyzed without either of the two large-difference cases of step 3-1 occurring;

step 4, application deployment phase

Step 4-1, extracting the relation of each new sample by using the semi-supervision model MHt2 and the full-supervision model MFt (1) to MFt (m) trained in the step 3 to obtain an Rsf3 (mk) relation set, wherein mk=1 to mkmax, and the new sample comprises a group of similar triples as in the step 1-2, including a discovered array Timkde1 (Rsf 2, detf1, reaout); at this moment, since no tag exists, realou is Null, at this moment, referring to step 1-2, the volteop obtained in step 1-2 is written into this step, when the surf 1 is greater than or equal to the volteop, the surf 2 in Timkde2 (Rsf 2, deth2, detf2, realou) is 1, otherwise is 0; displaying all inconsistent content parts of deth2 and detf2 to a user, and reminding the user of possible unreasonable places;

step 4-2, after the user approves the sample in step 4-1, obtaining a result which does not pass (0) or passes (1), and taking the result as a weak tag and taking the weak tag and the data set in step 2 as a data set; every 1 NumT sample, intensively learning and updating according to the step 2 to obtain a new trained semi-supervision model MHt2 and a full supervision model MFt (1) to MFt (m);

step 4-3, every other NumT2 samples, wherein NumT2> NumT1, and intensively learning statistics according to the step 1-2; obtaining a new trained semi-supervised model MHt2' and a fully supervised model MFt ' (1) to MFt ' (m) by using 90% of the data in the step 4-2; calculating average avg '(MFop) and variance var' (MFop) of the full supervision model; statistics of the accuracy performance of semi-supervised methods, calculation of their average avg' (Mht);

condition 1: determining whether the accuracy range B1 around the average of the fully supervised models in step 3 lies within the three-variance accuracy range B0 obtained in step 1-2;

B0: (avg(MFop) - 3*var(MFop)) ~ (avg(MFop) + 3*var(MFop))

B1: (avg'(MFop) - var'(MFop)) ~ (avg'(MFop) + var'(MFop))

condition 2: determining whether the semi-supervised model average avg'(Mht) in step 3 lies within the three-variance accuracy range C0 obtained in step 1-2;

C0: (avg(Mht) - 3*var(Mht)) ~ (avg(Mht) + 3*var(Mht))

wherein avg(MFop) is the average of the fully supervised models and var(MFop) is their variance; avg(Mht) is the average of the semi-supervised model.
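As a non-limiting illustration, conditions 1 and 2 may be evaluated as interval checks, sketched in Python below. The function and parameter names are hypothetical; `var_h` stands for whichever variance term the C0 formula designates, and the sketch only evaluates the two conditions without assuming what action follows from them.

```python
def within(inner, outer):
    """True when the interval `inner` lies entirely inside the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def check_conditions(avg_f, var_f, avg_f_new, var_f_new, avg_h, var_h, avg_h_new):
    # B0: three-variance range of the fully supervised models from step 1-2
    b0 = (avg_f - 3 * var_f, avg_f + 3 * var_f)
    # B1: one-variance range of the retrained fully supervised models
    b1 = (avg_f_new - var_f_new, avg_f_new + var_f_new)
    # C0: three-variance range of the semi-supervised model
    c0 = (avg_h - 3 * var_h, avg_h + 3 * var_h)
    condition1 = within(b1, b0)
    condition2 = c0[0] <= avg_h_new <= c0[1]
    return condition1, condition2
```

For example, with assumed values avg(MFop) = 0.90, var(MFop) = 0.02, avg'(MFop) = 0.91 and var'(MFop) = 0.01, B1 is roughly 0.90 to 0.92 and lies inside B0, roughly 0.84 to 0.96, so condition 1 holds.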

2. The mixed relation extraction method oriented to demand planning rules according to claim 1, wherein step 1 is specifically as follows:

step 1-1, defining a general template for writing the important sentences of the core content, so that the relations in those sentences can be determined automatically, and providing a small rule base R1 of general relations that is initially empty, wherein the content of the small rule base R1 can later be obtained through simple text recognition and position locating rather than through a complex relation extraction scheme; setting the template format requirements and limiting the length of each entry, thereby obtaining the value of the distance length lf for distant-supervision relation extraction under the general template;

step 1-2, on general, non-domain-specific, fully labeled data sets DatasetG(1) to DatasetG(n), using 90% of the data to train 1 semi-supervised learning algorithm MH and m fully supervised learning algorithms MF(1) to MF(m), obtaining the respective models MHt and MFt(1) to MFt(m); then testing their performance on the remaining 10% of the samples, specifically by calculating, for each of the n data sets individually, the difference in relation extraction between the 1 semi-supervised method and the voting method over the m fully supervised models.
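As a non-limiting illustration, the 90%/10% division of a single data set in step 1-2 may be sketched in Python as follows. The text does not state how the samples are selected; the seeded random shuffle and the function name `split_90_10` are assumptions made only for this sketch.

```python
import random


def split_90_10(samples, seed=0):
    """Return (training part, test part) holding 90% and 10% of the samples."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)           # assumption: samples are drawn at random
    cut = len(shuffled) * 9 // 10   # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


train_part, test_part = split_90_10(range(100))
```

The training part would be used to train MH and MF(1) to MF(m), and the test part to compare their extraction results.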

3. The mixed relation extraction method oriented to demand planning rules according to claim 2, wherein step 1-2 specifically operates as follows:

tallying all relation extraction results RsG(mi) on the remaining 10% of the fully labeled samples of an individual data set, where mi = 1 to mimax and mimax is the total number of triple relations, the results being the union of the results of all learning algorithms; the part found by the semi-supervised learning algorithm is RsGh(mih), mih = 1 to mihmax; the part found by the fully supervised learning algorithms is RsGf(mif), mif = 1 to mifmax; the overall RsG(mi) also records the number of times each relation is identified by semi-supervised learning and by fully supervised learning respectively; each triple relation is expressed as ABC, where A and C are objects and B is the relation, and each triple relation result further carries a discovery array Timde1(RsG, deth1, detf1, realout), where RsG indicates that the pre-training is on a general data set; deth1 is the result of the semi-supervised learning algorithm and is only 1 or 0; detf1 is the number of the m fully supervised algorithms that detected the relation, with a value from 0 to m;
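As a non-limiting illustration, merging the model outputs into one discovery array per triple may be sketched in Python as follows. The function name `build_timde1` and the dictionary layout are hypothetical; the RsG field is represented here by the dictionary key, and treating realout as 1 when the triple appears in the manual labels is an assumption of this sketch.

```python
from collections import Counter


def build_timde1(semi_triples, full_triples_per_model, labeled_triples):
    """Return {triple: {"deth1", "detf1", "realout"}} for every distinct triple."""
    semi = set(semi_triples)
    labeled = set(labeled_triples)
    # count, per triple, how many of the m fully supervised models found it
    detf_counts = Counter(
        t for triples in full_triples_per_model for t in set(triples)
    )
    records = {}
    for t in sorted(semi | set(detf_counts)):
        records[t] = {
            "deth1": 1 if t in semi else 0,        # semi-supervised result
            "detf1": detf_counts.get(t, 0),        # value from 0 to m
            "realout": 1 if t in labeled else 0,   # assumed meaning of the label
        }
    return records
```

A triple found by the semi-supervised model and by two of the fully supervised models would thus carry deth1 = 1 and detf1 = 2.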

the fully supervised voting method sets a variable volenum in the range 1 to m; the fully supervised models MFt(1) to MFt(m) vote on the results in the 10% test samples, and if a relation is found by at least volenum of the models MFt(1) to MFt(m), the relation is judged to be true (1) from the fully supervised perspective; a value of 1 indicates that the m fully supervised models confirm the relation by reaching the volenum vote threshold, in which case the semi-supervised and fully supervised results have the same effect and both agree with the true manual label realout;

traversing volenum over the values 1 to m and, on each individual data set, finding the value for which the proportion of results with detf2 equal to realout is highest, thereby obtaining votenumbest(ni) on the n data sets, where ni = 1 to n; then averaging votenumbest(ni) to obtain volteop, which serves as the voting threshold of the voting method used later;
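As a non-limiting illustration, the traversal over the vote threshold and the averaging into volteop may be sketched in Python as follows. The function names are hypothetical, each record is assumed to be a dictionary holding detf1 and realout, each data set is assumed to be non-empty, and resolving ties in favour of the smallest threshold is an assumption not stated in the text.

```python
def best_votenum(records, m):
    """Try every threshold 1..m and return the one whose vote result detf2
    equals realout on the largest share of records (ties: smallest value)."""
    best, best_ratio = 1, -1.0
    for volenum in range(1, m + 1):
        agree = sum(
            1 for r in records
            if (1 if r["detf1"] >= volenum else 0) == r["realout"]
        )
        ratio = agree / len(records)
        if ratio > best_ratio:   # strict comparison keeps the smallest tied value
            best, best_ratio = volenum, ratio
    return best


def volteop_from(datasets, m):
    """Average the per-data-set best thresholds votenumbest(ni)."""
    bests = [best_votenum(records, m) for records in datasets]
    return sum(bests) / len(bests)
```

Because volteop is an average, it need not be an integer; comparing detf1 against it with "greater than or equal to" still yields a well-defined vote result.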

using the above fully supervised voting method with the threshold volteop, obtaining the accuracy rates Pre(MFop(1)) to Pre(MFop(n)) on the n general data sets, and then calculating their average avg(MFop) and variance var(MFop) for later use; in addition, calculating the accuracy of the semi-supervised method on each individual data set, Pre(Mht(1)) to Pre(Mht(n)), and then calculating their average avg(Mht) and variance var(Mht) for later use.
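As a non-limiting illustration, the average and variance over the n per-data-set scores may be computed as sketched in Python below. The function name `summarize` and the example scores are hypothetical, and the use of the population variance is an assumption, since the text does not specify which variance estimator is meant.

```python
from statistics import mean, pvariance


def summarize(scores):
    """Return (average, variance) of the per-data-set scores."""
    return mean(scores), pvariance(scores)


# Illustrative values only, standing in for Pre(MFop(1)) to Pre(MFop(n))
pre_mfop = [0.8, 0.9, 1.0]
avg_mfop, var_mfop = summarize(pre_mfop)
```

The same helper would be applied to Pre(Mht(1)) to Pre(Mht(n)) to obtain avg(Mht) and var(Mht).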

4. The mixed relation extraction method oriented to demand planning rules according to claim 1, wherein in step 4, the content on which the analysis results of the fully supervised and semi-supervised models are inconsistent is provided to the user to remind the user of possibly unreasonable places.