CN110389999A

CN110389999A - Information extraction method and apparatus, storage medium, and electronic device

Info

Publication number: CN110389999A
Application number: CN201910684300.9A
Authority: CN
Inventors: 李夏禹
Original assignee: Beijing Shannon Huiyu Technology Co Ltd
Current assignee: Beijing Shannon Huiyu Technology Co Ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2019-10-29

Abstract

The present invention provides an information extraction method and apparatus, a storage medium, and an electronic device. The method includes: obtaining a question to be answered and decomposing it into multiple sub-questions organized as a multi-layer tree structure; selecting a leaf sub-question as the target sub-question, extracting its answer, and updating the matching expandable information in the sub-question one level up to that answer; repeating these steps with each of the other leaf sub-questions as the target sub-question; then taking each upper-level sub-question whose expandable information has all been updated as the target sub-question, and so on until the root question becomes the target sub-question; and extracting the answer of that target sub-question and using it as the answer to the question to be answered. The information extraction method, apparatus, storage medium, and electronic device provided by the embodiments of the present invention can accurately extract the answer to the question to be answered, improving extraction precision and substantially increasing extraction accuracy for complex questions.

Description

Information extraction method and apparatus, storage medium, and electronic device

Technical field

The present invention relates to the technical field of information extraction, and in particular to an information extraction method and apparatus, a storage medium, and an electronic device.

Background

Currently, question answering (QA) models based on general-purpose deep learning provide a universal solution for obtaining the answer to a question from a text passage. However, existing QA models are only suitable for simple questions and cannot answer complex ones.

For example, suppose the original text is: "The current principal of middle school A is Zhang San. Zhang San previously worked at kindergarten B, as a form master, and at primary school C, as dean of studies."

Question 1: Who is the current principal of middle school A? QA model answer: Zhang San.

Question 2: Where did the current principal of middle school A previously work? The QA model cannot answer.

Question 3: What posts did the current principal of middle school A previously hold? The QA model cannot answer.

Existing QA models cannot answer complex questions and cannot provide answers to certain questions quickly and accurately.

Summary of the invention

To solve the above problems, the embodiments of the present invention aim to provide an information extraction method and apparatus, a storage medium, and an electronic device.

In a first aspect, an embodiment of the present invention provides an information extraction method, comprising:

obtaining a question to be answered, and decomposing the question to be answered into multiple sub-questions according to a multi-layer tree structure, wherein the sub-questions include at least leaf sub-questions and a root question, and the answer of a sub-question at the current level matches expandable information in the corresponding sub-question at the level above;

selecting a leaf sub-question as the target sub-question, extracting the answer of the target sub-question from preset text content, and updating the matching expandable information in the sub-question at the level above to the answer of the target sub-question; then taking the other leaf sub-questions as the target sub-question and repeating the above steps until all expandable information in the sub-question at the level above has been updated;

taking the sub-question at the level above, all of whose expandable information has been updated, as the target sub-question and repeating the above steps, until the root question is taken as the target sub-question;

extracting the answer of the target sub-question from the preset text content, and taking the answer of the target sub-question extracted at this point as the answer to the question to be answered.

In a second aspect, an embodiment of the present invention further provides an information extraction apparatus, comprising:

A problem decomposition module, configured to obtain a question to be answered and decompose it into multiple sub-questions according to a multi-layer tree structure; the sub-questions include at least leaf sub-questions and a root question, and the answer of a sub-question at the current level matches expandable information in the corresponding sub-question at the level above;

A sub-question answer extraction module, configured to select a leaf sub-question as the target sub-question, extract the answer of the target sub-question from preset text content, and update the matching expandable information in the sub-question at the level above to the answer of the target sub-question; then take the other leaf sub-questions as the target sub-question and repeat the above steps until all expandable information in the sub-question at the level above has been updated; and take the sub-question at the level above, all of whose expandable information has been updated, as the target sub-question and repeat the above steps, until the root question is taken as the target sub-question;

A question answer extraction module, configured to extract the answer of the target sub-question from the preset text content and take the answer of the target sub-question extracted at this point as the answer to the question to be answered.

In a third aspect, an embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the information extraction method according to any one of the above.

In a fourth aspect, an embodiment of the present invention further provides an electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the information extraction method according to any one of the above.

In the solution provided by the first aspect of the embodiments of the present invention, the question is decomposed into multiple sub-questions forming a multi-layer tree structure, the answer of each sub-question is determined in turn from the bottom up, and finally the answer of the top-level sub-question is determined. Decomposition yields multiple sub-questions that are clearer and from which accurate answers are easier to extract, and each upper-level sub-question is updated with the answers of the sub-questions one level below it, so the answer to the question to be answered is finally extracted accurately, improving extraction precision and substantially increasing extraction accuracy for complex questions.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 shows a flowchart of an information extraction method provided by an embodiment of the present invention;

Fig. 2 shows a schematic diagram of a multi-layer tree structure composed of sub-questions, provided by an embodiment of the present invention;

Fig. 3 shows a flowchart of extracting the answer of the target sub-question in the information extraction method provided by an embodiment of the present invention;

Fig. 4 shows a schematic structural diagram of an information extraction apparatus provided by an embodiment of the present invention;

Fig. 5 shows a schematic structural diagram of an electronic device for performing the information extraction method provided by an embodiment of the present invention.

Detailed description of the embodiments

In the description of the present invention, it should be understood that orientation or positional relationships indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", and "counterclockwise" are based on the orientations or positional relationships shown in the drawings and are used only for convenience and simplicity of description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore shall not be construed as limiting the present invention.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "multiple" means two or more, unless otherwise specifically defined.

In the present invention, unless otherwise explicitly specified and limited, terms such as "mounted", "connected", "connection", and "fixed" shall be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal connection between two elements. A person of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

An information extraction method provided by an embodiment of the present invention, as shown in Fig. 1, includes:

Step 101: obtain a question to be answered and decompose it into multiple sub-questions according to a multi-layer tree structure; the sub-questions include at least leaf sub-questions and a root question, and the answer of a sub-question at the current level matches expandable information in the corresponding sub-question at the level above.

In the embodiment of the present invention, the question to be answered is a question that needs to be answered on the basis of preset text content. After the question is obtained, it is divided into multiple sub-questions represented as a multi-layer tree structure, i.e., each sub-question corresponds to one node of the tree: each leaf node corresponds to a sub-question (a leaf sub-question), and the top-level root node corresponds to the root question, which is fully equivalent to the question to be answered. The leaf sub-questions are the sub-questions at the lowest level, and the root question is the sub-question at the highest level. A schematic diagram of a multi-layer tree structure composed of sub-questions is shown in Fig. 2.

The hierarchy among the sub-questions in this embodiment is determined by the answers of the sub-questions and the expandable information in the sub-questions. Specifically, the answer of a sub-question at the current level matches expandable information in the corresponding sub-question at the level above. "Expandable information" in this embodiment refers to information in a sub-question that can be expanded (or obtained by expansion); it may be a word, a phrase, a clause, or the like in the sub-question.

For example, if the question to be answered is "Where did the current principal of the middle school once work?", it can be decomposed into two sub-questions: "Who is the current principal of the middle school?" and "Where did he once work?". The "he" in the second sub-question refers to "the current principal of the middle school", so "he" is a piece of expandable information that matches the answer of the first sub-question; the second sub-question is therefore one level above the first. Because the question has been decomposed into only two sub-questions, the first sub-question, "Who is the current principal of the middle school?", is a leaf sub-question, and the second, "Where did he once work?", is the root question.
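As a minimal sketch of the multi-layer tree, each node can hold the text of one sub-question plus a mapping from each piece of expandable information to the child sub-question whose answer will fill it. The class name `SubQuestion` and the `{placeholder}` syntax for expandable information are illustrative assumptions, not part of the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubQuestion:
    """One node of the multi-layer tree (hypothetical representation).

    Expandable information in `text` is written as a {placeholder}; the child
    stored under the same key in `children` is the sub-question one level
    down whose answer will fill that placeholder.
    """
    text: str
    children: dict[str, SubQuestion] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        # A leaf sub-question contains no expandable information.
        return not self.children

    def leaves(self) -> list[SubQuestion]:
        """Collect the leaf sub-questions (the bottom level of the tree)."""
        if self.is_leaf():
            return [self]
        found: list[SubQuestion] = []
        for child in self.children.values():
            found.extend(child.leaves())
        return found


# The two-level example above: "he" in the root question is expandable
# information that the leaf sub-question's answer will replace.
leaf = SubQuestion("Who is the current principal of the middle school?")
root = SubQuestion("Where did {he} once work?", children={"he": leaf})
```

Filling the placeholder is then a plain string substitution, e.g. `root.text.format(he="Zhang San")`.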

The question to be answered can be decomposed with a decomposition model. Specifically, "decomposing the question to be answered into multiple sub-questions according to a multi-layer tree structure" in step 101 may specifically include:

Step A1: establish a problem decomposition model, and obtain in advance sample questions to be answered together with the multiple sample sub-questions forming the multi-layer tree structure corresponding to each sample question.

Step A2: train the problem decomposition model with the sample question as input and the multiple sample sub-questions of the corresponding multi-layer tree structure as output, and determine the trained problem decomposition model.

Step A3: after a question to be answered is obtained, take it as input and use the trained problem decomposition model to determine the multiple sub-questions of the multi-layer tree structure corresponding to it.

In the embodiment of the present invention, sample questions and their corresponding sample sub-questions are obtained in advance and used as training samples to train the problem decomposition model; this determines the model's parameters and yields the trained problem decomposition model, which is then used whenever a question to be answered needs to be decomposed. The problem decomposition model may specifically be a Transformer model, which encodes and decodes the input question and, using the self-attention mechanism during encoding and decoding, obtains the sub-questions contained in the question.
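Training a sequence-to-sequence model such as a Transformer in steps A1 to A3 requires writing each tree of sample sub-questions as a single target sequence. The patent does not specify that format; the sketch below assumes one hypothetical linearization in which every sub-question gets an id `#k`, children are listed before their parents (root last), and a parent refers to a child's answer by the child's id. Decoding the model's output then amounts to parsing the string back into a tree. The separator and the `linearize`/`parse` names are assumptions for illustration.

```python
import re

SEP = " ; "  # hypothetical separator between sub-questions


def linearize(subquestions: list[tuple[str, str]]) -> str:
    """Turn (id, text) pairs, listed bottom-up with the root last, into a target string."""
    return SEP.join(f"{qid} = {text}" for qid, text in subquestions)


def parse(target: str) -> dict[str, tuple[str, list[str]]]:
    """Recover {id: (text, child ids)} from a decoded target string."""
    tree: dict[str, tuple[str, list[str]]] = {}
    for part in target.split(SEP):
        qid, text = part.split(" = ", 1)
        # Every "#k" inside the text is expandable information pointing at child k.
        tree[qid] = (text, re.findall(r"#\d+", text))
    return tree


# One (input, output) training pair for step A2.
sample_question = ("What post did the current principal of middle school A hold "
                   "at the kindergarten founded by Li Si?")
target = linearize([
    ("#1", "Who is the current principal of middle school A?"),
    ("#2", "Which kindergarten did Li Si found?"),
    ("#3", "What post did #1 hold at #2?"),
])
training_pairs = [(sample_question, target)]
```

Listing children before parents means the decoded order is already a valid bottom-up answering order.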

Alternatively, the question to be answered can be decomposed from the top down through natural language understanding. Specifically, the question is taken as the root question, the expandable information in the root question is identified, and the next-level sub-questions matching that expandable information are determined; then the sub-questions one level further down, matching the expandable information of those next-level sub-questions, are determined, and so on until the identified sub-questions contain no expandable information, i.e., the leaf sub-questions at the lowest level contain no expandable information.

For example, for the question "Where did the current principal of the middle school once work?", "the current principal of the middle school" is expandable information, so the corresponding next-level sub-question "Who is the current principal of the middle school?" can be generated. That sub-question contains no expandable information, so "Who is the current principal of the middle school?" is the leaf sub-question after decomposition, and the root question is "Where did the current principal of the middle school once work?".
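The top-down procedure can be sketched as a recursion that keeps expanding until no expandable information is found. The natural-language-understanding step itself is not specified in the patent, so it is modeled here as a hypothetical callable `expand`, and a toy rule table stands in for a real model; a `max_depth` guard is an added safety assumption.

```python
from typing import Callable

# Hypothetical NLU hook: for a question, return (expandable span, sub-question
# that resolves the span) for every piece of expandable information found.
Expander = Callable[[str], list[tuple[str, str]]]


def decompose(question: str, expand: Expander, max_depth: int = 5) -> dict:
    """Build the sub-question tree top-down, starting from the root question."""
    node = {"text": question, "children": {}}
    if max_depth > 0:
        for span, sub_question in expand(question):
            node["children"][span] = decompose(sub_question, expand, max_depth - 1)
    # A node for which the expander finds nothing is a leaf sub-question.
    return node


# Toy rule table standing in for a real natural-language-understanding model.
RULES = {
    "the current principal of the middle school":
        "Who is the current principal of the middle school?",
}


def toy_expand(question: str) -> list[tuple[str, str]]:
    # Never expand a question into itself, otherwise the recursion would not stop.
    return [(span, sub_q) for span, sub_q in RULES.items()
            if span in question and question != sub_q]


tree = decompose("Where did the current principal of the middle school once work?", toy_expand)
```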

Step 102: select a leaf sub-question as the target sub-question, extract the answer of the target sub-question from the preset text content, and update the matching expandable information in the sub-question at the level above to the answer of the target sub-question; then take the other leaf sub-questions as the target sub-question and repeat the above steps until all expandable information in the sub-question at the level above has been updated.

Step 103: take each sub-question at the level above whose expandable information has all been updated as the target sub-question and repeat the above steps, until the root question is taken as the target sub-question.

In the embodiment of the present invention, once all sub-questions have been determined, the answer of each sub-question is obtained from the preset text content in turn. That is, this embodiment replaces the traditional approach of extracting the answer to the question directly from the text content with extracting the answers of the question's sub-questions from the text content one after another and finally extracting the answer to the question itself. The answers of the sub-questions are determined from the bottom of the multi-layer tree upward: first the answers of the leaf sub-questions, then the answers of the sub-questions at each level above in turn, until finally the answer of the root question is determined.

Specifically, in step 102 the answers of the leaf sub-questions are determined first, which determines the content corresponding to the expandable information in the sub-questions one level up. Once all expandable information in a sub-question at the level above has been updated, step 103 can be performed, i.e., the answer of that upper-level sub-question is extracted from the text content, and so on until the content corresponding to all expandable information in the root question has been determined; at that point the root question can be taken as the target sub-question.

For example, the question "What post did the current principal of middle school A hold at the kindergarten founded by Li Si?" is divided into three sub-questions: sub-question 1, "Who is the current principal of middle school A?"; sub-question 2, "Which kindergarten did Li Si found?"; and sub-question 3, "What post did someone hold at somewhere?". Sub-questions 1 and 2 are leaf sub-questions, and sub-question 3 is the root question. The preset text content is: "Zhang San and Li Si are good friends. Li Si once founded kindergarten B and invited Zhang San to work there as a form master. ... Later, Zhang San moved to primary school C as dean of studies. ... After that, Zhang San moved to middle school A. ... Currently, the principal of middle school A is Zhang San." In this embodiment, the answers of the leaf sub-questions are determined first: from the text content, the answer of sub-question 1 is "Zhang San" and the answer of sub-question 2 is "kindergarten B". "Someone" and "somewhere" in the root question are expandable information corresponding to sub-questions 1 and 2 respectively. After they are updated to the answers of those sub-questions, the root question becomes "What post did Zhang San hold at kindergarten B?", and its answer can now be extracted accurately from the text content.
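Steps 102 to 104 amount to answering the tree from the bottom up. The patent answers all leaves first and then each higher level in turn; because every sub-question depends only on its own children, the post-order recursion sketched below yields the same answers. The `Node` class and the dictionary-backed `FAKE_EXTRACT` are assumptions: the latter is a hypothetical stand-in for answer extraction over the preset text of the example above, not a real QA model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    text: str                                     # {placeholders} mark expandable information
    children: dict = field(default_factory=dict)  # placeholder -> Node one level down


def solve(node: Node, extract: Callable[[str], str], trace: list | None = None) -> str:
    """Answer a sub-question tree from the bottom up (steps 102 to 104)."""
    # Answer every child first, so that all expandable information of this node
    # is updated before the node itself becomes the target sub-question.
    answers = {key: solve(child, extract, trace) for key, child in node.children.items()}
    question = node.text.format(**answers)
    if trace is not None:
        trace.append(question)  # record each updated target sub-question
    return extract(question)


# Hypothetical stand-in for answer extraction over the example's preset text.
FAKE_EXTRACT = {
    "Who is the current principal of middle school A?": "Zhang San",
    "Which kindergarten did Li Si found?": "kindergarten B",
    "What post did Zhang San hold at kindergarten B?": "form master",
}.__getitem__

root = Node("What post did {someone} hold at {somewhere}?", {
    "someone": Node("Who is the current principal of middle school A?"),
    "somewhere": Node("Which kindergarten did Li Si found?"),
})
steps: list[str] = []
print(solve(root, FAKE_EXTRACT, steps))  # form master
```

The last entry of `steps` is the fully updated root question, "What post did Zhang San hold at kindergarten B?".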

Step 104: extract the answer of the target sub-question from the preset text content, and take the answer of the target sub-question extracted at this point as the answer to the question to be answered.

In the embodiment of the present invention, the root question is fully equivalent to the question to be answered, so the answer determined when the root question is the target sub-question can be used as the answer to the question. In the above example, the answer to the root question "What post did Zhang San hold at kindergarten B?" is the answer to the question "What post did the current principal of middle school A hold at the kindergarten founded by Li Si?", namely "form master".

Traditional QA models often try to extract the answer to the whole question in a single pass. By contrast, the information extraction method provided by the embodiments of the present invention decomposes the question into multiple sub-questions forming a multi-layer tree structure, determines the answer of each sub-question in turn from the bottom up, and finally determines the answer of the top-level sub-question. Decomposition yields multiple sub-questions that are clearer and from which accurate answers are easier to extract, and each upper-level sub-question is updated with the answers of the sub-questions one level below it, so the answer to the question to be answered is finally extracted accurately, improving extraction precision and substantially increasing extraction accuracy for complex questions.

On the basis of the above embodiment, as shown in Fig. 3, "extracting the answer of the target sub-question from the preset text content" in steps 102 and 104 specifically includes:

Step 301: divide the text content into multiple text units, each text unit being one or more of a word, a phrase, a sentence, and a paragraph.

Step 302: determine the similarity between each text unit and the target sub-question, take the text units whose similarity is greater than a preset threshold as effective text units, and extract the answer of the target sub-question from all the effective text units.

Because text content generally contains a large amount of information, extracting the answer of every sub-question from the complete text content would hurt processing efficiency. In the embodiment of the present invention, the text content is divided into multiple text units in units of words, phrases, sentences, or paragraphs, and only the text units relevant to the current target sub-question are selected from all text units for extracting its answer. This reduces the amount of processing needed to determine sub-question answers, shortens computation time, and improves extraction efficiency. Which text units are relevant to the target sub-question is determined by calculating the similarity between the target sub-question and each text unit. The preset threshold may be a fixed value set in advance; alternatively, it may be set after the similarities between the text units and the target sub-question have been computed, so that the several text units with the highest similarity are selected.
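A minimal sketch of steps 301 and 302, assuming sentence or paragraph units and showing both threshold options described above: a fixed threshold, and a threshold chosen after scoring so that the `top_k` most similar units are kept. The similarity function is passed in as a parameter; `overlap` is only a hypothetical placeholder (shared lowercase words), not the similarity defined in steps B1 to B3.

```python
from __future__ import annotations

import re
from typing import Callable


def split_units(text: str, unit: str = "sentence") -> list[str]:
    """Step 301: divide text content into text units (sentences or paragraphs here)."""
    if unit == "paragraph":
        parts = text.split("\n\n")
    else:
        # Split after Chinese or Western sentence-ending punctuation.
        parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def effective_units(units: list[str], question: str,
                    similarity: Callable[[str, str], float],
                    threshold: float | None = None, top_k: int = 3) -> list[str]:
    """Step 302: keep the units whose similarity to the target sub-question is high enough."""
    scores = [similarity(question, u) for u in units]
    if threshold is not None:
        # Fixed preset threshold.
        return [u for u, s in zip(units, scores) if s > threshold]
    # Threshold chosen after scoring: keep the top_k most similar units,
    # returned in their original order in the text.
    best = sorted(range(len(units)), key=lambda j: scores[j], reverse=True)[:top_k]
    return [units[j] for j in sorted(best)]


def overlap(question: str, unit: str) -> float:
    # Placeholder similarity for the demo only.
    return float(len(set(question.lower().split()) & set(unit.lower().split())))


units = split_units("Zhang San and Li Si are good friends. Li Si once founded "
                    "kindergarten B. Currently, the principal of middle school A is Zhang San.")
q = "Who is the principal of middle school A?"
```

With this text, `effective_units(units, q, overlap, top_k=1)` keeps only the last sentence, the one that actually names the principal.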

Specifically, "determining the similarity between each text unit and the target sub-question" in step 302 includes:

Step B1: perform word segmentation on the target sub-question and determine all tokens $p_i$ of the target sub-question, $i \in [1, m]$, where $m$ is the number of tokens of the target sub-question.

In the embodiment of the present invention, word segmentation is first performed on the target sub-question to determine its $m$ tokens, i.e., $p_1, p_2, \dots, p_m$. The segmentation may be performed with a word segmentation model; this embodiment does not limit this.

Step B2: determine the relevance $r_{ij}$ between each token $p_i$ and text unit $D_j$, and the weight $\omega_i$ of each token, where:

$$r_{ij} = g_1\left(\frac{f_{ij} \cdot \mathrm{avgl}}{l_j}\right), \qquad \omega_i = g_2\left(\frac{N}{n(p_i) + \lambda}\right)$$

Here $f_{ij}$ denotes the term frequency of token $p_i$ in text unit $D_j$, $l_j$ denotes the length of text unit $D_j$, $\mathrm{avgl}$ denotes the average length of all text units, $N$ denotes the total number of text units, $j \in [1, N]$, $\lambda$ is a preset non-zero adjustment coefficient, $n(p_i)$ denotes the number of text units containing token $p_i$, and $g_1(\cdot)$ and $g_2(\cdot)$ are increasing (positively correlated) functions.

In this embodiment, the text content is divided into $N$ text units $D_j$ and the target sub-question contains $m$ tokens $p_i$; the weight of each token $p_i$ can then be determined. Specifically, the weight is determined by the proportion of the number $n(p_i)$ of text units containing $p_i$ in the total number $N$ of text units: the smaller the proportion, the larger the weight. For example, if the text content is divided into 100 text units and the token "Zhang San" appears in only two of them, its weight is relatively large; if a token such as "is" appears in 99 text units, it is a common word and its weight is relatively low. In this embodiment, $\omega_i = g_2\left(\frac{N}{n(p_i)+\lambda}\right)$, where $\lambda$ is a preset non-zero adjustment coefficient that keeps the denominator from being zero when $n(p_i)$ is zero. $g_2(\cdot)$ is an increasing function, i.e., the larger $\frac{N}{n(p_i)+\lambda}$ is, the larger the corresponding weight $\omega_i$. $g_2(\cdot)$ may specifically be a linear function, an exponential function, a logarithmic function, or the like; this embodiment does not limit this.
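As a worked instance of the weight $\omega_i = g_2\left(\frac{N}{n(p_i)+\lambda}\right)$ for the 100-unit example above, assume for illustration only that $g_2(x) = \ln x$ and $\lambda = 1$:

```latex
\omega_{\text{Zhang San}} = \ln\frac{100}{2 + 1} = \ln 33.\overline{3} \approx 3.51,
\qquad
\omega_{\text{is}} = \ln\frac{100}{99 + 1} = \ln 1 = 0
```

The rare name therefore dominates the similarity score, while the near-ubiquitous word contributes nothing. With a plain logarithm the weight turns slightly negative when $n(p_i) + \lambda > N$, which is one reason an implementation might prefer $g_2(x) = \ln(1 + x)$.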

Meanwhile for each text unit, its degree of correlation r between each participle of target subproblem can be determined_ij。 Wherein, text unit D_jLength it is longer, then segment p_iMore easily occur in text cells D_jIn, the degree of correlation between the two r_ijIt is lower.Specifically,G therein₁() and g₂() is similar, and be also positive correlation function.

Step B3: calculate the similarity $R_j$ between the target sub-question and text unit $D_j$ from the relevances and weights of all tokens of the target sub-question, where

$$R_j = \sum_{i=1}^{m} \omega_i \, r_{ij}$$

In this embodiment, once the relevance between each token and $D_j$ has been determined, the similarity $R_j$ between the target sub-question and text unit $D_j$ follows. This allows the similarity between the target sub-question and each text unit to be determined quickly and accurately, which makes it easy to subsequently select the text units relevant to the target sub-question for answer extraction.
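Steps B1 to B3 can be sketched as follows, assuming the forms $\omega_i = g_2\left(\frac{N}{n(p_i)+\lambda}\right)$, $r_{ij} = g_1\left(\frac{f_{ij} \cdot \mathrm{avgl}}{l_j}\right)$ and $R_j = \sum_i \omega_i r_{ij}$. The patent leaves $g_1$ and $g_2$ open; the choices $g_1(x) = x/(1+x)$ and $g_2(x) = \ln(1+x)$ below are assumptions (both are increasing). Segmentation (step B1) is assumed to be done already, so tokens arrive as plain lists; Chinese text would first go through a word segmenter.

```python
import math
from collections import Counter
from typing import Callable


def similarity_scores(
    query_tokens: list[str],
    units_tokens: list[list[str]],
    lam: float = 1.0,
    g1: Callable[[float], float] = lambda x: x / (1.0 + x),      # assumed increasing g1
    g2: Callable[[float], float] = lambda x: math.log(1.0 + x),  # assumed increasing g2
) -> list[float]:
    """Return R_j = sum_i w_i * r_ij for every text unit D_j.

    Assumed forms:  w_i  = g2(N / (n(p_i) + lam))
                    r_ij = g1(f_ij * avgl / l_j)
    """
    N = len(units_tokens)
    if N == 0:
        return []
    lengths = [len(u) for u in units_tokens]               # l_j
    avgl = sum(lengths) / N                                # average unit length
    n = Counter(p for u in units_tokens for p in set(u))   # n(p_i): units containing p_i
    weights = {p: g2(N / (n[p] + lam)) for p in set(query_tokens)}
    scores = []
    for unit, l_j in zip(units_tokens, lengths):
        f = Counter(unit)                                  # f_ij for this unit
        r = [g1(f[p] * avgl / l_j) if l_j else 0.0 for p in query_tokens]
        scores.append(sum(weights[p] * r_ij for p, r_ij in zip(query_tokens, r)))
    return scores


units = [
    ["zhang", "san", "li", "si", "good", "friends"],
    ["li", "si", "founded", "kindergarten", "b"],
    ["principal", "middle", "school", "a", "is", "zhang", "san"],
]
scores = similarity_scores(["principal", "middle", "school", "a"], units)
```

The structure resembles the BM25 ranking function (an IDF-like token weight multiplied by a length-normalized term frequency); swapping in other increasing $g_1$, $g_2$ changes only the two default arguments.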

Optionally, on the basis of the above embodiments, "extracting the answer of the target sub-question from the preset text content" in steps 102 and 104 specifically includes:

Step C1: when multiple answers of the target sub-question are extracted from the text content, take all the answers as candidate answers and determine the confidence of each candidate answer.

Step C2: take the candidate answer with the highest confidence as the finally extracted answer of the target sub-question.

In this embodiment, extracting the answer of the target sub-question from the text content may yield multiple answers, i.e., multiple candidate answers; a unique answer is then determined according to the confidence of each candidate answer. The confidence of each candidate answer may specifically be determined by a question answering model.
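Steps C1 and C2 reduce to pooling candidates and keeping the most confident one. In this sketch the QA model is a hypothetical callable `reader(question, unit)` returning `(answer, confidence)` pairs; `toy_reader` and its confidence values are made up purely for the demo.

```python
from typing import Callable, Optional

# Hypothetical reader: returns (answer, confidence) candidates found in one text unit.
Reader = Callable[[str, str], list[tuple[str, float]]]


def best_answer(question: str, units: list[str], reader: Reader) -> Optional[str]:
    """Steps C1-C2: pool candidate answers from all units, keep the most confident one."""
    candidates = [c for unit in units for c in reader(question, unit)]
    if not candidates:
        return None  # no answer found in any effective text unit
    answer, _confidence = max(candidates, key=lambda c: c[1])
    return answer


def toy_reader(question: str, unit: str) -> list[tuple[str, float]]:
    # Stand-in for a QA model; the confidences are invented for this example.
    if "principal" in unit:
        return [("Zhang San", 0.92)]
    if "Li Si" in unit:
        return [("Li Si", 0.15)]
    return []


units = ["Zhang San and Li Si are good friends.",
         "Currently, the principal of middle school A is Zhang San."]
print(best_answer("Who is the principal of middle school A?", units, toy_reader))  # Zhang San
```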

The information extraction method provided by the embodiments of the present invention decomposes the question into multiple sub-questions forming a multi-layer tree structure, determines the answer of each sub-question in turn from the bottom up, and finally determines the answer of the top-level sub-question. Decomposition yields multiple sub-questions that are clearer and from which accurate answers are easier to extract, and each upper-level sub-question is updated with the answers of the sub-questions one level below it, so the answer to the question to be answered is finally extracted accurately, improving extraction precision and substantially increasing extraction accuracy for complex questions. In addition, the text content is divided into multiple text units, and only the text units relevant to the current target sub-question are selected for extracting its answer, which reduces the amount of processing needed to determine sub-question answers, shortens computation time, and improves extraction efficiency.

The flow of the information extraction method has been described in detail above. The method can also be implemented by a corresponding apparatus, whose structure and functions are described in detail below.

An information extraction apparatus provided by an embodiment of the present invention, as shown in Fig. 4, includes:

Problem decomposition module 41, configured to obtain a question to be answered and decompose it into multiple sub-questions according to a multi-layer tree structure; the sub-questions include at least leaf sub-questions and a root question, and the answer of a sub-question at the current level matches expandable information in the corresponding sub-question at the level above;

Sub-question answer extraction module 42, configured to select a leaf sub-question as the target sub-question, extract the answer of the target sub-question from preset text content, and update the matching expandable information in the sub-question at the level above to the answer of the target sub-question; then take the other leaf sub-questions as the target sub-question and repeat the above steps until all expandable information in the sub-question at the level above has been updated; and take the sub-question at the level above, all of whose expandable information has been updated, as the target sub-question and repeat the above steps, until the root question is taken as the target sub-question;

Question answer extraction module 43, configured to extract the answer of the target sub-question from the preset text content and take the answer of the target sub-question extracted at this point as the answer to the question to be answered.
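The three modules can be sketched as one class whose methods mirror modules 41, 42 and 43. This is an illustrative architecture under stated assumptions, not the patent's implementation: the decomposer (for example a trained problem decomposition model) and the answer extractor over the preset text are injected as hypothetical callables, and the tree is encoded as `(id, template, child ids)` tuples listed bottom-up with the root last, so answering them in order processes all leaves before any sub-question that depends on them.

```python
from typing import Callable

# Hypothetical encoding of the tree: (id, template, child ids), listed bottom-up
# so every child appears before its parent and the root question comes last.
SubQ = tuple[str, str, list[str]]


class InformationExtractionDevice:
    def __init__(self, decomposer: Callable[[str], list[SubQ]],
                 extractor: Callable[[str], str]) -> None:
        self.decomposer = decomposer  # e.g. a trained problem decomposition model
        self.extractor = extractor    # answer extraction over the preset text content

    def decompose(self, question: str) -> list[SubQ]:
        """Problem decomposition module 41."""
        return self.decomposer(question)

    def resolve_to_root(self, subqs: list[SubQ]) -> str:
        """Sub-question answer extraction module 42: answer every non-root
        sub-question, write each answer into its parent's expandable
        information, and return the fully updated root question."""
        answers: dict[str, str] = {}
        for qid, template, children in subqs[:-1]:
            answers[qid] = self.extractor(template.format(**{c: answers[c] for c in children}))
        _, root_template, root_children = subqs[-1]
        return root_template.format(**{c: answers[c] for c in root_children})

    def answer(self, question: str) -> str:
        """Question answer extraction module 43: answer the updated root question."""
        return self.extractor(self.resolve_to_root(self.decompose(question)))


subqs = [
    ("q1", "Who is the current principal of middle school A?", []),
    ("q2", "Which kindergarten did Li Si found?", []),
    ("q3", "What post did {q1} hold at {q2}?", ["q1", "q2"]),
]
facts = {
    "Who is the current principal of middle school A?": "Zhang San",
    "Which kindergarten did Li Si found?": "kindergarten B",
    "What post did Zhang San hold at kindergarten B?": "form master",
}
device = InformationExtractionDevice(lambda q: subqs, facts.__getitem__)
print(device.answer("What post did the current principal of middle school A "
                    "hold at the kindergarten founded by Li Si?"))  # form master
```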

On the basis of the above embodiment, the problem decomposition module 41 includes:

A model establishment unit, configured to establish a problem decomposition model and obtain in advance sample questions to be answered together with the multiple sample sub-questions of the multi-layer tree structure corresponding to each sample question;

A training unit, configured to train the problem decomposition model with the sample question as input and the multiple sample sub-questions of the corresponding multi-layer tree structure as output, and determine the trained problem decomposition model;

A problem decomposition unit, configured to, after a question to be answered is obtained, take it as input and use the trained problem decomposition model to determine the multiple sub-questions of the multi-layer tree structure corresponding to it.

On the basis of the above embodiment, extracting, by the sub-question answer extraction module 42, the answer of the target sub-question from the preset text content includes:

dividing the text content into multiple text units, each text unit being one or more of a word, a phrase, a sentence, and a paragraph;

determining the similarity between each text unit and the target sub-question, taking the text units whose similarity is greater than a preset threshold as effective text units, and extracting the answer of the target sub-question from all the effective text units.

On the basis of the above embodiment, determining, by the sub-question answer extraction module 42, the similarity between each text unit and the target sub-question includes:

performing word segmentation on the target sub-question and determining all tokens $p_i$ of the target sub-question, $i \in [1, m]$, where $m$ is the number of tokens of the target sub-question;

determining the relevance $r_{ij}$ between each token $p_i$ and text unit $D_j$, and the weight $\omega_i$ of each token, where:

$$r_{ij} = g_1\left(\frac{f_{ij} \cdot \mathrm{avgl}}{l_j}\right), \qquad \omega_i = g_2\left(\frac{N}{n(p_i) + \lambda}\right)$$

wherein $f_{ij}$ denotes the term frequency of token $p_i$ in text unit $D_j$, $l_j$ denotes the length of text unit $D_j$, $\mathrm{avgl}$ denotes the average length of all text units, $N$ denotes the total number of text units, $j \in [1, N]$, $\lambda$ is a preset non-zero adjustment coefficient, $n(p_i)$ denotes the number of text units containing token $p_i$, and $g_1(\cdot)$ and $g_2(\cdot)$ are increasing (positively correlated) functions;

According to target subproblem and text unit described in the degree of correlation and weight calculation of all participles of target subproblem D_jBetween similarity R_j, and

When multiple answers to the target subproblem are extracted from the text content, taking all of them as candidate answers and determining the confidence of each candidate answer;

Taking the candidate answer with the highest confidence as the finally extracted answer to the target subproblem.
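The text does not specify how the confidence of a candidate answer is computed. The sketch below assumes, purely for illustration, that each candidate carries an extractor score and the similarity R_j of the text unit it came from, and that its confidence is the product of the two; only the selection rule (take the highest confidence) comes from the text above. `Candidate`, `score_candidates` and `best_answer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Candidate:
    answer: str
    confidence: float


def score_candidates(raw: Iterable[Tuple[str, float, float]]) -> List[Candidate]:
    """raw items are (answer, extractor_score, unit_similarity).

    The product rule used for the confidence is an assumption for illustration.
    """
    return [Candidate(ans, score * sim) for ans, score, sim in raw]


def best_answer(candidates: List[Candidate]) -> Optional[str]:
    """Return the candidate answer with the highest confidence, or None if there is none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.confidence).answer
```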

The information extraction device provided by the embodiments of the present invention decomposes a problem into multiple subproblems arranged in a multi-level tree structure, determines the answer to each subproblem in bottom-up order, and finally determines the answer to the top-level subproblem. Problem decomposition yields multiple subproblems that are clearer and from which accurate answers are easier to extract; each upper-level subproblem is then updated with the answers to the subproblems at the level below it, so that the answer to the open problem is finally extracted accurately. This improves extraction precision and substantially increases extraction accuracy for complex problems. In addition, the text content is divided into multiple text units, and only the text units relevant to the current target subproblem are selected for extracting its answer; this reduces the amount of processing needed to determine each subproblem's answer, shortening computation time and improving extraction efficiency.
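The bottom-up procedure summarized above (and set out in claim 1) amounts to a post-order traversal of the subproblem tree. A minimal sketch, assuming each item of expandable information is written as a placeholder such as `{A}` in the parent's text, and that `extract` is a hypothetical callable standing in for the whole answer-extraction step (text-unit selection plus an answer extractor):

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subproblem:
    text: str        # may contain placeholders such as "{A}" (assumed notation)
    slot: str = ""   # placeholder in the parent that this node's answer fills
    children: List["Subproblem"] = field(default_factory=list)


def solve(node: Subproblem, extract: Callable[[str], str]) -> str:
    """Answer a subproblem tree bottom-up (post-order).

    Every child is answered first; each child's answer replaces the matching
    expandable information (its slot) in this node's text, and only then is
    this node answered. Called on the root, the result is the answer to the
    open problem.
    """
    text = node.text
    for child in node.children:
        answer = solve(child, extract)
        if child.slot:  # skip children without a slot so "" is never replaced
            text = text.replace(child.slot, answer)
    return extract(text)
```

Answering all children of a node before the node itself matches the claimed order: every item of expandable information in an upper-level subproblem is filled before that subproblem becomes the target subproblem.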

An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, which include a program for executing the above information extraction method; the computer-executable instructions can execute the method in any of the above method embodiments.

The computer storage medium may be any available medium or data storage device accessible to a computer, including but not limited to magnetic storage (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical storage (e.g., CD, DVD, BD, HVD) and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).

Fig. 5 shows a structural block diagram of an electronic device according to another embodiment of the present invention. The electronic device 1100 may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal, or the like. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.

The electronic device 1100 includes at least one processor 1110, a communications interface 1120, a memory (memory array) 1130 and a bus 1140. The processor 1110, the communications interface 1120 and the memory 1130 communicate with one another via the bus 1140.

The communications interface 1120 is used to communicate with network elements, such as a virtual machine management center or shared storage.

The processor 1110 is used to execute programs. The processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.

Memory 1130 is for executable instruction.Memory 1130 may include high speed RAM memory, it is also possible to also wrap Include nonvolatile memory (non-volatile memory), for example, at least a magnetic disk storage.Memory 1130 can also be with It is memory array.Memory 1130 is also possible to by piecemeal, and described piece can be combined into virtual volume by certain rule.Storage The instruction that device 1130 stores can be executed by processor 1110, so that processor 1110 is able to carry out in above-mentioned any means embodiment Information extraction method.

The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. A method of information extraction, characterized by comprising:

Obtaining an open problem, and decomposing the open problem into multiple subproblems according to a multi-level tree structure; the subproblems include at least leaf subproblems and a root problem, and the answer to a subproblem at the current level matches expandable information in the corresponding subproblem at the upper level;

Selecting a leaf subproblem as the target subproblem, extracting the answer to the target subproblem from preset text content, and updating the matching expandable information in the upper-level subproblem to the answer of the target subproblem; then taking each of the other leaf subproblems as the target subproblem and repeating the above steps until all expandable information in the upper-level subproblem has been updated;

Taking the upper-level subproblem whose expandable information has all been updated as the target subproblem and repeating the above steps, until the root problem is taken as the target subproblem;

Extracting the answer to the target subproblem from the preset text content, and taking the answer to the target subproblem extracted at this point as the answer to the open problem.

2. The method according to claim 1, characterized in that decomposing the open problem into multiple subproblems according to a multi-level tree structure comprises:

Establishing a problem decomposition model, and obtaining in advance a sample problem to be solved and multiple subsample problems of the multi-level tree structure corresponding to the sample problem to be solved;

Training the problem decomposition model by taking the sample problem to be solved as input and the multiple subsample problems of the multi-level tree structure corresponding to that sample problem as output, thereby obtaining the trained problem decomposition model;

After an open problem is obtained, taking the open problem as input and determining, based on the trained problem decomposition model, the multiple subproblems of the multi-level tree structure corresponding to the open problem.

3. The method according to claim 1, characterized in that extracting the answer to the target subproblem from the preset text content comprises:

Dividing the text content into multiple text units, where a text unit is one or more of a word, a phrase, a sentence and a paragraph;

Determining the similarity between each text unit and the target subproblem, taking the text units whose similarity exceeds a preset threshold as valid text units, and extracting the answer to the target subproblem from all of the valid text units.

4. The method according to claim 3, characterized in that determining the similarity between each text unit and the target subproblem comprises:

Performing word segmentation on the target subproblem to determine all of its tokens p_i, i ∈ [1, m], where m is the number of tokens in the target subproblem;

Determining the degree of correlation r_ij between each token p_i and text unit D_j, and the weight ω_i of each token, where f_ij denotes the term frequency of token p_i in text unit D_j; l_j denotes the length of text unit D_j; avgl denotes the average length of all text units; N denotes the total number of text units, with j ∈ [1, N]; λ is a preset non-zero adjustment coefficient; n(p_i) denotes the number of text units that contain token p_i; and g₁() and g₂() are positively correlated (increasing) functions;

Calculating the similarity R_j between the target subproblem and text unit D_j from the degrees of correlation and the weights of all tokens of the target subproblem.

5. The method according to any one of claims 1 to 4, characterized in that extracting the answer to the target subproblem from the preset text content comprises:

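When multiple answers to the target subproblem are extracted from the text content, taking all of them as candidate answers and determining the confidence of each candidate answer;

Taking the candidate answer with the highest confidence as the finally extracted answer to the target subproblem.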

6. A device for information extraction, characterized by comprising:

A problem decomposition module, configured to obtain an open problem and decompose the open problem into multiple subproblems according to a multi-level tree structure; the subproblems include at least leaf subproblems and a root problem, and the answer to a subproblem at the current level matches expandable information in the corresponding subproblem at the upper level;

A subproblem answer extraction module, configured to select a leaf subproblem as the target subproblem, extract the answer to the target subproblem from preset text content, and update the matching expandable information in the upper-level subproblem to the answer of the target subproblem; then take each of the other leaf subproblems as the target subproblem and repeat the above steps until all expandable information in the upper-level subproblem has been updated; and take the upper-level subproblem whose expandable information has all been updated as the target subproblem and repeat the above steps, until the root problem is taken as the target subproblem;

An open problem answer extraction module, configured to extract the answer to the target subproblem from the preset text content and take the answer to the target subproblem extracted at this point as the answer to the open problem.

7. The device according to claim 6, characterized in that the problem decomposition module comprises:

A model establishment unit, configured to establish a problem decomposition model and obtain in advance a sample problem to be solved and multiple subsample problems of the multi-level tree structure corresponding to the sample problem to be solved;

A training unit, configured to train the problem decomposition model by taking the sample problem to be solved as input and the multiple subsample problems of the multi-level tree structure corresponding to that sample problem as output, thereby obtaining the trained problem decomposition model;

A problem decomposition unit, configured to, after an open problem is obtained, take the open problem as input and determine, based on the trained problem decomposition model, the multiple subproblems of the multi-level tree structure corresponding to the open problem.

8. The device according to claim 6, characterized in that the subproblem answer extraction module extracting the answer to the target subproblem from the preset text content comprises:

9. A computer storage medium, characterized in that the computer storage medium stores computer-executable instructions for performing the information extraction method according to any one of claims 1 to 5.

10. An electronic device, characterized by comprising:

At least one processor; and

A memory, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the information extraction method according to any one of claims 1 to 5.