CN111008271B - Neural network-based key information extraction method and system - Google Patents


Info

Publication number
CN111008271B
CN111008271B
Authority
CN
China
Prior art keywords
vector
elements
text
feature vector
key information
Prior art date
Legal status
Active
Application number
CN201911138210.6A
Other languages
Chinese (zh)
Other versions
CN111008271A (en)
Inventor
姜磊
杨钊
赖招展
欧阳滨滨
陈南山
朱振航
何慧
沈广盈
屈吕杰
Current Assignee
Brilliant Data Analytics Inc
Original Assignee
Brilliant Data Analytics Inc
Priority date
Filing date
Publication date
Application filed by Brilliant Data Analytics Inc
Priority to CN201911138210.6A
Publication of CN111008271A
Application granted
Publication of CN111008271B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to information extraction technology, in particular to a neural-network-based key information extraction method and system, comprising the following steps: generating a label vector: with the article length set to n, the position of the first character of the key information in the article set to s, and the position of the last character set to e, take s × n + e as the element positions of the label vector, initialize all elements to 0, and reset the elements at positions s × n + e to 1; performing text tensorization on the article to obtain a text tensor C and then generating a text feature vector; replacing the elements of the text feature vector that obviously cannot be the largest with a minimum value and multiplying the remaining elements by a weight to generate an output vector; calculating the cross entropy of the output vector and the label vector as the loss and iteratively training the neural network until convergence to obtain a model; and inputting text data into the model to obtain an output vector and the key information. This solves the prior-art problems of easy overfitting on small data sets and the inability to make full use of prior information.

Description

Neural network-based key information extraction method and system
Technical Field
The invention belongs to the technical field of information extraction, and particularly relates to a key information extraction method and system based on a neural network.
Background
A neural network is a mathematical model formed by connected nodes; during training, the values of its parameters are generally updated by the back-propagation algorithm so that the model as a whole better approximates the true mapping from input space to output space. In theory, a two-layer neural network that is wide enough can fit any function, but in practice such a network is likely simply to memorize the training set rather than learn the deeper connections between the data, so it performs well on the training set but poorly on the test set. Because of this problem, instead of a shallow but sufficiently wide network, one tries a deep network of a certain width, in the hope that the deeper layers can learn higher-level features on top of those learned by the shallow layers. But deep networks bring the problems of gradient explosion and gradient vanishing, and networks using sigmoid as the activation function were at the time generally limited to about five layers. The ReLU activation function proposed later alleviated gradient explosion and gradient vanishing.
After residual connections were proposed in 2015, the problems of gradient explosion and gradient vanishing were essentially solved, and neural networks hundreds of layers deep could easily be built with them. With such deep networks, fitting capacity is naturally no longer a problem, but an overfitting problem arises: the model's learning capacity is so strong that it often learns random phenomena as rules. This phenomenon is less severe on large data sets; by the law of large numbers, if the data set is large enough it is difficult for the neural network to learn a conspicuous random phenomenon. On small data sets, however, overfitting is especially pronounced, and existing models that perform well on large data sets perform poorly on small ones, sometimes even worse than simple models.
In summary, using neural network technology for key information extraction at the present stage has two problems. One is the weakness of the standard models; for example, BERT + CRF is sometimes used to extract key information (CRF here generally denoting a subclass of CRF, the linear-chain conditional random field). The linear-chain CRF is in essence a rule constraint on the neural network, but the constraint acts only between adjacent words; there is no constraint between words further apart, so good prior information naturally cannot be exploited. The other is that existing models are complex: without added prior information or special optimization they overfit easily, and their effect can even be worse than that of a simple model.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a neural-network-based key information extraction method and system that customize a neural network model for key information extraction and build strict and effective rule constraints into the model according to the characteristics of key information, thereby improving the model's performance on small data sets and solving the problems that existing key information extraction technology easily learns randomly occurring features on small data sets, causing overfitting, and cannot make full use of prior information.
The extraction method is realized with the following technical scheme. The neural-network-based key information extraction method comprises the following steps:
S1, generating label vectors: let the length of an article be n, the position of the first character of the key information in the article be s, and the position of the last character be e; take s × n + e as the element positions of the label vector, and initialize an n × n-dimensional label vector for each article; initialize all elements of the label vector to 0, and reset the elements at positions s × n + e to 1 to obtain the final label vector;
S2, performing text tensorization on a given article to obtain a text tensor C;
S3, generating a text feature vector: obtain a first-character feature vector CS and a last-character feature vector CE from the text tensor C, and take the Cartesian product of CS and CE as the text feature vector;
S4, replacing the elements of the text feature vector that obviously cannot be the largest with a minimum value;
S5, sharing parameters: multiply the elements of the text feature vector that are not obviously unable to be the largest by a weight to generate a new output vector;
S6, calculating the loss function: compute the cross entropy of the output vector and the label vector as the loss for iterative training of the neural network;
S7, minimizing the loss function by gradient descent, iterating until convergence, and saving the neural network model;
S8, inputting text data into the saved neural network model to obtain a final output vector and the extracted key information.
In a preferred embodiment, in step S1 the elements of the initialized label vector are shuffled, copied several times, and spliced end to end into an element string; a fixed number of leading elements of the string are then taken to form the final label vector.
In a preferred embodiment, in step S5 all elements of the text feature vector that are not obviously unable to be the largest and that share the same difference e - s are multiplied by the same weight, generating a new text feature vector as the output vector.
In a preferred embodiment, step S8 comprises:
obtaining m elements of the output vector, each of which is greater than or equal to every element of the output vector other than those m elements;
calculating the s and e corresponding to the m elements through the one-to-one correspondence between the positions of the m elements in the output vector and the combinations of s and e;
obtaining the text fields corresponding to the m elements from their s and e; and adding together the elements corresponding to identical text fields as the new element for each such field, and selecting the text field corresponding to the largest of the new elements as the final output.
The neural-network-based key information extraction system of the invention comprises:
a label vector generation module, for letting the length of an article be n, the position of the first character of the key information in the article be s, and the position of the last character be e, taking s × n + e as the element positions of the label vector, and initializing an n × n-dimensional label vector for each article; all elements of the label vector are initialized to 0, and the elements at positions s × n + e are reset to 1 to obtain the final label vector;
a text tensorization module, for performing text tensorization on a given article to obtain a text tensor C;
a text feature vector generation module, for obtaining a first-character feature vector CS and a last-character feature vector CE from the text tensor C and taking the Cartesian product of CS and CE as the text feature vector;
an assignment module, for replacing the elements of the text feature vector that obviously cannot be the largest with a minimum value;
a parameter sharing module, for multiplying the elements of the text feature vector that are not obviously unable to be the largest by a weight to generate a new output vector;
a loss function calculation module, for computing the cross entropy of the output vector and the label vector as the loss for iterative training of the neural network;
an iterative convergence module, for minimizing the loss function by gradient descent, iterating until convergence, and saving the neural network model;
and a prediction module, for inputting text data into the saved neural network model to obtain a final output vector and the extracted key information.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. For the specific scenario of key information extraction, the invention provides a set of neural network models dedicated to the key information extraction problem. Two strict and effective rule constraints are built into the models according to the characteristics of key information, and shared parameters raise the parameter utilization rate of the models while reducing their parameter count. The performance of the models on small data sets is markedly improved, solving the problems that existing key information extraction techniques easily learn randomly occurring features on small data sets, causing overfitting, and cannot make full use of prior information.
2. The method has a pronounced effect on small data sets; on large data sets it can still improve model performance to some extent and accelerate model convergence. Text feature vectors are extracted by the customized neural network; by masking certain elements of the feature vector and sharing parameters, the neural network can exploit prior information, which speeds up convergence and reduces the degree of overfitting.
3. The invention provides a neural network model for the field of key information extraction whose notable advances comprise the following three aspects:
In the first aspect, compared with common neural network models such as BERT + CRF, LSTM + CRF, or BERT + LSTM + CRF, the invention adds customized rule constraints to the neural network tailored to the characteristics of key information extraction. There are two such constraints. The first rule constraint: with the position of the first character of the key information in the text denoted s and the position of the last character denoted e, s must be smaller than e. The second rule constraint: e - s cannot exceed a certain value x, where x is a set threshold. The character at the start of the key information (the first character) obviously cannot come after the character at its end (the last character), and because key information is generally not very long, the head and tail characters are never far apart. These two rule constraints achieve the significant technical effect of greatly reducing the effective dimension of the neural network's output vector from the original n × n (n being the article length) down to roughly n × x, since only spans with s < e and e - s ≤ x remain effective.
In the second aspect, the invention further has corresponding elements with the same difference e - s share parameters within the feature vector, removing x × (x - 1) parameters and effectively alleviating overfitting of the neural network model.
In the third aspect, the invention proposes a scheme that considers the top several (e.g., m) candidate key information fields. Compared with directly taking the single most likely candidate as the key information, this scheme is more stable.
In general, through strong and effective rule constraints and parameter sharing, the invention performs better in the field of key information extraction than general neural network models. The improvement of the invention's model over a generic model grows as the amount of data shrinks, i.e., when it is applied to small data sets.
Drawings
FIG. 1 is a flow chart of key information extraction according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
In the neural-network-based key information extraction method, label vectors are on the one hand generated from the articles and used for training the neural network; on the other hand, for an article of length n, a text feature vector of dimension n × n is extracted by the neural network. The index of the largest element of the text feature vector is then extracted and resolved, through the one-to-one correspondence between the Cartesian combinations of the position s of the first character of the key information and the position e of the last character and the elements of the text feature vector, into the position of the first character of the key information in the article and the position of the last character, thereby extracting the key information.
In a bidding document, the tenderer is a relatively critical piece of information, and existing bidding documents generally state the corresponding tenderer. The following describes in detail how the invention is implemented, taking the extraction of the tenderer as key information from bidding documents as the example. As shown in fig. 1, the key information extraction method of this embodiment comprises the following steps:
step 1, data cleaning.
And cleaning the data, and removing the repeated data and the abnormal data.
Step 2, generating a label vector.
The tenderer may appear in the bidding document many times; instead of directly using the tenderer as the label, this embodiment uses a label vector generated from the position information of the tenderer within the article.
Specifically, let the length of the article (also called the text length) be n, the position of the first character of the tenderer key information in the bidding document be s, and the position of the last character be e; take (s × n + e) as the element positions of the label vector. A label vector of dimension n × n is initialized for each article (bidding document) with all elements set to 0; for each element of the label vector whose position corresponds to some combination of s and e, that element is reset to 1, i.e., the elements at positions s × n + e are reset to 1 to obtain the final label vector for the subsequent loss computation.
Because the number of key-information occurrences, and hence of label elements, may differ between bidding documents, which is inconvenient for the implementation of the model, the number of label elements is preferably made equal across documents, and the invention holds that every label element should be given equal status. Therefore the elements of the initialized label vector are shuffled (put in random order), copied a number of times, and spliced end to end into an element string; a fixed number of leading elements of the string are then taken to form the final label vector used as the label.
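As a concrete illustration, a minimal Python sketch of this labeling scheme follows; the fixed length, the seeded random generator, and the toy sizes are assumptions, since the patent only specifies shuffling, end-to-end splicing, and fixed-length truncation:

```python
import random

def make_label_indices(n, spans, fixed_len=8, seed=0):
    # Build the index-list label for one article. fixed_len and the seeded
    # RNG are assumptions; the patent only specifies shuffling, splicing
    # end to end, and truncating to a fixed length.
    rng = random.Random(seed)
    idx = [s * n + e for (s, e) in spans]   # one index s*n + e per occurrence
    rng.shuffle(idx)                        # out-of-order processing
    reps = fixed_len // len(idx) + 1
    return (idx * reps)[:fixed_len]         # splice copies, keep leading elements

def make_dense_label(n, spans):
    # Dense n*n binary label vector: 1 at every position s*n + e, else 0.
    y = [0.0] * (n * n)
    for s, e in spans:
        y[s * n + e] = 1.0
    return y

# Toy usage: article of length 6 whose key info occurs at (1, 3) and (4, 5).
print(make_label_indices(6, [(1, 3), (4, 5)], fixed_len=4))
print(make_dense_label(6, [(1, 3)])[:12])
```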
Step 3, text tensorization.
This embodiment selects Google's open-source Chinese BERT model as the tensorization method and serializes the results. Specifically, given an article, it is converted into a vector of 512 word ids according to the preprocessing method of Google's Chinese BERT model, and this vector is used as the model's input, yielding a text tensor C (the specific shape of the tensor C is [512, 768]). The text tensor C is then serialized for convenient later reuse.
It should be noted that Google's Chinese BERT model is only one text tensorization method; others are usable, such as a BERT model trained from scratch on business data, or the more cost-effective fastText model.
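For illustration, here is a sketch of this step using the HuggingFace port of a Chinese BERT checkpoint ("bert-base-chinese") as a stand-in for the open-source model the patent references; the sample text is an arbitrary placeholder:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

def text_to_tensor(article: str) -> torch.Tensor:
    # Convert the article to a vector of 512 word ids, then run BERT.
    inputs = tokenizer(article, max_length=512, truncation=True,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state.squeeze(0)  # text tensor C, shape [512, 768]

C = text_to_tensor("招标人：某某公司")   # any bidding-document text
print(C.shape)                            # torch.Size([512, 768])
torch.save(C, "text_tensor_C.pt")         # serialize for later reuse
```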
Step 4, generating a text feature vector.
Let the shape of the text tensor C be [n, d] (n being the text length, i.e., the article length, and d the dimension of a character). A query vector S of shape [d] is randomly initialized, and the value of CS (CS = C × S) is used as the first-character feature vector. Similarly, a query vector E of shape [d] is randomly initialized, and CE (CE = C × E) is used as the last-character feature vector. The Cartesian product of the first-character feature vector CS and the last-character feature vector CE is taken as the text feature vector, whose dimension is n × n.
A vector is a special case of a tensor, and shape is an attribute of a tensor: e.g., the shape of the vector [1, 2, 3] is [3], and the shape of the tensor [[1, 2], [3, 4]] is [2, 2]. The shape of the text tensor C is [n, d] and the shape of the query vector S is [d], so the shape of the first-character feature vector CS is [n]; the dimension of CS equals the number of characters, and each element of CS represents the probability that the character at the corresponding position is the first character. Correspondingly, each element of the last-character feature vector CE represents the probability that the character at the corresponding position is the last character.
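A sketch of this step follows. Realizing the "Cartesian product" as an outer sum, where the score of span (s, e) is CS[s] + CE[e], is this sketch's assumption (a common choice in span extraction); an outer product CS[s] * CE[e] would be an equally literal reading:

```python
import torch

n, d = 512, 768
C = torch.randn(n, d)              # text tensor C (stand-in for the BERT output)

# Randomly initialized query vectors S and E of shape [d].
S = torch.randn(d, requires_grad=True)
E = torch.randn(d, requires_grad=True)

CS = C @ S                         # first-character feature vector, shape [n]
CE = C @ E                         # last-character feature vector,  shape [n]

# Outer sum over all (s, e) pairs, then flatten so that the element for
# span (s, e) sits at position l = s*n + e.
F = (CS.unsqueeze(1) + CE.unsqueeze(0)).reshape(-1)
print(F.shape)                     # torch.Size([262144]) for n = 512
```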
Step 5, replacing the elements of the text feature vector that obviously cannot be the largest with a minimum value.
Since the first and last characters of the key information are never far apart, the elements of the feature vector that obviously cannot be the largest may be identified as follows: if the s and e corresponding to an element of the text feature vector satisfy s ≥ e, the element obviously cannot be the largest; or, if the s and e corresponding to an element have e - s greater than a set threshold x, the element obviously cannot be the largest. For example, if an element's e - s value is greater than 40, that element is clearly unlikely to be the largest.
According to this prior knowledge, if an element of the n × n-dimensional text feature vector obviously cannot be the largest, it is reset to a number that is extremely small compared with the other element values, i.e., it is assigned a minimum value.
In this embodiment, the position l of an element in the text feature vector relates to s and e by s × n + e = l, so s = l // n (// denotes integer division, rounding down) and e = l % n (% denotes the remainder). Therefore elements with l % n ≤ l // n are replaced with -1000 to realize the first rule constraint, that s must be less than e; and elements with l % n - l // n > 40 are replaced with -1000 to realize the second rule constraint, that e minus s cannot exceed a certain value.
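A vectorized sketch of these two masks follows; the threshold x = 40 comes from the embodiment, while processing the whole flattened n × n vector at once is this sketch's choice:

```python
import torch

n, x = 512, 40
l = torch.arange(n * n)          # flattened positions l = s*n + e
s = l // n                       # integer division, rounding down
e = l % n                        # remainder

# First rule constraint: s must be less than e  -> mask elements with e <= s.
# Second rule constraint: e - s cannot exceed x -> mask elements with e - s > x.
impossible = (e <= s) | (e - s > x)

F = torch.randn(n * n)           # text feature vector (stand-in)
F = torch.where(impossible, torch.full_like(F, -1000.0), F)
```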
Step 6, sharing parameters: multiply each element of the text feature vector that is not obviously unable to be the largest by a weight to generate a new output vector.
The weights (parameters) multiplying these elements of the n × n-dimensional text feature vector are shared according to the relation of s and e: all such elements with the same difference e - s are multiplied by the same (i.e., shared) weight, thereby generating a new n × n text feature vector as the output vector.
Specifically, for the elements of the feature vector that obviously cannot be the largest, the weight is 1; for the other elements the weights are trainable parameters, and the weights of all elements with the same value of l % n - l // n are set to be shared (i.e., multiplied by the same parameter). A sketch of this sharing scheme follows.
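This is a minimal sketch of the sharing scheme; holding one trainable parameter per value of e - s is this sketch's reading of the description:

```python
import torch

n, x = 512, 40
l = torch.arange(n * n)
s_pos, e_pos = l // n, l % n
diff = e_pos - s_pos

# One trainable weight per value of e - s, shared across every element on
# the same "diagonal"; elements that obviously cannot be the largest keep
# an implicit weight of 1, as the description states.
shared_w = torch.nn.Parameter(torch.ones(x + 1))
valid = (diff >= 1) & (diff <= x)          # not obviously impossible

F_masked = torch.randn(n * n)              # output of step 5's masking
weights = torch.where(valid, shared_w[diff.clamp(0, x)], torch.ones(n * n))
output = F_masked * weights                # new output vector
```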
Because of the processing in steps 5 and 6, the elements of the output vector that obviously cannot be the largest have already been assigned the minimum value, i.e., -1000, which essentially guarantees that those elements, and the text fields corresponding to them, will not be selected; this is the intuitive effect of the constraints.
Step 7, calculating the loss function. A softmax operation is applied to the new output vector generated in step 6, producing another new output vector; the cross entropy between this vector and the label vector is then computed as the loss and participates in the iterative training of the neural network.
In the invention, the position information of the keyword within the article serves as the label, so the label vector reflects that position information. In this embodiment, for each training sample, all keywords to be extracted from an article are matched by regular expressions, the position information of each match, i.e., the positions s and e, is parsed out, and the parsed position information is then used as the label of the corresponding article in the iterative training of the neural network. Because the iterative training makes full use of the label vector reflecting the keywords' positions, and because masking certain elements of the feature vector and sharing parameters let the neural network exploit prior information, the effective dimension of the network's output vector is effectively reduced and overfitting of the model is effectively alleviated.
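A minimal sketch of the loss computation, assuming the dense n × n binary label vector of step 2; summing over multiple occurrences is an assumption:

```python
import torch
import torch.nn.functional as F

def span_loss(output_vec: torch.Tensor, label_vec: torch.Tensor) -> torch.Tensor:
    log_p = F.log_softmax(output_vec, dim=-1)   # softmax, then log
    return -(label_vec * log_p).sum()           # cross entropy with the label

n = 6
output = torch.randn(n * n, requires_grad=True)
label = torch.zeros(n * n)
label[1 * n + 3] = 1.0           # key info occupies positions s = 1 .. e = 3
loss = span_loss(output, label)
loss.backward()                  # gradients for the iterative training
print(float(loss))
```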
Step 8, iterative convergence: minimize the loss function by gradient descent, iterate until convergence, and save the neural network model.
From the loss, the neural network computes the gradients of the variables (e.g., the query vector S and the query vector E), updates S and E according to those gradients, and repeats. The iterative process optimizes the query vectors S and E so that they correctly determine which characters should receive high scores (i.e., which characters are likely to be the first or last character). When the loss is small enough, the output vector is sufficiently close to the label vector; the neural network model is then considered to have predictive ability and iteration stops. A self-contained toy run of the whole training procedure is sketched below.
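For concreteness, here is a self-contained toy run of steps 4 through 8; all sizes, the SGD optimizer, and the stopping threshold are illustrative assumptions:

```python
import torch

# Toy article of length n = 8 with key info at s = 2, e = 5.
torch.manual_seed(0)
n, d, x = 8, 16, 4
C = torch.randn(n, d)                        # stands in for the BERT tensor
S = torch.randn(d, requires_grad=True)       # query vector S
E = torch.randn(d, requires_grad=True)       # query vector E
w = torch.nn.Parameter(torch.ones(x + 1))    # shared weights, one per e - s

l = torch.arange(n * n)
s_idx, e_idx = l // n, l % n
diff = e_idx - s_idx
valid = (diff >= 1) & (diff <= x)

label = torch.zeros(n * n)
label[2 * n + 5] = 1.0                       # position s*n + e

opt = torch.optim.SGD([S, E, w], lr=0.1)
for step in range(500):
    F_vec = ((C @ S).unsqueeze(1) + (C @ E).unsqueeze(0)).reshape(-1)
    F_vec = torch.where(valid, F_vec * w[diff.clamp(0, x)],
                        torch.full_like(F_vec, -1000.0))
    loss = -(label * torch.log_softmax(F_vec, -1)).sum()  # cross entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() < 1e-3:       # loss small enough -> stop iterating
        break

m = int(F_vec.argmax())
print(m // n, m % n)             # should recover the span (2, 5)
```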
Step 9, prediction: input text data into the neural network model saved in step 8 to obtain the final output vector and the extracted key information.
Specifically: find m elements of the output vector, each greater than or equal to every other element of the output vector. As described above, the position of an element in the vector is s × n + e, so the s and e corresponding to the m elements are computed through the one-to-one correspondence between the positions of the m elements in the output vector and the combinations of s and e. The text fields corresponding to the m elements are then obtained from their s and e; the elements corresponding to identical text fields are added together as that field's new element, and the text field corresponding to the largest of the new elements is selected as the final output, giving the extracted key information, namely the tenderer.
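A sketch of this decoding procedure follows; the value of m and the toy scores are illustrative:

```python
import torch

def extract_key_info(output_vec, text, n, m=2):
    # Decode the top-m candidate spans of step 9 and merge duplicate fields.
    scores, positions = torch.topk(output_vec, m)       # m largest elements
    fields = {}
    for score, l in zip(scores.tolist(), positions.tolist()):
        s, e = l // n, l % n                             # invert l = s*n + e
        field = text[s:e + 1]                            # corresponding text field
        fields[field] = fields.get(field, 0.0) + score   # add duplicate fields
    return max(fields, key=fields.get)                   # largest summed element

n = 8
text = "招标人ABC公司"                   # toy article of length 8
vec = torch.full((n * n,), -1000.0)
vec[3 * n + 5] = 5.0                    # span (3, 5) -> "ABC"
vec[0 * n + 2] = 4.0                    # span (0, 2) -> "招标人"
print(extract_key_info(vec, text, n))   # -> ABC
```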
The neural-network-based key information extraction system of the invention comprises: a label vector generation module implementing step 2; a text tensorization module implementing step 3; a text feature vector generation module implementing step 4; an assignment module implementing step 5; a parameter sharing module implementing step 6; a loss function calculation module implementing step 7; an iterative convergence module implementing step 8; and a prediction module implementing step 9.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (7)

1. A neural-network-based key information extraction method, characterized by comprising the following steps:
S1, generating label vectors: let the length of an article be n, the position of the first character of the key information in the article be s, and the position of the last character be e; take s × n + e as the element positions of the label vector, and initialize an n × n-dimensional label vector for each article; initialize all elements of the label vector to 0, and reset the elements at positions s × n + e to 1 to obtain the final label vector;
S2, performing text tensorization on a given article to obtain a text tensor C;
S3, generating a text feature vector: obtain a first-character feature vector CS and a last-character feature vector CE from the text tensor C, and take the Cartesian product of CS and CE as the text feature vector;
S4, replacing the elements of the text feature vector that obviously cannot be the largest with a minimum value;
S5, sharing parameters: multiply the elements of the text feature vector that are not obviously unable to be the largest by a weight to generate a new output vector;
S6, calculating the loss function: compute the cross entropy of the output vector and the label vector as the loss for iterative training of the neural network;
S7, minimizing the loss function by gradient descent, iterating until convergence, and saving the neural network model;
S8, inputting text data into the saved neural network model to obtain a final output vector and the extracted key information;
wherein in step S4 the elements of the text feature vector that obviously cannot be the largest are determined as follows:
if the s and e corresponding to an element of the text feature vector satisfy s ≥ e, the element obviously cannot be the largest; or, if the s and e corresponding to an element have e - s greater than the set threshold x, the element obviously cannot be the largest;
and in step S5, for the elements of the n × n-dimensional text feature vector that are not obviously unable to be the largest, all elements whose corresponding differences e - s are the same are multiplied by the same weight parameter, generating a new n × n text feature vector as the output vector.
2. The key information extraction method of claim 1, wherein in step S1 the elements of the initialized label vector are shuffled, copied several times, and spliced end to end into an element string, and a fixed number of leading elements of the string are then taken to form the final label vector.
3. The key information extraction method of claim 1, wherein in step S3, letting the shape of the text tensor C be [n, d], where d is the dimension of a character, a query vector S of shape [d] is randomly initialized and the value of CS = C × S is taken as the first-character feature vector; a query vector E of shape [d] is randomly initialized and the value of CE = C × E is taken as the last-character feature vector; and the Cartesian product of the first-character feature vector CS and the last-character feature vector CE is taken as the text feature vector, whose dimension is n × n.
4. The key information extraction method of claim 1, wherein step S8 comprises:
obtaining m elements of the output vector, each of which is greater than or equal to every element of the output vector other than those m elements;
calculating the s and e corresponding to the m elements through the one-to-one correspondence between the positions of the m elements in the output vector and the combinations of s and e;
determining the text fields corresponding to the m elements from their s and e; and adding together the elements corresponding to identical text fields as the new element for each such field, and selecting the text field corresponding to the largest of the new elements as the final output.
5. A neural-network-based key information extraction system, characterized by comprising:
a label vector generation module, for letting the length of an article be n, the position of the first character of the key information in the article be s, and the position of the last character be e, taking s × n + e as the element positions of the label vector, and initializing an n × n-dimensional label vector for each article; all elements of the label vector are initialized to 0, and the elements at positions s × n + e are reset to 1 to obtain the final label vector;
a text tensorization module, for performing text tensorization on a given article to obtain a text tensor C;
a text feature vector generation module, for obtaining a first-character feature vector CS and a last-character feature vector CE from the text tensor C and taking the Cartesian product of CS and CE as the text feature vector;
an assignment module, for replacing the elements of the text feature vector that obviously cannot be the largest with a minimum value;
a parameter sharing module, for multiplying the elements of the text feature vector that are not obviously unable to be the largest by a weight to generate a new output vector;
a loss function calculation module, for computing the cross entropy of the output vector and the label vector as the loss for iterative training of the neural network;
an iterative convergence module, for minimizing the loss function by gradient descent, iterating until convergence, and saving the neural network model;
and a prediction module, for inputting text data into the saved neural network model to obtain a final output vector and the extracted key information;
wherein in the assignment module the elements of the text feature vector that obviously cannot be the largest are determined as follows:
if the s and e corresponding to an element of the text feature vector satisfy s ≥ e, the element obviously cannot be the largest; or, if the s and e corresponding to an element have e - s greater than the set threshold x, the element obviously cannot be the largest;
and in the parameter sharing module, for the elements of the n × n-dimensional text feature vector that are not obviously unable to be the largest, all elements whose corresponding differences e - s are the same are multiplied by the same weight parameter, generating a new n × n text feature vector as the output vector.
6. The key information extraction system of claim 5, wherein the label vector generation module shuffles the elements of the initialized label vector, copies them several times, splices them end to end into an element string, and then takes a fixed number of leading elements of the string to form the final label vector.
7. The key information extraction system of claim 5, wherein the process by which the prediction module obtains the extracted key information comprises:
obtaining m elements of the output vector, each of which is greater than or equal to every element of the output vector other than those m elements;
calculating the s and e corresponding to the m elements through the one-to-one correspondence between the positions of the m elements in the output vector and the combinations of s and e;
determining the text fields corresponding to the m elements from their s and e; and adding together the elements corresponding to identical text fields as the new element for each such field, and selecting the text field corresponding to the largest of the new elements as the final output.
CN201911138210.6A 2019-11-20 2019-11-20 Neural network-based key information extraction method and system Active CN111008271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911138210.6A CN111008271B (en) 2019-11-20 2019-11-20 Neural network-based key information extraction method and system


Publications (2)

Publication Number Publication Date
CN111008271A CN111008271A (en) 2020-04-14
CN111008271B (en) 2022-06-24

Family

ID=70113762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911138210.6A Active CN111008271B (en) 2019-11-20 2019-11-20 Neural network-based key information extraction method and system

Country Status (1)

Country Link
CN (1) CN111008271B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018218705A1 (en) * 2017-05-27 2018-12-06 中国矿业大学 Method for recognizing network text named entity based on neural network probability disambiguation
CN107832289A (en) * 2017-10-12 2018-03-23 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM CNN
CN108733792A (en) * 2018-05-14 2018-11-02 北京大学深圳研究生院 A kind of entity relation extraction method
CN110263325A (en) * 2019-05-17 2019-09-20 交通银行股份有限公司太平洋信用卡中心 Chinese automatic word-cut
CN110196980A (en) * 2019-06-05 2019-09-03 北京邮电大学 A kind of field migration based on convolutional network in Chinese word segmentation task

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks; Daojian Zeng et al.; Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; 2015-09-21; pp. 1753-1762 *
Research on Short-Text Classification and Information Extraction Based on Deep Learning; Li Chao; China Master's Theses Full-text Database (Information Science and Technology, monthly); 2017-12-15; No. 12; pp. I138-471 *

Also Published As

Publication number Publication date
CN111008271A (en) 2020-04-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant