CN106021227A - State transition and neural network-based Chinese chunk parsing method - Google Patents

State transition and neural network-based Chinese chunk parsing method

Info

Publication number
CN106021227A
Authority
CN
China
Prior art keywords
word
speech
vector
chunk
processed
Prior art date
Legal status
Granted
Application number
CN201610324281.5A
Other languages
Chinese (zh)
Other versions
CN106021227B (en
Inventor
戴新宇 (Dai Xinyu)
程川 (Cheng Chuan)
陈家骏 (Chen Jiajun)
黄书剑 (Huang Shujian)
张建兵 (Zhang Jianbing)
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201610324281.5A
Publication of CN106021227A
Application granted
Publication of CN106021227B
Legal status: Active

Classifications

    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/205 Parsing)
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention proposes a Chinese chunk parsing method based on state transition and a neural network. The method converts the chunk parsing task into a sequence labeling task; labels a sentence within a state-transition framework; scores the transition operations to be carried out in each state with a forward neural network during labeling; and uses the distributed representations of words and part-of-speech tags learned with a bidirectional long short-term memory neural network model as additional information features of the labeling model, thereby improving the accuracy of chunk parsing. Compared with other Chinese chunk parsing techniques, the state-transition framework allows chunk-level features to be added more flexibly, the neural network automatically learns how features combine, and the bidirectional long short-term memory model introduces useful additional information features; the combination of the three effectively improves the accuracy of chunk parsing.

Description

Chinese chunk analysis method based on state transition and neural network
Technical Field
The invention relates to a method for performing Chinese shallow syntactic analysis with a computer, and in particular to a method for automatic Chinese chunk analysis based on the combination of state transition and a neural network.
Background
Chinese syntactic analysis is a basic task in Chinese information processing, and its wide range of applications has attracted a great deal of research, driving rapid progress of the related techniques. Full syntactic analysis suffers from low accuracy and low speed because of the high complexity of the problem itself, which limits its practicality. Chunk analysis, also called shallow syntactic analysis, differs from full syntactic analysis, whose goal is the complete syntactic tree of a sentence: it aims only at identifying certain relatively simple, non-nested sentence components, such as non-nested noun phrases and verb phrases. Because the recognition targets are non-nested, non-overlapping phrase components that follow certain grammatical rules, chunk analysis has lower complexity and higher processing speed than full syntactic analysis; at the same time it can serve as a preprocessing stage for many tasks such as machine translation, full syntactic analysis and information extraction, so it has continuously received attention from researchers. With the appearance of Chinese treebanks and the data sets researchers have extracted from them for the chunking task, chunk analysis for Chinese remains an active research topic.
A common approach to modeling the chunk analysis task is to treat it as a sequence labeling task. The procedure is as follows: for the sentence to be analyzed, each word is labeled (tagged) from left to right, word by word. One labeling scheme distinguishes five kinds of tags: chunk-initial words and single-word chunks, which carry a chunk type (noun phrase, verb phrase, adjective phrase and so on), and chunk-final words, chunk-internal words and words outside any chunk, which carry no type. Once the whole sentence has been labeled in this way, the complete chunk information can be extracted from it. The present invention also treats the Chinese chunk analysis task as a sequence labeling task and adopts this five-tag scheme when modeling it.
Statistics-based methods are widely applied to the chunking task, and classical structured-learning models are commonly used for it, such as hidden Markov models, conditional random fields and dynamic-programming-based support vector machines. However, because of the assumptions built into these models, their use of chunk-level features is limited, which matters for a chunk analysis task that takes the whole sentence as its object and needs to consider more global information. A state-transition-based method is one way to relieve this limitation; it is used more often in full syntactic analysis and is both efficient and accurate. The procedure is as follows: the words of the sentence to be analyzed are read in from left to right, one at a time, and each word read in is labeled, the label types being those of the scheme above. Each labeling operation transfers the state defined over the whole sentence (a state records which words of the current sentence have been labeled, the label assigned to each labeled word, and which words are still unlabeled), and the choice of the concrete label is made by a trained scoring model. When a word is labeled, the labels of all words to its left are already determined, so the information of the labeled words, and in particular the information about the chunks already recognized to its left, can be fully exploited to guide the labeling of the current word. In order to make more use of chunk-level information features, the present invention adopts a state-transition-based approach to Chinese chunk analysis.
Neural networks are a common machine learning method and can automatically learn feature combinations from basic atomic features, unlike conventional methods, which require the user to design a large number of task-specific templates from prior knowledge such as linguistics. Neural networks have been tried extensively in Chinese information processing but, so far, not in Chinese chunk analysis. Using a neural network saves the work of manually crafting a large number of combined feature templates, since combinations of features can be learned automatically thanks to the strong expressive power of the network. On the other hand, in conventional chunking techniques the information features used when labeling a word are the words or part-of-speech tags within a fixed-size window around the current word; analysis of Chinese sentences in the treebank shows that many features useful for chunking often lie outside that window, for example punctuation information such as book-title marks, or text-pattern information such as items separated by the Chinese enumeration comma. Such information spans a wide range and is not easily incorporated into conventional chunking techniques. In order to make full use of it, the invention applies a bidirectional long short-term memory (LSTM) neural network to the word and part-of-speech sequences of the sentence, so that word and part-of-speech features at a greater distance can also be captured.
Disclosure of Invention
Purpose of the invention: the models used in existing Chinese chunking techniques cannot fully exploit chunk-level and long-distance information features and require manually crafted, complex combined feature templates. To address these shortcomings, the invention provides a method based on state transition and a neural network that relieves these limitations and improves the accuracy of Chinese chunk analysis.
To solve these technical problems, the invention discloses a Chinese chunk analysis method based on state transition and a neural network, together with an additional description of the training method for the model parameters used during analysis.
The Chinese chunk analysis method based on state transition and neural network comprises the following steps:
Step 1: a computer reads a Chinese text file containing the sentences to be analyzed; the Chinese chunk types are defined; the sentence to be analyzed is segmented into words and each word is tagged with its part of speech; and the label types that may be selected according to the current sentence state when a word is labeled are determined;
Step 2: Chinese chunk analysis is performed on the sentence to be analyzed using the method based on state transition and a neural network.
Wherein, step 1 includes the following steps:
Step 1-1: the Chinese chunk types are defined using the 12 phrase types defined on the basis of the Chinese Penn Treebank (CTB) 4.0, a treebank of Chinese corpora annotated at the University of Pennsylvania. The chunk types are chosen by the user according to the specific goal; the traditional Chinese chunking task usually comes in two variants: one recognizes only noun phrases, the other recognizes the 12 chunk types defined on the basis of CTB 4.0. Embodiment 1 takes the second approach, and the meanings of these 12 phrase types are illustrated in Table 1:
TABLE 1  Description of Chinese chunk types

Type | Meaning | Example
ADJP | adjective phrase | developing/JJ country/NN
ADVP | adverb phrase | general/AD use/VV
CLP | classifier phrase | Hong Kong dollar/M and/CC dollar/M
DNP | phrase formed with the particle 的/DEG | 的/DEG (of)
DP | determiner phrase | this/DT
DVP | phrase formed with the particle 地/DEV | equal/VA harmonious/VA 地/DEV
LCP | localizer phrase | recent years/NT coming/LC
LST | list marker | (/PU one/CD )/PU
NP | noun phrase | highway/NN project/NN
PP | preposition phrase | and/P complete machine plant/NN
QP | quantifier phrase | one/CD (measure word)/M
VP | verb phrase | permanent/AD full-on/VV

Here the tag after the slash is the part of speech of the corresponding word: "NN" in "country/NN" denotes a noun, "VV" a verb, and so on.
Step 1-2: the label types that may be selected when each word is labeled during the labeling process are determined by combining the BIOES tagging scheme with the Chinese chunk types defined in step 1-1. After the chunk analysis task has been modeled as a sequence labeling task, the tagging scheme to be adopted must be decided. In the English chunking task two schemes are generally used, BIO and BIOES; that is, each word of a sentence is labeled with a combination of a chunk type and BIO or BIOES. In the BIO scheme, B marks the beginning of a chunk, I the inside of a chunk, and O any position outside a chunk; in the BIOES scheme, B marks the beginning of a chunk, I the inside of a chunk, E the end of a chunk, O any position outside a chunk, and S a word that forms a chunk by itself. The meaning of the BIOES scheme is illustrated below with a labeled sentence. First, a sentence that has already been divided into chunks is given:
[NP Shanghai Pudong] [NP development and legal-system construction] [VP synchronization] [.]

NP indicates that the chunk is a noun phrase, VP that it is a verb phrase, and the final "." belongs to no chunk. Labeled with the BIOES scheme, the sentence takes the following form:

Shanghai_B-NP Pudong_E-NP development_B-NP and_I-NP legal-system_I-NP construction_E-NP synchronization_S-VP ._O

Note that the labels in this specification follow the BIOES scheme. Furthermore, the combination of the chunk types with BIOES is not a full cross product: only B and S are combined with every chunk type. That is, if the chunk types are type_1, type_2, …, type_k, k in total, then combining them with B and S yields B-type_1, …, B-type_k, S-type_1, …, S-type_k, 2k labels in total; adding the typeless labels I, E and O gives 2k+3 label types. With the k = 12 chunk types used here, this gives 27 label types. Labeled in this way, the example sentence becomes:

Shanghai_B-NP Pudong_E development_B-NP and_I legal-system_I construction_E synchronization_S-VP ._O
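For illustration only, the following minimal Python sketch (not part of the patent; the function and variable names are my own) shows how chunks can be recovered from a tag sequence in this reduced scheme, using the example sentence above:

    # My own illustration: recover chunks from a tag sequence in the reduced scheme,
    # where only B- and S- tags carry a chunk type and I, E, O are typeless.
    def tags_to_chunks(tags):
        """Return (chunk_type, start_index, end_index) triples, indices inclusive."""
        chunks, start, ctype = [], None, None
        for i, tag in enumerate(tags):
            if tag.startswith("S-"):                 # single-word chunk
                chunks.append((tag[2:], i, i))
            elif tag.startswith("B-"):               # chunk start carries the type
                start, ctype = i, tag[2:]
            elif tag == "E" and start is not None:   # chunk end closes the open chunk
                chunks.append((ctype, start, i))
                start, ctype = None, None
            # "I" continues an open chunk, "O" lies outside every chunk
        return chunks

    # The example sentence above (English glosses), in the reduced scheme:
    tags = ["B-NP", "E", "B-NP", "I", "I", "E", "S-VP", "O"]
    print(tags_to_chunks(tags))    # [('NP', 0, 1), ('NP', 2, 5), ('VP', 6, 6)]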
In addition, during the labeling process the generation of the candidate label types for a word is restricted by certain rules; in the invention the restrictions are as follows:

1. the first word of a sentence cannot be labeled I or E;
2. the word after a word labeled B-type_x cannot be labeled B-type_y, O or S-type_y;
3. the word after a word labeled I cannot be labeled B-type_y, O or S-type_y;
4. the word after a word labeled O cannot be labeled I or E;
5. the word after a word labeled E cannot be labeled I or E;
6. the word after a word labeled S-type_x cannot be labeled I or E.
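The label inventory and the constraints above can be made concrete with a small Python sketch; this is my own illustration under the assumption k = 12, not code from the patent:

    # My own illustration of the 2k+3 label inventory (k = 12) and the six rules above.
    CHUNK_TYPES = ["ADJP", "ADVP", "CLP", "DNP", "DP", "DVP",
                   "LCP", "LST", "NP", "PP", "QP", "VP"]

    LABELS = (["B-" + t for t in CHUNK_TYPES]
              + ["S-" + t for t in CHUNK_TYPES]
              + ["I", "E", "O"])                      # 2*12 + 3 = 27 label types

    def allowed(prev, cur):
        """True if label cur may follow label prev (prev is None for the first word)."""
        if prev is None:                              # rule 1
            return cur not in ("I", "E")
        if prev.startswith("B-") or prev == "I":      # rules 2 and 3: a chunk is still open
            return cur in ("I", "E")
        return cur not in ("I", "E")                  # rules 4, 5 and 6: no chunk is open

    assert len(LABELS) == 27
    assert allowed(None, "B-NP") and not allowed("B-NP", "O") and allowed("E", "S-VP")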
In step 1, the computer reads a natural-language text file containing the sentences to be analyzed. For Chinese chunk analysis, the required input is a sentence that has already been segmented into words and whose words have already been tagged with their parts of speech. A complete input sentence is shown, for example, in Table 2:

TABLE 2  A complete input sentence to be analyzed

Word | Part-of-speech tag
France | NR
national defense | NN
minister | NN
Léotard | NR
the 1st | NT
said | VV
, | PU
France | NR
currently | AD
studying | VV
from | P
Bosnia-Herzegovina | NR
withdraw troops | VV
的 (particle) | DEC
plan | NN
。 | PU
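The concrete file layout is not fixed by the patent; assuming, for illustration, one word and its part-of-speech tag per line with a blank line between sentences, such input could be read with the following sketch of my own (Python):

    # My own reading sketch; the one-word-and-tag-per-line, blank-line-separated
    # layout is an assumption made for illustration, not a format fixed by the patent.
    def read_sentences(path):
        sentences, current = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:                          # a blank line ends a sentence
                    if current:
                        sentences.append(current)
                        current = []
                    continue
                word, pos = line.split()              # e.g. "France NR"
                current.append((word, pos))
        if current:
            sentences.append(current)
        return sentences                              # one [(word, POS), ...] list per sentence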
Step 2: each sentence read in is chunk-analyzed using the method based on state transition and a neural network. In the state-transition-based sequence labeling method, the words of each sentence are read in from left to right, one at a time; reading a word causes one transition of the current sentence state, and a state of the sentence records which words have been labeled, the label type assigned to each labeled word, and which words are still unlabeled. If the label chosen for each word is unique, then after every word of the sentence has been labeled a complete label sequence of the sentence is obtained. The process can be described briefly as follows: suppose the sentence length is n and the initial state is s_1; labeling the t-th word with label mark_t transfers the sentence into state s_{t+1}, and the label sequence of the whole sentence is mark_1, mark_2, …, mark_n. This way of labeling is called greedy search in the present invention. However, the labeling accuracy obtained for the whole sentence in this way is low, so the invention instead completes the labeling of the whole sentence with a beam search.

Before the beam search is described in detail, exhaustive search is briefly introduced. Exhaustive search differs from greedy search in that, when a word is labeled during the search, not a single labeling result but a set of labeling results (i.e. a set of states) is kept. Let the state set of a sentence before its i-th word is labeled be denoted S_i; the state set before the first word is labeled is then S_1, which contains only one state, s_1^1. The candidate label types for the first word are defined by step 1-2; suppose that when each state in S_1 is extended by labeling the current word, k label types may be selected. After the states have been extended with all k labels, the resulting state set S_2 contains k states, denoted s_1^2, …, s_k^2 (ordered by score from high to low). Likewise, when the second word is labeled, every state in S_2 is extended in k ways, and the new state set contains k^2 states, denoted s_1^3, …, s_{k^2}^3. Continuing in this way, the extension on the n-th word yields the state set S_{n+1} of complete labelings of the whole sentence. If every extension operation (i.e. which label was applied) is remembered in the new state it produces, one can trace back from each state in S_{n+1} and recover a complete label sequence of the sentence; the sequence recovered from the highest-scoring state in S_{n+1} is the labeling result of the method for that sentence. With this search strategy the size of the state set grows very rapidly, which is infeasible in practice, so the invention uses beam search to prune the state set after each extension. Beam search differs from exhaustive search in that, after all states of the previous state set S_{t-1} have been extended, no matter how many states the new set contains, only the m highest-scoring states are kept (m is chosen by the user for the specific task; in general a larger m gives higher labeling accuracy at a higher cost; in embodiment 1, m = 4). This guarantees that the state set obtained after the extension operation for each word never exceeds m states. As in exhaustive search, the highest-scoring state of S_{n+1} is traced back, and the label sequence recovered from it is the labeling result for the sentence. This beam search is the strategy used in the present invention.
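A minimal Python sketch of this beam search follows; it is my own illustration, in which score_labels stands in for the neural scorer of step 2-1, allowed for the constraints of step 1-2, and the default m = 4 matches embodiment 1:

    # My own beam-search illustration: score_labels stands in for the neural scorer
    # of step 2-1 and allowed for the label constraints of step 1-2.
    def beam_search(words, score_labels, allowed, labels, m=4):
        beam = [(0.0, [])]                            # a state: (accumulated score, tags so far)
        for t in range(len(words)):
            expanded = []
            for total, tags in beam:
                prev = tags[-1] if tags else None
                scores = score_labels(words, tags, t) # one real-valued score per label
                for label, s in zip(labels, scores):
                    if allowed(prev, label):
                        expanded.append((total + s, tags + [label]))
            beam = sorted(expanded, key=lambda x: x[0], reverse=True)[:m]   # keep top m
        return beam[0][1]                             # tag sequence of the best final state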
The length of the sentence to be analyzed is denoted by n throughout step 2, and step 2 comprises the following steps:
Step 2-1: in a given state (a state records which words of the current sentence have been labeled and with which label types, and which words are still unlabeled), all label types are scored when the t-th word is processed. The given state is one in which the first t-1 words of the sentence to be analyzed have been labeled and their label types are known, words t through n are unlabeled, and the t-th word is the next word to be processed;

Step 2-2: given a state set S_t, for each state in the set, all label types are scored as in step 2-1 when the t-th word is processed. Scoring is carried out by computation: each label type is assigned a real value, called the score of that type. The candidate label types are then generated as in step 1-2, the word is labeled with each candidate label type so as to extend the state, and the m highest-scoring new states are selected according to the beam search, giving the new state set S_{t+1};

Step 2-3: steps 2-1 and 2-2 are executed for t = 1, 2, …, n, yielding the final target state set S_{n+1}. The highest-scoring state is extracted and traced back, giving the highest-scoring label sequence; at this point the type labeling of all words is complete, and the highest-scoring label sequence is converted back into the corresponding chunk analysis, which is the analysis result for the current sentence.

The state-transition operation for each word in the invention is the category-labeling operation applied to the word read in, given a current sentence state. When the t-th word is labeled, one state of the previous state set S_t is given; the set of label types that may be applied is defined by step 1-2, and the scoring of every label in that set is carried out by a forward neural network. Scoring the label types applicable to the current word in a given state with the neural network involves two steps: first, the feature information, i.e. the input of the neural network, is generated; second, all candidate categories are scored with the network. Step 2-1 specifically comprises the following steps:
step 2-1-1, generating a feature vector, wherein the feature vector comprises a basic information feature vector and an additional information feature vector;
and 2-1-2, calculating the feature vector input generated in the step 2-1-1 by using a forward neural network to obtain the scores of all candidate labeling types.
It is first noted that in information processing a feature can be represented mainly in two ways: the one-hot representation and the distributed representation. A one-hot representation uses a very long vector to represent a feature; the length of the vector equals the size of the feature dictionary formed by all features, the component at the position of the feature in the dictionary is 1, and all other components are 0. A distributed representation assigns each feature a real-valued vector, whose dimensionality is set according to the needs of the task. Both representations are widely used in the field and should be well known to those skilled in the art, so they are not described further here. The representation adopted by the invention is the distributed representation: each feature is assigned a real-valued vector of a fixed dimensionality, which in embodiment 1 is 50. Generating this part of the input involves two steps, the generation of the basic information features and the generation of the additional information features. Throughout step 2-1-1 the words of the sentence to be analyzed are denoted, from left to right, w_1, w_2, …, w_n, where w_n is the n-th word and n is a natural number; the parts of speech of the words are denoted, from left to right, p_1, p_2, …, p_n, where p_n is the part of speech of the n-th word; and the feature vector corresponding to a feature x is written e(x). Step 2-1-1 comprises the following steps:

Step 2-1-1-1: the basic information feature vectors are generated. They comprise the feature vectors of the words and parts of speech inside a window around the position of the current word to be labeled, and the feature vectors of the categories of the already-labeled words inside such a window. The specific composition is as follows. The word feature vectors among the basic features are: e(w_-2) and e(w_-1), the vectors of the second and first words to the left of the current word to be processed; e(w_0), the vector of the current word to be processed; and e(w_1) and e(w_2), the vectors of the first and second words to its right;

The part-of-speech feature vectors are: e(p_-2), e(p_-1), e(p_0), e(p_1), e(p_2), the vectors of the parts of speech of the second and first words to the left of the current word to be processed, of the current word itself, and of the first and second words to its right; and e(p_-2 p_-1), e(p_-1 p_0), e(p_0 p_1), e(p_1 p_2), the vectors of the part-of-speech bigrams formed by adjacent positions in the same window;

In the chunking task, the basic features used to score the label types at each step generally comprise the words and parts of speech inside a window around the position of the current word to be labeled, and the categories of the already-labeled words inside such a window. Conventionally the current word is written w_0, the i-th word to its left w_-i and the i-th word to its right w_i; the part of speech of the current word is written p_0, of the i-th word to the left p_-i and of the i-th word to the right p_i. The category features of the labeled words differ from the former two: since all words and parts of speech of the whole sentence are known from the start of the analysis, their windows extend to both sides of the current word, but because labeling proceeds from left to right, only the label types of the words to the left of the current word are known when it is labeled, so the category window extends only to the left, and the label type of the i-th word to the left of the current word is written t_-i. The choice of i depends on the chosen window size; in embodiment 1, i = 2 (i.e. the window size is 5), and the corresponding basic features are shown in Tables 3, 4 and 5:
TABLE 3  Basic word features: w_-2, w_-1, w_0, w_1, w_2

TABLE 4  Basic part-of-speech features: p_-2, p_-1, p_0, p_1, p_2, p_-2 p_-1, p_-1 p_0, p_0 p_1, p_1 p_2

TABLE 5  Category features of the labeled words: t_-2, t_-1
It should be noted that the word- and part-of-speech-based features above are well known to those skilled in the art and widely used, so they are not described further here; see in particular the following reference: Chen W., Zhang Y., Isahara H. An empirical study of Chinese chunking. In: Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions. Association for Computational Linguistics, 2006: 97-104.
The category features of the labeled words have the same meaning as in conventional models such as hidden Markov models and conditional random fields, but they are used differently: in the invention this feature is handled in the same way as the word and part-of-speech features, whereas conventional models handle it with dynamic programming. In those models an increase of i brings a rapid increase in time cost, while in the state-transition-based approach of the invention the extra time cost of increasing i is small, which is a speed advantage of the state-transition framework when this feature is incorporated;
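As an illustration (my own, not the patent's code), the basic atomic features of Tables 3 to 5 with window size i = 2 could be collected as follows; the identifiers are hypothetical and the vector lookup e(.) is left out:

    # My own illustration of the basic atomic features of Tables 3-5 with window i = 2;
    # only the (name, offset, value) triples are built, the e(.) lookup is omitted.
    def basic_features(words, pos, tags, t, pad="<PAD>"):
        def w(j):                                     # word at offset j, padded at borders
            return words[t + j] if 0 <= t + j < len(words) else pad
        def p(j):                                     # part of speech at offset j
            return pos[t + j] if 0 <= t + j < len(pos) else pad
        feats  = [("w", j, w(j)) for j in (-2, -1, 0, 1, 2)]                 # Table 3
        feats += [("p", j, p(j)) for j in (-2, -1, 0, 1, 2)]                 # Table 4, unigrams
        feats += [("pp", j, p(j) + "|" + p(j + 1)) for j in (-2, -1, 0, 1)]  # Table 4, bigrams
        # Table 5: categories of already labeled words, available only to the left
        feats += [("t", j, tags[t + j] if t + j >= 0 else pad) for j in (-2, -1)]
        return feats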
Step 2-1-1-2: the additional information feature vectors are generated. They comprise the word and part-of-speech feature vectors related to the already-labeled chunks inside a window around the position of the current word to be labeled, and the word and part-of-speech feature vectors of the current position to be labeled computed with the bidirectional long short-term memory (LSTM) neural network model.
Step 2-1-1-2 comprises the following steps:
Step 2-1-1-2-1: the second and first chunks to the left of the current word to be processed are denoted c_-2 and c_-1 respectively. For a chunk c_i (i = -2, -1), its first word is written start_word(c_i), its last word end_word(c_i) and its syntactic head word head_word(c_i); the part of speech of its first word is written start_POS(c_i), of its last word end_POS(c_i) and of its head word head_POS(c_i). The word and part-of-speech feature vectors related to the labeled chunks inside the window around the current position are generated as follows. The chunk-level word feature vectors are: e(start_word(c_-2)), e(end_word(c_-2)), e(head_word(c_-2)), the vectors of the first word, last word and syntactic head word of the second chunk to the left of the current word to be processed, and e(start_word(c_-1)), e(end_word(c_-1)), e(head_word(c_-1)), the corresponding vectors for the first chunk to the left;

The chunk-level part-of-speech feature vectors are: e(start_POS(c_-2)), e(end_POS(c_-2)), e(head_POS(c_-2)), the vectors of the parts of speech of the first word, last word and syntactic head word of the second chunk to the left, and e(start_POS(c_-1)), e(end_POS(c_-1)), e(head_POS(c_-1)), the corresponding vectors for the first chunk to the left. The choice of i depends on the chosen window size; in embodiment 1, i = 2, and the corresponding chunk-level features are shown in Table 6:

TABLE 6  Chunk-level word and part-of-speech features: start_word(c_-2), end_word(c_-2), head_word(c_-2), start_word(c_-1), end_word(c_-1), head_word(c_-1); start_POS(c_-2), end_POS(c_-2), head_POS(c_-2), start_POS(c_-1), end_POS(c_-1), head_POS(c_-1)

It should be noted that the chunk-level features above are not used as in the present invention under conventional models such as conditional random fields, because those models are restricted by the Markov assumption; there they can only be used, after pruning, inside a complex dynamic-programming algorithm. See in particular: Zhou J., Qu W., Zhang F. Exploiting chunk-level features to improve phrase chunking. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012: 557-567.
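A minimal sketch (my own) of collecting the chunk-level features of Table 6 follows; the helper head_of, which picks the syntactic head of a chunk, is hypothetical:

    # My own illustration of the chunk-level features of Table 6. `chunks` holds the
    # chunks already recognized to the left of the current word as (type, start, end)
    # triples; head_of, which returns a chunk's syntactic head index, is hypothetical.
    def chunk_level_features(chunks, words, pos, head_of, pad="<PAD>"):
        feats = []
        for k in (-2, -1):                            # c_-2 and c_-1
            if len(chunks) + k < 0:                   # fewer than |k| chunks to the left
                feats += [pad] * 6
                continue
            ctype, start, end = chunks[k]
            head = head_of(ctype, start, end)
            feats += [words[start], words[end], words[head],   # start/end/head words
                      pos[start], pos[end], pos[head]]         # and their parts of speech
        return feats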
Step 2-1-1-2-2: the word and part-of-speech information feature vectors of the current position to be labeled are computed with the bidirectional long short-term memory (LSTM) neural network model. The input of the model is all words of the sentence to be analyzed together with their parts of speech; the output consists of forward word feature vectors, forward part-of-speech feature vectors, backward word feature vectors and backward part-of-speech feature vectors. First note that tanh in the formulas below is the hyperbolic tangent, a real-valued function; applied to a vector it acts on every element, yielding a target vector of the same dimensionality as the input. σ is the sigmoid function, also real-valued and applied element-wise in the same way. ⊙ is the point-wise product: two vectors of the same dimensionality are multiplied element by element, giving a result vector of that dimensionality. The four kinds of feature vectors are computed as follows:
the forward word feature vector is sequentially represented as hf(w1),hf(w2),…,hf(wn),hf(wt) (t ═ 1, …, n) represents the t-th forward word feature vector, which is calculated as follows:
f t w f = σ ( W f h w f h f ( w t - 1 ) + W f x w f e ( w t ) + W f c w f c t - 1 w f + b f w f ) ,
i t w f = σ ( W i h w f h f ( w t - 1 ) + W i x w f e ( w t ) + W i c w f c t - 1 w f + b i w f ) ,
o t w f = σ ( W o h w f h f ( w t - 1 ) + W o x w f e ( w t ) + W o c w f c t w f + b o w f ) ,
wherein, the method is a well-trained model parameter matrix (the training process is completed in a mode in an additional description of a model parameter training method in a specification), the value of each element in the matrix is a real numerical value, the group of parameters is irrelevant to t, namely all calculation units in a calculation sequence share the same group of parameters;
the intermediate calculation results in the t-th calculation unit are all real value vectors;
e(wt)、hf(wt-1)、is the input of the t-th computing unit, which is a real-valued vector, e (w) of whicht) I.e. the word wtA corresponding feature vector; h isf(wt)、Is the output of the t-th computing unit,only h is finally used as the characteristic vector of the forward word for the auxiliary calculation result of the long and short memory neural network modelf(wt-1) Since this is a serialized computational model, the output h of the t-1 st computational unitf(wt-1)、The input is the input of the t calculating unit;
etc. are all matrix multiplication operations.
The forward part-of-speech feature vectors are denoted, in order, h_f(p_1), h_f(p_2), …, h_f(p_n), where h_f(p_t) (t = 1, …, n) is the t-th forward part-of-speech feature vector, computed as follows:

f_t^{pf} = σ( W_{fh}^{pf} h_f(p_{t-1}) + W_{fx}^{pf} e(p_t) + W_{fc}^{pf} c_{t-1}^{pf} + b_f^{pf} ),

i_t^{pf} = σ( W_{ih}^{pf} h_f(p_{t-1}) + W_{ix}^{pf} e(p_t) + W_{ic}^{pf} c_{t-1}^{pf} + b_i^{pf} ),

c_t^{pf} = f_t^{pf} ⊙ c_{t-1}^{pf} + i_t^{pf} ⊙ tanh( W_{ch}^{pf} h_f(p_{t-1}) + W_{cx}^{pf} e(p_t) + b_c^{pf} ),

o_t^{pf} = σ( W_{oh}^{pf} h_f(p_{t-1}) + W_{ox}^{pf} e(p_t) + W_{oc}^{pf} c_t^{pf} + b_o^{pf} ),

h_f(p_t) = o_t^{pf} ⊙ tanh( c_t^{pf} ),

where the W^{pf} matrices and b^{pf} vectors are trained model parameters (trained as described in the additional description of the model-parameter training method in the specification); every element of them is a real value, and this group of parameters does not depend on t, i.e. all computation units of one computation sequence share the same parameters; f_t^{pf}, i_t^{pf}, o_t^{pf} are intermediate results of the t-th computation unit and are real-valued vectors; e(p_t), h_f(p_{t-1}) and c_{t-1}^{pf} are the inputs of the t-th computation unit and are real-valued vectors, e(p_t) being the feature vector of the part of speech p_t; h_f(p_t) and c_t^{pf} are the outputs of the t-th unit, but c_t^{pf} is only an auxiliary result, and only h_f(p_t) is finally used as the forward part-of-speech feature vector; the outputs h_f(p_{t-1}) and c_{t-1}^{pf} of the (t-1)-th unit are the inputs of the t-th unit; products such as W_{fh}^{pf} h_f(p_{t-1}) are matrix-vector multiplications.
The backward word feature vectors are denoted, in order, h_b(w_1), h_b(w_2), …, h_b(w_n), where h_b(w_t) (t = 1, …, n) is the t-th backward word feature vector, computed as follows:

f_t^{wb} = σ( W_{fh}^{wb} h_b(w_{t+1}) + W_{fx}^{wb} e(w_t) + W_{fc}^{wb} c_{t+1}^{wb} + b_f^{wb} ),

i_t^{wb} = σ( W_{ih}^{wb} h_b(w_{t+1}) + W_{ix}^{wb} e(w_t) + W_{ic}^{wb} c_{t+1}^{wb} + b_i^{wb} ),

c_t^{wb} = f_t^{wb} ⊙ c_{t+1}^{wb} + i_t^{wb} ⊙ tanh( W_{ch}^{wb} h_b(w_{t+1}) + W_{cx}^{wb} e(w_t) + b_c^{wb} ),

o_t^{wb} = σ( W_{oh}^{wb} h_b(w_{t+1}) + W_{ox}^{wb} e(w_t) + W_{oc}^{wb} c_t^{wb} + b_o^{wb} ),

h_b(w_t) = o_t^{wb} ⊙ tanh( c_t^{wb} ),

where the W^{wb} matrices and b^{wb} vectors are trained model parameters (trained as described in the additional description of the model-parameter training method in the specification); every element of them is a real value, and this group of parameters does not depend on t, i.e. all computation units of one computation sequence share the same parameters; f_t^{wb}, i_t^{wb}, o_t^{wb} are intermediate results of the t-th computation unit and are real-valued vectors; e(w_t), h_b(w_{t+1}) and c_{t+1}^{wb} are the inputs of the t-th computation unit and are real-valued vectors, e(w_t) being the feature vector of the word w_t; h_b(w_t) and c_t^{wb} are the outputs of the t-th unit, but c_t^{wb} is only an auxiliary result, and only h_b(w_t) is finally used as the backward word feature vector; since this sequence is computed from right to left, the outputs h_b(w_{t+1}) and c_{t+1}^{wb} of the (t+1)-th unit are the inputs of the t-th unit; products such as W_{fh}^{wb} h_b(w_{t+1}) are matrix-vector multiplications.
The backward part-of-speech feature vectors are denoted, in order, h_b(p_1), h_b(p_2), …, h_b(p_n), where h_b(p_t) (t = 1, …, n) is the t-th backward part-of-speech feature vector, computed as follows:

f_t^{pb} = σ( W_{fh}^{pb} h_b(p_{t+1}) + W_{fx}^{pb} e(p_t) + W_{fc}^{pb} c_{t+1}^{pb} + b_f^{pb} ),

i_t^{pb} = σ( W_{ih}^{pb} h_b(p_{t+1}) + W_{ix}^{pb} e(p_t) + W_{ic}^{pb} c_{t+1}^{pb} + b_i^{pb} ),

c_t^{pb} = f_t^{pb} ⊙ c_{t+1}^{pb} + i_t^{pb} ⊙ tanh( W_{ch}^{pb} h_b(p_{t+1}) + W_{cx}^{pb} e(p_t) + b_c^{pb} ),

o_t^{pb} = σ( W_{oh}^{pb} h_b(p_{t+1}) + W_{ox}^{pb} e(p_t) + W_{oc}^{pb} c_t^{pb} + b_o^{pb} ),

h_b(p_t) = o_t^{pb} ⊙ tanh( c_t^{pb} ),

where the W^{pb} matrices and b^{pb} vectors are trained model parameters (trained as described in the additional description of the model-parameter training method in the specification); every element of them is a real value, and this group of parameters does not depend on t, i.e. all computation units of one computation sequence share the same parameters; f_t^{pb}, i_t^{pb}, o_t^{pb} are intermediate results of the t-th computation unit and are real-valued vectors; e(p_t), h_b(p_{t+1}) and c_{t+1}^{pb} are the inputs of the t-th computation unit and are real-valued vectors, e(p_t) being the feature vector of the part of speech p_t; h_b(p_t) and c_t^{pb} are the outputs of the t-th unit, but c_t^{pb} is only an auxiliary result, and only h_b(p_t) is finally used as the backward part-of-speech feature vector; since this sequence is computed from right to left, the outputs h_b(p_{t+1}) and c_{t+1}^{pb} of the (t+1)-th unit are the inputs of the t-th unit; products such as W_{fh}^{pb} h_b(p_{t+1}) are matrix-vector multiplications.
In order to make full use of pattern information from word strings and part-of-speech strings farther away from the current word to be labeled, the invention computes the word and part-of-speech information features of the current position with a bidirectional long short-term memory model. The computation has a forward and a backward pass: the forward pass runs from left to right and the backward pass from right to left, and the two are computed in the same way, so only the forward pass is explained in detail here. Let the sentence length be n; the words of the sentence are denoted, from left to right, w_1, w_2, …, w_n, with feature vectors e(w_1), e(w_2), …, e(w_n), and the parts of speech are denoted p_1, p_2, …, p_n, with feature vectors e(p_1), e(p_2), …, e(p_n). The computed forward word feature vectors are denoted h_f(w_1), h_f(w_2), …, h_f(w_n) and the forward part-of-speech feature vectors h_f(p_1), h_f(p_2), …, h_f(p_n). All of these are trained real-valued vectors whose dimensionalities are set by the user; in embodiment 1 the dimensionality of e(w_t) and e(p_t) is 50 and that of h_f(w_t) and h_f(p_t) is 25.
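For illustration, a minimal NumPy sketch (my own, following the gate equations above) of one forward pass; the parameter dictionary P and the dimensionalities are assumptions, and the backward pass and the part-of-speech sequences would use the same recurrence with their own parameter groups:

    # My own NumPy sketch of one direction of the LSTM, following the gate equations above.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_forward(embeddings, P, hidden=25):
        """embeddings: (n, d) array of e(w_1)..e(w_n); returns (n, hidden) array of h_f."""
        h = np.zeros(hidden)
        c = np.zeros(hidden)
        outputs = []
        for x in embeddings:                                                  # left to right
            f = sigmoid(P["Wfh"] @ h + P["Wfx"] @ x + P["Wfc"] @ c + P["bf"]) # forget gate
            i = sigmoid(P["Wih"] @ h + P["Wix"] @ x + P["Wic"] @ c + P["bi"]) # input gate
            c = f * c + i * np.tanh(P["Wch"] @ h + P["Wcx"] @ x + P["bc"])    # cell state c_t
            o = sigmoid(P["Woh"] @ h + P["Wox"] @ x + P["Woc"] @ c + P["bo"]) # output gate
            h = o * np.tanh(c)                                                # h_f(w_t)
            outputs.append(h)
        return np.stack(outputs)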
In step 2-1-2 the scores of all label types are computed with the forward neural network. After step 2-1-1, a real-valued vector is obtained by concatenating the vectors of all features generated in step 2-1-1; its dimensionality is the sum of the dimensionalities of all those feature vectors. This vector is the input of the forward neural network, whose computation proceeds according to the following formulas:

h = σ( W_1 x + b_1 ),

o = W_2 h,

where W_1, b_1 and W_2 are trained model parameters whose elements are real values; x is the input vector obtained by concatenating all feature vectors of step 2-1-1, its dimensionality is the sum of their dimensionalities, and every element of it is a real value; h is the hidden-layer vector of the network, an intermediate result whose dimensionality is fixed in advance (300 in embodiment 1); o is the output, a real-valued vector whose dimensionality equals the number of label types that may be selected when a word is labeled during the labeling process defined in step 1-2, and whose g-th component is the score for labeling the current step with type g; W_1 x and W_2 h are matrix-vector multiplications.
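A minimal NumPy sketch (my own) of this scoring computation, assuming the feature vectors of step 2-1-1 are already available:

    # My own sketch of the scoring network: the concatenation of all feature vectors
    # from step 2-1-1 is mapped to one real-valued score per candidate label type.
    import numpy as np

    def score_label_types(feature_vectors, W1, b1, W2):
        x = np.concatenate(feature_vectors)            # input x, dimension = sum of parts
        h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))       # hidden layer, h = sigma(W1 x + b1)
        return W2 @ h                                  # output o, one score per label type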
Step 2-2 comprises the following steps:
and 2-2-1, giving each state in the previous state set, and scoring all the label types according to the mode in the step 2-1. Assumed state SxScore of (S) is score (S)x) Type of labelkScore of score (type)k) If all the label types are expanded, K new target states are obtained after the expansion and are expressed asK is the total number of all the labeled types, and the corresponding score of the kth state is calculated according to the following formula
s c o r e ( S i k t + 1 ) = s c o r e ( S i t ) + s c o r e ( type k ) ,
Wherein K is 1-K, and all the scores are real numerical values. Determining candidate marking type according to the mode in step 1-2, and setting state according to the candidate marking typeExpansion is performed assuming a set of states StIf there are c (i) candidate label types determined by the state in step 1-2, c (i) new states are obtained after the state is expanded and are expressed as
Step 2-2-2, assume state set StHaving z states, where z is a natural number, assembling the states into a set StWherein all the states are expanded in the mode of the step 2-2-1, and all the expanded states are
Step 2-2-3, extracting m states with highest scores from all the expanded states obtained in the step 2-2-2 in a column search mode to form a new state set
Advantageous effects: compared with the widely used methods based on the Markov assumption, the state-transition-based method used by this Chinese chunk analysis method can incorporate chunk-level features more flexibly; the neural network model used to score the candidate transition types of each state automatically learns how to combine features; and the bidirectional long short-term memory neural network model introduces useful additional information features. The combination of the three improves the accuracy of Chinese chunk analysis.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a schematic diagram of a long-short memory neural network computing unit.
FIG. 2 is a schematic diagram of a network structure of a forward long-short memory neural network computation sequence.
Fig. 3 is a schematic diagram of a forward neural network structure.
Fig. 4 is a flow chart of the present invention.
Detailed Description
The invention provides a Chinese chunk analysis method based on state transition and a neural network. When each word of a sentence is labeled with a chunk type, the relevant information features are constructed from the information already available, a neural network is used to score all candidate categories, and the state-transition operation is then executed. In existing Chinese chunking techniques, the model assumptions prevent long-distance features from being used sufficiently, and complicated feature templates have to be designed by hand.
As shown in Fig. 4, the invention discloses a Chinese chunking analysis method based on state transition and a neural network, which can flexibly add chunk-level features, automatically learn how to combine features by using a neural network model, and introduce useful additional information features by using a bidirectional long short-term memory neural network model, thereby improving the accuracy of Chinese chunk analysis.
The complete Chinese chunk analysis process based on state transition and neural network comprises the following steps:
Step 1: a computer reads a Chinese text file containing the sentences to be analyzed; the Chinese chunk types are defined; the sentence to be analyzed is segmented into words and each word is tagged with its part of speech; and the label types that may be selected according to the current sentence state when a word is labeled are determined;
and 2, performing block analysis on each read sentence by using a state transition and neural network-based method.
The method for defining the Chinese chunk type and the annotation type comprises the following steps:
Step 1-1: the chunk types to be analyzed are defined. The chunk types are chosen by the user according to the specific goal; the traditional Chinese chunking task usually comes in two variants: one recognizes only noun phrases, the other recognizes the 12 chunk types defined on the basis of version 4.0 of the Chinese Penn Treebank (CTB);
Step 1-2: the label types that may be selected when each word is labeled during the labeling process are determined. Each word in the sentence is labeled with a combination of a chunk type and BIO or BIOES.
First, suppose the length of the sentence to be processed is n. A state of the sentence is defined that records which words of the current sentence have been labeled, the label type assigned to each labeled word, and which words are still unlabeled; the state set of the sentence before its i-th word is labeled is denoted S_i, and a state in it is denoted s_j^i. The beam size of the beam search is set to m. The analysis of the sentence comprises the following steps:
Step 3-1: in a given state, all label types are scored when the t-th word is processed;

Step 3-2: given a state set S_t, when the t-th word is processed each state in the set is labeled with every candidate label type so as to extend the state, and the m highest-scoring new states are selected by beam search, giving the new state set S_{t+1};

Step 3-3: steps 3-1 and 3-2 are executed iteratively for t = 1, 2, …, n, yielding the final target state set S_{n+1}; the highest-scoring state is extracted and traced back to obtain the label sequence of the whole sentence.
When the t-th word is processed, the invention is given one state of the previous state set S_t; the set of label types that may be applied is defined by step 1-2, and each label in that set is scored by a forward neural network. Scoring the label types applicable to the current word in a given state involves two steps: first, the feature information, i.e. the input of the neural network, is generated; second, all candidate categories are scored with the network. Step 3-1 specifically comprises the following steps:
step 3-1-1, generating a forward neural network input;
and 3-1-2, as shown in fig. 3, calculating the feature vector input generated in the step 3-1-1 by using a forward neural network to obtain the scores of all candidate label types.
The generation of the forward neural network input comprises two steps, namely the generation of the basic information features and the generation of the additional information features. Step 3-1-1 comprises the following steps:

Step 3-1-1-1: the basic information features are generated. They comprise the word and part-of-speech features inside a window around the position of the current word to be labeled, and the category features of the already-labeled words inside such a window. The word features are e(w_-2), e(w_-1), e(w_0), e(w_1), e(w_2), i.e. the feature vectors of the second and first words to the left of the current word to be processed, of the current word itself, and of the first and second words to its right. The part-of-speech features are e(p_-2), e(p_-1), e(p_0), e(p_1), e(p_2), e(p_-2 p_-1), e(p_-1 p_0), e(p_0 p_1), e(p_1 p_2), e(p_-2 p_-1 p_0), e(p_-1 p_0 p_1), e(p_0 p_1 p_2), i.e. the feature vectors of the parts of speech of the second and first words to the left, of the current word, and of the first and second words to the right, together with the part-of-speech bigrams and trigrams formed over the same window. All of these feature vectors are trained real-valued vectors.

Step 3-1-1-2: the additional information features are generated, in the following two steps:

Step 3-1-1-2-1: the word and part-of-speech features of the already-labeled chunks inside a window around the position of the current word to be labeled are generated. The chunk-level word features are e(start_word(c_-2)), e(end_word(c_-2)), e(head_word(c_-2)), e(start_word(c_-1)), e(end_word(c_-1)), e(head_word(c_-1)), i.e. the first word, last word and syntactic head word of the second chunk to the left of the current word to be processed and of the first chunk to its left. The chunk-level part-of-speech features are e(start_POS(c_-2)), e(end_POS(c_-2)), e(head_POS(c_-2)), e(start_POS(c_-1)), e(end_POS(c_-1)), e(head_POS(c_-1)), i.e. the parts of speech of the first word, last word and syntactic head word of those two chunks. All of these feature vectors are trained real-valued vectors;

Step 3-1-1-2-2: the word and part-of-speech information features of the current position to be labeled are computed with the bidirectional long short-term memory (LSTM) neural network model. The input of this step is all words of the sentence, denoted from left to right w_1, w_2, …, w_n, and the parts of speech of all words, denoted from left to right p_1, p_2, …, p_n. The output consists of the forward word feature vectors h_f(w_1), h_f(w_2), …, h_f(w_n); the forward part-of-speech feature vectors h_f(p_1), h_f(p_2), …, h_f(p_n); the backward word feature vectors h_b(w_1), h_b(w_2), …, h_b(w_n); and the backward part-of-speech feature vectors h_b(p_1), h_b(p_2), …, h_b(p_n). Since the backward pass differs from the forward pass only in its direction and the computation is otherwise identical, only the forward computation is described in detail here. Every h_f(x), where x is w_t or p_t (t = 1, 2, …, n), is computed in exactly the same way, only with different inputs and parameters (abbreviated h_f below), according to the following formulas:
f_t = \sigma(W_{fh} h_{t-1} + W_{fx} x_t + W_{fc} c_{t-1} + b_f),
i_t = \sigma(W_{ih} h_{t-1} + W_{ix} x_t + W_{ic} c_{t-1} + b_i),
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{ch} h_{t-1} + W_{cx} x_t + b_c),
o_t = \sigma(W_{oh} h_{t-1} + W_{ox} x_t + W_{oc} c_t + b_o),
h_t = o_t \odot \tanh(c_t),
where W_{fh}, W_{fx}, W_{fc}, b_f, W_{ih}, W_{ix}, W_{ic}, b_i, W_{ch}, W_{cx}, b_c, W_{oh}, W_{ox}, W_{oc}, b_o are trained model parameter matrices (training maximizes the likelihood of the correct label sequences in the training data set under the analysis method of the invention); the value of each element in these matrices is a real value. This group of parameters is independent of t, i.e., all calculation units in one calculation sequence share the same group of parameters; because the invention uses forward and backward calculation sequences for both words and parts of speech, there are 4 such groups of parameters in total. f_t, i_t, o_t are intermediate results in the t-th calculation unit and are all real-valued vectors. h_{t-1}, c_{t-1}, x_t are the inputs of the t-th calculation unit and are real-valued vectors, where x_t is e(w_t) or e(p_t). c_t and h_t are the outputs of the t-th calculation unit, but c_t is only an auxiliary result of the long short-term memory model; only h_t is finally used as the word or part-of-speech feature vector, i.e., h_t is the target feature vector h_f(w_t) or h_f(p_t). Note that, since this is a sequential computation model, the outputs h_{t-1} and c_{t-1} of the (t-1)-th calculation unit are exactly the inputs of the t-th calculation unit. tanh is the hyperbolic tangent, a real-valued function; applied to a vector it means that the operation is applied to every element, giving a target vector of the same dimension as the input vector. \sigma is the sigmoid function, likewise a real-valued element-wise function. \odot is the dot (element-wise) product, i.e., two vectors of the same dimension are multiplied bit by bit to give a result vector of the same dimension. W_{fh} h_{t-1}, W_{fx} x_t and so on are matrix multiplication operations.
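To make the computation concrete, here is a minimal numpy sketch of one forward long short-term memory pass implementing the formulas above. The parameter shapes, the 50-dimensional inputs, the hidden size of 25 and the random initialization are illustrative assumptions (trained parameters would be used in practice); the backward pass is identical except that the sequence is traversed from right to left with its own parameter group.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, p):
    """Forward (left-to-right) LSTM pass of step 3-1-1-2-2.
    xs is the list of input vectors e(w_t) or e(p_t); the function returns the
    feature vectors h_f(x_1), ..., h_f(x_n) computed by the formulas above."""
    hidden = p["bf"].shape[0]
    h = np.zeros(hidden)                                                   # h_0
    c = np.zeros(hidden)                                                   # c_0
    hs = []
    for x in xs:                                                           # t-th calculation unit
        f = sigmoid(p["Wfh"] @ h + p["Wfx"] @ x + p["Wfc"] @ c + p["bf"])  # forget gate f_t
        i = sigmoid(p["Wih"] @ h + p["Wix"] @ x + p["Wic"] @ c + p["bi"])  # input gate i_t
        c = f * c + i * np.tanh(p["Wch"] @ h + p["Wcx"] @ x + p["bc"])     # cell state c_t
        o = sigmoid(p["Woh"] @ h + p["Wox"] @ x + p["Woc"] @ c + p["bo"])  # output gate o_t
        h = o * np.tanh(c)                                                 # feature vector h_t
        hs.append(h)
    return hs

# Toy usage with random (untrained) parameters: 50-dimensional inputs, 25-dimensional outputs.
rng = np.random.default_rng(0)
dim, hidden = 50, 25
p = {name: rng.uniform(-0.1, 0.1, (hidden, dim if name.endswith("x") else hidden))
     for name in ["Wfh", "Wfx", "Wfc", "Wih", "Wix", "Wic", "Wch", "Wcx", "Woh", "Wox", "Woc"]}
p.update({name: np.zeros(hidden) for name in ["bf", "bi", "bc", "bo"]})
hs = lstm_forward([rng.uniform(-0.1, 0.1, dim) for _ in range(7)], p)
print(len(hs), hs[0].shape)   # 7 feature vectors of dimension 25
```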
Step 3-1-2: feeding the feature vectors generated in step 3-1-1 into a forward neural network to obtain the scores of all label types. After step 3-1-1 is finished, a real-valued vector is obtained by concatenating the vectors corresponding to all features of step 3-1-1; its dimension is the sum of the dimensions of all the feature vectors. This vector is used as the input of the forward neural network, whose computation proceeds according to the following formulas:
h = \sigma(W_1 x + b),
o = W_2 h,
where W_1, b, W_2 are trained model parameter matrices, each element of which is a real value; x is the input vector, each element of which is a real value; o is the computed output, a real-valued vector whose dimension equals the number of label types that can be selected when labeling each word in the labeling process defined in step 1-2, its i-th value being the score for labeling the current step as category i; W_1 x and W_2 h are matrix multiplication operations.
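The following small sketch shows this scoring network; the hidden-layer size of 300 is an assumption made only for the example, while the input dimension of 1400 and the 27 label types follow embodiment 1 below.

```python
import numpy as np

def score_labels(x, W1, b1, W2):
    """Feed-forward scoring network of step 3-1-2: x is the concatenation of all
    feature vectors from step 3-1-1; the output has one real-valued score per label type."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sigmoid(W1 @ x + b1)   # hidden layer, h = sigma(W1 x + b)
    return W2 @ h              # o = W2 h

# Toy usage: a 1400-dimensional input scored over 27 label types.
rng = np.random.default_rng(0)
x  = rng.uniform(-0.1, 0.1, 1400)
W1 = rng.uniform(-0.1, 0.1, (300, 1400))   # hidden size 300 is an assumption
b1 = np.zeros(300)
W2 = rng.uniform(-0.1, 0.1, (27, 300))
print(score_labels(x, W1, b1, W2).shape)   # (27,)
```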
Step 3-2: given a state set S^t when processing the t-th word, label the word according to each candidate label type for every state in the set so as to expand the states, then select the m new states with the highest scores in a beam-search manner, obtaining a new state set S^{t+1}. The step proceeds as follows:
Step 3-2-1: for each state in the previous state set, score all label types in the manner of step 3-1. Assume state S_x has score score(S_x) and label type_k has score score(type_k). If all types are expanded, K new target states are obtained (K is the total number of label types), denoted S_{i_1}^{t+1}, …, S_{i_K}^{t+1}; the corresponding scores are calculated according to the following formula:
score(S_{i_k}^{t+1}) = score(S_i^t) + score(type_k),
where all scores are real values. The candidate label types are then determined according to the constraint rules of step 1-2, and the state is expanded according to those label types. Suppose that for a state S_i^t in the state set S^t there are c(i) candidate label types determined by the constraint rules of step 1-2; then expanding state S_i^t yields c(i) new states, denoted S_{i_1}^{t+1}, …, S_{i_{c(i)}}^{t+1}.
Step 3-2-2: expand all states in the set S^t (assume there are m states) in the manner of step 3-2-1, obtaining all the expanded states.
Step 3-2-3: take the m highest-scoring states among all the states obtained in step 3-2-2 to form the new state set S^{t+1}.
Step 3-3: execute steps 3-1 and 3-2 for t = 1, 2, …, n to obtain the final target state set S^{n+1}, take out the state with the highest score, and backtrack from it to obtain the labeling sequence of the whole sentence, from which the chunk analysis result of the sentence is obtained.
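The beam search of steps 3-2 and 3-3 can be sketched as follows. Here `score_fn` and `candidates_fn` stand in for the neural scoring of step 3-1 and the constraint rules of step 1-2, the state representation `(score, labels)` and the dummy functions are assumptions for the sketch, and the beam size 4 follows embodiment 1.

```python
import heapq

def beam_chunk_labeling(words, score_fn, candidates_fn, beam_size=4):
    """Sketch of the beam search of steps 3-2 and 3-3.
    A state is (score, labels_so_far); score_fn(state, t) returns a dict
    {label_type: score} and candidates_fn(state, t) filters the allowed types."""
    states = [(0.0, [])]                          # initial state set S^1
    for t in range(len(words)):                   # process the t-th word
        expanded = []
        for score, labels in states:              # expand every state in S^t
            type_scores = score_fn((score, labels), t)
            for label in candidates_fn((score, labels), t):
                expanded.append((score + type_scores[label], labels + [label]))
        # keep the beam_size highest-scoring new states -> S^{t+1}
        states = heapq.nlargest(beam_size, expanded, key=lambda s: s[0])
    return max(states, key=lambda s: s[0])        # highest-scoring final state

# Toy usage with a dummy scorer and a dummy constraint that forbids I and E on the first word.
dummy_score = lambda state, t: {"B-NP": 0.5, "I": 0.2, "E": 0.1, "O": 1.0, "S-NP": 0.4}
dummy_cands = lambda state, t: ["B-NP", "O", "S-NP"] if not state[1] else ["B-NP", "I", "E", "O", "S-NP"]
print(beam_chunk_labeling(["Shanghai", "Pudong", "development"], dummy_score, dummy_cands))
```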
The model parameter training method used in the analysis process of the invention is additionally described as follows:
As can be seen from step 2 of the analysis process, the parameters used in the analysis process of the invention comprise the following components (hereinafter referred to as the model parameter set):
1. The feature vectors corresponding to the features, denoted e(*), where * stands for the basic word and part-of-speech features of step 2-1-1-1 and the chunk-level word and part-of-speech features of step 2-1-1-2-1; that is, all words and parts of speech appearing in the training corpus, together with the combinations of adjacent words and of adjacent parts of speech used as features, each correspond to one feature vector;
2. The neural network parameters used to compute the forward word sequence in step 2-1-1-2-2;
3. The neural network parameters used to compute the backward word sequence in step 2-1-1-2-2;
4. The neural network parameters used to compute the forward part-of-speech sequence in step 2-1-1-2-2;
5. The neural network parameters used to compute the backward part-of-speech sequence in step 2-1-1-2-2;
6. The forward neural network parameters W_1, W_2 used in step 2-1-2.
The training process is implemented iteratively by maximizing the likelihood of the correct label sequences in the training data set. Before training begins, the parameters in the model parameter set are initialized randomly; for example, in embodiments 1 and 2 the values are sampled from the uniform distribution between -0.1 and 0.1. The labeled data set (of size D), dataset = {sent_1, sent_2, …, sent_D}, is then used to train the parameters. First, a training objective, also called a loss function, is defined over the whole data set; it is a function of all parameters in the model parameter set and is denoted L(dataset). The loss for a single sentence sent_r is denoted loss(sent_r). The two are defined and computed as follows:
When the t-th word of a sentence is processed in the manner of step 2 of the analysis process, consider any state in the previous state set, denoted S_i^t as in step 2-2. From the process of step 2-1 it can be seen that the score score(type_k) obtained for the k-th label type in this state is in fact a composite function of all parameters in sets 2-5 (denoted Θ) and of the feature vectors from set 1 that are extracted in this state in steps 2-1-1-1 and 2-1-1-2-1. Denote all feature vectors extracted in state S_i^t when processing the t-th word collectively as E(S_i^t, t). Since the score of the whole sentence is to be expressed below, the score obtained for the k-th label type in state S_i^t when processing the t-th word is, for convenience, written score(S_i^t, t, type_k). Then:
score(S_i^t, t, type_k) = F(\Theta, E(S_i^t, t)),
where F is the composite function formed by composing the four long short-term memory neural networks and the forward neural network according to the process described in step 2-1, and Θ denotes all parameters in sets 2-5 of the model parameter set.
From step 2 it can also be seen that, after a sentence has been processed through steps 2-3, the resulting state set is S^{n+1}. The score of each state S_i^{n+1} in it is a composite function of all parameters in sets 2-5 of the model parameter set (denoted Θ) and of all the feature vectors from set 1 extracted in steps 2-1-1-1 and 2-1-1-2-1 while each word was processed along the whole path extending from the initial state to state S_i^{n+1}. For each state S_i^{n+1} in S^{n+1}, suppose that on this path the selected label-type sequence is type_{i_1}, type_{i_2}, …, type_{i_n} and the sequence of states passed through is S_{i_0}^1, S_{i_1}^2, …, S_{i_n}^{n+1} (where S_{i_0}^1 is the initial state and S_{i_n}^{n+1} is S_i^{n+1}). The score of state S_i^{n+1} is then:
score(S_i^{n+1}) = \sum_{j=1}^{n} score(S_{i_{j-1}}^{j}, j, type_{i_j}),
Since the training sentences are all labeled data, i.e., the correct labeling sequence is known, let S_{gold}^{n+1} denote the state in the state set S^{n+1} that corresponds to the correct labeling sequence. The loss function for the sentence is defined as:
loss(sent_r) = -\frac{e^{score(S_{gold}^{n+1})}}{\sum_{l=1}^{m} e^{score(S_l^{n+1})}},
where e^x denotes the exponential function and e is the base of the natural logarithm.
The loss function for the entire training data set is defined as:
L(dataset; \Theta, E) = \sum_{l=1}^{D} loss(sent_l),
where Θ and E indicate that the loss function is a function of the parameters in the model parameter set.
The objective of the whole training process is to minimize the above loss function. Various methods for minimizing it and obtaining the parameters are well known to practitioners in the art; in the embodiments, for example, stochastic gradient descent is used to optimize the loss function.
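The per-sentence loss can be sketched as follows, assuming the reconstruction given above, i.e., the negative of the softmax-normalized probability assigned to the gold-sequence state among the m final beam states; the function name and the max-subtraction for numerical stability are choices made only for this sketch.

```python
import numpy as np

def sentence_loss(beam_scores, gold_index):
    """loss(sent_r) for one sentence: minus the softmax-normalized probability of
    the state corresponding to the correct labeling sequence among the final beam states."""
    scores = np.asarray(beam_scores, dtype=float)
    exp = np.exp(scores - scores.max())          # subtracting the max leaves the ratio unchanged
    return -exp[gold_index] / exp.sum()

# Toy usage: four final beam states, the first of which is the gold sequence.
print(sentence_loss([24.6169, 20.2407, 19.7653, 19.6299], gold_index=0))
```

Summing this quantity over all sentences gives L(dataset), which is then minimized, for example by stochastic gradient descent as stated above.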
Example 1
First, the model parameters in this embodiment are trained, in the manner described above in the additional description of the model parameter training method, on 9978 sentences from 728 files of CTB (the Chinese Penn Treebank) version 4.0 (file numbers from chtb_001.fid to chtb_899.ptb; note that the numbers are not consecutive, which is why there are only 728 files).
This embodiment performs a complete Chinese chunk analysis of a sentence using the state transition and neural network-based Chinese chunk analysis method of the invention, as follows:
Step 1-1: define the Chinese chunk types; 12 types are defined on the basis of the Chinese Penn Treebank CTB 4.0: ADJP, ADVP, CLP, DNP, DP, DVP, LCP, LST, NP, PP, QP, VP, whose specific meanings are given in step 1-1 of the specification;
Step 1-2: determine the label types that may be selected when labeling each word during the labeling process, using the BIOES scheme. The finally determined label types are the following 27: B-ADJP, B-ADVP, B-CLP, B-DNP, B-DP, B-DVP, B-LCP, B-LST, B-NP, B-PP, B-QP, B-VP, S-ADJP, I, O, E, S-ADVP, S-CLP, S-DNP, S-DP, S-DVP, S-LCP, S-LST, S-NP, S-PP, S-QP, S-VP;
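A small sketch of how these 27 label types arise from the 12 chunk types under the BIOES scheme (B- and S- combined with each chunk type; I, O and E kept untyped, as listed above); the constant names are assumptions for the sketch.

```python
# Generate the 27 labeling types of step 1-2 from the 12 chunk types of step 1-1.
CHUNK_TYPES = ["ADJP", "ADVP", "CLP", "DNP", "DP", "DVP",
               "LCP", "LST", "NP", "PP", "QP", "VP"]
LABELS = (["B-" + t for t in CHUNK_TYPES]      # chunk-initial labels
          + ["S-" + t for t in CHUNK_TYPES]    # single-word chunk labels
          + ["I", "O", "E"])                   # inside, outside, end
print(len(LABELS))   # 27
```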
Step 2-1: the computer reads a natural language text file containing the sentence to be analyzed. For convenience of explanation, the sentence "Shanghai/NR Pudong/NR development/NN and/CC legal/NN construction/NN synchronization/VV" is read in;
Step 3: initially the state set is S^1, which contains one state S_1^1; this state is the initial sentence with no word labeled yet. The following steps are then executed;
Step 3-1: processing the 1st word "Shanghai", and executing the following steps:
step 3-1-1, generating the input of the forward network, and executing the following steps:
Step 3-1-1-1: generate the basic information features. Since this is the first word, there are no words to its left, so a padding word, assumed to be "word_start", and a padding part of speech, assumed to be "POS_start", are added on its left as usual. The corresponding word features are w_{-2} = "word_start", w_{-1} = "word_start", w_0 = "Shanghai", w_1 = "Pudong", w_2 = "development"; the part-of-speech features are p_{-2} = "POS_start", p_{-1} = "POS_start", p_0 = "NR", p_1 = "NR", p_2 = "NN", p_{-2}p_{-1} = "POS_start POS_start", p_{-1}p_0 = "POS_start NR", p_0p_1 = "NR NR", p_1p_2 = "NR NN", and so on. The vector representations corresponding to these features are then looked up; in this embodiment the dimensions of these feature vectors are all set to 50, and they are all real-valued vectors, e.g., the first 5 elements of e(w_0) are -0.0999, 0.0599, 0.0669, -0.0786 and 0.0527;
Step 3-1-1-2: generate the additional information features, as follows:
Step 3-1-1-2-1: generate the chunk-related word and part-of-speech feature vectors. Since no chunks have been analyzed before this word, padding items are likewise used: start_word(c_{-2}) = "start_chunk_word_NULL", end_word(c_{-2}) = "end_chunk_word_NULL", head_word(c_{-2}) = "head_chunk_word_NULL", start_word(c_{-1}) = "start_chunk_word_NULL", end_word(c_{-1}) = "end_chunk_word_NULL", head_word(c_{-1}) = "head_chunk_word_NULL", start_POS(c_{-2}) = "start_chunk_POS_NULL", end_POS(c_{-2}) = "end_chunk_POS_NULL", head_POS(c_{-2}) = "head_chunk_POS_NULL", start_POS(c_{-1}) = "start_chunk_POS_NULL", end_POS(c_{-1}) = "end_chunk_POS_NULL", head_POS(c_{-1}) = "head_chunk_POS_NULL". The vector representations corresponding to these features are then looked up; in this embodiment the dimensions of these feature vectors are all set to 50, and they are all real-valued vectors;
Step 3-1-1-2-2: as shown in Fig. 1 and Fig. 2, generate the word and part-of-speech information feature vectors of the current position to be labeled, computed with the bidirectional long short-term memory neural network model. For the word feature vectors the input is the vector representation of each word in the sentence, and for the part-of-speech feature vectors the input is the vector representation of each part of speech in the sentence; these vector representations are the same as those of the same word or part of speech in step 3-1-1-1, e.g., e(w_0) (w_0 = "Shanghai") still has the values -0.0999, 0.0599, 0.0669, -0.0786, 0.0527. The parameters of the long short-term memory models are all real values; for example, the first 5 values in the first row of the matrix W_{fh} used for computing the forward word vectors are 0.13637, 0.11527, -0.06217, -0.19870, 0.03157. The feature vectors h_f and h_b corresponding to each word and part of speech are then computed; they are all real-valued vectors, and in this embodiment the dimensions of h_f and h_b are both set to 25.
Step 3-1-2: concatenate all vectors obtained in step 3-1-1 into one real-valued vector, which in this example has 14 × 50 + 12 × 50 + 4 × 25 = 1400 dimensions, and then obtain the scores of all 27 label types: 0.7898 (B-ADJP), 0.4961 (B-ADVP), -0.1281 (B-CLP), -0.0817 (B-DNP), 0.5265 (B-DP), -0.0789 (B-DVP), 0.4362 (B-LCP), -0.2250 (B-LST), 2.9887 (B-NP), -0.0726 (B-PP), 0.1320 (B-QP), 0.4636 (B-VP), 1.6294 (E), 1.8871 (I), -0.3904 (O), 0.6985 (S-ADJP), -0.1703 (S-ADVP), -0.3287 (S-CLP), 0.1734 (S-DNP), 0.5694 (S-DP), 0.0990 (S-DVP), 0.0902 (S-LCP), -1.0364 (S-LST), 2.0767 (S-NP), -0.0179 (S-PP), -0.0606 (S-QP), 0.0941 (S-VP);
Step 3-2-1: the currently given state set is S^1, which contains only one state S_1^1 with score(S_1^1) = 0. According to constraint rule 1 of step 1-2 in the specification, the label types I and E obtained in step 3-1-2 are removed (score(I) = 1.8871 and score(E) = 1.6294). The state S_1^1 is then expanded according to each remaining label type and the score of the corresponding target state is calculated; since score(S_1^1) = 0, score(S_{1_k}^2) = score(type_k), for example the state expanded with B-NP has score 2.9887;
Step 3-2-2: every state in the set S^1 is expanded in the manner of step 3-2-1. Since S^1 contains only S_1^1, 27 - 2 = 25 new states are obtained;
Step 3-2-3: the 4 highest-scoring states among the 25 new states are selected to form the new state set S^2, which contains the following four states:
1. score 2.9887, representing "Shanghai/NR_B-NP Pudong/NR development/NN and/CC legal/NN construction/NN synchronization/VV";
2. score 2.0767, representing "Shanghai/NR_S-NP Pudong/NR development/NN and/CC legal/NN construction/NN synchronization/VV";
3. score 0.7898, representing "Shanghai/NR_S-ADJP Pudong/NR development/NN and/CC legal/NN construction/NN synchronization/VV";
4. score 0.6985, representing "Shanghai/NR_B-QP Pudong/NR development/NN and/CC legal/NN construction/NN synchronization/VV".
Step 3-3: the remaining words are processed in the manner of steps 3-1 and 3-2, yielding the final target state set S^8, which contains the following four states:
1. score 24.6169, representing "Shanghai/NR_B-NP Pudong/NR_E development/NN_B-NP and/CC_I legal/NN_I construction/NN_E synchronization/VV_S-VP";
2. score 20.2407, representing "Shanghai/NR_B-NP Pudong/NR_E development/NN_B-NP and/CC_I legal/NN_E construction/NN_S-VP synchronization/VV_S-VP";
3. score 19.7653, representing "Shanghai/NR_B-NP Pudong/NR_E development/NN_B-NP and/CC_I legal/NN_I construction/NN_E synchronization/VV_B-VP";
4. score 19.6299, representing "Shanghai/NR_B-NP Pudong/NR_E development/NN_B-NP and/CC_I legal/NN_I construction/NN_E synchronization/VV_O".
The state with the highest score is taken out and backtracking is performed, giving the labeling sequence of the whole sentence: "Shanghai/NR_B-NP Pudong/NR_E development/NN_B-NP and/CC_I legal/NN_I construction/NN_E synchronization/VV_S-VP".
the analysis result of the corresponding blocks is [ NP Shanghai Pudong ] [ NP development and legal construction ] [ VP synchronization ].
Example 2
The algorithm used by the invention is implemented in C++. The machine used in the experiments of this embodiment is an Intel(R) Core(TM) i7-5930K processor with a 3.50 GHz main frequency and 64 GB of memory. First, the model parameters in this embodiment are trained, in the manner described above in the additional description of the model parameter training method, on 9978 sentences from 728 files of CTB (the Chinese Penn Treebank) version 4.0 (file numbers from chtb_001.fid to chtb_899.ptb; note that the numbers are not consecutive, which is why there are only 728 files). The experimental test performs chunk analysis on 5290 sentences from 110 files (file numbers from chtb_900.fid to chtb_1078.ptb; note that the numbers are not consecutive, which is why there are only 110 files). The experimental results are shown in Table 7:
Table 7. Experimental results
Here, MBL (memory-based learning), TBL (transformation-based learning), CRF (conditional random field) and SVM (support vector machine) are four conventional machine learning algorithms commonly used for this task. It should be noted that evaluating on this data set is a common way of evaluating Chinese chunk analysis methods. It can be seen that the method of the invention achieves a higher F1-score on the data set, demonstrating its effectiveness.
The calculation of the F1-score is described here. Since the test set is labeled data, the correct labeling result is known. Suppose that, for the whole data set, the set of all correct chunks is S(gold), with size count(gold); after every sentence in the data set has undergone chunk analysis in the manner of embodiment 1, all chunks in the analysis results form the prediction set S(predict), with size count(predict); the set of chunks that appear in both S(gold) and S(predict) is S(correct), with size count(correct). Denoting the prediction precision by precision and the prediction recall by recall, the values are calculated as follows:
precision = \frac{count(correct)}{count(predict)},
recall = \frac{count(correct)}{count(gold)},
F1\text{-}score = \frac{2 \times precision \times recall}{precision + recall}.
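These three formulas can be sketched directly in code; representing each chunk as a (type, start index, end index) triple is an assumption made only for this sketch.

```python
def chunk_prf(gold_chunks, predicted_chunks):
    """Precision, recall and F1-score over chunk sets, as defined above.
    Chunks are compared as exact items (type plus span), so sets are used."""
    gold, pred = set(gold_chunks), set(predicted_chunks)
    correct = len(gold & pred)                     # count(correct)
    precision = correct / len(pred) if pred else 0.0
    recall    = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy usage: chunks represented as (type, start index, end index) triples.
gold = [("NP", 0, 1), ("NP", 2, 5), ("VP", 6, 6)]
pred = [("NP", 0, 1), ("NP", 2, 4), ("VP", 6, 6)]
print(chunk_prf(gold, pred))   # (0.666..., 0.666..., 0.666...)
```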

Claims (9)

1. A Chinese chunk analysis method based on state transition and a neural network is characterized by comprising the following steps:
step 1, a computer reads a Chinese text file containing a sentence to be analyzed, defines the Chinese chunk types, performs word segmentation on the sentence to be analyzed, labels the part of speech of each word, and determines the label types that can be selected according to the current sentence state when labeling each word in the labeling process;
and 2, performing Chinese chunk analysis on the sentence to be analyzed by using a state transition and neural network-based method.
2. The method of claim 1, wherein step 1 comprises the steps of:
step 1-1, defining Chinese chunk types according to 12 phrase types defined in table 1;
TABLE 1
Type    Meaning
ADJP    Adjective phrase
ADVP    Adverb phrase
CLP     Classifier phrase
DNP     Attributive phrase (formed with 的)
DP      Determiner phrase
DVP     Adverbial phrase (formed with 地)
LCP     Localizer (orientation) phrase
LST     List marker phrase
NP      Noun phrase
PP      Preposition phrase
QP      Quantifier phrase
VP      Verb phrase
And step 1-2, determining the label types that can be selected when labeling each word to be labeled in the labeling process, by combining the BIOES labeling scheme with the Chinese chunk types defined in step 1-1.
3. The method of claim 2, wherein in step 2 the Chinese chunk analysis is treated as a sequence labeling task, and the label types are generated by combining the Chinese chunk types defined in step 1-1 with the BIOES labeling scheme used in step 1-2.
4. A method according to claim 3, characterized in that the length of the sentence to be analyzed is denoted by n throughout step 2, step 2 comprising the steps of:
step 2-1, in a given state, scoring all the label types when processing the t-th word, wherein the given state means that the first t-1 words of the sentence to be analyzed have been labeled and their label types are known, the t-th to n-th words are unlabeled, and the t-th word is the next word to be processed;
step 2-2, given a state set S^t, when the t-th word is processed, scoring all the label types for each state in the state set in the manner of step 2-1, wherein after this scoring is completed each label type is assigned a real value called the score of that type; then generating the candidate label types in the manner of step 1-2, labeling the word according to each candidate label type so as to expand the state, and selecting the m new states with the highest scores in a beam-search manner, obtaining a new state set S^{t+1};
Step 2-3, for t = 1, 2, …, n, iteratively executing steps 2-1 and 2-2 to obtain the final target state set S^{n+1}; taking out the state with the highest score and backtracking from that state to obtain the highest-scoring labeling sequence, at which point all words have been labeled; and restoring this labeling sequence to the corresponding chunk analysis result, which is the analysis result of the current sentence.
5. The method of claim 4, wherein step 2-1 comprises the steps of:
step 2-1-1, generating a feature vector, wherein the feature vector comprises a basic information feature vector and an additional information feature vector;
and 2-1-2, calculating the feature vectors generated in the step 2-1-1 by using a forward neural network to obtain scores of all candidate labeling types.
6. The method according to claim 5, wherein throughout step 2-1-1 all words in the sentence to be analyzed are denoted from left to right as w_1, w_2, …, w_n, where w_n is the n-th word of the sentence to be analyzed and n is a natural number; the parts of speech corresponding to all words in the sentence to be analyzed are denoted from left to right as p_1, p_2, …, p_n, where p_n is the part of speech of the n-th word; the feature vector corresponding to a feature is denoted e(*); and step 2-1-1 comprises the following steps:
step 2-1-1-1, generating the basic information feature vectors, which comprise the feature vectors corresponding to the word and part-of-speech features within a certain window around the position of the current word to be labeled, and the feature vectors corresponding to the category features of the labeled words within that window; specifically, the word feature vectors in the basic information features comprise: the feature vector e(w_{-2}) of the second word counted leftwards from the current word to be processed, the feature vector e(w_{-1}) of the first word counted leftwards from the current word to be processed, the feature vector e(w_0) of the current word to be processed, the feature vector e(w_1) of the first word counted rightwards from the current word to be processed, and the feature vector e(w_2) of the second word counted rightwards from the current word to be processed;
The part-of-speech feature vectors comprise: the feature vector e(p_{-2}) of the part of speech of the second word counted leftwards from the current word to be processed, the feature vector e(p_{-1}) of the part of speech of the first word counted leftwards, the feature vector e(p_0) of the part of speech of the current word to be processed, the feature vector e(p_1) of the part of speech of the first word counted rightwards, the feature vector e(p_2) of the part of speech of the second word counted rightwards, the feature vector e(p_{-2}p_{-1}) of the part-of-speech combination of the second and first words counted leftwards, the feature vector e(p_{-1}p_0) of the part-of-speech combination of the first word counted leftwards and the current word, the feature vector e(p_0p_1) of the part-of-speech combination of the current word and the first word counted rightwards, and the feature vector e(p_1p_2) of the part-of-speech combination of the first and second words counted rightwards from the current word to be processed;
Step 2-1-1-2, generating an additional information feature vector: the additional information characteristic vector comprises a word characteristic vector and a part-of-speech characteristic vector which are related to the marked chunks in a certain window by taking the position of the current word to be marked as a reference, and the word characteristic vector and the part-of-speech characteristic vector of the current position to be marked, which are calculated by using a bidirectional long and short memory neural network model.
7. The method of claim 6, wherein step 2-1-1-2 comprises the steps of:
step 2-1-1-2-1, the second and first chunks counted leftwards from the current word to be processed are denoted c_{-2} and c_{-1} respectively; the first word of chunk c_i is denoted start_word(c_i), its last word end_word(c_i), and its grammatical head word head_word(c_i), with i = -2, -1; the part of speech of the first word of chunk c_i is denoted start_POS(c_i), that of the last word end_POS(c_i), and that of the grammatical head word head_POS(c_i); generating the word feature vectors and part-of-speech feature vectors related to the labeled chunks within a certain window around the position of the current word to be labeled:
The chunk-level word feature vectors comprise: the feature vector e(start_word(c_{-2})) of the first word of the second chunk counted leftwards from the current word to be processed, the feature vector e(end_word(c_{-2})) of the last word of that chunk, the feature vector e(head_word(c_{-2})) of its grammatical head word, the feature vector e(start_word(c_{-1})) of the first word of the first chunk counted leftwards from the current word to be processed, the feature vector e(end_word(c_{-1})) of the last word of that chunk, and the feature vector e(head_word(c_{-1})) of its grammatical head word;
The chunk-level part-of-speech feature vectors comprise: the feature vector e(start_POS(c_{-2})) of the part of speech of the first word of the second chunk counted leftwards from the current word to be processed, the feature vector e(end_POS(c_{-2})) of the part of speech of the last word of that chunk, the feature vector e(head_POS(c_{-2})) of the part of speech of its grammatical head word, the feature vector e(start_POS(c_{-1})) of the part of speech of the first word of the first chunk counted leftwards from the current word to be processed, the feature vector e(end_POS(c_{-1})) of the part of speech of the last word of that chunk, and the feature vector e(head_POS(c_{-1})) of the part of speech of its grammatical head word;
Step 2-1-1-2-2, computing, with the bidirectional long short-term memory neural network model, the word and part-of-speech information feature vectors of the current position to be labeled: the inputs of the bidirectional long short-term memory neural network model are all words of the sentence to be analyzed and the parts of speech corresponding to all those words; the outputs are a forward word feature vector, a forward part-of-speech feature vector, a backward word feature vector and a backward part-of-speech feature vector. In the following formulas, tanh is the hyperbolic tangent, a real-valued function; applied to a vector it means that the operation is applied to every element of the vector, giving a target vector of the same dimension as the input vector; \sigma is the sigmoid function, likewise a real-valued element-wise function; \odot is the dot (element-wise) product, i.e., two vectors of the same dimension are multiplied bit by bit to give a result vector of the same dimension. The four kinds of feature vectors are calculated as follows:
The forward word feature vectors are denoted in turn h_f(w_1), h_f(w_2), …, h_f(w_n), where h_f(w_t) is the t-th forward word feature vector, calculated according to the following formulas:
f_t^{wf} = \sigma(W_{fh}^{wf} h_f(w_{t-1}) + W_{fx}^{wf} e(w_t) + W_{fc}^{wf} c_{t-1}^{wf} + b_f^{wf}),
i_t^{wf} = \sigma(W_{ih}^{wf} h_f(w_{t-1}) + W_{ix}^{wf} e(w_t) + W_{ic}^{wf} c_{t-1}^{wf} + b_i^{wf}),
o_t^{wf} = \sigma(W_{oh}^{wf} h_f(w_{t-1}) + W_{ox}^{wf} e(w_t) + W_{oc}^{wf} c_t^{wf} + b_o^{wf}),
where W_{fh}^{wf}, W_{fx}^{wf}, W_{fc}^{wf}, b_f^{wf}, W_{ih}^{wf}, W_{ix}^{wf}, W_{ic}^{wf}, b_i^{wf}, W_{oh}^{wf}, W_{ox}^{wf}, W_{oc}^{wf}, b_o^{wf} are trained model parameter matrices, each element of which is a real value; this group of parameters is independent of t, i.e., all calculation units in one calculation sequence share the same group of parameters;
f_t^{wf}, i_t^{wf}, o_t^{wf} are intermediate results in the t-th calculation unit and are all real-valued vectors;
e(w_t), h_f(w_{t-1}) and c_{t-1}^{wf} are the inputs of the t-th calculation unit and are real-valued vectors, where e(w_t) is the feature vector corresponding to the word w_t; h_f(w_t) and c_t^{wf} are the outputs of the t-th calculation unit, but c_t^{wf} is only an auxiliary result of the long short-term memory model, and only h_f(w_t) is finally used as the forward word feature vector; since this is a sequential computation model, the outputs h_f(w_{t-1}) and c_{t-1}^{wf} of the (t-1)-th calculation unit are the inputs of the t-th calculation unit;
The forward part-of-speech feature vectors are denoted in turn h_f(p_1), h_f(p_2), …, h_f(p_n), where h_f(p_t) is the t-th forward part-of-speech feature vector, calculated according to the following formulas:
f_t^{pf} = \sigma(W_{fh}^{pf} h_f(p_{t-1}) + W_{fx}^{pf} e(p_t) + W_{fc}^{pf} c_{t-1}^{pf} + b_f^{pf}),
i_t^{pf} = \sigma(W_{ih}^{pf} h_f(p_{t-1}) + W_{ix}^{pf} e(p_t) + W_{ic}^{pf} c_{t-1}^{pf} + b_i^{pf}),
o_t^{pf} = \sigma(W_{oh}^{pf} h_f(p_{t-1}) + W_{ox}^{pf} e(p_t) + W_{oc}^{pf} c_t^{pf} + b_o^{pf}),
where W_{fh}^{pf}, W_{fx}^{pf}, W_{fc}^{pf}, b_f^{pf}, W_{ih}^{pf}, W_{ix}^{pf}, W_{ic}^{pf}, b_i^{pf}, W_{oh}^{pf}, W_{ox}^{pf}, W_{oc}^{pf}, b_o^{pf} are trained model parameter matrices, each element of which is a real value; this group of parameters is independent of t, i.e., all calculation units in one calculation sequence share the same group of parameters;
f_t^{pf}, i_t^{pf}, o_t^{pf} are intermediate results in the t-th calculation unit and are all real-valued vectors;
e(p_t), h_f(p_{t-1}) and c_{t-1}^{pf} are the inputs of the t-th calculation unit and are real-valued vectors, where e(p_t) is the feature vector corresponding to the part of speech p_t; h_f(p_t) and c_t^{pf} are the outputs of the t-th calculation unit, but c_t^{pf} is only an auxiliary result of the long short-term memory model, and only h_f(p_t) is finally used as the forward part-of-speech feature vector; since this is a sequential computation model, the outputs h_f(p_{t-1}) and c_{t-1}^{pf} of the (t-1)-th calculation unit are the inputs of the t-th calculation unit;
The backward word feature vectors are denoted in turn h_b(w_1), h_b(w_2), …, h_b(w_n), where h_b(w_t) is the t-th backward word feature vector, calculated according to the following formulas:
f_t^{wb} = \sigma(W_{fh}^{wb} h_b(w_{t+1}) + W_{fx}^{wb} e(w_t) + W_{fc}^{wb} c_{t+1}^{wb} + b_f^{wb}),
i_t^{wb} = \sigma(W_{ih}^{wb} h_b(w_{t+1}) + W_{ix}^{wb} e(w_t) + W_{ic}^{wb} c_{t+1}^{wb} + b_i^{wb}),
o_t^{wb} = \sigma(W_{oh}^{wb} h_b(w_{t+1}) + W_{ox}^{wb} e(w_t) + W_{oc}^{wb} c_t^{wb} + b_o^{wb}),
where W_{fh}^{wb}, W_{fx}^{wb}, W_{fc}^{wb}, b_f^{wb}, W_{ih}^{wb}, W_{ix}^{wb}, W_{ic}^{wb}, b_i^{wb}, W_{oh}^{wb}, W_{ox}^{wb}, W_{oc}^{wb}, b_o^{wb} are trained model parameter matrices, each element of which is a real value; this group of parameters is independent of t, i.e., all calculation units in one calculation sequence share the same group of parameters;
f_t^{wb}, i_t^{wb}, o_t^{wb} are intermediate results in the t-th calculation unit and are all real-valued vectors; e(w_t), h_b(w_{t+1}) and c_{t+1}^{wb} are the inputs of the t-th calculation unit and are real-valued vectors, where e(w_t) is the feature vector corresponding to the word w_t; h_b(w_t) and c_t^{wb} are the outputs of the t-th calculation unit, but c_t^{wb} is only an auxiliary result of the long short-term memory model, and only h_b(w_t) is finally used as the backward word feature vector; since this is a sequential computation model, the outputs h_b(w_{t+1}) and c_{t+1}^{wb} of the (t+1)-th calculation unit are the inputs of the t-th calculation unit;
The backward part-of-speech feature vectors are denoted in turn h_b(p_1), h_b(p_2), …, h_b(p_n), where h_b(p_t) is the t-th backward part-of-speech feature vector, calculated according to the following formulas:
f_t^{pb} = \sigma(W_{fh}^{pb} h_b(p_{t+1}) + W_{fx}^{pb} e(p_t) + W_{fc}^{pb} c_{t+1}^{pb} + b_f^{pb}),
i_t^{pb} = \sigma(W_{ih}^{pb} h_b(p_{t+1}) + W_{ix}^{pb} e(p_t) + W_{ic}^{pb} c_{t+1}^{pb} + b_i^{pb}),
o_t^{pb} = \sigma(W_{oh}^{pb} h_b(p_{t+1}) + W_{ox}^{pb} e(p_t) + W_{oc}^{pb} c_t^{pb} + b_o^{pb}),
where W_{fh}^{pb}, W_{fx}^{pb}, W_{fc}^{pb}, b_f^{pb}, W_{ih}^{pb}, W_{ix}^{pb}, W_{ic}^{pb}, b_i^{pb}, W_{oh}^{pb}, W_{ox}^{pb}, W_{oc}^{pb}, b_o^{pb} are trained model parameter matrices, each element of which is a real value; this group of parameters is independent of t, i.e., all calculation units in one calculation sequence share the same group of parameters;
f_t^{pb}, i_t^{pb}, o_t^{pb} are intermediate results in the t-th calculation unit and are all real-valued vectors;
e(p_t), h_b(p_{t+1}) and c_{t+1}^{pb} are the inputs of the t-th calculation unit and are real-valued vectors, where e(p_t) is the feature vector corresponding to the part of speech p_t; h_b(p_t) and c_t^{pb} are the outputs of the t-th calculation unit, but c_t^{pb} is only an auxiliary result of the long short-term memory model, and only h_b(p_t) is finally used as the backward part-of-speech feature vector; since this is a sequential computation model, the outputs h_b(p_{t+1}) and c_{t+1}^{pb} of the (t+1)-th calculation unit are the inputs of the t-th calculation unit.
8. The method of claim 7, wherein the forward neural network is used in step 2-1-2 to calculate the scores of all the labeled types, and the calculation process of the whole forward neural network is performed according to the following formula:
h = \sigma(W_1 x + b_1),
o = W_2 h,
where W_1, b_1, W_2 are trained model parameter matrices, each element of which is a real value; x is the input vector, formed by concatenating all the feature vectors obtained in step 2-1-1, its dimension being the sum of the dimensions of all the feature vectors generated in step 2-1-1, and each of its elements being a real value; h is the hidden-layer vector of the neural network, an intermediate result; o is the computed output, a real-valued vector whose dimension equals the number of label types that can be selected when labeling each word in the labeling process defined in step 1-2, its g-th value being the score, a real value, for labeling the current step as type g; W_1 x and W_2 h are matrix multiplication operations.
9. The method of claim 8, wherein step 2-2 comprises the steps of:
step 2-2-1, for each state in the previous state set, scoring all the label types in the manner of step 2-1; assume state S_x has score score(S_x) and label type_k has score score(type_k); if all the label types are expanded, K new states are obtained after the expansion, denoted S_{i_1}^{t+1}, …, S_{i_K}^{t+1}, where K is the total number of label types; the score of the k-th new state is calculated according to the following formula:
score(S_{i_k}^{t+1}) = score(S_i^t) + score(type_k),
where k = 1, …, K and all scores are real values; the candidate label types are then determined in the manner of step 1-2, and the states are expanded according to the candidate label types: for a state S_i^t in the state set S^t, if there are c(i) candidate label types determined for that state in the manner of step 1-2, then c(i) new states are obtained after expanding the state, denoted S_{i_1}^{t+1}, …, S_{i_{c(i)}}^{t+1};
Step 2-2-2, assume the state set S^t has z states, where z is a natural number; expand all the states in the set S^t in the manner of step 2-2-1 to obtain all the expanded states;
Step 2-2-3, extracting, in a beam-search manner, the m states with the highest scores from all the expanded states obtained in step 2-2-2 to form a new state set S^{t+1}.
CN201610324281.5A 2016-05-16 2016-05-16 A kind of Chinese Chunk analysis method based on state transfer and neural network Active CN106021227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610324281.5A CN106021227B (en) 2016-05-16 2016-05-16 A kind of Chinese Chunk analysis method based on state transfer and neural network

Publications (2)

Publication Number Publication Date
CN106021227A true CN106021227A (en) 2016-10-12
CN106021227B CN106021227B (en) 2018-08-21

Family

ID=57097925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610324281.5A Active CN106021227B (en) 2016-05-16 2016-05-16 A kind of Chinese Chunk analysis method based on state transfer and neural network

Country Status (1)

Country Link
CN (1) CN106021227B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103546623A (en) * 2012-07-12 2014-01-29 百度在线网络技术(北京)有限公司 Method, device and equipment for sending voice information and text description information thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHRIS ALBERTI ET AL: ""Improved Transition-Based Parsing and Tagging with Neural Networks"", 《PROCEEDINGS OF THE 2015 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 *
DAVIDWEISS ET AL: ""Structured Training for Neural Network Transition-Based Parsing"", 《PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING》 *
HAO ZHOU ET AL: ""A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing"", 《PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING 》 *
YING LIU ET AL: ""Improving Chinese text Chunking"s precision using Transformnation-based Learning"", 《2005 YOUTH PROJECT OF ASIA RESEARCH CENTER》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547737A (en) * 2016-10-25 2017-03-29 复旦大学 Based on the sequence labelling method in the natural language processing of deep learning
CN106547737B (en) * 2016-10-25 2020-05-12 复旦大学 Sequence labeling method in natural language processing based on deep learning
CN109923557A (en) * 2016-11-03 2019-06-21 易享信息技术有限公司 Use continuous regularization training joint multitask neural network model
CN109923557B (en) * 2016-11-03 2024-03-19 硕动力公司 Training joint multitasking neural network model using continuous regularization
US11797825B2 (en) 2016-11-03 2023-10-24 Salesforce, Inc. Training a joint many-task neural network model using successive regularization
US11783164B2 (en) 2016-11-03 2023-10-10 Salesforce.Com, Inc. Joint many-task neural network model for multiple natural language processing (NLP) tasks
US11010554B2 (en) 2016-11-08 2021-05-18 Beijing Gridsum Technology Co., Ltd. Method and device for identifying specific text information
WO2018086519A1 (en) * 2016-11-08 2018-05-17 北京国双科技有限公司 Method and device for identifying specific text information
CN106776869A (en) * 2016-11-28 2017-05-31 北京百度网讯科技有限公司 Chess game optimization method, device and search engine based on neutral net
CN106776869B (en) * 2016-11-28 2020-04-07 北京百度网讯科技有限公司 Search optimization method and device based on neural network and search engine
CN107247700A (en) * 2017-04-27 2017-10-13 北京捷通华声科技股份有限公司 A kind of method and device for adding text marking
CN107168955A (en) * 2017-05-23 2017-09-15 南京大学 Word insertion and the Chinese word cutting method of neutral net using word-based context
CN107168955B (en) * 2017-05-23 2019-06-04 南京大学 Utilize the Chinese word cutting method of the word insertion and neural network of word-based context
CN107632981B (en) * 2017-09-06 2020-11-03 沈阳雅译网络技术有限公司 Neural machine translation method introducing source language chunk information coding
CN107632981A (en) * 2017-09-06 2018-01-26 沈阳雅译网络技术有限公司 A kind of neural machine translation method of introducing source language chunk information coding
CN107992479A (en) * 2017-12-25 2018-05-04 北京牡丹电子集团有限责任公司数字电视技术中心 Word rank Chinese Text Chunking method based on transfer method
CN108363695A (en) * 2018-02-23 2018-08-03 西南交通大学 A kind of user comment attribute extraction method based on bidirectional dependency syntax tree characterization
CN108363695B (en) * 2018-02-23 2020-04-24 西南交通大学 User comment attribute extraction method based on bidirectional dependency syntax tree representation
CN108446355B (en) * 2018-03-12 2022-05-20 深圳证券信息有限公司 Investment and financing event element extraction method, device and equipment
CN108446355A (en) * 2018-03-12 2018-08-24 深圳证券信息有限公司 Investment and financing event argument abstracting method, device and equipment
CN109086274A (en) * 2018-08-23 2018-12-25 电子科技大学 English social media short text time expression recognition method based on restricted model
CN112052646A (en) * 2020-08-27 2020-12-08 安徽聚戎科技信息咨询有限公司 Text data labeling method
CN112052646B (en) * 2020-08-27 2024-03-29 安徽聚戎科技信息咨询有限公司 Text data labeling method
CN112651241A (en) * 2021-01-08 2021-04-13 昆明理工大学 Chinese parallel structure automatic identification method based on semi-supervised learning
CN116227497A (en) * 2022-11-29 2023-06-06 广东外语外贸大学 Sentence structure analysis method and device based on deep neural network
CN116227497B (en) * 2022-11-29 2023-09-26 广东外语外贸大学 Sentence structure analysis method and device based on deep neural network

Also Published As

Publication number Publication date
CN106021227B (en) 2018-08-21

Similar Documents

Publication Publication Date Title
CN106021227B (en) A kind of Chinese Chunk analysis method based on state transfer and neural network
Gupta et al. MMQA: A multi-domain multi-lingual question-answering framework for English and Hindi
CN106502994B (en) method and device for extracting keywords of text
Song et al. Named entity recognition based on conditional random fields
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN114911892A (en) Interaction layer neural network for search, retrieval and ranking
CN109086269B (en) Semantic bilingual recognition method based on semantic resource word representation and collocation relationship
CN111274829A (en) Sequence labeling method using cross-language information
CN113220864B (en) Intelligent question-answering data processing system
CN110717341A (en) Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
de Sousa Neto et al. Htr-flor++ a handwritten text recognition system based on a pipeline of optical and language models
Li et al. DUTIR at the CCKS-2019 Task1: Improving Chinese clinical named entity recognition using stroke ELMo and transfer learning
CN114254645A (en) Artificial intelligence auxiliary writing system
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
Lo et al. Cool English: A grammatical error correction system based on large learner corpora
Yazar et al. Low-resource neural machine translation: A systematic literature review
Whittaker et al. TREC 2005 Question Answering Experiments at Tokyo Institute of Technology.
Abdolahi et al. Sentence matrix normalization using most likely n-grams vector
Tolegen et al. Voted-perceptron approach for Kazakh morphological disambiguation
Srinivasagan et al. An automated system for tamil named entity recognition using hybrid approach
CN114510569A (en) Chemical emergency news classification method based on Chinesebert model and attention mechanism
Mazitov et al. Named entity recognition in Russian using Multi-Task LSTM-CRF
Sarkar et al. Bengali noun phrase chunking based on conditional random fields
Seo et al. Performance Comparison of Passage Retrieval Models according to Korean Language Tokenization Methods
Shams et al. Lexical intent recognition in urdu queries using deep neural networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant