CN111274392B - Multi-channel joint processing method and device

Info

Publication number: CN111274392B
Application number: CN202010047500.6A
Authority: CN
Inventors: 宋彦; 田元贺; 王咏刚
Original assignee: Innovation Workshop Guangzhou Artificial Intelligence Research Co ltd
Current assignee: Innovation Workshop Guangzhou Artificial Intelligence Research Co ltd
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2024-03-15
Anticipated expiration: 2040-01-16
Also published as: CN111274392A

Abstract

Embodiments of the application provide a multi-channel joint processing method and device for a word segmentation and part-of-speech tagging system. The method comprises the following steps: acquiring the word sequences contained in an input sequence and the length information corresponding to each word sequence; assigning each word sequence to one of a plurality of channels according to the length information, so that the set of word sequences of the same length corresponds to one channel; in each channel, modeling and weighting the contributions of the word sequence set of that length to the joint labels, to obtain a length weighting vector for the specific length corresponding to that channel; and performing weighted concatenation on the length weighting vectors of the channels to obtain a weighted word sequence vector corresponding to the input sequence, wherein the weighted word sequence vector reflects the contributions of the word sequence sets of different lengths contained in the input sequence to the joint labels.

Description

Multi-channel joint processing method and device

Technical Field

The invention relates to the technical field of computers, in particular to a multi-channel joint processing method for a word segmentation and part-of-speech tagging system.

Background

The joint task of Chinese word segmentation and part-of-speech tagging (Joint Chinese Word Segmentation and Part-of-speech Tagging) treats word segmentation and part-of-speech tagging as a single task: both are performed simultaneously on an input Chinese character sequence, rather than performing word segmentation first and then part-of-speech tagging on the segmentation result.

In the prior art, techniques for the joint task of Chinese word segmentation and part-of-speech tagging can be divided into traditional feature-based methods and deep learning methods.

The feature-based method extracts features from the input text using manually designed and selected features, and determines the joint label of the current word based on those features. Common features include the current word, the previous word, the next word, and so on. However, the effectiveness of this method depends heavily on the quality of the manually designed and extracted features, and designing a high-quality feature extraction method is very difficult. In addition, feature-based methods offer no solution for ambiguity caused by differences between sentences.

In recent years, deep learning methods have gradually been applied to Chinese word segmentation. These methods can automatically extract text features according to the characteristics of a specific task, avoiding the high cost of manually designing and extracting features. The recognition performance of deep learning is far superior to that of the traditional methods. Generally, a deep-learning-based system for the joint task follows a basic sequence labeling scheme and comprises three modules: an input embedding layer, a context information encoding layer, and a decoding output layer.

The input embedding layer maps each word in the input text, and the n-grams (i.e. word sequences of length n) associated with that word, to a word vector and n-gram vectors in a high-dimensional continuous space, respectively. It then directly concatenates the word vector with the n-gram vectors to obtain a new word vector, which is used to represent the features of the word. The context information encoding layer extracts the context information of each word on the basis of its word vector, computing the influence of the word vectors of the other words on that context information. The input to this layer is the output of the embedding layer (i.e. the word vectors of the different words in a sentence), and its output is the context-encoded word vectors. The decoding output layer decodes each word vector after context information extraction and outputs the predicted joint label.

However, when concatenating the n-gram vectors with the word vector, the prior-art schemes do not take into account that different n-grams contribute differently to the joint label of the word, so n-grams with small contributions may mislead the model into predicting wrong joint labels. For example, consider the following sentences:

(1) Education section data -> education section_NN/analysis_VV/data_NN

(2) Education section student -> education_VV/section_CD/student_NN

Here the n-gram "education" has a greater effect on the joint labels in (1) and a lesser effect in (2). If the contributions of "education" in different contexts are not distinguished, this n-gram will mislead the model into making wrong joint label predictions.

Disclosure of Invention

The embodiment of the application aims to provide a multi-channel joint processing method and device for a word segmentation and part-of-speech tagging system.

The embodiment of the application provides a multi-channel joint processing method for a word segmentation and part-of-speech tagging system, wherein the method comprises the following steps:

acquiring a word sequence contained in an input sequence and length information corresponding to the word sequence;

corresponding each word sequence to a plurality of channels according to the length information, so that a word sequence set with the same length corresponds to one channel;

modeling and weighting calculation are respectively carried out on the contribution sizes of the word sequence sets with different lengths to the joint labels in each channel, so that length weighting vectors corresponding to each channel and aiming at specific lengths are obtained;

and carrying out weighted concatenation on the length weighted vectors corresponding to the channels to obtain weighted word sequence vectors corresponding to the input sequences, wherein the weighted word sequence vectors are used for reflecting the contribution of different word sequence sets with different lengths contained in the input sequences to the joint labels.

The embodiment of the application provides a multi-channel joint processing device for a word segmentation and part-of-speech tagging system, wherein the multi-channel joint processing device comprises:

the acquisition module is used for acquiring the word sequence contained in the input sequence and the length information corresponding to the word sequence;

a channel corresponding module, configured to correspond each word sequence to a plurality of channels according to the length information, so that a set of word sequences with the same length corresponds to one channel;

the multi-channel calculation module is used for respectively modeling and carrying out weight calculation on the contribution sizes of the word sequence sets with different lengths to the joint labels in each channel to obtain length weight vectors corresponding to each channel and aiming at specific lengths;

and the weighted series module is used for carrying out weighted series on the length weighted vectors corresponding to the channels to obtain weighted word sequence vectors corresponding to the input sequences, wherein the weighted word sequence vectors are used for reflecting the contribution of different word sequence sets with different lengths contained in the input sequences to the joint labels.

The embodiment of the application provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above method when executing the program.

A computer-readable storage medium according to an embodiment of the present application has a computer program stored thereon, wherein the program is executed by a processor to implement the above-mentioned method.

Compared with the prior art, the embodiments of the application have the following advantages: by modeling and weighting, in a plurality of channels, the contributions of the word sequence sets of different lengths to the joint labels, the differences in the contributions of word sequences of different lengths to the joint labels are taken into account. This avoids erroneous predictions caused by those differences, avoids the model bias caused by the low frequency of longer word sequences in the training data set, and improves the accuracy of the word segmentation and part-of-speech tagging system.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:

FIG. 1 illustrates a flow chart of a multi-channel joint processing method for a word segmentation and part-of-speech tagging system according to an embodiment of the present application;

FIG. 2 illustrates a schematic diagram of a word segmentation and part-of-speech tagging system according to an embodiment of the present application;

FIG. 3 illustrates a schematic structural diagram of a multi-channel joint processing device for a word segmentation and part-of-speech tagging system according to an embodiment of the present application.

The same or similar reference numbers in the drawings refer to the same or similar parts.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows a flowchart of a multi-channel joint processing method for a word segmentation and part-of-speech tagging system according to an embodiment of the present application. The method comprises the steps of S1, S2, S3 and S4.

The method according to the invention is implemented by a multi-channel joint processing device contained in a computer device. The computer device is an electronic device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The computer device comprises a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud based on cloud computing and composed of a large number of hosts or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The user equipment includes, but is not limited to, any electronic product that can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a PDA, a game console, or an IPTV. The network where the user equipment and the network device are located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like.

It should be noted that the user equipment, the network equipment and the network are merely examples, and other user equipment, network equipment and network that may be present in the present invention or may appear in the future are applicable to the present invention, and are also included in the scope of the present invention and are incorporated herein by reference.

Referring to FIG. 1, in step S1, the multi-channel joint processing device acquires the word sequences contained in an input sequence and the length information corresponding to each word sequence.

The length information includes any information used to represent an n-gram, where an n-gram is a word sequence of length n appearing in a piece of text. For example, in the input sequence "Zhang San serves as student union president", "Zhang San" and "serves as" are 2-grams and "student union" is a 3-gram (lengths counted in Chinese characters).

Preferably, the multi-channel joint processing device acquires the word sequence contained in the input sequence and the length information corresponding to the word sequence through a prestored word list containing the length information.

The word list contains n-grams of various lengths and can be used to extract every n-gram that occurs in the input sequence.

The multi-channel joint processing device may also obtain input sequence length information in other ways, such as analyzing and extracting n-gram information in real time by other tools based on the input sequence, etc.
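Step S1 can be illustrated with a minimal sketch, assuming the pre-stored word list is a plain set of strings. The function name `extract_ngrams`, the sample sentence, and the sample word list are hypothetical and chosen only for illustration; the source does not prescribe a lookup strategy.

```python
def extract_ngrams(chars, vocab, max_n):
    """Return (start, ngram, length) for every vocabulary n-gram found in chars."""
    found = []
    for n in range(1, max_n + 1):
        # Slide a window of n characters over the input sequence.
        for start in range(len(chars) - n + 1):
            gram = chars[start:start + n]
            if gram in vocab:
                found.append((start, gram, n))
    return found


# Hypothetical nine-character input and word list (assumptions, not from the source).
sentence = "张三担任学生会会长"
vocab = {"张三", "担任", "学生", "学生会", "会长"}
ngrams = extract_ngrams(sentence, vocab, max_n=3)
```

Each returned tuple carries the n-gram together with its length, which is the length information the later steps rely on.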

In step S2, the multi-channel joint processing device corresponds each word sequence to a plurality of channels according to the length information, so that the word sequence set with the same length corresponds to one channel.
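As a rough illustration of step S2, the sketch below groups n-grams by length so that each length maps to one channel. The name `assign_channels` and the sample n-grams are hypothetical; keeping only one copy of each n-gram per channel is an assumption of this sketch.

```python
from collections import defaultdict


def assign_channels(ngrams):
    """Group n-grams by length: channel k holds the n-grams of length k."""
    channels = defaultdict(list)
    for gram in ngrams:
        # Keep each distinct n-gram once per channel (an assumption of this sketch).
        if gram not in channels[len(gram)]:
            channels[len(gram)].append(gram)
    return dict(channels)


# Hypothetical n-grams extracted from an input sequence.
channels = assign_channels(["张三", "担任", "学生", "会长", "学生会"])
```

Here channel 2 receives the four 2-grams and channel 3 receives the single 3-gram.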

In step S3, the multi-channel joint processing device models and performs weighted calculation on the contribution sizes of the word sequence sets with different lengths to the joint labels in each channel, so as to obtain length weighted vectors with specific lengths corresponding to each channel.

The joint label is a joint label of Chinese word segmentation and part-of-speech tagging: a label formed by combining a word segmentation label and a part of speech in the form "segmentation label-part of speech". For example, the joint labels of the words in "Zhang San serves as student union president" (one label per Chinese character) are "B-NR", "E-NR", "B-VV", "E-VV", "B-NN", "I-NN", "E-NN", "B-NN", "E-NN", in that order.
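The label format can be sketched as follows, assuming a segmented and tagged sentence is available as (word, part-of-speech) pairs. The function name `joint_labels` and the sample segmentation are hypothetical, and the "S" prefix for single-character words is an assumption, since the text only shows the B, I, and E prefixes.

```python
def joint_labels(tagged_words):
    """Build one 'segmentation label-part of speech' label per character."""
    labels = []
    for word, pos in tagged_words:
        if len(word) == 1:
            labels.append(f"S-{pos}")  # single-character word (assumed convention)
        else:
            labels.append(f"B-{pos}")  # first character of the word
            labels.extend(f"I-{pos}" for _ in word[1:-1])  # inner characters
            labels.append(f"E-{pos}")  # last character of the word
    return labels


# Hypothetical segmentation with word lengths 2, 2, 3, 2.
labels = joint_labels([("张三", "NR"), ("担任", "VV"), ("学生会", "NN"), ("会长", "NN")])
```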

Specifically, for each channel, the multi-channel joint processing device calculates the weight, relative to each word in the input sequence, of the word sequence set corresponding to that channel. It then obtains the length weighting vector for the specific length corresponding to the channel by computing a weighted sum using these weights.

According to a first example of the invention, the input word sequence is denoted $\mathcal{X} = x_1 x_2 \cdots x_l$ (where $x_i$ is a Chinese character and $l$ is the length of the sentence). In step S1, the multi-channel joint processing device obtains all n-grams occurring in $\mathcal{X}$ by querying the vocabulary. Next, in step S2, the device assigns these n-grams to a plurality of channels so that the set of word sequences of the same length corresponds to one channel, resulting in a set $S = \{S_1, S_2, \ldots, S_k, \ldots, S_n\}$, where $S_k = \{s_{k,1}, s_{k,2}, \ldots, s_{k,m_k}\}$ is the set of n-grams of length $k$ (each denoted $s_{k,j}$). An n-gram embedding function $E_s$ maps each $s_{k,j}$ to an n-gram vector, denoted $e^s_{k,j}$.

Next, in step S3, the multi-channel joint processing device calculates the length weighting vector for the specific length corresponding to each channel according to the following algorithm:

In channel $k$, the weight $p^k_{i,j}$ of the n-gram $s_{k,j}$ in the word sequence set corresponding to channel $k$, relative to the $i$-th word $x_i$ in the input sequence, is calculated according to the following formula:

$$p^k_{i,j} = \frac{\sigma^k_{i,j}\,\exp(u_j)}{\sum_{j'=1}^{m_k} \sigma^k_{i,j'}\,\exp(u_{j'})} \qquad (1)$$

where $u_j = h_i \cdot e^s_{k,j}$ is the inner product of the word vector $h_i$, obtained after $x_i$ passes through the input embedding layer and the context information encoding layer, and the n-gram vector $e^s_{k,j}$ of $s_{k,j}$; and $\sigma^k_{i,j}$ is an activation factor with a value of 0 or 1. When $x_i$ is contained in $s_{k,j}$, $\sigma^k_{i,j} = 1$; otherwise it is 0. For example, when $x_i$ is the first character of "student union" and $s_{k,j}$ is "student union", $\sigma^k_{i,j} = 1$; for the same $x_i$ and an n-gram $s_{k,j}$ that does not contain it, $\sigma^k_{i,j} = 0$.

Then, in channel $k$, a weighted sum over the word sequence set is computed from the weights $p^k_{i,j}$ according to the following formula:

$$a^k_i = \sum_{j=1}^{m_k} p^k_{i,j}\, e^s_{k,j} \qquad (2)$$

Thus, the length weighting vector $a^k_i$ for the length $k$ corresponding to channel $k$ is obtained.
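The per-channel computation of step S3 might be sketched as below. This is a pure-Python illustration under stated assumptions: the weights are normalized with a softmax over the n-grams whose activation factor is 1, and the n-gram vectors have the same dimension as the word vector so that the inner product is defined. The name `channel_attention` and the toy vectors are hypothetical.

```python
import math


def channel_attention(h_i, ngram_vecs, active):
    """Return (weights, length_weighting_vector) for one channel and one word.

    h_i        -- context-aware word vector of the i-th word
    ngram_vecs -- one embedding per n-gram in this channel (same dimension as h_i)
    active     -- activation factors: 1 if the n-gram is used for this word, else 0
    """
    dim = len(h_i)
    if not any(active):
        # No active n-gram in this channel: the contribution is a zero vector.
        return [0.0] * len(ngram_vecs), [0.0] * dim
    # Inner product between the word vector and every n-gram vector.
    scores = [sum(a * b for a, b in zip(h_i, vec)) for vec in ngram_vecs]
    # Masked softmax (assumed normalization); subtract the max for stability.
    top = max(s for s, on in zip(scores, active) if on)
    exps = [math.exp(s - top) if on else 0.0 for s, on in zip(scores, active)]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the n-gram vectors.
    vector = [sum(w * vec[d] for w, vec in zip(weights, ngram_vecs)) for d in range(dim)]
    return weights, vector


weights, vector = channel_attention(
    h_i=[1.0, 0.0],
    ngram_vecs=[[1.0, 0.0], [1.0, 0.0], [5.0, 5.0]],
    active=[1, 1, 0],
)
```

In this toy call the third n-gram is inactive, so it receives zero weight even though its raw score is the largest.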

Continuing with the description of fig. 1, in step S4, the multi-channel joint processing apparatus performs weighted concatenation on the length weight vectors corresponding to the respective channels to obtain weighted word sequence vectors corresponding to the input sequences.

The weighted word sequence vector is used for reflecting the contribution of different word sequence sets with different lengths contained in the input sequence to the joint label.

Continuing with the first example, the length weighting vectors of all channels are weighted and concatenated, so that the weighted word sequence vector corresponding to the input sequence is expressed as:

$$a_i = \bigoplus_{k=1}^{n} \delta_k\, a^k_i \qquad (3)$$

where $a^k_i$ is the length weighting vector of channel $k$ for the word $x_i$, $\oplus$ denotes concatenation, and $\delta_k$ is the channel weight, which is a learnable parameter.
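The weighted concatenation of step S4 can be sketched in a few lines. The name `weighted_concat` is hypothetical, and the channel weights are fixed numbers here purely for illustration; in the described system they are learned during training.

```python
def weighted_concat(channel_vecs, deltas):
    """Scale each channel's vector by its channel weight, then concatenate."""
    out = []
    for vec, delta in zip(channel_vecs, deltas):
        out.extend(delta * v for v in vec)
    return out


# Two channels with 2-dimensional vectors and illustrative channel weights.
a_i = weighted_concat([[1.0, 2.0], [3.0, 4.0]], deltas=[0.5, 2.0])
```

Because the channels are concatenated rather than summed, the result has one block of dimensions per n-gram length, so n-grams of different lengths do not share the same positions in the output vector.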

It should be noted that the above formulas for the weight of the word sequence set relative to each word in the input sequence, for the length weighting vector, and for the weighted word sequence vector corresponding to the input sequence are merely exemplary and do not limit the present invention. Those skilled in the art should understand that other formulas for calculating these quantities also fall within the scope of the present invention.

Preferably, the method further comprises step S5 (not shown), step S6 (not shown) and step S7 (not shown).

In step S5, the multi-channel joint processing device concatenates the weighted word sequence vector with the word vector of the input sequence.

In step S6, the multi-channel joint processing device obtains a predictive tag of the input sequence in the word segmentation and part-of-speech tagging system based on the vectors after the concatenation.

In step S7, the multi-channel joint processing device calculates and optimizes the objective function by using the obtained predictive labels of the individual word segments and the corresponding real labels, and further trains the model of the joint labels.
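Steps S5 to S7 can be illustrated for a single word with the sketch below. The text does not specify the decoder or the objective function, so a linear layer followed by a softmax and a cross-entropy loss are assumed here; the name `predict_and_loss` and all numeric values are hypothetical.

```python
import math


def predict_and_loss(h_i, a_i, weight_rows, biases, gold):
    """Concatenate h_i with a_i, score every label, and return (prediction, loss)."""
    z = h_i + a_i  # step S5: list concatenation of the two vectors
    # Step S6: one linear score per joint label (assumed decoder).
    logits = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weight_rows, biases)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    prediction = max(range(len(probs)), key=probs.__getitem__)
    # Step S7: cross-entropy against the real label (assumed objective).
    loss = -math.log(probs[gold])
    return prediction, loss


prediction, loss = predict_and_loss(
    h_i=[1.0],
    a_i=[0.0],
    weight_rows=[[2.0, 0.0], [0.0, 0.0]],
    biases=[0.0, 0.0],
    gold=0,
)
```

Training would then adjust the network parameters, including the channel weights, to reduce this loss summed over all words; the optimizer is likewise left open by the text.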

Preferably, the method further comprises step S8 (not shown).

In step S8, the multi-channel joint processing device uses the trained joint label model to analyze an input Chinese sequence, so as to obtain the joint labeling result of word segmentation and part-of-speech tagging for the Chinese sequence.
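Once joint labels have been predicted, turning them back into segmented, tagged words is straightforward. The sketch below assumes labels of the form "B-NN", "I-NN", "E-NN", plus an assumed "S-" prefix for single-character words; the name `decode` and the sample sentence are hypothetical.

```python
def decode(chars, labels):
    """Convert per-character joint labels into (word, part-of-speech) pairs."""
    words = []
    current = ""
    pos = ""
    for ch, label in zip(chars, labels):
        position, pos = label.split("-", 1)
        current += ch
        # "E" closes a multi-character word, "S" is a single-character word.
        if position in ("E", "S"):
            words.append((current, pos))
            current = ""
    if current:
        # Keep a trailing word whose closing label is missing.
        words.append((current, pos))
    return words


# Hypothetical nine-character input with its predicted joint labels.
result = decode(
    "张三担任学生会会长",
    ["B-NR", "E-NR", "B-VV", "E-VV", "B-NN", "I-NN", "E-NN", "B-NN", "E-NN"],
)
```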

According to the method, the contribution of the word sequences with different lengths to the joint labels is modeled and weighted according to the contribution of the word sequence sets with different lengths to the joint labels in a plurality of channels, and the contribution difference of the word sequences with different lengths to the joint labels is considered, so that error prediction caused by the difference is avoided, model deviation caused by low occurrence frequency of the longer word sequences in a training data set is avoided, and the accuracy of a word segmentation and part-of-speech tagging system is improved.

FIG. 2 illustrates a schematic diagram of a segmentation and part-of-speech tagging system according to an embodiment of the present application.

Referring to FIG. 2, the word segmentation and part-of-speech tagging system according to this embodiment includes an input embedding layer, a context information encoding layer, a decoding output layer, and a multi-channel attention module; the method according to the present invention is performed by the multi-channel attention module. The multi-channel attention module is located between the context information encoding layer and the decoding output layer. Its inputs are the n-gram information of the input sequence obtained from the vocabulary and the word vectors $h_i$ containing context information, produced by the input embedding layer and the context information encoding layer. Its output is the weighted word sequence vector $a_i$ containing weighted n-gram information.

For the input sequence "Zhang San serves as student union president", with the input sequence $\mathcal{X} = x_1 x_2 \cdots x_l$ and the n-gram vectors $e^s_{k,j}$ of the first example, the algorithm flow for training the joint-label model in the word segmentation and part-of-speech tagging system according to this embodiment is as follows:

1. Input $\mathcal{X}$ into the input embedding layer, where a word embedding function $E_x$ converts each word $x_i$ in the text into an input word vector $e^x_i$;

2. Input all converted word vectors $e^x_i$ into the context information encoding layer, which outputs for each word a word vector $h_i$ containing context information;

3. In the multi-channel attention module, based on the input word vector $h_i$ and all n-gram vectors $e^s_{k,j}$, and based on formulas (1) to (3) above, obtain and output the weighted word sequence vector $a_i$ containing weighted n-gram information;

4. Concatenate the weighted word sequence vector $a_i$ containing n-gram information with the word vector $h_i$, input the resulting vector into the decoding output layer, and output the predicted label $y'$ of each word. The joint labels output by the decoding output layer for the words in "Zhang San serves as student union president" are "B-NR", "E-NR", "B-VV", "E-VV", "B-NN", "I-NN", "E-NN", "B-NN", "E-NN";

5. Compare the predicted label $y'$ with the corresponding real result $y$, calculate the objective function, and update the network parameters of the word segmentation and part-of-speech tagging system by optimizing the objective function;

6. Repeat steps 1 to 5 to train the joint-label model of the word segmentation and part-of-speech tagging system until the expected effect is achieved.

According to the word segmentation and part-of-speech tagging system, the contribution of word sequences with different lengths to the joint tag is modeled and weighted according to the word sequence sets with different lengths in a plurality of channels, and the difference of the contribution of the word sequences with different lengths to the joint tag is considered, so that error prediction caused by the difference can be avoided, model deviation caused by low occurrence frequency of a longer word sequence in a training data set can be avoided, and the accuracy of the word segmentation and part-of-speech tagging system is improved.

Fig. 3 shows a schematic structural diagram of a multi-channel joint processing device for a word segmentation and part-of-speech tagging system according to an embodiment of the present application. The multi-channel joint processing device comprises an acquisition module 1, a channel corresponding module 2, a multi-channel calculation module 3 and a weighted series module 4.

Referring to FIG. 3, the acquisition module 1 acquires the word sequences contained in an input sequence and the length information corresponding to each word sequence.

Preferably, the acquiring module 1 acquires the word sequence included in the input sequence and the length information corresponding to the word sequence through a pre-stored word list including the length information.

The acquisition module 1 may also acquire the input sequence length information in other ways, such as analyzing and extracting n-gram information in real time by other tools based on the input sequence, etc.

The channel corresponding module 2 corresponds each word sequence to a plurality of channels according to the length information, so that the word sequence sets with the same length correspond to one channel.

The multi-channel calculation module 3 respectively models and performs weighted calculation on the contribution sizes of the word sequence sets with different lengths to the joint labels in each channel to obtain length weighted vectors corresponding to each channel and aiming at specific lengths.

The joint label is a joint label of Chinese word segmentation and part-of-speech tagging, and refers to a label formed by combining the word segmentation label and the part-of-speech in a form of word segmentation label-part-of-speech.

Specifically, for each channel, the multi-channel calculation module 3 calculates the weight, relative to each word in the input sequence, of the word sequence set corresponding to that channel. The multi-channel calculation module 3 then obtains the length weighting vector for the specific length corresponding to the channel by computing a weighted sum using these weights.

The weighting concatenation module 4 performs weighting concatenation on the length weighting vectors corresponding to the channels to obtain a weighted word sequence vector corresponding to the input sequence.

Preferably, the multi-channel joint processing apparatus further includes a vector concatenation module (not shown), a label prediction module (not shown), and a function calculation module (not shown).

The vector concatenation module concatenates the weighted word sequence vector with a word vector of an input sequence.

The label prediction module obtains a prediction label of the input sequence in the word segmentation and part-of-speech tagging system based on the vectors after the series connection.

The function calculation module calculates and optimizes the objective function through the obtained prediction labels of the individual segmentation words and the corresponding real labels, and further trains a model of the joint labels.

Preferably, the multi-channel joint processing device further comprises a labeling result module (not shown).

The labeling result module analyzes the input Chinese sequence by using a trained combined label model, so as to obtain a combined labeling result of word segmentation and part-of-speech labeling of the Chinese sequence.

According to the scheme of the embodiment of the application, the contribution of the word sequences with different lengths to the joint labels is modeled and weighted according to the contribution of the word sequence sets with different lengths to the joint labels in a plurality of channels, so that the contribution difference of the word sequences with different lengths to the joint labels is considered, erroneous prediction caused by the difference is avoided, model deviation caused by low occurrence frequency of the longer word sequences in a training data set is avoided, and the accuracy of a word segmentation and part-of-speech tagging system is improved.

The software program of the present invention may be executed by a processor to perform the steps or functions described above. Likewise, the software programs of the present invention (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various functions or steps.

Furthermore, portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present invention by way of operation of the computer. Program instructions for invoking the inventive methods may be stored in fixed or removable recording media and/or transmitted via a data stream in a broadcast or other signal bearing medium and/or stored within a working memory of a computer device operating according to the program instructions. An embodiment according to the invention comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to operate a method and/or a solution according to the embodiments of the invention as described above.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims can also be implemented by means of software or hardware by means of one unit or means. The terms first, second, etc. are used to denote a name, but not any particular order.

Claims

1. A multi-channel joint processing method for a word segmentation and part-of-speech tagging system, wherein the method comprises the steps of:

the method comprises the steps of carrying out weighted series connection on length weighted vectors corresponding to all channels to obtain weighted word sequence vectors corresponding to an input sequence, wherein the weighted word sequence vectors are used for reflecting contributions of different word sequence sets with different lengths contained in the input sequence to a joint label;

wherein the method comprises the steps of:

concatenating the weighted word sequence vector with a word vector of the input sequence;

based on the vectors after the series connection, obtaining a predictive tag of an input sequence in a word segmentation and part-of-speech tagging system;

and calculating and optimizing an objective function through the obtained predictive labels of the individual segmentation words and the corresponding real labels, and further training a model of the joint label.

2. The method of claim 1, wherein the step of obtaining the length weighting vector for the specific length corresponding to each channel by modeling and performing weighting calculation on the contribution sizes of the word sequence sets for the different lengths to the joint labels in each channel respectively includes:

for each channel, calculating the weight of the word sequence set corresponding to the channel relative to each word in the input sequence;

and obtaining a length weighting vector corresponding to the channel and aiming at a specific length by calculating a weighted sum according to the weight of the obtained word sequence set relative to each word in the input sequence.

3. A method according to claim 1 or 2, wherein the method comprises the steps of:

and analyzing the input Chinese sequence by using a trained combined label model, so as to obtain a combined labeling result of word segmentation and part-of-speech labeling of the Chinese sequence.

4. The method of claim 1, wherein the step of acquiring the word sequence and its corresponding length information contained in the input sequence comprises:

and acquiring a word sequence contained in the input sequence and length information corresponding to the word sequence through a prestored word list containing the length information.

5. A multi-channel joint processing device for a word segmentation and part-of-speech tagging system, wherein the multi-channel joint processing device comprises:

the weighting serial module is used for carrying out weighting serial connection on the length weighting vectors corresponding to the channels to obtain weighted word sequence vectors corresponding to the input sequences, wherein the weighted word sequence vectors are used for reflecting the contribution of different word sequence sets with different lengths contained in the input sequences to the joint labels;

wherein, the multi-channel joint processing device comprises:

the vector concatenation module is used for concatenating the weighted word sequence vector with the word vector of the input sequence;

the label prediction module is used for obtaining a prediction label of the input sequence in the word segmentation and part-of-speech tagging system based on the vectors after being connected in series;

and the function calculation module is used for calculating and optimizing an objective function through the obtained prediction labels of the segmented words and the corresponding real labels, so as to train a model of the joint label.

6. The multi-channel joint processing device of claim 5, wherein the multi-channel computing module is configured to:

7. The multi-channel joint processing apparatus according to claim 5 or 6, wherein the multi-channel joint processing apparatus comprises:

and the labeling result module is used for analyzing the input Chinese sequence by using the trained combined label model, so as to obtain a combined labeling result of word segmentation and part-of-speech labeling of the Chinese sequence.

8. The multi-channel joint processing device of claim 5, wherein the acquisition module is configured to:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 4 when the program is executed by the processor.

10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 4.