CN114519104A - Action label labeling method and device - Google Patents


Info

Publication number
CN114519104A
Authority
CN
China
Prior art keywords
action
labeling
label
action label
text data
Prior art date
Legal status (assumed; not a legal conclusion): Pending
Application number
CN202210132170.XA
Other languages
Chinese (zh)
Inventor
刘乙赛
罗涛
施佳子
于海燕
Current Assignee (the listed assignees may be inaccurate)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210132170.XA
Publication of CN114519104A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses an action label labeling method and device, applicable to the financial field and other technical fields. The method comprises the following steps: acquiring text data to be labeled; inputting the text data to be labeled into a preset action label labeling model and obtaining the corresponding action label labeling result output by the model, wherein the action label labeling model is obtained by training a preset machine learning model on preset training samples, the training samples comprise text data labeled with action labels, and the action labels comprise: an action start tag and an action continuation tag. The invention improves the efficiency and accuracy of labeling digital-person text.

Description

Action label labeling method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to an action tag labeling method and device.
Background
A digital person (also called a virtual person) is an intelligent display mode in the prior art, whose main functions include voice recognition, voice playing and action display. At present, one of its main functions is to combine text with voice and with specific actions. Setting reasonable action tags for the text makes the digital person's expression more natural and vivid, improves the digital person's affinity, and enhances user experience. Currently, a background technician usually annotates a digital person's text by manually inserting action tags so that the digital person performs a specific action at a specific place.
This manner of manually inserting action tags is time-consuming and labor-intensive, and its labor cost is too high; the prior art lacks a solution to this problem.
Disclosure of Invention
The present invention provides a method and an apparatus for labeling an action tag to solve at least one technical problem in the background art.
In order to achieve the above object, according to an aspect of the present invention, there is provided an action tag labeling method, the method including:
acquiring text data to be marked;
inputting the text data to be labeled into a preset action label labeling model, and obtaining an action label labeling result corresponding to the text data to be labeled output by the action label labeling model, wherein the action label labeling model is obtained by training a preset machine learning model according to a preset training sample, the training sample comprises text data labeled with an action label, and the action label comprises: an action start tag and an action continuation tag.
Optionally, the action tag labeling method further includes:
obtaining the training sample;
and training the machine learning model according to the training samples to obtain the action label labeling model.
Optionally, the machine learning model includes: a word vector conversion layer and a label labeling layer;
the word vector conversion layer is used for converting the text data into word vectors;
and the label labeling layer is used for labeling the action labels based on the word vectors to obtain action label labeling results.
Optionally, the machine learning model further includes: a word vector fusion layer;
the word vector fusion layer is used for carrying out feature fusion on the word vectors to obtain the word vectors after the feature fusion;
and the label labeling layer is specifically used for performing action label labeling on the basis of the word vectors after the feature fusion to obtain action label labeling results.
Optionally, the machine learning model further includes: a labeling result optimization layer;
and the marking result optimizing layer is used for optimizing the action marking result output by the marking layer to obtain the optimized action marking result.
Optionally, the word vector conversion layer adopts a Bert network or a word2vec neural network.
Optionally, the tag labeling layer adopts a gated recurrent unit (GRU) network or a long short-term memory (LSTM) network.
Optionally, the labeling result optimization layer adopts a conditional random field (CRF) network or a Markov model.
In order to achieve the above object, according to another aspect of the present invention, there is provided an action tag labeling apparatus including:
the text data to be marked acquisition unit is used for acquiring the text data to be marked;
the action label labeling unit is used for inputting the text data to be labeled into a preset action label labeling model to obtain an action label labeling result output by the action label labeling model and corresponding to the text data to be labeled, wherein the action label labeling model is obtained by training a preset machine learning model according to a preset training sample, the training sample comprises text data for labeling an action label, and the action label comprises: an action start tag and an action continuation tag.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above action tag labeling method when executing the computer program.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, implement the steps of the above-mentioned action tag labeling method.
To achieve the above object, according to another aspect of the present invention, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the above action tag labeling method.
The invention has the beneficial effects that:
according to the embodiment of the invention, the action label marking model is trained, and then the text data to be marked is automatically marked according to the action label marking model, so that the efficiency and the accuracy of marking the text of a digital person are effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts. In the drawings:
FIG. 1 is a first flowchart of an action tag labeling method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of an action tag labeling method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an action tag labeling model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a transform encoder according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a GRU structure;
FIG. 6 is a schematic diagram of annotation data according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating the evaluation results of the model according to the embodiment of the present invention;
FIG. 8 is a first block diagram of an embodiment of an action tag labeling apparatus;
FIG. 9 is a second block diagram of an action tag labeling apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that, in the technical solution of the present application, the acquisition, storage, use, processing, etc. of data all conform to the relevant regulations of the national laws and regulations.
It should be noted that the method and apparatus for labeling an action tag of the present invention can be applied to the financial field and can also be applied to other technical fields.
Fig. 1 is a first flowchart of an action tag labeling method according to an embodiment of the present invention, and as shown in fig. 1, in an embodiment of the present invention, the action tag labeling method according to the present invention includes step S101 and step S102.
And S101, acquiring text data to be annotated.
In an embodiment of the present invention, the text data to be labeled is a text sequence, and the text sequence is a text word sequence.
Step S102, inputting the text data to be labeled into a preset action label labeling model, and obtaining an action label labeling result corresponding to the text data to be labeled and output by the action label labeling model, wherein the action label labeling model is obtained by training a preset machine learning model according to a preset training sample, the training sample comprises text data for labeling an action label, and the action label comprises: an action start tag and an action continuation tag.
In one embodiment of the present invention, the action tag of the present invention includes a plurality of action tags of preset actions, and the action tag of each preset action includes a corresponding action start tag and an action continuation tag.
In one embodiment of the present invention, the plurality of preset actions may include: 'bow' (bow), 'ok' (ok gesture), 'hello' (waving), 'up' (pointing up), and 'introl' (teaching).
In one embodiment of the invention, the action start tag may be marked with B and the action continuation tag may be marked with I. In an embodiment of the present invention, the action tag labeling model is specifically used for labeling an action tag in a BIO format, where B marks an action start tag, I marks an action continuation tag, and O marks a no-action tag, and the action tag labeling result may specifically be shown in the embodiment shown in fig. 6.
According to the invention, the action label marking model is trained, and the text data to be marked is automatically marked according to the action label marking model, so that the efficiency and the accuracy of marking the text of a digital person are effectively improved.
Fig. 2 is a second flowchart of the action tag labeling method according to the embodiment of the present invention, and as shown in fig. 2, in an embodiment of the present invention, the specific training process of the action tag labeling model of step S102 includes step S201 and step S202.
Step S201, obtaining the training sample;
step S202, training the machine learning model according to the training samples to obtain the action label labeling model.
In one embodiment of the invention, the machine learning model comprises: a word vector conversion layer and a label labeling layer.
The word vector conversion layer is used for converting the text data into word vectors;
and the label labeling layer is used for labeling the action labels based on the word vectors to obtain action label labeling results.
The invention first converts the text data into word vectors and then performs action labeling on the word vectors. Because a digital person's actions are word-based (for example, a specific action accompanies the word "thank you"), labeling action tags at the word-vector level effectively improves labeling accuracy.
In one embodiment of the invention, the word vector conversion layer adopts a Bert network or a word2vec neural network.
In one embodiment of the invention, the label labeling layer adopts a gated recurrent unit (GRU) network or a long short-term memory (LSTM) network.
In one embodiment of the present invention, the machine learning model further comprises: and a word vector fusion layer.
The word vector fusion layer is used for carrying out feature fusion on the word vectors to obtain the word vectors after the feature fusion;
and the label labeling layer is specifically used for performing action label labeling on the basis of the word vectors after the feature fusion to obtain action label labeling results.
In one embodiment of the invention, if the features of two adjacent word vectors are close, feature fusion is carried out, so that the number of the word vectors is reduced, and the efficiency of model training and labeling is effectively improved.
In one embodiment of the present invention, the machine learning model further comprises: a labeling result optimization layer;
and the marking result optimizing layer is used for optimizing the action marking result output by the marking layer to obtain the optimized action marking result.
In an embodiment of the present invention, optimizing the action tag labeling result specifically includes: checking, for each word vector labeled with an action tag, whether its action label has only an action start tag or only an action continuation tag, and if so, deleting that word vector's action tag.
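The optimization rule described above can be sketched as follows (a hypothetical interpretation in which "deleting" an incomplete tag means resetting it to the no-action tag O):

```python
def clean_bio(tags):
    """Drop incomplete action labels: a start tag with no following
    continuation, or a continuation tag with no matching start, is reset
    to 'O'. Sketch of the rule in the text, under the assumption that a
    valid action span needs both a B- tag and at least one I- tag."""
    out = list(tags)
    for i, t in enumerate(tags):
        if t.startswith("B-"):
            # lone start tag: the next tag must continue the same action
            if i + 1 >= len(tags) or tags[i + 1] != "I-" + t[2:]:
                out[i] = "O"
        elif t.startswith("I-"):
            # lone continuation tag: must be preceded by B- or I- of same action
            prev = tags[i - 1] if i > 0 else "O"
            if prev not in ("B-" + t[2:], "I-" + t[2:]):
                out[i] = "O"
    return out

print(clean_bio(["B-bow", "O", "I-ok", "B-up", "I-up"]))
# ['O', 'O', 'O', 'B-up', 'I-up']
```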
In one embodiment of the invention, the annotation result optimization layer adopts a Conditional Random Field (CRF) network or a Markov model.
As shown in fig. 3, in an embodiment of the present invention, the word vector conversion layer employs a Bert network, the word vector fusion layer employs a convolutional neural network (CNN), the tag labeling layer employs a gated recurrent unit (GRU) network, and the labeling result optimization layer employs a conditional random field (CRF) network. The invention uses a Bert + CNN + GRU + CRF pipeline: Bert first generates word vectors; a convolutional neural network then fuses the word vectors; the GRU performs the labeling; and finally the CRF module optimizes the GRU output to obtain the labeled sequence, completing the whole digital-person text labeling process.
For Bert, the key part is the encoder of the Transformer structure, a deep network based on the self-attention mechanism; its structure is shown in fig. 4.
The Transformer mainly obtains word representations by adjusting a weight coefficient matrix according to the degree of association between words in the same sentence:
Attention(Q, K, V) = softmax(Q · K^T / sqrt(d_k)) · V, where d_k is the dimension of the key vectors.
q, K, V is a word vector matrix, which represents the query matrix, the key matrix and the value matrix, respectively, for calculating the degree of association between words.
Multi-head attention (MultiHead) maps Q, K and V through several different linear transformations to obtain several attention heads; finally the heads are concatenated and multiplied by a weight matrix W to obtain the multi-head attention:
MultiHeadAttention = Concat(Attention_1, ..., Attention_n) × W
The fully connected feed-forward network in the Transformer structure consists of two layers: the first layer uses a ReLU activation function and the second a linear activation function.
A convolutional neural network (CNN) is a feed-forward neural network. Unlike a fully connected network, only some of the nodes in two adjacent layers are connected, and the neurons in each layer are locally connected to extract and transform hierarchical features of the input. Convolutions are further divided into one-dimensional and two-dimensional: two-dimensional convolution performs excellently in image classification, while one-dimensional convolution is better suited to text.
First, a Concat operation is applied to n adjacent word vectors, splicing them into one feature, denoted L.
L=Concat[L1,...,Ln]
A one-dimensional convolutional network then extracts features from the spliced representation; the extracted feature is denoted F:
F = f( Σ_{i∈M} H_i · W_i + b )
where H is the word sequence, W is the weight matrix of the convolution kernel, b is the bias, and f(·) is the activation function.
In the one-dimensional convolutional neural network, each convolution kernel represents a system for extracting word-vector features; during training, the kernels' weight parameters are continually adjusted by error back-propagation, so that the most relevant word-vector features are finally learned.
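A minimal NumPy sketch of a single one-dimensional convolution kernel sliding over word vectors, as described above (the kernel size, weights and activation are illustrative assumptions):

```python
import numpy as np

def conv1d_words(H, W, b, f=np.tanh):
    """One 1-D convolution kernel over a sequence of word vectors.
    H: (seq_len, dim) word vectors; W: (k, dim) kernel weights; b: bias.
    Produces one feature per window: F_i = f(sum(H[i:i+k] * W) + b)."""
    k = W.shape[0]
    return np.array([f(np.sum(H[i:i + k] * W) + b)
                     for i in range(H.shape[0] - k + 1)])

H = np.ones((6, 4))        # toy sequence: six 4-dimensional word vectors
W = np.full((3, 4), 0.1)   # one kernel spanning 3 adjacent words
F = conv1d_words(H, W, b=0.0)
print(F.shape)  # (4,)
```

A real layer would apply many such kernels in parallel, giving one feature map per kernel; training adjusts each W and b by back-propagation.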
The gated recurrent network GRU captures the non-linear relationships between sequence elements well and is therefore commonly used to model text sequence data. However, as the step size grows, the conventional RNN structure suffers from the vanishing-gradient phenomenon, which prevents the network from effectively learning from time-series data. To overcome this problem, a series of RNN variants emerged, chiefly the long short-term memory model (LSTM) and the gated recurrent unit model (GRU).
The GRU is a variant of the LSTM; its structure is shown in fig. 5. The GRU employs only two gates: the update gate controls how much of the previous moment's state information is carried into the current state (the larger its value, the more is carried over), while the reset gate controls how much of the previous state information is ignored (the smaller its value, the more is ignored).
In fig. 5, H_{t-1} is the state information of the previous node and X_t is the input of the current node. From these two pieces of information, the two gating signals of the GRU structure are obtained: the reset gate r and the update gate z.
r = sigmoid(W_r · [X_t, H_{t-1}])
z = sigmoid(W_z · [X_t, H_{t-1}])
After the gate signals are obtained, the reset gate is first multiplied by H_{t-1} to obtain the reset data h; the reset data is then spliced with the input X_t, and H' in fig. 5 is obtained through a tanh activation function.
H' = tanh(W · [X_t, h])
Finally, in the memory-update stage, the updated state information is obtained from the gate signals above; the update expression is:
H_t = z ⊙ H_{t-1} + (1 - z) ⊙ H'
The GRU simplifies the gating structure of the LSTM while retaining comparable predictive performance, so a recurrent network with the GRU structure is adopted to process the word-vector features produced by the convolution module and to label the action tags.
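The GRU step defined by the equations above can be sketched in NumPy (weight shapes and random values are illustrative assumptions; the update convention H_t = z ⊙ H_{t-1} + (1 - z) ⊙ H' follows this document):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wr, Wz, W):
    """One GRU step following the equations in the text."""
    concat = np.concatenate([x_t, h_prev])
    r = sigmoid(Wr @ concat)              # reset gate
    z = sigmoid(Wz @ concat)              # update gate
    h_reset = r * h_prev                  # reset data h
    h_cand = np.tanh(W @ np.concatenate([x_t, h_reset]))  # H'
    return z * h_prev + (1.0 - z) * h_cand                # H_t

dim_x, dim_h = 4, 3
rng = np.random.default_rng(1)
Wr = rng.normal(size=(dim_h, dim_x + dim_h))
Wz = rng.normal(size=(dim_h, dim_x + dim_h))
W  = rng.normal(size=(dim_h, dim_x + dim_h))
h = gru_step(rng.normal(size=dim_x), np.zeros(dim_h), Wr, Wz, W)
print(h.shape)  # (3,)
```

Processing a text sequence means applying `gru_step` token by token, carrying the hidden state forward; the hidden state at each step feeds the tag prediction.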
The invention uses a conditional random field (CRF) network to process the output of the GRU module and obtain the final output sequence. For any sequence X = (x_1, x_2, ..., x_n), the GRU module outputs a matrix O, where O_{ij} represents the confidence that the i-th word takes the j-th tag. For a predicted sequence Y, its score function can be written as follows, where A denotes the transition score matrix.
s(X, Y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} O_{i, y_i}
The maximum-score output sequence is then obtained:
Y_best = argmax s(X, o_i), o_i ∈ O
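The CRF scoring and maximum-score decoding described above can be sketched as follows (a brute-force search over a toy tag set; start/stop transition scores are omitted, and a real CRF layer would use the Viterbi algorithm instead):

```python
import numpy as np
from itertools import product

def crf_score(O, A, y):
    """Score of tag sequence y: emission scores O[i, y_i] plus
    transition scores A[y_i, y_{i+1}]."""
    emit = sum(O[i, t] for i, t in enumerate(y))
    trans = sum(A[y[i], y[i + 1]] for i in range(len(y) - 1))
    return emit + trans

def best_sequence(O, A):
    """Exhaustive argmax over all tag sequences (fine for tiny examples)."""
    n, k = O.shape
    return max(product(range(k), repeat=n), key=lambda y: crf_score(O, A, y))

O = np.array([[2.0, 0.1],    # GRU confidences: 3 words, 2 tags
              [0.1, 2.0],
              [0.1, 2.0]])
A = np.array([[0.0, 0.5],    # transition scores; tag1 -> tag0 heavily
              [-5.0, 1.0]])  # penalized, e.g. I must not precede B
print(best_sequence(O, A))  # (0, 1, 1)
```

The transition matrix lets the CRF rule out tag sequences that are individually likely but jointly invalid, which is exactly why it sits after the GRU here.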
in one embodiment of the invention, in the training process, an Adam optimizer is adopted, and the learning rate is selected to be 0.001; the size of a one-dimensional convolution kernel is (5, 1), the convolution step length is 1, and a zeroPadding mode is adopted for filling, so that the dimensionality of the feature vector before and after convolution is unchanged; also set the dim of GRU to 100 and the batch _ size to 128.
After the model is trained, real dialogue text not added to the training set is used as validation data to test the model's performance.
Given the particularity of the digital-person classification problem, the evaluation indices are improved accordingly. 'label_pre' measures whether the label category of a sentence is correct (ignoring position information). Since a digital person performs an action only at the position of the first start tag "B" and is unaffected by subsequent tags, only the correctness of the label class and of the first tag position matter; a new metric, 'L&F_pre', is therefore proposed to evaluate whether both the position of the first tag and its class are correct. The evaluation results show that the proposed scheme helps a digital person attach action tags accurately. The results are shown in fig. 7.
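A sketch of the 'L&F_pre' idea as described (this interpretation, counting a prediction correct when both the action class and the position of its first "B" tag match, is an assumption about the metric, not the patent's exact definition):

```python
def lf_pre(gold, pred):
    """Fraction of sentences whose first 'B' tag matches the gold
    annotation in both action class and position."""
    def first_b(tags):
        for i, t in enumerate(tags):
            if t.startswith("B-"):
                return (t[2:], i)  # (action class, position)
        return None
    correct = sum(first_b(g) == first_b(p) for g, p in zip(gold, pred))
    return correct / len(gold)

gold = [["O", "B-bow", "I-bow"], ["B-ok", "I-ok", "O"]]
pred = [["O", "B-bow", "I-bow"], ["O", "B-ok", "I-ok"]]
print(lf_pre(gold, pred))  # 0.5
```

In the second sentence the predicted class is right but the first "B" is one position late, so the sentence does not count as correct under this metric.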
This embodiment shows that, based on the actual situation of the digital person, the algorithm learns the semantic information in the text and generates reasonable action tags for the digital person's text, making the digital person's expression more natural and vivid, improving the digital person's affinity, and enhancing user experience.
The invention at least achieves the following beneficial effects:
1. the method applies the language pre-training model Bert to the text entity recognition of the digital person, so that the workload of downstream tasks can be reduced, and better results can be obtained;
2. the invention innovatively fuses the Bert module, the CNN module, the GRU module and the CRF module together to form a novel entity recognition model, and by using the CNN + GRU module, not only can long-distance text information be processed, but also a better effect can be obtained when the dependency relationship between adjacent labels is processed;
3. the invention organically combines the model's output with the digital-human system: when broadcasting dialogue text, the digital person makes the corresponding actions according to the labeled tags, and experiments verify that tags are correctly applied in similar contexts, enriching the digital person's image and improving its affinity.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Based on the same inventive concept, an embodiment of the present invention further provides an action tag labeling apparatus, which can be used to implement the action tag labeling method described in the foregoing embodiment, as described in the following embodiment. Because the principle of the action tag labeling device for solving the problem is similar to that of the action tag labeling method, embodiments of the action tag labeling device can refer to embodiments of the action tag labeling method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 8 is a first block diagram of an action tag labeling apparatus according to an embodiment of the present invention, and as shown in fig. 8, in an embodiment of the present invention, the action tag labeling apparatus of the present invention includes:
the device comprises a to-be-labeled text data acquisition unit 1 for acquiring the to-be-labeled text data;
the action label labeling unit 2 is configured to input the text data to be labeled into a preset action label labeling model, and obtain an action label labeling result output by the action label labeling model, where the action label labeling model is obtained by training a preset machine learning model according to a preset training sample, the training sample includes text data labeled with an action label, and the action label includes: an action start tag and an action continuation tag.
Fig. 9 is a second configuration block diagram of the action tag labeling apparatus according to the embodiment of the present invention, and as shown in fig. 9, in an embodiment of the present invention, the action tag labeling apparatus further includes:
a training sample obtaining unit 3, configured to obtain the training sample;
and the model training unit 4 is used for training the machine learning model according to the training samples to obtain the action label labeling model.
To achieve the above object, according to another aspect of the present application, there is also provided a computer apparatus. As shown in fig. 10, the computer device comprises a memory, a processor, a communication interface and a communication bus, wherein a computer program that can be run on the processor is stored in the memory, and the steps of the method of the embodiment are realized when the processor executes the computer program.
The processor may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and units, such as the program units corresponding to the method embodiments of the present invention described above. The processor runs the non-transitory software programs, instructions, and units stored in the memory to execute the various functional applications and data processing of the processor, that is, to implement the method in the above method embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor, and the like. Further, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more units are stored in the memory and, when executed by the processor, perform the method of the above embodiments.
The specific details of the computer device may be understood by referring to the corresponding related descriptions and effects in the above embodiments, and are not described herein again.
In order to achieve the above object, according to another aspect of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above action label labeling method. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
To achieve the above object, according to another aspect of the present application, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the above-mentioned action tag labeling method.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices; and they may alternatively be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device, fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. An action label labeling method is characterized by comprising the following steps:
acquiring text data to be labeled;
inputting the text data to be labeled into a preset action label labeling model, and obtaining an action label labeling result, output by the action label labeling model, corresponding to the text data to be labeled, wherein the action label labeling model is obtained by training a preset machine learning model according to a preset training sample, the training sample comprises text data labeled with an action label, and the action label includes: an action start tag and an action continuation tag.
2. The action tag labeling method according to claim 1, further comprising:
obtaining the training sample;
and training the machine learning model according to the training samples to obtain the action label labeling model.
3. The method of claim 1 or 2, wherein the machine learning model comprises: a word vector conversion layer and a label labeling layer;
the word vector conversion layer is used for converting the text data into word vectors;
and the label labeling layer is used for labeling the action labels based on the word vectors to obtain action label labeling results.
4. The method of claim 3, wherein the machine learning model further comprises: a word vector fusion layer;
the word vector fusion layer is used for carrying out feature fusion on the word vectors to obtain the word vectors after the feature fusion;
and the label labeling layer is specifically used for performing action label labeling on the basis of the word vectors after the feature fusion to obtain action label labeling results.
5. The method of claim 3, wherein the machine learning model further comprises: a labeling result optimization layer;
and the labeling result optimization layer is configured to optimize the action label labeling result output by the label labeling layer to obtain an optimized action label labeling result.
6. The action label labeling method of claim 3, wherein the word vector conversion layer employs a BERT network or a word2vec neural network.
7. The method according to claim 3, wherein the label labeling layer employs a gated recurrent unit network or a long short-term memory network.
8. The method of claim 5, wherein the labeling result optimization layer employs a conditional random field network or a Markov model.
9. An action label labeling apparatus, comprising:
a text data acquisition unit, configured to acquire text data to be labeled;
an action label labeling unit, configured to input the text data to be labeled into a preset action label labeling model and obtain an action label labeling result, output by the action label labeling model, corresponding to the text data to be labeled, wherein the action label labeling model is obtained by training a preset machine learning model according to a preset training sample, the training sample comprises text data labeled with an action label, and the action label includes: an action start tag and an action continuation tag.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the computer program is executed by the processor.
11. A computer-readable storage medium, on which a computer program/instructions are stored, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
12. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
CN202210132170.XA 2022-02-14 2022-02-14 Action label labeling method and device Pending CN114519104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210132170.XA CN114519104A (en) 2022-02-14 2022-02-14 Action label labeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210132170.XA CN114519104A (en) 2022-02-14 2022-02-14 Action label labeling method and device

Publications (1)

Publication Number Publication Date
CN114519104A true CN114519104A (en) 2022-05-20

Family

ID=81596921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210132170.XA Pending CN114519104A (en) 2022-02-14 2022-02-14 Action label labeling method and device

Country Status (1)

Country Link
CN (1) CN114519104A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082602A (en) * 2022-06-15 2022-09-20 北京百度网讯科技有限公司 Method for generating digital human, training method, device, equipment and medium of model
CN115082602B (en) * 2022-06-15 2023-06-09 北京百度网讯科技有限公司 Method for generating digital person, training method, training device, training equipment and training medium for model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination