CN110807517A - Neural network system for multi-task recognition - Google Patents

Neural network system for multi-task recognition

Info

Publication number
CN110807517A
CN110807517A (application CN201911052059.4A)
Authority
CN
China
Prior art keywords
task
neural network
network system
layer
training
Prior art date
Legal status
Pending
Application number
CN201911052059.4A
Other languages
Chinese (zh)
Inventor
Xin Bingzhe (辛秉哲)
Li Dahai (李大海)
Current Assignee
Wise Sihai Beijing Technology Co Ltd
Original Assignee
Wise Sihai Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wise Sihai Beijing Technology Co Ltd
Priority to CN201911052059.4A
Publication of CN110807517A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a neural network system for multi-task recognition, the neural network system being implemented by a computer and comprising: a shared layer comprising a plurality of Transformer units connected in sequence; and a plurality of task output layers, each connected to the shared layer, for outputting a recognition result for its task. The system of the present disclosure has at least one of the following beneficial technical effects: the plurality of Transformer units connected in sequence form a shared layer that contains a large number of parameters and can therefore obtain richer text features; at the same time, the plurality of task output layers give the neural network system a multi-task learning framework, so that training data of different tasks generalize the model and the system acquires an excellent text understanding capability.

Description

Neural network system for multi-task recognition
Technical Field
The present disclosure relates to the field of network information processing, and in particular, to a neural network system for multi-task recognition.
Background
The main content of a question-and-answer community on the network consists of questions and answers, and the quality of the answer content directly affects the core competitiveness of the website, so identifying the quality of answer content, particularly high-quality content, is very important. The traditional approach converts text into word vectors and inputs them into a CNN or RNN network to learn a classifier. This approach, however, has the following technical problems: compared with the networks used for images, these networks have few layers, so fewer features can be learned; no good pre-training layer is available, and the amount of data used for pre-training is small; and for long text, CNN and RNN networks do not capture the associations between preceding and following text well.
Disclosure of Invention
A brief summary of the disclosure is provided below in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to a first aspect of the present disclosure, there is provided a neural network system for multi-task recognition, the neural network system being implemented by a computer and comprising:
a shared layer comprising a plurality of Transformer units connected in sequence; and
a plurality of task output layers, each connected to the shared layer, for outputting a recognition result for its task.
In one embodiment, the shared layer is configured to receive text input and to output a text feature representation.
In one embodiment, the shared layer further comprises a fully connected layer.
According to a second aspect of the present disclosure, there is provided a training method for a neural network system for multi-task recognition, wherein the neural network system is implemented by a computer and comprises: a shared layer comprising a plurality of Transformer units connected in sequence; and
a plurality of task output layers, each connected to the shared layer, for outputting a recognition result for its task;
the training method comprises training the neural network system with training data of a plurality of tasks, so that the shared layer is trained during the training of each of the plurality of tasks.
In one embodiment, the training method comprises training the neural network system with the training data of the plurality of tasks in random alternation.
In one embodiment, the neural network system has a separate loss function for each task.
In one embodiment, the neural network system is trained using a gradient descent method.
According to a third aspect of the present disclosure, there is provided a multi-task identification method based on a neural network system, wherein the neural network system is implemented by a computer and comprises:
a shared layer comprising a plurality of Transformer units connected in sequence; and
a plurality of task output layers, each connected to the shared layer, for outputting a recognition result for its task;
the multi-task identification method comprises identifying data to be identified based on the neural network system and outputting identification results for a plurality of tasks.
In one embodiment, the tasks include a high-quality content identification task, a spam content identification task, a rule-violating content identification task, a medical-advice content identification task, or a regression prediction task for the like count and bookmark count of content.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as in the second or third aspect.
The technical solution of the present disclosure has at least one of the following technical effects: a plurality of Transformer units connected in sequence form the shared layer, which contains a large number of parameters and can therefore obtain richer text features; at the same time, the plurality of task output layers give the neural network system a multi-task learning framework, so that training data of different tasks generalize the model and the system acquires an excellent text understanding capability.
Drawings
The disclosure may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, along with the following detailed description. In the drawings:
FIG. 1 is a schematic block diagram of a neural network system for multi-task recognition in accordance with an embodiment of the present disclosure;
FIG. 2 shows a schematic block diagram of a Transformer unit in accordance with an embodiment of the present disclosure;
fig. 3 shows a schematic structural diagram of an electronic device implementing an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another.
Here, it should be further noted that, in order to avoid obscuring the present disclosure with unnecessary details, only the device structure closely related to the scheme according to the present disclosure is shown in the drawings, and other details not so related to the present disclosure are omitted.
It is to be understood that the disclosure, described below with reference to the drawings, is not limited to the embodiments described. Here, where feasible, embodiments may be combined with each other, features may be replaced or borrowed between different embodiments, and one or more features may be omitted in an embodiment.
In natural language processing, the pre-training layer is very important: in general, the larger the amount of pre-training data, the stronger the generalization capability. In the Chinese domain, however, Chinese corpora are rich and varied, and the words and phrases of different corpora differ, which may bias downstream tasks in a specific domain. A pre-training layer comprising a plurality of (e.g., 12) Transformer units may therefore be trained using massive data from a question-and-answer website. Each Transformer unit comprises a multi-head attention unit (Multi-head Attention); the multi-head attention unit may correspond to an integration of 16 self-attention units and comprises 1024 hidden-layer units. After training, the pre-training layer may be used for a plurality of downstream tasks, such as text classification, named entity recognition, and the like.
Such a pre-training layer is used in the present disclosure as a shared layer for multiple tasks, i.e., a layer whose parameters are kept consistent during the training of the multiple tasks.
For example, in a question-and-answer scenario, the training sets of some tasks are very small: the data often need to be manually checked and labeled, and the amount of data cannot meet the requirements of use. Given that some of the tasks in the existing training sets are closely related, the present disclosure provides a multi-task learning framework that allows the shared layer to learn more knowledge while the downstream tasks are fine-tuned. Because the pre-training layer of the present disclosure has many parameters and has learned a great deal of knowledge, it is selected as the shared layer. Verification and analysis show that the precision and recall of the high-quality content identification task are greatly improved under the multi-task learning framework.
Fig. 1 shows a schematic block diagram of a neural network system 100 for multi-task recognition in accordance with an embodiment of the present disclosure. The neural network system 100 is implemented by a computer and includes: a shared layer 110 comprising a plurality of Transformer units connected in sequence; and a plurality of task output layers 120, each task output layer 120 being connected to the shared layer 110 for outputting a recognition result for its task.
The shared layer 110 in the above-described embodiments of the present disclosure is used to receive text input and to output a text feature representation. Describing the structure of the shared layer 110 in detail, Fig. 2 shows a schematic block diagram of a Transformer unit according to an embodiment of the present disclosure. The shared layer 110 may include a plurality of Transformer units connected in sequence, and each Transformer unit may include a multi-head attention unit (Multi-head Attention) capable of capturing the relationships between preceding and following words of the text input into the shared layer 110, and a feed-forward unit (Feed Forward) for fully connecting the output results of the multi-head attention unit. Further, the shared layer 110 also includes a fully connected layer connected to the last of the plurality of Transformer units, for fully connecting the output result of that last Transformer unit. Specifically, the task in the above embodiment may be a high-quality content identification task, a spam content identification task, a rule-violating content identification task, a medical-advice content identification task, or a regression prediction task for the like count and bookmark count of content. The neural network system 100 further includes a plurality of task output layers 120, each corresponding to one task and used for outputting a recognition result for that task. The above-described embodiments of the present disclosure have a multi-task learning framework that enables the shared layer to learn more knowledge during training.
Preferably, 12 Transformer units may be provided; analysis and verification show that with 12 Transformer units, sufficiently rich text features are obtained while waste of computing resources is avoided. A code sketch of such a system follows.
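For illustration only, the following is a minimal sketch of such a system in PyTorch; the framework choice, module names (MultiTaskNet, heads), and the mean-pooling strategy are assumptions not stated in the disclosure, while the hyperparameters (12 Transformer units, 16 attention heads, 1024 hidden-layer units) follow the embodiment above.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Hypothetical sketch: shared Transformer encoder plus one output head per task."""

    def __init__(self, vocab_size: int, num_classes_per_task: list[int],
                 d_model: int = 1024, n_layers: int = 12, n_heads: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        # Shared layer: 12 Transformer units connected in sequence.
        self.shared = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Fully connected layer after the last Transformer unit.
        self.dense = nn.Linear(d_model, d_model)
        # One task output layer per task, each connected to the shared layer.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, n) for n in num_classes_per_task])

    def forward(self, token_ids: torch.Tensor, task_id: int) -> torch.Tensor:
        features = self.shared(self.embed(token_ids))  # text feature representation
        pooled = self.dense(features.mean(dim=1))      # pool over the sequence (an assumption)
        return self.heads[task_id](pooled)             # recognition result for one task
```

Note that with 16 heads and 1024 hidden-layer units, each head works on vectors of length 1024 / 16 = 64, which matches the length of 64 given for the q, k and v vectors below.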
The above-described neural network system of the present disclosure forms the shared layer from a plurality of Transformer units connected in sequence; the shared layer contains a large number of parameters and can therefore obtain richer text features. At the same time, the plurality of task output layers give the neural network system a multi-task learning framework, so that training data of different tasks generalize the model and the neural network system acquires an excellent text understanding capability.
According to another embodiment of the present disclosure, there is provided a training method for a neural network system for multi-task recognition, wherein the neural network system is implemented by a computer and comprises: a shared layer including a plurality of Transformer units connected in sequence; and a plurality of task output layers, each connected to the shared layer for outputting a recognition result for its task. The training method comprises training the neural network system with training data of a plurality of tasks, so that the shared layer is trained during the training of each of the plurality of tasks.
In the training method of the above embodiment, the neural network system is trained with different training data for different tasks. The task may be, for example, a high-quality content identification task, a spam content identification task, a rule-violating content identification task, a medical-advice content identification task, or a regression prediction task for the like count and bookmark count of content; the training data may be, for example, the data in website data that corresponds to the task, including answers, comments, articles, and the like, and for the high-quality content identification task the training data may be the high-quality answers of a certain question-and-answer website. In this embodiment, the neural network system is trained with the training data of the plurality of tasks in random alternation: each time a task is trained, the system keeps the training result of the previous round and continues training from the neural network system obtained in that round, so the shared layer finally obtained has been learned from all of the tasks. The generalization capability of the trained model is therefore stronger and its text understanding capability excellent; during recognition across multiple tasks, contextual information can be captured accurately even for long text, so the model achieves higher recognition accuracy.
On the basis of the above embodiment, the neural network system has a loss function for each task. The loss functions of different tasks may be the same or different; for example, the loss function may be cross entropy for the high-quality content identification task and the low-quality content identification task, and mean squared error for the regression prediction task of the like count and bookmark count of content. In this embodiment, training the neural network system is the process of solving for the parameters of the neural network system that minimize the loss function value corresponding to the task. A sketch of such a training procedure is given below.
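A minimal sketch of this training procedure, continuing the hypothetical MultiTaskNet above, might look as follows; the data loaders, the per-task loss assignment, and the learning rate are illustrative assumptions (cross entropy for the classification tasks, mean squared error for the like-count/bookmark-count regression task):

```python
import random
import torch
import torch.nn as nn

# Hypothetical per-task losses: cross entropy for two classification tasks,
# mean squared error for the like/bookmark-count regression task.
task_losses = [nn.CrossEntropyLoss(), nn.CrossEntropyLoss(), nn.MSELoss()]

def train(model, task_loaders, epochs=3, lr=1e-5):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
    iters = [iter(dl) for dl in task_loaders]
    total_steps = epochs * sum(len(dl) for dl in task_loaders)
    for _ in range(total_steps):
        task_id = random.randrange(len(task_loaders))  # randomly alternate tasks
        try:
            tokens, labels = next(iters[task_id])
        except StopIteration:  # restart this task's data when exhausted
            iters[task_id] = iter(task_loaders[task_id])
            tokens, labels = next(iters[task_id])
        # Each step starts from the parameters retained by the previous step,
        # so the shared layer is trained by every task.
        optimizer.zero_grad()
        loss = task_losses[task_id](model(tokens, task_id), labels)
        loss.backward()
        optimizer.step()  # minimize the loss function of the selected task
```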
In the training method for the neural network system for multi-task recognition, each task is treated as a text classification task. The inputs of the different tasks first enter the shared layer, which comprises a pre-training layer consisting of 12 layers of Transformer units. Each Transformer unit may comprise a multi-head attention unit (Multi-head Attention) and a feed-forward unit (Feed Forward), where the multi-head attention unit corresponds to an integration of a plurality of self-attention units; preferably, the multi-head attention unit corresponds to an integration of 16 self-attention units.
Specifically, the calculation process of the shared layer may include, for example:
The calculation process of each self-attention unit may be: converting each word of the input text into an embedding vector; obtaining the three vectors q, k and v from the embedding vector, where each word corresponds to three different vectors, namely a Query vector (q), a Key vector (k) and a Value vector (v), each of which may have a length of 64 and is obtained by multiplying the embedding vector by one of three different weight matrices; calculating a score for each embedding vector, score = q·k; normalizing each score to obtain q·k/√d_k, where d_k is an empirical value and may be, for example, 64; performing a softmax calculation on q·k/√d_k to obtain the activation result softmax(q·k/√d_k); multiplying the activation result pointwise by the vector v to obtain the score S corresponding to the embedding vector; and accumulating the scores S corresponding to each embedding vector to obtain the final output result Z of the input text.
The input text is respectively input into each self-attention unit of each multi-head attention unit for calculation, obtaining Z_i, where i = 1, 2, …, n and n is the number of self-attention units; all the Z_i are then spliced to form a feature matrix.
The feed-forward unit fully connects and outputs the results of the self-attention units within the multi-head attention unit, which yields the output result of each Transformer unit; a code sketch of this computation follows.
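The computation just described is scaled dot-product self-attention. A minimal sketch follows, for a single unbatched input; the weight matrices are supplied externally and all names are illustrative assumptions:

```python
import torch

def self_attention(x: torch.Tensor, Wq, Wk, Wv, d_k: int = 64) -> torch.Tensor:
    """One self-attention unit; x has shape (seq_len, d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv         # Query, Key and Value vectors of length d_k
    scores = (q @ k.T) / d_k ** 0.5          # score = q·k, normalized by sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)  # softmax activation
    return weights @ v                       # weight v by the activations and accumulate -> Z

def multi_head_attention(x: torch.Tensor, heads, Wo) -> torch.Tensor:
    """Run n self-attention units and splice their outputs Z_i, i = 1..n."""
    Z = torch.cat([self_attention(x, Wq, Wk, Wv) for Wq, Wk, Wv in heads], dim=-1)
    # Feed-forward unit as described above: fully connect the spliced outputs.
    return Z @ Wo
```

With n = 16 self-attention units of d_k = 64, the spliced feature matrix has width 16 × 64 = 1024, matching the 1024 hidden-layer units of the pre-training layer.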
Specifically, a gradient descent method may be used to train the neural network system; that is, the gradient descent method is used to find the parameter values that minimize the loss function corresponding to the task, thereby obtaining the trained neural network system.
According to the above training method for a neural network system for multi-task recognition, the shared layer obtained by training contains a large number of parameters, so richer text features can be obtained; at the same time, the plurality of task output layers give the neural network system a multi-task learning framework in which data of different tasks generalize the model, and the neural network system obtained by training has an excellent text understanding capability.
Another embodiment of the present disclosure provides a multi-task identification method based on a neural network system, the neural network system being implemented by a computer and including: a shared layer comprising a plurality of Transformer units connected in sequence; and a plurality of task output layers, each connected to the shared layer for outputting a recognition result for its task. The multi-task identification method comprises identifying data to be identified based on the neural network system and outputting identification results for a plurality of tasks. In this embodiment, the data to be identified may be, for example, data in a certain question-and-answer website and may include answers, comments, articles, and the like; the task may be a high-quality content identification task, a spam content identification task, a rule-violating content identification task, a medical-advice content identification task, or a regression prediction task for the like count and bookmark count of content, and the corresponding identification results may be the high-quality content, the spam content, the rule-violating content, the medical-advice content, and the regression prediction of the like count and bookmark count of content.
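As a usage sketch of this identification method, again using the hypothetical MultiTaskNet from the earlier sketches; the task names and the tokenizer are assumptions for illustration:

```python
import torch

TASK_NAMES = ["high_quality", "spam", "rule_violating", "medical_advice"]

@torch.no_grad()
def identify(model, tokenizer, text: str) -> dict:
    """Run one piece of content through every task output layer."""
    tokens = tokenizer(text)  # hypothetical tokenizer returning a (1, seq_len) id tensor
    model.eval()
    return {name: model(tokens, task_id).argmax(dim=-1).item()
            for task_id, name in enumerate(TASK_NAMES)}
```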
According to the above multi-task recognition method, the shared layer obtained by training is used in the recognition of every task, so more text features can be obtained and the information before and after a given passage of text can be understood accurately; the method therefore has a better text understanding capability and can remarkably improve recognition accuracy.
Fig. 3 shows a schematic structural diagram of an electronic device 300 implementing an embodiment of the disclosure. As shown in fig. 3, the electronic apparatus 300 includes a Central Processing Unit (CPU)301 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus are also stored. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
The following components are connected to the I/O interface 305: an input portion 306 including a keyboard, a mouse, and the like; an output section 307 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is mounted into the storage section 308 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer-readable medium bearing instructions; in such embodiments, the instructions may be downloaded and installed from a network via the communication section 309, and/or installed from the removable medium 311. The instructions, when executed by the central processing unit (CPU) 301, perform the various method steps described in the present disclosure.
Although example embodiments have been described, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the inventive concept. Accordingly, it should be understood that the above-described exemplary embodiments are not limiting, but illustrative.

Claims (10)

1. A neural network system for multi-task recognition, the neural network system being implemented by a computer, comprising:
a shared layer comprising a plurality of Transformer units connected in sequence; and
a plurality of task output layers, each connected to the shared layer and used for outputting a recognition result for its task.
2. The neural network system of claim 1, wherein the shared layer is configured to receive text input and to output a text feature representation.
3. The neural network system of claim 1, wherein the shared layer further comprises a fully connected layer.
4. A training method for a neural network system for multi-task recognition, wherein the neural network system is implemented by a computer and comprises:
a shared layer comprising a plurality of Transformer units connected in sequence; and
a plurality of task output layers, each task output layer connected to the shared layer for outputting a recognition result for its task;
the training method comprising training the neural network system with training data of a plurality of tasks, so that the shared layer is trained during the training of each of the plurality of tasks.
5. The training method of claim 4, comprising training the neural network system with the training data of the plurality of tasks in random alternation.
6. The training method of claim 4, wherein the neural network system has a separate loss function for each task.
7. The training method of claim 4, wherein the neural network system is trained using a gradient descent method.
8. A multi-task identification method based on a neural network system, wherein the neural network system is implemented by a computer and comprises:
a shared layer comprising a plurality of Transformer units connected in sequence; and
a plurality of task output layers, each task output layer connected to the shared layer for outputting a recognition result for its task;
the multi-task identification method comprising identifying data to be identified based on the neural network system and outputting identification results for a plurality of tasks.
9. The multi-task identification method according to claim 8, wherein the task includes a high-quality content identification task, a spam content identification task, a rule-violating content identification task, a medical-advice content identification task, or a regression prediction task for the like count and bookmark count of content.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 4-9.
CN201911052059.4A 2019-10-30 2019-10-30 Neural network system for multi-task recognition Pending CN110807517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911052059.4A CN110807517A (en) 2019-10-30 2019-10-30 Neural network system for multi-task recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911052059.4A CN110807517A (en) 2019-10-30 2019-10-30 Neural network system for multi-task recognition

Publications (1)

Publication Number Publication Date
CN110807517A true CN110807517A (en) 2020-02-18

Family

ID=69489772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911052059.4A Pending CN110807517A (en) 2019-10-30 2019-10-30 Neural network system for multi-task recognition

Country Status (1)

Country Link
CN (1) CN110807517A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256765A (en) * 2020-10-29 2021-01-22 浙江大华技术股份有限公司 Data mining method, system and computer readable storage medium
WO2022068627A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Data processing method and related device
CN114757432A (en) * 2022-04-27 2022-07-15 浙江传媒学院 Future execution activity and time prediction method and system based on flow log and multi-task learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188202A (en) * 2019-06-06 2019-08-30 北京百度网讯科技有限公司 Training method, device and the terminal of semantic relation identification model
EP3545472A1 (en) * 2017-01-30 2019-10-02 Google LLC Multi-task neural networks with task-specific paths
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3545472A1 (en) * 2017-01-30 2019-10-02 Google LLC Multi-task neural networks with task-specific paths
CN110188202A (en) * 2019-06-06 2019-08-30 北京百度网讯科技有限公司 Training method, device and the terminal of semantic relation identification model
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JACOB DEVLIN et al.: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv preprint *
LIU XIAODONG et al.: "Multi-Task Deep Neural Networks for Natural Language Understanding", arXiv preprint *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068627A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Data processing method and related device
CN112256765A (en) * 2020-10-29 2021-01-22 浙江大华技术股份有限公司 Data mining method, system and computer readable storage medium
CN114757432A (en) * 2022-04-27 2022-07-15 浙江传媒学院 Future execution activity and time prediction method and system based on flow log and multi-task learning
CN114757432B (en) * 2022-04-27 2023-05-30 浙江传媒学院 Future execution activity and time prediction method and system based on flow log and multi-task learning

Similar Documents

Publication Publication Date Title
CN111554268B (en) Language identification method based on language model, text classification method and device
CN110705301B (en) Entity relationship extraction method and device, storage medium and electronic equipment
CN111488739A (en) Implicit discourse relation identification method based on multi-granularity generated image enhancement representation
US11409964B2 (en) Method, apparatus, device and storage medium for evaluating quality of answer
CN112269868B (en) Use method of machine reading understanding model based on multi-task joint training
CN110807517A (en) Neural network system for multi-task recognition
CN111694940A (en) User report generation method and terminal equipment
CN111695591B (en) AI-based interview corpus classification method, AI-based interview corpus classification device, AI-based interview corpus classification computer equipment and AI-based interview corpus classification medium
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
US20230222318A1 (en) Attention neural networks with conditional computation
CN110489747A (en) A kind of image processing method, device, storage medium and electronic equipment
CN113111152A (en) Depression detection method based on knowledge distillation and emotion integration model
CN114048729A (en) Medical document evaluation method, electronic device, storage medium, and program product
CN110968725A (en) Image content description information generation method, electronic device, and storage medium
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
CN116049367A (en) Visual-language pre-training method and device based on non-supervision knowledge enhancement
CN115391520A (en) Text emotion classification method, system, device and computer medium
US20230205994A1 (en) Performing machine learning tasks using instruction-tuned neural networks
CN116467443A (en) Topic identification-based online public opinion text classification method
CN112989803B (en) Entity link prediction method based on topic vector learning
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
CN114444040A (en) Authentication processing method, authentication processing device, storage medium and electronic equipment
CN113886539A (en) Method and device for recommending dialect, customer service equipment and storage medium
CN112364654A (en) Education-field-oriented entity and relation combined extraction method
CN112836482A (en) Method and device for generating problems by sequence generation model based on template

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200218