CN112131366B - Method, device and storage medium for training text classification model and text classification

Info

Publication number: CN112131366B
Authority: CN (China)
Prior art keywords: feature vector, text classification, text, layer, question
Legal status: Active
Application number: CN202011009658.0A
Other languages: Chinese (zh)
Other versions: CN112131366A
Inventors
管冲
卢睿轩
谢德峰
李承恩
姜萌
文瑞
陈曦
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011009658.0A
Publication of CN112131366A
Application granted
Publication of CN112131366B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/35 Clustering; Classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present application provides a method, a device and a storage medium for training a text classification model and for text classification, and relates to artificial intelligence cloud technology for improving the accuracy of text classification. First sample data is input into a language model coding layer through an input layer to obtain a first feature vector of the first sample data, where the first sample data comprises at least one group of question-answer pairs and text information for determining the answers to the questions in the question-answer pairs. Through a keyword highlighting operation introduced in the embedding layer, keyword highlighting is performed on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, to obtain a second feature vector of the text information. The second feature vector and the feature vector representing the question-answer pair are input into the full-connection layer, and the answer probability corresponding to the question in the question-answer pair is determined. Model parameters of the language model coding layer are then adjusted in reverse according to the answer probability output by the full-connection layer and the answer in the first sample data.

Description

Method, device and storage medium for training text classification model and text classification
Technical Field
The application relates to the field of natural language processing, and provides a method, a device and a storage medium for training a text classification model and text classification.
Background
With the development of science and technology and of internet technology, the volume of data keeps growing, and text classification makes it possible to efficiently extract data of practical value from it. At present, text classification methods mainly rely on the machine learning or deep learning techniques of artificial intelligence.
The text classification method based on machine learning mainly divides the text classification problem into two parts: feature engineering and a classifier. Feature engineering includes text preprocessing, feature extraction, text representation and the like. In this process, the text is first cleaned and segmented with a word segmentation tool, then represented in vector form using methods such as bag-of-words or TF-IDF (term frequency-inverse document frequency), and the vector is input into a classifier such as an SVM (support vector machine) or a decision tree to obtain the final classification result. However, in machine learning the feature expression capability is weak and features must be processed manually, so the accuracy of text classification is low.
The text classification method based on deep learning first cleans and segments the text, then converts it into dense distributed word vectors with a neural-network-based method such as word2vec, and trains on the data with a neural network such as a CNN (convolutional neural network) or an LSTM (long short-term memory network) to obtain the best result. However, in deep learning, problems such as difficult model training and unsuitable model structures lead to low text classification accuracy.
In summary, the text classification methods of the prior art have low accuracy when handling text classification problems.
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for training a text classification model, a method of text classification, and a storage medium, which are used to train a question-answer based text classification model and to improve the accuracy of text classification.
In a first aspect, embodiments of the present application provide a method of training a text classification model, the method comprising:
acquiring a first training sample set, wherein each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
performing multiple rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full-connection layer, and each round of the first iterative training process comprises the following steps:
inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, so as to obtain a second feature vector of the text information;
inputting the second feature vector and the feature vector for representing the question-answer pair into the full-connection layer, and determining answer probability corresponding to the question in the question-answer pair;
and reversely adjusting model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data.
In a second aspect, the present application provides a method of text classification, the method comprising:
acquiring a text classification request containing text data, wherein the text data contains a target question and target text information for judging whether the target question is correct;
inputting the text data into a trained text classification model, and determining, based on the trained text classification model, a text classification result indicating whether the target question is correct; wherein the trained text classification model is trained by the method of the first aspect.
In a third aspect, an embodiment of the present application provides an apparatus for training a text classification model, the apparatus comprising:
a first obtaining unit, configured to obtain a first training sample set, where each first sample data in the first training sample set includes at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
the training unit is used for performing multiple rounds of first iterative training on the text classification model according to the first sample data so as to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and the training unit is specifically used for:
inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, so as to obtain a second feature vector of the text information;
inputting the second feature vector and the feature vector for representing the question-answer pair into the full-connection layer, and determining the answer probability corresponding to the question in the question-answer pair;
and reversely adjusting model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data.
In one possible implementation, the training unit is specifically configured to:
identifying the parts of speech of the keywords in the question-answer pair of the first sample data, and setting a part-of-speech tag set according to those parts of speech;
determining the part-of-speech tag of each word in the text information of the first sample data, and adding a target vector to the feature vector of each word according to whether its part-of-speech tag is in the part-of-speech tag set, so as to obtain the second feature vector.
In one possible implementation, at least one full-connection layer is further set after the embedding layer; each full-connection layer corresponds to one task, and a loss function is set for each task;
the first acquisition unit is further used for acquiring a second training sample set, the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information for determining answers of questions in the question-answer pairs;
The training unit is further used for executing multiple rounds of second iterative training on the basis of the trained text classification model according to the second sample data so as to obtain a retrained text classification model;
the training unit is specifically used for:
inputting the second sample data into a language model coding layer of the trained text classification model through an input layer to obtain feature vectors of the second sample data;
generating a feature vector with fixed dimension by passing the feature vector of the second sample data through an embedding layer of the trained text classification model;
the feature vectors output by the embedding layer are respectively input to the full-connection layer corresponding to each task, and a loss function corresponding to each task is determined;
and reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the loss function of each task.
In one possible implementation, the training unit is specifically configured to:
weighting the loss function corresponding to each task according to a preset task weight proportion to obtain a target loss function;
and reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the target loss function.
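To make the weighted multi-task objective above concrete, the following is a minimal sketch assuming a PyTorch-style setup; the task names, weight values and loss values are illustrative assumptions, not values taken from the patent.

```python
import torch

def target_loss(task_losses: dict, task_weights: dict) -> torch.Tensor:
    # Weight the loss function of each task by its preset proportion and sum.
    return sum(task_weights[name] * loss for name, loss in task_losses.items())

# Two task-specific full-connection heads sharing one language model coding
# layer; each head has produced its own loss (values here are placeholders).
losses = {
    "qa_classification": torch.tensor(0.52, requires_grad=True),
    "auxiliary_task": torch.tensor(0.31, requires_grad=True),
}
weights = {"qa_classification": 0.7, "auxiliary_task": 0.3}

loss = target_loss(losses, weights)  # the target loss function
loss.backward()                      # gradients flow back into the shared coding layer
```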
In one possible implementation, the data enhancement process includes one or a combination of the following operations (an illustrative sketch follows the list):
randomly replacing words in the text information with words from a synonym table according to a set first proportion;
randomly selecting words in the text information according to a set second proportion and inserting them at random positions in the text;
randomly deleting words in the text information according to a set third proportion;
randomly selecting two words in the text information according to a set fourth proportion and swapping their positions.
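A minimal sketch of the four operations, assuming whitespace-tokenized text and a caller-supplied synonym table; the default proportions are illustrative, not values specified by the patent.

```python
import random

def augment(words, synonyms, p_replace=0.1, p_insert=0.1, p_delete=0.1, p_swap=0.1):
    out = list(words)
    # 1. Randomly replace words with entries from the synonym table.
    for i, w in enumerate(out):
        if w in synonyms and random.random() < p_replace:
            out[i] = random.choice(synonyms[w])
    # 2. Randomly select words and insert them at random positions.
    for w in random.sample(out, max(1, int(len(out) * p_insert))):
        out.insert(random.randrange(len(out) + 1), w)
    # 3. Randomly delete words (keep the list non-empty).
    kept = [w for w in out if random.random() >= p_delete]
    out = kept or out
    # 4. Randomly select two words and swap their positions.
    if len(out) >= 2 and random.random() < p_swap:
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

print(augment("the patient has had a dry cough for two weeks".split(),
              {"dry": ["nonproductive"]}))
```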
In a possible implementation, the training unit is further configured to:
obtaining a first loss function according to the feature vector output by the language model coding layer;
determining a first gradient value according to the first loss function;
calculating a perturbation vector according to the gradient value of the embedding matrix and the first gradient value, and adding the perturbation vector to the feature vector after the dimension-reduction processing of the embedding layer to obtain an adversarial vector;
determining a second loss function based on the adversarial vector;
reversely obtaining an adversarial gradient value according to the second loss function, and accumulating the adversarial gradient value onto the first gradient value to obtain a target gradient;
and adjusting model parameters of the language model coding layer according to the target gradient.
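The steps above correspond closely to the FGM procedure defined later in the glossary. The following is a minimal sketch of one training step under that reading; the attribute path model.embeddings.weight, the epsilon value and the loss interface are assumptions, not the patent's code.

```python
import torch

def fgm_train_step(model, batch, labels, loss_fn, optimizer, epsilon=1.0):
    emb = model.embeddings.weight          # embedding matrix of the coding layer (assumed path)
    optimizer.zero_grad()
    first_loss = loss_fn(model(batch), labels)
    first_loss.backward()                  # first gradient value

    grad = emb.grad.detach()
    # Perturbation vector in the gradient-ascent direction, scaled by epsilon.
    r = epsilon * grad / (grad.norm() + 1e-12)
    emb.data.add_(r)                       # perturbed embeddings yield the adversarial vector
    second_loss = loss_fn(model(batch), labels)
    second_loss.backward()                 # adversarial gradient accumulates onto the first
    emb.data.sub_(r)                       # restore the embedding matrix
    optimizer.step()                       # update with the target gradient
```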
In one possible implementation, the language model coding layer is one of BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (A Robustly Optimized BERT Pretraining Approach) and XLNet.
In a fourth aspect, an embodiment of the present application provides an apparatus for text classification, including:
a second obtaining unit, configured to obtain a text classification request containing text data, wherein the text data contains a target question and target text information for judging whether the target question is correct;
a determining unit, configured to input the text data into the trained text classification model and determine, based on the trained text classification model, whether the target question is correct; wherein the trained text classification model is trained by the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a computing device comprising at least one processor and at least one memory, wherein the memory has program code stored therein, the processor being configured to read the program code stored in the memory and perform the method of training a text classification model as in the first aspect and the method of classifying text in the second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a method of training a text classification model in the first aspect and a method of classifying text in the second aspect provided by embodiments of the present application.
The beneficial effects of the application are as follows:
the application provides a method, a device and a storage medium for training a text classification model and text classification, relates to an artificial intelligent cloud technology, and particularly relates to a natural language processing technology. In the method, a first training sample set is obtained, each first sample data in the first training sample set contains at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs, and multiple rounds of first iterative training is performed on a text classification model according to the first sample data so as to obtain a trained text classification model. In the first iterative training process of each round, inputting first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data; then, through keyword highlighting operation introduced in the embedding layer, performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector so as to obtain a second feature vector of the text information; inputting the second feature vector and the feature vector for representing the question-answer pair into the full-connection layer, and determining answer probability corresponding to the question in the question-answer pair; and reversely adjusting model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data. Because the keyword highlighting operation is adopted in the process of training the text classification model, the word in the text information is highlighted according to the word in the question-answer pair, the association degree between the question-answer pair and the text information is improved, the answer of the question in the question-answer pair is determined to be more accurate based on the text information, and the trained text classification model can improve the association degree between the text information and the question-answer pair when model training is further carried out according to the determined answer and the answer in sample data, so that the accuracy of the question-answer based text classification is further ensured.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for text classification according to an embodiment of the present application;
FIG. 3 is a block diagram of a text classification model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for iteratively training a text classification model according to a first embodiment of the present application;
fig. 5 is a schematic diagram of a first sample data splicing process according to an embodiment of the present application;
FIG. 6 is a block diagram of another text classification model provided in an embodiment of the present application;
FIG. 7 is a block diagram of another text classification model provided in an embodiment of the present application;
FIG. 8 is a flowchart of another training method for text classification models according to an embodiment of the present application;
FIG. 9 is a flowchart of a method for iteratively training a text classification model according to a second embodiment of the present application;
FIG. 10 is a flowchart of a method for text classification according to an embodiment of the present application;
FIG. 11 is an overall method flowchart for text classification provided in an embodiment of the present application;
FIG. 12 is a block diagram of a device for training a text classification model according to an embodiment of the present application;
fig. 13 is a block diagram of an apparatus for text classification according to an embodiment of the present application;
fig. 14 is a schematic view of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
Some of the terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
1. Artificial intelligence (Artificial Intelligence, AI):
artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
2. Natural Language Processing (NLP):
natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, i.e. the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question-answering robots, knowledge graph techniques, and the like.
3. Machine Learning (ML):
machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
4. Transfer learning:
transfer learning is a method of retraining on a task using a model trained on a similar task as the starting point; by sharing the knowledge the model has learned, it can accelerate the learning efficiency of the model and improve its generalization.
5. Multi-task learning:
multi-task learning is a method of training one model on several related tasks at the same time; by sharing the knowledge learned across tasks, it can accelerate the learning efficiency of the model and improve its generalization.
6. Data enhancement:
data enhancement covers a series of techniques that generate new training samples by applying random jitter and perturbation to the original data without changing the labels. The goal of applying data enhancement is to increase the generalization of the model.
7. Adversarial training:
adversarial training is an important means of enhancing the robustness of a model. During adversarial training, small perturbations that tend to induce model errors are added to the samples, so that the model learns to adapt to such perturbations during training, which enhances its robustness.
8. Fast Gradient Method (FGM):
the fast gradient method obtains a new adversarial sample by adding a perturbation in the direction of gradient ascent.
9. Bidirectional Encoder Representations from Transformers (BERT):
BERT is a Transformer-based pre-trained language model that is multi-task trained on a large-scale corpus with a masked language model (MLM) objective and next sentence prediction (NSP).
10. Cloud technology:
cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud technology (Cloud technology) is based on the general terms of network technology, information technology, integration technology, management platform technology, application technology and the like applied by Cloud computing business models, and can form a resource pool, so that the Cloud computing business model is flexible and convenient as required. Cloud computing technology will become an important support. Background services of technical networking systems require a large amount of computing, storage resources, such as video websites, picture-like websites, and more portals. Along with the high development and application of the internet industry, each article possibly has an own identification mark in the future, the identification mark needs to be transmitted to a background system for logic processing, data with different levels can be processed separately, and various industry data needs strong system rear shield support and can be realized only through cloud computing.
The following briefly describes the design concept of the embodiment of the present application.
With the continuous development of network technology, artificial intelligence technology has been applied to various fields such as text classification technology.
Text classification in the related art is mainly divided into two types, one is based on a machine learning method in artificial intelligence, and the other is based on a deep learning method in artificial intelligence.
Based on machine learning: the machine learning classification method divides the whole text classification problem into two parts, feature engineering and a classifier. Feature engineering includes text preprocessing, feature extraction, text representation and the like. First the text is cleaned and segmented with a word segmentation tool, then represented in vector form with methods such as bag-of-words or TF-IDF, and input into a classifier such as an SVM or a decision tree to obtain the final result.
Based on deep learning: this method can use neural networks, such as convolutional neural networks and recurrent neural networks, to obtain effective features. The text also needs to be cleaned and segmented, then converted into dense distributed word vectors with a neural-network-based method such as word2vec, and the data is trained with a neural network such as a CNN or LSTM to obtain the best result.
The machine learning method has weak feature expression capability and requires manual feature processing, while the deep learning method suffers from difficult model training and low training data quality, so both text classification methods of the related art have low accuracy when handling complicated text classification problems.
Based on the above problems, the present application provides a method, apparatus and storage medium for training a text classification model and text classification, and in the process of training the text classification model, embodiments of the present application perform training of the text classification model based on keyword highlighting and multiple enhanced question-answer pairs.
In the application, a first training sample set is obtained, and each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers of questions in the question-answer pairs; performing a plurality of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
The text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and each round of first iterative training process comprises the following steps:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data; inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, to obtain a second feature vector of the text information; inputting the second feature vector and the feature vector representing the question-answer pair into the full-connection layer, and determining the answer probability corresponding to the question in the question-answer pair; and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data.
In the method, the text classification model is trained based on the question-answer pairs and the corresponding text information as training data, and keyword highlighting operation is introduced in the training process, so that the association degree of the text information and the question-answer pairs is improved, the accuracy in the training process is improved, and therefore the accuracy of text classification is improved.
In one possible implementation, external knowledge is introduced through transfer learning, the text classification model is supervised through multi-task learning, and data enhancement is added to introduce noise into the model, so as to increase the generalization and robustness of the model and better understand text semantics.
Specifically, at least one full-connection layer is further arranged after the embedding layer; each full-connection layer corresponds to one task, and a loss function is set for each task;
acquiring a second training sample set, wherein the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information for determining answers of questions in the question-answer pairs;
performing a plurality of rounds of second iterative training on the basis of the trained text classification model according to the second sample data to obtain a retrained text classification model;
the second iterative training process of each round is as follows:
inputting the second sample data into a language model coding layer of the trained text classification model through an input layer to obtain feature vectors of the second sample data; generating a feature vector with fixed dimension by passing the feature vector of the second sample data through an embedding layer of the trained text classification model; the feature vectors output by the embedded layer are respectively input to the full-connection layer corresponding to each task, and a loss function corresponding to each task is determined; and reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the loss function of each task.
In one possible implementation, to improve the accuracy of the text classification model, an countermeasure training is added in the process of training the text classification model.
Specifically, a first loss function is obtained according to the feature vector output by the language model coding layer; a first gradient value is determined according to the first loss function; a perturbation vector is calculated according to the gradient value of the embedding matrix and the first gradient value, and is added to the feature vector after the dimension-reduction processing of the embedding layer to obtain an adversarial vector; a second loss function is determined based on the adversarial vector; an adversarial gradient value is obtained in reverse according to the second loss function and accumulated onto the first gradient value to obtain a target gradient; and the model parameters of the language model coding layer are adjusted according to the target gradient.
The present application provides a method of training a text classification model based on the keyword highlighting operation and a question-answer reading-comprehension formulation, and combines several enhancement strategies in the training process, including data enhancement, transfer learning, multi-task learning and adversarial training. The keyword highlighting operation identifies the parts of speech of the keywords in the question-answer pair, judges whether each word in the text information matches the identified keywords, and adds the matching information to generate a new word vector for training, helping the text classification model learn the key information in the text. To enhance the generalization ability of the model, several data enhancement techniques are adopted: certain words in the text are randomly deleted, replaced and inserted in certain proportions, adding noise to the model's training data; and by negating sentences in the text information and modifying the labels, the model can learn more knowledge through comparison. Training on a data set from a similar data source yields the trained text classification model, and joint training with data sets of similar tasks, i.e. multi-task learning, yields the retrained text classification model, which enhances the generalization of the model and its ability to learn knowledge. In experiments, the text classification model provided in the present application achieves very high accuracy on text semantic understanding tasks.
After the design concept of the embodiment of the present application is introduced, the application scenario set in the present application is briefly described below.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application. The application scenario includes a terminal device 10 and a server 11. The terminal device 10 and the server 11 may communicate with each other via a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 10 and the server 11 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In the embodiment of the present application, the terminal device 10 is an electronic device used by a user; the electronic device may be a computer device with a certain computing capability, such as a personal computer, a mobile phone, a tablet computer, a notebook or an e-book reader, running instant messaging or social software and websites;
the server 11 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (content delivery networks), big data and artificial intelligence platforms. The terminal device 10 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and so on.
The text classification model may be deployed on the server 11 for training, and a large number of training samples may be stored in the server 11, and include at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs, for training the text classification model. Alternatively, after training to obtain the text classification model based on the training method in the embodiment of the present application, the trained text classification model may be directly deployed on the server 11 or the terminal device 10. A text classification model is typically deployed directly on the server 11, and in this embodiment of the present application, the text classification model is often used to analyze a problem input by a user and corresponding text information, so as to determine, based on the text information, a probability of whether the problem input by the user is accurate.
It should be noted that the method for training a text classification model and for text classification provided in the embodiments of the present application can be applied to various application scenarios involving question-answer type general semantic text classification tasks, such as basic tasks like text classification among the various natural language processing tasks in the medical field; such basic tasks are often critical to subsequent tasks. For example, the method can be used to judge whether descriptions of disease symptoms, medicines and the like in a medical record text appear in the text, helping doctors make auxiliary judgments; in addition, the diseases described by patients can be classified in advance during multiple rounds of dialogue and the patients then guided to a specific department or specific senior doctors, playing the role of a pre-consultation.
Accordingly, the training samples used differ between scenarios. Taking a medical scene as an example, the training sample is a patient-doctor question-answer pair and the corresponding case; similarly, when the trained text classification model is used for text classification, the questions and corresponding text information used differ between scenes. For example, in a medical scene the text information used is a case, and the questions are yes/no questions input by a patient that include various disease names.
When the embodiments of the present application are applied to medical scenes, the present application also involves the medical cloud in cloud technology. The medical cloud is based on new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data and the internet of things, combined with medical technology, to create a medical and health service cloud platform using cloud computing, realizing the sharing of medical resources and the expansion of medical coverage. Thanks to the application and combination of cloud computing technology, the medical cloud improves the efficiency of medical institutions and makes it more convenient for residents to seek medical care. Hospital appointment registration, electronic medical records and medical insurance are all products of the combination of cloud computing and the medical field, and the medical cloud also has the advantages of data security, information sharing, dynamic expansion and overall layout. Cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space and information services on demand.
In one possible application scenario, cloud technology also includes the field of artificial intelligence cloud services, commonly called AIaaS (AI as a Service). This is currently the mainstream service mode for artificial intelligence platforms: an AIaaS platform splits several common AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed mall: all developers can access one or more of the artificial intelligence services provided by the platform through API interfaces, and some experienced developers can also deploy, operate and maintain their own dedicated cloud artificial intelligence services using the AI framework and AI infrastructure provided by the platform. The method of training a text classification model provided in the embodiments of the present application can be implemented on the basis of cloud technology: each artificial intelligence service involved in training the text classification model is split out, for example the language coding model of the language model coding layer obtaining feature vectors, the classification of the full-connection layer and so on, and independent or packaged services are provided in the cloud; alternatively, when training the text classification model, one or more artificial intelligence services provided by the platform can be accessed through an API interface.
In the application, the training samples for training the text classification model can also be stored by adopting a cloud storage technology. Cloud storage (cloud storage) is a new concept that extends and develops in the concept of cloud computing, and a distributed cloud storage system (hereinafter referred to as a storage system for short) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of various types in a network to work cooperatively through application software or application interfaces through functions such as cluster application, grid technology, and a distributed storage file system, so as to provide data storage and service access functions for the outside.
In one possible application scenario, in order to reduce communication latency, servers 11 may be deployed in different regions, or, for load balancing, different servers 11 may each serve the regions corresponding to different terminal devices 10. The plurality of servers 11 share data through a blockchain and together constitute a data sharing system. For example, a terminal device 10 located at site a is communicatively connected to one server 11, and a terminal device 10 located at site b is communicatively connected to another server 11.
Each server 11 in the data sharing system has a corresponding node identifier, and each server 11 may store the node identifiers of the other servers 11 in the data sharing system, so as to broadcast generated blocks to the other servers 11 according to their node identifiers. Each server 11 may maintain a node identifier list as shown in the table below, in which server names and node identifiers are stored correspondingly. The node identifier may be an IP (Internet Protocol) address or any other information that can be used to identify the node; only IP addresses are shown in Table 1 as an illustration.
TABLE 1

Server name    Node identification
Node 1         119.115.151.174
Node 2         118.116.189.145
Node N         119.124.789.258
The method of text classification provided by the exemplary embodiments of the present application will be described below with reference to the accompanying drawings in conjunction with the application scenario described above, and it should be noted that the application scenario described above is merely shown for the convenience of understanding the spirit and principles of the present application, and embodiments of the present application are not limited in any way in this respect.
In the present application, the text classification is mainly question-answer type binary text classification: the input text information and target question are processed by the trained text classification model, and the answer probability of the target question is determined based on the text information, so as to give the user correct guidance and save the user's time. Therefore, in the present application a text classification model is first trained, and text classification is then performed with the trained model.
Embodiment one: a first method of training a text classification model.
As shown in fig. 2, a flowchart of a method for training a text classification model according to an embodiment of the present application includes the following steps:
step S200, a first training sample set is acquired, and each first sample data in the first training sample set includes at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs.
Taking a medical scenario as an example: the first sample data comprises a patient case and at least one set of question-answer pairs about the case, where a question-answer pair contains a question posed by the patient about the case and the doctor's reply, for example patient: "Doctor, do I have disease A?", doctor: "Yes"; or patient: "Doctor, what disease do I have?", doctor: "You have disease A."
After the first training sample set is obtained, performing multiple rounds of first iterative training on the text classification model based on first sample data in the first training sample set to obtain a trained text classification model.
Step S201, according to the first sample data, performing multiple rounds of first iterative training on the text classification model to obtain a trained text classification model.
In the present application, the number of rounds of the first iterative training may be preset, or determined by a stopping condition during training; the stopping condition may be that the loss function converges to a desired value, or that the loss function stops improving after reaching a certain value.
In this application, when performing multiple rounds of first iterative training on the text classification model, the structure of the text classification model is unchanged. Fig. 3 shows the structure of the text classification model provided in this embodiment of the present application: the text classification model 300 includes an input layer 301, a language model coding layer 302, an embedding layer 303, and a full-connection layer 304. The operations performed in each round of first iterative training are therefore the same; the process of each round is shown in fig. 4, a flowchart of the method of the first iterative training of the text classification model according to an embodiment of the present application, which includes:
step S400, inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data.
In the present application, the language model coding layer is one of BERT, RoBERTa and XLNet.
In the application, first sample data is input to an input layer in the training process, then the input layer inputs the first sample data to a language model coding layer for coding processing, and a first feature vector of the first sample data is determined.
In one possible implementation, the input layer performs a data stitching process on text information and at least one set of question-answer pairs contained in the first sample data.
When the data splicing processing is performed, a [CLS] classification token is placed at the beginning of the data, and the text information, question and answer are connected by [SEP] tokens; fig. 5 is a schematic diagram of the first sample data splicing process according to an embodiment of the present application.
In the present application, after language model encoding, each word in the first sample data outputs a 1×1024 vector as the feature vector of that word; that is, each word in the text information and each word in the question-answer pair outputs a 1×1024 vector as the corresponding first feature vector.
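A minimal sketch of the splicing is shown below. The [CLS]/[SEP] token strings follow the usual BERT convention; the helper function and the sample strings are illustrative assumptions, not the patent's code.

```python
def build_input(text: str, question: str, answer: str) -> str:
    # [CLS] at the beginning; text information, question and answer joined by [SEP].
    return f"[CLS] {text} [SEP] {question} [SEP] {answer} [SEP]"

sample = build_input("The patient reports a persistent dry cough ...",
                     "Does the patient have bronchitis?",
                     "Yes")
```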
Step S401, inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, so as to obtain the second feature vector of the text information.
In order to increase the association between the text information and the question-answer pairs and obtain a more targeted embedded vector, a keyword highlighting operation is introduced into the embedding layer, as in the text classification model structure diagram shown in fig. 6.
Therefore, in the embedding layer during training of the text classification model, not only is the first feature vector of the first sample data reduced in dimension to generate a feature vector of fixed dimension, but the keyword highlighting operation is also performed on the words of the text information in the first sample data to obtain the corresponding second feature vector of the text information.
In the present application, the specific steps of performing the keyword highlighting operation on the words in the text information in the embedding layer, to obtain the second feature vector of the text information, are as follows:
parts of speech of the keywords in the question-answer pair of the first sample data are identified, and a part-of-speech tag set is set according to the parts of speech.
The parts of speech of the keywords in the question-answer pair are identified, since keywords tend to contain more valid information. The part-of-speech tags include nouns, verbs, adjectives, adverbs, numerals and foreign words; that is, in the present application, the part-of-speech tag set is built from the part-of-speech tags of the keywords in the question-answer pair.
Determining a part-of-speech tag for each word in the text information of the first sample data; i.e. to determine whether each word in the text information is a noun, a verb, an adjective, etc.
Judging whether the part-of-speech tag corresponding to each word in the text information is in a part-of-speech tag set identified in the question-answer pair corresponding to the text information, and adding a target vector to the feature vector of each word in the text information according to a judging result to obtain a second feature vector.
In the application, the feature vector of each word in the text information is a feature vector processed through the embedding layer, the processed feature vector has a fixed dimension, and the feature vector can be subjected to dimension reduction processing at the embedding layer. Let the feature vector of each word be d i
The part of speech of each word in the text information is further identified, and it is judged whether that part of speech is in the part-of-speech tag set determined from the question-answer pair; if so, an l+ vector is added, otherwise an l− vector is added, where l+ and l− are both word vectors of the same dimension as d_i, and l+ and l− are the target vectors in the embodiment of the present application.
From the above, the word vector newly generated for each word in the text information is d'_i = d_i + h_i, where h_i is l+ or l−; that is, the second feature vector obtained from the text information through the keyword highlighting operation is the sequence {d'_1, d'_2, …}. Subsequent training then continues with this second feature vector, so that the word vector of each word in the text information has a higher relevance to the question and answer.
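A minimal code sketch of this keyword highlighting operation is given below; the use of NLTK's Penn Treebank tagger is an illustrative assumption, and in practice l+ and l− would be trainable parameters of the embedding layer:

```python
import torch
import nltk  # assumes the 'averaged_perceptron_tagger' data package is installed

# Penn Treebank tag prefixes for nouns, verbs, adjectives, adverbs,
# numerals and foreign words (an illustrative mapping of the tag set above).
KEPT_PREFIXES = ("NN", "VB", "JJ", "RB", "CD", "FW")

def keyword_highlight(text_words, qa_words, d, l_plus, l_minus):
    """d: (len(text_words), dim) embedding-layer vectors; l_plus / l_minus:
    (dim,) target vectors. Returns the second feature vector d'_i = d_i + h_i."""
    qa_tags = {t for _, t in nltk.pos_tag(qa_words) if t.startswith(KEPT_PREFIXES)}
    rows = []
    for i, (_, tag) in enumerate(nltk.pos_tag(text_words)):
        h = l_plus if tag in qa_tags else l_minus  # h_i = l+ on a match, else l-
        rows.append(d[i] + h)
    return torch.stack(rows)
```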
It should be noted that the keyword highlighting operation of the present application is used only when the three kinds of data, i.e. text, question and answer, are available at the same time and training is performed for the question-answer text classification task.
Step S402, inputting the second feature vector and the feature vector for representing the question-answer pair into the full-connection layer, and determining answer probabilities corresponding to the questions in the question-answer pair.
In this application, an activation function for text classification is set in the fully connected layer, and the fully connected layer corresponds to a classifier.
In one possible embodiment, the output of the first token [CLS] is taken as the input of the fully connected layer, and the final output for the text classification learning task is obtained via the activation function.
Step S403, reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data.
A loss function is determined based on the answer probability output by the text classification model for the question in the question-answer pair and the answer in the question-answer pair of the first sample data, and the model parameters of the language model coding layer are reversely adjusted according to the loss function.
In the present application, a plurality of rounds of first iterative training are performed according to steps S400-S403 to obtain a trained text classification model.
In one possible implementation, in order to enable the model to learn more text knowledge, a method for generating positive and negative samples is also proposed in the present application: n_q questions and associated answers are randomly generated from each piece of text information. The specific operation is as follows: n_s words are randomly extracted from the text information and spliced as the question of a question-answer pair, and the remaining words of the text information are then spliced as the answer of the question-answer pair, where n_q and n_s are positive integers.
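A hedged sketch of this positive/negative sample generation (function and variable names are illustrative):

```python
import random

def make_pseudo_qa_pairs(text_words, n_q, n_s):
    """From one piece of text, generate n_q pseudo question-answer pairs:
    n_s randomly extracted words are spliced into the question and the
    remaining words are spliced into the answer."""
    pairs = []
    for _ in range(n_q):
        picked = set(random.sample(range(len(text_words)), n_s))
        question = " ".join(w for i, w in enumerate(text_words) if i in picked)
        answer = " ".join(w for i, w in enumerate(text_words) if i not in picked)
        pairs.append((question, answer))
    return pairs
```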
In the present application, in order to increase the generalization of the text classification model and its ability to learn knowledge, the model parameters of the text classification model are fine-tuned on the basis of the trained text classification model. In this process, the text classification model trained on the text classification task is used as the starting point for transfer learning, and retraining on similar tasks, i.e. multi-task learning, is carried out, which improves the learning efficiency of the model and increases its generalization. Meanwhile, in order to increase the diversity of the data and the robustness of the model during training, different degrees of noise are introduced into the training data through various data enhancement techniques. See the training process of the text classification model in the second embodiment for details.
Embodiment two: a second method for training a text classification model.
As shown in fig. 7, a block diagram of another text classification model according to an embodiment of the present application is provided. The text classification model 700 includes an input layer 701, a language model coding layer 702, an embedding layer 703, and a plurality of fully connected layers 704. Among the plurality of fully connected layers, one fully connected layer implements the text classification main task, and at least one fully connected layer implements a sub-task that is similar to but different from the text classification main task; the activation function, and accordingly the loss function, differs from one fully connected layer to another.
As shown in fig. 7, the sub-tasks that are similar to but different from the text classification main task in the present application include, but are not limited to: text entailment and multiple-choice selection.
The text entailment task corresponds to a single fully connected layer, and the multiple-choice selection task corresponds to another single fully connected layer.
It should be noted that the input layer, language model coding layer and embedding layer in this text classification model are those of the text classification model obtained by the training in the first embodiment. Since other tasks are introduced and training is no longer limited to the text classification task, the keyword highlighting operation is not applied in the embedding layer during this training process.
As shown in fig. 8, a flowchart of another training method for a text classification model according to an embodiment of the present application includes the following steps:
step S800, obtaining a second training sample set, where the second training sample set includes second sample data after data enhancement processing, and each second sample data includes at least one question-answer pair and text information for determining answers to questions in the question-answer pair.
In this application, the data enhancement processing includes one or a combination of the following operations (a code sketch is given after the list):
random synonym replacement, namely randomly replacing words in the text data with words from a synonym table according to a set first proportion;
random word insertion, namely randomly selecting words in the text according to a set second proportion and inserting them at random positions in the text;
random word deletion, namely randomly deleting words in the text information according to a set third proportion;
random word swapping, namely randomly selecting two words in the text information according to a set fourth proportion and exchanging their positions.
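A minimal sketch of the four operations, assuming whitespace-tokenized word lists and a caller-supplied synonym table; the proportions p correspond to the first to fourth proportions above:

```python
import random

def synonym_replace(words, p, synonyms):            # first proportion
    return [random.choice(synonyms[w]) if w in synonyms and random.random() < p
            else w for w in words]

def random_insert(words, p):                        # second proportion
    out = list(words)
    for w in random.sample(words, max(1, int(p * len(words)))):
        out.insert(random.randrange(len(out) + 1), w)
    return out

def random_delete(words, p):                        # third proportion
    kept = [w for w in words if random.random() >= p]
    return kept if kept else [random.choice(words)]

def random_swap(words, p):                          # fourth proportion
    out = list(words)
    for _ in range(max(1, int(p * len(words)))):
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out
```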
Step S801, performing a plurality of rounds of second iterative training on the basis of the trained text classification model according to the second sample data to obtain a retrained text classification model.
It should be noted that the operations performed in each round of the second iterative training are the same, so only one round of the second iterative training is described here. As shown in fig. 9, a flowchart of one round of the second iterative training of a text classification model provided in an embodiment of the present application includes:
in step S900, the second sample data is input into the language model coding layer of the trained text classification model through the input layer, so as to obtain the feature vector of the second sample data.
Reference may be made specifically to the description in the first embodiment, and no further description is given here.
In step S901, feature vectors of the second sample data are passed through the embedded layer of the trained text classification model to generate feature vectors of fixed dimensions.
Step S902, the feature vectors output by the embedded layer are respectively input to the full connection layer corresponding to each task, and a loss function corresponding to each task is determined.
In this application, the fully connected layer corresponding to each task outputs a corresponding result, and the output result is compared with the corresponding labeled preset result to determine the loss function for each task.
Step S903, reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the loss function of each task.
In the present application, when model parameters of a language model coding layer of a trained text classification model are reversely adjusted according to a loss function of each task:
determining a plurality of loss functions according to the output result of each full connection layer and the corresponding preset result; weighting the loss functions corresponding to the tasks according to preset weight distribution of the main task and the auxiliary task to obtain target loss functions; and reversely adjusting model parameters of the language model coding layer according to the target loss function.
For example, the weight ratio of the text classification task, the text entailment task and the multiple-choice selection task is set in advance to 8:1:1; when determining the target loss function, the loss functions are weighted in the ratio text classification task loss : text entailment task loss : multiple-choice task loss = 8:1:1.
In the present application, weights are assigned to the main task and the auxiliary tasks so that the model can learn more knowledge while avoiding performance degradation caused by overfitting. The multi-task weights embody the importance of each task in the final result; the text classification task, being the most important task in the present application, is given the largest weight.
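For illustration, the weighting can be sketched as follows (PyTorch is an assumed framework; the three losses would come from the three fully connected heads):

```python
import torch

def target_loss(loss_cls, loss_entail, loss_choice, weights=(8.0, 1.0, 1.0)):
    """Combine per-task losses in the preset main/auxiliary ratio (8:1:1)."""
    w = torch.tensor(weights)
    w = w / w.sum()  # normalize 8:1:1 to 0.8 : 0.1 : 0.1
    return w[0] * loss_cls + w[1] * loss_entail + w[2] * loss_choice
```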
It should be noted that adversarial training may be introduced in the training processes of the text classification model in the first and second embodiments. During training, tiny perturbations of the text data can easily change the prediction result of the text classification model, making its predictions inaccurate. Such a perturbation is referred to as an adversarial perturbation, the perturbed input is referred to as an adversarial sample, and the process of misleading the model with adversarial samples is referred to as an adversarial attack. The vulnerability of the text classification model under adversarial attack brings great risks to practical applications.
In order to improve the robustness of the text classification model against adversarial attacks, the present application provides an adversarial training method. Adversarial training is a noise-introducing training mode that can regularize the model parameters: adversarial samples are constructed during the training of the model and mixed with the original samples to train the model. In other words, the model is attacked during its own training process so as to improve its robustness and generalization against adversarial attacks.
In the present application, the FGM algorithm is used to add a perturbation r_adv = ε·g/‖g‖₂ to the feature vector obtained by the language model encoding, where g is the gradient of the loss with respect to the input. Adding this perturbation increases the difficulty of model convergence, thereby achieving the effect of adversarial training.
The FGM algorithm employed is specifically as follows (a code sketch is given after the steps):
For each piece of text training data in the training set:
calculate the forward loss for the feature vector X of the sample data, and back-propagate to obtain a first gradient value;
the forward loss is determined by comparing the result output by the embedding layer and the fully connected layer, based on the feature vector output by the language model encoding, with the answer in the sample data;
calculate the perturbation vector r_adv from the gradient values of the embedding matrix and the first gradient value, and add the perturbation vector to the current embedding vector to obtain the adversarial vector X + r_adv;
calculate the forward loss of the adversarial vector X + r_adv and back-propagate it to obtain an adversarial gradient value, then accumulate the adversarial gradient value onto the first gradient value to obtain a target gradient value;
and further adjust the model parameters of the language model encoding layer according to the target gradient value.
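A PyTorch sketch of this FGM procedure follows; the embedding parameter name and the value of ε are assumptions about the model definition, not specified by the present application:

```python
import torch

class FGM:
    """Perturb the embedding matrix by r_adv = eps * g / ||g||_2, then restore."""
    def __init__(self, model, eps=1.0):
        self.model, self.eps, self.backup = model, eps, {}

    def attack(self, emb_name="word_embeddings"):
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0:
                    param.data.add_(self.eps * param.grad / norm)  # add r_adv

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Per batch: loss(X).backward()          -> first gradient value
#            fgm.attack()                -> embeddings become X + r_adv
#            loss(X + r_adv).backward()  -> adversarial gradient accumulates
#            fgm.restore(); optimizer.step(); optimizer.zero_grad()
```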
The present application provides a training method for a question-answer text classification model, introduces a keyword highlighting operation in the training process, and combines multiple enhancement strategies, including data enhancement, transfer learning, multi-task learning and adversarial training. The keyword highlighting operation identifies the parts of speech of the words in the question-answer pair, judges whether the part of speech of each word in the text information matches a part of speech identified in the question-answer pair, and adds the matching information to generate new word vectors for training. To enhance the generalization capability of the model, multiple data enhancement techniques randomly delete, replace and insert certain words of the text information in certain proportions, thereby adding noise to the training data of the model. Transfer learning enhances the generalization of the model and its ability to learn knowledge: an initial model is obtained by training on larger datasets with similar data sources while introducing adversarial training, and joint training, i.e. multi-task learning, is then performed with datasets of similar tasks. The model provided by the application achieves extremely high accuracy on text classification tasks.
Embodiment III: a method of text classification.
As shown in fig. 10, a flowchart of a method for text classification according to an embodiment of the present application includes the following steps:
in step S1000, a text classification request including text data is acquired, wherein the text data includes a target question and target text information for determining whether the target question is correct.
In the present application, the target question is a judgment (yes/no) question containing a target noun.
Taking a medical scenario as an example, the target text information is a patient's case record, and the target question is a question input by the patient; the input question is a judgment question containing a target noun, for example "whether the patient suffers from a medical disease", where the target question is a judgment question and the target noun it contains is the medical disease.
In step S1001, the text data is input into the trained text classification model, and a text classification result indicating whether the target question is correct is determined based on the trained text classification model.
Wherein the trained text classification model is trained by the methods of embodiment one and embodiment two of the present application.
For example, the patient's case record and the target question "whether the patient suffers from the medical disease" are input into the trained text classification model; the text classification model outputs probability values for "yes" and "no" on the target question, and the classification result corresponding to the target question can be determined from these probability values, that is, whether the patient suffers from the medical disease is determined according to the patient's case record.
In this application, a trained text classification model includes an input layer, a language model coding layer, an embedded layer, and a fully connected layer. In the text classification process, text data is input into a trained text classification model, the text data is processed sequentially through each layer in the text classification model, and finally a text classification result is output.
As shown in fig. 11, an overall method flowchart for text classification according to an embodiment of the present application includes:
step S1100, obtaining a text classification request containing text data;
the text data is target text information and target questions, and the target questions are judgment questions comprising target nouns.
Step S1101, inputting the text data into the trained text classification model, and performing splicing processing on the target text information and the target question in the text data through the input layer.
In step S1102, the spliced text data is transmitted to the language model coding layer via the input layer, and the feature vector of the text data is obtained by coding the text data by the language model coding layer.
In this application, after language model encoding, each word in the text data outputs a 1×1024 vector as the feature vector of the word.
In step S1103, the feature vector is transmitted to the embedding layer via the language model encoding layer, and is processed by the embedding layer to obtain a feature vector with a fixed dimension.
Since the dimension of the feature vector of each word output by the language model coding layer is not fixed, a fixed-dimension feature vector is generated by the embedding layer;
in one possible implementation manner, the dimension of the feature vector of a word output by the language model coding layer is large, which affects computational efficiency, so the feature vector may be reduced in dimension by the embedding layer.
In step S1104, the fixed-dimension target feature vector is input to the fully connected layer via the embedding layer, and classification processing is performed by the fully connected layer to determine the classification result.
In the present application, the fully connected layer contains an activation function for text classification, and the activation function produces the final classification result for the text classification task. It should be noted that the fully connected layer corresponds to a text classifier in the present application.
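For illustration, such a classifier head can be sketched as follows; the 1024-dimensional input and the label order are assumptions consistent with the 1×1024 feature vectors above:

```python
import torch
import torch.nn as nn

head = nn.Linear(1024, 2)   # the fully connected layer acting as the text classifier

def classify(cls_vector):   # cls_vector: (batch, 1024) [CLS] features
    probs = torch.softmax(head(cls_vector), dim=-1)
    return probs            # assumed order: column 0 = "no", column 1 = "yes"
```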
Because the text classification method of the present application is mainly question-answer text classification, the keyword highlighting operation is introduced during model training, so that the information of the question-answer pair is introduced into the embedded vector of the text information; the embedded vector of the text information is thus more relevant to the question and answer, which improves the accuracy of the trained text classification model on question-answer text classification.
Based on the same inventive concept, the embodiment of the present application further provides an apparatus 1200 for training a text classification model, as shown in fig. 12, the apparatus 1200 includes: a first acquisition unit 1201 and a training unit 1202, wherein:
a first obtaining unit 1201, configured to obtain a first training sample set, where each first sample data in the first training sample set includes at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
a training unit 1202, configured to perform multiple rounds of first iterative training on the text classification model according to the first sample data, so as to obtain a trained text classification model;
the text classification model includes an input layer, a language model coding layer, an embedded layer, and a full connection layer, and the training unit 1202 is specifically configured to:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data; inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, so as to obtain a second feature vector of the text information; inputting the second feature vector and the feature vector used for representing the question-answer pair into the fully connected layer, and determining the answer probability corresponding to the question in the question-answer pair; and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the fully connected layer and the answer in the first sample data.
In one possible implementation, the training unit 1202 is specifically configured to:
identifying parts of speech of keywords in the question-answer pair of the first sample data, and setting a part of speech tag set according to the parts of speech; for each word in the text information of the first sample data, determining the part-of-speech tag, and adding a target vector to the feature vector of each word according to the judging result of whether the part-of-speech tag is in the part-of-speech tag set, so as to obtain a second feature vector.
In one possible implementation, at least one fully-connected layer is further set after the embedded layer, each fully-connected layer corresponds to a task, and a loss function is set for each task;
the first obtaining unit 1201 is further configured to obtain a second training sample set, where the second training sample set includes second sample data after data enhancement processing, and each second sample data includes at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
a training unit 1202, configured to perform a plurality of rounds of second iterative training on the basis of the trained text classification model according to the second sample data, so as to obtain a retrained text classification model;
the training unit 1202 is specifically configured to:
Inputting the second sample data into a language model coding layer of the trained text classification model through an input layer to obtain feature vectors of the second sample data; generating a feature vector with fixed dimension by passing the feature vector of the second sample data through an embedding layer of the trained text classification model; the feature vectors output by the embedded layer are respectively input to the full-connection layer corresponding to each task, and a loss function corresponding to each task is determined; and reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the loss function of each task.
In one possible implementation, the training unit 1202 is specifically configured to:
weighting the loss function corresponding to each task according to a preset task weight proportion to obtain a target loss function; and reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the target loss function.
In one possible implementation, the data enhancement process includes one or a combination of the following:
randomly replacing words in the text data with words in a synonym table according to a set first proportion;
randomly selecting words in the text according to a set second proportion and randomly inserting the words in any position in the text;
Randomly deleting words in the text information according to a set third proportion;
two words in the text information are randomly selected according to the set fourth proportion and position inversion is carried out.
In one possible implementation, the training unit 1202 is further configured to:
obtaining a first loss function according to the feature vector output by the language model coding layer; determining a first gradient value according to the first loss function; calculating a perturbation vector according to the gradient values of the embedding matrix and the first gradient value, and adding the perturbation vector to the feature vector subjected to the embedding dimension-reduction processing to obtain an adversarial vector; determining a second loss function based on the adversarial vector; obtaining an adversarial gradient value through back-propagation of the second loss function, and accumulating the adversarial gradient value onto the first gradient value to obtain a target gradient; and adjusting the model parameters of the language model coding layer according to the target gradient.
In one possible implementation, the language model coding layer is one of BERT, roberta, XLNet.
Based on the same inventive concept, the embodiment of the present application further provides a text classification device 1300, as shown in fig. 13, where the device 1300 includes: a second acquisition unit 1301 and a determination unit 1302, wherein:
a second obtaining unit 1301 configured to obtain a text classification request including text data, where the text data includes a target question and target text information for judging whether the target question is correct;
A determining unit 1302 for inputting text data into the trained text classification model, determining whether the target question is a correct text classification result based on the trained text classification model; the trained text classification model is obtained through training by the method for training the text classification model.
For convenience of description, the above parts are respectively described as functionally divided into units (or modules). Of course, the functions of each unit (or module) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
Having described the method and apparatus for text classification and corresponding method and apparatus for text classification model training according to an exemplary embodiment of the present application, next, a computing device in a text classification process or text classification model training process according to another exemplary embodiment of the present application is described.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit", "module" or "system".
In one possible implementation, a computing device provided by an embodiment of the present application may include at least a processor and a memory. The memory stores program code that, when executed by the processor, causes the processor to perform any of the methods of text classification of various exemplary embodiments in the present application, and to perform any of the methods of text classification model training of various exemplary embodiments in the present application.
In some possible implementations, the embodiments of the present application also provide a computer readable storage medium including program code; when the program code runs on an electronic device, it causes the electronic device to perform the steps of the methods of any of the above embodiments.
A computing device 1400 according to such an embodiment of the present application is described below with reference to fig. 14. The computing device 1400 of fig. 14 is but one example and should not be taken as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 14, computing device 1400 is represented in the form of a general purpose computing device. Components of computing device 1400 may include, but are not limited to: the at least one processor 1401, the at least one memory unit 1402, and a bus 1403 connecting the different system components (including the memory unit 1402 and the processor 1401).
Bus 1403 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, and a local bus using any of a variety of bus architectures.
The storage unit 1402 may include a readable medium in the form of a volatile memory, such as a Random Access Memory (RAM) 14021 and/or a cache storage unit 14022, and may further include a Read Only Memory (ROM) 14023.
The storage unit 1402 may also include a program/utility 14025 having a set (at least one) of program modules 14024, such program modules 14024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The computing device 1400 may also communicate with one or more external devices 1404 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the computing device 1400, and/or any devices (e.g., routers, modems, etc.) that enable the computing device 1400 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1405. Moreover, computing device 1400 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1406. As shown, network adapter 1406 communicates with the other modules of computing device 1400 over bus 1403. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 1400, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, various aspects of the text classification method and the text classification model training method provided herein may also be implemented in the form of a program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to perform the steps of these methods according to the various exemplary embodiments of the present application described herein above.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the present application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Still further, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (9)

1. A method of training a text classification model, the method comprising:
acquiring a first training sample set, wherein each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
performing a plurality of rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and each round of first iterative training process comprises the following steps:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, so as to obtain a second feature vector of the text information;
Inputting the second feature vector and the feature vector for representing the question-answer pair to the full-connection layer, and determining answer probability corresponding to the question-answer pair;
reversely adjusting model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data;
wherein the performing, through the keyword highlighting operation introduced in the embedding layer, keyword highlighting on the feature vector used for representing the text information according to the feature vector used for representing the question-answer pair in the first feature vector, so as to obtain the second feature vector of the text information, comprises the following steps:
identifying parts of speech of keywords in the question-answer pair of the first sample data, and setting a part of speech tag set according to the parts of speech;
and determining part-of-speech tags for each word in the text information of the first sample data, and adding a target vector to the feature vector of each word according to the judging result of whether the part-of-speech tags are in the part-of-speech tag set so as to obtain the second feature vector.
2. The method of claim 1, wherein the method further comprises:
Continuously setting at least one full-connection layer after the embedded layer, wherein each full-connection layer corresponds to a task, and setting a loss function for each task;
acquiring a second training sample set, wherein the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information for determining answers of questions in the question-answer pairs;
performing a plurality of rounds of second iterative training on the trained text classification model based on the second sample data to obtain a retrained text classification model;
the second iterative training process of each round is as follows:
inputting the second sample data into a language model coding layer of the trained text classification model through the input layer to obtain feature vectors of the second sample data;
generating a feature vector with fixed dimension by passing the feature vector of the second sample data through an embedding layer of the trained text classification model;
the feature vectors output by the embedded layer are respectively input to a full-connection layer corresponding to each task, and a loss function corresponding to each task is determined;
And reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the loss function of each task.
3. The method of claim 2, wherein said inversely adjusting model parameters of a language model coding layer of said trained text classification model according to said per-task loss function comprises:
weighting the loss function corresponding to each task according to a preset task weight proportion to obtain a target loss function;
and reversely adjusting model parameters of a language model coding layer of the trained text classification model according to the target loss function.
4. The method of claim 2, wherein the data enhancement process comprises one or a combination of:
randomly replacing words in the text data with words in a synonym table according to a set first proportion;
randomly selecting words in the text according to a set second proportion and randomly inserting the words in any position in the text;
randomly deleting words in the text information according to a set third proportion;
two words in the text information are randomly selected according to the set fourth proportion and position inversion is carried out.
5. The method of claim 1 or 2, further comprising:
obtaining a first loss function according to the feature vector output by the language model coding layer;
determining a first gradient value according to the first loss function;
calculating a perturbation vector according to the gradient values of the embedding matrix and the first gradient value, and adding the perturbation vector to the feature vector subjected to the dimension-reduction processing of the embedding layer to obtain an adversarial vector;
determining a second loss function from the adversarial vector;
obtaining an adversarial gradient value through back-propagation of the second loss function, and accumulating the adversarial gradient value onto the first gradient value to obtain a target gradient;
and adjusting model parameters of the language model coding layer according to the target gradient.
6. A method of text classification, the method comprising:
acquiring a text classification request containing text data, wherein the text data contains a target problem and target text information for judging whether the target problem is correct or not;
inputting the text data into a trained text classification model, and determining whether the target problem is a correct text classification result based on the trained text classification model; wherein the trained text classification model is trained by the method of any one of claims 1-5.
7. An apparatus for training a text classification model, the apparatus comprising:
a first obtaining unit, configured to obtain a first training sample set, where each first sample data in the first training sample set includes at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
the training unit is used for executing multiple rounds of first iterative training on the text classification model according to the first sample data so as to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and the training unit is specifically used for:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced in the embedding layer, so as to obtain a second feature vector of the text information;
Inputting the second feature vector and the feature vector for representing the question-answer pair to the full-connection layer, and determining answer probability corresponding to the question-answer pair;
reversely adjusting model parameters of the language model coding layer according to the answer probability output by the full-connection layer and the answer in the first sample data;
wherein, training unit is specifically used for:
identifying parts of speech of keywords in the question-answer pair of the first sample data, and setting a part of speech tag set according to the parts of speech;
and determining part-of-speech tags for each word in the text information of the first sample data, and adding a target vector to the feature vector of each word according to the judging result of whether the part-of-speech tags are in the part-of-speech tag set so as to obtain the second feature vector.
8. A computing device comprising at least one processor, and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 5 or the steps of the method of claim 6.
9. A computer readable storage medium, characterized in that it comprises a program code for causing an electronic device to perform the steps of the method of any one of claims 1-5 or the steps of the method of claim 6 when said program code is run on the electronic device.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011009658.0A 2020-09-23 2020-09-23 Method, device and storage medium for training text classification model and text classification

Publications (2)

Publication Number Publication Date
CN112131366A 2020-12-25
CN112131366B 2024-02-09





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant