CN112131366A - Method, device and storage medium for training text classification model and text classification

Info

Publication number
CN112131366A
CN112131366A
Authority
CN
China
Prior art keywords
feature vector
text classification
text
question
layer
Prior art date
Legal status
Granted
Application number
CN202011009658.0A
Other languages
Chinese (zh)
Other versions
CN112131366B (en)
Inventor
管冲
卢睿轩
谢德峰
李承恩
姜萌
文瑞
陈曦
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011009658.0A
Publication of CN112131366A
Application granted
Publication of CN112131366B
Status: Active
Anticipated expiration


Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems (G06F16/00 Information retrieval; G06F16/30 of unstructured textual data; G06F16/33 Querying; G06F16/332 Query formulation)
    • G06F16/35 Clustering; Classification (G06F16/00 Information retrieval; G06F16/30 of unstructured textual data)
    • G06N3/045 Combinations of networks (G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D Climate change mitigation technologies in information and communication technologies)

Abstract

The application provides a method, a device and a storage medium for training a text classification model and performing text classification, and relates to artificial intelligence cloud technology for improving the accuracy of text classification. First sample data is input into a language model coding layer through an input layer to obtain a first feature vector of the first sample data, where the first sample data comprises at least one group of question-answer pairs and text information used for determining the answers to the questions in the question-answer pairs. Through a keyword highlighting operation introduced in the embedding layer, keyword highlighting is performed on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, to obtain a second feature vector of the text information. The second feature vector and the feature vector representing the question-answer pair are input into the full connection layer, and the answer probability corresponding to the question in the question-answer pair is determined. Finally, the model parameters of the language model coding layer are adjusted in reverse according to the answer probability output by the full connection layer and the answer in the first sample data.

Description

Method, device and storage medium for training text classification model and text classification
Technical Field
The application relates to the field of natural language processing, and provides a method, a device and a storage medium for training a text classification model and text classification.
Background
With the development of science and internet technology, the volume of data keeps increasing, and text classification methods make it possible to efficiently extract data with use value from this mass of data. At present, text classification methods are mainly built with the machine learning or deep learning techniques of artificial intelligence.
The text classification method based on machine learning mainly divides the text classification problem into two parts: feature engineering and a classifier. Feature engineering includes text preprocessing, feature extraction, text representation and other steps. In this process, the text is first cleaned and segmented with a word segmentation tool; the text is then represented in vector form using methods such as bag-of-words and TF-IDF (term frequency-inverse document frequency), and the vector is input into a classifier such as an SVM (Support Vector Machine) or a decision tree to obtain the final classification result. However, the feature expression capability in machine learning is weak and manual feature processing is required, so the accuracy of text classification is low.
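A minimal sketch of this pipeline, assuming scikit-learn; the toy corpus and labels are invented for the example:

```python
# Feature engineering (TF-IDF vectors) followed by an SVM classifier,
# mirroring the machine-learning pipeline described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["the patient reports a mild headache",
         "invoice attached for last month's order",
         "fever and cough lasting three days",
         "please confirm the shipping address"]
labels = ["medical", "other", "medical", "other"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["persistent cough with headache"]))  # expected: ['medical']
```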
The text classification method based on deep learning first cleans and segments the text, then converts the text into dense distributed word vectors with a neural-network-based method such as word2vec, and then trains on the data through a neural network such as a CNN (Convolutional Neural Network) or an LSTM (Long Short-Term Memory) network to obtain the optimal result. However, problems such as difficult model training and improper model structure in deep learning also keep the accuracy of text classification low.
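A comparable sketch of the deep-learning pipeline, assuming PyTorch; the dimensions and the random token ids are illustrative, and in practice the embedding weights could be initialized from pre-trained word2vec vectors:

```python
# Word-vector embedding followed by an LSTM text classifier.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_classes=2):
        super().__init__()
        # Could be initialized from a pre-trained word2vec matrix instead.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.lstm(embedded)
        return self.fc(hidden[-1])           # logits over the classes

logits = LSTMClassifier(vocab_size=10000)(torch.randint(0, 10000, (4, 20)))
```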
In summary, text classification methods in the prior art achieve low accuracy when processing the text classification problem.
Disclosure of Invention
The embodiment of the application provides a method, a device and a storage medium for training a text classification model and text classification, and provides a method for training a text classification model based on question answering, which is used for improving the accuracy of text classification.
In a first aspect, an embodiment of the present application provides a method for training a text classification model, where the method includes:
acquiring a first training sample set, wherein each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
performing a plurality of rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an Embedding layer (Embedding) and a full-connection layer, and each round of first iterative training process comprises the following steps:
inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into an embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector through keyword highlighting operation introduced into the embedding layer to obtain a second feature vector of the text information;
inputting the second feature vector and the feature vector representing the question-answer pair into the full connection layer, and determining the answer probability corresponding to the question in the question-answer pair;
and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
In a second aspect, the present application provides a method of text classification, the method comprising:
acquiring a text classification request containing text data, wherein the text data contains a target problem and target text information used for judging whether the target problem is correct;
inputting the text data into a trained text classification model, and determining whether the target problem is a correct text classification result based on the trained text classification model; wherein the trained text classification model is trained by the method of the first aspect.
In a third aspect, an embodiment of the present application provides an apparatus for training a text classification model, where the apparatus includes:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a first training sample set, and each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information used for determining answers to questions in the question-answer pairs;
the training unit is used for executing multiple rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and the training unit is specifically used for:
inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into an embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector through keyword highlighting operation introduced into the embedding layer to obtain a second feature vector of the text information;
inputting the second feature vector and the feature vector representing the question-answer pair into the full connection layer, and determining the answer probability corresponding to the question in the question-answer pair;
and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
In a possible implementation, the training unit is specifically configured to:
identifying the part of speech of the keyword in the question-answer pair of the first sample data, and setting a part of speech tag set according to the part of speech;
and determining a part-of-speech tag for each word in the text information of the first sample data, and adding a target vector to the feature vector of each word according to a judgment result of whether the part-of-speech tag is in the part-of-speech tag set or not to obtain a second feature vector.
In a possible implementation manner, at least one full connection layer is continuously set after the embedding layer, each full connection layer corresponds to one task, and a loss function is set for each task;
the first acquisition unit is further used for acquiring a second training sample set, the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information used for determining answers to questions in the question-answer pairs;
the training unit is also used for executing a plurality of rounds of second iterative training on the basis of the trained text classification model according to the second sample data to obtain a retrained text classification model;
the training unit is specifically configured to:
inputting second sample data into a language model coding layer of the trained text classification model through an input layer to obtain a feature vector of the second sample data;
generating a feature vector of a fixed dimension by passing the feature vector of the second sample data through an embedded layer of the trained text classification model;
respectively inputting the feature vectors output by the embedding layer into a full connection layer corresponding to each task, and determining a loss function corresponding to each task;
and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the loss function of each task.
In a possible implementation, the training unit is specifically configured to:
weighting the loss function corresponding to each task according to a preset task weight proportion to obtain a target loss function (written as a formula after this list);
and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the target loss function.
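Written as a formula (the notation here is an assumption, not taken from the original): with loss functions $L_k$ for the $K$ tasks and preset weights $w_k$, the target loss function is

$$L_{\text{target}} = \sum_{k=1}^{K} w_k L_k,$$

and the model parameters of the language model coding layer are adjusted in reverse along the gradient of $L_{\text{target}}$.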
In one possible implementation, the data enhancement process includes one or a combination of the following (a code sketch follows this list):
randomly replacing words in the text information with words from a synonym table according to a set first proportion;
randomly selecting words in the text according to a set second proportion and randomly inserting them at any position in the text;
randomly deleting words in the text information according to a set third proportion;
and randomly selecting two words in the text information according to a set fourth proportion and swapping their positions.
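A minimal sketch of these four operations in plain Python; the synonym table and the proportions are illustrative assumptions:

```python
import random

def augment(words, synonyms, p_replace=0.1, p_insert=0.1, p_delete=0.1, n_swap=1):
    # 1. Random synonym replacement in the set first proportion.
    out = [synonyms.get(w, w) if random.random() < p_replace else w for w in words]
    # 2. Random insertion of randomly selected words at arbitrary positions.
    for w in [w for w in words if random.random() < p_insert]:
        out.insert(random.randrange(len(out) + 1), w)
    # 3. Random deletion (guarding against deleting everything).
    out = [w for w in out if random.random() >= p_delete] or out
    # 4. Randomly pick two words and swap their positions.
    for _ in range(n_swap):
        i, j = random.randrange(len(out)), random.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

print(augment("the patient has a mild fever".split(), {"mild": "slight"}))
```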
In one possible implementation, the training unit is further configured to perform the following steps (sketched in code after this list):
obtaining a first loss function according to the feature vector output by the language model coding layer;
determining a first gradient value according to the first loss function;
calculating a perturbation vector according to the gradient value of the embedding matrix and the first gradient value, and adding the perturbation vector to the feature vector after the embedding-layer dimension reduction processing to obtain an adversarial vector;
determining a second loss function according to the adversarial vector;
obtaining an adversarial gradient value in reverse according to the second loss function, and accumulating the adversarial gradient value onto the first gradient value to obtain a target gradient;
and adjusting the model parameters of the language model coding layer according to the target gradient.
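A hedged sketch of these steps in the style of the fast gradient method, assuming PyTorch; `model.embedding`, `loss_fn` and `batch` are hypothetical names rather than identifiers from the application:

```python
import torch

def adversarial_training_step(model, optimizer, loss_fn, batch, labels, epsilon=1.0):
    # First loss and first gradient from the clean forward pass.
    loss = loss_fn(model(batch), labels)
    loss.backward()                                  # accumulates the first gradient

    # Perturbation vector from the gradient of the embedding matrix.
    embed = model.embedding.weight                   # hypothetical attribute
    grad = embed.grad.detach()
    perturbation = epsilon * grad / (grad.norm() + 1e-12)
    embed.data.add_(perturbation)                    # adversarial vector

    # Second (adversarial) loss; its gradient accumulates onto the first.
    adv_loss = loss_fn(model(batch), labels)
    adv_loss.backward()

    # Restore the embeddings, then update with the target gradient.
    embed.data.sub_(perturbation)
    optimizer.step()
    optimizer.zero_grad()
```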
In one possible implementation, the language model coding layer is one of BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT), and XLNet.
In a fourth aspect, an embodiment of the present application provides an apparatus for text classification, where the apparatus includes:
the second acquisition unit is used for acquiring a text classification request containing text data, wherein the text data contains a target question and target text information used for judging whether the target question is correct;
the determining unit is used for inputting the text data into the trained text classification model and determining whether the target problem is a correct text classification result based on the trained text classification model; wherein the trained text classification model is trained by the method of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computing apparatus, including at least one processor and at least one memory, where the memory stores program code, and the processor is configured to read the program code stored in the memory and execute the method for training a text classification model in the first aspect and the text classification method in the second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, which stores computer instructions that, when executed by a processor, implement the method for training a text classification model in the first aspect and the text classification method in the second aspect provided by embodiments of the present application.
The beneficial effect of this application is as follows:
the application provides a method, a device and a storage medium for training a text classification model and text classification, and relates to an artificial intelligence cloud technology, in particular to a natural language processing technology. In the method, a first training sample set is obtained, each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information used for determining answers to questions in the question-answer pairs, and multiple rounds of first iterative training are performed on a text classification model according to the first sample data to obtain a trained text classification model. In the process of each round of first iterative training, inputting first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data; then, performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector through keyword highlighting operation introduced in the embedding layer to obtain a second feature vector of the text information; inputting the second characteristic vector and the characteristic vector for representing the question-answer pair into the full-connection layer, and determining answer probability corresponding to the question in the question-answer pair; and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data. The keyword highlighting operation is adopted in the process of training the text classification model, the word in the text information is highlighted according to the word in the question-answer pair, the association degree of the question-answer pair and the text information is improved, the answer of the question in the question-answer pair is determined more accurately based on the text information, and further when model training is carried out according to the determined answer and the answer in the sample data, the trained text classification model can improve the association degree of the text information and the question-answer pair, and the accuracy of text classification based on the question-answer mode is further ensured.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a flowchart of a text classification method according to an embodiment of the present application;
fig. 3 is a structural diagram of a text classification model according to an embodiment of the present application;
fig. 4 is a flowchart of a first method for iteratively training a text classification model according to an embodiment of the present application;
fig. 5 is a schematic diagram of a first sample data stitching process provided in an embodiment of the present application;
FIG. 6 is a block diagram of another text classification model provided in an embodiment of the present application;
FIG. 7 is a block diagram of another text classification model provided in an embodiment of the present application;
FIG. 8 is a flowchart of another method for training a text classification model according to an embodiment of the present disclosure;
fig. 9 is a flowchart of a second method for iteratively training a text classification model according to an embodiment of the present application;
FIG. 10 is a flowchart of a method for text classification according to an embodiment of the present application;
FIG. 11 is a flowchart of an overall method for text classification according to an embodiment of the present application;
fig. 12 is a block diagram of an apparatus for training a text classification model according to an embodiment of the present disclosure;
fig. 13 is a block diagram of an apparatus for text classification according to an embodiment of the present application;
fig. 14 is a computing device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solution and advantages of the present application more clearly and clearly understood, the technical solution in the embodiments of the present application will be described below in detail and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
1. Artificial Intelligence (AI):
artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
2. Natural Language Processing (NLP):
natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
3. Machine Learning (ML):
machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
4. Transfer learning:
the transfer learning is a method for retraining an original task by using a model trained on a similar task as a model initial point, and the transfer learning can accelerate the learning efficiency of the model and improve the generalization of the model by sharing knowledge learned by the model.
5. Multi-task learning:
multi-task learning is a method in which one model is trained on several related tasks at the same time. By sharing the knowledge learned across tasks, multi-task learning can accelerate the learning efficiency of the model and improve its generalization.
6. Data enhancement:
data enhancement covers a series of techniques for generating new training samples by applying random jitter and perturbation to the raw data without changing the class labels. The goal of applying data enhancement is to increase the generalization of the model.
7. Adversarial training:
Adversarial training is an important means of enhancing the robustness of a model. During adversarial training, small perturbations that would cause the model to make mistakes are added to the samples, so that the model learns to adapt to such perturbations during training and its robustness is enhanced.
8. Fast Gradient Method (FGM):
The fast gradient method obtains a new adversarial sample by adding a perturbation along the direction of gradient ascent.
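Written as a formula (standard FGM notation, assumed here): given the gradient $g = \nabla_x L(x, y)$ of the loss with respect to the input embedding $x$, the perturbation is

$$r_{\text{adv}} = \epsilon \cdot \frac{g}{\lVert g \rVert_2},$$

and the new adversarial sample is $x + r_{\text{adv}}$.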
9. Bidirectional Encoder Representations from Transformers (BERT):
BERT is a pre-trained language model obtained, on the basis of the Transformer, by multi-task training with a Masked Language Model (MLM) and Next Sentence Prediction (NSP) on a large-scale corpus.
10. Cloud technology:
cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks in a wide area network or local area network to realize the computation, storage, processing and sharing of data.
Cloud technology is the general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and other web portals, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each article may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.
The following briefly introduces the design concept of the embodiments of the present application.
With the continuous development of network technology, artificial intelligence technology has been applied to various fields, such as text classification technology.
The text classification in the related art is mainly divided into two categories, one is a machine learning method based on artificial intelligence, and the other is a deep learning method based on artificial intelligence.
Based on a machine learning method: the machine learning classification method divides the whole text classification problem into two parts, feature engineering and a classifier. Feature engineering includes text preprocessing, feature extraction, text representation and other steps. First, the text is cleaned and segmented with a word segmentation tool; then the text is represented in vector form using methods such as bag-of-words and TF-IDF, and input into a classifier such as an SVM or a decision tree to obtain the final result.
Based on a deep learning method: effective features can be obtained by using neural networks such as convolutional neural networks and recurrent neural networks. The text also needs to be cleaned and segmented; then the text is converted into dense distributed word vectors with a neural-network-based method such as word2vec, and the data is trained through a neural network such as a CNN or an LSTM to obtain the optimal result.
The machine learning method has weak feature expression capability and requires manual feature processing, while the deep learning method suffers from difficult model training and low-quality training data; therefore, both text classification methods in the related art achieve low accuracy when processing complex text classification problems.
Based on the above problems, the present application provides a method, an apparatus, and a storage medium for training a text classification model and text classification.
In the application, a first training sample set is obtained, wherein each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs; performing a plurality of rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and each round of first iterative training process comprises the following steps:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data; inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector representing the text information in the first feature vector, according to the feature vector representing the question-answer pair in the first feature vector, through the keyword highlighting operation introduced into the embedding layer, to obtain a second feature vector of the text information; inputting the second feature vector and the feature vector representing the question-answer pair into the full connection layer, and determining the answer probability corresponding to the question in the question-answer pair; and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
In the method, the text classification model is trained based on the question-answer pairs and the corresponding text information as training data, and keyword highlighting operation is introduced in the training process so as to improve the relevance of the text information and the question-answer pairs and improve the accuracy in the training process, thereby improving the accuracy of text classification.
In a possible implementation mode, external knowledge is introduced based on transfer learning, a text classification model is supervised through multi-task learning, and meanwhile data enhancement is added to increase noise for the model, so that the generalization and robustness of the model are increased, and text semantics can be better understood.
Specifically, at least one full connection layer is continuously arranged after the embedding layer, each full connection layer corresponds to one task, and a loss function is arranged for each task;
acquiring a second training sample set, wherein the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
executing multiple rounds of second iterative training on the basis of the trained text classification model according to the second sample data to obtain a retrained text classification model;
wherein, each round of the second iterative training process is as follows:
inputting second sample data into a language model coding layer of the trained text classification model through an input layer to obtain a feature vector of the second sample data; generating a feature vector of a fixed dimension by passing the feature vector of the second sample data through an embedded layer of the trained text classification model; respectively inputting the feature vectors output by the embedding layer into a full connection layer corresponding to each task, and determining a loss function corresponding to each task; and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the loss function of each task.
In one possible implementation, in order to improve the accuracy of the text classification model, adversarial training is added in the process of training the text classification model.
Specifically, a first loss function is obtained according to the feature vector output by the language model coding layer; a first gradient value is determined according to the first loss function; a perturbation vector is calculated according to the gradient value of the embedding matrix and the first gradient value, and the perturbation vector is added to the feature vector after embedding-layer dimension reduction to obtain an adversarial vector; a second loss function is determined according to the adversarial vector; an adversarial gradient value is obtained in reverse according to the second loss function and accumulated onto the first gradient value to obtain a target gradient; and the model parameters of the language model coding layer are adjusted according to the target gradient.
The application provides a method for training a text classification model based on a keyword highlighting operation and a question-and-answer reading comprehension mode, and combines multiple enhancement strategies in the training process, including data enhancement, transfer learning, multi-task learning and adversarial training. The keyword highlighting operation performs part-of-speech recognition on the keywords in the question-answer pair, then judges whether each word in the text information matches the recognized keywords, and adds the matching information to generate a new word vector for training; this helps the text classification model learn the key information in the text better. To enhance the generalization ability of the model, the application adopts various data enhancement techniques, randomly deleting, replacing and inserting some words in the text in a certain proportion, thereby adding noise to the training data; and by negating sentences in the text information and modifying the labels accordingly, the model can learn more knowledge. The trained text classification model is first obtained by training on a data set from a similar data source, and is then jointly trained with data sets of similar tasks, namely multi-task learning, to obtain the retrained text classification model, which enhances the generalization and knowledge-learning capability of the model. Experiments show that the text classification model provided by the application achieves an extremely high accuracy rate on text semantic understanding tasks.
After introducing the design idea of the embodiment of the present application, an application scenario set by the present application is briefly described below.
Fig. 1 is a schematic view of an application scenario provided in the embodiment of the present application. The application scenario includes a terminal device 10 and a server 11. The terminal device 10 and the server 11 can communicate with each other through a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 10 and the server 11 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto.
In the embodiment of the present application, the terminal device 10 is an electronic device used by a user, and the electronic device may be a computer device having a certain computing capability and running instant messaging software and a website or social contact software and a website, such as a personal computer, a mobile phone, a tablet computer, a notebook, an e-book reader, and the like;
the server 11 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The text classification model may be deployed on the server 11 for training, and a large number of training samples may be stored in the server 11, and include at least one set of question-answer pairs and text information for determining answers to questions in the question-answer pairs, and are used for training the text classification model. Optionally, after the text classification model is obtained by training based on the training method in the embodiment of the present application, the trained text classification model may be directly deployed on the server 11 or the terminal device 10. Generally, the text classification model is directly deployed on the server 11, and in the embodiment of the present application, the text classification model is often used to analyze the question input by the user and the corresponding text information, so as to determine the probability of whether the question input by the user is accurate based on the text information.
It should be noted that the method for training a text classification model and for text classification provided by the embodiment of the present application can be applied to various application scenarios involving question-and-answer general semantic text classification tasks. Text classification is a basic task among the various natural language processing tasks in the medical field, and such basic tasks are often crucial to subsequent tasks. For example, the method can be used to judge whether descriptions of disease symptoms, medicines and the like appear in a medical record text, thereby helping a doctor make an auxiliary judgment; in addition, the disease described by a patient can be classified in advance over multiple rounds of dialogue, and the patient can then be guided to a specific department or attending doctor, playing the role of a pre-consultation.
Accordingly, the training samples used in different scenarios are different. Taking a medical scene as an example, the adopted training sample is a question and answer pair and a corresponding case for a doctor of a patient; similarly, when text classification is performed using a trained text classification model, the used questions and the corresponding text information are different in different scenes, for example, in a medical scene, the used text information is a case, and the questions are judgment questions including various disease names input by a patient.
When the embodiment of the application is applied to a medical scene, the application also relates to the Medical Cloud in cloud technology. The medical cloud is a medical and health service cloud platform established by using cloud computing, on the basis of new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data and the Internet of Things, combined with medical technology, to realize the sharing of medical resources and the expansion of medical reach. Owing to the application and combination of cloud computing technology, the medical cloud improves the efficiency of medical institutions and makes medical care more convenient for residents. Existing hospital services such as appointment registration, electronic medical records and medical insurance are combined products of cloud computing and the medical field, and the medical cloud also has the advantages of data security, information sharing, dynamic expansion and overall layout. Cloud computing itself is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space and information services as needed.
In one possible application scenario, cloud technology also includes the technical field of artificial intelligence cloud services, commonly referred to as AI as a Service (AIaaS). This is a service mode of an artificial intelligence platform: the AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed app store: all developers can access one or more of the artificial intelligence services provided by the platform through an API (Application Programming Interface), and some qualified developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate and maintain their own dedicated cloud artificial intelligence services. The method for training the text classification model provided by the embodiment of the application can be realized based on this cloud technology. In a specific implementation, each artificial intelligence service involved in training the text classification model is split, for example obtaining feature vectors with the language coding model in the language model coding layer, or the classification in the full connection layer, and provided as independent or packaged services in the cloud; alternatively, when the text classification model is trained, one or more artificial intelligence services provided by the platform can be accessed through an API interface.
In the application, the training samples for training the text classification model can also be stored by adopting a cloud storage technology. A distributed cloud storage system (hereinafter, referred to as a storage system) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work by using functions such as cluster application, grid technology, and a distributed storage file system, and provides a data storage function and a service access function to the outside.
In a possible application scenario, the servers 11 may be deployed in different regions for reducing communication delay, or different servers 11 may serve the regions corresponding to the terminal devices 10 respectively for load balancing. The plurality of servers 11 share data by a block chain, and the plurality of servers 11 correspond to a data sharing system including the plurality of servers 11. For example, the terminal device 10 is located at a site a and is in communication connection with the server 11, and the terminal device 10 is located at a site b and is in communication connection with another server 11.
Each server 11 in the data sharing system has a node identifier corresponding to the server 11, and each server 11 in the data sharing system may store node identifiers of other servers 11 in the data sharing system, so that the generated block is broadcast to other servers 11 in the data sharing system according to the node identifiers of other servers 11. Each server 11 may maintain a node identifier list as shown in the following table, and store the name of the server 11 and the node identifier in the node identifier list correspondingly. The node identifier may be an IP (Internet Protocol) address and any other information that can be used to identify the node, and table 1 only illustrates the IP address as an example.
TABLE 1
Server name    Node identification
Node 1         119.115.151.174
Node 2         118.116.189.145
Node N         119.124.789.258
The method for text classification provided by the exemplary embodiments of the present application is described below with reference to the accompanying drawings in conjunction with the application scenarios described above, it should be noted that the application scenarios described above are only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect.
In the application, the text classification is mainly binary classification of question-answer text: the input text information and the target question are processed through the trained text classification model, and the answer probability of the target question is determined based on the text information, thereby providing correct guidance for the user and saving the user's time. Therefore, in the present application, the text classification model is trained first, and the trained text classification model is then applied to perform text classification.
The first embodiment is as follows: a method of training a text classification model.
As shown in fig. 2, a flowchart of a method for training a text classification model provided in the embodiment of the present application includes the following steps:
step S200, a first training sample set is obtained, and each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs.
Take a medical scenario as an example: the first sample data comprises a patient case and at least one question-answer pair about the case, where a question-answer pair comprises a question posed by the patient about the case and the answer given by the doctor to that question; for example, the patient asks "Doctor, do I have disease A?" and the doctor answers "Yes", or the patient asks "Doctor, what disease do I have?" and the doctor answers "You have disease A".
After the first training sample set is obtained, multiple rounds of first iterative training are carried out on the text classification model based on first sample data in the first training sample set, so that a trained text classification model is obtained.
Step S201, according to the first sample data, performing multiple rounds of first iterative training on the text classification model to obtain a trained text classification model.
In the present application, the number of rounds of the first iterative training may be preset, or determined according to a stopping condition in the training process, where the stopping condition can be that the loss function converges to a desired value, or that the loss function stabilizes after reaching a certain value.
In the present application, when multiple rounds of first iterative training are performed on a text classification model, the structure of the text classification model is not changed, as shown in fig. 3, which is a structural diagram of a text classification model provided in an embodiment of the present application, the text classification model 300 includes an input layer 301, a language model coding layer 302, an embedding layer 303, and a full connection layer 304. Therefore, the operations performed by each round of the first iterative training are the same, and the process of each round of the first iterative training is shown in fig. 4, which is a flowchart of a method for iteratively training a text classification model according to an embodiment of the present application, and includes:
step S400, inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data.
In the present application, the language model coding layer is one of BERT, RoBERTa, and XLNet.
In the application, in the training process, first sample data is input into an input layer, then the input layer inputs the first sample data into a language model coding layer for coding, and a first characteristic vector of the first sample data is determined.
In one possible implementation manner, the input layer performs data splicing processing on the text information contained in the first sample data and at least one group of question-answer pairs.
When data splicing is performed, a [CLS] classification token is placed at the head of the data, and the text information, the question and the answer are connected by [SEP] tokens; fig. 5 is a schematic diagram of the first sample data splicing process provided in the embodiment of the present application.
In the present application, after language model coding, each word in the first sample data outputs a 1×1024 vector as the feature vector of that word; that is, each word in the text information and each word in the question-answer pair outputs a 1×1024 vector as the corresponding first feature vector.
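A hedged sketch of this splicing step, assuming a tokenizer from the transformers library; the model name and the example strings are illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

text = "患者主诉头痛三天"                     # text information, e.g. a case record
question, answer = "我是否患有甲病？", "是"    # one question-answer pair

# [CLS] text [SEP] question [SEP] answer [SEP]
spliced = (tokenizer.cls_token + text + tokenizer.sep_token
           + question + tokenizer.sep_token + answer + tokenizer.sep_token)
inputs = tokenizer(spliced, add_special_tokens=False, return_tensors="pt")
# Each resulting token is then encoded by the language model coding layer
# into a feature vector (1 x 1024 for a large model, as described above).
```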
Step S401, inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector through keyword highlighting operation introduced in the embedding layer to obtain a second feature vector of the text information.
In order to increase the association between the text information and the question-answer pairs and obtain more targeted embedded vectors, a keyword highlighting operation is introduced into the embedding layer in the application, as in the structural diagram of the text classification model shown in fig. 6.
Therefore, in the embedding layer of the text classification model training process, not only the dimension reduction is performed on the first feature vector in the first sample data to generate the feature vector with a fixed dimension, but also the keyword highlighting operation is performed on the words in the text information in the first sample data to obtain the corresponding second feature vector in the text information.
In the present application, the keyword highlighting operation is performed on the words in the text information through the keyword highlighting operation in the embedded layer, and the specific operation of obtaining the second feature vector of the text information is as follows:
and identifying parts of speech of the keywords in the question-answer pair of the first sample data, and setting a part of speech tag set according to the parts of speech.
Identifying parts of speech of keywords in the question-answer pairs, wherein the keywords tend to contain more effective information; part-of-speech tags contain nouns, verbs, adjectives, adverbs, numbers, or foreign words; that is, in the present application, the part-of-speech tag set is updated according to the part-of-speech tag setting of the keyword in the question-answer pair.
Determining a part-of-speech tag for each word in the text information of the first sample data; i.e. determining whether each word in the text information is a noun, a verb, an adjective, etc.
And judging whether the part-of-speech tag corresponding to each word in the text information is in the part-of-speech tag set identified in the question-answer pair corresponding to the text information, and adding a target vector to the feature vector of each word in the text information according to a judgment result to obtain a second feature vector.
In the application, the feature vector of each word in the text information is processed by the embedding layer; the processed feature vector has a fixed dimension, and the dimension reduction can be performed in the embedding layer. Let the feature vector of each word be $d_i$.
The part of speech of each word in the text information is then identified, and it is judged whether that part of speech is in the part-of-speech tag set determined from the question-answer pair. If it is in the set, an $l^+$ vector is added; otherwise an $l^-$ vector is added. Both $l^+$ and $l^-$ are word vectors of the same dimension as $d_i$, and $l^+$ and $l^-$ are the target vectors in the embodiment of the application.
From the above, the newly generated word vector for each word in the text information is
$$\hat{d}_i = d_i + h_i,$$
where $h_i$ is $l^+$ or $l^-$. That is, the second feature vector obtained through the keyword highlighting operation on the text information is $\{\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_n\}$.
Subsequent training then continues with the second feature vector, so that the word vector of each word in the text information is more relevant to the question and the answer.
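For illustration, the keyword highlighting operation described above can be sketched as follows. This is a minimal sketch, not the patent's implementation: the part-of-speech tagger is a toy lookup table (any real tagger could supply the tags), the embedding dimension and the initialisation of l+ and l- are illustrative assumptions, and the new word vector is formed as d_i + h_i in line with the "adding a target vector" description above.

```python
import numpy as np

DIM = 8                                   # embedding dimension (illustrative)
rng = np.random.default_rng(0)
l_pos = rng.normal(size=DIM)              # l+ : added when the POS tag matches
l_neg = rng.normal(size=DIM)              # l- : added when it does not

# toy stand-in for a real part-of-speech tagger
TOY_POS = {"fever": "noun", "have": "verb", "patient": "noun",
           "the": "det", "does": "verb", "a": "det"}

def pos_tag(word):
    return TOY_POS.get(word.lower(), "other")

def highlight(text_words, text_vecs, qa_words):
    # 1) build the part-of-speech tag set from the question-answer pair,
    #    keeping only the tag types named above
    tag_set = {pos_tag(w) for w in qa_words} & {"noun", "verb", "adjective", "adverb"}
    # 2) for each word in the text, add l+ if its tag is in the set, else l-
    out = []
    for w, d_i in zip(text_words, text_vecs):
        h_i = l_pos if pos_tag(w) in tag_set else l_neg
        out.append(d_i + h_i)             # new word vector d_i' = d_i + h_i
    return np.stack(out)                  # the second feature vector

text = ["the", "patient", "does", "have", "a", "fever"]
vecs = rng.normal(size=(len(text), DIM))  # stand-in for embedding-layer output
second = highlight(text, vecs, ["does", "patient", "have", "fever"])
print(second.shape)                       # (6, 8)
```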
It should be noted that the keyword highlighting operation of the present application is used only when the three types of data, namely text, question and answer, are present at the same time, i.e., when training on the question-answer text classification task.
Step S402, inputting the second feature vector and the feature vector for representing the question-answer pair into the full connection layer, and determining the answer probability corresponding to the question in the question-answer pair.
In the present application, an activation function for text classification is set in the fully-connected layer, and the fully-connected layer corresponds to a classifier.
In a possible implementation, the output corresponding to the first token [CLS] is taken as the input of the full connection layer, and the final output for the text classification task is obtained through the activation function.
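A minimal sketch of such a fully-connected classification layer over the [CLS] output follows; the hidden size of 1024 mirrors the per-word vectors mentioned in embodiment three, while the two-way yes/no output and the softmax activation are assumptions for the judgment-question setting, not values from the patent.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    # Fully-connected layer acting as the classifier: it takes the encoder
    # output at the [CLS] position and produces answer probabilities.
    def __init__(self, hidden=1024, num_labels=2):
        super().__init__()
        self.fc = nn.Linear(hidden, num_labels)

    def forward(self, encoder_out):            # encoder_out: (batch, seq, hidden)
        cls_vec = encoder_out[:, 0, :]          # output of the first token [CLS]
        logits = self.fc(cls_vec)
        return torch.softmax(logits, dim=-1)    # answer probability

head = ClassificationHead()
dummy = torch.randn(4, 128, 1024)               # (batch, seq_len, hidden)
print(head(dummy).shape)                        # torch.Size([4, 2])
```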
Step S403, reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
A loss function is determined according to the answer probability output by the text classification model for the question in the question-answer pair of the text information and the answer in the question-answer pair of the first sample data, and the model parameters of the language model coding layer are reversely adjusted according to the loss function.
In the present application, a plurality of first iterative training rounds are performed according to steps S400 to S403 to obtain a trained text classification model.
In a possible implementation, in order to enable the model to learn more text knowledge, the application also proposes a method for generating positive and negative samples: randomly generating n_q questions and associated answers from each piece of text information. The specific operation is as follows: randomly extract n_s words from the text information, splice the extracted words together as the question in a question-answer pair, and then splice the remaining words in the text information together as the answer in the question-answer pair, where n_q and n_s are positive integers.
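A minimal sketch of this positive and negative sample generation, assuming whitespace tokenisation (the patent does not specify how the text is tokenised):

```python
import random

def make_qa_pairs(text, n_q=2, n_s=3, seed=0):
    # For each piece of text information, randomly generate n_q question-answer
    # pairs: n_s randomly drawn words are spliced into the question, and the
    # remaining words are spliced into the answer.
    rnd = random.Random(seed)
    words = text.split()
    pairs = []
    for _ in range(n_q):
        picked = set(rnd.sample(range(len(words)), n_s))
        question = " ".join(words[i] for i in sorted(picked))
        answer = " ".join(w for i, w in enumerate(words) if i not in picked)
        pairs.append((question, answer))
    return pairs

print(make_qa_pairs("the patient shows a persistent dry cough and mild fever"))
```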
In the application, in order to increase the generalization of the text classification model and its ability to learn knowledge, the model parameters are fine-tuned on the basis of the trained text classification model. In this process, the text classification model obtained by training on the text classification task is used as the initial point for transfer learning, and retraining on similar tasks, i.e., multi-task learning, is carried out, which increases the learning efficiency of the model and improves its generalization. Meanwhile, in order to increase the diversity of the data and the robustness of the model during training, noise of different degrees is introduced into the model's training data through various data enhancement techniques. See the training process of the text classification model in embodiment two for details.
Embodiment two: a second method of training a text classification model.
As shown in fig. 7, which is a structure diagram of another text classification model 700 provided by the embodiment of the present application, the text classification model 700 includes an input layer 701, a language model coding layer 702, an embedding layer 703, and a plurality of fully-connected layers 704. The plurality of fully-connected layers includes one fully-connected layer for implementing the main text classification task and at least one fully-connected layer for implementing a subtask similar to but different from the main text classification task; the activation function in each fully-connected layer is different, and the corresponding loss function is different.
As shown in fig. 7, the subtasks similar to but different from the main text classification task in the present application include, but are not limited to: text entailment and multiple-choice selection.
The text entailment task corresponds to an independent fully-connected layer, and the multiple-choice selection task corresponds to another independent fully-connected layer.
It should be noted that the input layer, the language model coding layer, and the embedding layer of this text classification model are those of the text classification model obtained by training in embodiment one. Because other tasks are introduced, training is no longer performed only on the text classification task, and therefore no keyword highlighting operation is introduced into the embedding layer during this training.
As shown in fig. 8, a flowchart of another text classification model training method provided in the embodiment of the present application includes the following steps:
step S800, a second training sample set is obtained, wherein the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information used for determining answers to questions in the question-answer pairs.
In the present application, the data enhancement process includes one or a combination of the following (a minimal sketch follows this list):
randomly replacing synonyms, namely randomly replacing words in the text data with words in a synonym table according to a set first proportion;
randomly inserting words, namely randomly selecting words in the text according to a set second proportion and randomly inserting the words into any position in the text;
randomly deleting words, namely randomly deleting words in the text information according to a set third proportion;
and randomly reversing words, namely randomly selecting two words in the text information according to a set fourth proportion and swapping their positions.
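A minimal sketch of these four data enhancement operations; the synonym table and the probability values are illustrative stand-ins for the set first to fourth proportions, not values from the patent.

```python
import random

SYNONYMS = {"mild": ["slight"], "fever": ["pyrexia"]}   # toy synonym table

def augment(words, p_syn=0.1, p_ins=0.1, p_del=0.1, p_swap=0.1, seed=0):
    rnd = random.Random(seed)
    out = []
    for w in words:
        # synonym replacement according to the first proportion
        if w in SYNONYMS and rnd.random() < p_syn:
            w = rnd.choice(SYNONYMS[w])
        # random deletion according to the third proportion
        if rnd.random() < p_del:
            continue
        out.append(w)
    # random insertion of a selected word at any position (second proportion)
    if out and rnd.random() < p_ins:
        out.insert(rnd.randrange(len(out) + 1), rnd.choice(out))
    # random swap of two words (fourth proportion)
    if len(out) > 1 and rnd.random() < p_swap:
        i, j = rnd.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

print(augment("the patient has a mild fever".split()))
```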
Step S801, executing multiple rounds of second iterative training on the basis of the trained text classification model according to second sample data to obtain a retrained text classification model.
It should be noted that the operations performed in each second iterative training are the same, so the present application only describes one second iterative training, and as shown in fig. 9, a flowchart of a method for iteratively training a text classification model for a second iteration provided by an embodiment of the present application includes:
and step S900, inputting second sample data into a language model coding layer of the trained text classification model through the input layer to obtain the feature vector of the second sample data.
Specifically, reference may be made to the description in the first embodiment, which is not repeated herein.
Step S901, generating a feature vector of a fixed dimension by passing the feature vector of the second sample data through the embedding layer of the trained text classification model.
Step S902, respectively inputting the feature vectors output by the embedding layer into the full connection layer corresponding to each task, and determining the loss function corresponding to each task.
In the application, the full connection layer corresponding to each task outputs a corresponding result, and the output result is compared with the labeled preset result to determine the loss function corresponding to each task.
Step S903, the model parameters of the language model coding layer of the trained text classification model are reversely adjusted according to the loss function of each task.
In the present application, when the model parameters of the language model coding layer of the trained text classification model are inversely adjusted according to the loss function of each task:
determining a plurality of loss functions according to the output result and the corresponding preset result of each full connection layer; according to preset weight distribution of a main task and an auxiliary task, weighting processing is carried out on loss functions corresponding to all tasks to obtain target loss functions; and reversely adjusting the model parameters of the language model coding layer according to the target loss function.
For example, suppose the weight ratio of text classification task : text entailment task : multiple-choice selection task is preset to 8:1:1; then, when determining the target loss function, the target loss function is determined from the text classification task loss function, the text entailment task loss function and the multiple-choice selection task loss function in the weight ratio 8:1:1.
In the application, in order to enable the model to learn more knowledge while avoiding the performance degradation caused by overfitting, weights are assigned to the main task and the auxiliary tasks in the model. The multi-task weights reflect the importance of each task in the final result; since the text classification task is the most important in the present application, its weight is set to the maximum.
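A minimal sketch of the weighted target loss function, using the 8:1:1 example above; normalising the weights to sum to one is an added assumption, not part of the patent's description.

```python
import torch

def target_loss(losses, weights=(8.0, 1.0, 1.0)):
    # Weighted combination of the per-task losses; the 8:1:1 ratio follows
    # the example above (classification : entailment : multiple-choice).
    w = torch.tensor(weights)
    w = w / w.sum()                        # normalisation (assumption)
    return sum(wi * li for wi, li in zip(w, losses))

l_cls = torch.tensor(0.7, requires_grad=True)  # stand-in per-task losses
l_ent = torch.tensor(1.2)
l_mc = torch.tensor(0.9)
loss = target_loss([l_cls, l_ent, l_mc])
loss.backward()                            # gradients flow to the shared encoder
print(float(loss))
```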
It should be noted that adversarial training may also be introduced in the process of training the text classification model in embodiments one and two. During training, some slight perturbation of the text data can easily change the prediction result of the text classification model, making its predictions insufficiently accurate. Such a perturbation is called an adversarial perturbation, the perturbed input is called an adversarial sample, and the process of misleading the model with an input adversarial sample is called an adversarial attack. The vulnerability of the text classification model to adversarial attack brings great risk to practical application.
In order to improve the robustness of the text classification model against adversarial attack, the application provides a method of adversarial training. Adversarial training is a training mode that introduces noise and can regularize the model parameters; it refers to constructing adversarial samples during model training and mixing the adversarial samples with the original samples to train the model. In other words, adversarial training is performed on the model during training to improve its robustness against adversarial attack and its generalization ability.
In the application, using the FGM algorithm, a perturbation r_adv = ε·g/‖g‖₂ is added on the basis of the feature vector obtained by language model coding, where g = ∇_x L(x, y; θ) is the gradient of the loss with respect to the input.
The perturbation increases the difficulty of model convergence, thereby achieving the effect of adversarial training.
The FGM algorithm used is specifically as follows:
For each piece of text training data:
calculating the forward loss of the sample data feature vector X, and performing backward propagation to obtain a first gradient value;
the forward loss is determined by comparing the output result, obtained after the feature vector output by the language model coding passes through the embedding layer and the full connection layer, with the answer in the sample data;
calculating a disturbance vector r_adv according to the gradient value of the embedding matrix and the first gradient value, and adding the disturbance vector to the current embedding vector to obtain the adversarial vector X + r_adv;
computing the forward loss of the adversarial vector X + r_adv and performing backward propagation to obtain an adversarial gradient value, and accumulating the adversarial gradient value onto the first gradient value to obtain the target gradient value;
and further adjusting the model parameters of the language model coding layer according to the target gradient value.
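A minimal sketch of an FGM step in the style described above (PyTorch); the embedding parameter name filter and the epsilon value are assumptions for illustration, not the patent's implementation.

```python
import torch

class FGM:
    # Perturbs the embedding matrix by r_adv = eps * g / ||g||_2 after the
    # first backward pass, and restores it after the adversarial pass.
    def __init__(self, model, eps=1.0, emb_name="embedding"):
        self.model, self.eps, self.emb_name = model, eps, emb_name
        self.backup = {}

    def attack(self):
        for name, p in self.model.named_parameters():
            if p.requires_grad and self.emb_name in name and p.grad is not None:
                self.backup[name] = p.data.clone()
                norm = torch.norm(p.grad)
                if norm != 0:
                    p.data.add_(self.eps * p.grad / norm)   # add r_adv

    def restore(self):
        for name, p in self.model.named_parameters():
            if name in self.backup:
                p.data = self.backup[name]
        self.backup = {}

# One training step (sketch):
#   loss = model(x).loss; loss.backward()      # first gradient value
#   fgm.attack(); model(x).loss.backward()     # adversarial gradient accumulates
#   fgm.restore(); optimizer.step(); optimizer.zero_grad()
```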
The application provides a training method for a question-answer text classification model, proposes a keyword highlighting operation in the training process, and combines multiple enhancement strategies, including data enhancement, transfer learning, multi-task learning and adversarial training. The keyword highlighting operation identifies the parts of speech of the words in the question-answer pairs, judges whether the part of speech of each word in the text information matches a part of speech identified in the question-answer pairs, and adds the matching information to generate a new word vector for training. To enhance the generalization ability of the model, several data enhancement techniques are adopted: certain words in the text information are randomly deleted, replaced and inserted according to set proportions, so that noise is added to the model's training data. Transfer learning enhances the model's generalization and knowledge-learning ability by first training on a larger data set with similar data sources, with adversarial training introduced at the same time, to obtain an initial model, and then performing joint training with data sets of similar tasks, i.e., multi-task learning. The model provided by the application achieves very high accuracy on the text classification task.
Embodiment three: a method of text classification.
As shown in fig. 10, a flowchart of a text classification method provided in the embodiment of the present application includes the following steps:
step S1000, a text classification request containing text data is obtained, wherein the text data contains a target question and target text information for judging whether the target question is correct.
In the present application, the target question is a judgment question including a target noun.
Taking a medical scenario as an example, the target text information is a patient's case record, and the target question is a question input by the patient; the input question is a judgment question containing a target noun, such as "does the patient have disease X", where the target question is a yes/no judgment question and the target noun it contains is "disease X".
Step S1001, inputting the text data into the trained text classification model, and determining, based on the trained text classification model, the text classification result of whether the target question is correct.
The trained text classification model is obtained by training with the methods of embodiment one and embodiment two of the present application.
For example, a patient's case record and the target question "does the patient have disease X" are input into the trained text classification model; the model outputs probability values of "yes" and "no" for the target question, and the classification result corresponding to the target question can be determined from these probability values, that is, whether the patient has disease X is determined from the patient's case record.
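As a hedged sketch of this inference flow, assuming a Hugging Face style sequence classification model and tokenizer (the label order, maximum length and names are placeholders, not the patent's implementation):

```python
import torch

def classify(model, tokenizer, case_text, question):
    # Splice the question and the case text as one input, as the input
    # layer does, then read off the yes/no probabilities.
    inputs = tokenizer(question, case_text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Mapping index 1 to "yes" and index 0 to "no" is an assumption.
    return {"yes": float(probs[1]), "no": float(probs[0])}
```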
In the present application, the trained text classification model includes an input layer, a language model coding layer, an embedding layer, and a fully connected layer. In the text classification process, the text data is input into the trained text classification model, the text data is processed sequentially through all layers in the text classification model, and finally a text classification result is output.
As shown in fig. 11, an overall method flowchart for text classification provided in the embodiment of the present application includes:
step S1100, acquiring a text classification request containing text data;
the text data is target text information and a target question, and the target question is a judgment question containing a target noun.
Step S1101, inputting the text data into the trained text classification model, and performing a stitching process on the target text information and the target question in the text data through the input layer.
Step S1102, the text data after the splicing processing is transmitted to the language model coding layer through the input layer, and the coding processing is performed through the language model coding layer, so as to obtain the feature vector of the text data.
In the present application, after the language model coding, each word in the text data outputs a 1 × 1024 vector as the feature vector of the word.
Step S1103, the feature vectors are transmitted to the embedding layer through the language model coding layer, and the feature vectors are processed by the embedding layer, so as to obtain the feature vectors with fixed dimensions.
Because the dimension of each word output by the language model coding layer is not fixed, a feature vector with fixed dimension is generated through the embedding layer;
in a possible implementation manner, the dimension of the feature vector of the word output by the language model coding layer is large, which affects the computational efficiency, and at this time, the feature vector can be subjected to dimension reduction processing by the embedding layer.
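A minimal sketch of such a dimension reduction in the embedding layer, assuming a simple linear projection; the target dimension of 256 is an illustrative choice, while 1024 matches the per-word vectors described in step S1102.

```python
import torch
import torch.nn as nn

# Linear projection mapping the 1 x 1024 per-word encoder vectors to a
# smaller fixed dimension.
reduce = nn.Linear(1024, 256)
encoder_out = torch.randn(4, 128, 1024)      # (batch, seq_len, hidden)
fixed = reduce(encoder_out)
print(fixed.shape)                           # torch.Size([4, 128, 256])
```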
Step S1104, inputting the target feature vector into the full connection layer through the embedding layer, and performing classification processing through the full connection layer to determine the classification result.
In the application, the full connection layer comprises an activation function for text classification, and the activation function produces the final classification result for the text classification task. Note that the full connection layer corresponds to a text classifier in the present application.
Because the text classification method of the present application is mainly aimed at question-answer text classification, the keyword highlighting operation is introduced in the model training process; through it, the information of the question-answer pairs is introduced into the embedded vectors of the text information, so that those embedded vectors are more relevant to the question and the answer, which improves the accuracy of the trained text classification model on question-answer text classification.
Based on the same inventive concept, an embodiment of the present application further provides an apparatus 1200 for training a text classification model, as shown in fig. 12, the apparatus 1200 includes: a first obtaining unit 1201 and a training unit 1202, wherein:
a first obtaining unit 1201, configured to obtain a first training sample set, where each first sample data in the first training sample set includes at least one question-answer pair and text information for determining an answer to a question in the question-answer pair;
a training unit 1202, configured to perform multiple rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
wherein, the text classification model includes an input layer, a language model coding layer, an embedding layer and a full connection layer, and the training unit 1202 is specifically configured to:
inputting the first sample data into a language model coding layer through an input layer to obtain a first feature vector of the first sample data; inputting the first feature vector into an embedding layer, and performing keyword highlighting on the feature vector used for representing the text information in the first feature vector according to the feature vector used for representing the question-answer pair in the first feature vector through keyword highlighting operation introduced into the embedding layer to obtain a second feature vector of the text information; inputting the second characteristic vector and the characteristic vector for representing the question-answer pair into the full-connection layer, and determining answer probability corresponding to the question in the question-answer pair; and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
In one possible implementation, the training unit 1202 is specifically configured to:
identifying the part of speech of the keyword in the question-answer pair of the first sample data, and setting a part of speech tag set according to the part of speech; and determining a part-of-speech tag for each word in the text information of the first sample data, and adding a target vector to the feature vector of each word according to a judgment result of whether the part-of-speech tag is in the part-of-speech tag set or not to obtain a second feature vector.
In a possible implementation manner, at least one full connection layer is continuously set after the embedding layer, each full connection layer corresponds to one task, and a loss function is set for each task;
the first obtaining unit 1201 is further configured to obtain a second training sample set, where the second training sample set includes second sample data after data enhancement processing, and each second sample data includes at least one group of question-answer pairs and text information used for determining answers to questions in the question-answer pairs;
the training unit 1202 is further configured to perform multiple rounds of second iterative training on the basis of the trained text classification model according to the second sample data to obtain a retrained text classification model;
the training unit 1202 is specifically configured to:
inputting second sample data into a language model coding layer of the trained text classification model through an input layer to obtain a feature vector of the second sample data; generating a feature vector of a fixed dimension by passing the feature vector of the second sample data through an embedded layer of the trained text classification model; respectively inputting the feature vectors output by the embedding layer into a full connection layer corresponding to each task, and determining a loss function corresponding to each task; and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the loss function of each task.
In one possible implementation, the training unit 1202 is specifically configured to:
weighting the loss function corresponding to each task according to a preset task weight proportion to obtain a target loss function; and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the target loss function.
In one possible implementation, the data enhancement process includes one or a combination of the following:
randomly replacing words in the text data with words in a synonym table according to a set first proportion;
randomly selecting words in the text according to a set second proportion and randomly inserting the words into any position in the text;
randomly deleting words in the text information according to a set third proportion;
and randomly selecting two words in the text information according to the set fourth proportion and reversing the positions of the words.
In one possible implementation, the training unit 1202 is further configured to:
obtaining a first loss function according to the feature vector output by the language model coding layer; determining a first gradient value according to the first loss function; calculating a disturbance vector according to the gradient value of the embedding matrix and the first gradient value, and adding the disturbance vector to the feature vector after embedding-layer dimension reduction to obtain an adversarial vector; determining a second loss function according to the adversarial vector; obtaining an adversarial gradient value in the reverse direction according to the second loss function, and accumulating the adversarial gradient value onto the first gradient value to obtain a target gradient; and adjusting the model parameters of the language model coding layer according to the target gradient.
In one possible implementation, the language model coding layer is one of BERT, RoBERTa, and XLNet.
Based on the same inventive concept, an embodiment of the present application further provides an apparatus 1300 for text classification, as shown in fig. 13, where the apparatus 1300 includes: a second obtaining unit 1301 and a determining unit 1302, wherein:
a second obtaining unit 1301, configured to obtain a text classification request including text data, where the text data includes a target question and target text information used to determine whether the target question is correct;
a determining unit 1302, configured to input text data into the trained text classification model, and determine whether the target problem is a correct text classification result based on the trained text classification model; the trained text classification model is obtained by training through the method for training the text classification model provided by the embodiment of the application.
For convenience of description, the above parts are separately described as units (or modules) according to functional division. Of course, the functionality of the various elements (or modules) may be implemented in the same one or more pieces of software or hardware in practicing the present application.
Having described the method and apparatus for text classification and the corresponding method and apparatus for training a text classification model according to an exemplary embodiment of the present application, a computing apparatus in a text classification process or a text classification model training process according to another exemplary embodiment of the present application will be described.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
In one possible implementation, a computing device provided by the embodiments of the present application may include at least a processor and a memory. Wherein the memory stores program code that, when executed by the processor, causes the processor to perform any of the methods of text classification of various exemplary embodiments herein and any of the methods of text classification model training of various exemplary embodiments herein.
In some possible implementations, the present application further provides a computer-readable storage medium including program code, which, when the program product is executed on an electronic device, is configured to cause the electronic device to perform the steps of any one of the above-described methods for text classification, and the steps of any one of the above-described methods for training a text classification model.
A computing device 1400 according to such an embodiment of the present application is described below with reference to fig. 14. The computing device 1400 of fig. 14 is only one example and should not be taken as limiting the scope of use and functionality of embodiments of the present application.
As shown in fig. 14, the computing device 1400 is embodied in the form of a general purpose computing device. Components of computing device 1400 may include, but are not limited to: the at least one processor 1401, the at least one memory unit 1402, and a bus 1403 connecting the various system components (including the memory unit 1402 and the processor 1401).
Bus 1403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The storage unit 1402 may include readable media in the form of volatile memory, such as a random access memory (RAM) 14021 and/or a cache storage unit 14022, and may further include a read-only memory (ROM) 14023.
Storage unit 1402 may also include a program/utility 14025 having a set (at least one) of program modules 14024, such program modules 14024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The computing device 1400 may also communicate with one or more external devices 1404 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the computing device 1400, and/or with any devices (e.g., router, modem, etc.) that enable the computing device 1400 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1405. Moreover, computing device 1400 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through network adapter 1406. As shown, the network adapter 1406 communicates with other modules for the computing device 1400 over a bus 1403. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 1400, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, various aspects of the method for text classification provided herein may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of the method for text classification according to various exemplary embodiments of the present application described above in this specification, and the method for text classification model training, when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of training a text classification model, the method comprising:
acquiring a first training sample set, wherein each first sample data in the first training sample set comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
performing multiple rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and each round of first iterative training process comprises the following steps:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector used for representing text information in the first feature vector according to the feature vector used for representing question and answer pairs in the first feature vector through keyword highlighting operation introduced into the embedding layer to obtain a second feature vector of the text information;
inputting the second feature vector and the feature vector for representing the question-answer pair into the full-connection layer, and determining answer probability corresponding to the question in the question-answer pair;
and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
2. The method of claim 1, wherein the performing keyword highlighting on the feature vector of the first feature vector for characterizing text information according to the feature vector of the first feature vector for characterizing question-answer pairs by the keyword highlighting operation introduced in the embedding layer to obtain the second feature vector of the text information comprises:
identifying the part of speech of the keyword in the question-answer pair of the first sample data, and setting a part of speech tag set according to the part of speech;
and determining part-of-speech tags for each word in the text information of the first sample data, and adding a target vector to the feature vector of each word according to a judgment result of whether the part-of-speech tags are in the part-of-speech tag set, so as to obtain the second feature vector.
3. The method of claim 1, wherein the method further comprises:
continuously setting at least one full connection layer after the embedding layer, wherein each full connection layer corresponds to one task, and setting a loss function for each task;
acquiring a second training sample set, wherein the second training sample set comprises second sample data subjected to data enhancement processing, and each second sample data comprises at least one group of question-answer pairs and text information for determining answers to questions in the question-answer pairs;
executing multiple rounds of second iterative training on the basis of the trained text classification model according to the second sample data to obtain a retrained text classification model;
wherein, each round of the second iterative training process is as follows:
inputting the second sample data into a language model coding layer of the trained text classification model through the input layer to obtain a feature vector of the second sample data;
enabling the feature vector of the second sample data to pass through an embedding layer of the trained text classification model to generate a feature vector with fixed dimensionality;
respectively inputting the feature vectors output by the embedding layer into a full connection layer corresponding to each task, and determining a loss function corresponding to each task;
and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the loss function of each task.
4. The method of claim 3, wherein said back-adjusting model parameters of a language model coding layer of said trained text classification model according to said per-task loss function comprises:
weighting the loss function corresponding to each task according to a preset task weight proportion to obtain a target loss function;
and reversely adjusting the model parameters of the language model coding layer of the trained text classification model according to the target loss function.
5. The method of claim 3, wherein the data enhancement processing comprises one or a combination of:
randomly replacing words in the text data with words in a synonym table according to a set first proportion;
randomly selecting words in the text according to a set second proportion and randomly inserting the words into any position in the text;
randomly deleting words in the text information according to a set third proportion;
and randomly selecting two words in the text information according to the set fourth proportion and reversing the positions of the words.
6. A method as claimed in claim 1 or 3, characterized in that the method further comprises:
obtaining a first loss function according to the feature vector output by the language model coding layer;
determining a first gradient value according to the first loss function;
calculating a disturbance vector according to the gradient value of the embedded matrix and the first gradient value, and adding the disturbance vector to the feature vector subjected to the embedded layer dimensionality reduction processing to obtain an adversarial vector;
determining a second loss function according to the adversarial vector;
obtaining an adversarial gradient value in a reverse direction according to the second loss function, and accumulating the adversarial gradient value onto the first gradient value to obtain a target gradient;
and adjusting the model parameters of the language model coding layer according to the target gradient.
7. A method of text classification, the method comprising:
acquiring a text classification request containing text data, wherein the text data contains a target problem and target text information used for judging whether the target problem is correct;
inputting the text data into a trained text classification model, and determining whether the target problem is a correct text classification result based on the trained text classification model; wherein the trained text classification model is obtained by training according to the method of any one of claims 1 to 6.
8. An apparatus for training a text classification model, the apparatus comprising:
a first obtaining unit, configured to obtain a first training sample set, where each first sample data in the first training sample set includes at least one question-answer pair and text information for determining an answer to a question in the question-answer pair;
the training unit is used for executing multiple rounds of first iterative training on the text classification model according to the first sample data to obtain a trained text classification model;
the text classification model comprises an input layer, a language model coding layer, an embedding layer and a full connection layer, and the training unit is specifically used for:
inputting the first sample data into the language model coding layer through the input layer to obtain a first feature vector of the first sample data;
inputting the first feature vector into the embedding layer, and performing keyword highlighting on the feature vector used for representing text information in the first feature vector according to the feature vector used for representing question and answer pairs in the first feature vector through keyword highlighting operation introduced into the embedding layer to obtain a second feature vector of the text information;
inputting the second feature vector and the feature vector for representing the question-answer pair into the full-connection layer, and determining answer probability corresponding to the question in the question-answer pair;
and reversely adjusting the model parameters of the language model coding layer according to the answer probability output by the full connection layer and the answer in the first sample data.
9. A computing device comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-6 or the steps of the method of claim 7.
10. Computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to carry out the steps of the method of any one of claims 1 to 6 or the steps of the method of claim 7, when said program product is run on said electronic device.