CN111897964B - Text classification model training method, device, equipment and storage medium


Info

Publication number
CN111897964B
Authority
CN
China
Prior art keywords
text
sample
classification
error
countermeasure
Prior art date
Legal status: Active
Application number
CN202010805356.8A
Other languages
Chinese (zh)
Other versions
CN111897964A (en)
Inventor
邱耀
张金超
牛成
周杰
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority claimed from CN202010805356.8A
Publication of CN111897964A
Application granted
Publication of CN111897964B
Status: Active


Classifications

    • G06F16/35: Information retrieval of unstructured textual data; Clustering; Classification
    • G06F18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Pattern recognition; Matching criteria, e.g. proximity measures
    • G06F40/30: Handling natural language data; Semantic analysis
    • G06N3/045: Neural networks; Architecture; Combinations of networks
    • G06N3/047: Neural networks; Architecture; Probabilistic or stochastic networks
    • G06N3/08: Neural networks; Learning methods

Abstract

The application discloses a text classification model training method, device, equipment and storage medium, belonging to the field of artificial intelligence. According to the embodiments of the application, on the one hand, countermeasure samples are introduced and the text classification model is trained with both the text samples and the countermeasure samples, so that the model learns how to classify perturbed text, which improves the robustness of the text classification model and the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features extracted from the countermeasure sample during classification and restore them to text content, which improves the interpretability of the countermeasure training method. The model parameters are trained together with the error between the reconstructed text content and the text content of the original text sample, so that the text classification model can extract more accurate text features, that is, obtain a more accurate feature expression of the text content, which improves the robustness and accuracy of the feature extraction of the text classification model.

Description

Text classification model training method, device, equipment and storage medium
Technical Field
The application relates to the field of artificial intelligence, in particular to a text classification model training method, a device, equipment and a storage medium.
Background
Artificial intelligence is applied in many fields, and using it in place of manual work can greatly improve the efficiency of service processing. In text classification, a text classification model can be trained, and the text to be classified is then input into the trained text classification model, which predicts the type to which the text belongs.
At present, a text classification model is typically trained by obtaining a text sample, classifying the text sample with the text classification model to obtain a predicted classification result, and updating the model parameters according to the predicted classification result and the target classification result carried by the text sample.
A text classification model trained in this way has poor robustness: adding a small perturbation to the input text can cause the model to classify it incorrectly.
Disclosure of Invention
The embodiment of the application provides a text classification model training method, a device, equipment and a storage medium, which can improve the robustness of a text classification model and the accuracy of its feature extraction and classification. The technical scheme is as follows:
In one aspect, a text classification model training method is provided, the method comprising:
based on a text classification model, extracting characteristics of a text sample and a countermeasure sample of the text sample, classifying based on the extracted text characteristics, and outputting prediction classification results of the text sample and the countermeasure sample, wherein the text sample and the corresponding countermeasure sample both carry the same target classification result;
acquiring a first classification error and a second classification error, wherein the first classification error is an error between the prediction classification result and the target classification result of the text sample, and the second classification error is an error between the prediction classification result and the target classification result of the countermeasure sample;
based on the text classification model, identifying the text characteristics of the countermeasure sample, and outputting a text identification result corresponding to the text characteristics;
acquiring an identification error based on the text identification result and the text sample;
updating model parameters of the text classification model based on the first classification error, the second classification error and the recognition error.
In one aspect, a text classification model training apparatus is provided, the apparatus comprising:
The classification module is used for extracting characteristics of a text sample and a countermeasure sample of the text sample based on a text classification model, classifying the text sample and the countermeasure sample based on the extracted text characteristics, and outputting a prediction classification result of the text sample and the countermeasure sample, wherein the text sample and the corresponding countermeasure sample both carry the same target classification result;
the acquisition module is used for acquiring a first classification error and a second classification error, wherein the first classification error is an error between a prediction classification result and a target classification result of the text sample, and the second classification error is an error between the prediction classification result and the target classification result of the countermeasure sample;
the recognition module is used for recognizing the text characteristics of the countermeasure sample based on the text classification model and outputting a text recognition result corresponding to the text characteristics;
the acquisition module is further used for acquiring an identification error based on the text identification result and the text sample;
and the updating module is used for updating the model parameters of the text classification model based on the first classification error, the second classification error and the identification error.
In one possible implementation, the identification module is configured to:
mapping the text features of the countermeasure sample to a real number domain based on the text classification model to obtain word embedding information corresponding to the text features;
and matching the word embedding information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text feature.
In one possible implementation, the text classification model includes a two-layer neural network, where the first layer of the neural network is used for mapping the text features of the countermeasure sample to the real number domain, and the second layer of the neural network is used for matching the word embedding information with the vocabulary.
In one possible implementation manner, the recognition module is configured to normalize the text features of the countermeasure sample based on the text classification model, and perform the mapping to the real number domain and the matching with the vocabulary based on the normalized text features.
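As an illustration of this reconstruction step, the following is a minimal sketch rather than the patented implementation: it assumes PyTorch and made-up sizes (feature_dim, embed_dim, vocab_size), normalizes the text features, maps them back to the word embedding space with a linear layer, and matches each position against a vocabulary embedding matrix by dot product.

    import torch
    from torch import nn

    class Reconstructor(nn.Module):
        """Reconstruction in two layers: text feature -> word embedding -> vocabulary match."""

        def __init__(self, feature_dim, embed_dim, vocab_size):
            super().__init__()
            self.norm = nn.LayerNorm(feature_dim)                        # normalize the text features
            self.to_embedding = nn.Linear(feature_dim, embed_dim)        # first layer: map to the real number domain
            self.vocab_embedding = nn.Embedding(vocab_size, embed_dim)   # embeddings of the words in the vocabulary

        def forward(self, features):
            # features: (batch, seq_len, feature_dim), extracted from the countermeasure sample
            word_info = self.to_embedding(self.norm(features))           # word embedding information
            # second layer: match the word embedding information against the vocabulary
            scores = torch.matmul(word_info, self.vocab_embedding.weight.t())
            return scores.argmax(dim=-1)                                 # ids of the matched words

    recon = Reconstructor(feature_dim=768, embed_dim=128, vocab_size=30000)
    text_features = torch.randn(2, 16, 768)
    print(recon(text_features).shape)                                    # torch.Size([2, 16])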
In one possible implementation, the update module is configured to perform any one of the following:
obtaining a product of the recognition error and the weight of the recognition error, obtaining the sum of the product, the first classification error and the second classification error as a total error, and updating model parameters of the text classification model based on the total error;
And weighting the first classification error, the second classification error and the recognition error based on the weights of the first classification error, the second classification error and the recognition error to obtain a total error, and updating model parameters of the text classification model based on the total error.
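For example, assuming the three errors have already been computed as scalars, the two weighting options described above can be sketched as follows; the weight values are illustrative only and are not taken from the application:

    # Option 1: only the recognition error is weighted.
    def total_error_v1(first_cls_err, second_cls_err, recog_err, recog_weight=0.1):
        return first_cls_err + second_cls_err + recog_weight * recog_err

    # Option 2: each of the three errors has its own weight.
    def total_error_v2(first_cls_err, second_cls_err, recog_err,
                       w_first=1.0, w_second=1.0, w_recog=0.1):
        return w_first * first_cls_err + w_second * second_cls_err + w_recog * recog_err

    print(total_error_v1(0.8, 1.2, 2.0))   # 2.2
    print(total_error_v2(0.8, 1.2, 2.0))   # 2.2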
In one possible implementation, the classification module includes a classification unit and a generation unit;
the classifying unit is used for inputting the text sample into a text classifying model, extracting the characteristics of the text sample by the text classifying model, classifying the text sample based on the extracted text characteristics, and outputting a prediction classifying result of the text sample;
the generating unit is used for generating a corresponding countermeasure sample based on the text sample, the prediction classification result of the text sample and the target classification result;
the classification unit is also used for extracting the characteristics of the countermeasure sample, classifying the countermeasure sample based on the extracted text characteristics, and outputting a prediction classification result of the countermeasure sample.
In one possible implementation, the classification unit includes a mapping subunit and an extraction subunit;
the mapping subunit is used for mapping words contained in the text content of the text sample to real number fields by the text classification model to obtain word embedding information of the text sample;
And the extraction subunit is used for extracting the characteristics of the word embedding information of the text sample to obtain the text characteristics of the text sample.
In one possible implementation, the generating unit includes a determining subunit and an adding subunit;
the determining subunit is used for determining the countermeasure disturbance of the text sample based on the predicted classification result and the target classification result of the text sample;
the adding subunit is configured to add the countermeasure disturbance to the text sample, so as to obtain a countermeasure sample corresponding to the text sample.
In one possible implementation, the determining subunit is configured to:
acquiring a first classification error of the text sample according to a prediction classification result and a target classification result of the text sample;
acquiring candidate countermeasure disturbances of the text sample based on the gradient of the first classification error;
adding the candidate countermeasure disturbance into the text sample to obtain a candidate countermeasure sample corresponding to the text sample;
continuously obtaining a classification error of the candidate countermeasure sample based on a prediction classification result and a target classification result obtained by classifying the candidate countermeasure sample;
and updating the candidate countermeasure disturbance of the text sample based on the gradient of the classification error of the candidate countermeasure sample until the target condition is reached, and obtaining the countermeasure disturbance of the text sample.
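A minimal sketch of this iterative procedure, in the spirit of projected gradient style adversarial training, is given below. It assumes PyTorch, perturbs the word embedding information rather than the discrete words, and uses a fixed number of steps as the target condition; the toy model, step size and norm bound are illustrative assumptions, not the choices made by the application.

    import torch
    import torch.nn.functional as F
    from torch import nn

    def build_perturbation(model, embeds, target, steps=3, step_size=0.01, bound=0.05):
        """Iteratively update a candidate countermeasure disturbance from the error gradient."""
        delta = torch.zeros_like(embeds, requires_grad=True)        # candidate countermeasure disturbance
        for _ in range(steps):                                       # target condition: a fixed number of steps
            logits = model(embeds + delta)                           # classify the candidate countermeasure sample
            error = F.cross_entropy(logits, target)                  # classification error of the candidate
            grad, = torch.autograd.grad(error, delta)                # gradient of the error w.r.t. the disturbance
            # move in the direction that increases the error, then keep the disturbance small
            delta = (delta + step_size * grad.sign()).clamp(-bound, bound).detach().requires_grad_(True)
        return delta.detach()

    # toy usage: a linear "model" over flattened word embeddings, made up purely for illustration
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32, 4))
    embeds = torch.randn(8, 16, 32)                                  # word embedding information of 8 text samples
    target = torch.randint(0, 4, (8,))                               # target classification results
    perturbation = build_perturbation(toy_model, embeds, target)
    print(perturbation.shape)                                        # torch.Size([8, 16, 32])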
In one possible implementation manner, the adding subunit is configured to add the countermeasure disturbance to the text content of the text sample, so as to obtain the text content of the countermeasure sample corresponding to the text sample;
the feature extraction process of the countermeasure sample comprises:
mapping words contained in the text content of the countermeasure sample to a real number domain to obtain word embedding information of the countermeasure sample;
and extracting features of the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
In one possible implementation manner, the adding subunit is configured to add the countermeasure disturbance to word embedding information of the text sample, so as to obtain word embedding information of a countermeasure sample corresponding to the text sample;
the feature extraction process of the countermeasure sample comprises:
and extracting features of the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
In one possible implementation, the text classification model includes an encoder, a classifier, and a decoder;
wherein the encoder is used for feature extraction;
the classifier is used for performing classification based on the extracted text features;
the decoder is used for recognizing the text features of the countermeasure sample and outputting a text recognition result corresponding to the text features.
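The skeleton below, written as an assumption in PyTorch rather than as the actual model of the application, shows how an encoder for feature extraction, a classifier, and a decoder that restores text from the features can be combined in one text classification model; all sizes are made up.

    import torch
    from torch import nn

    class TextClassificationModel(nn.Module):
        def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=256, num_classes=4):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)              # words -> real number domain
            self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)    # feature extraction
            self.classifier = nn.Linear(hidden_dim, num_classes)              # classification from text features
            self.decoder = nn.Linear(hidden_dim, vocab_size)                  # recognition: features -> word scores

        def forward(self, word_ids):
            features, _ = self.encoder(self.embedding(word_ids))              # one text feature per position
            logits = self.classifier(features[:, -1, :])                      # classify from the last position
            recognition = self.decoder(features)                              # text recognition result (per-word scores)
            return logits, recognition

    model = TextClassificationModel()
    word_ids = torch.randint(0, 30000, (2, 16))        # two toy text samples of 16 words each
    logits, recognition = model(word_ids)
    print(logits.shape, recognition.shape)             # torch.Size([2, 4]) torch.Size([2, 16, 30000])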
In one aspect, an electronic device is provided that includes one or more processors and one or more memories having at least one piece of program code stored therein that is loaded and executed by the one or more processors to implement various alternative implementations of the text classification model training method described above.
In one aspect, a computer readable storage medium having at least one program code stored therein is provided, the at least one program code loaded and executed by a processor to implement various alternative implementations of the text classification model training method described above.
In one aspect, a computer program product or computer program is provided, the computer program product or computer program comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. The one or more processors of the electronic device are capable of reading the one or more pieces of program code from the computer readable storage medium, the one or more processors executing the one or more pieces of program code such that the electronic device is capable of performing the text classification model training method of any of the possible embodiments described above.
According to the method and the device provided by the embodiments of the application, on the one hand, countermeasure samples are introduced and both the text samples and the countermeasure samples are used to train the text classification model, so that the model learns how to classify perturbed text, which improves the robustness of the text classification model and the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features extracted from the countermeasure sample during classification and restore them to text content, which improves the interpretability of the countermeasure training method. The model parameters are trained together with the error between the reconstructed text content and the text content of the original text sample, so that the text classification model can extract more accurate text features, that is, obtain a more accurate feature expression of the text content, which improves the robustness and accuracy of the feature extraction of the text classification model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application flow of an emotion analysis system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application flow of an intent classification system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation environment of a text classification model training method according to an embodiment of the present application;
FIG. 4 is a flowchart of a text classification model training method provided by an embodiment of the application;
FIG. 5 is a flowchart of a text classification model training method provided by an embodiment of the application;
FIG. 6 is a schematic diagram of a reconstructor or decoder according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a pre-training language model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a text classification model according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a self-encoder according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a text classification model training device according to an embodiment of the present application;
FIG. 11 is a block diagram of a terminal according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements or items having substantially the same function and function, and it should be understood that there is no logical or chronological dependency between the terms "first," "second," and "n," and that there is no limitation on the amount and order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another element. For example, the first image can be referred to as a second image, and similarly, the second image can be referred to as a first image, without departing from the scope of the various examples. The first image and the second image can both be images, and in some cases, can be separate and distinct images.
The term "at least one" in the present application means one or more, and the term "plurality" in the present application means two or more, for example, a plurality of data packets means two or more data packets.
It is to be understood that the terminology used in the description of the various examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of various such examples and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association relationship between associated objects and means that three relationships can exist; for example, A and/or B can represent: A exists alone, A and B exist together, or B exists alone. In the present application, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that, in the embodiments of the present application, the sequence number of each process does not imply its order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application.
It should also be understood that determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "if" may be interpreted to mean "when" ("white" or "upon") or "in response to a determination" or "in response to detection". Similarly, the phrase "if a [ stated condition or event ] is detected" may be interpreted to mean "upon a determination" or "in response to a determination" or "upon a detection of a [ stated condition or event ] or" in response to a detection of a [ stated condition or event ], depending on the context.
The following description of the terms involved in the present application.
Countermeasure training: known in English as adversarial training, this is a way of enhancing the robustness of a model. The countermeasure training technique is based on white-box attacks, in which the attacker knows all the information about the attacked model, including the model structure, loss function, parameter values, architecture, training method and, in some cases, the training data. During countermeasure training, a small countermeasure disturbance can be mixed into the original sample, resulting in a countermeasure sample that changes little relative to the original sample but is likely to cause the model to misclassify; the model is then adapted to this change, so that it becomes robust to countermeasure samples.
A countermeasure sample is an input sample formed by deliberately adding a minute countermeasure disturbance to data in the data set. Inputting the countermeasure sample into an ordinary model can cause the model to give an erroneous output with high confidence, resulting in misclassification.
A countermeasure disturbance is a disturbance factor added to the original sample. For example, in the image field, white noise can be added to a clean image to obtain a countermeasure sample, and the white noise is the countermeasure disturbance. As another example, in the text field, the countermeasure disturbance may be a change to some words in the text, a change to the word embedding information of the text, and the like.
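As a concrete illustration of a countermeasure disturbance in the text field, the following sketch (an assumption written in PyTorch, not the specific disturbance used by the application) adds a small gradient-based disturbance to the word embedding information of one sample, in the style of the fast gradient sign method:

    import torch
    import torch.nn.functional as F
    from torch import nn

    toy_classifier = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32, 4))   # placeholder classifier
    embeds = torch.randn(1, 16, 32, requires_grad=True)                   # word embedding information of one sample
    target = torch.tensor([2])                                             # its target classification result

    loss = F.cross_entropy(toy_classifier(embeds), target)
    loss.backward()

    epsilon = 0.01                                                          # disturbance magnitude (illustrative)
    disturbance = epsilon * embeds.grad.sign()                              # the countermeasure disturbance
    adversarial_embeds = (embeds + disturbance).detach()                    # embedding of the countermeasure sample
    print(adversarial_embeds.shape)                                         # torch.Size([1, 16, 32])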
Self-encoder: known in English as autoencoder, abbreviated AE, an artificial neural network (Artificial Neural Network, ANN) used in semi-supervised and unsupervised learning. The learning target of the self-encoder is its own input. By taking the input information as the learning target, the self-encoder performs representation learning on the input information, that is, it learns how to produce an accurate feature expression of the input. The self-encoder has the function of a representation learning algorithm in the general sense and is applied to dimensionality reduction and anomaly detection.
The self-encoder consists of two parts, an encoder and a decoder. According to the learning paradigm, self-encoders can be divided into undercomplete autoencoders, regularized autoencoders and variational autoencoders (Variational AutoEncoder, VAE); the first two are discriminative models and the last is a generative model. Depending on its construction, a self-encoder can be a feedforward or a recurrent neural network.
Specifically, given an input space and a feature space, the self-encoder solves for a mapping between the two that minimizes the reconstruction error of the input, thereby learning a more accurate feature expression.
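A minimal self-encoder sketch, assuming PyTorch and arbitrary dimensions, which learns a mapping between the input space and the feature space by minimizing the reconstruction error of the input:

    import torch
    from torch import nn

    class AutoEncoder(nn.Module):
        def __init__(self, input_dim=64, feature_dim=8):
            super().__init__()
            self.encoder = nn.Linear(input_dim, feature_dim)   # input space -> feature space
            self.decoder = nn.Linear(feature_dim, input_dim)   # feature space -> back to input space

        def forward(self, x):
            return self.decoder(torch.relu(self.encoder(x)))

    model = AutoEncoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    data = torch.randn(128, 64)                                # the input is also the learning target

    for _ in range(100):
        reconstruction = model(data)
        loss = nn.functional.mse_loss(reconstruction, data)    # reconstruction error to be minimized
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(loss.item())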
For robustness: the Chinese term for "robust" is a transliteration of the English word, which means sturdy and strong. In computing, it refers to the ability of a system to survive abnormal and dangerous situations. For example, whether computer software can avoid crashing or breaking down in the case of input errors, disk failures, network overload or deliberate attacks is a matter of its robustness. "Robustness" also refers to the ability of a control system to maintain certain other properties under perturbation of a parameter (for example, structure or size).
And for the text classification model, the text classification model is used for classifying the input text and determining the type to which the text belongs. For example, the type may be an emotion expressed by the text, an attribute of an object embodied by the text, or an intention expressed by the text, or the like. The type of text that the text classification model needs to determine is also different depending on the particular application of the text classification model.
The robustness of the text classification model refers to the characteristic that the text classification model can still classify accurately when a little change is made in the input text.
Word embedding information: word embedding is a general term for language models and representation learning techniques in natural language processing (Natural Language Processing, NLP). Conceptually, it refers to embedding a high-dimensional space, whose dimension is the total number of words, into a continuous vector space of much lower dimension, in which each word or phrase is mapped to a vector over the real number domain. The word embedding process is a dimensionality reduction process that maps words to the real number domain to obtain vector expressions of the words; these vector expressions can be called word embedding information.
Word embedding is a kind of embedding, which is a way of converting discrete variables into continuous vectors; embedding is the process of mapping source data into another space. In plain terms, word embedding maps a word in the space X to a multidimensional vector in the space Y, so that the word is embedded into the space Y. This mapping process is also referred to as generating an expression of the word in the new space.
Word embedding can be applied to artificial neural networks, dimension reduction of word co-occurrence matrices, probability models, explicit representations of the context in which the words are located, and other scenarios. The method for representing the words or the phrases by using the word embedding information can improve the analysis effect of the text in the NLP.
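For illustration, the following short sketch (with an assumed five-word vocabulary and a made-up embedding dimension, in PyTorch) maps each word of a sentence to a low-dimensional vector over the real number domain, that is, to its word embedding information:

    import torch
    from torch import nn

    vocab = {"<unk>": 0, "this": 1, "song": 2, "sounds": 3, "good": 4}
    embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)   # 5 words -> 3-dimensional vectors

    sentence = ["this", "song", "sounds", "good"]
    word_ids = torch.tensor([vocab.get(w, 0) for w in sentence])
    word_embedding_info = embedding(word_ids)                              # one real-valued vector per word
    print(word_embedding_info.shape)                                       # torch.Size([4, 3])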
The vocabulary is a concrete word list; in the embodiment of the application, the vocabulary can also be called a dictionary. It contains a large number of words and is used to identify the words that correspond to features of other forms.
BERT (Bidirectional Encoder Representations from Transformers): a pre-trained language model. BERT can extract the word vectors of all the words used in the samples, store these word vectors in a vector file, and provide the embedding vectors, namely the word embedding information, to subsequent models.
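As an illustration only, and assuming the publicly available Hugging Face transformers library and the bert-base-uncased checkpoint (neither of which is prescribed by the application), word vectors can be extracted and stored as follows:

    import torch
    from transformers import AutoModel, AutoTokenizer   # assumes the Hugging Face transformers library

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("this song sounds good", return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    word_vectors = outputs.last_hidden_state             # one embedding vector per token
    torch.save(word_vectors, "word_vectors.pt")          # store the vectors for a downstream model
    print(word_vectors.shape)                            # e.g. torch.Size([1, 6, 768])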
Feedforward neural network (Feed Forward Neural Network, FFN): in an FFN, signals travel in one direction, from the input layer to the output layer, unlike in a recurrent neural network, where the output can also feed back in the opposite direction. The interior of an FFN does not form a directed cycle.
Interpretability: machine learning is applied in services to produce output decisions, and interpretability refers to the extent to which humans can understand the cause of a decision. The higher the interpretability of a machine learning model, the easier it is for people to understand why certain decisions or predictions are made. Model interpretability is an understanding of the model's internal mechanism and of the model's results. Its importance shows in two stages: in the modeling stage, it assists developers in understanding the model, comparing and selecting models, and optimizing and adjusting the model when necessary; in the operation stage, it explains the internal mechanism of the model to the business side and explains the model's results. For example, a fund recommendation model needs to explain why a particular fund is recommended to a particular user.
Data augmentation (Data Augmentation): data augmentation refers to expanding a data set or enhancing the diversity and generality of the data in a data set. For example, image augmentation is a type of data augmentation: the image augmentation technique enlarges the training data set by making a series of random changes to the training images to produce similar but different training samples. Another way to view image augmentation is that randomly changing the training samples reduces the model's dependence on certain properties, thereby improving its generalization ability. For example, an image can be cropped in different ways so that the object of interest appears in different locations, reducing the model's dependence on the object's location; or the brightness, color and other factors of the image can be adjusted to reduce the model's sensitivity to color.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Key technologies of speech technology (Speech Technology) are automatic speech recognition (ASR), speech synthesis (TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the future direction of human-computer interaction, and voice is becoming one of the most promising modes of human-computer interaction.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field involves natural language, that is, the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques and the like.
Machine learning (Machine Learning, ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and teaching learning.
With research and advancement of artificial intelligence technology, research and application of artificial intelligence technology is being developed in various fields, such as common smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned, automatic driving, unmanned aerial vehicles, robots, smart medical treatment, smart customer service, etc., and it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and with increasing importance value.
The scheme provided by the embodiment of the application relates to the technologies of text processing, semantic understanding, machine translation, robot question answering and the like in the natural language processing technology of artificial intelligence, and is specifically described by the following embodiment.
The following describes an application scenario of the solution provided by the embodiment of the present application.
The text classification model training method provided by the embodiment of the application can be applied to any scene that needs text classification and to products in those scenes, such as an emotion analysis system, a system for detecting pornographic and reactionary content, a commodity classification system, an intention classification system, and the like.
An emotion analysis system is used to classify text, based on the meaning and emotion information it expresses, into two types, commendatory or derogatory, or into other types such as happiness, being moved, surprise and confusion. Emotion analysis (Sentiment analysis), also called tendency analysis, opinion extraction (Opinion extraction), opinion mining (Opinion mining), sentiment mining (Sentiment mining) or subjectivity analysis (Subjectivity analysis), is the process of analyzing, processing, summarizing and reasoning over subjective text with emotional color, for example analyzing, from review text, users' emotional tendencies toward attributes of a digital camera such as zoom, price, size, weight, flash and usability.
For example, as shown in fig. 1, when the emotion expressed by a piece of text needs to be determined, the input text 101 can be input into the text classification model 102, and the text classification model 102 performs emotion analysis on the input text 101 and outputs the type 103 to which the input text 101 belongs. For instance, for the input text "This song sounds so good!", the type output after text classification is "happy".
The system for detecting pornographic and reactionary content is used to identify whether text contains such content. For example, a user may publish a piece of text on a website or in an application. The terminal can send the text to be published to the detection system, which classifies the text and determines whether it contains pornographic or reactionary content. If it does, the system can feed back that the text fails the audit; if it is determined not to contain such content, the text can be published on the website or in the application. Of course, this is only one example, and the detection system can also be used in a variety of other application scenarios, which are not limited in the embodiments of the present application.
The commodity classification system can distinguish and manage commodities automatically and efficiently. Specifically, commodities can be classified according to the industries of the production and circulation fields, for example into large classes such as food, textiles, general merchandise, hardware and cultural goods, and each large class can be further subdivided. Of course, the classification can also be based on other factors, such as the performance or composition of the commodity. The commodity classification system can apply a text classification model to classify commodities based on their text information, for example classifying by commodity name, or, when the commodity is an advertisement, classifying by the text information of the advertisement, and so on.
The intention classification system is used for classifying the intention of the text. Wherein, the intention refers to the purpose that is intended to be achieved. The intent classification system is then used to analyze the intent of the text. The intent classification system may be applied in a variety of scenarios, for example, in a search scenario, the text may be text entered by a user, after which the intent classification system analyzes the corresponding intent. For another example, the text may be text obtained by performing speech recognition on the collected speech signal in a speech interaction scenario. The user sends out voice, the equipment collects voice signals, the voice signals are recognized to obtain texts, and voice instructions sent by the user are determined through intention classification.
For example, as shown in fig. 2, a specific voice interaction application scenario is provided, a user 201 sends a voice signal 202, a device can collect the voice signal 202, perform voice recognition on the voice signal 202 through a voice recognition system 203 to obtain a text 204 corresponding to the voice signal 202, and perform intention classification on the text 204 through an intention classification system 205 to determine a function 206 that is intended to be achieved by the voice signal 202. For example, if the text corresponding to the voice signal is "play music", the intention of the text is determined by the intention classification system to be: the control device plays music. The playing of music is a function of the device.
The environment in which the present application is implemented is described below.
Fig. 3 is a schematic diagram of an implementation environment of a text classification model training method according to an embodiment of the present application. The implementation environment includes a terminal 301 or the implementation environment includes a terminal 301 and a text classification platform 302. Terminal 301 is connected to text classification platform 302 via a wireless network or a wired network.
The terminal 301 can be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III, moving picture experts compression standard audio layer 3) player, or an MP4 (Moving Picture Experts Group Audio Layer IV, moving picture experts compression standard audio layer 4) player, a laptop portable computer. The terminal 301 installs and runs an application program supporting text classification, which can be, for example, a system application, an instant messaging application, a news push application, a shopping application, an online video application, a social application.
The terminal 301 may obtain a text sample, train the text classification model based on the text sample, obtain a text classification model with good classification accuracy and robustness after training, and then classify the text by using the trained text classification model to determine the type of the text. The terminal 301 is capable of doing this independently and also of providing data services to it through the text classification platform 302. The embodiment of the present application is not limited thereto.
Text classification platform 302 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. Text classification platform 302 is used to provide background services for applications that support text classification. Optionally, text classification platform 302 takes on primary processing work and terminal 301 takes on secondary processing work; alternatively, text classification platform 302 takes on secondary processing work and terminal 301 takes on primary processing work; alternatively, text classification platform 302 or terminal 301, respectively, can solely undertake processing tasks. Alternatively, the text classification platform 302 and the terminal 301 perform collaborative computing by using a distributed computing architecture.
Optionally, the text classification platform 302 includes at least one server 3021 and a database 3022, where the database 3022 is used to store data, and in an embodiment of the present application, the database 3022 can store text samples to provide data services for the at least one server 3021.
Illustratively, the at least one server 3021 is capable of extracting text samples from the database 3022 and training a text classification model based on the text samples; the trained text classification model is then used for classification. When the terminal 301 has a text classification requirement, it can send the text to be classified to the at least one server 3021. The at least one server 3021 can call the trained text classification model, classify the received text, determine the type to which the text belongs, and feed the classification result (i.e., the type of the text) back to the terminal 301.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms. The terminal can be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc.
Those skilled in the art will appreciate that the number of terminals 301 and servers 3021 can be greater or smaller. For example, there can be only one terminal 301 or server 3021, or there can be dozens, hundreds, or more. The embodiments of the present application do not limit the number of terminals and servers or their device types.
Fig. 4 is a flowchart of a text classification model training method according to an embodiment of the present application, where the method is applied to an electronic device, and the electronic device is a terminal or a server, and referring to fig. 4, the method includes the following steps.
401. The electronic device performs feature extraction on a text sample and a countermeasure sample of the text sample based on a text classification model, performs classification based on the extracted text features, and outputs prediction classification results of the text sample and the countermeasure sample, where the text sample and the corresponding countermeasure sample both carry the same target classification result.
A text sample is a sample in text form; a sample (specimen) is a part of the individuals that are observed or investigated, and in the embodiments of the application it is used to train the text classification model. The countermeasure sample corresponding to a text sample is a countermeasure sample obtained based on that text sample. For example, adding a countermeasure disturbance to a text sample can generate the corresponding countermeasure sample. The aim of countermeasure training is to enable the text classification model to accurately classify countermeasure samples to which countermeasure disturbances have been added; therefore, the target classification result carried by the countermeasure sample is the same as the target classification result carried by the corresponding text sample.
The prediction classification result refers to a classification result which is output by classifying through a text classification model, wherein the classification process is a process for predicting the type of the text, and the classification result output by the text classification model is a prediction result. The target classification result is a true, correct classification result, which may also be referred to as a "true value". The training of the text classification model aims at enabling a prediction classification result obtained by the text classification model to be infinitely close to the target classification result by updating model parameters of the text classification model, namely enabling the text classification model to accurately classify the text.
In the step 401, the model parameters of the text classification model are initial values, and further model training is required to obtain better model parameters. During model training, the electronic equipment can acquire a text sample, the text sample is input into the text classification model, and the text classification model classifies the text sample to obtain a prediction classification result.
The classification process comprises two parts: feature extraction to obtain text features, and classification based on the text features. A feature is a characteristic that distinguishes one thing from other things, and a text feature is a characteristic that distinguishes this text from other text. The text feature expresses the characteristics of the text in machine form; for example, it can reflect the characteristics of the words in the text content and the relation between the words and their context. The context of a word refers to the surrounding text that gives it its meaning, and may be the words before and after it.
402. The electronic device acquires a first classification error and a second classification error, where the first classification error is an error between the predicted classification result and the target classification result of the text sample, and the second classification error is an error between the predicted classification result and the target classification result of the countermeasure sample.
After the electronic device classifies each sample (including the text sample and the countermeasure sample) and obtains a predicted classification result, it can acquire a classification error according to the predicted classification result and the target classification result; the classification error measures the current classification effect of the text classification model. The classification error corresponding to the text sample is referred to herein as the first classification error, and the classification error corresponding to the countermeasure sample is referred to herein as the second classification error.
It can be appreciated that if the classification error is relatively large, the classification effect of the text classification model is relatively poor; if the classification error is smaller, the classification effect of the text classification model is better.
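As an example of one common way to compute such classification errors (an assumption; the application does not fix a particular loss function), the cross entropy between the predicted distribution and the target classification result can be used:

    import torch
    import torch.nn.functional as F

    # Predicted classification results (unnormalized scores over 3 types) for a text sample and
    # for its countermeasure sample, plus the target classification result they both carry.
    text_logits = torch.tensor([[2.0, 0.5, -1.0]])
    adv_logits = torch.tensor([[0.3, 0.2, 0.1]])
    target = torch.tensor([0])

    first_classification_error = F.cross_entropy(text_logits, target)    # small: confident and correct
    second_classification_error = F.cross_entropy(adv_logits, target)    # larger: nearly uniform prediction
    print(first_classification_error.item(), second_classification_error.item())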
403. And the electronic equipment identifies the text characteristics of the countermeasure sample based on the text classification model and outputs a text identification result corresponding to the text characteristics.
404. And the electronic equipment acquires the recognition error based on the text recognition result and the text sample.
In step 403 and step 404, the electronic device can also reconstruct the text features of the countermeasure sample and restore them to text content. The text recognition result, obtained by reconstructing the sample to which the countermeasure disturbance was added, is compared with the original text sample to obtain the recognition error. The recognition error indicates the difference between the text recognition result and the text sample, and can measure the robustness and accuracy of the text classification model in feature extraction. Specifically, the process of recognizing text features as corresponding text content is a reconstruction process; accordingly, the recognition error may also be called a reconstruction error.
It can be understood that if the recognition error is small, that is, the difference between the reconstructed result of the countermeasure sample and the corresponding text sample is small, then even after the countermeasure disturbance is added, the original text content can still be accurately restored. This means that the text features obtained by feature extraction represent the text sample accurately, and the current feature expression is accurate. If the recognition error is large, that is, the difference between the reconstructed result of the countermeasure sample and the corresponding text sample is relatively large, then the original text content cannot be restored once a disturbance is added: the text features obtained by feature extraction are not accurate enough, and a slight change to them changes the original meaning.
405. The electronic device updates model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
Updating the model parameters of the text classification model by combining the classification errors and the recognition error improves the robustness and accuracy of both the classification and the feature extraction of the text classification model. On the one hand, countermeasure samples are introduced into the training process, so that the training samples include both text samples and countermeasure samples, and the text classification model can accurately classify both the original text samples and the text samples to which countermeasure disturbances have been added, which improves the robustness and accuracy of classification. On the other hand, the recognition error is added to the training process; through the recognition error, the text classification model learns to accurately restore the original text content even after a countermeasure disturbance has been added to the input, which shows that the text classification model has learned an accurate feature expression, and its feature extraction step is highly robust.
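Putting steps 401 to 405 together, the following is one possible sketch of a single training step, written under the assumptions already used in the sketches above (PyTorch, the disturbance applied to the word embedding information, cross entropy for both the classification errors and the recognition error, and illustrative weights); it is not the application's reference implementation.

    import torch
    import torch.nn.functional as F
    from torch import nn

    class Model(nn.Module):
        def __init__(self, vocab=1000, dim=32, classes=4):
            super().__init__()
            self.embedding = nn.Embedding(vocab, dim)
            self.encoder = nn.GRU(dim, dim, batch_first=True)        # feature extraction
            self.classifier = nn.Linear(dim, classes)                 # classification
            self.decoder = nn.Linear(dim, vocab)                      # recognition (reconstruction)

        def encode(self, embeds):
            features, _ = self.encoder(embeds)
            return features

    model = Model()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    word_ids = torch.randint(0, 1000, (8, 12))          # text samples
    target = torch.randint(0, 4, (8,))                  # target classification results

    # step 401: classify the text sample, then build and classify the countermeasure sample
    embeds = model.embedding(word_ids)
    clean_logits = model.classifier(model.encode(embeds)[:, -1, :])
    first_error = F.cross_entropy(clean_logits, target)               # step 402: first classification error

    grad, = torch.autograd.grad(first_error, embeds, retain_graph=True)
    adv_embeds = embeds + 0.01 * grad.sign()                          # countermeasure sample (embedding level)
    adv_features = model.encode(adv_embeds)
    adv_logits = model.classifier(adv_features[:, -1, :])
    second_error = F.cross_entropy(adv_logits, target)                # step 402: second classification error

    # steps 403-404: recognize the countermeasure sample's features and compare with the text sample
    recon_logits = model.decoder(adv_features)
    recognition_error = F.cross_entropy(recon_logits.reshape(-1, 1000), word_ids.reshape(-1))

    # step 405: update the model parameters with the combined error
    total_error = first_error + second_error + 0.1 * recognition_error
    optimizer.zero_grad()
    total_error.backward()
    optimizer.step()
    print(total_error.item())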
According to the method provided by the embodiment of the application, on the one hand, countermeasure samples are introduced and both the text samples and the countermeasure samples are used to train the text classification model, so that the model learns how to classify perturbed text, which improves the robustness of the text classification model and the accuracy of text classification. On the other hand, the text classification model can reconstruct the text features extracted from the countermeasure sample during classification and restore them to text content, which improves the interpretability of the countermeasure training method. The model parameters are trained together with the error between the reconstructed text content and the text content of the original text sample, so that the text classification model can extract more accurate text features, that is, obtain a more accurate feature expression of the text content, which improves the robustness and accuracy of the feature extraction of the text classification model.
Fig. 5 is a flowchart of a text classification model training method according to an embodiment of the present application, and referring to fig. 5, the method includes the following steps.
501. The electronic device obtains a text sample.
In the embodiment of the application, the electronic equipment can acquire the text sample and train the text classification model based on the text sample. The text sample carries a target classification result, and the classification capability of the text classification model can be determined through the prediction classification result of the text sample by the text classification model and the target classification result, so that model parameters are adjusted in a plurality of iteration processes, and the classification capability of the text classification model is improved.
Specifically, depending on where the text samples are stored, the electronic device can acquire them in several ways. In one possible implementation, the text samples are stored in a text database, and when the electronic device needs to train the text classification model, it extracts the text samples from the text database.
In another possible implementation, the text sample may be a text resource in a website from which the electronic device is able to download the text sample.
In another possible implementation, the text samples may be stored on the electronic device itself, for example historical text sent to the electronic device by other devices, or text generated by the electronic device, which the electronic device can extract from local storage.
The above provides several possible ways of obtaining the text sample, and the electronic device may also obtain the text sample in other ways.
502. The electronic equipment inputs the text sample into a text classification model, and the text classification model maps words contained in the text content of the text sample to a real number field to obtain word embedding information of the text sample.
After the electronic device obtains the text sample, the text sample may be used to train a text classification model. The text sample is in text form, that is, in the form of text content. The text classification model can reduce the dimension of the text content and convert the text content into a representation in the form of continuous vectors; representing words or phrases by word embedding information can improve the subsequent analysis of the text.
In one possible implementation manner, the text sample includes at least one word, and the text classification model can map the at least one word in the text content of the text sample to real number fields respectively to obtain corresponding word embedding information, where the word embedding information includes word embedding information corresponding to each word. For example, the text sample may be a sentence that includes one or more words, and the text classification model is capable of mapping each word to word-embedded information, i.e., word-embedded information that constitutes the sentence.
In a specific possible embodiment, the word embedding information may be represented by a one-hot representation or a distributed representation, and of course, other representation manners may also be adopted. Specifically, the word embedding information may be in vector form, in which case it may be called a word vector, or in matrix form, which is not limited by the embodiment of the present application.
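As an illustrative sketch of step 502, the mapping from words to the real number field can be realized with a learnable embedding layer; the toy vocabulary, embedding dimension, and whitespace tokenization below are assumptions for demonstration rather than the exact implementation of the embodiment:

```python
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "<unk>": 1, "the": 2, "movie": 3, "was": 4, "great": 5}
embed_dim = 8

# Word embedding layer: maps each word ID to a continuous real-valued vector.
embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embed_dim)

def to_word_embeddings(text: str) -> torch.Tensor:
    token_ids = torch.tensor([[vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]])
    return embedding_layer(token_ids)        # shape: (1, seq_len, embed_dim)

word_embedding_info = to_word_embeddings("The movie was great")
print(word_embedding_info.shape)             # torch.Size([1, 4, 8])
```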
503. And the electronic equipment performs feature extraction on the word embedding information of the text sample to obtain text features of the text sample.
After the electronic equipment determines word embedding information of the text sample, the word embedding information contains the representation of words in the text content, but the association between each word and the context cannot be clearly determined through the word embedding information, and the electronic equipment can perform feature extraction on the word embedding information to obtain text features capable of more accurately representing the text sample.
In one possible implementation, the word embedding information includes word embedding information of one or more words, and for each word, the electronic device is capable of determining a text feature corresponding to the word in a context of the word.
Specifically, the electronic device can determine the text feature corresponding to the word according to the word embedding information of the word, the word embedding information of the first word, and the word embedding information of the second word. The first word refers to the word preceding the word in order in the text content, and the second word refers to the word following the word in order in the text content. Of course, the first word of the text content only has a second word, and the last word only has a first word.
The foregoing provides only one specific example, and the electronic device may also perform feature extraction in other manners, for example, the electronic device may also determine a text feature corresponding to the word according to the position of the word in the text content, or determine a corresponding text feature according to the context of the word and the position in the text content, or the like, which is not limited by the embodiment of the present application.
It should be noted that, step 502 and step 503 are processes of extracting features from the text sample by the text classification model, in the process of extracting features, word embedding information is obtained by firstly dimension reduction, and text features are obtained by extracting the word embedding information, so that text contents of the text sample can be analyzed more carefully, and accurate text features can be obtained. In one possible implementation manner, the step 502 and the step 503 can be implemented through two neural network layers, which are respectively called a word embedding layer and a feature extraction layer, the electronic device can process a text sample through the word embedding layer, output word embedding information, input the word embedding information into the feature extraction layer, perform feature extraction on the word embedding information by the feature extraction layer, and output text features. The processing performed by each neural network layer may be convolution processing, or may be other processing, which is not limited by the embodiment of the present application.
Of course, the feature extraction process may also be implemented in other manners, for example, feature extraction is directly performed on text content of the text sample to obtain text features, which is not limited by the embodiment of the present application.
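The following sketch illustrates one possible realization of the feature extraction in step 503, using a bidirectional LSTM so that the feature of each word depends on both the preceding and the following words; the dimensions are illustrative assumptions, not the embodiment's exact feature extraction layer:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 8, 16

# A bidirectional LSTM is one way to realize the feature extraction layer:
# the feature of each word then depends on the words before and after it.
feature_extractor = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim,
                            batch_first=True, bidirectional=True)

word_embedding_info = torch.randn(1, 4, embed_dim)        # stand-in for the output of step 502
text_features, _ = feature_extractor(word_embedding_info)
print(text_features.shape)                                # torch.Size([1, 4, 32])
```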
504. The electronic equipment classifies the text based on the extracted text features and outputs a prediction classification result of the text sample.
After the electronic equipment extracts the text features, the electronic equipment can classify the text features based on the text features, the classification process is used for matching the text features with various candidate types of the text sample, determining the matching degree of the text features and each candidate type, and outputting a prediction classification result.
The predictive classification result may include a plurality of forms, and in one possible form, the predictive classification result is in the form of a vector, the vector including a plurality of elements, each element corresponding to a candidate type, the elements being used to represent the degree of matching of the text feature to the candidate type.
In another possible form, the prediction classification result is the candidate type that matches the text feature most, or the prediction classification result is the candidate type that matches the text feature most and the degree of matching.
The matching degree can be in the form of probability or matching grade, and the embodiment of the application does not limit the form of the prediction classification result and the matching degree.
In one possible implementation, the classification process may be implemented by a classification algorithm, which may be any classification algorithm, such as a regression analysis classification algorithm, a Bayesian classification algorithm, a decision tree, etc. The embodiment of the application is not limited to the specific classification algorithm adopted.
In a specific possible embodiment, the classification process may be implemented by a Softmax function: the electronic device uses the Softmax function to process the text feature to obtain a predicted classification result of the text feature. Softmax is a generalization of the logistic regression function and is commonly used for classification problems.
In one possible implementation, the classifying step can be implemented by a classifier, which may include multiple types, e.g., a two-classified classifier, a multi-tasked classifier, etc. The classifier can realize any classification algorithm, the type of the classifier and the adopted classification algorithm can be determined by relevant technicians according to requirements, and the embodiment of the application is not limited to the method.
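A minimal sketch of the classification in step 504, assuming a fully connected layer followed by Softmax over a pooled sentence feature; the feature dimension and number of candidate types are illustrative:

```python
import torch
import torch.nn as nn

feature_dim, num_candidate_types = 32, 4

# Classifier: a fully connected layer whose Softmax output gives one
# matching degree (probability) per candidate type.
classifier = nn.Linear(feature_dim, num_candidate_types)

text_features = torch.randn(1, 4, feature_dim)
sentence_feature = text_features.mean(dim=1)       # pool word features into one sentence feature
logits = classifier(sentence_feature)
prediction = torch.softmax(logits, dim=-1)         # prediction classification result (vector form)
predicted_type = prediction.argmax(dim=-1)         # best-matching candidate type
print(prediction, predicted_type)
```

The output vector corresponds to the first form of the prediction classification result described above, one matching degree per candidate type; its argmax corresponds to the second form.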
505. The electronic device determines an countermeasure disturbance for the text sample based on the predicted classification result and the target classification result for the text sample.
After the electronic equipment carries out classification prediction on the text sample to obtain a prediction classification result, a corresponding countermeasure sample can be generated for the text sample based on the prediction classification result and the target classification result of the text sample, so that the countermeasure sample can be added on the basis of the text sample to train the text classification model. When the countermeasure sample is generated, the electronic device may determine the countermeasure disturbance of the text sample, and then add the countermeasure disturbance to the text sample, so as to obtain a corresponding countermeasure sample.
In one possible implementation, the disturbance countermeasure determination can be implemented through the following steps one to five.
Step one, the electronic equipment acquires a first classification error of the text sample according to a prediction classification result and a target classification result of the text sample.
After the electronic equipment performs classification prediction on the text sample to obtain a prediction classification result, the classification capability of the text classification model can be measured by comparing the target classification result, and the classification capability can be represented by a first classification error.
In one possible implementation, the first classification error may be obtained by a loss function, which may be any loss function, for example, a cross entropy loss function, an L1 or L2 distance regression loss function, an exponential loss function, or the like. In a specific possible embodiment, the first classification error may be obtained by a cross entropy loss function (Cross Entropy Loss Function, CE Loss Function). The embodiment of the application does not limit the specific acquisition mode of the first classification error.
And step two, the electronic equipment acquires candidate countermeasure disturbance of the text sample based on the gradient of the first classification error.
After the electronic device determines the first classification error, the countermeasure disturbance for the text sample may be determined based on the first classification error. In the process of obtaining the countermeasure disturbance, the electronic device can first determine a candidate countermeasure disturbance based on the first classification error, add the candidate countermeasure disturbance to the text sample, and then continue to determine a new candidate countermeasure disturbance based on the classification error of the candidate countermeasure sample to which the candidate countermeasure disturbance has been added.
Step three, the electronic equipment adds the candidate countermeasure disturbance into the text sample to obtain a candidate countermeasure sample corresponding to the text sample.
The process of adding the candidate countermeasure disturbance may be accomplished in a number of ways. Two alternatives are provided below, and the embodiment of the present application may be implemented in either way without limitation.
In the first mode, the candidate countermeasure disturbance is added into word embedding information of the text sample, so that word embedding information of the candidate countermeasure sample corresponding to the text sample is obtained.
The candidate countermeasure sample is in a word embedding form during generation, then in the following step four, the electronic equipment can directly extract features of word embedding information of the candidate countermeasure sample to obtain text features of the candidate countermeasure sample, and then classify the candidate countermeasure sample based on the text features to obtain a prediction classification result.
And secondly, adding the candidate countermeasure disturbance into the text content of the text sample to obtain the text content of the candidate countermeasure sample corresponding to the text sample.
The candidate countermeasure sample is in a text form during generation, and in the following step four, the electronic device can map the words contained in the text content of the candidate countermeasure sample to a real number domain to obtain word embedding information of the candidate countermeasure sample, and then perform feature extraction on the word embedding information of the candidate countermeasure sample to obtain text features of the candidate countermeasure sample.
And step four, the electronic equipment continues to acquire the classification error of the candidate countermeasure sample based on the prediction classification result and the target classification result obtained by classifying the candidate countermeasure sample.
In the fourth step, the classification process and the classification error obtaining process of the candidate challenge sample are the same as those in the steps 502 to 504 and the first step, and are not repeated here.
And fifthly, the electronic equipment updates the candidate countermeasure disturbance of the text sample based on the gradient of the classification error of the candidate countermeasure sample until the target condition is reached, and obtains the countermeasure disturbance of the text sample.
The target condition is that the iteration times reach the target times, or the target condition is that the classification errors are converged. The target times can be set by related technicians according to requirements, and can be hyper-parameters of the text classification model, which are experience values obtained by training the text classification model in the past. For example, the target number may be 3, or 4, or 5, which is not limited by the embodiment of the present application.
The process is a multi-iteration process. In each iteration, the candidate countermeasure disturbance is adjusted so as to make the classification error larger and larger, so that the obtained countermeasure sample has a larger influence on the text classification model, and the text classification model obtained through training with such countermeasure samples has higher robustness.
In one possible implementation, the electronic device may initialize a candidate challenge disturbance, update the candidate challenge disturbance during each subsequent iteration, find a candidate challenge disturbance with a gradient of classification errors, and maximize the classification errors through multiple iterations, so that the obtained challenge samples have a greater influence on the text classification model.
In each iteration process, the electronic device determines an adjustment value of the candidate countermeasure disturbance based on the gradient of the classification error, adds the adjustment value to the candidate countermeasure disturbance of the previous iteration, and projects the adjusted candidate countermeasure disturbance back into the allowed value range of the countermeasure disturbance to obtain the candidate countermeasure disturbance to be added in the next iteration.
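A sketch of steps one to five, assuming a differentiable model that maps word embedding information to classification logits; the step size, norm bound, and iteration count are illustrative hyper-parameters, not values prescribed by the embodiment:

```python
import torch
import torch.nn.functional as F

def generate_adversarial_perturbation(model, word_embeddings, target, steps=3,
                                      step_size=0.1, epsilon=0.5):
    """Iteratively grow a perturbation that increases the classification error.
    `model` is assumed to map word embeddings to classification logits."""
    delta = torch.zeros_like(word_embeddings, requires_grad=True)   # initial candidate disturbance
    for _ in range(steps):                                          # target condition: iteration count
        logits = model(word_embeddings + delta)                     # classify the candidate countermeasure sample
        loss = F.cross_entropy(logits, target)                      # classification error of the candidate sample
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()                  # adjust along the gradient of the error
            delta.clamp_(-epsilon, epsilon)                         # project back into the allowed value range
        delta.grad.zero_()
    return delta.detach()
```

In practice, the gradients accumulated in the model parameters during this inner loop would also be cleared before the outer training step updates the model.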
506. And the electronic equipment adds the countermeasure disturbance into the text sample to obtain a countermeasure sample corresponding to the text sample.
This step 506 is a process of generating a challenge sample based on the challenge disturbance, which the electronic device may add to the different forms of data of the text sample, depending on the form of the challenge disturbance, to generate different forms of challenge samples. In the following two ways are provided, the electronic device can implement the generation process of the challenge sample in any way.
In the first mode, the countermeasures disturbance are added into word embedding information of the text sample, so that word embedding information of the countermeasures sample corresponding to the text sample is obtained.
The challenge sample is generated in a word embedding manner, and accordingly, in step 507, the electronic device may perform feature extraction on the word embedding information of the challenge sample to obtain text features of the challenge sample.
And secondly, adding the countermeasure disturbance into the text content of the text sample to obtain the text content of the countermeasure sample corresponding to the text sample.
In step 507, the electronic device may map the words included in the text content of the challenge sample to the real number domain to obtain word embedding information of the challenge sample, and then perform feature extraction on the word embedding information of the challenge sample to obtain text features of the challenge sample.
It should be noted that, steps 505 to 506 are a process of generating a corresponding challenge sample based on the text sample, the predicted classification result of the text sample, and the target classification result, where the text sample and the corresponding challenge sample both carry the same target classification result.
507. The electronic equipment extracts the characteristics of the countermeasure sample, classifies the countermeasure sample based on the extracted text characteristics, and outputs a prediction classification result of the countermeasure sample.
The step 507 is similar to the steps 502 to 504, and will not be described again.
Steps 502 to 507 are processes of extracting features of a text sample and a challenge sample of the text sample based on a text classification model, classifying based on the extracted text features, and outputting a predicted classification result of the text sample and the challenge sample, where in a possible implementation, the electronic device may obtain the first classification error in step 505, so that in step 508 described below, the electronic device may no longer obtain the first classification error. In another possible implementation, the electronic device may also repeat the step of obtaining the first classification error. Alternatively, the countermeasure disturbance may be obtained in this step 505 in other ways, and the electronic device obtains the first classification error and the second classification error in step 508. The embodiment of the application is not limited in the specific mode.
508. The electronic equipment acquires a first classification error and a second classification error, wherein the first classification error is an error between a predicted classification result and a target classification result of the text sample, and the second classification error is an error between the predicted classification result and the target classification result of the countersample.
This step 508 is similar to step one in step 505, and will not be described in detail herein.
509. And the electronic equipment identifies the text characteristics of the countermeasure sample based on the text classification model and outputs a text identification result corresponding to the text characteristics.
The recognition process is used for recognizing the text characteristics as corresponding text contents, and the text contents obtained through recognition are called text recognition results. The text recognition result may include various forms, for example, text and probability corresponding to the text feature may be included, and text corresponding to the text feature may be included.
This step 509 may also be referred to as a reconstruction process, which is essentially a decoding process, and accordingly, the above-described feature extraction process is essentially an encoding process for restoring the features obtained by the encoding process (i.e., the feature extraction process) to a text form.
Specifically, this step 509 can be realized by the following steps one and two.
Step one, mapping text features of the countermeasure sample to real number fields based on the text classification model to obtain word embedding information corresponding to the text features;
and step two, matching the word embedded information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text feature.
In one possible implementation, the text classification model includes a two-layer neural network, the first layer of neural network being used to map text features of the challenge sample to real-number fields, the second layer of neural network being used to match the word embedding information to a vocabulary.
Specifically, in the step 509, the electronic device may map the text feature of the challenge sample to a real number field based on the first layer neural network of the text classification model to obtain word embedding information corresponding to the text feature, where the parameters of the first layer neural network may be synchronized with the parameters of the neural network layer performing step 502. The electronic device then matches the word embedding information with a word list based on the second layer neural network of the text classification model, outputs at least one matched word, and takes the at least one word as the text recognition result corresponding to the text feature.
In one possible implementation, the electronic device may also normalize the text features of the challenge sample before performing subsequent mapping and feature extraction steps. Specifically, the electronic device performs normalization processing on the text features of the challenge sample based on the text classification model, and performs the mapping to the real number domain and matching with the vocabulary based on the normalized text features. The text features are normalized, so that the text features can be converted into the data range which can be processed by a subsequent neural network, the accuracy of reconstruction is improved, the subsequent calculated amount can be reduced, and the processing effect is improved.
In a specific possible embodiment, the text classification model includes a normalization layer and the two-layer neural network, where the normalization layer is used to perform the step of normalizing, and the two-layer neural network is used to perform the step of mapping to a real number domain and matching to a vocabulary, respectively.
For example, this step 509 can be implemented by a reconstructor or decoder. As shown in FIG. 6, the challenge sample generated by the above steps can be a word-level semantic representation Ra 601 (i.e., the text features of the challenge sample described above). Ra 601 first passes through a normalization layer (Layer Normalization, Layer Norm) 602 and a Gaussian Error Linear Unit (GeLU) 603 activation function. Next, the first-layer forward neural network (FFN) 604 maps Ra from the hidden layer dimension to the word embedding dimension, and the second-layer forward neural network (FFN) 605 maps the word embedding dimension to the vocabulary size dimension, so that a probability distribution over the vocabulary is obtained. The final loss calculation is a cross entropy loss function (CE Loss) 606, which calculates the recognition error L_R 608 from the output of the FFN 605 and the input sentence token IDs. The input sentence token IDs come from the text sample used to generate the challenge sample; each word in the text sample is uniquely identified by an identification (ID), and the output of the FFN 605 is also in the form of word IDs. The second-layer forward neural network (FFN) 605 can share parameters with the neural network layer in step 502 (referred to herein as the word embedding layer).
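A sketch of the reconstructor in FIG. 6, assuming illustrative hidden, embedding, and vocabulary sizes; the parameter sharing between FFN 605 and the word embedding layer is shown here through weight tying:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, embed_dim, vocab_size = 32, 8, 1000

embedding_layer = nn.Embedding(vocab_size, embed_dim)       # word embedding layer from step 502

class Reconstructor(nn.Module):
    def __init__(self):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim)                 # Layer Norm 602
        self.ffn1 = nn.Linear(hidden_dim, embed_dim)         # FFN 604: hidden dim -> embedding dim
        self.ffn2 = nn.Linear(embed_dim, vocab_size)         # FFN 605: embedding dim -> vocab size
        self.ffn2.weight = embedding_layer.weight            # parameter sharing with the embedding layer

    def forward(self, ra):                                   # ra: (batch, seq_len, hidden_dim)
        h = F.gelu(self.norm(ra))                            # GeLU 603 activation
        return self.ffn2(self.ffn1(h))                       # logits over the vocabulary

reconstructor = Reconstructor()
ra = torch.randn(2, 5, hidden_dim)                           # word-level semantic representation of the challenge sample
input_token_ids = torch.randint(0, vocab_size, (2, 5))       # token IDs of the original text sample
logits = reconstructor(ra)
recognition_error = F.cross_entropy(logits.view(-1, vocab_size),
                                    input_token_ids.view(-1))   # CE Loss 606 -> L_R
```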
510. And the electronic equipment acquires the recognition error based on the text recognition result and the text sample.
The recognition error may also be referred to as a reconstruction error, where the text recognition result is the predicted value and the text sample is the true value, and the recognition error is used to measure the difference between the predicted value and the true value. Similar to step one in step 505, it may be obtained by a loss function, which is not described in detail herein. In a specific possible embodiment, the recognition error may be obtained by a cross entropy loss function (Cross Entropy Loss Function, CE Loss Function).
511. The electronic device updates model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
For a text sample, the electronic device obtains three errors, one is a first classification error obtained by the text sample, another is a second classification error obtained by the countermeasure sample corresponding to the text sample, and another is an identification error obtained by reconstructing the countermeasure sample. The model parameters are updated by combining three errors, so that the robustness and the accuracy of the classification of the text classification model can be considered, and the robustness and the accuracy of the feature extraction of the text classification model are considered, and the performance of the trained text classification sample in two aspects can be improved.
The updating process combining the three errors can comprise two ways, and the updating step can be realized by adopting any way according to the embodiment of the application. Two alternatives are provided below.
In a first mode, the electronic device obtains a product of the recognition error and a weight of the recognition error, obtains a sum of the product and the first classification error and the second classification error as a total error, and updates model parameters of the text classification model based on the total error.
In one embodiment, a weight may be set for the recognition error, where the weight of the recognition error may be set by a related technician according to requirements, the weight of the recognition error may be a super parameter of the text classification model, an empirical value obtained by training the text classification model before, for example, the weight may be set to 0.1, and in another possible implementation, the weight may be updated together with a model parameter in the training of the model, which is not limited in the embodiment of the present application.
And the second mode is that the electronic equipment weights the first classification error, the second classification error and the identification error based on the weights of the first classification error, the second classification error and the identification error to obtain a total error, and updates the model parameters of the text classification model based on the total error.
In the second mode, each error is provided with a weight, and the setting of the weight is the same as that in the first mode, and will not be described in detail herein.
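A sketch of the parameter update in step 511 using the first combination mode, with placeholder error values and the weight of the recognition error set to 0.1 as in the example above; the optimizer call is only indicated in a comment:

```python
import torch

# Placeholder scalar errors; in the embodiment they would be computed from the
# model outputs for the text sample and its countermeasure sample.
first_classification_error = torch.tensor(0.42, requires_grad=True)   # error of the text sample
second_classification_error = torch.tensor(0.57, requires_grad=True)  # error of the countermeasure sample
recognition_error = torch.tensor(1.30, requires_grad=True)            # reconstruction (recognition) error

w_r = 0.1   # weight of the recognition error (example value from the description above)

# Mode one: only the recognition error is weighted before summing.
total_error = first_classification_error + second_classification_error + w_r * recognition_error

total_error.backward()   # with real model outputs, this propagates gradients to the model parameters
# optimizer.step(); optimizer.zero_grad()   # parameter update (optimizer construction omitted)
```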
In one possible implementation, the text classification model includes an encoder, a classifier, and a decoder, wherein the encoder is used for feature extraction; the classifier is used for performing classification based on the extracted text features; the decoder is used for executing recognition on the text characteristics of the countermeasure sample and outputting a text recognition result corresponding to the text characteristics.
Compared with the traditional text classification model which comprises a feature extraction layer and a classifier, the text classification model is characterized by extracting features through an encoder and adding a decoder to decode text features of a countermeasure sample to obtain a corresponding text recognition result, and the decoding process can reconstruct text content so as to analyze whether the text features obtained by feature extraction are accurate enough or not compared with the original text sample.
By adding the decoder, the robustness and accuracy of the text classification model in the aspect of feature extraction can be estimated, and the robustness and accuracy of the text classification model in the aspect of feature extraction are enhanced in the iterative process, so that the robustness and accuracy of the classification and feature extraction of the obtained text classification model are improved.
Optionally, the encoder is configured to perform feature extraction on the word embedding information, and the text classification model further includes a word embedding layer. The word embedding layer is used for mapping the text content to the real number domain to obtain word embedding information.
In one possible implementation, the text classification model before training in step 501 may be a pre-trained language model.
The pre-trained language model is briefly described below. A pre-trained language model (e.g., the BERT model) typically adds a [CLS] token in front of the input single sentence. When the language model is used for a text classification task, the hidden state vector of [CLS] at the last layer of the encoder can be used as the semantic vector of the whole sentence, and this semantic vector is input into a classifier consisting of a fully connected layer and Softmax. Specifically, as shown in FIG. 7, a Single Sentence 701 may be input into the BERT model, preceded by the [CLS] token 702 and decomposed into a plurality of tokens Tok1, Tok2, …, TokN 703, where Tok is short for Token, i.e., a word. The BERT model converts each token into word embedding information 704, denoted here as E[CLS], E1, E2, …, EN. The BERT model performs feature extraction on the word embedding information through a plurality of hidden layers, and the outputs C, T1, T2, …, TN 705 of the last hidden layer are input into the classifier as text features, the classifier being implemented by a Softmax function 706.
For example, in one specific example, the text classification model may be based on ALBERT, a pre-trained language model, with the addition of a challenge sample generation module and a challenge sample learning module. The structure of the text classification model may be as shown in FIG. 8. An input sentence (input content) 801 (i.e., the text sample) is first given; its word embedding representation Eo (i.e., the word embedding information) is obtained through the ALBERT word embedding layer (Embedding Block) 802, and then the semantic representation Ro (i.e., the text features) of the sentence and the classification loss L_C (i.e., the first classification error) are obtained through the ALBERT encoder (Encoding Block) 803 and the Classifier 804, respectively.
A challenge sample of the text sample is then generated from L_C. Specifically, a candidate countermeasure disturbance P can be obtained from L_C and added to the word embedding representation Eo of the text sample to obtain the word embedding representation Ea of the candidate challenge sample (i.e., the word embedding information of the candidate challenge sample). The encoder (Encoding Block) 803 obtains the corresponding semantic representation Ra from Ea, which is input into the classifier 804 for classification to obtain the classification error L_C of the candidate challenge sample, from which a new candidate countermeasure disturbance P is determined, and so on. Through multiple iterations, the countermeasure disturbance P that maximizes the classification error L_C can be determined and added to Eo, resulting in the word embedding representation Ea of the challenge sample. In this specific example, a challenge sample in the form of word embeddings is generated. Ea can be mapped into the representation space of the model by the ALBERT encoder 803 to obtain the semantic representation Ra (i.e., the text features). Alternatively, in the iterative process, the Ra of each iteration may also be input into a Reconstructor 805 for reconstruction and calculation of a reconstruction error L_R (i.e., the recognition error).
At this point the word-level semantic representation Ra of the challenge sample has been obtained. It is calculated from the challenge sample in word embedding form, and it is not known which word the word embedding information at each position of the challenge sample actually corresponds to; if the model cannot recover this, its comprehension of the whole sentence is greatly impaired, especially if the model misunderstands some keywords. In the embodiment of the application, the model can restore the original word from the word embedding information of the challenge sample, and thereby learn to extract more robust syntactic and lexical knowledge. Specifically, the obtained semantic representation Ra of the challenge sample is input into the Reconstructor for reconstruction, and the reconstruction error L_R is determined according to the reconstructed content and the input sentence 801, so that the model parameters are updated based on the classification errors L_C corresponding to the input sentence and the challenge sample, respectively, and the reconstruction error L_R.
In this example, the Reconstructor 805 is essentially a decoder, and the encoder (Encoding Block) 803 and the decoder may constitute an autoencoder, as explained below. As shown in FIG. 9, the autoencoder includes an encoder 901 and a decoder 902: the encoder 901 encodes an input to obtain a code, the decoder 902 reconstructs the code to obtain an estimated input, and a reconstruction error 903 is obtained based on the input and the estimated input. The autoencoder has two constraints: the dimension of the hidden layer is much smaller than the dimension of the input layer (the vocabulary size), and the goal of the decoder 902 is to minimize the reconstruction error 903. The overall optimization objective can be expressed as:
φ, ψ = argmin_{φ,ψ} L(X, (ψ∘φ)X)
Where φ and ψ represent the encoder 901 and the decoder 902, respectively, L is the reconstruction error, and X is the input vector. (ψ∘φ)X means that X first passes through the encoder φ and then through the decoder ψ, and argmin denotes the values of the variables (φ and ψ) at which the reconstruction error 903 reaches its minimum.
Accordingly, the objective function of the text classification model shown in fig. 8 described above may be as follows:
where f is the forward function of the model, θ denotes the model parameters, L_C and L_R are the classification loss function and the reconstruction loss function, respectively, v is the text sample, y is the target classification result of the text sample (which may be referred to as the true label when expressed in label form), D is the data distribution of the text samples, δ is the countermeasure disturbance, ε is the maximum value of the norm of the countermeasure disturbance (preventing the countermeasure disturbance from becoming so large that the sentence meaning changes), E is the word embedding information, and w_r is the weight of the reconstruction loss (e.g., set to 0.1). The countermeasure disturbance is first derived by maximizing the classification loss function and added back to the original word embedding to obtain the challenge sample (in word embedding form); the classification loss and the reconstruction loss of the challenge sample are then minimized, i.e., the model both correctly classifies the challenge sample and restores the challenge sample back to the original sample.
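Based on the variable descriptions above, the objective can be written in the following form (a sketch consistent with the description, not necessarily the exact expression shown in FIG. 8):

```latex
\delta^{*} = \arg\max_{\lVert\delta\rVert \le \epsilon} L_C\big(f_\theta(E+\delta),\, y\big),
\qquad
\min_{\theta}\; \mathbb{E}_{(v,y)\sim D}\Big[\, L_C\big(f_\theta(E+\delta^{*}),\, y\big)
 + w_r\, L_R\big(f_\theta(E+\delta^{*}),\, v\big) \Big]
```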
After the reconstructor is trained, experiments show that the text recognition result takes the form of probability distributions: the reconstructor outputs a probability distribution for each word position of a sentence, where the probability distribution of a word position refers to the probabilities of the candidate words at that position. In the probability distribution of each word position, the original word and its synonyms have relatively high probabilities; in the reconstruction process, for some keywords, the text classification model can output their synonyms according to the probability distributions obtained by reconstruction, thereby obtaining several reconstructed sentences. Thus, for an input sentence, multiple corresponding reconstructed sentences can be obtained. These sentences are in text form, whose semantics can be read directly, rather than in word embedding form, from which semantics cannot be extracted directly; this provides interpretability for the countermeasure training. Moreover, the sentences obtained through this process can be used as additional samples: combining the reconstructed sentences with the original sentences yields more expressions of the same semantics, which can be used for data enhancement during training.
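The following sketch shows one way such reconstructed sentences could be assembled from the per-position probability distributions output by the reconstructor; the toy vocabulary and the simple top-k selection are assumptions for illustration, not the embodiment's exact procedure:

```python
import torch

def reconstructed_sentences(logits, id_to_word, k=2):
    """From the reconstructor's per-position probability distributions, keep the
    top-k candidate words of each position and build a few reconstructed sentences."""
    probs = torch.softmax(logits, dim=-1)                  # (seq_len, vocab_size)
    topk = probs.topk(k, dim=-1)
    sentences = []
    for choice in range(k):
        words = [id_to_word[int(i)] for i in topk.indices[:, choice]]
        sentences.append(" ".join(words))
    return sentences

# Example usage with a toy vocabulary and random logits.
id_to_word = {0: "the", 1: "movie", 2: "film", 3: "was", 4: "great", 5: "excellent"}
logits = torch.randn(4, len(id_to_word))                   # one distribution per word position
print(reconstructed_sentences(logits, id_to_word))
```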
A specific experimental example by which the effect of the text classification model training method provided by the present application is exemplarily described is provided below.
In this experimental example, we used four datasets to evaluate the text classification model training method described above. The four datasets were SST-2, Yelp-P, AG's News and Yahoo! Answers. The four datasets and the experimental settings are explained first, and then the text classification model training method provided by the application is analyzed in combination with the experimental results.
SST-2: SST (The Stanford Sentiment Treebank) is a sentiment analysis dataset released by Stanford University which primarily classifies the sentiment of movie reviews and belongs to the single-sentence text classification task. Specifically, SST includes SST-2, SST-5, and so on, where SST-2 is a binary classification task and SST-5 is a five-class task. It will be appreciated that the more classes, the finer the distinction of sentiment polarity. SST-2 predicts the sentence-level sentiment of the input text, which can be positive or negative.
Yelp-P: is a generic dataset for learning that originates from comments in the yellow website. Each comment has a rating label from 1 to 5. We can take this as a binary classification, i.e. choose two scoring labels, randomly draw 30000 training samples, 1000 validation samples and 1000 test samples from the dataset.
Yahoo! Answers: a question-answering dataset that includes questions and corresponding answers. The dataset originates from Yahoo! Answers Comprehensive Questions and Answers 1.0.0, and its ten main categories are: society and culture, science and mathematics, health, education and reference, computers and the internet, sports, business and finance, entertainment and music, family and relationships, and politics and government. Each category contains 140000 training samples and 5000 test samples. In this experimental example, we used five of these categories. For each class we used 12000 training samples, 400 validation samples and 400 test samples.
AG's News: a dataset consisting of more than one million news articles. The news articles in the dataset comprise four major categories: world, sports, business and science, each category comprising 30000 training samples and 1900 test samples. In this experimental example, we used 15000 training samples, and 500 validation samples and test samples.
In this experimental example, the method provided by the embodiment of the present application was compared with the following methods ALBERT and FreeLB.
ALBERT is used for text classification. For ALBERT, the first tag of the sequence is [ CLS ], and when the text classification task is performed, ALBERT takes the final hidden state h of the [ CLS ] tag as a representation of the entire sentence. The classifier consists of a feed forward layer and a softmax function. The function expression of ALBERT may be as follows:
p(c|h)=softmax(Wh)
where W is a matrix of learnable parameters, c is a class, h is the hidden state, and p is the probability of class c given the hidden state h. All parameters of ALBERT, together with W, can be fine-tuned during the training process.
FreeLB: adding the contrast interference in the output of the ALBERT embedding layer and minimizing the contrast loss generated around the input samples, it uses a "free" training strategy to improve the efficiency of the contrast training, which makes it possible to apply the PGD (Project Gradient Descent, projection gradient descent) based contrast training to a large-scale pre-training language model.
The experimental setup is explained below.
The method provided by the embodiment of the application is implemented on ALBERT-BASE V2; the parameters of the ALBERT embedding layer and the ALBERT encoding layer are loaded from a pre-trained model, and experiments are then carried out in the fine-tuning stage. The model was trained using the Adam optimizer with a learning rate of 1e-5, a batch size of 16 for AG's News, and a batch size of 32 for the other three datasets. Since the hyper-parameters of FreeLB are highly dependent on the dataset, we performed a hyper-parameter search for each dataset; the search results are shown in Table 1.
TABLE 1

        SST-2    Yahoo! Answers    Yelp-P    AG's News
γ       0.6      0                 0.5       0
α       0.1      0.01              0.05      0.01
ε       0        0                 0         0
n       2        3                 3         3
Table 1 lists the FreeLB hyper-parameters on the four datasets: the step size α, the maximum perturbation norm ε (i.e., the maximum value of the countermeasure disturbance), the number of iteration steps n, and the initial disturbance amplitude γ. These hyper-parameters remain unchanged during the training process.
In this experimental example, we trained the models on two Tesla P40 GPUs. The method provided by the present application is referred to herein as RAR (Reconstruction from Adversarial Representations). In RAR, L_R is used to update the parameters of the model from the beginning of training. In addition, m was set to 20000 for Yelp-P and to 16000 for the other three datasets, and τ and M were set to 0.07 and 0.5, respectively.
For SST-2, we use the development set for evaluation. To make the results reliable, we used the same hyper-parameters but different random seeds for the three experiments and reported the average score of the three experiments. For the other three data sets, we used the development set to select the best training checkpoint and evaluate on the test set.
The results of the method provided by the embodiment of the present application, ALBERT, and FreeLB are shown in Table 2.
TABLE 2

         SST-2    Yahoo! Answers    Yelp-P    AG's News
ALBERT   92.16    73.93             93.55     89.90
FreeLB   93.23    74.28             93.93     90.85
RAR      93.73    74.88             94.4      91.75
As shown in Table 2, RAR, ALBERT and FreeLB were compared on the four datasets. ALBERT is the model without any countermeasure training method. FreeLB uses the classification loss to learn from challenge samples. RAR is implemented based on FreeLB, with an additional optimization objective applied to the challenge samples.
As shown in Table 2, FreeLB and RAR let the challenge samples participate in the training process of the model, so they perform better than ALBERT. These improvements are mainly due to the effect of data expansion. The experimental results also show that the performance of RAR is higher than FreeLB across all four datasets. Resisting gradient-based attacks during the training process is therefore effective and applies well to various text classification datasets. During training, the classification label can be determined from both the challenge sample and the original sample, because the adversarial objective encourages the model to find the true underlying knowledge. This knowledge is robust to the countermeasure disturbance added to the original sample and is not changed by modifying the wording of the sentence. When the model is able to learn this knowledge, its generalization and robustness are improved.
The challenge sample and the original sample differ in terms of expression. Table 3 compares the Euclidean distance and cosine similarity between the sentence-level representations of the challenge and original samples for the three methods. We analyzed the models trained by the above three methods using the AG's News test set. For each sample vi, we first calculate its original representation Ri, then under the same hyper-parameter settings obtain its challenge sample representation Radv-i with the k-PGD method, and then measure their distance with cosine similarity and Euclidean distance. We also compare the results when different maximum perturbation norms α are used in k-PGD.
TABLE 3

Cosine       α=1      α=0.075
ALBERT       0.851    0.871
FreeLB       0.899    0.918
RAR          0.926    0.941

Euclidean    α=1      α=0.075
ALBERT       8.409    7.746
FreeLB       6.477    5.776
RAR          5.121    4.453
The above results are the averages over all samples. The experimental results show that FreeLB and RAR achieve better cosine similarity and Euclidean distance than ALBERT, so optimizing the classification error of the challenge samples can effectively improve the stability of the model representation space. Furthermore, RAR performs best among the three methods. This illustrates that the method provided by the present application is effective for further improving the robustness of the model representation space.
We used the k-PGD method to attack the models trained on AG's News by the three methods. The experimental results show that FreeLB and RAR perform well, while ALBERT is affected more by the attack. The RAR objective operates on sentence-level representations, so it performs better in terms of representation robustness.
We used the k-PGD method to attack the RAR model after training on SST-2, and then obtained reconstructed sentences from the output logits of the RAR module. In this way we can obtain text from the challenge samples, called reconstructed samples. The semantics of these reconstructed samples are substantially identical to those of the original samples, but they can successfully fool the ALBERT-trained model into misclassification. These reconstructed sentences may be used as countermeasure text samples, and may further be used as enhancement data.
In this experimental example, we propose a gradient-based countermeasure training method, RAR, to improve the performance and robustness of the text classification model. The key to this approach is to narrow the distance between the original and challenge samples in the representation space. RAR forces the model to reconstruct the original tokens from their adversarial representations. Experiments show that the method is superior to ALBERT and FreeLB. The sentence representations and the performance of the model are more robust, proving the effectiveness of this approach. Furthermore, RAR may be used to generate adversarial examples.
After the text classification model is trained by the method, the text classification model can provide a text classification function. In one possible implementation manner, the electronic device responds to a text classification instruction, can call the text classification model, inputs the text to be classified into the text classification model, extracts characteristics of the text by the text classification model, classifies the text based on the extracted characteristics of the text, and outputs the type of the text.
According to the method provided by the embodiment of the application, on one hand, the countermeasure sample is introduced, and the text sample and the countermeasure sample are used for training the text classification model, so that the text classification model learns the classification method aiming at the disturbed text, the robustness of the text classification model is improved, and the accuracy of text classification is improved. On the other hand, the text classification model can reconstruct the text characteristics of the countermeasure sample extracted during classification, restore the text characteristics into text contents, and improve the interpretability of the countermeasure training method. Model parameters are trained by combining errors between the reconstructed text content and the text content of the text sample, so that the text classification model can extract more accurate text features, namely more accurate feature expression of the text content is obtained, and the robustness and accuracy of feature extraction of the text classification model are improved.
All the above optional solutions can be combined to form an optional embodiment of the present application, and will not be described in detail herein.
Fig. 10 is a schematic structural diagram of a text classification model training device according to an embodiment of the present application, referring to fig. 10, the device includes:
the classification module 1001 is configured to perform feature extraction on a text sample and a countermeasure sample of the text sample based on a text classification model, classify based on the extracted text features, and output a prediction classification result of the text sample and the countermeasure sample, where the text sample and the corresponding countermeasure sample both carry the same target classification result;
an obtaining module 1002, configured to obtain a first classification error and a second classification error, where the first classification error is an error between a predicted classification result and a target classification result of the text sample, and the second classification error is an error between the predicted classification result and the target classification result of the countersample;
the recognition module 1003 is configured to recognize text features of the challenge sample based on the text classification model, and output a text recognition result corresponding to the text features;
the obtaining module 1002 is further configured to obtain a recognition error based on the text recognition result and the text sample;
An updating module 1004 is configured to update the model parameters of the text classification model based on the first classification error, the second classification error and the recognition error.
In one possible implementation, the identification module 1003 is configured to:
based on the text classification model, mapping the text features of the countermeasure sample to a real number domain to obtain word embedding information corresponding to the text features;
and matching the word embedded information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text feature.
In one possible implementation, the text classification model includes a two-layer neural network, the first layer of neural network being used to map text features of the challenge sample to real-number fields, the second layer of neural network being used to match the word embedding information to a vocabulary.
In one possible implementation, the recognition module 1003 is configured to normalize the text feature of the challenge sample based on the text classification model, and perform the mapping to real number field and matching with vocabulary based on the normalized text feature.
In one possible implementation, the update module 1004 is configured to perform any of:
Obtaining the product of the recognition error and the weight of the recognition error, obtaining the sum of the product, the first classification error and the second classification error as a total error, and updating the model parameters of the text classification model based on the total error;
and weighting the first classification error, the second classification error and the recognition error based on the weights of the first classification error, the second classification error and the recognition error to obtain a total error, and updating model parameters of the text classification model based on the total error.
In one possible implementation, the classification module 1001 includes a classification unit and a generation unit;
the classifying unit is used for inputting the text sample into a text classifying model, extracting the characteristics of the text sample by the text classifying model, classifying the text sample based on the extracted text characteristics, and outputting the prediction classifying result of the text sample;
the generating unit is used for generating a corresponding countermeasure sample based on the text sample, the prediction classification result of the text sample and the target classification result;
the classification unit is also used for extracting the characteristics of the countermeasure sample, classifying the countermeasure sample based on the extracted text characteristics, and outputting a prediction classification result of the countermeasure sample.
In one possible implementation, the classification unit includes a mapping subunit and an extraction subunit;
the mapping subunit is used for mapping words contained in the text content of the text sample to real number fields by the text classification model to obtain word embedding information of the text sample;
the extraction subunit is used for extracting features of the word embedding information of the text sample to obtain text features of the text sample.
In one possible implementation, the generating unit includes a determining subunit and an adding subunit;
the determining subunit is used for determining the disturbance countermeasure of the text sample based on the predicted classification result and the target classification result of the text sample;
the adding subunit is used for adding the countermeasure disturbance into the text sample to obtain a countermeasure sample corresponding to the text sample.
In one possible implementation, the determining subunit is configured to:
acquiring a first classification error of the text sample according to the predicted classification result and the target classification result of the text sample;
acquiring candidate countermeasure disturbances of the text sample based on the gradient of the first classification error;
adding the candidate countermeasure disturbance into the text sample to obtain a candidate countermeasure sample corresponding to the text sample;
Continuing to obtain a prediction classification result and a target classification result which are obtained by classifying the candidate countermeasure sample, and obtaining a classification error of the candidate countermeasure sample;
updating the candidate countermeasures disturbance of the text sample based on the gradient of the classification error of the candidate countermeasures sample until the target condition is reached, and obtaining the countermeasures disturbance of the text sample.
In one possible implementation manner, the adding subunit is configured to add the countermeasure disturbance to the text content of the text sample, so as to obtain the text content of the countermeasure sample corresponding to the text sample;
the feature extraction process of the challenge sample comprises:
mapping words contained in the text content of the countermeasure sample to a real number domain to obtain word embedding information of the countermeasure sample;
and extracting features of the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
In one possible implementation manner, the adding subunit is configured to add the countermeasure disturbance to word embedding information of the text sample, so as to obtain word embedding information of a countermeasure sample corresponding to the text sample;
the feature extraction process of the challenge sample comprises:
and extracting features of the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
In one possible implementation, the text classification model includes an encoder, a classifier, and a decoder;
wherein the encoder is used for feature extraction;
the classifier is used for performing classification based on the extracted text features;
the decoder is used for executing recognition on the text characteristics of the countermeasure sample and outputting a text recognition result corresponding to the text characteristics.
According to the device provided by the embodiment of the application, on one hand, the countermeasure sample is introduced, and the text sample and the countermeasure sample are used for training the text classification model, so that the text classification model learns a classification method aiming at the disturbed text, the robustness of the text classification model is improved, and the accuracy of text classification is improved. On the other hand, the text classification model can reconstruct the text characteristics of the countermeasure sample extracted during classification, restore the text characteristics into text contents, and improve the interpretability of the countermeasure training method. Model parameters are trained by combining errors between the reconstructed text content and the text content of the text sample, so that the text classification model can extract more accurate text features, namely more accurate feature expression of the text content is obtained, and the robustness and accuracy of feature extraction of the text classification model are improved.
It should be noted that: in the text classification model training device provided in the above embodiment, when training a text classification model, only the division of the above functional modules is used for illustration, in practical application, the above functional allocation can be completed by different functional modules according to needs, that is, the internal structure of the text classification model training device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the text classification model training device and the text classification model training method provided in the above embodiments belong to the same concept, and detailed implementation processes of the text classification model training device and the text classification model training method are detailed in the method embodiments, and are not repeated here.
The electronic device in the method embodiments described above can be implemented as a terminal. For example, fig. 11 is a block diagram of a terminal according to an embodiment of the present application. The terminal 1100 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, a desktop computer, a smart robot, or a self-service payment device. The terminal 1100 may also be referred to by other names such as user device, portable terminal, laptop terminal, desktop terminal, and the like.
Generally, the terminal 1100 includes: one or more processors 1101, and one or more memories 1102.
The processor 1101 can include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 can be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1101 can also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 can be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 can also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1102 can include one or more computer-readable storage media, which can be non-transitory. The memory 1102 can also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1102 is used to store at least one program code, the at least one program code being executed by the processor 1101 to implement the text classification model training method provided by the method embodiments of the present application.
In some embodiments, the terminal 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, the memory 1102, and the peripheral interface 1103 can be connected by a bus or signal lines. The individual peripheral devices can be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, a positioning assembly 1108, and a power supply 1109.
The peripheral interface 1103 can be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral interface 1103 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1104 is capable of communicating with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 can also include NFC (Near Field Communication) related circuitry, which is not limited in the present application.
The display screen 1105 is used to display a UI (User Interface). The UI can include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to collect touch signals on or above its surface. The touch signal can be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 can also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there can be one display screen 1105, disposed on the front panel of the terminal 1100; in other embodiments, there can be at least two display screens 1105, disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 can be a flexible display screen disposed on a curved surface or a folded surface of the terminal 1100. Furthermore, the display screen 1105 can be configured as a non-rectangular irregular shape, that is, an irregularly-shaped screen. The display screen 1105 can be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 1106 can also include a flash. The flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1107 can include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 1101 for processing, or input them to the radio frequency circuit 1104 for voice communication. For stereo acquisition or noise reduction purposes, a plurality of microphones can be provided at different portions of the terminal 1100. The microphone can also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker can be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 can also include a headphone jack.
The positioning component 1108 is used to locate the current geographic location of the terminal 1100 to enable navigation or LBS (Location Based Service). The positioning component 1108 can be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
A power supply 1109 is used to supply power to various components in the terminal 1100. The power source 1109 can be alternating current, direct current, disposable battery, or rechargeable battery. When the power source 1109 includes a rechargeable battery, the rechargeable battery can be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery can also be used to support fast charge technology.
In some embodiments, terminal 1100 also includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyroscope sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
The acceleration sensor 1111 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 1100. For example, the acceleration sensor 1111 can be used to detect components of gravitational acceleration on three coordinate axes. The processor 1101 can control the display screen 1105 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1111. The acceleration sensor 1111 can also be used for acquisition of motion data of a game or a user.
The gyro sensor 1112 can detect the body direction and the rotation angle of the terminal 1100, and the gyro sensor 1112 can acquire the 3D motion of the user on the terminal 1100 in cooperation with the acceleration sensor 1111. The processor 1101 can realize the following functions based on the data collected by the gyro sensor 1112: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 1113 can be disposed on a side frame of the terminal 1100 and/or in a lower layer of the display screen 1105. When the pressure sensor 1113 is disposed on a side frame of the terminal 1100, a grip signal of the user on the terminal 1100 can be detected, and the processor 1101 performs left/right hand recognition or a quick operation according to the grip signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed in the lower layer of the display screen 1105, the processor 1101 controls operable controls on the UI according to the user's pressure operation on the display screen 1105. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1114 is used to collect the user's fingerprint, and the processor 1101 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the user's identity based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1114 can be provided on the front, back, or side of the terminal 1100. When a physical key or vendor Logo is provided on the terminal 1100, the fingerprint sensor 1114 can be integrated with the physical key or vendor Logo.
The optical sensor 1115 is used to collect the ambient light intensity. In one embodiment, the processor 1101 is capable of controlling the display brightness of the display screen 1105 based on the intensity of ambient light collected by the optical sensor 1115. Specifically, when the intensity of the ambient light is high, the display luminance of the display screen 1105 is turned up; when the ambient light intensity is low, the display luminance of the display screen 1105 is turned down. In another embodiment, the processor 1101 is further capable of dynamically adjusting the shooting parameters of the camera assembly 1106 based on the intensity of ambient light collected by the optical sensor 1115.
A proximity sensor 1116, also referred to as a distance sensor, is typically provided on the front panel of the terminal 1100. The proximity sensor 1116 is used to collect the distance between the user and the front surface of the terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front surface of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from the bright screen state to the off-screen state; when the proximity sensor 1116 detects that the distance between the user and the front surface of the terminal 1100 gradually increases, the processor 1101 controls the display screen 1105 to switch from the off-screen state to the bright screen state.
Those skilled in the art will appreciate that the structure shown in fig. 11 is not limiting of terminal 1100, and can include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
The electronic device in the above-described method embodiment can be implemented as a server. For example, fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1200 may vary greatly depending on configuration or performance, and can include one or more processors (Central Processing Units, CPU) 1201 and one or more memories 1202, where at least one program code is stored in the memory 1202, and the at least one program code is loaded and executed by the processor 1201 to implement the text classification model training method provided in the above method embodiments. Of course, the server can also have components such as a wired or wireless network interface and an input/output interface for input and output, and can also include other components for implementing device functions, which are not described herein.
In an exemplary embodiment, a computer readable storage medium, e.g., a memory, comprising at least one program code executable by a processor to perform the text classification model training method of the above-described embodiment is also provided. For example, the computer readable storage medium can be Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), compact disk Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM), magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, a computer program product or computer program is also provided. The computer program product or computer program comprises one or more pieces of program code, the one or more pieces of program code being stored in a computer-readable storage medium. One or more processors of the electronic device can read the one or more pieces of program code from the computer-readable storage medium, and the one or more processors execute the one or more pieces of program code, so that the electronic device can perform the text classification model training method described above.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be understood that determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
Those of ordinary skill in the art will appreciate that all or a portion of the steps implementing the above-described embodiments can be implemented by hardware, or can be implemented by a program instructing the relevant hardware, and the program can be stored in a computer readable storage medium, and the above-mentioned storage medium can be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only of alternative embodiments of the application and is not intended to limit the application, but any modifications, equivalents, improvements, etc. which fall within the spirit and principles of the application are intended to be included in the scope of the application.

Claims (15)

1. A method for training a text classification model, the method comprising:
based on a text classification model, extracting characteristics of a text sample and a countermeasure sample of the text sample, classifying based on the extracted text characteristics, and outputting prediction classification results of the text sample and the countermeasure sample, wherein the text sample and the corresponding countermeasure sample both carry the same target classification result;
acquiring a first classification error and a second classification error, wherein the first classification error is an error between the prediction classification result and the target classification result of the text sample, and the second classification error is an error between the prediction classification result and the target classification result of the countermeasure sample;
based on the text classification model, recognizing the text features of the countermeasure sample, and outputting a text recognition result corresponding to the text features;
acquiring a recognition error based on the text recognition result and the text sample; and
updating model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
2. The method of claim 1, wherein the recognizing the text features of the countermeasure sample based on the text classification model and outputting a text recognition result corresponding to the text features comprises:
mapping the text features of the countermeasure sample to a real number domain based on the text classification model to obtain word embedding information corresponding to the text features;
and matching the word embedding information with a word list, outputting at least one matched word, and taking the at least one word as a text recognition result corresponding to the text feature.
3. The method of claim 2, wherein the text classification model comprises a two-layer neural network, a first layer of the neural network being used to map the text features of the countermeasure sample to the real number domain, and a second layer of the neural network being used to match the word embedding information with the word list.
4. The method according to claim 2, wherein the recognizing the text features of the countermeasure sample based on the text classification model and outputting a text recognition result corresponding to the text features comprises:
performing normalization processing on the text features of the countermeasure sample based on the text classification model, and performing the steps of mapping to the real number domain and matching with the word list based on the normalized text features.
5. The method of claim 1, wherein updating model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error comprises any one of:
obtaining a product of the recognition error and the weight of the recognition error, obtaining the sum of the product, the first classification error and the second classification error as a total error, and updating model parameters of the text classification model based on the total error;
and weighting the first classification error, the second classification error and the recognition error based on the weights of the first classification error, the second classification error and the recognition error to obtain a total error, and updating model parameters of the text classification model based on the total error.
6. The method of claim 1, wherein the performing feature extraction on a text sample and a countermeasure sample of the text sample based on the text classification model, performing classification based on the extracted text features, and outputting prediction classification results of the text sample and the countermeasure sample comprises:
inputting the text sample into the text classification model, extracting features of the text sample by the text classification model, classifying the text sample based on the extracted text features, and outputting a prediction classification result of the text sample;
generating a corresponding countermeasure sample based on the text sample, the predictive classification result of the text sample, and a target classification result;
and extracting the characteristics of the countermeasure sample, classifying the countermeasure sample based on the extracted text characteristics, and outputting a prediction classification result of the countermeasure sample.
7. The method of claim 6, wherein the feature extraction of text samples by the text classification model comprises:
mapping words contained in the text content of the text sample to a real number domain by the text classification model to obtain word embedding information of the text sample;
and extracting features of the word embedding information of the text sample to obtain text features of the text sample.
8. The method of claim 6, wherein the generating a corresponding challenge sample based on the text sample, the predicted classification result of the text sample, and a target classification result comprises:
determining a countermeasure disturbance for the text sample based on the prediction classification result and the target classification result of the text sample;
and adding the countermeasure disturbance into the text sample to obtain a countermeasure sample corresponding to the text sample.
9. The method of claim 8, wherein the determining the countermeasure disturbance of the text sample based on the prediction classification result and the target classification result of the text sample comprises:
acquiring a first classification error of the text sample according to a prediction classification result and a target classification result of the text sample;
acquiring candidate countermeasure disturbances of the text sample based on the gradient of the first classification error;
adding the candidate countermeasure disturbance into the text sample to obtain a candidate countermeasure sample corresponding to the text sample;
continuously obtaining a classification error of the candidate countermeasure sample based on a prediction classification result and a target classification result obtained by classifying the candidate countermeasure sample;
and updating the candidate countermeasure disturbance of the text sample based on the gradient of the classification error of the candidate countermeasure sample until the target condition is reached, and obtaining the countermeasure disturbance of the text sample.
10. The method of claim 8, wherein the adding the countermeasure disturbance into the text sample to obtain a countermeasure sample corresponding to the text sample comprises:
adding the countermeasure disturbance into the text content of the text sample to obtain the text content of the countermeasure sample corresponding to the text sample;
the feature extraction process of the countermeasure sample comprises:
mapping words contained in the text content of the countermeasure sample to a real number domain to obtain word embedding information of the countermeasure sample;
and extracting features of the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
11. The method of claim 8, wherein the adding the countermeasure disturbance into the text sample to obtain a countermeasure sample corresponding to the text sample comprises:
adding the countermeasure disturbance into word embedding information of the text sample to obtain word embedding information of the countermeasure sample corresponding to the text sample;
the feature extraction process of the countermeasure sample comprises:
and extracting features of the word embedding information of the countermeasure sample to obtain text features of the countermeasure sample.
12. The method of claim 1, wherein the text classification model comprises an encoder, a classifier, and a decoder;
wherein the encoder is used for feature extraction;
the classifier is used for performing classification based on the extracted text features;
the decoder is used for performing recognition on the text features of the countermeasure sample and outputting a text recognition result corresponding to the text features.
13. A text classification model training apparatus, the apparatus comprising:
the classification module is used for extracting characteristics of a text sample and a countermeasure sample of the text sample based on a text classification model, classifying the text sample and the countermeasure sample based on the extracted text characteristics, and outputting a prediction classification result of the text sample and the countermeasure sample, wherein the text sample and the corresponding countermeasure sample both carry the same target classification result;
the acquisition module is used for acquiring a first classification error and a second classification error, wherein the first classification error is an error between a prediction classification result and a target classification result of the text sample, and the second classification error is an error between the prediction classification result and the target classification result of the countermeasure sample;
the recognition module is used for recognizing the text features of the countermeasure sample based on the text classification model and outputting a text recognition result corresponding to the text features;
the acquisition module is further used for acquiring a recognition error based on the text recognition result and the text sample;
and the updating module is used for updating the model parameters of the text classification model based on the first classification error, the second classification error, and the recognition error.
14. An electronic device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one piece of program code that is loaded and executed by the one or more processors to implement the text classification model training method of any of claims 1-12.
15. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the text classification model training method of any of claims 1 to 12.
CN202010805356.8A 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium Active CN111897964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010805356.8A CN111897964B (en) 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010805356.8A CN111897964B (en) 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111897964A CN111897964A (en) 2020-11-06
CN111897964B true CN111897964B (en) 2023-10-17

Family

ID=73228918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010805356.8A Active CN111897964B (en) 2020-08-12 2020-08-12 Text classification model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111897964B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101359B (en) * 2020-11-11 2021-02-12 广州华多网络科技有限公司 Text formula positioning method, model training method and related device
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese countermeasure sample generation method and device for text audit
CN112528027A (en) * 2020-12-24 2021-03-19 北京百度网讯科技有限公司 Text classification method, device, equipment, storage medium and program product
CN112364945B (en) * 2021-01-12 2021-04-16 之江实验室 Meta-knowledge fine adjustment method and platform based on domain-invariant features
GB2608344A (en) * 2021-01-12 2022-12-28 Zhejiang Lab Domain-invariant feature-based meta-knowledge fine-tuning method and platform
CN112905736B (en) * 2021-01-27 2023-09-19 郑州轻工业大学 Quantum theory-based unsupervised text emotion analysis method
CN113032558B (en) * 2021-03-11 2023-08-29 昆明理工大学 Variable semi-supervised hundred degree encyclopedia classification method integrating wiki knowledge
CN113053516A (en) * 2021-03-26 2021-06-29 安徽科大讯飞医疗信息技术有限公司 Countermeasure sample generation method, device, equipment and storage medium
CN113220553B (en) * 2021-05-13 2022-06-17 支付宝(杭州)信息技术有限公司 Method and device for evaluating performance of text prediction model
CN113313271B (en) * 2021-06-03 2022-09-30 国家电网有限公司客户服务中心 Power system fault repair method and device based on remote customer service
CN113361611B (en) * 2021-06-11 2023-12-12 南京大学 Robust classifier training method under crowdsourcing task
CN113326356B (en) * 2021-08-03 2021-11-02 北京邮电大学 Natural countermeasure sample generation method for text classifier and related device
CN113946687B (en) * 2021-10-20 2022-09-23 中国人民解放军国防科技大学 Text backdoor attack method with consistent labels
CN114676255A (en) * 2022-03-29 2022-06-28 腾讯科技(深圳)有限公司 Text processing method, device, equipment, storage medium and computer program product
CN115983255B (en) * 2023-03-21 2023-06-02 深圳市万物云科技有限公司 Emergency management method, device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11704552B2 (en) * 2018-10-29 2023-07-18 Microsoft Technology Licensing, Llc Task detection in communications using domain adaptation
US11423282B2 (en) * 2018-10-30 2022-08-23 Huawei Technologies Co., Ltd. Autoencoder-based generative adversarial networks for text generation

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
CN108509596A (en) * 2018-04-02 2018-09-07 广州市申迪计算机系统有限公司 File classification method, device, computer equipment and storage medium
CN110119448A (en) * 2019-05-08 2019-08-13 合肥工业大学 Semi-supervised cross-domain texts classification method based on dual autocoder
CN110457701A (en) * 2019-08-08 2019-11-15 南京邮电大学 Dual training method based on interpretation confrontation text
CN111046176A (en) * 2019-11-25 2020-04-21 百度在线网络技术(北京)有限公司 Countermeasure sample generation method and device, electronic equipment and storage medium
CN111241287A (en) * 2020-01-16 2020-06-05 支付宝(杭州)信息技术有限公司 Training method and device for generating generation model of confrontation text
CN111325319A (en) * 2020-02-02 2020-06-23 腾讯云计算(北京)有限责任公司 Method, device, equipment and storage medium for detecting neural network model
CN111340122A (en) * 2020-02-29 2020-06-26 复旦大学 Multi-modal feature fusion text-guided image restoration method
CN111401445A (en) * 2020-03-16 2020-07-10 腾讯科技(深圳)有限公司 Training method of image recognition model, and image recognition method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Representation Learning for Improved Generalization of Adversarial Domain Adaptation with Text Classification; Alaa Khaddaj et al.; 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT); 525-531 *
Emergency communication method for nuclear radiation monitoring based on BeiDou RDSS; 王廷银; Computer Systems & Applications; vol. 28, no. 12; 248-252 *
Research on driver assistance applications based on deep transfer learning; 彭希帅; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; no. 06; C035-16 *
Short text classification method based on an improved biLSTM network; 李文慧 et al.; Computer Engineering and Design; vol. 41, no. 3; 880-886 *
Research on multi-domain fake review identification; 颜梦香; China Masters' Theses Full-text Database, Information Science and Technology; no. 06; I138-1176 *

Also Published As

Publication number Publication date
CN111897964A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN111897964B (en) Text classification model training method, device, equipment and storage medium
US11544550B2 (en) Analyzing spatially-sparse data based on submanifold sparse convolutional neural networks
US20210117780A1 (en) Personalized Federated Learning for Assistant Systems
US11568240B2 (en) Method and apparatus for classifying class, to which sentence belongs, using deep neural network
US11482212B2 (en) Electronic device for analyzing meaning of speech, and operation method therefor
Das et al. Applications of artificial intelligence in machine learning: review and prospect
Yang et al. Benchmarking commercial emotion detection systems using realistic distortions of facial image datasets
CN111985240B (en) Named entity recognition model training method, named entity recognition method and named entity recognition device
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111368525A (en) Information searching method, device, equipment and storage medium
WO2021155691A1 (en) User portrait generating method and apparatus, storage medium, and device
CN111353299B (en) Dialog scene determining method based on artificial intelligence and related device
US20230128422A1 (en) Voice Command Integration into Augmented Reality Systems and Virtual Reality Systems
CN113392180A (en) Text processing method, device, equipment and storage medium
Sun et al. Rumour detection technology based on the BiGRU_capsule network
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium
CN113761195A (en) Text classification method and device, computer equipment and computer readable storage medium
CN115545738A (en) Recommendation method and related device
CN113722422A (en) Model training method, text label generation method, device, equipment and medium
Paharia et al. Optimization of convolutional neural network hyperparameters using improved competitive gray wolf optimizer for recognition of static signs of Indian Sign Language
Ke et al. An interactive system for humanoid robot SHFR-III
Chen et al. Emotion recognition in videos via fusing multimodal features
US20240028952A1 (en) Apparatus for attribute path generation
US11308279B2 (en) Method and system simplifying the input of symbols used as a pair within a user interface
US11699256B1 (en) Apparatus for generating an augmented reality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant