CN111104516A - Text classification method and device and electronic equipment - Google Patents

Text classification method and device and electronic equipment

Info

Publication number
CN111104516A
Authority
CN
China
Prior art keywords
representation information
text
feature
vector
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010084986.0A
Other languages
Chinese (zh)
Other versions
CN111104516B (en)
Inventor
蒋亮
温祖杰
张家兴
梁忠平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010084986.0A priority Critical patent/CN111104516B/en
Publication of CN111104516A publication Critical patent/CN111104516A/en
Application granted granted Critical
Publication of CN111104516B publication Critical patent/CN111104516B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

One or more embodiments of the present specification provide a text classification method, apparatus, and electronic device based on a BERT model, where the BERT model includes at least two encoder layers connected in sequence. The method includes: inputting a text to be classified into the BERT model; collecting the output of each encoder layer to obtain at least two pieces of feature representation information corresponding to the text to be classified; fusing the at least two pieces of feature representation information to obtain fused feature representation information, which makes full use of the output of every encoder layer and accurately reflects the lexical and grammatical information contained in the text; and determining the type of the text to be classified according to the fused feature representation information.

Description

Text classification method and device and electronic equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a text classification method and apparatus, and an electronic device.
Background
With the rapid development of the Internet on a global scale, the amount of information people face grows exponentially, and a large share of it is text, so techniques for processing text information are particularly important. Classifying text information is an effective means of organizing and managing it, and makes it easier for people to browse, search and use text information. Text classification means that a computer, after processing a text with a classification algorithm, assigns it to a predefined class, i.e., realizes a mapping from texts to classes. Specifically, a pre-trained text classification model can predict the probability that the text to be classified belongs to each specific class under a specific classification system.
How to improve the accuracy of text classification is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure are directed to a text classification method, a text classification device, and an electronic device.
In view of the above, one or more embodiments of the present specification provide a text classification method based on a BERT model, the BERT model including: at least two encoder layers connected in sequence;
the method comprises the following steps:
inputting a text to be classified into the BERT model;
collecting the output of each encoder layer to obtain at least two pieces of feature representation information corresponding to the text to be classified;
fusing at least two pieces of feature representation information to obtain fused feature representation information;
and determining the type of the text to be classified according to the fused feature representation information.
In another aspect, one or more embodiments of the present specification further provide a text classification apparatus, including:
an input module configured to input text to be classified into the BERT model;
the acquisition module is configured to acquire the output of each encoder layer to obtain at least two pieces of feature representation information corresponding to the text to be classified;
the fusion module is configured to fuse at least two pieces of feature representation information to obtain fused feature representation information;
and the classification module is configured to determine the type of the text to be classified according to the fused feature representation information.
In yet another aspect, one or more embodiments of the present specification further provide an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method as described in any one of the above when executing the program.
As can be seen from the foregoing, the text classification method, text classification device, and electronic device provided in one or more embodiments of the present specification utilize the feature representation information output by each encoder layer in the BERT model and perform the final text classification based on the fused feature representation information. The output of every encoder layer is fully utilized, so that the lexical and grammatical information contained in the text is accurately reflected and the accuracy of text classification is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of one or more embodiments of the present specification or of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate only one or more embodiments of the present specification, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the operation of the BERT model in the related art;
FIG. 2 is a flow diagram of a method for text classification in accordance with one or more embodiments of the present disclosure;
FIG. 3 is a schematic diagram illustrating an implementation of a text classification method according to one or more embodiments of the present disclosure;
FIG. 4 is a flow diagram illustrating steps for generating characterizing information in one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram of how an encoder layer operates in one or more embodiments of the present description;
FIG. 6 is a flow diagram of a BERT model training process in accordance with one or more embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of a text classification device according to one or more embodiments of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
One or more embodiments of the present specification refer to the BERT model, whose full name is Bidirectional Encoder Representations from Transformers. It is a language model that can be widely applied to various natural language processing tasks. The BERT model generally includes a plurality of encoder layers, which extract feature representations of the text and output them in the form of representation vectors. Referring to FIG. 1, in operation, after a text is input into the BERT model, it sequentially passes through the plurality of encoder layers, the output of the previous encoder layer serving as the input of the next encoder layer; the output of the last encoder layer is used as the feature information corresponding to the text, and text classification is performed on this feature information by a classification layer. Here, an encoder layer closer to the input side (e.g., encoder layer 1) is a lower layer, and an encoder layer closer to the classification layer (e.g., encoder layer n) is a higher layer. A lower encoder layer has higher resolution and contains more detail information, so it tends to reflect lexical features; a higher encoder layer carries stronger semantic information but has limited resolution, so it tends to reflect grammatical features.
Therefore, when the existing BERT model is used for classification, only the feature representation information output by the last, i.e. highest, encoder layer is used; in other words, mainly the grammatical features of the text are taken into account, which is why the accuracy of the existing BERT model in classifying text is still insufficient.
In view of the above problems, one or more embodiments of the present specification provide a text classification scheme that makes full use of the feature representation information output by every encoder layer of the BERT model and combines it by means of vector fusion, so that the feature representation information used for text classification contains the feature outputs of both the lower and the higher encoder layers. Both the lexical features and the grammatical features contained in the text are thereby taken into account during classification, which effectively improves the accuracy of text classification.
Various non-limiting embodiments provided by the present specification are described in detail below with reference to the accompanying drawings.
One or more embodiments of the present specification provide a text classification method, which is implemented based on a BERT model, where the BERT model includes at least two encoder layers. These encoder layers are connected in sequence, i.e. as described above, the output of the previous encoder layer serves as the input of the next encoder layer.
Referring to fig. 2, a flowchart of a text classification method according to one or more embodiments of the present disclosure is provided, and it is to be understood that the method may be performed by any apparatus, device, platform, or cluster having computing and processing capabilities. Specifically, the text classification method comprises the following steps:
step S201, inputting the text to be classified into the BERT model.
In this step, a text to be classified is first obtained, where the text to be classified may be input or uploaded by a user, or may be extracted from a corpus, and a source of the text to be classified is not limited in this embodiment.
The content of the text to be classified is generally a single sentence composed of a plurality of single characters. When the content of the text to be classified is a paragraph composed of a plurality of sentences, the paragraph is first split into sentences, and accordingly the method of this embodiment is performed for each sentence to achieve classification. In one or more embodiments of the present specification, the text to be classified is taken to be a single sentence for the purpose of explanation. It will be appreciated that when the text to be classified is a paragraph comprising several single sentences, the classification process is simply performed on each of those single sentences, possibly in parallel; a small sentence-splitting sketch is given below.
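As an illustration of the paragraph case just described, the following is a minimal sentence-splitting sketch in Python. The helper name and the punctuation set are assumptions made for illustration only and are not part of the embodiments.

    import re

    def split_sentences(paragraph: str) -> list:
        """Split a paragraph into single sentences on common Chinese/English
        sentence-ending punctuation (assumed delimiter set, illustration only)."""
        parts = re.split(r"(?<=[。！？!?.])\s*", paragraph)
        return [p for p in parts if p]

    # Each resulting single sentence can then be classified independently,
    # e.g. in parallel, by the method of this embodiment.
    print(split_sentences("今天天气很好。我爱中国！"))
    # ['今天天气很好。', '我爱中国！']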
Step S202, collecting the output of each encoder layer to obtain at least two pieces of feature representation information corresponding to the text to be classified.
In this step, referring to FIG. 3, after the text to be classified is input into the BERT model, it sequentially passes through the n encoder layers according to the working mode of the BERT model. Specifically, the text to be classified is processed by encoder layer 1, which outputs feature representation information 1; the feature representation information 1 is then input to encoder layer 2, which processes it and outputs feature representation information 2; in the same way, the output of each previous encoder layer is taken as the input of the next encoder layer, and after all n encoder layers the feature representation information n is finally output by encoder layer n. A minimal sketch of collecting these per-layer outputs is given after this paragraph.
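The collection of per-layer outputs can be sketched as follows. This is a minimal illustration that assumes the open-source HuggingFace transformers library and a pre-trained Chinese BERT checkpoint; the embodiments themselves do not prescribe any particular implementation.

    import torch
    from transformers import BertTokenizer, BertModel

    # Assumed checkpoint name, for illustration only.
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)
    model.eval()

    inputs = tokenizer("我爱中国", return_tensors="pt")  # adds [CLS] ... [SEP]
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.hidden_states is a tuple: (embedding output, encoder layer 1, ..., encoder layer n).
    # Dropping the embedding output leaves one piece of feature representation
    # information per encoder layer, each of shape [batch, seq_len, hidden_size].
    feature_representations = outputs.hidden_states[1:]
    print(len(feature_representations), feature_representations[0].shape)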
As an alternative embodiment, referring to FIG. 4, the generation of the feature representation information by an encoder layer may be implemented by the following steps:
Step S401, adding a special symbol at the beginning and a sentence break symbol at the end of the single sentence respectively, and determining a plurality of single characters included in the single sentence;
Step S402, setting an encoder for each of the special symbol, the sentence break symbol and the single characters;
Step S403, generating, through the encoders, a single sentence feature vector corresponding to the special symbol, a sentence break feature vector corresponding to the sentence break symbol, and single character feature vectors corresponding to the single characters;
Step S404, obtaining the feature representation information from the single sentence feature vector, the sentence break feature vector and the single character feature vectors.
Referring to FIG. 5, after the text to be classified, whose content is a single sentence, is input into the BERT model, the input layer of the BERT model adds the special symbol [CLS] at the beginning of the single sentence and the sentence break symbol [SEP] at its end. In addition, the single sentence is segmented to determine the single characters it contains. As shown in FIG. 5, for example, the input single sentence is "我爱中国" ("I love China"); after passing through the input layer it is segmented into the four single characters "我", "爱", "中" and "国", and the special symbol [CLS] and the sentence break symbol [SEP] are added at the beginning and the end of the sentence respectively. The tokenizer sketch below illustrates this segmentation.
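For reference, the segmentation into [CLS], single characters and [SEP] can be observed with the sketch below. It assumes the HuggingFace transformers tokenizer for a Chinese BERT checkpoint, which splits Chinese text into single characters; it only illustrates the input-layer behaviour described above.

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
    encoding = tokenizer("我爱中国")
    print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
    # ['[CLS]', '我', '爱', '中', '国', '[SEP]'] -- the special symbol, four single
    # characters, and the sentence break symbol, matching FIG. 5.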
Each encoder layer may include a plurality of encoders (Transformer encoders). The special symbol [CLS], the sentence break symbol [SEP] and each single character are each assigned a corresponding encoder. As shown in FIG. 5, the encoder corresponding to the special symbol [CLS] outputs the single sentence feature vector corresponding to [CLS], such as V11, V21 or Vn1; the encoder corresponding to the sentence break symbol [SEP] outputs the sentence break feature vector corresponding to [SEP], such as V16, V26 or Vn6; and the encoder corresponding to a single character outputs the single character feature vector corresponding to that character, such as V12, V23, etc.
For one encoder layer, the set formed by the single sentence feature vector, the sentence break feature vector and the single character feature vectors output by its encoders constitutes the feature representation information output by that encoder layer. For encoder layer 1, for example, [V11, V12, V13, V14, V15, V16] is the feature representation information 1 output by encoder layer 1. The feature representation information 1 is input to encoder layer 2, and [V21, V22, V23, V24, V25, V26] output by encoder layer 2 is the feature representation information 2. After passing through the n encoder layers in sequence in this way, the feature representation information n output by encoder layer n is [Vn1, Vn2, Vn3, Vn4, Vn5, Vn6].
Step S203, fusing the at least two pieces of feature representation information to obtain fused feature representation information.
In the step, the feature representation information output by each encoder layer is fused, so that the output of each encoder layer is fully utilized, and the lexical features and the grammatical features of the text to be classified are better reflected through the fused feature representation information.
As an optional implementation, fusing the feature representation information may be implemented by: extracting at least two single sentence characteristic vectors from at least two pieces of characteristic representation information; and performing vector fusion on at least two single sentence characteristic vectors to serve as the fused characteristic representation information.
Owing to the properties of the BERT model, after an encoder layer processes its input, the output single sentence feature vector corresponding to the special symbol [CLS] contains the features of the whole single sentence, so when fusing the feature representation information it is possible to use only the single sentence feature vectors.
Specifically, for the collected feature representation information output by each encoder layer, the single sentence feature vector contained in it is extracted, which yields at least two single sentence feature vectors. Vector fusion is then performed on the at least two single sentence feature vectors. Any vector fusion method may be selected, such as splicing (concatenation), taking the mean, or taking the maximum.
Taking the splicing fusion method as an example, assume that the BERT model of the present embodiment includes three encoder layers, which output three single sentence feature vectors: the first encoder layer outputs the four-dimensional single sentence feature vector [0.1, 0.8, -0.1, 0.2], the second encoder layer outputs [0.4, 0.2, 0.9, 0.3], and the third encoder layer outputs [-0.8, 0.6, -0.2, 0.3]. The splicing fusion method directly connects the three single sentence feature vectors end to end to obtain the twelve-dimensional vector [0.1, 0.8, -0.1, 0.2, 0.4, 0.2, 0.9, 0.3, -0.8, 0.6, -0.2, 0.3]; this spliced twelve-dimensional vector serves as the fused feature representation information and is subsequently used for classifying the text to be classified.
The three single sentence feature vectors above are still used as the example. When the mean-value fusion method is selected, the mean of the values in each dimension of the three single sentence feature vectors is calculated: [(0.1+0.4-0.8)/3, (0.8+0.2+0.6)/3, (-0.1+0.9-0.2)/3, (0.2+0.3+0.3)/3], and the fused feature representation information is [-0.1, 0.5, 0.2, 0.3] (rounded to one decimal place). When the maximum-value fusion method is selected, the maximum value in each dimension of the three single sentence feature vectors is taken to form a vector, and the fused feature representation information is [0.4, 0.8, 0.9, 0.3]. Other vector fusion methods mentioned in this specification may also be selected according to specific implementation requirements.
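The three fusion methods discussed above can be reproduced with the example vectors as in the small sketch below (PyTorch is used here only for convenience; any tensor library would do).

    import torch

    # Single sentence ([CLS]) feature vectors from the three encoder layers in the example above.
    v1 = torch.tensor([0.1, 0.8, -0.1, 0.2])
    v2 = torch.tensor([0.4, 0.2, 0.9, 0.3])
    v3 = torch.tensor([-0.8, 0.6, -0.2, 0.3])
    stacked = torch.stack([v1, v2, v3])      # shape [3, 4]

    fused_concat = torch.cat([v1, v2, v3])   # twelve-dimensional spliced vector
    fused_mean = stacked.mean(dim=0)         # approx. [-0.10, 0.53, 0.20, 0.27]
    fused_max = stacked.max(dim=0).values    # [0.4, 0.8, 0.9, 0.3]
    print(fused_concat, fused_mean, fused_max, sep="\n")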
In addition, according to different implementation requirements, other feature vectors (such as the single character feature vectors) may be extracted from the feature representation information output by each encoder layer and subjected to the subsequent vector fusion to serve as the fused feature representation information.
As another alternative, fusing the feature representation information may be implemented by: for each piece of feature representation information, performing vector calculation or vector fusion on the single sentence feature vector, the sentence break feature vector and the single character feature vector to obtain a combined feature vector; and carrying out vector fusion on at least two combined feature vectors to serve as feature representation information after fusion.
In contrast to the foregoing alternative embodiment, not only the single sentence feature vector in the feature representation information is used, but all of the feature vectors it includes. That is, the single sentence feature vector, the sentence break feature vector and the single character feature vectors included in the feature representation information are processed to obtain one vector. Specifically, vector calculation may be performed on the single sentence feature vector, the sentence break feature vector and the single character feature vectors, such as addition, subtraction, cross multiplication and the like; alternatively, these feature vectors may be vector-fused, and the specific fusion method may refer to the foregoing embodiments. The vector obtained by processing the single sentence feature vector, the sentence break feature vector and the single character feature vectors is called a combined feature vector.
For the feature representation information output by each encoder layer, the corresponding combined feature vector is obtained in the above manner, so that at least two combined feature vectors are obtained. These combined feature vectors are then vector-fused to serve as the fused feature representation information; one possible realization is sketched below.
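As a non-authoritative sketch of this alternative, one could mean-pool all token vectors within each layer into a combined feature vector and then splice the per-layer results. The pooling choice is an assumption made for illustration, since the embodiment leaves the exact vector calculation open.

    import torch

    def fuse_all_token_vectors(hidden_states):
        """hidden_states: per-layer tensors of shape [batch, seq_len, hidden_size],
        e.g. the encoder-layer outputs collected earlier.
        Returns a fused representation of shape [batch, num_layers * hidden_size]."""
        combined = [layer.mean(dim=1) for layer in hidden_states]  # per-layer combined feature vector
        return torch.cat(combined, dim=-1)                         # splice across layers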
Step S204, determining the type of the text to be classified according to the fused feature representation information.
In this step, the fused feature representation information obtained in the previous step is input to the classification layer of the BERT model for prediction classification, and the type of the text to be classified is finally determined. The prediction classification may use any prediction classification algorithm, such as the Softmax function; a minimal sketch of such a classification layer follows.
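The following is a minimal sketch of a classification layer over the fused feature representation. The linear-plus-Softmax head and the dimensions are assumptions for illustration; the embodiments only state that any prediction classification algorithm, such as the Softmax function, may be used.

    import torch
    import torch.nn as nn

    class FusionClassifier(nn.Module):
        """Linear + Softmax head over the fused feature representation (illustrative only)."""
        def __init__(self, fused_dim: int, num_classes: int):
            super().__init__()
            self.linear = nn.Linear(fused_dim, num_classes)

        def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
            logits = self.linear(fused_features)
            return torch.softmax(logits, dim=-1)  # probability of each predefined class

    # Example: a twelve-dimensional spliced vector (3 layers x 4 dimensions) over 5 classes.
    head = FusionClassifier(fused_dim=12, num_classes=5)
    probs = head(torch.randn(1, 12))
    print(probs)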
As can be seen from the above embodiments, the text classification method provided in this specification utilizes feature representation information output by each encoder layer in the BERT model, performs final text classification based on the fused feature representation information, and makes full use of the output of each encoder layer to accurately reflect lexical and grammatical information contained in a text, thereby effectively improving the accuracy of text classification.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
As an alternative embodiment, referring to fig. 6, the text classification method further includes a training process of the BERT model, and specifically includes the following steps:
s601, determining a training text and an initial BERT model; the initial BERT model comprises: at least two sequentially connected initial encoder layers;
step S602, inputting the training text into the initial BERT model;
step S603, collecting the output of the initial encoder layer to obtain at least two training feature representation information corresponding to the training text;
s604, fusing at least two pieces of training characteristic representation information to obtain fused training characteristic representation information;
step S605, determining the prediction type of the training text according to the fused training feature representation information;
step S606, determining the actual type of the training text, and obtaining feedback information according to the actual type and the prediction type;
and S607, adjusting the model parameters of the initial BERT model according to the feedback information to obtain the BERT model.
In this embodiment, the BERT model used above is obtained by training an initial BERT model. The specific contents and optional implementations of acquiring the training text, the form and content of the training text, inputting the training text into the initial BERT model, the working mode of each initial encoder layer in the initial BERT model, and fusing the training feature representation information output by each initial encoder layer may refer to the description of the foregoing embodiments.
After the training text is processed by the initial BERT model, the model finally outputs the prediction type corresponding to the training text. Correspondingly, the actual type of the training text is determined; the actual type may be obtained by processing the training text with other machine learning models, or by manually classifying the training text.
Feedback information is then generated from the predicted type and the actual type of the training text and is used to update the model parameters of the initial BERT model. Specifically, according to the feedback information, a loss function based on cross entropy is calculated, back propagation is performed, the weight parameters of the initial BERT model are updated, and the operation is iterated. Steps S602 to S606 are executed iteratively, and when the loss value of the loss function tends to be stable over at least two consecutive iterations, the training of the initial BERT model ends and the BERT model is obtained, which can then be used to implement the text classification method of one or more embodiments of the present specification. A compressed sketch of this training loop is given below.
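A compressed sketch of the training loop described by steps S601 to S607 follows. It assumes PyTorch, a BERT model configured with output_hidden_states=True, a linear classification head producing class logits, a fuse() helper that combines the per-layer [CLS] vectors (e.g. by concatenation), and a simple "stop when the loss stabilizes" criterion; none of these specifics are mandated by the embodiments.

    import torch
    import torch.nn as nn

    def train(model, head, fuse, data_loader, epochs=3, tol=1e-3):
        """model: BERT with output_hidden_states=True; head: nn.Linear producing class logits;
        fuse: maps per-layer [CLS] vectors to the fused training feature representation."""
        criterion = nn.CrossEntropyLoss()  # cross-entropy loss (applies Softmax to the logits internally)
        optimizer = torch.optim.AdamW(list(model.parameters()) + list(head.parameters()), lr=2e-5)
        prev_loss = None
        for _ in range(epochs):
            for batch, labels in data_loader:          # labels: actual types of the training texts
                outputs = model(**batch)
                cls_per_layer = [h[:, 0, :] for h in outputs.hidden_states[1:]]
                fused = fuse(cls_per_layer)
                loss = criterion(head(fused), labels)  # feedback: predicted type vs. actual type
                optimizer.zero_grad()
                loss.backward()                        # back propagation
                optimizer.step()                       # update the weight parameters
            if prev_loss is not None and abs(prev_loss - loss.item()) < tol:
                break                                  # loss has stabilized; training ends
            prev_loss = loss.item()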
Based on the same inventive concept, one or more embodiments of the present specification further provide a text classification apparatus. Referring to fig. 7, the text classification apparatus includes:
an input module 701 configured to input a text to be classified into the BERT model;
an acquisition module 702 configured to acquire an output of each encoder layer, resulting in at least two pieces of feature representation information corresponding to the text to be classified;
a fusion module 703 configured to fuse at least two pieces of the feature representation information to obtain fused feature representation information;
a classification module 704 configured to determine the type of the text to be classified according to the fused feature representation information.
As an optional embodiment, the apparatus further includes a training module configured to: determine a training text and an initial BERT model, the initial BERT model including at least two sequentially connected initial encoder layers; input the training text into the initial BERT model; collect the outputs of the initial encoder layers to obtain at least two pieces of training feature representation information corresponding to the training text; fuse the at least two pieces of training feature representation information to obtain fused training feature representation information; determine the prediction type of the training text according to the fused training feature representation information; determine the actual type of the training text, and obtain feedback information according to the actual type and the prediction type; and adjust the model parameters of the initial BERT model according to the feedback information to obtain the BERT model.
As an optional embodiment, the text to be classified includes a single sentence, and the encoder layer includes a plurality of encoders; the acquisition module 702 is specifically configured to add a special symbol at the beginning and a sentence break symbol at the end of the single sentence and determine a plurality of single characters included in the single sentence; set an encoder for each of the special symbol, the sentence break symbol and the single characters; generate, through the encoders, a single sentence feature vector corresponding to the special symbol, a sentence break feature vector corresponding to the sentence break symbol, and single character feature vectors corresponding to the single characters; and obtain the feature representation information from the single sentence feature vector, the sentence break feature vector and the single character feature vectors.
As an optional embodiment, the fusion module 703 is specifically configured to extract at least two single sentence feature vectors from at least two pieces of feature representation information; and performing vector fusion on at least two single sentence characteristic vectors to serve as the fused characteristic representation information.
As an optional embodiment, the fusion module 703 is specifically configured to perform vector calculation or vector fusion on the single sentence feature vector, the sentence break feature vector, and the single character feature vector included in each piece of feature representation information to obtain a combined feature vector; and carrying out vector fusion on at least two combined feature vectors to serve as feature representation information after fusion.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
One or more embodiments of the present specification further provide an electronic device based on the same inventive concept. The electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method according to any of the above embodiments.
Fig. 8 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (11)

1. A text classification method based on a BERT model, the BERT model comprising: at least two encoder layers connected in sequence;
the method comprises the following steps:
inputting a text to be classified into the BERT model;
collecting the output of each encoder layer to obtain at least two pieces of feature representation information corresponding to the text to be classified;
fusing at least two pieces of feature representation information to obtain fused feature representation information;
and determining the type of the text to be classified according to the fused feature representation information.
2. The text classification method of claim 1, further comprising:
determining a training text and an initial BERT model; the initial BERT model comprises: at least two sequentially connected initial encoder layers;
inputting the training text into the initial BERT model;
acquiring the outputs of the initial encoder layers to obtain at least two pieces of training feature representation information corresponding to the training text;
fusing the at least two pieces of training feature representation information to obtain fused training feature representation information;
determining the prediction type of the training text according to the fused training feature representation information;
determining the actual type of the training text, and obtaining feedback information according to the actual type and the prediction type;
and adjusting the model parameters of the initial BERT model according to the feedback information to obtain the BERT model.
3. The text classification method according to claim 1, wherein the text to be classified comprises a single sentence; the encoder layer comprises a plurality of encoders;
after the text to be classified is input into the BERT model, the method specifically comprises the following steps:
adding a special symbol at the beginning and a sentence break symbol at the end of the single sentence respectively, and determining a plurality of single characters included in the single sentence;
setting an encoder for each of the special symbol, the sentence break symbol and the single characters;
generating, through the encoders, a single sentence feature vector corresponding to the special symbol, a sentence break feature vector corresponding to the sentence break symbol, and single character feature vectors corresponding to the single characters;
and obtaining the feature representation information from the single sentence feature vector, the sentence break feature vector and the single character feature vectors.
4. The text classification method according to claim 3, wherein the fusing at least two of the feature representation information to obtain fused feature representation information comprises:
extracting at least two single sentence characteristic vectors from at least two pieces of characteristic representation information;
and performing vector fusion on at least two single sentence characteristic vectors to serve as the fused characteristic representation information.
5. The text classification method according to claim 3, wherein the fusing at least two of the feature representation information to obtain fused feature representation information comprises:
for each piece of feature representation information, performing vector calculation or vector fusion on the single sentence feature vector, the sentence break feature vector and the single character feature vector to obtain a combined feature vector;
and carrying out vector fusion on at least two combined feature vectors to serve as feature representation information after fusion.
6. A text classification apparatus comprising:
an input module configured to input text to be classified into the BERT model;
the acquisition module is configured to acquire the output of each encoder layer to obtain at least two pieces of feature representation information corresponding to the text to be classified;
the fusion module is configured to fuse at least two pieces of feature representation information to obtain fused feature representation information;
and the classification module is configured to determine the type of the text to be classified according to the fused feature representation information.
7. The text classification apparatus of claim 6, further comprising: a training module configured to determine a training text and an initial BERT model, the initial BERT model comprising at least two sequentially connected initial encoder layers; input the training text into the initial BERT model; acquire the outputs of the initial encoder layers to obtain at least two pieces of training feature representation information corresponding to the training text; fuse the at least two pieces of training feature representation information to obtain fused training feature representation information; determine the prediction type of the training text according to the fused training feature representation information; determine the actual type of the training text, and obtain feedback information according to the actual type and the prediction type; and adjust the model parameters of the initial BERT model according to the feedback information to obtain the BERT model.
8. The text classification apparatus according to claim 6, wherein the text to be classified includes a single sentence; the encoder layer comprises a plurality of encoders;
the acquisition module is specifically configured to add a special symbol at the beginning and a sentence break symbol at the end of the single sentence, and determine a plurality of single characters included in the single sentence; set an encoder for each of the special symbol, the sentence break symbol and the single characters; generate, through the encoders, a single sentence feature vector corresponding to the special symbol, a sentence break feature vector corresponding to the sentence break symbol, and single character feature vectors corresponding to the single characters; and obtain the feature representation information from the single sentence feature vector, the sentence break feature vector and the single character feature vectors.
9. The text classification apparatus according to claim 8, wherein the fusion module is specifically configured to extract at least two single sentence feature vectors from at least two of the feature representation information; and performing vector fusion on at least two single sentence characteristic vectors to serve as the fused characteristic representation information.
10. The text classification apparatus according to claim 8, wherein the fusion module is specifically configured to perform vector calculation or vector fusion on the single sentence feature vector, the sentence break feature vector, and the single character feature vector included in each piece of feature representation information to obtain a combined feature vector; and carrying out vector fusion on at least two combined feature vectors to serve as feature representation information after fusion.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 5 when executing the program.
CN202010084986.0A 2020-02-10 2020-02-10 Text classification method and device and electronic equipment Active CN111104516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010084986.0A CN111104516B (en) 2020-02-10 2020-02-10 Text classification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010084986.0A CN111104516B (en) 2020-02-10 2020-02-10 Text classification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111104516A true CN111104516A (en) 2020-05-05
CN111104516B CN111104516B (en) 2023-07-04

Family

ID=70428059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010084986.0A Active CN111104516B (en) 2020-02-10 2020-02-10 Text classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111104516B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090254572A1 (en) * 2007-01-05 2009-10-08 Redlich Ron M Digital information infrastructure and method
US20100250497A1 (en) * 2007-01-05 2010-09-30 Redlich Ron M Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor
CN109690577A (en) * 2016-09-07 2019-04-26 皇家飞利浦有限公司 Classified using the Semi-supervised that stack autocoder carries out
CN109151692A (en) * 2018-07-13 2019-01-04 南京工程学院 Hearing aid based on deep learning network tests method of completing the square certainly
CN109710770A (en) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 A kind of file classification method and device based on transfer learning
GB201904167D0 (en) * 2019-03-26 2019-05-08 Benevolentai Tech Limited Name entity recognition with deep learning
CN110413749A (en) * 2019-07-03 2019-11-05 阿里巴巴集团控股有限公司 Determine the method and device of typical problem
CN110413785A (en) * 2019-07-25 2019-11-05 淮阴工学院 A kind of Automatic document classification method based on BERT and Fusion Features
CN110457585A (en) * 2019-08-13 2019-11-15 腾讯科技(深圳)有限公司 Method for pushing, device, system and the computer equipment of negative text
CN110543561A (en) * 2019-08-15 2019-12-06 阿里巴巴集团控股有限公司 Method and device for emotion analysis of text
CN110597991A (en) * 2019-09-10 2019-12-20 腾讯科技(深圳)有限公司 Text classification method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁建立; 李洋; 王家亮: "基于双编码器的短文本自动摘要方法" (Automatic summarization method for short texts based on dual encoders), 计算机应用 (Journal of Computer Applications), no. 12 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732916A (en) * 2021-01-11 2021-04-30 河北工业大学 BERT-based multi-feature fusion fuzzy text classification model
CN112948580A (en) * 2021-02-04 2021-06-11 支付宝(杭州)信息技术有限公司 Text classification method and system
CN113158624A (en) * 2021-04-09 2021-07-23 中国人民解放军国防科技大学 Method and system for fine-tuning pre-training language model by fusing language information in event extraction
CN113158624B (en) * 2021-04-09 2023-12-08 中国人民解放军国防科技大学 Method and system for fine tuning pre-training language model by fusing language information in event extraction
CN114461804A (en) * 2022-02-10 2022-05-10 电子科技大学 Text classification method, classifier and system based on key information and dynamic routing

Also Published As

Publication number Publication date
CN111104516B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN111104516B (en) Text classification method and device and electronic equipment
CN107480162B (en) Search method, device and equipment based on artificial intelligence and computer readable storage medium
CN110019732B (en) Intelligent question answering method and related device
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
JP2021197133A (en) Meaning matching method, device, electronic apparatus, storage medium, and computer program
CN110737768A (en) Text abstract automatic generation method and device based on deep learning and storage medium
CN112000792A (en) Extraction method, device, equipment and storage medium of natural disaster event
CN111198939B (en) Statement similarity analysis method and device and computer equipment
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN114861889B (en) Deep learning model training method, target object detection method and device
CN109189930B (en) Text feature extraction and extraction model optimization method, medium, device and equipment
CN114385780B (en) Program interface information recommendation method and device, electronic equipment and readable medium
CN112528681A (en) Cross-language retrieval and model training method, device, equipment and storage medium
CN115455171B (en) Text video mutual inspection rope and model training method, device, equipment and medium
CN110941951A (en) Text similarity calculation method, text similarity calculation device, text similarity calculation medium and electronic equipment
CN113986950A (en) SQL statement processing method, device, equipment and storage medium
KR102608867B1 (en) Method for industry text increment, apparatus thereof, and computer program stored in medium
US11270085B2 (en) Generating method, generating device, and recording medium
CN112182167B (en) Text matching method and device, terminal equipment and storage medium
CN113722436A (en) Text information extraction method and device, computer equipment and storage medium
CN116662496A (en) Information extraction method, and method and device for training question-answering processing model
CN111142871A (en) Front-end page development system, method, equipment and medium
CN111382232A (en) Question and answer information processing method and device and computer equipment
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN115640375A (en) Technical problem extraction method in patent literature and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant