CN112000803B - Text classification method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN112000803B (grant of CN112000803A)
Application number: CN202010739426.4A
Authority: CN (China)
Prior art keywords: text; natural language text; feature vector; classification
Inventor: 彭团民
Assignee (original and current): Beijing Xiaomi Pinecone Electronic Co., Ltd.
Priority/filing date: 2020-07-28
Publication of CN112000803A: 2020-11-27
Grant of CN112000803B: 2024-05-14
Legal status: Active
Other languages: Chinese (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification

Abstract

The disclosure relates to a text classification method and apparatus, an electronic device, and a computer-readable storage medium, and belongs to the field of text classification. The text classification method comprises: obtaining a natural language text; dividing the natural language text into a plurality of continuous text segments; generating a plurality of continuous feature vectors from the plurality of continuous text segments, wherein the text segments are in one-to-one correspondence with the feature vectors; taking each feature vector as a local feature vector of the natural language text and aggregating the local feature vectors to obtain a global feature vector of the natural language text; and classifying the natural language text according to the global feature vector to obtain a classification result. Because the text is divided into segments, local features are extracted, and the local features are aggregated into a global feature, the resulting global feature effectively retains the features of each part, improving the accuracy of classifying the natural language text.

Description

Text classification method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of text classification, and in particular, to a text classification method and apparatus, an electronic device, and a computer readable storage medium.
Background
Text classification aims to predict the category of a given text and is a basic task of NLP (Natural Language Processing). At present, the number of complex documents and texts is rising rapidly. In the related art, multiple neural network algorithms are often fused, and text classification is performed by weighted voting over multiple models. In this case, there are many models, the complexity is high, and the training and deployment costs are high; moreover, because the results of the models are weighted and voted manually, the obtained result is not the optimal one. For a single convolutional neural network or recurrent neural network, there is the further problem that the feature information extracted from the text is not rich enough.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a text classification method and apparatus, an electronic device, and a computer-readable storage medium.
According to a first aspect of an embodiment of the present disclosure, there is provided a text classification method, including:
acquiring a natural language text;
dividing the natural language text into a plurality of continuous text segments;
generating a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
taking each feature vector as a local feature vector of the natural language text, and aggregating the local feature vectors to obtain a global feature vector of the natural language text;
and classifying the natural language text according to the global feature vector to obtain a classification result.
Optionally, the taking each feature vector as a local feature vector of the natural language text and aggregating the local feature vectors to obtain a global feature vector of the natural language text includes:
inputting each local feature vector, by analogy with the local feature vectors of image frames, into a NeXtVlad image feature extraction model, obtaining the vector output by the NeXtVlad model that represents the global feature of a video, and taking this vector as the global feature vector of the natural language text.
Optionally, the dividing the natural language text into a plurality of continuous text segments includes:
determining a target division mode for the natural language text according to the text length of the natural language text;
dividing the natural language text into a plurality of continuous text segments according to the target division mode;
wherein the target division mode comprises one or more of division by short sentences, division by long sentences, and division by paragraphs.
Optionally, the dividing the natural language text into a plurality of continuous text segments includes:
dividing the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments.
Optionally, the steps from dividing the natural language text into a plurality of continuous text segments through obtaining the classification result are performed by a text classification model;
wherein the classification model comprises: a data processing layer, a feature characterization layer connected to the data processing layer, a feature aggregation layer connected to the feature characterization layer, and a classifier layer connected to the feature aggregation layer;
the classification model is obtained by training the parameters of the feature aggregation layer and the parameters of the classifier layer, using natural language texts with classification labels as training samples.
Optionally, the acquiring the natural language text includes:
acquiring a natural language text input by a user in a chat system;
and the method further comprises:
determining a chat intention of the user according to the classification result.
Optionally, the acquiring the natural language text includes:
acquiring a natural language text to be audited;
and the method further comprises:
determining, according to the classification result, whether the natural language text meets a network publishing condition;
and publishing the natural language text under a column corresponding to the classification result if the natural language text meets the network publishing condition.
According to a second aspect of embodiments of the present disclosure, there is provided a text classification apparatus, including: a data processing module that acquires a natural language text and divides the natural language text into a plurality of continuous text segments;
a feature characterization module that generates a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
a feature aggregation module that takes each feature vector as a local feature vector of the natural language text and aggregates the local feature vectors to obtain a global feature vector of the natural language text;
and a classifier module that classifies the natural language text according to the global feature vector to obtain a classification result.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the steps of the above method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the text classification method provided by the first aspect of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects: the natural language text is divided into a plurality of continuous text segments, a plurality of continuous feature vectors are generated from these segments, and the continuous feature vectors are aggregated into a global feature, so that the obtained global feature effectively retains each local feature, improving the accuracy and completeness of the features extracted from the text and, in turn, the accuracy of classifying the natural language text.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart illustrating a text classification method according to an exemplary embodiment.
FIG. 2 is a block diagram illustrating a training manner of a text classification model according to an exemplary embodiment.
Fig. 3 is another flow chart illustrating a text classification method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a structure of a text classification apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram of an electronic device, according to an example embodiment.
Fig. 6 is another block diagram of an electronic device, according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 is a flowchart illustrating a text classification method according to an exemplary embodiment. The method may be executed by an electronic device, for example a server or a terminal, which is not limited in this embodiment of the disclosure. As shown in Fig. 1, the method includes the following steps:
step S11, natural language text is acquired.
And step S12, dividing the natural language text into a plurality of continuous text fragments.
And S13, generating a plurality of continuous feature vectors according to the plurality of continuous text fragments, wherein the plurality of text fragments are in one-to-one correspondence with the plurality of feature vectors.
And S14, taking the feature vector as a local feature vector of the natural language text, and aggregating the local feature vectors to obtain a global feature vector of the natural language text.
And S15, classifying the natural language text according to the global feature vector to obtain a classification result.
In the embodiment of the present disclosure, the text is split into short texts, feature vectors are extracted to obtain continuous text feature vectors, and the feature vectors extracted from the short texts are aggregated into a global feature. With this scheme, in the long-text classification problem, each local feature can be effectively retained, the accuracy and the integrity of the features extracted from the text are improved, and the accuracy of classifying the natural language text is further improved. In one possible implementation, the method steps shown in Fig. 1 can implement the splitting of the text and the extraction and fusion of the feature vectors through a single model, so multiple models are not required for processing and classifying the acquired text, and the training and deployment costs are lower.
In an alternative embodiment, step S14 may include: treating each local feature vector, by analogy, as the local feature vector of an image frame; inputting the vectors into a NeXtVlad image feature extraction model; obtaining the vector output by the NeXtVlad model that would represent the global feature of a video; and taking this vector as the global feature vector of the natural language text. The NeXtVlad model is obtained through training. NeXtVlad is an algorithm from the image field that can effectively compress the local features of multi-frame images into a global feature; applied here, it can effectively aggregate the local segment features of a long text into a global feature. NeXtVlad is a lightweight algorithm with low model complexity and a small number of parameters, so the training and deployment costs are low. With this scheme, the global feature can be obtained effectively while reducing complexity, parameter count, and training and deployment costs.
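To make the analogy concrete, the following is a minimal, hypothetical PyTorch sketch of a NeXtVLAD-style aggregation layer. It is a simplification for illustration, not the patent's reference implementation; the cluster count, group count, and expansion factor are assumed values, and `dim * expansion` is assumed divisible by `groups`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeXtVLAD(nn.Module):
    """Simplified NeXtVLAD-style aggregation: compresses a variable-length
    sequence of local feature vectors into one fixed-size global vector."""
    def __init__(self, dim: int, clusters: int = 32, expansion: int = 2, groups: int = 8):
        super().__init__()
        self.groups, self.clusters = groups, clusters
        self.expand = nn.Linear(dim, dim * expansion)                # feature expansion
        self.group_dim = dim * expansion // groups                   # assumes divisibility
        self.assign = nn.Linear(dim * expansion, groups * clusters)  # soft cluster assignment
        self.gate = nn.Linear(dim * expansion, groups)               # per-group attention gate
        self.centroids = nn.Parameter(torch.randn(clusters, self.group_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_segments, dim); each row is one local feature vector,
        # analogous to the feature of one video frame.
        b, n, _ = x.shape
        x = self.expand(x)                                           # (b, n, dim*expansion)
        attn = torch.sigmoid(self.gate(x))                           # (b, n, groups)
        assign = F.softmax(
            self.assign(x).view(b, n, self.groups, self.clusters), dim=-1)
        assign = assign * attn.unsqueeze(-1)                         # gated assignment
        xg = x.view(b, n, self.groups, self.group_dim)
        # VLAD residual: sum_n a(n,k) * x(n) - (sum_n a(n,k)) * c(k)
        vlad = torch.einsum('bngk,bngd->bkd', assign, xg) \
             - assign.sum(dim=(1, 2)).unsqueeze(-1) * self.centroids
        vlad = F.normalize(vlad, dim=-1)                             # intra-normalization
        return vlad.flatten(1)                                       # (b, clusters * group_dim)
```

Segment vectors of shape (batch, num_segments, dim) go in, and a single (batch, clusters × group_dim) global vector comes out, regardless of how many segments the text was divided into.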
In another alternative embodiment, steps S11 to S15 may be performed by a text classification model. Fig. 2 provides a schematic representation of such a classification model, which comprises: a data processing layer 201, a feature characterization layer 202 connected to the data processing layer 201, a feature aggregation layer 203 connected to the feature characterization layer 202, and a classifier layer 204 connected to the feature aggregation layer 203. The classification model is obtained by training the parameters of the feature aggregation layer 203 and the parameters of the classifier layer 204, using natural language texts with classification labels as training samples.
The training process of the classification model shown in Fig. 2 will be described below using the NeXtVlad model as an example. Specifically, the training may proceed as shown in Fig. 2. The data processing layer 201 inputs the data set 205 carrying category labels into the data module 206 to obtain the split text segments 207, where the data set 205 consists of natural language texts with category labels. The feature characterization layer 202 is configured to take the text segments 207 produced by the data processing layer 201 and input them into the Bert model 208 to obtain continuous text segment vectors 209, where the continuous vectors 209 represent the local features of the text. The feature aggregation layer 203 is configured to input the continuous text segment vectors 209 obtained by the feature characterization layer 202 into the NeXtVlad model 210 to obtain a document vector 211, where the document vector 211 is the global feature aggregated from the local features represented by the continuous vectors 209 and characterizes the content of the entire text. The classifier layer 204 is configured to input the document vector 211 into the logistic regression classification model 212 to obtain a classification result 213. The classification result 213 is then input into the loss function 214, and the parameters of the NeXtVlad model 210 in the feature aggregation layer 203 and of the logistic regression classification model 212 in the classifier layer 204 are updated through back propagation. Training and validation alternate throughout the whole process: the parameters are updated during training and frozen during validation, and the precision, recall, and F1 score (F1 = 2 × precision × recall / (precision + recall)) are calculated; training is stopped when the F1 score on the validation set reaches its maximum, yielding the optimal model. The classification model has low complexity and few parameters to update during training, so the training and deployment costs are low.
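As a rough illustration of this training procedure, the sketch below wires the layers together and keeps the checkpoint with the best validation F1. It assumes a `NeXtVLAD` module like the sketch above; the helpers `encode_segments` (a frozen BERT encoder producing segment vectors), `train_loader`, `val_loader`, and `evaluate_f1` are hypothetical placeholders, not names from the patent.

```python
import torch
import torch.nn as nn

dim, num_classes = 768, 10
agg = NeXtVLAD(dim)                                   # feature aggregation layer (trainable)
clf = nn.Linear(32 * (dim * 2 // 8), num_classes)     # logistic-regression-style classifier layer
opt = torch.optim.Adam(list(agg.parameters()) + list(clf.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()                       # stands in for loss function 214

best_f1 = 0.0
for epoch in range(20):
    agg.train(); clf.train()
    for segments, labels in train_loader:             # hypothetical DataLoader of labeled texts
        feats = encode_segments(segments)             # frozen BERT: (batch, n_segments, dim)
        loss = loss_fn(clf(agg(feats)), labels)
        opt.zero_grad()
        loss.backward()                               # back propagation updates agg + clf only
        opt.step()
    f1 = evaluate_f1(agg, clf, val_loader)            # validation: no parameter updates
    if f1 > best_f1:                                  # keep the model with the best val F1
        best_f1 = f1
        torch.save({'agg': agg.state_dict(), 'clf': clf.state_dict()}, 'best.pt')
```

Only the aggregation layer and the classifier receive gradient updates, which is what keeps the number of trainable parameters, and hence the training cost, small.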
In yet another alternative embodiment, the dividing of the natural language text into a plurality of text segments in step S12 may include: determining a target division mode for the natural language text according to the text length of the natural language text, and dividing the natural language text into a plurality of continuous text segments according to the target division mode.
For example, if the number of characters in the text is less than 512, the target division mode may be determined to be division by short sentences (phrases); if the number of characters is greater than 512, the target division mode may be determined to be division by long sentences. In implementation, the periods may be used as division marks, and the text between two periods is taken as one text segment. With this scheme, the most suitable division mode can be selected for texts of different lengths, yielding a more targeted global feature.
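A minimal sketch of this length-based division, assuming the 512-character threshold from the example above; the exact delimiter sets are illustrative assumptions, not prescribed by the patent:

```python
import re

def split_by_length(text: str) -> list[str]:
    """Pick the division mode from the text length: short texts are split
    into short sentences (commas as division marks), long texts into long
    sentences (periods as division marks)."""
    if len(text) < 512:
        parts = re.split(r'[,，;；]', text)     # short-sentence (phrase) granularity
    else:
        parts = re.split(r'[.。!！?？]', text)  # long-sentence granularity
    return [p.strip() for p in parts if p.strip()]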
In yet another alternative embodiment, the dividing of the natural language text into a plurality of text segments in step S12 may include: dividing the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments, where the granularity is the sentence length of the division unit; for example, short sentences are the first granularity, long sentences the second granularity, and paragraphs the third granularity. Correspondingly, commas may be used as division marks to obtain the first set of text segments, periods as division marks to obtain the second set, and paragraph indentation as the division mark to obtain the third set. The local features of the text segments at the three granularities are acquired separately and then aggregated into a global feature. Because the text is divided multiple times, more local features are obtained, the aggregated global feature is more comprehensive, and the classification is more accurate.
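A sketch of the multi-granularity division under similar assumptions (commas, periods, and blank lines standing in for paragraph indentation as the division marks):

```python
import re

def split_multi_granularity(text: str) -> dict[str, list[str]]:
    """Divide the same text three times: at short-sentence, long-sentence,
    and paragraph granularity, returning the segments for each granularity."""
    return {
        'short_sentence': [s.strip() for s in re.split(r'[,，.。]', text) if s.strip()],
        'long_sentence':  [s.strip() for s in re.split(r'[.。]', text) if s.strip()],
        'paragraph':      [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()],
    }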
In yet another alternative embodiment, the acquiring of the natural language text in step S11 may further include: acquiring a natural language text input by a user in a chat system, and determining the chat intention of the user according to the classification result. For example, when a user consults customer service in the chat system of an electronic mall, the text input by the user can be acquired and classified to obtain the user's chat intention, for example, that the user wants to return goods or to ask about a new product, so that the system can select a more suitable human customer service agent for the user. Alternatively, in a question-and-answer community, when a user submits a question, the server acquires the natural language text input by the user and classifies it to obtain the category of the question, so that other users familiar with that category can answer it.
In an alternative embodiment, a natural language text to be audited may be acquired, and whether the natural language text meets a network publishing condition is determined according to the classification result; if it does, the natural language text is published under the column corresponding to the classification result. For example, when a user publishes a piece of text in a social application, the server acquires and classifies the submitted text: if the category characterizes the text as fraud or terrorism information, publication is refused; if the category is sports news, the text is published in the sports news column so that users following that column can see it.
Fig. 3 is a flowchart illustrating another text classification method according to an exemplary embodiment. The method may be executed by an electronic device, for example a server or a terminal, which is not limited in this embodiment of the disclosure. As shown in Fig. 3, the method includes the following steps:
Step S31: a natural language text is acquired.
Step S32: the natural language text is divided at different granularities to obtain, at each granularity, a plurality of continuous text segments.
For example, a plurality of continuous text segments are obtained by short-sentence division, a plurality by long-sentence division, and a plurality by paragraph division.
Step S33: for the plurality of continuous text segments obtained at each granularity, a plurality of continuous feature vectors are generated at that granularity.
Still taking short sentences, long sentences, and paragraphs as the different granularities, step S33 yields the feature vectors corresponding to the continuous text segments obtained by short-sentence division, by long-sentence division, and by paragraph division.
Step S34: the continuous feature vectors at each granularity are treated, by analogy, as the local feature vectors of image frames and input into a NeXtVlad image feature extraction model to obtain the fused feature vector at that granularity.
A fused feature vector is thus obtained for the text segments produced at each granularity.
Step S35: all the fused feature vectors are treated, by analogy, as the local feature vectors of image frames and input into a NeXtVlad image feature extraction model, and the output of the NeXtVlad model is taken as the global feature vector of the natural language text.
Step S36: the natural language text is classified according to the global feature vector to obtain a classification result.
With this scheme, the idea of video processing is applied: a long text is regarded as a video composed of frame-by-frame images, with each text segment corresponding to one frame, and the NeXtVlad model, mainly applied in the image field, is used to aggregate the continuous local features into a global feature representing the whole long text. For very long texts, features at different levels, such as fragments, short sentences, long sentences, and paragraphs, are aggregated together as the feature representation of the entire text for classification, so that local and global information can be used effectively.
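As a rough end-to-end sketch of steps S32 to S36 under the same assumptions (the segment encoder `encode`, the two NeXtVLAD-style aggregators `intra_agg` and `inter_agg`, and the classifier `clf` are hypothetical components reusing the sketches above, not the patent's reference implementation):

```python
import torch

def classify_long_text(text, encode, intra_agg, inter_agg, clf):
    """S32: multi-granularity division; S33: segment feature vectors;
    S34: per-granularity fusion; S35: cross-granularity global vector;
    S36: classification."""
    fused = []
    for granularity, segments in split_multi_granularity(text).items():  # S32
        local = encode(segments)           # S33: (1, n_segments, dim) local features
        fused.append(intra_agg(local))     # S34: fused feature vector at this granularity
    frames = torch.stack(fused, dim=1)     # fused vectors treated like image-frame features
    global_vec = inter_agg(frames)         # S35: global feature vector of the whole text
    return clf(global_vec).argmax(-1)      # S36: classification result
```

Aggregating twice, first within each granularity and then across granularities, mirrors how multi-frame video features are pooled into a single clip-level descriptor.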
Fig. 4 is a block diagram of a text classification device according to an exemplary embodiment. The text classification means may be implemented as part or all of the terminal by software, hardware or a combination of both. Referring to fig. 4, the apparatus includes an acquisition module 41, a division module 42, a generation module 43, an aggregation module 44, and a classification module 45.
The acquisition module 41 is configured to acquire a natural language text.
The division module 42 is configured to divide the natural language text into a plurality of continuous text segments.
The generation module 43 is configured to generate a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors.
The aggregation module 44 is configured to take each feature vector as a local feature vector of the natural language text and to aggregate the local feature vectors to obtain a global feature vector of the natural language text.
The classification module 45 is configured to classify the natural language text according to the global feature vector to obtain a classification result.
Optionally, the aggregation module 44 may be specifically configured to: treat each local feature vector, by analogy, as the local feature vector of an image frame; input the vectors into a NeXtVlad image feature extraction model; obtain the vector output by the NeXtVlad model that would represent the global feature of a video; and take this vector as the global feature vector of the natural language text. NeXtVlad is a lightweight algorithm with low model complexity and a small number of parameters, so the training and deployment costs are low.
Optionally, the division module 42 may be specifically configured to determine a target division mode for the natural language text according to the text length of the natural language text, and to divide the natural language text into a plurality of continuous text segments according to the target division mode, where the target division mode comprises one or more of division by short sentences, division by long sentences, and division by paragraphs. In this case, the most suitable division mode can be selected for texts of different lengths, yielding a more targeted global feature.
Optionally, the division module 42 may be further configured to divide the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments. Because the text is divided multiple times, more local features are obtained, the aggregated global feature is more comprehensive, and the classification is more accurate.
Optionally, the division module 42, the generation module 43, the aggregation module 44, and the classification module 45 may be implemented as a classification model comprising: a data processing layer, a feature characterization layer connected to the data processing layer, a feature aggregation layer connected to the feature characterization layer, and a classifier layer connected to the feature aggregation layer. The classification model is obtained by training the parameters of the feature aggregation layer and the parameters of the classifier layer, using natural language texts with classification labels as training samples; it has low complexity and few parameters to update during training, so the training and deployment costs are low.
Optionally, the acquisition module 41 may be specifically configured to acquire a natural language text input by a user in a chat system; further, the apparatus may determine the chat intention of the user according to the classification result. For example, when a user consults customer service in the chat system of an electronic mall, the text input by the user can be acquired and classified to obtain the user's chat intention, for example, that the user wants to return goods or to ask about a new product, so that the system can select a more suitable human customer service agent for the user. Alternatively, in a question-and-answer community, when a user submits a question, the server acquires the natural language text input by the user and classifies it to obtain the category of the question, so that other users familiar with that category can answer it.
Optionally, the acquisition module 41 may be further configured to acquire a natural language text to be audited; further, the apparatus may determine, according to the classification result, whether the natural language text meets a network publishing condition, and publish the natural language text under the column corresponding to the classification result if the condition is met. For example, when a user publishes a piece of text in a social application, the server acquires and classifies the submitted text: if the category characterizes the text as fraud or terrorism information, publication is refused; if the category is sports news, the text is published in the sports news column so that users following that column can see it.
In the embodiment of the present disclosure, the text is split into short texts, feature vectors are extracted to obtain continuous text feature vectors, and the feature vectors extracted from the short texts are aggregated into a global feature. With this scheme, in the long-text classification problem, the features of each part can be effectively retained; the scheme uses few models and does not require multiple models to process and classify the acquired text, so it is lightweight, has few parameters, and incurs low training and deployment costs.
The specific manner in which the various modules perform operations in the apparatus of the above embodiment has been described in detail in the embodiments of the method and will not be elaborated here.
An exemplary embodiment of the present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of the above-described method embodiments.
An exemplary embodiment of the present disclosure also provides an electronic device including:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the steps of the method embodiment described above.
Fig. 5 is a block diagram illustrating a structure of the above-described electronic device according to an exemplary embodiment. As shown in fig. 5, the electronic device 50 may include: a processor 51, a memory 52. The electronic device 50 may also include one or more of a multimedia component 53, an input/output (I/O) interface 54, and a communication component 55.
The processor 51 is configured to control the overall operation of the electronic device 50 to perform all or part of the steps of the text classification method described above. The memory 52 is used to store various types of data to support operation at the electronic device 50; such data may include, for example, instructions for any application or method operating on the electronic device 50, as well as application-related data such as contacts, messages, pictures, audio, and video. The memory 52 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 53 may include a screen and an audio component. The screen may be, for example, a touch screen; the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals, and the received audio signals may be further stored in the memory 52 or transmitted through the communication component 55. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 54 provides an interface between the processor 51 and other interface modules such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 55 is used for wired or wireless communication between the electronic device 50 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, or 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 55 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic device 50 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described text classification method.
The computer readable storage medium provided in the above embodiment may be the above memory 52 including program instructions executable by the processor 51 of the electronic device 50 to perform the above text classification method.
Fig. 6 is another block diagram illustrating the structure of the above-described electronic device according to an exemplary embodiment. For example, the electronic device 60 may be provided as a server. Referring to fig. 6, the electronic device 60 comprises a processor 61, which may be one or more in number, and a memory 62 for storing a computer program executable by the processor 61. The computer program stored in memory 62 may include one or more modules each corresponding to a set of instructions. Further, the processor 61 may be configured to execute the computer program to perform the text classification method described above.
In addition, the electronic device 60 may further include a power supply component 63 and a communication component 64; the power supply component 63 may be configured to perform power management of the electronic device 60, and the communication component 64 may be configured to enable wired or wireless communication of the electronic device 60. The electronic device 60 may also include an input/output (I/O) interface 65. The electronic device 60 may operate based on an operating system stored in the memory 62, such as Windows Server, Mac OS X, Unix, Linux, or the like.
The computer readable storage medium provided by the above embodiment may be the above memory 62 including program instructions executable by the processor 61 of the electronic device 60 to perform the above text classification method.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described text classification method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the embodiments described above, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, the present disclosure does not further describe various possible combinations.
Moreover, any combination of the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, and any such combination should likewise be regarded as content disclosed herein.

Claims (9)

1. A method of text classification, comprising:
acquiring a natural language text;
dividing the natural language text into a plurality of continuous text segments;
generating a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
taking each feature vector as a local feature vector of the natural language text, and aggregating the local feature vectors to obtain a global feature vector of the natural language text;
and classifying the natural language text according to the global feature vector to obtain a classification result;
wherein the taking each feature vector as a local feature vector of the natural language text and aggregating the local feature vectors to obtain a global feature vector of the natural language text comprises:
inputting each local feature vector, by analogy with the local feature vectors of image frames, into a NeXtVlad image feature extraction model, obtaining the vector output by the NeXtVlad model that represents the global feature of a video, and taking this vector as the global feature vector of the natural language text.
2. The method of claim 1, wherein the dividing the natural language text into a plurality of continuous text segments comprises:
determining a target division mode for the natural language text according to the text length of the natural language text;
dividing the natural language text into a plurality of continuous text segments according to the target division mode;
wherein the target division mode comprises one or more of division by short sentences, division by long sentences, and division by paragraphs.
3. The method of claim 1, wherein the dividing the natural language text into a plurality of continuous text segments comprises:
dividing the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments.
4. The method according to any one of claims 1-3, wherein the steps from dividing the natural language text into a plurality of continuous text segments through obtaining the classification result are performed by a text classification model;
wherein the classification model comprises: a data processing layer, a feature characterization layer connected to the data processing layer, a feature aggregation layer connected to the feature characterization layer, and a classifier layer connected to the feature aggregation layer;
the classification model is obtained by training the parameters of the feature aggregation layer and the parameters of the classifier layer, using natural language texts with classification labels as training samples.
5. The method of claim 4, wherein the acquiring the natural language text comprises:
acquiring a natural language text input by a user in a chat system;
and the method further comprises:
determining a chat intention of the user according to the classification result.
6. The method of claim 4, wherein the acquiring the natural language text comprises:
acquiring a natural language text to be audited;
and the method further comprises:
determining, according to the classification result, whether the natural language text meets a network publishing condition;
and publishing the natural language text under a column corresponding to the classification result if the natural language text meets the network publishing condition.
7. A text classification apparatus, comprising:
an acquisition module configured to acquire a natural language text;
a division module configured to divide the natural language text into a plurality of continuous text segments;
a generation module configured to generate a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
an aggregation module configured to take each feature vector as a local feature vector of the natural language text and aggregate the local feature vectors to obtain a global feature vector of the natural language text;
and a classification module configured to classify the natural language text according to the global feature vector to obtain a classification result;
wherein the aggregation module is configured to input each local feature vector, by analogy with the local feature vectors of image frames, into a NeXtVlad image feature extraction model, obtain the vector output by the NeXtVlad model that represents the global feature of a video, and take this vector as the global feature vector of the natural language text.
8. An electronic device, the electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-6.
9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1-6.
CN202010739426.4A 2020-07-28 2020-07-28 Text classification method and device, electronic equipment and computer readable storage medium Active CN112000803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010739426.4A CN112000803B (en) 2020-07-28 2020-07-28 Text classification method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112000803A CN112000803A (en) 2020-11-27
CN112000803B true CN112000803B (en) 2024-05-14

Family

ID=73462397

Country Status (1)

Country Link
CN (1) CN112000803B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590813A (en) * 2021-01-20 2021-11-02 腾讯科技(深圳)有限公司 Text classification method, recommendation device and electronic equipment
CN112836049B (en) * 2021-01-28 2023-04-07 杭州网易智企科技有限公司 Text classification method, device, medium and computing equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2019052403A1 (en) * 2017-09-12 2019-03-21 腾讯科技(深圳)有限公司 Training method for image-text matching model, bidirectional search method, and related apparatus
CN110334110A (en) * 2019-05-28 2019-10-15 平安科技(深圳)有限公司 Natural language classification method, device, computer equipment and storage medium
CN110334705A (en) * 2019-06-25 2019-10-15 华中科技大学 A kind of Language Identification of the scene text image of the global and local information of combination
CN111428026A (en) * 2020-02-20 2020-07-17 西安电子科技大学 Multi-label text classification processing method and system and information data processing terminal

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20200184016A1 (en) * 2018-12-10 2020-06-11 Government Of The United States As Represetned By The Secretary Of The Air Force Segment vectors

Non-Patent Citations (2)

Title
A Chinese short-text classification model based on LSTM-CNN; 杜雪嫣, 王秋实, 王斌君; Journal of Jiangsu Police Institute (01); full text *
A person-relation extraction method based on the multi-head attention mechanism; 夏鹤珑, 严丹丹; Journal of Chengdu Technological University (01); full text *

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant