CN112000803B - Text classification method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN112000803B (grant of CN112000803A)
Application number: CN202010739426.4A
Authority: CN (China)
Prior art keywords: text; natural language text; feature vector; classification
Inventor: 彭团民
Assignee (original and current): Beijing Xiaomi Pinecone Electronic Co., Ltd.
Priority/filing date: 2020-07-28
Publication of CN112000803A: 2020-11-27
Grant of CN112000803B: 2024-05-14
Legal status: Active
Other languages: Chinese (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification

Abstract

The disclosure relates to a text classification method and apparatus, an electronic device, and a computer-readable storage medium, and belongs to the field of text classification. The text classification method comprises: obtaining a natural language text; dividing the natural language text into a plurality of continuous text segments; generating a plurality of continuous feature vectors from the plurality of continuous text segments, wherein the text segments are in one-to-one correspondence with the feature vectors; taking each feature vector as a local feature vector of the natural language text and aggregating the local feature vectors to obtain a global feature vector of the natural language text; and classifying the natural language text according to the global feature vector to obtain a classification result. Because the text is divided into segments, local features are extracted, and the local features are aggregated into a global feature, the resulting global feature effectively retains the features of each part, improving the accuracy of classifying the natural language text.

Description

Text classification method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of text classification, and in particular, to a text classification method and apparatus, an electronic device, and a computer readable storage medium.
Background
Text classification aims to predict the category of a given text and is a basic task of NLP (Natural Language Processing). At present, the number of complex documents and texts is rising rapidly. In the related art, multiple neural network algorithms are often fused, and text classification is performed by weighted voting over multiple models. In this case, there are many models, the complexity is high, and the training and deployment costs are high; moreover, because the results of the models are weighted and voted manually, the obtained result is not the optimal one. For a single convolutional neural network or recurrent neural network, there is the further problem that the feature information extracted from the text is not rich enough.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a text classification method and apparatus, an electronic device, and a computer-readable storage medium.
According to a first aspect of an embodiment of the present disclosure, there is provided a text classification method, including:
acquiring a natural language text;
dividing the natural language text into a plurality of continuous text segments;
generating a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
taking each feature vector as a local feature vector of the natural language text, and aggregating the local feature vectors to obtain a global feature vector of the natural language text;
and classifying the natural language text according to the global feature vector to obtain a classification result.
Optionally, the taking each feature vector as a local feature vector of the natural language text and aggregating the local feature vectors to obtain a global feature vector of the natural language text includes:
inputting each local feature vector, by analogy with the local feature vectors of image frames, into a NeXtVlad image feature extraction model, obtaining the vector output by the NeXtVlad model that represents the global feature of a video, and taking this vector as the global feature vector of the natural language text.
Optionally, the dividing the natural language text into a plurality of continuous text segments includes:
determining a target division mode for the natural language text according to the text length of the natural language text;
dividing the natural language text into a plurality of continuous text segments according to the target division mode;
wherein the target division mode comprises one or more of division by short sentences, division by long sentences, and division by paragraphs.
Optionally, the dividing the natural language text into a plurality of continuous text segments includes:
dividing the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments.
Optionally, the steps from dividing the natural language text into a plurality of continuous text segments through obtaining the classification result are performed by a text classification model;
wherein the classification model comprises: a data processing layer, a feature characterization layer connected to the data processing layer, a feature aggregation layer connected to the feature characterization layer, and a classifier layer connected to the feature aggregation layer;
the classification model is obtained by training the parameters of the feature aggregation layer and the parameters of the classifier layer, using natural language texts with classification labels as training samples.
Optionally, the acquiring the natural language text includes:
acquiring a natural language text input by a user in a chat system;
and the method further comprises:
determining a chat intention of the user according to the classification result.
Optionally, the acquiring the natural language text includes:
acquiring a natural language text to be audited;
and the method further comprises:
determining, according to the classification result, whether the natural language text meets a network publishing condition;
and publishing the natural language text under a column corresponding to the classification result if the natural language text meets the network publishing condition.
According to a second aspect of embodiments of the present disclosure, there is provided a text classification apparatus, including: a data processing module that acquires a natural language text and divides the natural language text into a plurality of continuous text segments;
a feature characterization module that generates a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
a feature aggregation module that takes each feature vector as a local feature vector of the natural language text and aggregates the local feature vectors to obtain a global feature vector of the natural language text;
and a classifier module that classifies the natural language text according to the global feature vector to obtain a classification result.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the steps of the above method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the text classification method provided by the first aspect of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects: the natural language text is divided into a plurality of continuous text segments, a plurality of continuous feature vectors are generated from these segments, and the continuous feature vectors are aggregated into a global feature, so that the obtained global feature effectively retains each local feature, improving the accuracy and completeness of the features extracted from the text and, in turn, the accuracy of classifying the natural language text.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart illustrating a text classification method according to an exemplary embodiment.
FIG. 2 is a block diagram illustrating a training manner of a text classification model according to an exemplary embodiment.
Fig. 3 is another flow chart illustrating a text classification method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a structure of a text classification apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram of an electronic device, according to an example embodiment.
Fig. 6 is another block diagram of an electronic device, according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 is a flowchart illustrating a text classification method according to an exemplary embodiment. The method may be executed by an electronic device, for example a server or a terminal, which is not limited in this embodiment of the disclosure. As shown in Fig. 1, the method includes the following steps:
step S11, natural language text is acquired.
And step S12, dividing the natural language text into a plurality of continuous text fragments.
And S13, generating a plurality of continuous feature vectors according to the plurality of continuous text fragments, wherein the plurality of text fragments are in one-to-one correspondence with the plurality of feature vectors.
And S14, taking the feature vector as a local feature vector of the natural language text, and aggregating the local feature vectors to obtain a global feature vector of the natural language text.
And S15, classifying the natural language text according to the global feature vector to obtain a classification result.
In the embodiment of the present disclosure, the text is split into short texts, feature vectors are extracted to obtain continuous text feature vectors, and the feature vectors extracted from the short texts are aggregated into a global feature. With this scheme, in the long-text classification problem, each local feature can be effectively retained, the accuracy and the integrity of the features extracted from the text are improved, and the accuracy of classifying the natural language text is further improved. In one possible implementation, the method steps shown in Fig. 1 can implement the splitting of the text and the extraction and fusion of the feature vectors through a single model, so multiple models are not required for processing and classifying the acquired text, and the training and deployment costs are lower.
In an alternative embodiment, step S14 may include: treating each local feature vector, by analogy, as the local feature vector of an image frame; inputting the vectors into a NeXtVlad image feature extraction model; obtaining the vector output by the NeXtVlad model that would represent the global feature of a video; and taking this vector as the global feature vector of the natural language text. The NeXtVlad model is obtained through training. NeXtVlad is an algorithm from the image field that can effectively compress the local features of multi-frame images into a global feature; applied here, it can effectively aggregate the local segment features of a long text into a global feature. NeXtVlad is a lightweight algorithm with low model complexity and a small number of parameters, so the training and deployment costs are low. With this scheme, the global feature can be obtained effectively while reducing complexity, parameter count, and training and deployment costs.
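To make the analogy concrete, the following is a minimal, hypothetical PyTorch sketch of a NeXtVLAD-style aggregation layer. It is a simplification for illustration, not the patent's reference implementation; the cluster count, group count, and expansion factor are assumed values, and `dim * expansion` is assumed divisible by `groups`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeXtVLAD(nn.Module):
    """Simplified NeXtVLAD-style aggregation: compresses a variable-length
    sequence of local feature vectors into one fixed-size global vector."""
    def __init__(self, dim: int, clusters: int = 32, expansion: int = 2, groups: int = 8):
        super().__init__()
        self.groups, self.clusters = groups, clusters
        self.expand = nn.Linear(dim, dim * expansion)                # feature expansion
        self.group_dim = dim * expansion // groups                   # assumes divisibility
        self.assign = nn.Linear(dim * expansion, groups * clusters)  # soft cluster assignment
        self.gate = nn.Linear(dim * expansion, groups)               # per-group attention gate
        self.centroids = nn.Parameter(torch.randn(clusters, self.group_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_segments, dim); each row is one local feature vector,
        # analogous to the feature of one video frame.
        b, n, _ = x.shape
        x = self.expand(x)                                           # (b, n, dim*expansion)
        attn = torch.sigmoid(self.gate(x))                           # (b, n, groups)
        assign = F.softmax(
            self.assign(x).view(b, n, self.groups, self.clusters), dim=-1)
        assign = assign * attn.unsqueeze(-1)                         # gated assignment
        xg = x.view(b, n, self.groups, self.group_dim)
        # VLAD residual: sum_n a(n,k) * x(n) - (sum_n a(n,k)) * c(k)
        vlad = torch.einsum('bngk,bngd->bkd', assign, xg) \
             - assign.sum(dim=(1, 2)).unsqueeze(-1) * self.centroids
        vlad = F.normalize(vlad, dim=-1)                             # intra-normalization
        return vlad.flatten(1)                                       # (b, clusters * group_dim)
```

Segment vectors of shape (batch, num_segments, dim) go in, and a single (batch, clusters × group_dim) global vector comes out, regardless of how many segments the text was divided into.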
In another alternative embodiment, steps S11 to S15 may be performed by a text classification model. Fig. 2 provides a schematic representation of such a classification model, which comprises: a data processing layer 201, a feature characterization layer 202 connected to the data processing layer 201, a feature aggregation layer 203 connected to the feature characterization layer 202, and a classifier layer 204 connected to the feature aggregation layer 203. The classification model is obtained by training the parameters of the feature aggregation layer 203 and the parameters of the classifier layer 204, using natural language texts with classification labels as training samples.
The training process of the classification model shown in Fig. 2 will be described below using the NeXtVlad model as an example. Specifically, the training may proceed as shown in Fig. 2. The data processing layer 201 inputs the data set 205 carrying category labels into the data module 206 to obtain the split text segments 207, where the data set 205 consists of natural language texts with category labels. The feature characterization layer 202 is configured to take the text segments 207 produced by the data processing layer 201 and input them into the Bert model 208 to obtain continuous text segment vectors 209, where the continuous vectors 209 represent the local features of the text. The feature aggregation layer 203 is configured to input the continuous text segment vectors 209 obtained by the feature characterization layer 202 into the NeXtVlad model 210 to obtain a document vector 211, where the document vector 211 is the global feature aggregated from the local features represented by the continuous vectors 209 and characterizes the content of the entire text. The classifier layer 204 is configured to input the document vector 211 into the logistic regression classification model 212 to obtain a classification result 213. The classification result 213 is then input into the loss function 214, and the parameters of the NeXtVlad model 210 in the feature aggregation layer 203 and of the logistic regression classification model 212 in the classifier layer 204 are updated through back propagation. Training and validation alternate throughout the whole process: the parameters are updated during training and frozen during validation, and the precision, recall, and F1 score (F1 = 2 × precision × recall / (precision + recall)) are calculated; training is stopped when the F1 score on the validation set reaches its maximum, yielding the optimal model. The classification model has low complexity and few parameters to update during training, so the training and deployment costs are low.
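As a rough illustration of this training procedure, the sketch below wires the layers together and keeps the checkpoint with the best validation F1. It assumes a `NeXtVLAD` module like the sketch above; the helpers `encode_segments` (a frozen BERT encoder producing segment vectors), `train_loader`, `val_loader`, and `evaluate_f1` are hypothetical placeholders, not names from the patent.

```python
import torch
import torch.nn as nn

dim, num_classes = 768, 10
agg = NeXtVLAD(dim)                                   # feature aggregation layer (trainable)
clf = nn.Linear(32 * (dim * 2 // 8), num_classes)     # logistic-regression-style classifier layer
opt = torch.optim.Adam(list(agg.parameters()) + list(clf.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()                       # stands in for loss function 214

best_f1 = 0.0
for epoch in range(20):
    agg.train(); clf.train()
    for segments, labels in train_loader:             # hypothetical DataLoader of labeled texts
        feats = encode_segments(segments)             # frozen BERT: (batch, n_segments, dim)
        loss = loss_fn(clf(agg(feats)), labels)
        opt.zero_grad()
        loss.backward()                               # back propagation updates agg + clf only
        opt.step()
    f1 = evaluate_f1(agg, clf, val_loader)            # validation: no parameter updates
    if f1 > best_f1:                                  # keep the model with the best val F1
        best_f1 = f1
        torch.save({'agg': agg.state_dict(), 'clf': clf.state_dict()}, 'best.pt')
```

Only the aggregation layer and the classifier receive gradient updates, which is what keeps the number of trainable parameters, and hence the training cost, small.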
In yet another alternative embodiment, the dividing of the natural language text into a plurality of text segments in step S12 may include: determining a target division mode for the natural language text according to the text length of the natural language text, and dividing the natural language text into a plurality of continuous text segments according to the target division mode.
For example, if the number of characters in the text is less than 512, the target division mode may be determined to be division by short sentences (phrases); if the number of characters is greater than 512, the target division mode may be determined to be division by long sentences. In implementation, the periods may be used as division marks, and the text between two periods is taken as one text segment. With this scheme, the most suitable division mode can be selected for texts of different lengths, yielding a more targeted global feature.
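A minimal sketch of this length-based division, assuming the 512-character threshold from the example above; the exact delimiter sets are illustrative assumptions, not prescribed by the patent:

```python
import re

def split_by_length(text: str) -> list[str]:
    """Pick the division mode from the text length: short texts are split
    into short sentences (commas as division marks), long texts into long
    sentences (periods as division marks)."""
    if len(text) < 512:
        parts = re.split(r'[,，;；]', text)     # short-sentence (phrase) granularity
    else:
        parts = re.split(r'[.。!！?？]', text)  # long-sentence granularity
    return [p.strip() for p in parts if p.strip()]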
In yet another alternative embodiment, the dividing of the natural language text into a plurality of text segments in step S12 may include: dividing the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments, where the granularity is the sentence length of the division unit; for example, short sentences are the first granularity, long sentences the second granularity, and paragraphs the third granularity. Correspondingly, commas may be used as division marks to obtain the first set of text segments, periods as division marks to obtain the second set, and paragraph indentation as the division mark to obtain the third set. The local features of the text segments at the three granularities are acquired separately and then aggregated into a global feature. Because the text is divided multiple times, more local features are obtained, the aggregated global feature is more comprehensive, and the classification is more accurate.
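A sketch of the multi-granularity division under similar assumptions (commas, periods, and blank lines standing in for paragraph indentation as the division marks):

```python
import re

def split_multi_granularity(text: str) -> dict[str, list[str]]:
    """Divide the same text three times: at short-sentence, long-sentence,
    and paragraph granularity, returning the segments for each granularity."""
    return {
        'short_sentence': [s.strip() for s in re.split(r'[,，.。]', text) if s.strip()],
        'long_sentence':  [s.strip() for s in re.split(r'[.。]', text) if s.strip()],
        'paragraph':      [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()],
    }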
In yet another alternative embodiment, the acquiring of the natural language text in step S11 may further include: acquiring a natural language text input by a user in a chat system, and determining the chat intention of the user according to the classification result. For example, when a user consults customer service in the chat system of an electronic mall, the text input by the user can be acquired and classified to obtain the user's chat intention, for example, that the user wants to return goods or to ask about a new product, so that the system can select a more suitable human customer service agent for the user. Alternatively, in a question-and-answer community, when a user submits a question, the server acquires the natural language text input by the user and classifies it to obtain the category of the question, so that other users familiar with that category can answer it.
In an alternative embodiment, a natural language text to be audited may be acquired, and whether the natural language text meets a network publishing condition is determined according to the classification result; if it does, the natural language text is published under the column corresponding to the classification result. For example, when a user publishes a piece of text in a social application, the server acquires and classifies the submitted text: if the category characterizes the text as fraud or terrorism information, publication is refused; if the category is sports news, the text is published in the sports news column so that users following that column can see it.
Fig. 3 is a flowchart illustrating another text classification method according to an exemplary embodiment. The method may be executed by an electronic device, for example a server or a terminal, which is not limited in this embodiment of the disclosure. As shown in Fig. 3, the method includes the following steps:
Step S31: a natural language text is acquired.
Step S32: the natural language text is divided at different granularities to obtain, at each granularity, a plurality of continuous text segments.
For example, a plurality of continuous text segments are obtained by short-sentence division, a plurality by long-sentence division, and a plurality by paragraph division.
Step S33: for the plurality of continuous text segments obtained at each granularity, a plurality of continuous feature vectors are generated at that granularity.
Still taking short sentences, long sentences, and paragraphs as the different granularities, step S33 yields the feature vectors corresponding to the continuous text segments obtained by short-sentence division, by long-sentence division, and by paragraph division.
Step S34: the continuous feature vectors at each granularity are treated, by analogy, as the local feature vectors of image frames and input into a NeXtVlad image feature extraction model to obtain the fused feature vector at that granularity.
A fused feature vector is thus obtained for the text segments produced at each granularity.
Step S35: all the fused feature vectors are treated, by analogy, as the local feature vectors of image frames and input into a NeXtVlad image feature extraction model, and the output of the NeXtVlad model is taken as the global feature vector of the natural language text.
Step S36: the natural language text is classified according to the global feature vector to obtain a classification result.
With this scheme, the idea of video processing is applied: a long text is regarded as a video composed of frame-by-frame images, with each text segment corresponding to one frame, and the NeXtVlad model, mainly applied in the image field, is used to aggregate the continuous local features into a global feature representing the whole long text. For very long texts, features at different levels, such as fragments, short sentences, long sentences, and paragraphs, are aggregated together as the feature representation of the entire text for classification, so that local and global information can be used effectively.
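As a rough end-to-end sketch of steps S32 to S36 under the same assumptions (the segment encoder `encode`, the two NeXtVLAD-style aggregators `intra_agg` and `inter_agg`, and the classifier `clf` are hypothetical components reusing the sketches above, not the patent's reference implementation):

```python
import torch

def classify_long_text(text, encode, intra_agg, inter_agg, clf):
    """S32: multi-granularity division; S33: segment feature vectors;
    S34: per-granularity fusion; S35: cross-granularity global vector;
    S36: classification."""
    fused = []
    for granularity, segments in split_multi_granularity(text).items():  # S32
        local = encode(segments)           # S33: (1, n_segments, dim) local features
        fused.append(intra_agg(local))     # S34: fused feature vector at this granularity
    frames = torch.stack(fused, dim=1)     # fused vectors treated like image-frame features
    global_vec = inter_agg(frames)         # S35: global feature vector of the whole text
    return clf(global_vec).argmax(-1)      # S36: classification result
```

Aggregating twice, first within each granularity and then across granularities, mirrors how multi-frame video features are pooled into a single clip-level descriptor.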
Fig. 4 is a block diagram of a text classification device according to an exemplary embodiment. The text classification means may be implemented as part or all of the terminal by software, hardware or a combination of both. Referring to fig. 4, the apparatus includes an acquisition module 41, a division module 42, a generation module 43, an aggregation module 44, and a classification module 45.
The acquisition module 41 is configured to acquire a natural language text.
The division module 42 is configured to divide the natural language text into a plurality of continuous text segments.
The generation module 43 is configured to generate a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors.
The aggregation module 44 is configured to take each feature vector as a local feature vector of the natural language text and to aggregate the local feature vectors to obtain a global feature vector of the natural language text.
The classification module 45 is configured to classify the natural language text according to the global feature vector to obtain a classification result.
Optionally, the aggregation module 44 may be specifically configured to: treat each local feature vector, by analogy, as the local feature vector of an image frame; input the vectors into a NeXtVlad image feature extraction model; obtain the vector output by the NeXtVlad model that would represent the global feature of a video; and take this vector as the global feature vector of the natural language text. NeXtVlad is a lightweight algorithm with low model complexity and a small number of parameters, so the training and deployment costs are low.
Optionally, the division module 42 may be specifically configured to determine a target division mode for the natural language text according to the text length of the natural language text, and to divide the natural language text into a plurality of continuous text segments according to the target division mode, where the target division mode comprises one or more of division by short sentences, division by long sentences, and division by paragraphs. In this case, the most suitable division mode can be selected for texts of different lengths, yielding a more targeted global feature.
Optionally, the division module 42 may be further configured to divide the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments. Because the text is divided multiple times, more local features are obtained, the aggregated global feature is more comprehensive, and the classification is more accurate.
Optionally, the division module 42, the generation module 43, the aggregation module 44, and the classification module 45 may be implemented as a classification model comprising: a data processing layer, a feature characterization layer connected to the data processing layer, a feature aggregation layer connected to the feature characterization layer, and a classifier layer connected to the feature aggregation layer. The classification model is obtained by training the parameters of the feature aggregation layer and the parameters of the classifier layer, using natural language texts with classification labels as training samples; it has low complexity and few parameters to update during training, so the training and deployment costs are low.
Optionally, the acquisition module 41 may be specifically configured to acquire a natural language text input by a user in a chat system; further, the apparatus may determine the chat intention of the user according to the classification result. For example, when a user consults customer service in the chat system of an electronic mall, the text input by the user can be acquired and classified to obtain the user's chat intention, for example, that the user wants to return goods or to ask about a new product, so that the system can select a more suitable human customer service agent for the user. Alternatively, in a question-and-answer community, when a user submits a question, the server acquires the natural language text input by the user and classifies it to obtain the category of the question, so that other users familiar with that category can answer it.
Optionally, the acquisition module 41 may be further configured to acquire a natural language text to be audited; further, the apparatus may determine, according to the classification result, whether the natural language text meets a network publishing condition, and publish the natural language text under the column corresponding to the classification result if the condition is met. For example, when a user publishes a piece of text in a social application, the server acquires and classifies the submitted text: if the category characterizes the text as fraud or terrorism information, publication is refused; if the category is sports news, the text is published in the sports news column so that users following that column can see it.
In the embodiment of the present disclosure, the text is split into short texts, feature vectors are extracted to obtain continuous text feature vectors, and the feature vectors extracted from the short texts are aggregated into a global feature. With this scheme, in the long-text classification problem, the features of each part can be effectively retained; the scheme uses few models and does not require multiple models to process and classify the acquired text, so it is lightweight, has few parameters, and incurs low training and deployment costs.
The specific manner in which the various modules perform operations in the apparatus of the above embodiment has been described in detail in the embodiments of the method and will not be elaborated here.
An exemplary embodiment of the present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of the above-described method embodiments.
An exemplary embodiment of the present disclosure also provides an electronic device including:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the steps of the method embodiment described above.
Fig. 5 is a block diagram illustrating a structure of the above-described electronic device according to an exemplary embodiment. As shown in fig. 5, the electronic device 50 may include: a processor 51, a memory 52. The electronic device 50 may also include one or more of a multimedia component 53, an input/output (I/O) interface 54, and a communication component 55.
The processor 51 is configured to control the overall operation of the electronic device 50 to perform all or part of the steps of the text classification method described above. The memory 52 is used to store various types of data to support operation at the electronic device 50; such data may include, for example, instructions for any application or method operating on the electronic device 50, as well as application-related data such as contacts, messages, pictures, audio, and video. The memory 52 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 53 may include a screen and an audio component. The screen may be, for example, a touch screen; the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals, and the received audio signals may be further stored in the memory 52 or transmitted through the communication component 55. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 54 provides an interface between the processor 51 and other interface modules such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 55 is used for wired or wireless communication between the electronic device 50 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, or 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 55 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic device 50 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described text classification method.
The computer readable storage medium provided in the above embodiment may be the above memory 52 including program instructions executable by the processor 51 of the electronic device 50 to perform the above text classification method.
Fig. 6 is another block diagram illustrating the structure of the above-described electronic device according to an exemplary embodiment. For example, the electronic device 60 may be provided as a server. Referring to fig. 6, the electronic device 60 comprises a processor 61, which may be one or more in number, and a memory 62 for storing a computer program executable by the processor 61. The computer program stored in memory 62 may include one or more modules each corresponding to a set of instructions. Further, the processor 61 may be configured to execute the computer program to perform the text classification method described above.
In addition, the electronic device 60 may further include a power supply component 63 and a communication component 64; the power supply component 63 may be configured to perform power management of the electronic device 60, and the communication component 64 may be configured to enable wired or wireless communication of the electronic device 60. The electronic device 60 may also include an input/output (I/O) interface 65. The electronic device 60 may operate based on an operating system stored in the memory 62, such as Windows Server, Mac OS X, Unix, Linux, or the like.
The computer readable storage medium provided by the above embodiment may be the above memory 62 including program instructions executable by the processor 61 of the electronic device 60 to perform the above text classification method.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described text classification method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the embodiments described above, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, the present disclosure does not further describe various possible combinations.
Moreover, any combination of the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, and any such combination should likewise be regarded as content disclosed herein.

Claims (9)

1. A method of text classification, comprising:
acquiring a natural language text;
dividing the natural language text into a plurality of continuous text segments;
generating a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
taking each feature vector as a local feature vector of the natural language text, and aggregating the local feature vectors to obtain a global feature vector of the natural language text;
and classifying the natural language text according to the global feature vector to obtain a classification result;
wherein the taking each feature vector as a local feature vector of the natural language text and aggregating the local feature vectors to obtain a global feature vector of the natural language text comprises:
inputting each local feature vector, by analogy with the local feature vectors of image frames, into a NeXtVlad image feature extraction model, obtaining the vector output by the NeXtVlad model that represents the global feature of a video, and taking this vector as the global feature vector of the natural language text.
2. The method of claim 1, wherein the dividing the natural language text into a plurality of continuous text segments comprises:
determining a target division mode for the natural language text according to the text length of the natural language text;
dividing the natural language text into a plurality of continuous text segments according to the target division mode;
wherein the target division mode comprises one or more of division by short sentences, division by long sentences, and division by paragraphs.
3. The method of claim 1, wherein the dividing the natural language text into a plurality of continuous text segments comprises:
dividing the natural language text at different granularities to obtain, at each granularity, a plurality of continuous text segments.
4. The method according to any one of claims 1-3, wherein the steps from dividing the natural language text into a plurality of continuous text segments through obtaining the classification result are performed by a text classification model;
wherein the classification model comprises: a data processing layer, a feature characterization layer connected to the data processing layer, a feature aggregation layer connected to the feature characterization layer, and a classifier layer connected to the feature aggregation layer;
the classification model is obtained by training the parameters of the feature aggregation layer and the parameters of the classifier layer, using natural language texts with classification labels as training samples.
5. The method of claim 4, wherein the acquiring the natural language text comprises:
acquiring a natural language text input by a user in a chat system;
and the method further comprises:
determining a chat intention of the user according to the classification result.
6. The method of claim 4, wherein the acquiring the natural language text comprises:
acquiring a natural language text to be audited;
and the method further comprises:
determining, according to the classification result, whether the natural language text meets a network publishing condition;
and publishing the natural language text under a column corresponding to the classification result if the natural language text meets the network publishing condition.
7. A text classification apparatus, comprising:
an acquisition module configured to acquire a natural language text;
a division module configured to divide the natural language text into a plurality of continuous text segments;
a generation module configured to generate a plurality of continuous feature vectors according to the plurality of continuous text segments, wherein the plurality of text segments are in one-to-one correspondence with the plurality of feature vectors;
an aggregation module configured to take each feature vector as a local feature vector of the natural language text and aggregate the local feature vectors to obtain a global feature vector of the natural language text;
and a classification module configured to classify the natural language text according to the global feature vector to obtain a classification result;
wherein the aggregation module is configured to input each local feature vector, by analogy with the local feature vectors of image frames, into a NeXtVlad image feature extraction model, obtain the vector output by the NeXtVlad model that represents the global feature of a video, and take this vector as the global feature vector of the natural language text.
8. An electronic device, the electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-6.
9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1-6.
CN202010739426.4A 2020-07-28 2020-07-28 Text classification method and device, electronic equipment and computer readable storage medium Active CN112000803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010739426.4A CN112000803B (en) 2020-07-28 2020-07-28 Text classification method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112000803A CN112000803A (en) 2020-11-27
CN112000803B true CN112000803B (en) 2024-05-14

Family

ID=73462397

Country Status (1)

Country Link
CN (1) CN112000803B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590813A (en) * 2021-01-20 2021-11-02 腾讯科技(深圳)有限公司 Text classification method, recommendation device and electronic equipment
CN112836049B (en) * 2021-01-28 2023-04-07 杭州网易智企科技有限公司 Text classification method, device, medium and computing equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2019052403A1 (en) * 2017-09-12 2019-03-21 腾讯科技(深圳)有限公司 Training method for image-text matching model, bidirectional search method, and related apparatus
CN110334110A (en) * 2019-05-28 2019-10-15 平安科技(深圳)有限公司 Natural language classification method, device, computer equipment and storage medium
CN110334705A (en) * 2019-06-25 2019-10-15 华中科技大学 A kind of Language Identification of the scene text image of the global and local information of combination
CN111428026A (en) * 2020-02-20 2020-07-17 西安电子科技大学 Multi-label text classification processing method and system and information data processing terminal

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20200184016A1 (en) * 2018-12-10 2020-06-11 Government Of The United States As Represetned By The Secretary Of The Air Force Segment vectors

Non-Patent Citations (2)

Title
A Chinese short-text classification model based on LSTM-CNN; 杜雪嫣, 王秋实, 王斌君; Journal of Jiangsu Police Institute (01); full text *
A person-relation extraction method based on the multi-head attention mechanism; 夏鹤珑, 严丹丹; Journal of Chengdu Technological University (01); full text *

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant