CN111078878B - Text processing method, device, equipment and computer readable storage medium

Info

Publication number: CN111078878B
Application number: CN201911239505.2A
Authority: CN
Inventors: 石逸轩; 戴明洋; 潘剑飞; 周俊; 罗程亮; 许金泉; 姚远
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2023-07-04
Anticipated expiration: 2039-12-06
Also published as: CN111078878A

Abstract

The disclosure discloses a text processing method, a device, equipment and a computer readable storage medium, relating to the field of text processing. The specific implementation scheme is as follows: acquiring data to be classified input by a terminal device, wherein the data to be classified comprises a text to be classified and an identifier of a user inputting the text to be classified; obtaining user characteristics corresponding to the user according to the user identification, and carrying out vectorization processing on the text to be classified and the user characteristics to obtain vector information to be processed; processing the vector information to be processed by adopting a preset feature extraction model to obtain feature information corresponding to the vector information to be processed; and classifying the characteristic information through a cascade classifier to obtain the category information corresponding to the text to be classified. Therefore, factors of user characteristics can be considered in the classification process, and the accuracy of text classification is improved.

Description

Text processing method, device, equipment and computer readable storage medium

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a text processing technique.

Background

When analyzing content generated by a user, a class of problems is often encountered, requiring hierarchical subject classification of the text content generated by the user. In practical applications, this task is applied in many business scenarios, such as post classification, question answering, advertisement delivery, search result organization, etc.

In order to realize classification of content data, a classification tree structure is generally constructed in advance in the prior art, different classification models are respectively constructed aiming at leaf nodes of the tree structure, and each classification model is adopted to classify the content data.

However, text content produced by users generally differs greatly from standard natural language: the wording is more casual and the out-of-vocabulary (OOV) phenomenon is more serious, so understanding the text depends more heavily on user information. Therefore, when content data is classified by the above method, such content data cannot be classified accurately.

Disclosure of Invention

The present disclosure provides a text processing method, apparatus, device, and computer readable storage medium for solving the problem that when the existing text processing method classifies content data, such content data cannot be accurately classified.

In a first aspect, an embodiment of the present disclosure provides a text processing method, including:

acquiring data to be classified input by a terminal device, wherein the data to be classified comprises a text to be classified and an identifier of a user inputting the text to be classified;

obtaining user characteristics corresponding to the user according to the user identification, and carrying out vectorization processing on the text to be classified and the user characteristics to obtain vector information to be processed;

processing the vector information to be processed by adopting a preset feature extraction model to obtain feature information corresponding to the vector information to be processed;

and classifying the characteristic information through a cascade classifier to obtain the category information corresponding to the text to be classified.

According to the text processing method provided by this embodiment, user features that represent the user's habitual characteristics when publishing text are added during feature extraction, so that the user features are taken into account in the classification process and the accuracy of text classification is improved.

In one possible design, after the obtaining the data to be classified input by the terminal device, the method further includes:

performing word segmentation, punctuation removal and encoding on the text to be classified to obtain a preprocessed text to be classified;

Accordingly, the vectorizing the text to be classified and the user features includes:

and vectorizing the preprocessed text to be classified and the user features.

In one possible design, the vectorizing the text to be classified and the user features includes:

and vectorizing the text to be classified and the user features by means of Embedding.

According to the text processing method, vectorization processing is carried out on the text to be classified and the user characteristics in an Embedding mode, so that basic granularity vector representation of the text to be classified can be accurately obtained.

In one possible design, the cascade classifier includes a multi-layer classifier, and the classifying the feature information by the cascade classifier includes:

and sequentially inputting the characteristic information and the classification result output by the classifier of the upper layer into the classifier of the lower layer, and taking the result output by the classifier of the last layer as the class information corresponding to the text to be classified.

According to the text processing method provided by this embodiment, the output of the previous-layer classifier is input into the next-layer classifier together with the feature information, so that the next-layer classifier can reclassify the feature information among the subcategories under that classification result, which effectively improves classification efficiency and classification accuracy.

In one possible design, the sequentially inputting the feature information and the classification result output by the previous layer of classifier into the next layer of classifier, and taking the result output by the last layer of classifier as the category information corresponding to the text to be classified includes:

inputting the characteristic information into a preset first layer classifier to obtain a first class identifier corresponding to the characteristic information;

inputting the characteristic information and the first class identifier into a preset second-layer classifier, wherein the second-layer classifier is used for classifying the characteristic information among the subcategories under the first class identifier to obtain a second class identifier corresponding to the characteristic information, and associating the first class identifier with the second class identifier to obtain a target class identifier;

and judging whether other subcategories are included under the second category identification, if so, inputting the target category identification and the characteristic information into a classifier of the next layer to perform classification operation until the category information output by the classifier does not include other subcategories.

In one possible design, after the classifying operation is performed on the feature information by using the cascade classifier to obtain the category information corresponding to the text to be classified, the method further includes:

and storing the text to be classified into a storage path corresponding to the category information according to the category information corresponding to the text to be classified.

According to the text processing method, the text to be classified is stored in the storage path corresponding to the category information according to the category information corresponding to the text to be classified, so that the text to be classified can be conveniently applied after being classified.

In a second aspect, an embodiment of the present disclosure provides a text processing apparatus, including:

the acquisition module is used for acquiring data to be classified input by a terminal device, wherein the data to be classified comprises a text to be classified and an identifier of a user inputting the text to be classified;

The vectorization processing module is used for acquiring user characteristics corresponding to the user according to the user identification, vectorizing the text to be classified and the user characteristics, and acquiring vector information to be processed;

the feature extraction module is used for processing the vector information to be processed by adopting a preset feature extraction model to obtain feature information corresponding to the vector information to be processed;

and the classification module is used for performing classification operation on the characteristic information through a cascade classifier to obtain category information corresponding to the text to be classified.

In one possible design, the apparatus further comprises:

the preprocessing module is used for carrying out word segmentation, punctuation mark removal and coding processing on the text to be classified to obtain preprocessed text to be classified;

accordingly, the vectorization processing module is configured to:

and vectorizing the preprocessed text to be classified and the user features.

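In one possible design, the vectorization processing module is configured to:

vectorize the text to be classified and the user features by means of Embedding.

In one possible design, the cascade classifier includes a multi-layer classifier, and the classification module is configured to:

sequentially input the feature information and the classification result output by the previous-layer classifier into the next-layer classifier, and take the result output by the last-layer classifier as the category information corresponding to the text to be classified.

In one possible design, the classification module is configured to:

input the feature information into a preset first-layer classifier to obtain a first class identifier corresponding to the feature information;

input the feature information and the first class identifier into a preset second-layer classifier, wherein the second-layer classifier is used for classifying the feature information among the subcategories under the first class identifier to obtain a second class identifier corresponding to the feature information, and associate the first class identifier with the second class identifier to obtain a target class identifier;

and determine whether other subcategories are included under the second class identifier, and if so, input the target class identifier and the feature information into the next-layer classifier for classification until the category information output by the classifier does not include other subcategories.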

In one possible design, the apparatus further comprises:

and the processing module is used for storing the text to be classified into a storage path corresponding to the category information according to the category information corresponding to the text to be classified.

In a third aspect, an embodiment of the present disclosure provides a text processing apparatus, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of the first aspect.

In a fifth aspect, an embodiment of the present disclosure provides a text processing method, including:

obtaining data to be classified, wherein the data to be classified comprises a text to be classified and an identifier of a user inputting the text to be classified;

obtaining user features corresponding to the user according to the identifier of the user, and vectorizing the text to be classified and the user features to obtain vector information to be processed;

processing the vector information to be processed to obtain feature information corresponding to the vector information to be processed;

and classifying the feature information to obtain the category information corresponding to the text to be classified.

According to the text processing method, device, equipment and computer readable storage medium provided by the present disclosure, user features that represent the user's habitual characteristics when publishing text are added during feature extraction, so that the user features are taken into account in the classification process and the accuracy of text classification is improved.

Other effects of the above alternative will be described below in connection with specific embodiments.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a diagram of a system architecture upon which the present disclosure is based;

FIG. 2 is a flow chart of a text processing method according to a first embodiment of the present disclosure;

FIG. 3 is a category organization structure provided by an embodiment of the present disclosure;

FIG. 4 is a flow chart of a text processing method according to a second embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a text processing device according to a third embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a text processing device according to a fourth embodiment of the present disclosure;

FIG. 7 is a flow chart of a text processing method according to a fifth embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In order to solve the problem that when the existing text processing method classifies content data, the content data cannot be accurately classified, the disclosure provides a text processing method, a text processing device, text processing equipment and a computer readable storage medium. Because the existing text processing method does not consider personalized information of the user, the classification result is inaccurate, and therefore, in order to improve the accuracy of the classification result, the user characteristic information can be added in the characteristic extraction process.

It should be noted that the text processing method, apparatus, device and computer readable storage medium provided in the present disclosure can be applied to any scene of text classification.

FIG. 1 is a diagram of the system architecture on which the present disclosure is based. As shown in FIG. 1, the system architecture includes at least a plurality of terminal devices 1 and a text processing device 2, where the text processing device 2 may be written in a language such as C/C++, Java, shell or Python, and the terminal device 1 may be, for example, a desktop computer or a tablet computer. The terminal device 1 is communicatively connected to the text processing device 2, so that the two can exchange information.

Fig. 2 is a flow chart of a text processing method according to an embodiment of the disclosure, as shown in fig. 2, where the method includes:

Step 101, obtaining data to be classified input by a terminal device, wherein the data to be classified comprises a text to be classified and an identifier of a user inputting the text to be classified.

The execution main body of the embodiment is a text processing device, and the text processing device is in communication connection with the terminal equipment, so that information interaction with the terminal equipment can be performed. The terminal device can acquire data to be classified which needs to be classified. Specifically, the user can publish text content on the terminal equipment, and correspondingly, after receiving the text content, the terminal equipment can send the text content to the text processing device in real time for classification processing; optionally, the text processing device may also acquire text content published by the user from the terminal device according to a preset period, and perform a classification operation on the text content. Accordingly, the text processing device may obtain the data to be classified from the terminal device.

It should be noted that, since the text content produced by the user generally has a larger difference from the natural language, the language used is more random and depends on the user information, in order to improve the accuracy of classifying the text to be classified, the text to be classified can also carry the identification of the user who publishes the text to be classified.

Step 102, obtaining user characteristics corresponding to the user according to the user identification, and carrying out vectorization processing on the text to be classified and the user characteristics to obtain vector information to be processed.

In this embodiment, in order to obtain the user features, a database containing a large number of user features may be established in advance, where the user features characterize, among other things, the user's habitual characteristics when publishing text information. Accordingly, after the user identifier is obtained, the user features corresponding to the user can be retrieved from the database according to the user identifier.
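
The lookup described above can be pictured as a keyed read from a pre-built store. The following Python sketch is illustrative only: the in-memory dictionary `USER_FEATURE_DB`, the feature names, and the fallback to an empty feature list for unknown users are assumptions made for the example, not details given in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataToClassify:
    """Data received from a terminal device: the text plus the author's identifier."""
    text: str
    user_id: str


# Hypothetical pre-built database of user features, keyed by user identifier.
USER_FEATURE_DB: Dict[str, List[str]] = {
    "u1001": ["often_posts_about_digital", "uses_abbreviations"],
}


def get_user_features(user_id: str, db: Dict[str, List[str]] = USER_FEATURE_DB) -> List[str]:
    """Return the stored features for `user_id`; an empty list is an assumed fallback."""
    return list(db.get(user_id, []))


record = DataToClassify(text="new full-screen phone looks great", user_id="u1001")
features = get_user_features(record.user_id)
```

In a deployed system the dictionary would presumably be replaced by a real database or feature service; the disclosure does not specify which.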

After the text to be classified and the user characteristics are obtained, the feature extraction operation can be performed on the text to be classified and the user characteristics. Accordingly, before extracting the features, in order to facilitate model processing, vectorization processing can be performed on the text to be classified and the user features, so as to obtain the text to be classified and the vector information to be processed corresponding to the user features.

Specifically, based on the above embodiment, step 102 specifically includes:

vectorizing the text to be classified and the user features by means of Embedding.

In this embodiment, vectorization processing may be performed on the text to be classified and the user features by means of Embedding, so as to obtain a basic-granularity vector representation of the text to be classified. The basic granularity may be character granularity or word granularity. That is, the text to be classified may first be segmented into words, and each word obtained by segmentation may be vectorized to obtain a corresponding word vector; alternatively, the text to be classified may be vectorized directly, without word segmentation, to obtain character vectors. The present disclosure is not limited in this regard.
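
As a rough illustration of the two granularities, the Python sketch below maps tokens to vectors through a lookup table. It rests on several assumptions that are not in the disclosure: the class `EmbeddingTable`, the dimension of 4, seeded random initialization in place of learned embeddings, whitespace splitting as a stand-in for a real word segmenter, and appending the user-feature vectors after the text vectors.

```python
import random
from typing import Dict, List

EMBED_DIM = 4  # assumed dimension, chosen only to keep the example small


class EmbeddingTable:
    """Maps each token to a fixed-size vector, creating a random one on first use."""

    def __init__(self, dim: int = EMBED_DIM, seed: int = 0) -> None:
        self.dim = dim
        self._rng = random.Random(seed)
        self._vectors: Dict[str, List[float]] = {}

    def lookup(self, token: str) -> List[float]:
        if token not in self._vectors:
            self._vectors[token] = [self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self._vectors[token]


def tokenize(text: str, granularity: str = "char") -> List[str]:
    """Split text into characters, or into words (whitespace split as a stand-in)."""
    if granularity == "char":
        return [ch for ch in text if not ch.isspace()]
    if granularity == "word":
        return text.split()
    raise ValueError(f"unknown granularity: {granularity}")


def vectorize(text: str, user_features: List[str], table: EmbeddingTable,
              granularity: str = "char") -> List[List[float]]:
    """Return the token vectors followed by one vector per user feature."""
    text_vectors = [table.lookup(tok) for tok in tokenize(text, granularity)]
    user_vectors = [table.lookup("user:" + name) for name in user_features]
    return text_vectors + user_vectors


table = EmbeddingTable()
word_level = vectorize("full screen", ["uses_abbreviations"], table, granularity="word")
char_level = vectorize("ab c", [], table, granularity="char")
```

Here `word_level` holds two word vectors plus one user-feature vector, while `char_level` holds one vector per non-space character.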

Step 103, processing the vector information to be processed by adopting a preset feature extraction model to obtain feature information corresponding to the vector information to be processed.

In this embodiment, after the vector information to be processed corresponding to the text to be classified and the user features is obtained, the feature information of that vector information can be obtained. Specifically, the vector information to be processed can be processed by a preset feature extraction model to obtain the corresponding feature information. Any model capable of feature extraction may be used, for example CNN, RNN, LSTM or Transformer, which is not limited in this disclosure.

As an implementation manner, since different network models have different strengths, after the data to be classified is received, the characteristics of the text to be classified can be judged first, and different network models can be used for feature extraction depending on those characteristics. For example, CNNs are good at extracting textual relationships within neighboring windows; the LSTM can capture dependency information in long sentences; the Transformer is suitable for Seq2Seq tasks, and the BERT model, which adopts a bidirectional Transformer structure, has made breakthrough progress on a number of NLP tasks.
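
To make the "neighboring window" idea concrete, the following Python sketch slides a small kernel over a sequence of token vectors and keeps the strongest response per kernel (convolution followed by max-over-time pooling). It is a toy stand-in for a CNN feature extractor, not the model used in the disclosure; the function names and the hand-written kernels are assumptions, and a trained model would learn its kernels instead.

```python
from typing import List, Sequence

Vector = Sequence[float]


def conv1d_max_pool(vectors: Sequence[Vector], kernel: Sequence[Vector]) -> float:
    """Slide `kernel` (one weight vector per window position) over `vectors`
    and return the largest window response."""
    window = len(kernel)
    if len(vectors) < window:
        raise ValueError("sequence is shorter than the kernel window")
    responses = []
    for start in range(len(vectors) - window + 1):
        total = 0.0
        for offset in range(window):
            weights = kernel[offset]
            token = vectors[start + offset]
            total += sum(w * x for w, x in zip(weights, token))
        responses.append(total)
    return max(responses)


def extract_features(vectors: Sequence[Vector], kernels: Sequence[Sequence[Vector]]) -> List[float]:
    """Produce one pooled feature per kernel."""
    return [conv1d_max_pool(vectors, kernel) for kernel in kernels]


# Three 2-dimensional token vectors and two hand-written kernels of window size 2.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
feature_info = extract_features(token_vectors, kernels)
```

The resulting list plays the role of the feature information passed on to the classifier.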

Step 104, classifying the characteristic information through a cascade classifier to obtain the category information corresponding to the text to be classified.

In this embodiment, FIG. 3 shows a category organization structure provided by an embodiment of the present disclosure. As shown in FIG. 3, the classification process involves multiple category levels: for example, under the first-level category "science and technology" there are second-level categories (digital, internet, mathematics, physics, etc.), third-level categories (television, mobile phone, etc.) and fourth-level categories (full-screen mobile phone, non-full-screen mobile phone, etc.). Therefore, in order to classify the feature information accurately, a preset cascade classifier can be used to classify the feature information, so that the category information corresponding to the text to be classified is obtained.

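Further, on the basis of any of the above embodiments, after step 104, the method further includes:

storing the text to be classified into a storage path corresponding to the category information according to the category information corresponding to the text to be classified.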

In this embodiment, after the text to be classified is subjected to the classification operation, it may be stored in a storage path corresponding to the category information. When the text information corresponding to a certain category is called, all the text information can be directly obtained from the storage path corresponding to the category.
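
One way to picture category-based storage is a directory per category level, as in the Python sketch below. The directory layout, the `.txt` files, and the function names are assumptions made for illustration; the disclosure only requires that each category have a corresponding storage path.

```python
from pathlib import Path
from typing import List, Sequence


def storage_path_for(root: Path, category_levels: Sequence[str]) -> Path:
    """Map a category such as ['science and technology', 'digital'] to a directory."""
    return Path(root).joinpath(*category_levels)


def store_text(root: Path, category_levels: Sequence[str], text_id: str, text: str) -> Path:
    """Write the classified text under the directory of its category."""
    directory = storage_path_for(root, category_levels)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{text_id}.txt"
    target.write_text(text, encoding="utf-8")
    return target


def load_category(root: Path, category_levels: Sequence[str]) -> List[str]:
    """Read back every text stored directly under one category's storage path."""
    directory = storage_path_for(root, category_levels)
    return [p.read_text(encoding="utf-8") for p in sorted(directory.glob("*.txt"))]
```

Calling `load_category` with a category then returns all texts stored for it, which corresponds to fetching every text of a category directly from its storage path.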

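Further, on the basis of any one of the above embodiments, after step 101, the method further includes:

performing word segmentation, punctuation removal and encoding on the text to be classified to obtain a preprocessed text to be classified;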

accordingly, step 102 specifically includes:

and vectorizing the preprocessed text to be classified and the user features.

In this embodiment, in order to improve the classification efficiency of the text to be classified, the text to be classified may be preprocessed before the classification operation is performed. Specifically, the preprocessed text to be classified can be obtained by performing word segmentation, punctuation removal, encoding and the like on the text to be classified. Accordingly, vectorization processing can be performed on the user features and the preprocessed text to be classified to obtain the vector information to be processed.

According to the text processing method, the text to be classified is preprocessed before being classified, so that useless characters and the like in the text to be classified can be removed, and the classification efficiency of the text to be classified is improved.
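
The three preprocessing operations can be sketched in Python as follows. Two points are assumptions rather than details from the disclosure: whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated one), and "encoding" is interpreted here as mapping each token to an integer id through a growing vocabulary.

```python
import unicodedata
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Word segmentation; whitespace splitting is only a placeholder."""
    return text.split()


def strip_punctuation(tokens: List[str]) -> List[str]:
    """Remove Unicode punctuation characters and drop tokens that become empty."""
    cleaned = []
    for token in tokens:
        kept = "".join(ch for ch in token if not unicodedata.category(ch).startswith("P"))
        if kept:
            cleaned.append(kept)
    return cleaned


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to integer ids, assigning the next free id to unseen tokens."""
    return [vocab.setdefault(token, len(vocab)) for token in tokens]


def preprocess(text: str, vocab: Dict[str, int]) -> List[int]:
    """Segment, remove punctuation, then encode."""
    return encode(strip_punctuation(segment(text)), vocab)


vocab: Dict[str, int] = {}
encoded = preprocess("Hello, world! Hello?", vocab)
```

After this call, `encoded` contains the ids of the three remaining tokens and `vocab` holds the two distinct tokens seen so far.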

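Further, on the basis of any one of the foregoing embodiments, the cascade classifier includes a multi-layer classifier, and step 104 specifically includes:

sequentially inputting the feature information and the classification result output by the previous-layer classifier into the next-layer classifier, and taking the result output by the last-layer classifier as the category information corresponding to the text to be classified.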

In this embodiment, since the classification process includes a plurality of different category levels, a cascade classifier including a multi-layer classifier may be used to classify the feature information. Specifically, in order to improve classification efficiency and classification accuracy, the classification result output by the previous layer may be input to the next-layer classifier together with the feature information, so that the next-layer classifier can reclassify the feature information among the subcategories under that classification result. For example, if the classification result output by the first-layer classifier is "science and technology", the "science and technology" label and the feature information can be input into the next-layer classifier, and accordingly the next-layer classifier can classify the feature information among the subcategories "digital", "internet", "mathematics", "physics", etc. under "science and technology". These steps are performed for each layer of classifier, and the classification result output by the last-layer classifier is taken as the category information corresponding to the text to be classified. If the current classifier is the first classifier in the cascade classifier, only the feature information is input into it; if the current classifier is the n-th classifier (n > 1) in the cascade classifier, the classification result of the previous classifier and the feature information are input into it together.
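
The following Python sketch illustrates how passing the previous layer's result restricts the next layer to the relevant subcategories. It is a toy under stated assumptions: each layer is a linear scorer with hand-written weights, the `CHILDREN` table and the "sports" branch are hypothetical, and `None` is used to mark the first layer, which receives only the feature information.

```python
from typing import Callable, Dict, List, Optional, Sequence

Features = Sequence[float]
Layer = Callable[[Features, Optional[str]], str]

# Hypothetical category table: parent label -> subcategories (None marks the top level).
CHILDREN: Dict[Optional[str], List[str]] = {
    None: ["science and technology", "sports"],
    "science and technology": ["digital", "internet", "mathematics", "physics"],
    "sports": ["football", "basketball"],
}


def dot(weights: Features, features: Features) -> float:
    return sum(w * x for w, x in zip(weights, features))


def make_layer(weights: Dict[str, Features]) -> Layer:
    """Build a layer that scores only the subcategories of the previous result."""
    def classify(features: Features, previous: Optional[str]) -> str:
        candidates = CHILDREN[previous]
        return max(candidates, key=lambda label: dot(weights[label], features))
    return classify


def run_cascade(features: Features, layers: Sequence[Layer]) -> str:
    """Feed the features plus the previous layer's output into each layer in turn."""
    previous: Optional[str] = None
    for layer in layers:
        previous = layer(features, previous)
    return previous


layer1 = make_layer({"science and technology": [1.0, 0.0], "sports": [0.0, 1.0]})
layer2 = make_layer({
    "digital": [0.9, 0.0], "internet": [0.5, 0.0],
    "mathematics": [0.1, 0.0], "physics": [0.0, 0.0],
    "football": [2.0, 0.0], "basketball": [0.0, 0.0],
})
result = run_cascade([1.0, 0.0], [layer1, layer2])
```

Note that "football" has the highest raw score in the second layer but is never considered, because the first layer's result limits the candidates to the subcategories of "science and technology".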

It should be noted that in the prior art, a classification tree structure is generally constructed in advance, different classification models are respectively constructed for leaf nodes of the tree structure, and each classification model is adopted to classify content data. However, classification of content data using the above method relies on the fact that a model needs to be trained for each sub-class for each layer to solve the sub-class classification problem. If the topic tree structure is deep, the models which cover each sub-category are difficult to train, so that the classification efficiency is seriously affected.

Fig. 4 is a schematic flow chart of a text processing method according to a second embodiment of the present disclosure, where, based on any one of the embodiments described above, as shown in fig. 4, the steps of sequentially inputting the feature information and the classification result output by the previous layer of classifier into the next layer of classifier, and taking the result output by the last layer of classifier as the classification information corresponding to the text to be classified include:

Step 201, inputting the characteristic information into a preset first layer classifier to obtain a first class identifier corresponding to the characteristic information;

Step 202, inputting the characteristic information and the first class identifier into a preset second-layer classifier, wherein the second-layer classifier is used for classifying the characteristic information among the subcategories under the first class identifier to obtain a second class identifier corresponding to the characteristic information, and associating the first class identifier with the second class identifier to obtain a target class identifier;

Step 203, judging whether other subcategories are included under the second category identification, and if so, inputting the target category identification and the characteristic information into the classifier of the next layer to perform the classification operation, until the category information output by the classifier does not include other subcategories.

In this embodiment, after the feature information is obtained, the feature information may be input into a preset first layer classifier, a first class identifier corresponding to the feature information is obtained, and the first class identifier and the feature information are input into a second layer classifier together, so that the second layer classifier can perform a classification operation on the feature information in multiple sub-classes under the first class identifier, and a second class identifier corresponding to the feature information is obtained. And associating the first category identifier with the second category identifier to obtain the target category identifier. And determining whether other subcategories are included under the second category identification, if so, continuing to adopt a subsequent classifier to classify the characteristic information, and if not, taking the second category identification as the category information corresponding to the text to be classified.

For example, suppose the category information output by the first-layer classifier is "science and technology". Because this category has a plurality of subcategories, the "science and technology" label and the feature information are input into the second-layer classifier together, and the second-layer classifier classifies the feature information among the subcategories "digital", "internet", "mathematics" and "physics" under "science and technology", obtaining the classification result "digital". The feature information is then classified among the subcategories under "digital", obtaining the classification result "mobile phone", and classification continues among the subcategories under "mobile phone", obtaining "full-screen mobile phone". Correspondingly, the category identifiers are associated to obtain the final category information "science and technology-digital-mobile phone-full-screen mobile phone".
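
Steps 201 to 203 amount to walking down the category tree until a category without subcategories is reached, while associating the identifiers chosen along the way. The Python sketch below shows that control flow under stated assumptions: the nested dictionary `CATEGORY_TREE` only mirrors the example categories named in the text, and `pick_by_score` is a stub standing in for the trained per-layer classifiers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Example categories from the text; an empty dict marks a category with no subcategories.
CATEGORY_TREE: Dict[str, dict] = {
    "science and technology": {
        "digital": {
            "television": {},
            "mobile phone": {
                "full-screen mobile phone": {},
                "non-full-screen mobile phone": {},
            },
        },
        "internet": {},
        "mathematics": {},
        "physics": {},
    },
}

# A level classifier receives the features, the target identifier so far,
# and the candidate subcategories, and returns one candidate.
LevelClassifier = Callable[[Dict[str, float], Tuple[str, ...], Sequence[str]], str]


def classify_to_leaf(features: Dict[str, float], classify_level: LevelClassifier,
                     tree: Dict[str, dict] = CATEGORY_TREE) -> List[str]:
    """Descend layer by layer until the chosen category has no subcategories."""
    path: List[str] = []
    node = tree
    while node:  # a non-empty dict means further subcategories exist
        label = classify_level(features, tuple(path), sorted(node))
        path.append(label)  # associate the new identifier with the previous ones
        node = node[label]
    return path


def pick_by_score(features: Dict[str, float], target_so_far: Tuple[str, ...],
                  candidates: Sequence[str]) -> str:
    """Stub classifier: choose the candidate with the highest precomputed score."""
    return max(candidates, key=lambda label: features.get(label, 0.0))


scores = {
    "science and technology": 0.9, "digital": 0.8,
    "mobile phone": 0.7, "television": 0.2,
    "full-screen mobile phone": 0.6,
}
target_path = classify_to_leaf(scores, pick_by_score)
```

Joining `target_path` with a separator yields an associated identifier of the kind shown in the example above.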

FIG. 5 is a schematic structural diagram of a text processing device according to a third embodiment of the present disclosure. As shown in FIG. 5, the text processing device 30 includes an acquisition module 31, a vectorization processing module 32, a feature extraction module 33 and a classification module 34. The acquisition module 31 is configured to acquire data to be classified input by a terminal device, where the data to be classified includes a text to be classified and an identifier of a user who inputs the text to be classified; the vectorization processing module 32 is configured to obtain user features corresponding to the user according to the identifier of the user, and to vectorize the text to be classified and the user features to obtain vector information to be processed; the feature extraction module 33 is configured to process the vector information to be processed by using a preset feature extraction model to obtain feature information corresponding to the vector information to be processed; and the classification module 34 is configured to classify the feature information through a cascade classifier to obtain category information corresponding to the text to be classified.
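
The module structure can be read as a four-stage pipeline. The Python sketch below wires four callables together in that order; the class name `TextProcessingDevice`, the callable signatures and the stub stages (including the placeholder labels) are assumptions chosen only to show the data flow between modules 31 to 34.

```python
from typing import Any, Callable, List, Tuple


class TextProcessingDevice:
    """Composes the four modules; each stage is supplied as a callable."""

    def __init__(self,
                 acquire: Callable[[Any], Tuple[str, str]],
                 vectorize: Callable[[str, str], List[float]],
                 extract: Callable[[List[float]], List[float]],
                 classify: Callable[[List[float]], str]) -> None:
        self.acquire = acquire        # acquisition module 31
        self.vectorize = vectorize    # vectorization processing module 32
        self.extract = extract        # feature extraction module 33
        self.classify = classify      # classification module 34

    def process(self, raw: Any) -> str:
        text, user_id = self.acquire(raw)
        vectors = self.vectorize(text, user_id)
        features = self.extract(vectors)
        return self.classify(features)


# Stub stages, used only to demonstrate the wiring.
device = TextProcessingDevice(
    acquire=lambda raw: (raw["text"], raw["user_id"]),
    vectorize=lambda text, user_id: [float(len(text)), float(len(user_id))],
    extract=lambda vectors: [sum(vectors)],
    classify=lambda feats: "category A" if feats[0] > 10 else "category B",
)
category = device.process({"text": "hello world", "user_id": "u1"})
```

Keeping each stage behind a plain callable makes it straightforward to swap, for example, one feature extraction model for another without touching the other modules.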

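Further, on the basis of the third embodiment, the apparatus further includes:

a preprocessing module, configured to perform word segmentation, punctuation removal and encoding on the text to be classified to obtain a preprocessed text to be classified;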

accordingly, the vectorization processing module is configured to:

and vectorizing the preprocessed text to be classified and the user features.

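Further, on the basis of any one of the foregoing embodiments, the vectorization processing module is configured to:

vectorize the text to be classified and the user features by means of Embedding.

Further, on the basis of any one of the foregoing embodiments, the cascade classifier includes a multi-layer classifier, and the classification module is configured to:

sequentially input the feature information and the classification result output by the previous-layer classifier into the next-layer classifier, and take the result output by the last-layer classifier as the category information corresponding to the text to be classified.

Further, on the basis of any one of the above embodiments, the classification module is configured to:

input the feature information into a preset first-layer classifier to obtain a first class identifier corresponding to the feature information;

input the feature information and the first class identifier into a preset second-layer classifier, wherein the second-layer classifier is used for classifying the feature information among the subcategories under the first class identifier to obtain a second class identifier corresponding to the feature information, and associate the first class identifier with the second class identifier to obtain a target class identifier;

and determine whether other subcategories are included under the second class identifier, and if so, input the target class identifier and the feature information into the next-layer classifier for classification until the category information output by the classifier does not include other subcategories.

Further, on the basis of any one of the foregoing embodiments, the apparatus further includes:

a processing module, configured to store the text to be classified into a storage path corresponding to the category information according to the category information corresponding to the text to be classified.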

According to an embodiment of the present disclosure, the present disclosure also provides a text processing apparatus and a readable storage medium.

FIG. 6 is a schematic structural diagram of a text processing device according to a fourth embodiment of the present disclosure; it is a block diagram of a text processing device for implementing the text processing method of the embodiments of the present disclosure. Text processing devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The text processing device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 6, the text processing device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the text processing device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple text processing devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 601 is illustrated in FIG. 6.

The memory 602 is a non-transitory computer-readable storage medium provided by the present disclosure. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the text processing method provided by the present disclosure. The non-transitory computer-readable storage medium of the present disclosure stores computer instructions for causing a computer to perform the text processing method provided by the present disclosure.

As a non-transitory computer-readable storage medium, the memory 602 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the text processing method in the embodiments of the present disclosure (e.g., the acquisition module 31, the vectorization processing module 32, the feature extraction module 33, and the classification module 34 shown in fig. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, i.e., implements the text processing method in the above-described method embodiments.

The memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the text processing device, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, and such remote memory may be connected to the text processing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The text processing device for implementing the text processing method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 6.

The input device 603, such as a touch screen, keypad, mouse, trackpad, touchpad, pointer stick, one or more mouse buttons, trackball, joystick, or similar input device, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the text processing device. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Fig. 7 is a flow chart of a text processing method provided in a fifth embodiment of the present disclosure. As shown in fig. 7, the method includes:

Step 501, obtaining data to be classified, wherein the data to be classified comprises a text to be classified and an identifier of a user inputting the text to be classified;

Step 502, obtaining user features corresponding to the user according to the identifier of the user, and vectorizing the text to be classified and the user features to obtain vector information to be processed;

Step 503, processing the vector information to be processed to obtain feature information corresponding to the vector information to be processed;

Step 504, classifying the feature information to obtain the category information corresponding to the text to be classified.
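As a non-limiting illustration, steps 501 to 504 can be sketched in Python as follows. The fixed vocabulary, the user feature table, the L2 normalization standing in for the feature extraction step, and the dot-product prototype classifier are all assumptions made for this sketch; the present disclosure does not specify them, and every name below is hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical fixed vocabulary used for bag-of-words vectorization (assumption).
VOCAB = ["match", "goal", "team", "stock", "market", "fund"]
VOCAB_INDEX = {word: i for i, word in enumerate(VOCAB)}

# Hypothetical user-feature store keyed by user identifier (assumption):
# two interest weights per user, (sports, finance).
USER_FEATURES = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}


@dataclass
class DataToClassify:
    """Step 501: the text to be classified plus the identifier of its author."""
    text: str
    user_id: str


def vectorize(text, user_features):
    """Step 502: count vocabulary words, then append the user features."""
    counts = [0.0] * len(VOCAB)
    for token in text.lower().split():
        if token in VOCAB_INDEX:
            counts[VOCAB_INDEX[token]] += 1.0
    return counts + list(user_features)


def extract_features(vector):
    """Step 503: stand-in for a feature extraction model (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def classify(features, prototypes):
    """Step 504: return the category whose prototype has the largest dot product."""
    def score(label):
        return sum(a * b for a, b in zip(features, prototypes[label]))
    return max(prototypes, key=score)


def process(data, prototypes):
    """Run steps 502 to 504 on data obtained in step 501."""
    user_features = USER_FEATURES.get(data.user_id, [0.0, 0.0])
    vector = vectorize(data.text, user_features)
    return classify(extract_features(vector), prototypes)


# Hypothetical category prototypes built with the same vectorization and extraction.
PROTOTYPES = {
    "sports": extract_features(vectorize("match goal team", [1.0, 0.0])),
    "finance": extract_features(vectorize("stock market fund", [0.0, 1.0])),
}

# The same ambiguous text is assigned different categories for different users.
print(process(DataToClassify("goal fund", "u1"), PROTOTYPES))  # prints "sports"
print(process(DataToClassify("goal fund", "u2"), PROTOTYPES))  # prints "finance"
```

In this sketch the text "goal fund" matches one vocabulary word of each category, so the appended user features alone decide the outcome, which mirrors the stated purpose of taking user features into account during classification.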

It should be appreciated that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps recited in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A text processing method, comprising:

2. The method according to claim 1, wherein after the obtaining the data to be classified input by the terminal device, further comprises:

and vectorizing the preprocessed text to be classified and the user features.

3. The method of claim 1, wherein the vectorizing the text to be classified and the user features comprises:

4. A method according to any one of claims 1-3, wherein the cascade classifier comprises a multi-layer classifier, and the classifying the feature information by the cascade classifier comprises:

5. The method according to claim 4, wherein the sequentially inputting the feature information and the classification result output by the previous classifier into the next classifier, and taking the result output by the last classifier as the classification information corresponding to the text to be classified includes:

inputting the characteristic information and the first class identifier into a preset second-layer classifier, wherein the second-layer classifier is used for classifying the characteristic information under the sub-class of the first class identifier to obtain a second class identifier corresponding to the characteristic information, and associating the first class identifier and the second class identifier to obtain a target class identifier;

6. A method according to any one of claims 1 to 3, wherein after the classifying operation is performed on the feature information by using a cascade classifier to obtain the category information corresponding to the text to be classified, the method further includes:

7. A text processing apparatus, comprising:

8. The apparatus of claim 7, wherein the apparatus further comprises:

accordingly, the vectorization processing module is configured to:

and vectorizing the preprocessed text to be classified and the user features.

9. The apparatus of claim 7, wherein the vectorization processing module is to:

10. The apparatus of any one of claims 7-9, wherein the cascade classifier includes a multi-layer classifier therein, the classification module being configured to:

11. The apparatus of claim 10, wherein the classification module is configured to:

12. The apparatus according to any one of claims 7-9, wherein the apparatus further comprises:

13. A text processing apparatus, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.

15. A text processing method, comprising: