WO2018028065A1

WO2018028065A1 - Method and device for classifying short message and computer storage medium

Info

Publication number: WO2018028065A1
Application number: PCT/CN2016/105378
Authority: WO
Inventors: 陈军
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-08-11
Filing date: 2016-11-10
Publication date: 2018-02-15
Also published as: CN107734131B; CN107734131A

Abstract

Provided in the present invention are a method and a device for classifying a short message and a computer storage medium. The method for classifying a short message comprises: recognizing a preset feature word in a received short message; substituting the preset feature word in the short message with a feature symbol corresponding to the preset feature word; determining a first classification model; reading, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words other than the preset feature word in the short message, performing a weighted operation, according to the first classification model, on the symbol vector and word vector that have been read, to obtain a first operation result, and determining the type of the short message according to the first operation result. The solution of the present invention, by means of a preset classification model, can accurately determine the type of short message to which a short message belongs, achieve a smart management of short messages, and facilitate a user to query and organize short messages.

Description

Short message classification method, device and computer storage medium

Technical field

The invention relates to the technical field of text classification statistics, in particular to a short message classification method, device and computer storage medium.

Background art

At present, the short messages in the terminal (including the text message of the notification center) are basically not classified, or are only classified and stored by the sender number, and are arranged according to the time of reception.

Thus, when a large number of short messages are stored in the terminal, this arrangement makes it extremely inconvenient for the user to query and organize them. For example, suppose the user wants to find a credit card repayment message sent by China Merchants Bank a few days ago. The user has to search manually through the large number of short messages sent by China Merchants Bank, which is time-consuming and laborious. Even if the user frequently organizes the short messages by hand, manual organization is prone to mistaken deletions and missed deletions.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a method and a device for classifying short messages, so as to solve the problem that existing ways of classifying short messages make it inconvenient for the user to query short messages.

In order to achieve the above object, an embodiment of the present invention provides a short message classification method, including:

Identifying a preset feature word in the received short message;

Substituting the preset feature words in the short message with the feature symbols corresponding to the preset feature words;

Determining a first classification model, wherein the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type;

Reading a symbol vector of the feature symbol and a word vector of the remaining words of the short message except the preset feature word from a high frequency word vector library of the first classification model;

Performing a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result;

Determining, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type.

Preferably, the method further includes:

If the type of the short message is the non-first short information type, determining a second classification model, where the short information type corresponding to the second classification model includes at least one second short information type and a non-second short information type;

Reading a symbol vector of the feature symbol and a word vector of the remaining words of the short message except the preset feature word from a high frequency word vector library of the second classification model;

Performing a weighting operation on the read symbol vector and the word vector according to the second classification model to obtain a second operation result;

And determining, according to the second operation result, that the type of the short information is the second short information type or the non-second short information type.

Preferably, the step of performing a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result includes:

Processing the read symbol vector and the word vector according to the first classification model to obtain an information vector corresponding to the short information;

Determining, for each of the first short information type and the non-first short information type, a weight coefficient vector corresponding to the information vector, wherein the information values in the information vector and the weight coefficients in the weight coefficient vector are in one-to-one correspondence;

The weighting operation is performed by using the information vector and the determined weight coefficient vector of each short information type to obtain at least two predicted quantized values.

Preferably, the step of determining, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type comprises:

Comparing the at least two predicted quantized values to obtain the largest predicted quantized value of the at least two predicted quantized values;

Determining that the type of the short message is a short message type corresponding to the largest predicted quantized value.

Preferably, before the step of identifying the preset feature words in the received short message, the method further includes:

Standardizing the received short message;

The step of identifying a preset feature word in the received short message includes:

Identifying a preset feature word in the standardized short message.

Preferably, the step of reading the word vector of the remaining words except the preset feature word in the short message includes:

Acquiring, according to a text segmentation technique, multi-character words from the remaining text of the short message other than the preset feature words;

Reading the word vector of each acquired word and the word vectors of the remaining single-character words of the short message other than the preset feature words and the acquired words.

Preferably, after the step of determining that the type of the short information is the first short information type or the non-first short information type according to the first operation result, the method further includes:

Classifying and saving the short message under the short message type to which it belongs.

Preferably, the method further includes:

Outputting at least one of the preset feature words.

The invention also provides a short message classification device, comprising:

An identification module, configured to identify a preset feature word in the received short message;

a replacement module, configured to replace a preset feature word in the short message with a feature symbol corresponding to the preset feature word;

a first determining module, configured to determine a first classification model, where the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type;

a first reading module, configured to read, from a high frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words of the short information except the preset feature word;

a first operation module, configured to perform a weighting operation on the read symbol vector and the word vector according to the first classification model, to obtain a first operation result;

The first determining module is configured to determine, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type.

Preferably, the device further comprises:

a second determining module, configured to determine a second classification model when the type of the short information is the non-first short information type, where the short information type corresponding to the second classification model includes at least one second short information type and a non-second short information type;

a second reading module, configured to read, from a high frequency word vector library of the second classification model, a symbol vector of the feature symbol and a word vector of the remaining words of the short information except the preset feature word;

a second operation module, configured to perform a weighting operation on the read symbol vector and the word vector according to the second classification model, to obtain a second operation result;

And a second determining module, configured to determine, according to the second operation result, that the type of the short message is the second short information type or the non-second short information type.

Preferably, the first operation module includes:

a processing unit, configured to process the read symbol vector and the word vector according to the first classification model, to obtain an information vector corresponding to the short information;

a determining unit, configured to determine a weight coefficient vector corresponding to the information vector for each of the first short information type and the non-first short information type, wherein the information values in the information vector and the weight coefficients in the weight coefficient vector are in one-to-one correspondence;

an operation unit, configured to perform a weighting operation by using the information vector and the determined weight coefficient vector of each short information type, to obtain at least two predicted quantized values.

Preferably, the first determining module comprises:

a comparing unit, configured to compare the at least two predicted quantized values to obtain a largest predicted quantized value of the at least two predicted quantized values;

The determining unit is configured to determine that the type of the short message is a short message type corresponding to the largest predicted quantized value.

Preferably, the device further comprises:

a standardization processing module, configured to standardize the received short message;

The identification module is specifically configured to identify a preset feature word in the standardized short message.

Preferably, the reading module comprises:

An obtaining unit, configured to acquire, according to a text segmentation technique, multi-character words from the remaining text of the short message other than the preset feature words;

a reading unit, configured to read the word vector of each acquired word and the word vectors of the remaining single-character words of the short message other than the preset feature words and the acquired words.

Preferably, the device further comprises:

The category saving module is configured to classify and save the short message under the short message type to which it belongs.

Preferably, the device further comprises:

And an output module, configured to output at least one of the preset feature words.

Embodiments of the present invention also provide a computer storage medium, the storage medium comprising a set of instructions that, when executed, cause at least one processor to perform operations including:

Identifying a preset feature word in the received short message; replacing the preset feature word in the short message with the feature symbol corresponding to the preset feature word;

Performing a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result; and determining, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type.

Through the above technical solutions of the embodiments of the present invention, the beneficial effects of the present invention are:

According to the short message classification method in the embodiment of the present invention, the short message type to which the short message belongs can be accurately determined through the pre-set classification model, thereby realizing intelligent management of the short message, and facilitating the user to query and organize the short message.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention. Those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a flow chart showing a short message classification method according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of a short message classification apparatus according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

Referring to FIG. 1, an embodiment of the present invention provides a short message classification method, where the method includes the following steps:

Step 101: Identify a preset feature word in the received short message.

Step 102: Replace a preset feature word in the short message with a feature symbol corresponding to the preset feature word;

Step 103: Determine a first classification model, where the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type;

Step 104: Read, from the high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words of the short message except the preset feature word;

Step 105: Perform a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result.

Step 106: Determine, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type.

The preset feature words may be an email address, a web address, a date, a time, a percentage, a quantifier, a currency amount, a phone number, a number, foreign-language text, and so on, or may be customized vocabulary, including vocabulary of a professional application field, idioms, foods, places, works, equipment, personal names, place names, institution names, and so on; the present invention does not limit this.

The feature symbol corresponding to each preset feature word is preset. For example, the feature symbol corresponding to a time may be DATE, the feature symbol corresponding to a currency amount may be CURRENCY, the feature symbol corresponding to a bank may be BANK, and so on.

It should be noted that feature symbols are preset and feature words are replaced mainly because, during short message classification, the terminal only needs to know which kinds of feature words are present in the short message and does not care what the feature words specifically are.

For example, the terminal receives the short message "Your personal credit card November bill RMB 4818.93, The repayment date is November 23rd. [China Merchants Bank]". After identification, the preset feature words "November", "RMB 4818.93", "November 23rd" and "China Merchants Bank" are obtained. Replacing them with the corresponding feature symbols turns the short message into "your personal credit card DATE bill CURRENCY, due date DATE. [BANK]", which reflects the characteristics of the short message. That is to say, when analyzing the short message, the terminal does not care about the specific amount, date, bank, and so on; it only needs to know that an amount, a date, a bank, and so on are present.
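The replacement step in this example can be illustrated with a minimal Python sketch. The regular expressions, the `FEATURE_PATTERNS` table, and the function name `replace_feature_words` are assumptions made for illustration only; the source does not specify how feature words are recognized, and a real terminal would need a far richer recognizer.

```python
import re

# Hypothetical patterns for illustration; they are not taken from the source.
# Patterns are applied in list order, one feature symbol at a time.
FEATURE_PATTERNS = [
    ("CURRENCY", re.compile(r"RMB\s?\d+(?:\.\d+)?")),
    ("DATE", re.compile(
        r"(?:January|February|March|April|May|June|July|August|September|"
        r"October|November|December)(?:\s\d{1,2}(?:st|nd|rd|th))?")),
    ("BANK", re.compile(r"China Merchants Bank")),
]


def replace_feature_words(message):
    """Return (replaced_text, found); found lists (symbol, original) pairs.

    The pairs are grouped by pattern, not sorted by position in the text.
    """
    found = []
    for symbol, pattern in FEATURE_PATTERNS:
        def substitute(match, symbol=symbol):
            # Remember the original text so it can be shown to the user later.
            found.append((symbol, match.group(0)))
            return symbol
        message = pattern.sub(substitute, message)
    return message, found
```

Keeping the original text next to each symbol means the classifier sees only the symbols, while the concrete amount or date remains available for later output.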

In the embodiment of the present invention, the first classification model is pre-trained, and the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type. That is, according to the first classification model, the type of the short message received by the terminal may be determined to be the first short information type (that is, one of the at least one first short information type) or the non-first short information type.

For example, the first classification model may be a single-class classifier whose corresponding short information types include a repayment reminder type and a non-repayment-reminder type; or the first classification model may be a multi-class classifier whose corresponding short information types include a repayment reminder type, a consumption bill type, an account billing type, and an other type (that is, the type of short information that is not a repayment reminder, a consumption bill, or an account bill).

In daily life, about 3,500 Chinese characters and symbols are in common use, but the number of characters that appear in a given type of short message (that is, the high-frequency words) is far smaller. For a resource-constrained terminal, it is therefore unnecessary to use all Chinese characters and symbols to determine the short message type; only the high-frequency words under a specific classification model need to be considered. That is, when the classification model is trained on samples, only the word vectors of the high-frequency words need to be retained, and the low-frequency words are replaced by one uniform specific symbol, so that all low-frequency words share a single word vector. This forms the high-frequency word vector library corresponding to the classification model.

Here, a word vector is a finite-dimensional vector of floating-point numbers that represents a quantized value of the semantics of the word. The dimension may be 4, 8, 12, and so on, depending on the sample size and the training model used during training; it is usually a multiple of 4.

In the process of analyzing the short message, the symbol vector of the feature symbol and the word vectors of the remaining words of the short message other than the preset feature word are read from the high-frequency word vector library of the first classification model, and the short information is analyzed based on the read symbol vector and word vectors.
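A minimal Python sketch of such a library follows. The `<UNK>` symbol name, the function names, and the use of a plain dictionary are illustrative assumptions; the source only states that low-frequency words share one vector.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical name for the shared low-frequency symbol


def build_vocabulary(samples, max_size):
    """Return the max_size most frequent tokens across the training samples."""
    counts = Counter(token for sample in samples for token in sample)
    return {token for token, _ in counts.most_common(max_size)}


def read_vectors(tokens, library):
    """Look up each token; tokens missing from the library share the UNK vector."""
    return [library.get(token, library[UNK]) for token in tokens]
```

Because every out-of-vocabulary token maps to the same entry, the size of the library is bounded by the chosen vocabulary size plus one, which is the property that matters on a resource-constrained terminal.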

Specifically, the first classification model is, for example, a model trained using a convolutional neural network with dynamic k-max pooling. The step of performing a weighting operation on the read symbol vector and word vector according to the first classification model to obtain a first operation result is specifically:

Processing the read symbol vector and word vector according to the first classification model to obtain an information vector corresponding to the short information. In this step, a convolution operation is performed on the symbol vectors and word vectors of the short information, and a vector that can represent the semantics of the sentence is then extracted.
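As a rough illustration of this step, the following Python sketch convolves each vector dimension over the token sequence and then applies k-max pooling. It is a simplification under stated assumptions: a single hand-chosen kernel, a fixed k (in dynamic k-max pooling, k would vary with sentence length and layer depth), and no non-linearity or multiple feature maps. All function names are hypothetical, and the trained model in the source is not described at this level of detail.

```python
def convolve(sequence, kernel):
    """Narrow 1-D convolution (no kernel flip, as is usual in neural networks)."""
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


def k_max_pool(values, k):
    """Keep the k largest values while preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


def information_vector(vectors, kernel, k):
    """Convolve each vector dimension along the sequence, then pool it."""
    pooled = []
    for d in range(len(vectors[0])):
        column = [vector[d] for vector in vectors]
        pooled.extend(k_max_pool(convolve(column, kernel), k))
    return pooled
```

Pooling to a fixed number of values per dimension yields an information vector whose length does not depend on the length of the short message, so it can be paired one-to-one with a fixed-length weight coefficient vector.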

It should be noted that the predicted quantized value may be a predicted probability value or a score for determining the type of the short message. In practical applications, in order to accurately determine the type of the short message, when the predicted quantized value is obtained, an offset coefficient may be added to the summed result value obtained by the weighting operation.

Further, the step of determining that the type of the short information is the first short information type or the non-first short information type according to the first operation result is specifically:

Comparing the at least two predicted quantized values to obtain a largest predicted quantized value of the at least two predicted quantized values;

That is to say, when the weighting operation is performed by using the information vector and the determined weight coefficient vector of each short information type, the predicted quantized value corresponding to each short information type is calculated. The type of short message corresponding to the largest predicted quantized value is determined as the type of the short message.
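The weighting operation and the comparison can be sketched in Python as follows. The type names, the dictionary layout, and the function names are assumptions for illustration; the offset coefficient is modelled as a per-type bias added to the weighted sum.

```python
def predicted_value(information, weights, bias=0.0):
    """Weighted sum of the information vector plus an optional offset."""
    if len(information) != len(weights):
        raise ValueError("weights must correspond one-to-one to information values")
    return sum(x * w for x, w in zip(information, weights)) + bias


def classify(information, weights_by_type, bias_by_type):
    """Score every short information type and return the best one with all scores."""
    scores = {
        type_name: predicted_value(
            information, weights, bias_by_type.get(type_name, 0.0)
        )
        for type_name, weights in weights_by_type.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

With at least two types (a first type and the non-first type), the dictionary always holds at least two predicted quantized values, and the type with the largest one is selected.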

In the embodiment of the present invention, after the step 106, the method further includes:

In another embodiment, if the first short information type is to be further subdivided, the short information determined to be of the first short information type may be input into a third classification model for further classification. For example, the first classification model only identifies whether the short message is of a bank bill type or a non-bank-bill type. A short message identified as the bank bill type can be further subdivided by a third classification model (which can identify the consumption type, the repayment type, and other bank bill types).

That is to say, for a resource-constrained terminal, the type of the short information may be determined step by step in a cascading manner: the first classification model, the second classification model, the third classification model, the fourth classification model, and so on are used in sequence to achieve a finer classification.

In the cascading determination, each classification model involved may be a single classification model, such as a bank bill classification model, a travel schedule reminder classification model for flights, trains, and the like, an advertisement message classification model, or a fraud message classification model, to meet different user requirements.
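The cascade from one model to the next can be sketched in Python as below. The two toy models are stand-ins for trained classification models and simply test for feature symbols; `FLIGHT` is a hypothetical feature symbol, and all names are illustrative assumptions. Only the "non-first type goes to the next model" path is shown, not the subdivision of an accepted type.

```python
def bank_bill_model(symbols):
    """Toy stand-in for a trained bank bill classification model."""
    return "bank_bill" if {"BANK", "CURRENCY"} <= symbols else "non_bank_bill"


def travel_model(symbols):
    """Toy stand-in for a trained travel reminder classification model."""
    return "travel_reminder" if "FLIGHT" in symbols else "non_travel_reminder"


# Each stage pairs a model with the label meaning "not one of my types".
STAGES = [
    (bank_bill_model, "non_bank_bill"),
    (travel_model, "non_travel_reminder"),
]


def cascade_classify(symbols, stages=STAGES):
    """Try each model in turn until one accepts the message."""
    label = None
    for model, rejected_label in stages:
        label = model(symbols)
        if label != rejected_label:
            break  # this model recognized the message; stop cascading
    return label
```

Only one model needs to be evaluated at a time, which suits a terminal that cannot afford one large multi-class model.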

In the embodiment of the present invention, before the step 101, the method further includes:

Standardizing the received short message;

The step 101 is specifically: identifying a preset feature word in the standardized short message.

In this way, the standardized short message can facilitate subsequent semantic analysis.

The standardization processing may specifically include unifying the character encoding, converting traditional Chinese to simplified Chinese, converting between full-width and half-width characters, replacing non-standard terms, removing redundant white space from the text, and eliminating modal particles, special punctuation marks, and other content that does not help semantic analysis; the present invention does not limit this.
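A few of these operations can be sketched with the Python standard library. Using Unicode NFKC normalization for the full-width to half-width conversion is an assumption of this sketch, not something the source prescribes, and the three-character traditional-to-simplified table is purely illustrative; a real table would cover thousands of characters.

```python
import re
import unicodedata

# Tiny illustrative mapping; not a complete conversion table.
TRADITIONAL_TO_SIMPLIFIED = str.maketrans({"還": "还", "帳": "账", "單": "单"})


def standardize(message):
    """Apply a few of the standardization steps to a short message."""
    # NFKC folds full-width letters, digits, and the ideographic space
    # into their ordinary half-width forms.
    text = unicodedata.normalize("NFKC", message)
    text = text.translate(TRADITIONAL_TO_SIMPLIFIED)
    # Collapse runs of white space and trim the ends.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Running this before feature word recognition means the recognizer only has to handle one spelling of each amount, date, or term.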

In the embodiment of the present invention, before the word vectors are read, the short message text can also be segmented by using a text segmentation technique from the prior art, that is, common multi-character words are separated out, which carry more semantic features. A single Chinese character often cannot express a meaning accurately, whereas a word composed of several Chinese characters can express a specific meaning more accurately. For example, the meanings of the individual characters "public" and "division" are completely different from that of the word "company" that they form; thus, after word segmentation, the word vector of "company" can be read without reading the two separate vectors of "public" and "division". The processing and operations performed after reading the vector of a multi-character word are the same as those for the vector of a single character.

Specifically, in the embodiment of the present invention, the step of reading the word vector of the remaining words except the preset feature word in the short message is specifically:

In this way, the accuracy of the subsequent information vector corresponding to the short message can be improved.
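The source does not name a particular segmentation technique. As one possible illustration, the Python sketch below uses forward maximum matching against a small word dictionary, a simple classical approach; the dictionary contents, the `max_word_length` parameter, and the function name are assumptions. It is meant to run on the text between feature symbols, since a symbol such as DATE would otherwise be split into single characters.

```python
def segment(text, dictionary, max_word_length=4):
    """Forward maximum matching: prefer the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_word_length, len(text) - i)
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback token.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens
```

With "公司" in the dictionary, the two characters are emitted as one token, so one vector for the word is read instead of two unrelated single-character vectors.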

In this way, the received short information is classified and saved, which is convenient for the user to query and organize.

Outputting at least one of the preset feature words.

It should be noted that the output here may be shown on the terminal screen to prompt the user to check, so as to prevent misjudgments or missed judgments, or may be passed to other applications for use.

For example, for the above-mentioned short message after feature symbol replacement, "your personal credit card DATE bill CURRENCY, due date DATE. [BANK]", when the credit card repayment reminder short message type is identified, the original texts corresponding to DATE and CURRENCY, namely "November", "RMB 4818.93" and "November 23rd", can be output to the terminal screen to prompt the user to check. Moreover, the output information can be further stored in the terminal schedule to form a timed reminder.

For another example, the terminal receives the short message "Your CCB has reached 10,000 points and can be exchanged for 5% cash. Please go to www.xxxx.com for redemption, overdue points will be cleared [xx branch]". After feature symbol replacement, the short message becomes "Your CCB credit has reached CURRENCY, can be exchanged for PERCENT cash, please log in to the URL for redemption, overdue points are cleared [BANK]". When it is identified as a spam type, the original text "www.xxxx.com" corresponding to the URL can be output to prompt the user to verify it, so as to prevent misjudgment or missed judgment.

Referring to FIG. 2, an embodiment of the present invention further provides a short message classification device, which corresponds to the short message classification method shown in FIG. 1, and the device includes:

The identification module 21 is configured to identify a preset feature word in the received short message;

The replacement module 22 is configured to replace the preset feature words in the short message with the feature symbols corresponding to the preset feature words;

a first determining module 23, configured to determine a first classification model, where the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type;

a first reading module 24, configured to read, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words of the short message other than the preset feature word;

The first operation module 25 is configured to perform a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result;

The first determining module 26 is configured to determine, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type.

The short message classification device of the embodiment of the present invention can accurately determine the short message type to which the short message belongs by using the classification model set in advance, thereby realizing intelligent management of the short message, and facilitating the user to query and organize the short message.

Specifically, the device further includes:

In the embodiment of the present invention, the first computing module includes:

And an operation unit, configured to perform a weighting operation by using the information vector and the determined weight coefficient vector of each short information type to obtain at least two predicted quantized values.

Further, the first determining module includes:

In the embodiment of the present invention, the device further includes:

The identification module is specifically configured to: identify a preset feature word in the standardized short message.

In the embodiment of the present invention, the reading module includes:

In the embodiment of the present invention, the device further includes:

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the present invention.

In an embodiment of the present invention, a computer storage medium is further provided, the storage medium comprising a set of instructions, when executed, causing at least one processor to perform operations including:

Determining a first classification model, wherein the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type;

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention.

Industrial applicability

The embodiment of the invention discloses a short message classification method and device, and a computer storage medium. The method identifies a preset feature word in the received short message, replaces the preset feature word in the short message with the feature symbol corresponding to the preset feature word, determines a first classification model, obtains a first operation result according to the first classification model, and determines the type of the short message according to the first operation result. According to the solution of the present invention, the short message type to which the short message belongs can be accurately determined by the classification model set in advance, thereby realizing intelligent management of short messages and making it easier for the user to query and organize them.

Claims

A short message classification method, including:

Identifying a preset feature word in the received short message;

Substituting the preset feature words in the short message with the feature symbols corresponding to the preset feature words;

Determining a first classification model, wherein the short message types corresponding to the first classification model include at least one first short message type and a non-first short message type;

Reading, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words in the short message other than the preset feature word;

Performing a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result;

Determining, according to the first operation result, that the type of the short message is the first short message type or the non-first short message type.
The method of claim 1 wherein the method further comprises:

If the type of the short message is the non-first short message type, determining a second classification model, wherein the short message types corresponding to the second classification model include at least one second short message type and a non-second short message type;

Reading, from a high-frequency word vector library of the second classification model, a symbol vector of the feature symbol and a word vector of the remaining words in the short message other than the preset feature word;

Performing a weighting operation on the read symbol vector and the word vector according to the second classification model to obtain a second operation result;

Determining, according to the second operation result, that the type of the short message is the second short message type or the non-second short message type.
The method according to claim 1, wherein said step of performing a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result comprises:

Processing the read symbol vector and the word vector according to the first classification model to obtain an information vector corresponding to the short message;

Determining, for each of the first short message type and the non-first short message type, a weight coefficient vector corresponding to the information vector, wherein the information values in the information vector are in one-to-one correspondence with the weight coefficients in the weight coefficient vector;

Performing the weighting operation using the information vector and the determined weight coefficient vector of each short message type to obtain at least two predicted quantized values.
The method according to claim 3, wherein the step of determining, according to the first operation result, that the type of the short message is the first short message type or the non-first short message type comprises:

Comparing the at least two predicted quantized values to obtain a largest predicted quantized value of the at least two predicted quantized values;

Determining that the type of the short message is a short message type corresponding to the largest predicted quantized value.
The method of claim 1, wherein before the step of identifying a preset feature word in the received short message, the method further comprises:

Standardizing the received short message;

The step of identifying a preset feature word in the received short message includes:

Identifying a preset feature word in the standardized short message.
The method of claim 1, wherein the step of reading a word vector of the remaining words other than the preset feature word in the short message comprises:

Acquiring, according to a text segmentation technique, words from the remaining words of the short message other than the preset feature word;

Reading a word vector of the acquired word and a word vector of the remaining words other than the preset feature word and the acquired word in the short message.
The method according to claim 1, wherein after the step of determining, according to the first operation result, that the type of the short message is the first short message type or the non-first short message type, the method further comprises:

Saving the short message under the short message type to which it belongs.
The method according to claim 1, wherein after the step of determining, according to the first operation result, that the type of the short message is the first short message type or the non-first short message type, the method further comprises:

Outputting at least one of the preset feature words.
A short message classification device comprising:

An identification module configured to identify a preset feature word in the received short message;

a replacement module, configured to replace a preset feature word in the short message with a feature symbol corresponding to the preset feature word;

a first determining module, configured to determine a first classification model, wherein the short message types corresponding to the first classification model include at least one first short message type and a non-first short message type;

a first reading module, configured to read, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words in the short message other than the preset feature word;

a first operation module, configured to perform a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result;

a first decision module, configured to determine, according to the first operation result, that the type of the short message is the first short message type or the non-first short message type.
The apparatus of claim 9 wherein said apparatus further comprises:

a second determining module, configured to determine a second classification model when the type of the short message is the non-first short message type, wherein the short message types corresponding to the second classification model include at least one second short message type and a non-second short message type;

a second reading module, configured to read, from a high-frequency word vector library of the second classification model, a symbol vector of the feature symbol and a word vector of the remaining words in the short message other than the preset feature word;

a second operation module, configured to perform a weighting operation on the read symbol vector and the word vector according to the second classification model to obtain a second operation result;

a second decision module, configured to determine, according to the second operation result, that the type of the short message is the second short message type or the non-second short message type.
The apparatus of claim 9, wherein the first operation module comprises:

a processing unit, configured to process the read symbol vector and the word vector according to the first classification model to obtain an information vector corresponding to the short message;

a determining unit, configured to determine, for each of the first short message type and the non-first short message type, a weight coefficient vector corresponding to the information vector, wherein the information values in the information vector are in one-to-one correspondence with the weight coefficients in the weight coefficient vector;

an operation unit, configured to perform the weighting operation using the information vector and the determined weight coefficient vector of each short message type to obtain at least two predicted quantized values.
The apparatus of claim 11 wherein said first decision module comprises:

a comparing unit configured to compare the at least two predicted quantized values to obtain a largest predicted quantized value of the at least two predicted quantized values;

The determining unit is configured to determine that the type of the short message is a short message type corresponding to the largest predicted quantized value.
The apparatus of claim 9 wherein said apparatus further comprises:

a standardization module, configured to standardize the received short message;

wherein the identification module is configured to identify a preset feature word in the standardized short message.
The apparatus of claim 9 wherein said first reading module comprises:

an obtaining unit, configured to acquire, according to a text segmentation technique, words from the remaining words of the short message other than the preset feature word;

a reading unit configured to read a word vector of the acquired word and a word vector of the remaining words of the short message except the preset feature word and the acquired word.
The apparatus of claim 9 wherein said apparatus further comprises:

a classification saving module, configured to save the short message under the short message type to which it belongs.
The apparatus of claim 9 wherein said apparatus further comprises:

an output module, configured to output at least one of the preset feature words.
A computer storage medium comprising a set of instructions that, when executed, cause at least one processor to perform operations comprising:

Identifying a preset feature word in the received short message; replacing the preset feature word in the short message with the feature symbol corresponding to the preset feature word;

Determining a first classification model, wherein the short message types corresponding to the first classification model include at least one first short message type and a non-first short message type;

Reading, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and a word vector of the remaining words in the short message other than the preset feature word;

Performing a weighting operation on the read symbol vector and the word vector according to the first classification model to obtain a first operation result; and determining, according to the first operation result, that the type of the short message is the first short message type or the non-first short message type.