CN102073704B - Text classification processing method, system and equipment - Google Patents

Text classification processing method, system and equipment

Info

Publication number
CN102073704B
Authority
CN
China
Prior art keywords
text
mathematical model
classification parameter
text information
user equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010614959
Other languages
Chinese (zh)
Other versions
CN102073704A (en)
Inventor
李建刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Device Co Ltd
Original Assignee
Huawei Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Device Co Ltd
Priority to CN 201010614959
Publication of CN102073704A
Application granted
Publication of CN102073704B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a text classification processing method, a text classification processing system and text classification processing equipment. The text classification processing method comprises the following steps of: receiving classification parameters transmitted by network equipment; performing text preprocessing on the collected text information to acquire a plurality of text characteristics; according to the classification parameters, performing classification processing on the plurality of text characteristics by adopting a preset mathematical model to acquire a degree of correlation between the plurality of the text characteristics and the classification parameters; and storing the text information in a text information subject corresponding to a group of classification parameters with the highest degree of correlation by using user equipment. The invention realizes the classification processing of the text information on the user equipment so as to effectively improve the convenience for a user.

Description

Text classification processing method, system and equipment
Technical field
Embodiments of the invention relate to communication technology, and in particular to a text classification processing method, system and equipment.
Background
Mobile phone users now live in an era of rapidly expanding information. With the arrival of 3G and the growth of SMS, MMS, Facebook, Twitter, microblogs and blogs, mobile phone users can obtain a wide variety of information.
In the course of implementing the invention, the inventor found at least the following problem in the prior art: the abundance of information also burdens the mobile phone user. The phone fills up with information that the user is not interested in, so the user has to browse all of it to pick out what is useful or interesting, which is very inconvenient.
Summary of the invention
Embodiments of the invention provide a text classification processing method, system and equipment that classify text information and thereby make the user equipment more convenient to use.
An embodiment of the invention provides a text classification processing method, comprising:
user equipment receiving classification parameters sent by network equipment;
the user equipment performing text preprocessing on collected text information to obtain a plurality of text features;
the user equipment classifying the plurality of text features according to the classification parameters by using a preset mathematical model, to obtain the degree of correlation between the plurality of text features and the classification parameters;
the user equipment storing the text information in the text information subject corresponding to the group of classification parameters with the highest degree of correlation.
An embodiment of the invention also provides a text classification processing method, comprising:
network equipment receiving a classification parameter acquisition request that is sent by user equipment and carries a text information subject and a mathematical model identifier;
the network equipment obtaining the classification parameters corresponding to the text information subject and the mathematical model identifier, according to a preset correspondence among text information subjects, mathematical model identifiers and classification parameters;
the network equipment sending the classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, a plurality of text feature words obtained by preprocessing.
An embodiment of the invention provides text classification processing equipment, comprising:
a classification parameter receiving module, configured to receive classification parameters sent by network equipment;
a preprocessing module, configured to perform text preprocessing on collected text information to obtain a plurality of text features;
a correlation acquisition module, configured to classify the plurality of text features according to the classification parameters by using a preset mathematical model, to obtain the degree of correlation between the plurality of text features and the classification parameters;
a classification processing module, configured to store the text information in the text information subject corresponding to the group of classification parameters with the highest degree of correlation.
An embodiment of the invention provides further text classification processing equipment, comprising:
a classification parameter acquisition request receiving module, configured to receive a classification parameter acquisition request that is sent by user equipment and carries a text information subject and a mathematical model identifier;
a classification parameter acquisition module, configured to obtain the classification parameters corresponding to the text information subject and the mathematical model identifier, according to a preset correspondence among text information subjects, mathematical model identifiers and classification parameters;
a classification parameter sending module, configured to send the classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, a plurality of text feature words obtained by preprocessing.
An embodiment of the invention provides a text classification processing system comprising network equipment and user equipment, where the network equipment is the text classification processing equipment described above that serves classification parameter acquisition requests, and the user equipment is the text classification processing equipment described above that receives the classification parameters.
With the text classification processing method, system and equipment of the embodiments of the invention, text information is collected and preprocessed to obtain a plurality of text features. The text features are then classified with a preset mathematical model according to the classification parameters received from the network equipment, which yields the degree of correlation between the text features and the classification parameters. Finally, the text information is stored in the text information subject corresponding to the group of classification parameters with the highest degree of correlation. Text information is thus classified on the user equipment, which makes the equipment more convenient to use.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below evidently show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of the text classification processing method of the invention;
Fig. 2 is a flowchart of another embodiment of the text classification processing method of the invention;
Fig. 3 is a flowchart of yet another embodiment of the text classification processing method of the invention;
Fig. 4 is a structural diagram of one embodiment of the user equipment of the invention;
Fig. 5 is a structural diagram of another embodiment of the user equipment of the invention;
Fig. 6 is a structural diagram of one embodiment of the network equipment of the invention;
Fig. 7 is a structural diagram of one embodiment of the text classification processing system of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 is a flowchart of one embodiment of the text classification processing method of the invention. As shown in Fig. 1, the method of this embodiment comprises:
Step 101: the user equipment receives classification parameters sent by the network equipment.
In this embodiment, the network equipment receives feature words that are entered by the network equipment administrator and that can represent a text information subject, and uses them to train the configured mathematical model. When the training result reaches the expected value that the administrator has preset for that type of mathematical model, the model is considered successfully trained, and the classification parameters of the trained model are obtained. For example, if the mathematical model is a neural network, then once it has been successfully trained on the feature words, the classification parameters are the connection weights between the nodes of the neural network.
The network equipment can then send the classification parameters of the successfully trained mathematical model to the user equipment, which may be, for example, a mobile phone.
Step 102: the user equipment performs text preprocessing on the collected text information to obtain a plurality of text features.
In this embodiment, the user equipment can collect text information stored in SMS, MMS, Facebook, Twitter, microblogs or blogs, and preprocess the collected text information to obtain a plurality of text features. The text features can include nouns, verbs and the like, and are chosen to represent the content of the text information adequately.
Step 103: the user equipment classifies the plurality of text features according to the classification parameters by using a preset mathematical model, to obtain the degree of correlation between the plurality of text features and the classification parameters.
Step 104: the user equipment stores the text information in the text information subject corresponding to the group of classification parameters with the highest degree of correlation.
In this embodiment, the user equipment can be preconfigured with a mathematical model and, according to the classification parameters it has obtained, classify the text features produced by preprocessing to obtain a classification result. For example, suppose the user equipment obtains two groups of classification parameters at the user's request, one group representing the football subject and the other the automobile subject. The user equipment applies the preset mathematical model to the preprocessed text features once with each group of classification parameters. Each result characterizes the degree of correlation between the text features and that group of classification parameters, and the text information is stored in the text information subject corresponding to the group with the higher degree of correlation. For instance, text information related to football is filed under the football subject directory and text information related to automobiles under the automobile subject directory, so that the user can quickly browse the text information of interest.
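The selection in steps 103 and 104 can be sketched as follows. This is only an illustration under stated assumptions: the weighted-sum `dot_model` stands in for whatever preset mathematical model the user equipment carries, and the names `classify`, `params` and `folders` as well as all numeric values are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List, Sequence


def classify(features: Sequence[float],
             params_by_subject: Dict[str, List[float]],
             model: Callable[[Sequence[float], List[float]], float]) -> str:
    """Return the subject whose group of classification parameters scores highest."""
    scores = {subject: model(features, params)
              for subject, params in params_by_subject.items()}
    return max(scores, key=scores.get)


def dot_model(features: Sequence[float], params: List[float]) -> float:
    # Assumed stand-in for the preset mathematical model: a plain weighted sum.
    return sum(f * w for f, w in zip(features, params))


# Two groups of classification parameters, one per subject (made-up values).
params = {"football": [0.9, 0.1, 0.0], "automobile": [0.0, 0.2, 0.8]}
folders = {subject: [] for subject in params}

message = "The striker scored twice"
features = [1.0, 0.5, 0.0]          # hypothetical preprocessed text features

subject = classify(features, params, dot_model)
folders[subject].append(message)    # store under the best-matching subject
```

Each group of parameters is scored independently, so adding a subject only requires fetching one more group of parameters from the network equipment.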
In addition, in this embodiment the mathematical model used by the user equipment can be a decision tree, the KNN algorithm, a Bayesian network, a neural network, Boosting or a support vector machine.
In this embodiment, text information is collected and preprocessed to obtain a plurality of text features. The text features are then classified with a preset mathematical model according to the classification parameters received from the network equipment, which yields the degree of correlation between the text features and the classification parameters. Finally, the text information is stored in the text information subject corresponding to the group of classification parameters with the highest degree of correlation. Text information is thus classified on the user equipment, which makes the equipment more convenient to use.
Fig. 2 is a flowchart of another embodiment of the text classification processing method of the invention. As shown in Fig. 2, the method of this embodiment comprises:
Step 201: the user equipment receives a user request that contains a text information subject.
Step 202: the user equipment sends a classification parameter acquisition request to the network equipment. The request carries the text information subject and the preset mathematical model identifier.
Step 203: the user equipment receives, from the network equipment, the classification parameters corresponding to the text information subject and the preset mathematical model identifier.
In this embodiment, the user can obtain matching classification parameters from the network equipment according to the user's own preferences and the mathematical model configured in the user equipment. For example, when the user wants to quickly browse text information related to football, automobiles and news, the user enters these three text information subjects into the user equipment. The user equipment places the text information subjects and the identifier of its preset mathematical model in a classification parameter acquisition request and sends the request to the network equipment, which then returns the classification parameters corresponding to the text information subjects and the mathematical model identifier.
Step 204: the user equipment performs word segmentation on the collected text information to obtain a plurality of words.
For example, a recursive function can be used to segment each sentence of the text information. Specifically, all possible segmentations of the sentence are listed, and the words are then obtained by the method of maximum word aggregation degree, that is, the shortest path method.
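A minimal sketch of such a segmentation step is given below. It assumes a small hypothetical vocabulary and interprets "shortest path" as the candidate segmentation with the fewest words; the patent does not spell out the dictionary or the exact scoring, so both are assumptions, as are the function names.

```python
from typing import List, Set


def all_segmentations(sentence: str, vocab: Set[str]) -> List[List[str]]:
    """Recursively list every way to split `sentence` into vocabulary words."""
    if not sentence:
        return [[]]
    results = []
    for end in range(1, len(sentence) + 1):
        head = sentence[:end]
        if head in vocab:
            for tail in all_segmentations(sentence[end:], vocab):
                results.append([head] + tail)
    return results


def segment(sentence: str, vocab: Set[str]) -> List[str]:
    """Pick the candidate with the fewest words (assumed reading of 'shortest path')."""
    candidates = all_segmentations(sentence, vocab)
    if not candidates:
        return [sentence]  # no valid split: fall back to the unsegmented sentence
    return min(candidates, key=len)


# Hypothetical vocabulary for the sentence "我爱中国" ("I love China").
vocab = {"我", "爱", "中", "国", "中国"}
words = segment("我爱中国", vocab)
```

Exhaustive enumeration grows quickly with sentence length; a production segmenter would normally memoize the recursion or use dynamic programming over the word graph.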
Step 205: the user equipment performs part-of-speech tagging on the plurality of words to obtain words tagged with their parts of speech.
Step 206: from the tagged words, the user equipment extracts the most frequent nouns and verbs, as many as the mathematical model has input points, and uses them as the text features.
In this embodiment, the collected text information is text information that the user equipment obtains locally or from the network. A piece of text information can consist of several sentences, and a sentence of several words. In general, nouns and verbs carry much more of a sentence's meaning than adjectives, prepositions and the like. The user equipment can therefore tag the words with their parts of speech and, from the tagged words, extract the most frequent nouns and verbs, as many as the mathematical model has input points, as the text features.
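The extraction in step 206 can be sketched as below, assuming the tagger emits `(word, part_of_speech)` pairs with the hypothetical tag names `"noun"` and `"verb"`; the function name and the sample data are illustrative only.

```python
from collections import Counter
from typing import List, Tuple


def extract_features(tagged: List[Tuple[str, str]], n_inputs: int) -> List[str]:
    """Keep the n_inputs most frequent nouns and verbs as text features."""
    counts = Counter(word for word, pos in tagged if pos in ("noun", "verb"))
    return [word for word, _ in counts.most_common(n_inputs)]


# Hypothetical tagged words from one piece of text information.
tagged = [
    ("team", "noun"), ("wins", "verb"), ("the", "det"), ("team", "noun"),
    ("match", "noun"), ("wins", "verb"), ("team", "noun"), ("quickly", "adv"),
]
features = extract_features(tagged, n_inputs=2)
```

Here `n_inputs` plays the role of the number of input points of the mathematical model, for example 150 for the neural network described later in the text.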
This embodiment can use a hidden Markov model (HMM) and the Viterbi algorithm for part-of-speech tagging. Step 205 can then be implemented as follows: the words are described by three parameters, Pi, a and b, and the HMM and the Viterbi algorithm are applied to them to obtain the part of speech of each word. The Pi parameter is the probability that a word of a given part of speech begins a sentence. The a parameter is the transition probability between parts of speech, for example the probability that a noun is followed by a verb, or a verb by an adjective. The b parameter is the emission probability, that is, the probability that a given word occurs among all words of a certain part of speech. For example, "strike" can be a verb or a noun, so statistics are kept both on the probability that it occurs among all verbs when used as a verb and on the probability that it occurs among all nouns when used as a noun.
For example, segmenting the sentence "I love China" yields the three words "I", "love" and "China", each of which may take any of 39 parts of speech. With the Viterbi algorithm, the Pi parameters of "I" as each part of speech at the start of the sentence are taken, and the Pi parameter and b parameter of "I" are multiplied for each part of speech; the part of speech with the maximum product is the part of speech of "I". This maximum is then multiplied by the a parameter for "love" and by the b parameter of "love"; the part of speech with the maximum product is the part of speech of "love". That maximum is in turn multiplied by the a parameter for "China" and by the b parameter of "China"; the part of speech with the maximum product is the part of speech of "China". The path with the maximum probability, that is, the part-of-speech tag sequence, is thereby obtained, which gives the parts of speech of the three words "I", "love" and "China".
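The following is a small sketch of Viterbi decoding with the three parameter sets named in the text (`pi`, `a`, `b`). It is the standard textbook form with back-pointers, which may differ in detail from the step-by-step description in the patent. The three-tag set and every probability are made up for illustration; a real tagger would estimate them from a tagged corpus and use the full tag set.

```python
from typing import Dict, List


def viterbi(words: List[str], tags: List[str],
            pi: Dict[str, float],
            a: Dict[str, Dict[str, float]],
            b: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable tag sequence for `words`."""
    # best[t][tag]: probability of the best tag path that ends in `tag` at word t.
    best = [{tag: pi[tag] * b[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] * a[p][tag])
            best[t][tag] = best[t - 1][prev] * a[prev][tag] * b[tag].get(words[t], 0.0)
            back[t][tag] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


tags = ["pronoun", "verb", "noun"]
pi = {"pronoun": 0.6, "verb": 0.1, "noun": 0.3}            # sentence-start probabilities
a = {                                                      # transition probabilities
    "pronoun": {"pronoun": 0.1, "verb": 0.7, "noun": 0.2},
    "verb": {"pronoun": 0.2, "verb": 0.1, "noun": 0.7},
    "noun": {"pronoun": 0.1, "verb": 0.5, "noun": 0.4},
}
b = {                                                      # emission probabilities
    "pronoun": {"I": 0.5},
    "verb": {"love": 0.4},
    "noun": {"love": 0.1, "China": 0.3},
}
tag_sequence = viterbi(["I", "love", "China"], tags, pi, a, b)
```

For long sentences the products underflow, so practical implementations usually sum log-probabilities instead of multiplying raw probabilities.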
Step 207: the user equipment classifies the plurality of text features according to the classification parameters by using a preset mathematical model, to obtain the degree of correlation between the plurality of text features and the classification parameters.
In this embodiment, step 207 is implemented in a way similar to step 103 shown in Fig. 1, which is not repeated here.
Step 208: the user equipment stores the text information in the text information subject corresponding to the group of classification parameters with the highest degree of correlation.
For example, according to the degree of correlation between the text features and each group of classification parameters, and the correspondence between classification parameters and text information subjects, a link to the text information is placed under the directory of the matching text information subject, so that the user can see at a glance, and quickly browse, the text information of interest.
In this embodiment, text information is collected and segmented into a plurality of words, the words are tagged with their parts of speech, and a plurality of nouns and verbs are extracted from the tagged words as text features. The text features are then classified with a preset mathematical model according to the received classification parameters, which yields the degree of correlation between the text features and the classification parameters. Finally, the text information is stored in the text information subject corresponding to the group of classification parameters with the highest degree of correlation. Text information is thus classified on the user equipment, which makes the equipment more convenient to use. In addition, because the network equipment trains the mathematical model to obtain the classification parameters, the user equipment is spared a large amount of training time during classification. Because the data volume of the classification parameters is very small, the load on the user equipment of obtaining them is also relatively small. And because the classification parameters need not be obtained in real time during classification, the equipment is all the more convenient to use.
Fig. 3 is a flowchart of yet another embodiment of the text classification processing method of the invention. As shown in Fig. 3, the method of this embodiment comprises:
Step 301: the network equipment receives a classification parameter acquisition request that is sent by the user equipment and carries a text information subject and a mathematical model identifier.
Step 302: the network equipment obtains the classification parameters corresponding to the text information subject and the mathematical model identifier, according to the preset correspondence among text information subjects, mathematical model identifiers and classification parameters.
Step 303: the network equipment sends the corresponding classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, the plurality of text features obtained by preprocessing.
In this embodiment, the user equipment can send the text information subject entered by the user and the identifier of the mathematical model configured in the user equipment to the network equipment. The network equipment obtains the classification parameters corresponding to the text information subject and the mathematical model identifier from the correspondence it has stored in advance, and sends them to the user equipment. The user equipment can then carry out the technical solution of the method embodiment shown in Fig. 1 or Fig. 2; the principle is similar and is not repeated here.
Specifically, to match the mathematical model used by the user equipment, the network equipment can build several kinds of mathematical model, for example a decision tree, the K nearest neighbors (KNN) algorithm, a Bayesian network, a neural network, Boosting or a support vector machine.
In this embodiment, the network equipment receives a classification parameter acquisition request that is sent by the user equipment and carries a text information subject and a mathematical model identifier, obtains the corresponding classification parameters according to the preset correspondence among text information subjects, mathematical model identifiers and classification parameters, and sends the classification parameters to the user equipment, so that the user equipment classifies the preprocessed text features according to them. Text information is thus classified on the user equipment, which makes the equipment more convenient to use.
Further, in another embodiment of the invention, the correspondence among text information subjects, mathematical model identifiers and classification parameters used in step 302 can be obtained as follows:
obtaining feature words that represent the text information subject;
training the preset mathematical model on the feature words to obtain the classification parameters corresponding to the mathematical model;
saving the correspondence among the text information subject, the mathematical model identifier and the classification parameters.
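The three steps above, together with the lookup in step 302, can be sketched as a small in-memory table on the network equipment. The class `ParameterStore`, its method names and the placeholder `fake_trainer` are all hypothetical; in particular, `fake_trainer` does no real training and only stands in for whichever mathematical model is trained.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (text information subject, mathematical model identifier)


class ParameterStore:
    """Hypothetical table kept by the network equipment."""

    def __init__(self) -> None:
        self._table: Dict[Key, List[float]] = {}

    def train_and_save(self, subject: str, model_id: str,
                       feature_words: List[str],
                       trainer: Callable[[List[str]], List[float]]) -> None:
        # Train on the feature words, then save the correspondence.
        self._table[(subject, model_id)] = trainer(feature_words)

    def handle_request(self, subject: str, model_id: str) -> Optional[List[float]]:
        # Step 302: look up the parameters named in the acquisition request.
        return self._table.get((subject, model_id))


def fake_trainer(feature_words: List[str]) -> List[float]:
    # Placeholder only: returns one number per feature word instead of real weights.
    return [float(len(word)) for word in feature_words]


store = ParameterStore()
store.train_and_save("football", "neural_network", ["goal", "striker"], fake_trainer)
params = store.handle_request("football", "neural_network")
missing = store.handle_request("cooking", "neural_network")
```

Keying the table on both the subject and the model identifier lets user equipment with different preset models request parameters for the same subject.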
In this embodiment, a neural network built by the network equipment is taken as an example to describe in detail how the network equipment obtains the classification parameters.
The neural network can comprise an input layer, a hidden layer and an output layer. The input layer is the first layer and can have 150 input points, corresponding to 150 feature words of the text information. The hidden layer follows the input layer, performs the classification and recognition, and can have 8 nodes. The output layer follows the hidden layer, produces the classification decision, and can have 1 node. Adjacent layers are fully connected: each node of one layer is connected to every node of the next layer. Each connection carries a weight value, which records the corresponding error so that the error can be back-propagated when the network is trained later.
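A forward pass through such a 150-8-1 network can be sketched in plain Python as follows. The sigmoid activation is an assumption, since the text does not name an activation function, and the function names and the sample input vector are illustrative.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    """One weight per connection between two fully connected layers."""
    return [[rng.random() for _ in range(n_in)] for _ in range(n_out)]


def forward(inputs: List[float],
            w_hidden: List[List[float]],
            w_out: List[List[float]]) -> Tuple[List[float], float]:
    """Return the hidden activations and the single output value."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[0], hidden)))
    return hidden, output


rng = random.Random(0)
w_hidden = init_weights(150, 8, rng)   # input layer (150 points) to hidden layer (8 nodes)
w_out = init_weights(8, 1, rng)        # hidden layer (8 nodes) to output layer (1 node)

inputs = [1.0] * 3 + [0.0] * 147       # hypothetical input values for 150 feature words
hidden, score = forward(inputs, w_hidden, w_out)
```

The network holds 150 × 8 + 8 × 1 = 1208 connection weights, which is consistent with the text's remark that the classification parameters sent to the user equipment are small.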
Specifically, the neural network can be trained, and the classification parameters obtained, as follows.
First, the weight of each connection in the neural network is randomly assigned a value between 0 and 1. The candidate feature words entered by the administrator are received; these are words that can represent the text information subject. The input value corresponding to each candidate feature word is fed into an input point of the neural network for training.
Then the neural network is trained by error back-propagation: the difference between the actual output and the expected output is used to correct the connection weights of each layer, layer by layer from back to front. Training stops when the cumulative error rate of the whole neural network falls below a preset threshold.
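The training loop can be sketched as below for a small network with one hidden layer and one output node. Several details are assumptions not stated in the text: the sigmoid activation, the squared-error measure, the learning rate, per-sample weight updates and the toy two-input data set.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, n_in, n_hidden, threshold, max_epochs, rate=0.5, seed=0):
    """Back-propagation training; returns the weights and the per-epoch error."""
    rng = random.Random(seed)
    # Every connection weight starts as a random value between 0 and 1.
    w_h = [[rng.random() for _ in range(n_in)] for _ in range(n_hidden)]
    w_o = [rng.random() for _ in range(n_hidden)]
    history = []
    for _ in range(max_epochs):
        total_error = 0.0
        for x, target in samples:
            hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_h]
            out = sigmoid(sum(w * h for w, h in zip(w_o, hidden)))
            total_error += 0.5 * (target - out) ** 2
            # Error terms are computed for the output layer first, then the hidden layer.
            d_out = (target - out) * out * (1.0 - out)
            d_hid = [d_out * w_o[j] * hidden[j] * (1.0 - hidden[j])
                     for j in range(n_hidden)]
            for j in range(n_hidden):
                w_o[j] += rate * d_out * hidden[j]
                for i in range(n_in):
                    w_h[j][i] += rate * d_hid[j] * x[i]
        history.append(total_error)
        if total_error < threshold:   # stop once the cumulative error is small enough
            break
    return w_h, w_o, history


# Toy data: the first pattern belongs to the subject (target 1), the second does not.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
w_h, w_o, history = train(samples, n_in=2, n_hidden=3, threshold=0.01, max_epochs=5000)
```

The returned `w_h` and `w_o` correspond to the connection weights that the text calls the classification parameters.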
Here, the input value of a candidate feature word = (number of articles in the training set in which the candidate feature word occurs / total number of articles in the training set) × number of times the candidate feature word occurs in the training set.
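The input value can be computed as in the sketch below. It reads the machine-translated formula as (articles containing the word / total articles) × total occurrences of the word in the training set; that reading, the function name and the tiny training set are assumptions.

```python
from typing import List


def input_value(word: str, training_set: List[List[str]]) -> float:
    """training_set is a list of articles, each given as a list of words."""
    docs_with_word = sum(1 for article in training_set if word in article)
    occurrences = sum(article.count(word) for article in training_set)
    return docs_with_word / len(training_set) * occurrences


# Hypothetical training set of four short articles.
training_set = [
    ["goal", "match", "goal"],
    ["match", "car"],
    ["goal"],
    ["car", "engine"],
]
value = input_value("goal", training_set)   # 2 of 4 articles, 3 occurrences
```

A word that is both widespread and frequent in the training set therefore receives a larger input value than one that is rare or confined to a single article.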
For example, to classify a training set covering the four text information subjects of news, recreation, health and family, four neural networks, one per subject, are first set up on the network equipment. Then 100 training articles are collected for each of the four subjects, and the neural network of each subject is trained. Taking the training of the news neural network as an example, its output should be close to 1 for news articles and close to 0 for articles of the other three subjects. Suppose that, after 10 rounds of training on the 400 training articles, the average output is 0.95 for news articles and 0.05 for the articles of each of the other three subjects. Training then stops and is considered successful, and the correspondence among the news subject, the neural network model identifier and the current classification parameters is saved. The current classification parameters are the connection weights between the nodes of the successfully trained neural network.
In this embodiment, because the network equipment trains the mathematical model to obtain the classification parameters, the user equipment is spared a large amount of training time during classification. Because the data volume of the classification parameters is very small, the load on the user equipment of obtaining them is also relatively small. And because the classification parameters need not be obtained in real time during classification, the equipment is all the more convenient to use.
Fig. 4 is a schematic structural diagram of an embodiment of the user equipment of the present invention. As shown in Fig. 4, the user equipment of this embodiment comprises a classification parameter receiving module 11, a preprocessing module 12, a degree-of-correlation acquisition module 13, and a classification processing module 14. The classification parameter receiving module 11 is configured to receive the classification parameters sent by the network equipment. The preprocessing module 12 is configured to perform text preprocessing on the collected text information to obtain a plurality of text features. The degree-of-correlation acquisition module 13 is configured to classify the plurality of text features according to the classification parameters, using the preset mathematical model, to obtain the degree of correlation between the plurality of text features and the classification parameters. The classification processing module 14 is configured to store the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation.
The user equipment of this embodiment can execute the technical solution of the method embodiment shown in Fig. 1; the principle is similar and is not repeated here.
In this embodiment, text information is collected and preprocessed to obtain a plurality of text features. According to the classification parameters received from the network equipment, the preset mathematical model is then used to classify the plurality of text features and obtain the degree of correlation between the text features and the classification parameters. Finally, the text information is stored under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thus performed on the user equipment, which effectively improves convenience for the user.
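The classification step on the user equipment can be sketched as below. This is an illustrative sketch only: it assumes each group of classification parameters is the weight vector and bias of a single sigmoid unit, that the text features are already encoded as numbers, and that a plain dictionary stands in for topic storage. The names `relevance` and `classify_and_store` are hypothetical.

```python
import math

def relevance(params, features):
    """Degree of correlation between the features and one parameter group,
    computed as the output of a sigmoid unit (an assumed model)."""
    z = sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]
    return 1.0 / (1.0 + math.exp(-z))

def classify_and_store(text, features, params_by_topic, store):
    """Score the features against every topic's parameters and file the
    text under the topic with the highest degree of correlation."""
    scores = {topic: relevance(p, features) for topic, p in params_by_topic.items()}
    best = max(scores, key=scores.get)
    store.setdefault(best, []).append(text)
    return best, scores

# Hypothetical parameter groups, as if received from the network equipment.
PARAMS = {
    "news":       {"weights": [2, -1, -1, -1], "bias": -1},
    "recreation": {"weights": [-1, 2, -1, -1], "bias": -1},
    "health":     {"weights": [-1, -1, 2, -1], "bias": -1},
    "family":     {"weights": [-1, -1, -1, 2], "bias": -1},
}

store = {}
best, scores = classify_and_store("sample article", [3, 0, 1, 0], PARAMS, store)
```

Because only a forward pass per topic is needed, the work on the user equipment stays small regardless of how long training took on the network side.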
Fig. 5 is a schematic structural diagram of another embodiment of the user equipment of the present invention. As shown in Fig. 5, on the basis of the embodiment shown in Fig. 4, the user equipment of this embodiment further comprises a user request receiving module 15 and a classification parameter acquisition request sending module 16. The user request receiving module 15 is configured to receive a user request that includes a text information topic. The classification parameter acquisition request sending module 16 is configured to send to the network equipment a classification parameter acquisition request carrying the text information topic and the identifier of the preset mathematical model.
Further, the preprocessing module 12 may specifically comprise a word segmentation unit 121, a part-of-speech tagging unit 122, and a text feature extraction unit 123. The word segmentation unit 121 is configured to perform word segmentation on the collected text information to obtain a plurality of words. The part-of-speech tagging unit 122 is configured to perform part-of-speech tagging on the plurality of words to obtain words of multiple parts of speech. The text feature extraction unit 123 is configured to extract, from the words of multiple parts of speech, the most frequently occurring nouns and verbs, in a number corresponding to the number of input nodes of the mathematical model, to form the text features.
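The feature extraction performed by unit 123 can be sketched as follows. The sketch assumes that word segmentation and part-of-speech tagging have already produced `(word, tag)` pairs; the tag names `"noun"` and `"verb"` and the function name `extract_text_features` are hypothetical and do not come from any particular tagger.

```python
from collections import Counter

def extract_text_features(tagged_words, num_inputs):
    """Return the num_inputs most frequent nouns and verbs, where
    num_inputs matches the number of input nodes of the model."""
    counts = Counter(word for word, tag in tagged_words if tag in ("noun", "verb"))
    return [word for word, _ in counts.most_common(num_inputs)]

# Hypothetical output of the segmentation and tagging units.
tagged = [
    ("market", "noun"), ("rises", "verb"), ("the", "det"),
    ("market", "noun"), ("quickly", "adv"), ("rises", "verb"),
    ("market", "noun"), ("bank", "noun"),
]

features = extract_text_features(tagged, 2)
```

Tying the number of extracted words to the number of input nodes keeps the feature vector the same length for every text, which the fixed-size model input requires.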
The subscriber equipment of present embodiment can be carried out the technical scheme of the embodiment of method shown in Figure 2, and its principle is similar, repeats no more herein.
In the present embodiment, by collecting text message, and text information carried out word segmentation processing, obtain multiple word, and multiple word is carried out part-of-speech tagging handle, obtain the word of multiple part of speech, from the word of multiple part of speech, extract a plurality of noun words and verb word, as text feature, again according to the sorting parameter that receives, the mathematical model that employing sets in advance is obtained the degree of correlation of a plurality of text features and sorting parameter to the processing of classifying of a plurality of text features, at last text message is stored in the text message theme of the highest group categories parameter correspondence of degree of correlation, thereby realized the classification of text message is handled at subscriber equipment, and then improved the convenience that the user uses effectively.In addition, because adopting network equipment trains mathematical model, obtain sorting parameter, therefore make subscriber equipment in realizing the process that classification is handled, save a large amount of training times, and because the data volume of this sorting parameter is very little, therefore make subscriber equipment also relatively very little at the load that obtains this sorting parameter, simultaneously, owing to need when classification is handled, not obtain this sorting parameter in real time, therefore improve the convenience that the user uses more effectively.
Further, the user equipment of the above embodiments may be a mobile phone, a human-computer interaction terminal, an e-book, or other user equipment with a display function. Where the user equipment is a mobile phone, the mobile phone further comprises a radio-frequency circuit, an audio circuit, and a power supply, which provide the basic functions of the mobile phone. The radio-frequency circuit, microphone, loudspeaker, and power supply are briefly described below. The radio-frequency circuit is mainly used to establish communication between the mobile phone and a wireless network and to receive and send data between them. The microphone collects sound and converts it into audio data so that the mobile phone can send the audio data to the wireless network through the radio-frequency circuit. The loudspeaker restores audio data that the mobile phone receives from the wireless network through the radio-frequency circuit into sound and plays it to the user. The power supply mainly powers each circuit or component of the mobile phone to ensure its normal operation.
Fig. 6 is a schematic structural diagram of an embodiment of the network equipment of the present invention. As shown in Fig. 6, the network equipment of this embodiment comprises a classification parameter acquisition request receiving module 21, a classification parameter acquisition module 22, and a classification parameter sending module 23. The classification parameter acquisition request receiving module 21 is configured to receive a classification parameter acquisition request, sent by the user equipment, that carries a text information topic and a mathematical model identifier. The classification parameter acquisition module 22 is configured to obtain, according to the preset correspondence among text information topics, mathematical model identifiers, and classification parameters, the classification parameters corresponding to the text information topic and the mathematical model identifier. The classification parameter sending module 23 is configured to send the classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, the plurality of text feature words obtained after preprocessing.
The network equipment of this embodiment can execute the technical solution of the method embodiment shown in Fig. 3; the principle is similar and is not repeated here.
In this embodiment, the network equipment receives a classification parameter acquisition request, sent by the user equipment, that carries a text information topic and a mathematical model identifier. According to the preset correspondence among text information topics, mathematical model identifiers, and classification parameters, it obtains the classification parameters corresponding to the text information topic and the mathematical model identifier and sends them to the user equipment. The user equipment then classifies the plurality of text features obtained after preprocessing according to the classification parameters, obtains the degree of correlation between the text features and the classification parameters, and finally stores the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thus performed on the user equipment, which effectively improves convenience for the user.
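The lookup on the network equipment can be sketched as a keyed table. This is a hedged sketch, not the patent's message format: the request fields `topic` and `model_id`, the response fields, the identifier `"nn-1"`, and the parameter values are all hypothetical.

```python
# Preset correspondence: (topic, model identifier) -> classification parameters.
# The values are made-up connection weights for illustration only.
PARAMETER_TABLE = {
    ("news", "nn-1"): [0.8, -0.3, 0.1],
    ("health", "nn-1"): [-0.2, 0.9, 0.4],
}

def handle_parameter_request(request, table):
    """Look up the parameters for the requested topic and model identifier
    and build the response sent back to the user equipment."""
    key = (request["topic"], request["model_id"])
    params = table.get(key)
    if params is None:
        # No trained parameters are stored for this combination.
        return {"status": "not_found", "topic": key[0], "model_id": key[1]}
    return {"status": "ok", "topic": key[0], "model_id": key[1],
            "parameters": params}

found = handle_parameter_request({"topic": "news", "model_id": "nn-1"}, PARAMETER_TABLE)
missing = handle_parameter_request({"topic": "family", "model_id": "nn-1"}, PARAMETER_TABLE)
```

Keying on both the topic and the model identifier lets the same topic carry different parameters for different model types, which matches the requirement that the trained model correspond to the model used on the user equipment.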
Further, in another embodiment of the present invention, the network equipment may also comprise a feature word acquisition module, a training module, and a saving module. The feature word acquisition module is configured to obtain feature words that represent a text information topic. The training module is configured to train the preset mathematical model according to the feature words to obtain the classification parameters corresponding to the mathematical model. The saving module is configured to save the correspondence among the text information topic, the mathematical model identifier, and the classification parameters.
In this embodiment, because the network equipment trains the mathematical model to obtain the classification parameters, the user equipment saves a large amount of training time when performing classification. Because the data volume of the classification parameters is very small, the load on the user equipment of obtaining them is also relatively small. In addition, because the classification parameters need not be obtained in real time during classification, convenience for the user is further improved.
Fig. 7 is a schematic structural diagram of an embodiment of the text classification processing system of the present invention. As shown in Fig. 7, the system of this embodiment comprises network equipment 31 and user equipment 32. The network equipment 31 can execute the technical solution of the method embodiment shown in Fig. 3, and the user equipment 32 can execute the technical solution of the method embodiment shown in Fig. 1 or Fig. 2; the principle is similar and is not repeated here.
In this embodiment, because the text classification processing system uses the network equipment to train the mathematical model and obtain the classification parameters, the user equipment saves a large amount of training time when performing classification. Because the data volume of the classification parameters is very small, the load on the user equipment of obtaining them is also relatively small. In addition, according to the classification parameters, the user equipment uses the preset mathematical model to classify the text features obtained after preprocessing, obtains the degree of correlation between the plurality of text features and the classification parameters, and finally stores the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thus performed on the user equipment, which effectively improves convenience for the user.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (17)

1. A text classification processing method, characterized by comprising:
a user equipment receiving classification parameters sent by a network equipment;
the user equipment performing text preprocessing on collected text information to obtain a plurality of text features;
the user equipment classifying the plurality of text features according to the classification parameters, using a preset mathematical model, to obtain a degree of correlation between the plurality of text features and the classification parameters;
the user equipment storing the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation.
2. The text classification processing method according to claim 1, characterized in that the classification parameters are obtained by the network equipment training a configured mathematical model and are taken from the successfully trained mathematical model.
3. The text classification processing method according to claim 1, characterized in that, before the user equipment receives the classification parameters sent by the network equipment, the method further comprises:
the user equipment receiving a user request, the user request including a text information topic;
the user equipment sending to the network equipment a classification parameter acquisition request carrying the text information topic and the identifier of the preset mathematical model.
4. The text classification processing method according to claim 3, characterized in that the user equipment performing text preprocessing on the collected text information to obtain a plurality of text features comprises:
the user equipment performing word segmentation on the collected text information to obtain a plurality of words;
the user equipment performing part-of-speech tagging on the plurality of words to obtain words of multiple parts of speech;
the user equipment extracting, from the words of multiple parts of speech, the most frequently occurring nouns and verbs, in a number corresponding to the number of input nodes of the mathematical model, as the text features.
5. The text classification processing method according to claim 1, characterized in that the mathematical model is a mathematical model of a decision tree, a K-nearest-neighbor algorithm, a Bayesian network, a neural network, Boosting, or a support vector machine.
6. The text classification processing method according to any one of claims 1 to 5, characterized in that the collected text information is text information obtained by the user equipment locally or from a network.
7. A text classification processing method, characterized by comprising:
a network equipment receiving a classification parameter acquisition request, sent by a user equipment, that carries a text information topic and a mathematical model identifier;
the network equipment obtaining, according to a preset correspondence among text information topics, mathematical model identifiers, and classification parameters, the classification parameters corresponding to the text information topic and the mathematical model identifier;
the network equipment sending the classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, a plurality of text features obtained after preprocessing.
8. The text classification processing method according to claim 7, characterized by further comprising:
the network equipment obtaining feature words that represent the text information topic;
the network equipment training a preset mathematical model according to the feature words to obtain the classification parameters corresponding to the mathematical model;
the network equipment saving the correspondence among the text information topic, the mathematical model identifier, and the classification parameters.
9. The text classification processing method according to claim 7 or 8, characterized in that the mathematical model is a mathematical model of a decision tree, a K-nearest-neighbor algorithm, a Bayesian network, a neural network, Boosting, or a support vector machine.
10. The text classification processing method according to claim 9, characterized in that the mathematical model trained by the network equipment corresponds to the mathematical model adopted by the user equipment.
11. A text classification processing device, characterized by comprising:
a classification parameter receiving module, configured to receive classification parameters sent by a network equipment;
a preprocessing module, configured to perform text preprocessing on collected text information to obtain a plurality of text features;
a degree-of-correlation acquisition module, configured to classify the plurality of text features according to the classification parameters, using a preset mathematical model, to obtain a degree of correlation between the plurality of text features and the classification parameters;
a classification processing module, configured to store the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation.
12. The text classification processing device according to claim 11, characterized in that the classification parameters are obtained by the network equipment training a configured mathematical model and are taken from the successfully trained mathematical model.
13. The text classification processing device according to claim 11, characterized by further comprising:
a user request receiving module, configured to receive a user request, the user request including a text information topic;
a classification parameter acquisition request sending module, configured to send to the network equipment a classification parameter acquisition request carrying the text information topic and the identifier of the preset mathematical model.
14. The text classification processing device according to claim 11, characterized in that the preprocessing module comprises:
a word segmentation unit, configured to perform word segmentation on the collected text information to obtain a plurality of words;
a part-of-speech tagging unit, configured to perform part-of-speech tagging on the plurality of words to obtain words of multiple parts of speech;
a text feature extraction unit, configured to extract, from the words of multiple parts of speech, the most frequently occurring nouns and verbs, in a number corresponding to the number of input nodes of the mathematical model, to form the text features.
15. A text classification processing device, characterized by comprising:
a classification parameter acquisition request receiving module, configured to receive a classification parameter acquisition request, sent by a user equipment, that carries a text information topic and a mathematical model identifier;
a classification parameter acquisition module, configured to obtain, according to a preset correspondence among text information topics, mathematical model identifiers, and classification parameters, the classification parameters corresponding to the text information topic and the mathematical model identifier;
a classification parameter sending module, configured to send the classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, a plurality of text feature words obtained after preprocessing.
16. The text classification processing device according to claim 15, characterized by further comprising:
a feature word acquisition module, configured to obtain feature words that represent the text information topic;
a training module, configured to train a preset mathematical model according to the feature words to obtain the classification parameters corresponding to the mathematical model;
a saving module, configured to save the correspondence among the text information topic, the mathematical model identifier, and the classification parameters.
17. A text classification processing system, characterized by comprising a network equipment and a user equipment, wherein the network equipment is the text classification processing device according to claim 15 or 16, and the user equipment is the text classification processing device according to any one of claims 11 to 14.
CN 201010614959 2010-12-24 2010-12-24 Text classification processing method, system and equipment Active CN102073704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010614959 CN102073704B (en) 2010-12-24 2010-12-24 Text classification processing method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010614959 CN102073704B (en) 2010-12-24 2010-12-24 Text classification processing method, system and equipment

Publications (2)

Publication Number Publication Date
CN102073704A CN102073704A (en) 2011-05-25
CN102073704B true CN102073704B (en) 2013-09-25

Family

ID=44032243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010614959 Active CN102073704B (en) 2010-12-24 2010-12-24 Text classification processing method, system and equipment

Country Status (1)

Country Link
CN (1) CN102073704B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833176B (en) * 2011-06-13 2018-01-26 腾讯科技(深圳)有限公司 Obtain the methods, devices and systems of information
CN102945246B (en) * 2012-09-28 2015-12-02 北界创想(北京)软件有限公司 The disposal route of network information data and device
CN103778225B (en) * 2014-01-23 2018-04-03 北京奇虎科技有限公司 Processing method, identification device and the system of advertisement marketing speech like sound information
CN105045779A (en) * 2015-07-13 2015-11-11 北京大学 Deep neural network and multi-tag classification based wrong sentence detection method
CN108319599B (en) 2017-01-17 2021-02-26 华为技术有限公司 Man-machine conversation method and device
CN106649890B (en) * 2017-02-07 2020-07-14 税云网络科技服务有限公司 Data storage method and device
CN107066560B (en) * 2017-03-30 2019-12-06 东软集团股份有限公司 Text classification method and device
CN110020431B (en) * 2019-03-06 2023-07-18 平安科技(深圳)有限公司 Feature extraction method and device of text information, computer equipment and storage medium
CN114491296B (en) * 2022-04-18 2022-07-12 湖南正宇软件技术开发有限公司 Proposal affiliate recommendation method, system, computer device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1310825A (en) * 1998-06-23 2001-08-29 微软公司 Methods and apparatus for classifying text and for building a text classifier
CN1629837A (en) * 2003-12-17 2005-06-22 国际商业机器公司 Method and apparatus for processing, browsing and classified searching of electronic document and system thereof
CN101621391A (en) * 2009-08-07 2010-01-06 北京百问百答网络技术有限公司 Method and system for classifying short texts based on probability topic
CN101887443A (en) * 2009-05-13 2010-11-17 华为技术有限公司 Method and device for classifying texts
CN201654779U (en) * 2009-04-22 2010-11-24 同方知网(北京)技术有限公司 Scientific document automatic classification system
CN101902523A (en) * 2010-07-09 2010-12-01 中兴通讯股份有限公司 Mobile terminal and filtering method of short messages thereof

Also Published As

Publication number Publication date
CN102073704A (en) 2011-05-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171027

Address after: Metro Songshan Lake high tech Industrial Development Zone, Guangdong Province, Dongguan City Road 523808 No. 2 South Factory (1) project B2 -5 production workshop

Patentee after: HUAWEI terminal (Dongguan) Co., Ltd.

Address before: 518129 Longgang District, Guangdong, Bantian HUAWEI base B District, building 2, building No.

Patentee before: Huawei Device Co., Ltd.

CP01 Change in the name or title of a patent holder

Address after: 523808 Southern Factory Building (Phase I) Project B2 Production Plant-5, New Town Avenue, Songshan Lake High-tech Industrial Development Zone, Dongguan City, Guangdong Province

Patentee after: Huawei Device Co., Ltd.

Address before: 523808 Southern Factory Building (Phase I) Project B2 Production Plant-5, New Town Avenue, Songshan Lake High-tech Industrial Development Zone, Dongguan City, Guangdong Province

Patentee before: HUAWEI terminal (Dongguan) Co., Ltd.
