Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the text classification processing method of the present invention. As shown in Fig. 1, the method of this embodiment comprises:
Step 101, the user equipment receives the classification parameters sent by the network device.
In this embodiment, the network device receives feature words, entered by an administrator of the network device, that can represent a text information topic, and uses them to train a preset mathematical model. When the training result reaches the expected value that the administrator set in advance for that type of mathematical model, the model is considered successfully trained, and the classification parameters are obtained from the trained model. For example, if the mathematical model is a neural network and it has been successfully trained on the feature words, the classification parameters obtained are the connection weights between the nodes of the neural network.
In addition, the network device can send the classification parameters obtained from the successfully trained mathematical model to the user equipment, where the user equipment may specifically be a device such as a mobile phone.
Step 102, the user equipment performs text preprocessing on the collected text information to obtain a plurality of text features.
In this embodiment, the user equipment can collect text information from SMS messages and MMS messages, or text information stored on Facebook, Twitter, microblogs, and blogs, and perform text preprocessing on the collected text information to obtain a plurality of text features. The text features may include nouns, verbs, and the like, and are used to reasonably express the content of the text information.
Step 103, the user equipment classifies the plurality of text features according to the classification parameters by using a preset mathematical model, and obtains the degree of correlation between the text features and each group of classification parameters.
Step 104, the user equipment stores the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation.
In this embodiment, the user equipment can preset a mathematical model and, according to the obtained classification parameters, classify the plurality of text features obtained after preprocessing to obtain a classification result. For example, the user equipment obtains two groups of classification parameters according to a user request, where one group represents a football topic and the other represents an automobile topic. Using the preset mathematical model, the user equipment performs a calculation on the text features obtained after preprocessing with each of the two groups of classification parameters. The result of each calculation characterizes the degree of correlation between the text features and that group of classification parameters, and the text information is stored under the text information topic corresponding to the group with the higher degree of correlation, as the classification result. For example, text information related to football is filed under the football topic directory and text information related to automobiles is filed under the automobile topic directory, so that the user can quickly browse the text information of interest.
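The selection of the topic with the highest degree of correlation can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the weighted-sum scoring model, and the parameter values are hypothetical and stand in for whichever preset mathematical model the user equipment actually uses.

```python
from typing import Callable, Dict, List, Sequence


def weighted_sum(features: Sequence[float], params: Sequence[float]) -> float:
    """Hypothetical scoring model: a plain weighted sum of the feature values."""
    return sum(f * p for f, p in zip(features, params))


def classify(
    features: Sequence[float],
    params_by_topic: Dict[str, Sequence[float]],
    model: Callable[[Sequence[float], Sequence[float]], float] = weighted_sum,
) -> str:
    """Score the features against every group of parameters and return the best topic."""
    scores = {topic: model(features, params) for topic, params in params_by_topic.items()}
    return max(scores, key=scores.get)


def store(
    text: str,
    features: Sequence[float],
    params_by_topic: Dict[str, Sequence[float]],
    catalog: Dict[str, List[str]],
) -> str:
    """File the text under the topic whose parameters give the highest score."""
    topic = classify(features, params_by_topic)
    catalog.setdefault(topic, []).append(text)
    return topic


# Illustrative values only: two groups of parameters, one per topic.
params = {"football": [1.0, 0.0], "automobile": [0.0, 1.0]}
catalog: Dict[str, List[str]] = {}
store("match report", [0.9, 0.1], params, catalog)
store("engine review", [0.2, 0.8], params, catalog)
```

Keeping the scoring function as a parameter reflects the point that the same selection step applies regardless of which mathematical model produces the degree of correlation.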
In addition, in this embodiment, the mathematical model used by the user equipment may be a decision tree, a KNN algorithm, a Bayesian network, a neural network, Boosting, or a support vector machine.
In this embodiment, text information is collected and preprocessed to obtain a plurality of text features; the text features are then classified, according to the classification parameters received from the network device, by using the preset mathematical model, to obtain the degree of correlation between the text features and each group of classification parameters; finally, the text information is stored under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user.
Fig. 2 is a flowchart of another embodiment of the text classification processing method of the present invention. As shown in Fig. 2, the method of this embodiment comprises:
Step 201, the user equipment receives a user request, where the user request includes a text information topic.
Step 202, the user equipment sends a classification parameter acquisition request to the network device, where the request carries the text information topic and the identifier of the preset mathematical model.
Step 203, the user equipment receives, from the network device, the classification parameters corresponding to the text information topic and the preset mathematical model identifier.
In this embodiment, the user can obtain the corresponding classification parameters from the network device according to the user's own interests and the mathematical model set in the user equipment. For example, when the user wants to quickly browse text information related to the football, automobile, and news topics, the user enters these text information topics into the user equipment. The user equipment carries the text information topics and the identifier of the mathematical model preset in the user equipment in a classification parameter acquisition request and sends the request to the network device. The network device then sends the classification parameters corresponding to the text information topics and the mathematical model identifier to the user equipment.
Step 204, the user equipment performs word segmentation on the collected text information to obtain a plurality of words.
For example, a recursive function can be used to segment each sentence in the text information. Specifically, all possible segmentations of the sentence are enumerated, and the words are then obtained by the method of maximum word aggregation, that is, the shortest-path method.
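A minimal sketch of this idea, assuming a small vocabulary is available and interpreting the shortest path as the segmentation with the fewest words, might look like the following. The function names and the vocabulary are hypothetical, and the exhaustive recursion is exponential in sentence length, so it is suitable only for illustration.

```python
from typing import List, Set


def all_segmentations(sentence: str, vocab: Set[str]) -> List[List[str]]:
    """Recursively list every way to split `sentence` into candidate words."""
    if not sentence:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(sentence) + 1):
        head = sentence[:end]
        # Assumption: a single character is always accepted as a word,
        # so that every sentence has at least one segmentation.
        if head in vocab or end == 1:
            for tail in all_segmentations(sentence[end:], vocab):
                results.append([head] + tail)
    return results


def segment(sentence: str, vocab: Set[str]) -> List[str]:
    """Choose the segmentation with the fewest words (the shortest path)."""
    return min(all_segmentations(sentence, vocab), key=len)


# An unspaced string stands in for a sentence written without word delimiters.
words = segment("ilovechina", {"i", "love", "china"})
```

A practical implementation would typically replace the exhaustive enumeration with dynamic programming over a word graph, which finds the same shortest path without listing every candidate.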
Step 205, the user equipment performs part-of-speech tagging on the words to obtain words of multiple parts of speech.
Step 206, from the words of multiple parts of speech, the user equipment extracts the nouns and verbs that occur most frequently, in a number corresponding to the number of input nodes of the mathematical model, as the text features.
In this embodiment, the collected text information is text information that the user equipment obtains locally or from the network. A piece of text information may consist of multiple sentences, and a sentence may consist of multiple words. In general, nouns and verbs carry far more of a sentence's meaning than adjectives, prepositions, and the like. Therefore, the user equipment can perform part-of-speech tagging on the words to obtain words of multiple parts of speech, and extract from them the nouns and verbs that occur most frequently, in a number corresponding to the number of input nodes of the mathematical model, as the text features.
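The extraction of the most frequent nouns and verbs can be sketched as below. The tag names "n" and "v", the function name, and the sample data are assumptions made for illustration; the number of features kept is meant to match the number of input nodes of the mathematical model.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def extract_features(
    tagged_words: Iterable[Tuple[str, str]],
    num_inputs: int,
    keep_tags: Tuple[str, ...] = ("n", "v"),
) -> List[str]:
    """Return the `num_inputs` most frequent words whose tag is a noun or a verb."""
    counts = Counter(word for word, tag in tagged_words if tag in keep_tags)
    return [word for word, _ in counts.most_common(num_inputs)]


# Hypothetical tagged output of the part-of-speech tagging step.
tagged = [
    ("team", "n"), ("win", "v"), ("team", "n"), ("fast", "adj"),
    ("goal", "n"), ("win", "v"), ("team", "n"),
]
features = extract_features(tagged, num_inputs=2)
```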
In this embodiment, a hidden Markov model (HMM) and the Viterbi algorithm can be used to perform part-of-speech tagging on the words. Step 205 may be implemented as follows: the obtained words are represented by three parameters, Pi, a, and b, and the HMM and the Viterbi algorithm are applied to obtain the part of speech of each word. The Pi parameter represents the probability that a word of a given part of speech begins a sentence. The a parameter represents the transition probability between parts of speech, for example, the probability of a noun being followed by a verb, or of a verb being followed by an adjective. The b parameter represents the emission probability, that is, the probability that a word, taken as a given part of speech, occurs among all words of that part of speech. For example, the word "strike" can be either a verb or a noun, so two probabilities are counted: the probability that it occurs among all verbs when used as a verb, and the probability that it occurs among all nouns when used as a noun.
For example, word segmentation of the sentence "I love China" yields three words: "I", "love", and "China". Each word may take any of 39 possible parts of speech. With the Viterbi algorithm, the sentence-initial parameter of "I", namely its Pi parameter for each part of speech, is multiplied by the corresponding b parameter of "I", and the part of speech with the maximum product is taken as the part of speech of "I". That maximum is then multiplied by the a parameter for the transition to each part of speech of "love" and by the b parameter of "love", and the part of speech with the maximum product is taken as the part of speech of "love". That maximum is in turn multiplied by the a parameter and the b parameter of "China", and the part of speech with the maximum product is taken as the part of speech of "China". The maximum-probability path, that is, the part-of-speech tag sequence, is thereby obtained, giving the parts of speech of the three words "I", "love", and "China".
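The following sketch shows a textbook Viterbi decoder over the Pi, a, and b parameters. It differs slightly from the word-by-word description above in that it keeps the best score for every part of speech at each position and recovers the tag sequence by backtracking at the end. The three tags and all probability values are invented for illustration and are not taken from this embodiment.

```python
from typing import Dict, List


def viterbi(
    words: List[str],
    tags: List[str],
    pi: Dict[str, float],
    a: Dict[str, Dict[str, float]],
    b: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most probable tag sequence for `words` under the HMM (pi, a, b)."""
    # score[t]: probability of the best tag path that ends in tag t at the current word.
    score = {t: pi[t] * b[t].get(words[0], 0.0) for t in tags}
    back: List[Dict[str, str]] = []
    for word in words[1:]:
        prev, score, pointers = score, {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: prev[p] * a[p][t])
            score[t] = prev[best_prev] * a[best_prev][t] * b[t].get(word, 0.0)
            pointers[t] = best_prev
        back.append(pointers)
    # Backtrack from the best final tag to recover the whole sequence.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative parameters only.
TAGS = ["pron", "verb", "noun"]
PI = {"pron": 0.6, "verb": 0.1, "noun": 0.3}
A = {
    "pron": {"pron": 0.1, "verb": 0.7, "noun": 0.2},
    "verb": {"pron": 0.2, "verb": 0.1, "noun": 0.7},
    "noun": {"pron": 0.1, "verb": 0.5, "noun": 0.4},
}
B = {
    "pron": {"I": 0.5},
    "verb": {"love": 0.4},
    "noun": {"love": 0.1, "China": 0.3},
}
tag_sequence = viterbi(["I", "love", "China"], TAGS, PI, A, B)
```

With 39 parts of speech the same code applies unchanged; only the size of the three parameter tables grows.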
Step 207, the user equipment classifies the plurality of text features according to the classification parameters by using the preset mathematical model, and obtains the degree of correlation between the text features and each group of classification parameters.
In this embodiment, step 207 is implemented in a manner similar to step 103 shown in Fig. 1, and details are not repeated here.
Step 208, the user equipment stores the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation.
For example, according to the degree of correlation between the text features and each group of classification parameters, and the correspondence between classification parameters and text information topics, a link to the text information is placed under the directory of the matching text information topic, so that the user can find and browse the text information of interest quickly and at a glance.
In this embodiment, text information is collected and segmented into words; part-of-speech tagging is performed on the words to obtain words of multiple parts of speech, from which a plurality of nouns and verbs are extracted as text features; the text features are then classified, according to the received classification parameters, by using the preset mathematical model, to obtain the degree of correlation between the text features and each group of classification parameters; finally, the text information is stored under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user. In addition, because the network device trains the mathematical model and obtains the classification parameters, the user equipment is spared a large amount of training time when performing classification. Because the data volume of the classification parameters is very small, the load on the user equipment in obtaining them is also relatively small. Moreover, because the classification parameters do not need to be obtained in real time during classification, convenience for the user is improved still further.
Fig. 3 is a flowchart of another embodiment of the text classification processing method of the present invention. As shown in Fig. 3, the method of this embodiment comprises:
Step 301, the network device receives a classification parameter acquisition request sent by the user equipment, where the request carries a text information topic and a mathematical model identifier.
Step 302, the network device obtains the classification parameters corresponding to the text information topic and the mathematical model identifier according to a preset correspondence among text information topics, mathematical model identifiers, and classification parameters.
Step 303, the network device sends the corresponding classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, the plurality of text features obtained after preprocessing.
In this embodiment, the user equipment can send the text information topic entered by the user and the identifier of the mathematical model set in the user equipment to the network device. According to the correspondence, stored in advance, among text information topics, mathematical model identifiers, and classification parameters, the network device obtains the classification parameters corresponding to the text information topic and the mathematical model identifier and sends them to the user equipment. The user equipment can then carry out the technical solution of the method embodiment shown in Fig. 1 or Fig. 2; the principle is similar and is not repeated here.
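One way to picture the stored correspondence is a table keyed by the pair of text information topic and mathematical model identifier. The sketch below is a hypothetical illustration: the table contents, the identifier strings, and the function name are assumptions, and a real network device would receive the request over whatever protocol it uses.

```python
from typing import Dict, List, Sequence, Tuple

# (topic, model identifier) -> classification parameters; all values are made up.
PARAMETER_TABLE: Dict[Tuple[str, str], List[float]] = {
    ("football", "neural_network"): [0.12, 0.80, 0.33],
    ("automobile", "neural_network"): [0.71, 0.05, 0.44],
}


def handle_acquisition_request(
    topics: Sequence[str],
    model_id: str,
    table: Dict[Tuple[str, str], List[float]] = PARAMETER_TABLE,
) -> Dict[str, List[float]]:
    """Return the stored parameters for each requested topic that has an entry."""
    return {t: table[(t, model_id)] for t in topics if (t, model_id) in table}


# A request for two topics, only one of which has trained parameters.
response = handle_acquisition_request(["football", "news"], "neural_network")
```

Keying on both values matters because parameters trained for one type of mathematical model are meaningless to another, even for the same topic.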
Specifically, corresponding to the mathematical model used by the user equipment, the network device can build multiple mathematical models, for example, a decision tree, a K-nearest-neighbors (KNN) algorithm, a Bayesian network, a neural network, Boosting, or a support vector machine.
In this embodiment, a classification parameter acquisition request carrying a text information topic and a mathematical model identifier is received from the user equipment; the classification parameters corresponding to the text information topic and the mathematical model identifier are obtained according to the preset correspondence among text information topics, mathematical model identifiers, and classification parameters; and the classification parameters are sent to the user equipment, so that the user equipment classifies, according to the classification parameters, the plurality of text features obtained after preprocessing. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user.
Further, in another embodiment of the present invention, the correspondence among text information topics, mathematical model identifiers, and classification parameters used in step 302 may be obtained as follows:
obtaining feature words that represent a text information topic;
training a preset mathematical model according to the feature words to obtain the classification parameters corresponding to the mathematical model; and
saving the correspondence among the text information topic, the mathematical model identifier, and the classification parameters.
In this embodiment, the technical solution by which the network device obtains the classification parameters is described in detail, taking as an example a network device that builds a neural network as the mathematical model.
The neural network may comprise an input layer, a hidden layer, and an output layer. The input layer is the first layer and may have 150 input nodes, corresponding to 150 feature words of the text information. The hidden layer is the layer below the input layer; it performs the classification and recognition function and may have 8 nodes. The output layer is the layer below the hidden layer; it produces the classification decision and may have 1 node. Layers are fully connected: each node of one layer is connected to every node of the next layer. Each connection carries a weight, which is used to record the corresponding error for error back-propagation during subsequent network training.
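The 150-8-1 structure can be sketched as follows. The sigmoid activation and the absence of bias terms are assumptions, since this embodiment does not name an activation function; the function names are likewise hypothetical. The weight count illustrates how few values make up the classification parameters for one topic.

```python
import math
import random
from typing import List, Sequence, Tuple

N_INPUT, N_HIDDEN, N_OUTPUT = 150, 8, 1


def sigmoid(x: float) -> float:
    """Assumed activation function; the embodiment does not specify one."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Give every connection a random weight between 0 and 1."""
    rng = random.Random(seed)
    w_hidden = [[rng.random() for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    w_out = [rng.random() for _ in range(N_HIDDEN)]
    return w_hidden, w_out


def forward(inputs: Sequence[float], w_hidden: List[List[float]], w_out: List[float]) -> float:
    """Propagate 150 input values through the hidden layer to the single output node."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def count_weights(w_hidden: List[List[float]], w_out: List[float]) -> int:
    """Number of connection weights, i.e. the size of one group of classification parameters."""
    return sum(len(row) for row in w_hidden) + len(w_out)


w_hidden, w_out = init_weights()
total = count_weights(w_hidden, w_out)  # 150 * 8 + 8 * 1 = 1208
```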
Specifically, the neural network may be trained and the classification parameters obtained as follows:
First, each connection weight of the neural network is randomly assigned a value between 0 and 1. Candidate feature words entered by the administrator are received; these candidate feature words are words that represent the text information topic. The input values corresponding to the candidate feature words are fed to the input nodes of the neural network for training.
Then, the neural network is trained by the error back-propagation method, that is, the difference between the actual output and the desired output is used to correct the connection weights of each layer of the neural network, layer by layer from back to front. When the cumulative error rate of the whole neural network falls below a preset threshold, training stops.
Here, the input value corresponding to a candidate feature word = (number of articles in the training set in which the candidate feature word occurs / total number of articles in the training set) × number of times the candidate feature word occurs in the training set.
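As a worked illustration of the input value, the sketch below computes it for a tiny made-up training set in which each article is already a list of words. The function name and the sample data are assumptions; returning 0 for an empty training set is also an assumption added to avoid a division by zero.

```python
from typing import List


def input_value(word: str, articles: List[List[str]]) -> float:
    """(articles containing the word / total articles) * total occurrences of the word."""
    total_articles = len(articles)
    if total_articles == 0:
        return 0.0
    containing = sum(1 for article in articles if word in article)
    occurrences = sum(article.count(word) for article in articles)
    return containing / total_articles * occurrences


# Four toy articles: "goal" appears in 2 of them, 3 times in total.
training_set = [["goal", "team", "goal"], ["team"], ["car"], ["goal"]]
value = input_value("goal", training_set)  # 2 / 4 * 3 = 1.5
```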
For example, to classify training sets for the four text information topics of news, entertainment, health, and family, four neural networks corresponding to these topics are first set up on the network device. Then 100 training-set articles are collected for each of the four topics, and the neural network corresponding to each topic is trained. Taking the training of the neural network for the news topic as an example, its output should be close to 1 for news articles and close to 0 for articles of the other three topics. When the 400 training articles are processed, suppose that after 10 rounds of training the average output for news articles is 0.95 and the average output for articles of each of the other three topics is 0.05. Training then stops and is considered successful, and the correspondence among the news topic, the identifier of the neural network mathematical model, and the classification parameters at that moment is saved. These classification parameters are the connection weights between the nodes of the successfully trained neural network.
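The training loop with its stopping rule can be sketched as follows for a single topic. This is an assumption-laden illustration rather than the training procedure of this embodiment: it uses a sigmoid activation, no bias terms, a fixed learning rate, and the mean absolute error per round as a stand-in for the cumulative error rate. All names, default values, and the toy samples are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input values, desired output: 1 for the topic, else 0)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(x: Sequence[float], w_hidden: List[List[float]], w_out: List[float]):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return hidden, sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def train(samples: List[Sample], n_in: int, n_hidden: int = 8, rate: float = 0.5,
          threshold: float = 0.05, max_epochs: int = 1000, seed: int = 0):
    """Train by error back-propagation; stop once the mean error drops below `threshold`."""
    rng = random.Random(seed)
    w_hidden = [[rng.random() for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.random() for _ in range(n_hidden)]
    history: List[float] = []
    for _ in range(max_epochs):
        total_error = 0.0
        for x, target in samples:
            hidden, y = forward(x, w_hidden, w_out)
            total_error += abs(target - y)
            # Output layer first: error signal from desired minus actual output.
            delta_out = (target - y) * y * (1.0 - y)
            # Then the hidden layer, computed from the output weights before they change.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] += rate * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] += rate * delta_hidden[j] * x[i]
        history.append(total_error / len(samples))
        if history[-1] < threshold:
            break
    return w_hidden, w_out, history


# Toy data: the first sample belongs to the topic, the second does not.
toy_samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
w_hidden, w_out, history = train(toy_samples, n_in=2, n_hidden=3, max_epochs=200)
```

The returned weights correspond to the classification parameters that would be saved for the topic, and `history` records the mean error after each round so that the stopping condition can be inspected.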
In this embodiment, because the network device trains the mathematical model and obtains the classification parameters, the user equipment is spared a large amount of training time when performing classification. Because the data volume of the classification parameters is very small, the load on the user equipment in obtaining them is also relatively small. In addition, because the classification parameters do not need to be obtained in real time during classification, convenience for the user is improved still further.
Fig. 4 is a schematic structural diagram of an embodiment of the user equipment of the present invention. As shown in Fig. 4, the user equipment of this embodiment comprises a classification parameter receiving module 11, a preprocessing module 12, a correlation obtaining module 13, and a classification processing module 14. The classification parameter receiving module 11 is configured to receive the classification parameters sent by the network device. The preprocessing module 12 is configured to perform text preprocessing on the collected text information to obtain a plurality of text features. The correlation obtaining module 13 is configured to classify the plurality of text features according to the classification parameters by using a preset mathematical model, and obtain the degree of correlation between the text features and each group of classification parameters. The classification processing module 14 is configured to store the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation.
The user equipment of this embodiment can carry out the technical solution of the method embodiment shown in Fig. 1; the principle is similar and is not repeated here.
In this embodiment, text information is collected and preprocessed to obtain a plurality of text features; the text features are then classified, according to the classification parameters received from the network device, by using the preset mathematical model, to obtain the degree of correlation between the text features and each group of classification parameters; finally, the text information is stored under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user.
Fig. 5 is a schematic structural diagram of another embodiment of the user equipment of the present invention. As shown in Fig. 5, on the basis of the embodiment shown in Fig. 4, the user equipment of this embodiment further comprises a user request receiving module 15 and a classification parameter acquisition request sending module 16. The user request receiving module 15 is configured to receive a user request, where the user request includes a text information topic. The classification parameter acquisition request sending module 16 is configured to send, to the network device, a classification parameter acquisition request carrying the text information topic and the identifier of the preset mathematical model.
Further, the preprocessing module 12 may specifically comprise a word segmentation processing unit 121, a part-of-speech tagging processing unit 122, and a text feature extraction unit 123. The word segmentation processing unit 121 is configured to perform word segmentation on the collected text information to obtain a plurality of words. The part-of-speech tagging processing unit 122 is configured to perform part-of-speech tagging on the words to obtain words of multiple parts of speech. The text feature extraction unit 123 is configured to extract, from the words of multiple parts of speech, the nouns and verbs that occur most frequently, in a number corresponding to the number of input nodes of the mathematical model, to form the text features.
The user equipment of this embodiment can carry out the technical solution of the method embodiment shown in Fig. 2; the principle is similar and is not repeated here.
In this embodiment, text information is collected and segmented into words; part-of-speech tagging is performed on the words to obtain words of multiple parts of speech, from which a plurality of nouns and verbs are extracted as text features; the text features are then classified, according to the received classification parameters, by using the preset mathematical model, to obtain the degree of correlation between the text features and each group of classification parameters; finally, the text information is stored under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user. In addition, because the network device trains the mathematical model and obtains the classification parameters, the user equipment is spared a large amount of training time when performing classification. Because the data volume of the classification parameters is very small, the load on the user equipment in obtaining them is also relatively small. Moreover, because the classification parameters do not need to be obtained in real time during classification, convenience for the user is improved still further.
Further, the user equipment of the foregoing embodiments may be a mobile phone, a human-machine interaction terminal, an e-book reader, or another user equipment with a display function. Where the user equipment is a mobile phone, the mobile phone further comprises a radio frequency circuit, an audio circuit, and a power supply to provide the basic functions of the mobile phone. The radio frequency circuit, microphone, loudspeaker, and power supply are briefly described below. The radio frequency circuit is mainly used to establish communication between the mobile phone and a wireless network, so that the mobile phone can receive data from and send data to the wireless network. The microphone is used to capture sound and convert it into audio data, so that the mobile phone can send the audio data to the wireless network through the radio frequency circuit. The loudspeaker is used to convert audio data that the mobile phone receives from the wireless network through the radio frequency circuit back into sound and play it to the user. The power supply is mainly used to power the circuits and components of the mobile phone and ensure its normal operation.
Fig. 6 is a schematic structural diagram of an embodiment of the network device of the present invention. As shown in Fig. 6, the network device of this embodiment comprises a classification parameter acquisition request receiving module 21, a classification parameter obtaining module 22, and a classification parameter sending module 23. The classification parameter acquisition request receiving module 21 is configured to receive a classification parameter acquisition request sent by the user equipment, where the request carries a text information topic and a mathematical model identifier. The classification parameter obtaining module 22 is configured to obtain the classification parameters corresponding to the text information topic and the mathematical model identifier according to the preset correspondence among text information topics, mathematical model identifiers, and classification parameters. The classification parameter sending module 23 is configured to send the classification parameters to the user equipment, so that the user equipment classifies, according to the classification parameters, the plurality of text features obtained after preprocessing.
The network device of this embodiment can carry out the technical solution of the method embodiment shown in Fig. 3; the principle is similar and is not repeated here.
In this embodiment, a classification parameter acquisition request carrying a text information topic and a mathematical model identifier is received from the user equipment; the classification parameters corresponding to the text information topic and the mathematical model identifier are obtained according to the preset correspondence among text information topics, mathematical model identifiers, and classification parameters; and the classification parameters are sent to the user equipment, so that the user equipment classifies, according to the classification parameters, the plurality of text features obtained after preprocessing and obtains the degree of correlation between the text features and each group of classification parameters. Finally, the user equipment stores the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user.
Further, in another embodiment of the present invention, the network device may also comprise a feature word obtaining module, a training processing module, and a saving module. The feature word obtaining module is configured to obtain feature words that represent a text information topic. The training processing module is configured to train a preset mathematical model according to the feature words to obtain the classification parameters corresponding to the mathematical model. The saving module is configured to save the correspondence among the text information topic, the mathematical model identifier, and the classification parameters.
In this embodiment, because the network device trains the mathematical model and obtains the classification parameters, the user equipment is spared a large amount of training time when performing classification. Because the data volume of the classification parameters is very small, the load on the user equipment in obtaining them is also relatively small. In addition, because the classification parameters do not need to be obtained in real time during classification, convenience for the user is improved still further.
Fig. 7 is a schematic structural diagram of an embodiment of the text classification processing system of the present invention. As shown in Fig. 7, the system of this embodiment comprises a network device 31 and a user equipment 32. The network device 31 can carry out the technical solution of the method embodiment shown in Fig. 3, and the user equipment 32 can carry out the technical solution of the method embodiment shown in Fig. 1 or Fig. 2; the principle is similar and is not repeated here.
In this embodiment, because the text classification processing system uses the network device to train the mathematical model and obtain the classification parameters, the user equipment is spared a large amount of training time when performing classification, and because the data volume of the classification parameters is very small, the load on the user equipment in obtaining them is also relatively small. In addition, the user equipment classifies the text features obtained after preprocessing, according to the classification parameters, by using the preset mathematical model, obtains the degree of correlation between the text features and each group of classification parameters, and finally stores the text information under the text information topic corresponding to the group of classification parameters with the highest degree of correlation. Classification of text information is thereby performed on the user equipment, which effectively improves convenience for the user.
A person of ordinary skill in the art will appreciate that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.