CN108090040A - Text information classification method and system - Google Patents

Info

Publication number
CN108090040A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611044117.5A
Other languages
Chinese (zh)
Other versions
CN108090040B (en)
Inventor
郭秦龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201611044117.5A
Publication of CN108090040A
Application granted
Publication of CN108090040B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

An embodiment of the present invention discloses a text information classification method and system for improving the accuracy of text sentiment classification. The method of the embodiment includes: obtaining text information; obtaining a first segmentation, which is produced by segmenting the text information according to a first preset rule; feeding the first segmentation into a preset sentiment score counter to calculate a first score; obtaining a second segmentation, which is produced by segmenting the text information according to a second preset rule; feeding the second segmentation into a preset training model to calculate a second score; when the language environment of the text information is determined according to a preset text rule, assigning weights to the first score and the second score using preset integration logic; deriving a composite score for the text information from the weights assigned by the preset integration logic; and deriving the classification result of the text information from the composite score.

Description

Text information classification method and system
Technical field
The present invention relates to the field of text information classification, and in particular to a text information classification method and system.
Background
Sentiment classification is a typical problem in the field of natural language processing (NLP). The problem is stated as follows: given a passage of text (which may be a sentence or an article), determine whether the sentiment it expresses is positive, negative, or neutral.
Sentiment classification is a topic that has been studied widely and in depth in both academia and industry. Using a sentiment dictionary is one way to solve it. Scores are set manually for certain sentiment words, such as positive sentiment words and negative sentiment words. For an input text, the sentiment classification is determined from the proportion of positive and negative sentiment words it contains.
The classification performance of the prior art depends heavily on the quality of the sentiment dictionary. The dictionary's quality may be poor, for example because some words are misclassified or because some words have an ambiguous sentiment. Take "unexpected": in the home appliance field it generally means that an appliance has developed an unforeseen problem, but in the film field it generally means that the plot is engaging.
The prior art uses a single sentiment classification algorithm and cannot score flexibly according to the specific field, so the accuracy of sentiment classification is low.
Summary of the invention
Embodiments of the present invention provide a text information classification method and system for improving the accuracy of text sentiment classification.
A first aspect of the embodiments of the present invention provides a text information classification method, which specifically includes:
obtaining text information;
obtaining a first segmentation, where the first segmentation is produced by segmenting the text information according to a first preset rule;
feeding the first segmentation into a preset sentiment score counter to calculate a first score;
obtaining a second segmentation, where the second segmentation is produced by segmenting the text information according to a second preset rule;
feeding the second segmentation into a preset training model to calculate a second score;
assigning weights to the first score and the second score using preset integration logic;
deriving a composite score for the text information from the weights assigned by the preset integration logic;
deriving the classification result of the text information from the composite score.
A second aspect of the embodiments of the present invention provides a text classification system, which specifically includes:
a first acquisition unit, for obtaining text information;
a second acquisition unit, for obtaining a first segmentation, where the first segmentation is produced by segmenting, according to a first preset rule, the text information obtained by the first acquisition unit;
a first input unit, for feeding the first segmentation obtained by the second acquisition unit into a preset sentiment score counter to calculate a first score;
a third acquisition unit, for obtaining a second segmentation, where the second segmentation is produced by segmenting, according to a second preset rule, the text information obtained by the first acquisition unit;
a second input unit, for feeding the second segmentation into a preset training model to calculate a second score;
a first allocation unit, for assigning weights to the first score and the second score using preset integration logic;
a calculation unit, for deriving a composite score for the text information from the weights assigned by the integration logic;
a processing unit, for deriving the classification result of the text information from the composite score produced by the calculation unit.
A third aspect of the embodiments of the present invention provides a terminal, which specifically includes:
an input device, an output device, a processor, and a memory.
The input device performs the following steps:
obtaining text information;
obtaining a first segmentation, where the first segmentation is produced by segmenting the text information according to a first preset rule;
obtaining a second segmentation, where the second segmentation is produced by segmenting the text information according to a second preset rule.
The processor, by calling operation instructions stored in the memory, performs the following steps:
feeding the first segmentation into a preset sentiment score counter to calculate a first score;
feeding the second segmentation into a preset training model to calculate a second score;
assigning weights to the first score and the second score using preset integration logic;
deriving a composite score for the text information from the weights assigned by the preset integration logic;
deriving the classification result of the text information from the composite score.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In an embodiment of the present invention, text information is obtained first. The text is segmented to obtain a first segmentation, which is fed into a preset sentiment score counter to calculate a first score. The text is segmented to obtain a second segmentation, which is fed into a preset training model to calculate a second score. Preset integration logic assigns weights to the first score and the second score, a composite score for the text information is derived from those weights, and the classification result of the text information is derived from the composite score. The embodiment uses a serialized sentiment classification method that weights the scores produced by different algorithms according to the language environment, which improves the accuracy of text classification.
Brief description of the drawings
Fig. 1 is a schematic diagram of a network architecture in an embodiment of the present invention;
Fig. 2 is a schematic diagram of one embodiment of the text information classification method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of another embodiment of the text information classification method in an embodiment of the present invention;
Fig. 4 is a schematic diagram of one embodiment of the system in an embodiment of the present invention;
Fig. 5 is a schematic diagram of another embodiment of the system in an embodiment of the present invention;
Fig. 6 is a schematic diagram of another embodiment of the system in an embodiment of the present invention.
Detailed description
Embodiments of the present invention provide a text information classification method and system for improving the accuracy of text sentiment classification.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The terms "first", "second", "third", "fourth", and so on (if present) in the description, claims, and drawings of the present invention are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiments of the present invention can be applied to the network architecture shown in Fig. 1. In this architecture, a user can use user equipment (such as a personal computer, laptop, tablet computer, or mobile phone) to obtain the text to be classified from a storage device or the like. The text classification system on the user equipment then analyses the text that needs sentiment classification and obtains the analysis result.
In an embodiment of the present invention, the text information to be classified is obtained first. A sentiment dictionary algorithm is then used to obtain a first score for the text information, and a machine learning based algorithm is used to obtain a second score. When the language environment of the text information is determined according to a preset text rule, integration logic, which is derived from the language environment, is used to assign weights to the first score and the second score. Finally, the classification result of the text information is derived from the weights assigned by the integration logic. The embodiment uses a serialized sentiment classification method that weights the scores produced by different algorithms according to the language environment, which improves the accuracy of text classification.
Referring to Fig. 2, one embodiment of the text information classification method in an embodiment of the present invention includes:
201. Obtain text information.
In this embodiment, before sentiment classification can be performed, the text information to be classified must first be obtained.
It should be noted that the system may obtain the text information over the Internet or by other means, for example from a storage device. The specific manner of acquisition is not limited here.
202. Obtain a first segmentation.
In this embodiment, when the system has obtained the text information that needs sentiment analysis, it segments the text information according to a first preset rule. The first preset rule is a rule for dividing the text into words and/or sentences, and the first segmentation is a set of segments that includes all sub-segments of the text information.
It should be noted that the first segmentation includes words and sentences.
203. Feed the first segmentation into the preset sentiment score counter to calculate a first score.
In this embodiment, the system stores a sentiment score counter. After the system obtains the first segmentation, it feeds the first segmentation into the preset sentiment score counter, which calculates the first score.
204. Obtain a second segmentation.
In this embodiment, after the system has fed the first segmentation into the preset sentiment score counter and calculated the first score, it screens the first segmentation according to a second preset rule. Under the second preset rule, all first sub-segments in the first segmentation are compared with the words in the preset sentiment dictionary, the first sub-segments that are present in the dictionary are removed, and the set of first sub-segments that remain is taken as the second segmentation.
It should be noted that the second segmentation includes words and sentences.
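The screening in step 204 can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `second_segmentation`, the sample dictionary, and the sample segments are hypothetical and do not come from the patent.

```python
def second_segmentation(first_segmentation, sentiment_dictionary):
    """Keep only the sub-segments that are absent from the sentiment dictionary."""
    return [seg for seg in first_segmentation if seg not in sentiment_dictionary]


# Hypothetical dictionary (word -> score) and first segmentation, for illustration only.
dictionary = {"very good": 3, "happy": 4}
first = ["exam", "very good", "happy", "indescribable"]

second = second_segmentation(first, dictionary)
print(second)
```

Keeping the result as a list preserves the order of the sub-segments and any repeats, which matters if a later scorer sums one score per occurrence.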
205. Feed the second segmentation into the preset training model to calculate a second score.
In this embodiment, after the system obtains the second segmentation, it feeds the second segmentation into the preset training model to calculate the second score. The preset training model holds a correspondence between preset score vectors and scores.
206. Assign weights to the first score and the second score using preset integration logic.
In this embodiment, once the first score and the second score have been obtained, the system uses the integration logic to assign weights to them. The preset integration logic is a rule that is set according to the language environment of the text, where the language environment is judged from special words in the text.
207. Derive the composite score of the text information from the weights assigned by the preset integration logic.
In this embodiment, after the system has used the integration logic to assign weights to the first score and the second score, it derives the composite score of the text information from those weights.
Here, composite score = first score × first weight + second score × second weight, where the first weight is the weight that the integration logic assigns to the first score and the second weight is the weight it assigns to the second score. The weights sum to 1. In general, the weight of the first score is higher than the weight of the second score.
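The composite score formula can be written as a small function. The sketch below is illustrative only; the function name `composite_score` and the decision to raise an error when the weights do not sum to 1 are assumptions, since the text states the constraint but not how it is enforced.

```python
import math


def composite_score(first_score, second_score, first_weight, second_weight):
    """Weighted sum of the two scores; the weights are required to sum to 1."""
    if not math.isclose(first_weight + second_weight, 1.0):
        raise ValueError("weights must sum to 1")
    return first_score * first_weight + second_score * second_weight


# Example values chosen for illustration: scores 7 and -2, weights 0.7 and 0.3.
print(round(composite_score(7, -2, 0.7, 0.3), 1))
```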
208. Derive the classification result of the text information from the composite score.
In this embodiment, after the composite score of the text information has been derived from the weights assigned by the preset integration logic, the classification result of the text information is derived from the composite score.
In this embodiment of the present invention, text information is obtained first. The text is segmented to obtain a first segmentation, which is fed into a preset sentiment score counter to calculate a first score. The text is segmented to obtain a second segmentation, which is fed into a preset training model to calculate a second score. When the language environment of the text information is determined according to a preset text rule, preset integration logic assigns weights to the first score and the second score, a composite score for the text information is derived from those weights, and the classification result of the text information is derived from the composite score. The embodiment uses a serialized sentiment classification method that weights the scores produced by different algorithms according to the language environment, which improves the accuracy of text classification.
Referring to Fig. 3, another embodiment of the text information classification method in an embodiment of the present invention includes:
301. Obtain text information.
In this embodiment, before sentiment classification can be performed, the text information to be classified must first be obtained.
It should be noted that the system may obtain the text information over the Internet or by other means, for example from a storage device. The specific manner of acquisition is not limited here.
302. Obtain a first segmentation.
In this embodiment, when the system has obtained the text information that needs sentiment analysis, it segments the text information according to a first preset rule. The first preset rule is a rule for dividing the text into words and/or sentences, and the first segmentation is a set of segments that includes all sub-segments of the text information.
It should be noted that the first segmentation includes words and sentences.
303. Feed the first segmentation into the preset sentiment score counter to calculate a first score.
In this embodiment, the system stores a sentiment score counter that contains a preset sentiment dictionary. The dictionary stores a large number of words, each with a uniquely corresponding score. The system compares the first sub-segments in the obtained first segmentation with the words in the preset sentiment dictionary. Whenever a sub-segment matches a word or sentence in the dictionary, the counter adds the score that uniquely corresponds to that word or sentence. Adding up the scores of all matched segments gives the first score.
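The dictionary lookup in step 303 amounts to summing the scores of the sub-segments found in the dictionary. A minimal sketch follows; the function name `first_score` and the sample dictionary entries are hypothetical, chosen only to illustrate the lookup.

```python
def first_score(first_segmentation, sentiment_dictionary):
    """Sum the dictionary scores of every sub-segment found in the dictionary.

    Sub-segments that are not in the dictionary contribute nothing.
    """
    return sum(sentiment_dictionary.get(seg, 0) for seg in first_segmentation)


# Hypothetical dictionary entries and segmentation, for illustration only.
dictionary = {"very good": 3, "happy": 4, "boring": -2}
segments = ["exam", "very good", "happy", "indescribable"]

print(first_score(segments, dictionary))
```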
304. Obtain a second segmentation.
In this embodiment, after the system has fed the first segmentation into the preset sentiment score counter and calculated the first score, it screens the first segmentation according to a second preset rule. Under the second preset rule, all first sub-segments in the first segmentation are compared with the words in the preset sentiment dictionary, the first sub-segments that are present in the dictionary are removed, and the set of first sub-segments that remain is taken as the second segmentation.
It should be noted that the second segmentation includes words and sentences.
305. Feed the second segmentation into the preset training model to calculate a second score.
In this embodiment, after the system obtains the second segmentation, the preset training model converts each second sub-segment in the second segmentation into a numerical vector. For each second sub-segment, the preset score vector closest to its numerical vector is then looked up in the preset score vector database of the preset training model, and the score corresponding to that closest preset score vector is taken as the score of the second sub-segment. Finally, the scores of all second sub-segments are added to give the second score. The preset score vector database stores the correspondence between preset score vectors and scores.
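The nearest-vector lookup in step 305 can be sketched as follows. The patent does not say how segments are converted to vectors or which distance is used, so this sketch assumes Euclidean distance and a toy two-dimensional lookup table; `nearest_score`, `second_score`, `toy_vectors`, and `preset_vectors` are all hypothetical names and values.

```python
import math


def nearest_score(vector, preset_vectors):
    """Return the score of the preset score vector closest to `vector`.

    `preset_vectors` is a list of (vector, score) pairs. Euclidean distance
    is an assumption; the source does not name a distance measure.
    """
    _, score = min(preset_vectors, key=lambda pair: math.dist(vector, pair[0]))
    return score


def second_score(second_segmentation, to_vector, preset_vectors):
    """Add up the nearest-preset-vector score of every second sub-segment."""
    return sum(nearest_score(to_vector(seg), preset_vectors)
               for seg in second_segmentation)


# Toy data, for illustration only.
toy_vectors = {"exam": (0.1, 0.0), "indescribable": (-0.9, 0.2)}
preset_vectors = [((0.0, 0.0), 0), ((-1.0, 0.0), -2), ((1.0, 0.0), 2)]

print(second_score(["exam", "indescribable"], toy_vectors.__getitem__, preset_vectors))
```

A linear scan is enough for a sketch; a large preset score vector database would normally call for an indexed nearest-neighbour search.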
306. Obtain a third score for the text information using a sentiment classification method.
In this embodiment, the system supports extension. If a new, suitable algorithm is found in future as business scenarios evolve, it can be added through the custom algorithm function as a submodule of the present algorithm, and that sentiment classification method is then used to obtain the third score of the text information.
It should be noted that several sentiment classification methods may be added later; this is not limited here.
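One way to picture the extension point in step 306 is a registry to which additional scoring algorithms are added by name. The sketch below is an assumption about how such a mechanism could look, not the patent's implementation; `ScorerRegistry` and the three constant scorers are hypothetical.

```python
from typing import Callable, Dict

Scorer = Callable[[str], float]


class ScorerRegistry:
    """Holds named scoring algorithms so that new ones can be added later."""

    def __init__(self) -> None:
        self._scorers: Dict[str, Scorer] = {}

    def register(self, name: str, scorer: Scorer) -> None:
        if name in self._scorers:
            raise ValueError(f"scorer {name!r} is already registered")
        self._scorers[name] = scorer

    def score_all(self, text: str) -> Dict[str, float]:
        """Run every registered scorer on the text, in registration order."""
        return {name: scorer(text) for name, scorer in self._scorers.items()}


# Placeholder scorers that return fixed values, for illustration only.
registry = ScorerRegistry()
registry.register("dictionary", lambda text: 7.0)
registry.register("model", lambda text: -2.0)
registry.register("third", lambda text: 1.0)

print(registry.score_all("some text"))
```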
307. Assign weights to the first score, the second score, and the third score using preset integration logic.
In this embodiment, after the first score, the second score, and the third score have been obtained, the system uses the integration logic to assign weights to them. The preset integration logic is a rule that is set according to the language environment of the text, where the language environment is judged from special words in the text.
308. Derive the composite score of the text information from the weights assigned by the preset integration logic.
In this embodiment, after the integration logic has assigned weights to the first score, the second score, and the third score, the composite score of the text information is derived from those weights.
Here, composite score = first score × first weight + second score × second weight + third score × third weight, where the first weight is the weight that the integration logic assigns to the first score, the second weight is the weight it assigns to the second score, and the third weight is the weight it assigns to the third score. The weights sum to 1. In general, the weight of the first score is higher than the weight of the second score.
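Steps 307 and 308 can be sketched together: special words select a language environment, the environment selects a weight tuple, and the weighted sum gives the composite score. Everything named here is an assumption made for illustration, including `ENVIRONMENT_TEMPLATES`, its trigger words, and the three-way weight values, none of which appear in the patent.

```python
import math

# Hypothetical templates: trigger words and (first, second, third) weights.
ENVIRONMENT_TEMPLATES = {
    "film": {"triggers": {"film", "plot"}, "weights": (0.2, 0.6, 0.2)},
    "narrative": {"triggers": {"exam"}, "weights": (0.6, 0.2, 0.2)},
}


def detect_environment(segments):
    """Return the first environment whose trigger words occur in the segments."""
    for name, template in ENVIRONMENT_TEMPLATES.items():
        if template["triggers"] & set(segments):
            return name
    return None  # the language environment could not be determined


def combine_scores(scores, weights):
    """Weighted sum of any number of scores; the weights must sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per score")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(score * weight for score, weight in zip(scores, weights))


environment = detect_environment(["friend", "boring", "film", "unexpected"])
weights = ENVIRONMENT_TEMPLATES[environment]["weights"]
print(environment, round(combine_scores((-2, 2, 1), weights), 2))
```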
309. Derive the classification result of the text information from the composite score.
In this embodiment of the present invention, the preset score threshold range in which the composite score falls is determined first, and the classification result of the text information is then derived from that determination.
Here, positive, negative, and neutral sentiment each correspond to a preset score threshold range. For example, the preset score threshold range may be (-100, -2) for negative sentiment, [-2, 2] for neutral sentiment, and (2, 100) for positive sentiment. These ranges can be adjusted to the specific situation, and the ranges for each sentiment may take other values; this is not limited here.
It should be noted that the system may determine which preset score threshold range the composite score falls in and derive the classification result from that determination, or it may obtain the classification result by another method, for example by determining which preset sentiment score the composite score is closest to. Which method is applied is not limited here.
The preset sentiment scores may be, for example, 2 points for positive sentiment, 0 points for neutral sentiment, and -2 points for negative sentiment. The specific values can be adjusted to the practical application and are not limited here.
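Both determination methods described for step 309 are short enough to sketch. The function names are hypothetical, the sketch ignores the outer bounds of ±100, and how a tie between two preset sentiment scores is resolved is not specified in the text, so the nearest-score function simply returns whichever label `min` reaches first.

```python
def classify_by_range(score, threshold=2.0):
    """Negative below -threshold, positive above +threshold, neutral otherwise."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"


# Example preset sentiment scores from the text: 2, 0, and -2 points.
PRESET_SENTIMENT_SCORES = {"positive": 2.0, "neutral": 0.0, "negative": -2.0}


def classify_by_nearest(score, presets=PRESET_SENTIMENT_SCORES):
    """Return the label whose preset sentiment score is closest to `score`."""
    return min(presets, key=lambda label: abs(score - presets[label]))


print(classify_by_range(4.3), classify_by_nearest(0.4))
```

The two methods can disagree near the boundaries: a composite score of 1.2 is neutral under the example ranges but closest to the positive preset score.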
It should be noted that when the language environment of the text information cannot be determined according to the preset text rule, custom logic can be used to assign each weight. The custom logic is logic that the user inputs through a parameter configuration port.
For example, if most sentiment texts in a business scenario are positive and only a few are negative, custom logic can be input under which a text is considered negative only when all classifiers judge it negative, in order to improve the classification performance.
It should be noted that when the language environment of the text information cannot be determined according to the preset text rule, besides assigning each weight according to the custom logic input by the user, other allocation methods exist, for example configuring the weights as an equal allocation. Which allocation method is used is not limited here.
Here, equal allocation means assigning a weight of 0.5 to the first score and a weight of 0.5 to the second score, so that final score = first score × first weight + second score × second weight, that is, final score = first score × 0.5 + second score × 0.5. The weights sum to 1. If there are more than two weights, each weight = 1 ÷ number of weights.
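The fallback behaviour can be sketched as a small selection function. The text presents custom logic and equal allocation as alternatives without ranking them, so the order used here (environment weights, then custom weights, then equal allocation) is an assumption, as are the names `equal_weights` and `choose_weights`.

```python
def equal_weights(count):
    """Each weight = 1 / number of weights."""
    if count < 1:
        raise ValueError("at least one score is required")
    return [1.0 / count] * count


def choose_weights(environment_weights, custom_weights, count):
    """Pick weights: environment-based first, then user-supplied, then equal.

    The priority between custom weights and equal allocation is an
    assumption; the source lists both without an order.
    """
    if environment_weights is not None:
        return list(environment_weights)
    if custom_weights is not None:
        return list(custom_weights)
    return equal_weights(count)


print(choose_weights(None, None, 2))
```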
In this embodiment of the present invention, text information is obtained first. The text is segmented to obtain a first segmentation, which is fed into a preset sentiment score counter to calculate a first score. The text is segmented to obtain a second segmentation, which is fed into a preset training model to calculate a second score. A sentiment classification method is used to obtain a third score for the text information. When the language environment of the text information is determined according to a preset text rule, preset integration logic assigns weights to the first score, the second score, and the third score, a composite score for the text information is derived from those weights, and the classification result of the text information is derived from the composite score. The embodiment uses a serialized sentiment classification method that weights the scores produced by different algorithms according to the language environment, which improves the accuracy of text classification.
For ease of understanding, this embodiment is described below with reference to specific application scenarios:
Scenario 1: the system obtains the text information "Xiaoli did very well in this exam, and when her father heard the news he was indescribably happy."
The system segments the text above and obtains four words: "exam", "very good", "happy", and "indescribable". These four words are looked up in the sentiment score counter, which finds "very good" and "happy" with scores of 3 points and 4 points respectively, so the first score is 3 + 4 = 7 points. The four words are then screened, leaving "exam" and "indescribable". These two words are fed into the preset training model and converted into numerical vectors. After the preset score vectors closest to the two numerical vectors have been found, the scores corresponding to those preset score vectors are obtained: the score for the vector of "indescribable" is -2 points and the score for the vector of "exam" is 0 points, so the second score is -2 points. Analysing "exam" and "very good" against the language environment template held in memory shows that the text is a narrative text. The logic for narrative text is a weight of 0.7 for the first score and 0.3 for the second score, so the composite score is 7 × 0.7 + (-2) × 0.3 = 4.3 points. The preset score threshold range is (-100, -1) for negative sentiment, [-1, 1] for neutral sentiment, and (1, 100) for positive sentiment. Because 4.3 points falls within the range for positive sentiment, the system classifies the text as positive.
Scenario 2: the system obtains the text information "A few days ago I went with a friend to see the film 《Crazy Stone》. I had expected it to be very boring, but the film really turned out to be unexpected."
The system segments the text above and obtains four words: "friend", "boring", "film", and "unexpected". These four words are looked up in the sentiment score counter, which finds "boring" with a score of -2 points, so the first score is -2 points. The four words are then screened, leaving "friend", "film", and "unexpected". These three words are fed into the preset training model and converted into numerical vectors. After the preset score vectors closest to the three numerical vectors have been found, the scores corresponding to those preset score vectors are obtained: the score for the vector of "friend" is 1 point, the score for the vector of "film" is 0 points, and the score for the vector of "unexpected" is 1 point, so the second score is 2 points. Analysing "film" and "unexpected" against the language environment template held in memory shows that the text belongs to the film field. The logic for film field text is a weight of 0.2 for the first score and 0.8 for the second score, so the composite score is -2 × 0.2 + 2 × 0.8 = 1.2 points. The preset score threshold range is (-100, -1) for negative sentiment, [-1, 1] for neutral sentiment, and (1, 100) for positive sentiment. Because 1.2 points falls within the range for positive sentiment, the system classifies the text as positive.
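The arithmetic in the two scenarios can be checked with a few lines of code. The sketch uses the per-word scores, weights, and the threshold of 1 given in the scenarios; for scenario 2 it uses a first score of -2, the value that appears in the composite calculation. The function names `composite` and `classify` are hypothetical.

```python
def composite(first_score, second_score, first_weight, second_weight):
    """Composite score = first score x first weight + second score x second weight."""
    return first_score * first_weight + second_score * second_weight


def classify(score, threshold=1.0):
    """Ranges from the scenarios: below -1 negative, [-1, 1] neutral, above 1 positive."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"


# Scenario 1: dictionary hits 3 + 4, model scores -2 + 0, narrative weights 0.7 / 0.3.
scene1 = composite(3 + 4, -2 + 0, 0.7, 0.3)
# Scenario 2: dictionary hit -2, model scores 1 + 0 + 1, film-field weights 0.2 / 0.8.
scene2 = composite(-2, 1 + 0 + 1, 0.2, 0.8)

print(round(scene1, 1), classify(scene1))  # 4.3 positive
print(round(scene2, 1), classify(scene2))  # 1.2 positive
```

The results are rounded before printing because binary floating point does not represent 0.7, 0.3, 0.2, or 0.8 exactly.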
The text information classification method in the embodiments of the present invention has been described above. The system in the embodiments of the present invention is described below. Referring to Fig. 4, the system in an embodiment of the present invention includes:
A first acquisition unit 401, configured to obtain text information;
A second acquisition unit 402, configured to obtain first word segments, where the first word segments are obtained by performing word segmentation, according to a first preset rule, on the text information obtained by the first acquisition unit;
A first input unit 403, configured to input the first word segments obtained by the second acquisition unit into a preset sentiment score calculator to calculate a first score;
A third acquisition unit 404, configured to obtain second word segments, where the second word segments are obtained by performing word segmentation, according to a second preset rule, on the text information obtained by the first acquisition unit;
A second input unit 405, configured to input the second word segments into a preset trained model to calculate a second score;
A first allocation unit 406, configured to assign weights to the first score and the second score using preset integration logic when the language context of the text information is determined according to a preset text rule;
A calculation unit 407, configured to obtain a composite score of the text information according to the weights assigned by the integration logic;
A processing unit 408, configured to obtain a classification result of the text information according to the composite score obtained by the calculation unit.
In this embodiment of the present invention, the first acquisition unit 401 first obtains the text information; the second acquisition unit 402 obtains the first word segments by performing word segmentation on the text; the first input unit 403 inputs the first word segments into the preset sentiment score calculator to calculate the first score; the third acquisition unit 404 obtains the second word segments by performing word segmentation on the text; the second input unit 405 inputs the second word segments into the preset trained model to calculate the second score; when the language context of the text information is determined according to the preset text rule, the first allocation unit 406 assigns weights to the first score and the second score using the preset integration logic; the calculation unit 407 obtains the composite score of the text information according to the weights assigned by the integration logic; and the processing unit 408 obtains the classification result of the text information according to the composite score obtained by the calculation unit. This embodiment uses a continuous sentiment classification method and weights the scores produced by different algorithms according to the language context, which improves the accuracy of text classification.
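The data flow between units 401 to 408 can be sketched as a chain of function calls. This is a minimal sketch, not the patented implementation: the function `classify_text` and all of its parameter names are hypothetical, and each unit is modeled as a plain callable supplied by the caller.

```python
from typing import Callable, List, Tuple


def classify_text(
    text: str,                                            # obtained by unit 401
    segment_first: Callable[[str], List[str]],            # unit 402
    lexicon_score: Callable[[List[str]], float],          # unit 403
    segment_second: Callable[[str], List[str]],           # unit 404
    model_score: Callable[[List[str]], float],            # unit 405
    weights_for: Callable[[str], Tuple[float, float]],    # unit 406
    label_for: Callable[[float], str],                    # unit 408
) -> Tuple[float, str]:
    """Run the two scoring paths, weight them, and label the result."""
    first_words = segment_first(text)
    first = lexicon_score(first_words)
    second_words = segment_second(text)
    second = model_score(second_words)
    w_first, w_second = weights_for(text)
    total = w_first * first + w_second * second           # unit 407
    return total, label_for(total)
```

Passing the units in as callables keeps the sketch independent of any particular segmenter, dictionary, or trained model, none of which the description specifies.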
Referring to Fig. 5, another embodiment of the system in the embodiments of the present invention includes:
A first acquisition unit 501, configured to obtain text information;
A second acquisition unit 502, configured to obtain first word segments, where the first word segments are obtained by performing word segmentation, according to a first preset rule, on the text information obtained by the first acquisition unit;
A first input unit 503, configured to input the first word segments obtained by the second acquisition unit into a preset sentiment score calculator to calculate a first score;
The first input unit 503 includes:
A lookup subunit 5031, configured to look up whether a first sub-segment exists in a preset sentiment dictionary, where the first sub-segment is contained in the first word segments;
An extraction subunit 5032, configured to extract, when the lookup subunit finds that the first sub-segment exists, the score corresponding to the existing first sub-segment, where the preset sentiment dictionary holds the correspondence between first sub-segments and scores;
A first calculation subunit 5033, configured to calculate the first score, according to the preset sentiment score calculator, from the scores corresponding to the first sub-segments.
A third acquisition unit 504, configured to obtain second word segments, where the second word segments are obtained by performing word segmentation, according to a second preset rule, on the text information obtained by the first acquisition unit;
A second input unit 505, configured to input the second word segments into a preset trained model to calculate a second score;
The second input unit 505 includes:
A conversion subunit 5051, configured to convert a second sub-segment into a numerical vector according to the preset trained model, where the second sub-segment is contained in the second word segments;
A second calculation subunit 5052, configured to calculate the distance between the numerical vector and the preset score vectors;
A determination subunit 5053, configured to take the score corresponding to the preset score vector nearest to the numerical vector as the score of the second sub-segment;
A third calculation subunit 5054, configured to add the scores corresponding to the second sub-segments to obtain the second score.
A fourth acquisition unit 506, configured to obtain a third score of the text information using a sentiment classification method, where the sentiment classification method is a method configured according to changes in the language context.
A second allocation unit 507, configured to assign weights to the first score, the second score, and the third score using preset integration logic.
A calculation unit 508, configured to obtain a composite score of the text information according to the weights assigned by the integration logic;
A processing unit 509, configured to obtain a classification result of the text information according to the composite score obtained by the calculation unit.
The processing unit 509 includes:
A second determination subunit 5091, configured to determine the preset score threshold range in which the composite score lies and obtain a judgment result;
A processing subunit 5092, configured to obtain the text classification result according to the judgment result.
In this embodiment of the present invention, the first acquisition unit 501 first obtains the text information; the second acquisition unit 502 obtains the first word segments by performing word segmentation on the text; the first input unit 503 inputs the first word segments into the preset sentiment score calculator to calculate the first score; the third acquisition unit 504 obtains the second word segments by performing word segmentation on the text; the second input unit 505 inputs the second word segments into the preset trained model to calculate the second score; and the fourth acquisition unit 506 obtains the third score of the text information using the sentiment classification method. When the language context of the text information is determined according to the preset text rule, the second allocation unit 507 assigns weights to the first score, the second score, and the third score using the preset integration logic; the calculation unit 508 obtains the composite score of the text information according to the weights assigned by the preset integration logic; and the processing unit 509 obtains the classification result of the text information according to the composite score. This embodiment uses a continuous sentiment classification method and weights the scores produced by different algorithms according to the language context, which improves the accuracy of text classification.
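The two scoring paths handled by subunits 5031 to 5033 and 5051 to 5054 can be sketched as follows. Several points are assumptions rather than statements of the source: the function names are hypothetical, the first score is taken as the sum of the dictionary scores found, the distance is taken to be Euclidean, and the dictionary entries, word vectors, and preset score vectors are made-up toy values.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def first_score(words: Iterable[str], lexicon: Dict[str, float]) -> float:
    """Look up each word in the sentiment dictionary (5031, 5032) and
    combine the scores that were found (5033); summation is an assumption."""
    return sum(lexicon[w] for w in words if w in lexicon)


def nearest_preset_score(vec: Vector, presets: List[Tuple[Vector, float]]) -> float:
    """Return the score attached to the preset score vector closest to vec
    (5052, 5053); Euclidean distance is an assumption."""
    _, score = min(presets, key=lambda p: math.dist(vec, p[0]))
    return score


def second_score(
    words: Iterable[str],
    embed: Dict[str, Vector],
    presets: List[Tuple[Vector, float]],
) -> float:
    """Convert each word to a vector (5051), score it by its nearest preset
    score vector, and add the per-word scores (5054)."""
    return sum(nearest_preset_score(embed[w], presets) for w in words if w in embed)


# Toy data, invented for illustration only.
lexicon = {"terrible": -3.0, "great": 2.0}
presets = [((0.0, 0.0), 0.0), ((1.0, 1.0), 1.0), ((-1.0, -1.0), -1.0)]
embed = {"friend": (0.9, 0.8), "film": (0.1, -0.1), "unexpected": (1.2, 0.9)}

print(first_score(["terrible", "film"], lexicon))
print(second_score(["friend", "film", "unexpected"], embed, presets))
```

The toy vectors are chosen so that "friend" and "unexpected" fall nearest the preset vector scored 1 and "film" falls nearest the one scored 0, mirroring the per-word scores reported in Scenario 2.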
Referring to Fig. 6, Fig. 6 is a schematic diagram of a server structure provided in an embodiment of the present invention. The server 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations on the server. Furthermore, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the server 600, the series of instruction operations in the storage medium 630.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 6.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; other divisions are possible in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A text information classification method, characterized by comprising:
Obtaining text information;
Obtaining first word segments, where the first word segments are obtained by performing word segmentation on the text information according to a first preset rule;
Inputting the first word segments into a preset sentiment score calculator to calculate a first score;
Obtaining second word segments, where the second word segments are obtained by screening the first word segments according to a second preset rule;
Inputting the second word segments into a preset trained model to calculate a second score;
Assigning weights to the first score and the second score using preset integration logic;
Obtaining a composite score of the text information according to the weights assigned by the preset integration logic;
Obtaining a classification result of the text information according to the composite score.
2. The text information classification method according to claim 1, characterized in that inputting the first word segments into the preset sentiment score calculator to calculate the first score comprises:
Looking up, in a preset sentiment dictionary provided in the preset sentiment score calculator, whether a first sub-segment exists, where the first sub-segment is contained in the first word segments and the preset sentiment dictionary holds the correspondence between first sub-segments and scores;
If the first sub-segment exists, extracting the score corresponding to the existing first sub-segment;
Calculating the first score, according to the preset sentiment score calculator, from the score corresponding to the first sub-segment.
3. The text information classification method according to claim 1, characterized in that inputting the second word segments into the preset trained model to calculate the second score comprises:
Converting a second sub-segment into a numerical vector according to the preset trained model, where the second sub-segment is contained in the second word segments;
Calculating the distance between the numerical vector and preset score vectors;
Taking the score corresponding to the preset score vector nearest to the numerical vector as the score of the second sub-segment;
Adding the scores corresponding to the second sub-segments to obtain the second score.
4. The text information classification method according to claim 1, characterized in that obtaining the classification result of the text information according to the composite score comprises:
Determining the preset score threshold range in which the composite score lies, and obtaining a judgment result;
Obtaining the classification result of the text information according to the judgment result.
5. The text information classification method according to any one of claims 1 to 4, characterized in that, after the text information is obtained, the method further comprises:
Obtaining a third score of the text information using a sentiment classification method, where the sentiment classification method is a method configured according to changes in the language context.
6. A text classification system, characterized by comprising:
A first acquisition unit, configured to obtain text information;
A second acquisition unit, configured to obtain first word segments, where the first word segments are obtained by performing word segmentation, according to a first preset rule, on the text information obtained by the first acquisition unit;
A first input unit, configured to input the first word segments obtained by the second acquisition unit into a preset sentiment score calculator to calculate a first score;
A third acquisition unit, configured to obtain second word segments, where the second word segments are obtained by performing word segmentation, according to a second preset rule, on the text information obtained by the first acquisition unit;
A second input unit, configured to input the second word segments into a preset trained model to calculate a second score;
A first allocation unit, configured to assign weights to the first score and the second score using preset integration logic;
A calculation unit, configured to obtain a composite score of the text information according to the weights assigned by the integration logic;
A processing unit, configured to obtain a classification result of the text information according to the composite score obtained by the calculation unit.
7. The system according to claim 6, characterized in that the first input unit comprises:
A lookup subunit, configured to look up whether a first sub-segment exists in a preset sentiment dictionary, where the first sub-segment is contained in the first word segments;
An extraction subunit, configured to extract, when the lookup subunit finds that the first sub-segment exists, the score corresponding to the existing first sub-segment, where the preset sentiment dictionary holds the correspondence between first sub-segments and scores;
A first calculation subunit, configured to calculate the first score, according to the preset sentiment score calculator, from the score corresponding to the first sub-segment.
8. The system according to claim 6, characterized in that the second input unit comprises:
A conversion subunit, configured to convert a second sub-segment into a numerical vector according to the preset trained model, where the second sub-segment is contained in the second word segments;
A second calculation subunit, configured to calculate the distance between the numerical vector and preset score vectors;
A first determination subunit, configured to take the score corresponding to the preset score vector nearest to the numerical vector as the score of the second sub-segment;
A third calculation subunit, configured to add the scores corresponding to the second sub-segments to obtain the second score.
9. The system according to claim 6, characterized in that the processing unit comprises:
A second determination subunit, configured to determine the preset score threshold range in which the composite score lies and obtain a judgment result;
A processing subunit, configured to obtain the text classification result according to the judgment result.
10. The system according to any one of claims 6 to 8, characterized in that the system further comprises:
A fourth acquisition unit, configured to obtain a third score of the text information using a sentiment classification method, where the sentiment classification method is a method configured according to changes in the language context.
CN201611044117.5A 2016-11-23 2016-11-23 Text information classification method and system Active CN108090040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611044117.5A CN108090040B (en) 2016-11-23 2016-11-23 Text information classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611044117.5A CN108090040B (en) 2016-11-23 2016-11-23 Text information classification method and system

Publications (2)

Publication Number Publication Date
CN108090040A true CN108090040A (en) 2018-05-29
CN108090040B CN108090040B (en) 2021-08-17

Family

ID=62170951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611044117.5A Active CN108090040B (en) 2016-11-23 2016-11-23 Text information classification method and system

Country Status (1)

Country Link
CN (1) CN108090040B (en)

Also Published As

Publication number Publication date
CN108090040B (en) 2021-08-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant