CN109684467A - A text classification method and device
- Publication number: CN109684467A
- Application number: CN201811368735.4A
- Authority: CN (China)
- Keywords: entry, text, to be classified, classification, preset dictionary
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a text classification method and device. The method comprises: obtaining a text to be classified; selecting, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; determining, according to the preset dictionary, the category to which each of the multiple target entries belongs; and determining a target category according to the categories of the target entries, and taking the target category as the category to which the text to be classified belongs. The invention addresses the problem in the prior art that erroneous text is poorly corrected, so that the text to be classified cannot be classified correctly and classification accuracy is low.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a text classification method and device.
Background art
With the development of society, people's work and daily life depend increasingly on the Internet, through which they can look up information, buy goods, place advertisements, and so on. However, the volume of natural-language text that Internet users produce and retrieve every day is growing exponentially. Given the sheer quantity and variety of content on the network, information overload easily occurs when content is retrieved through a search engine, so text information needs to be classified. Text classification also helps business departments with traffic analysis, content review, building user and product profiles, precise recommendation, keyword expansion and clustering, click-through rate (CTR) estimation, and so on, and is therefore of great significance.
However, this mass of text contains a large amount of locally erroneous text. Current text classification methods usually deviate considerably in their semantic understanding of such erroneous text and correct it poorly, so they often cannot classify the text to be classified correctly, and classification accuracy is low.
Summary of the invention
In view of the above problems, the present invention proposes a text classification method and device, which address the problem in the prior art that erroneous text is poorly corrected, so that the text to be classified cannot be classified correctly and classification accuracy is low.
In a first aspect, the present application provides the following technical solution through an embodiment of the present application:
A text classification method, comprising: obtaining a text to be classified; selecting, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; determining, according to the preset dictionary, the category to which each of the multiple target entries belongs; and determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs.
Preferably, selecting multiple target entries similar to the text to be classified from the preset dictionary comprises: calculating, in turn, the edit distance between the text to be classified and each entry in the preset dictionary; and determining the entries in the preset dictionary whose edit distance is less than a preset distance as the target entries.
Preferably, determining the target category according to the category to which each target entry belongs comprises: grouping the multiple target entries by category to obtain Q groups of entries, wherein all entries in the same group belong to the same category and Q is a positive integer; and selecting, from the Q groups, the group containing the most entries and taking the category of that group as the target category.
Preferably, before selecting multiple target entries similar to the text to be classified from the preset dictionary, the method further comprises: matching, according to the text to be classified, an entry corresponding to the text to be classified in the preset dictionary; and, if the matching fails, performing the step of selecting multiple target entries similar to the text to be classified from the preset dictionary.
Preferably, the step of matching an entry corresponding to the text to be classified in the preset dictionary specifically comprises: searching the preset dictionary for an entry identical to the text to be classified.
In a second aspect, based on the same inventive concept, the present application provides the following technical solution through an embodiment of the present application:
A text classification device, comprising: a receiving module, configured to obtain a text to be classified; a screening module, configured to select, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; a first determining module, configured to determine, according to the preset dictionary, the category to which each of the multiple target entries belongs; and a second determining module, configured to determine a target category according to the category to which each of the multiple target entries belongs, and to take the target category as the category to which the text to be classified belongs.
Preferably, the screening module is specifically configured to: calculate, in turn, the edit distance between the text to be classified and each entry in the preset dictionary; and determine the entries in the preset dictionary whose edit distance is less than a preset distance as the target entries.
Preferably, the second determining module is specifically configured to: group the multiple target entries by category to obtain Q groups of entries, wherein all entries in the same group belong to the same category and Q is a positive integer; and select, from the Q groups, the group containing the most entries and take the category of that group as the target category.
Preferably, the device further comprises a matching module, configured to: before the multiple target entries similar to the text to be classified are selected from the preset dictionary, match, according to the text to be classified, an entry corresponding to the text to be classified in the preset dictionary; and, if the matching fails, trigger the selection of multiple target entries similar to the text to be classified from the preset dictionary.
Preferably, the matching module is specifically configured to: search the preset dictionary for an entry identical to the text to be classified.
In a third aspect, based on the same inventive concept, the present application provides the following technical solution through an embodiment of the present application:
A user terminal, comprising a processor and a memory coupled to the processor, wherein the memory stores instructions that, when executed by the processor, cause the user terminal to perform the steps of the method according to any implementation of the first aspect.
In a fourth aspect, based on the same inventive concept, the present application provides the following technical solution through an embodiment of the present application:
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any implementation of the first aspect.
In the text classification method and device provided by the embodiments of the present invention, after the text to be classified is obtained, multiple target entries similar to it are selected from a preset dictionary, which realizes fuzzy matching. The preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries. Through this step, even if the text to be classified contains errors, the probability of finding entries in the preset dictionary that correspond to it is increased, which avoids the situation in which the text cannot be classified. Then, according to the preset dictionary, the category to which each of the multiple target entries belongs is determined. Finally, a target category is determined according to the categories of the target entries and is taken as the category to which the text to be classified belongs. Because that category is determined from all of the target entries rather than from a single entry, it is determined more accurately. In summary, the present invention addresses the problem in the prior art that erroneous text is poorly corrected, so that the text to be classified cannot be classified correctly and classification accuracy is low.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a text classification method provided by the first embodiment of the present invention;
Fig. 2 shows a detailed flowchart of step S40 in Fig. 1;
Fig. 3 shows a flowchart of a text classification method provided by the second embodiment of the present invention;
Fig. 4 shows a flowchart of a text classification method provided by the third embodiment of the present invention;
Fig. 5 shows a functional block diagram of a text classification device provided by the fourth embodiment of the present invention;
Fig. 6 shows a block diagram of a user terminal provided by the fifth embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
The text classification method provided by the present invention can perform fuzzy matching on texts to be classified, including those that contain erroneous words, and determines the category of the text to be classified comprehensively from multiple matching results, which helps ensure the accuracy of the classification result. The method is explained below with reference to several embodiments. Application scenarios of the present invention include, but are not limited to: industry classification of articles, identification of prohibited advertisements, keyword relevance analysis, web page tagging, user profiling, advertisement display, information distribution, and so on.
First embodiment
Referring to Fig. 1, this embodiment provides a text classification method; Fig. 1 shows the flowchart of the method. Each step of this embodiment is described in detail below. The specific steps are as follows:
Step S10: obtain the text to be classified.
Step S20: select, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries.
Step S30: determine, according to the preset dictionary, the category to which each of the multiple target entries belongs.
Step S40: determine a target category according to the category to which each of the multiple target entries belongs, and take the target category as the category to which the text to be classified belongs.
In step S10, the text to be classified is the text information that is to be classified. It may include any characters, phrases, sentences, dialect words, digits, character strings, and so on. The language of the text is not restricted and may be, for example, Chinese, English, German, or Russian. Note that any existing or future language or script used to record information can serve as text to be classified. In concrete form, the text to be classified may be an article, a news item, an event announcement, a web page (page content), and so on, composed of any such characters, phrases, sentences, dialect words, digits, and character strings.
Specific examples of texts to be classified are as follows:
"Ancient Greece and Rome mythology Chinese ppt", "Jingdone district app kinds of goods lower right corner advertisement", "QQ space two dimensional code", "300450 artificial intelligence", "dream about others and send mobile phone", "my world is automatically repaired bug plug-in unit", "Tsinghua University", "Aba", "urologic disease", and so on.
As these examples suggest, a large share of the texts to be classified on the Internet are entered manually, so a text to be classified may contain erroneous entries, for example errors caused by personal habits or by input-method mistakes.
Examples of texts to be classified that contain erroneous entries: "electric sound paradise" (correct: "film paradise"); "360 are safely Four" (correct: "360 security guard"); "my world is automatically repaired bug plug-in unit" (correct: "my world is automatically repaired bug plug-in unit"); "in emerging security" (correct: "CITIC Securities"); and so on.
In step S20, the preset dictionary is a dictionary classified or labeled by machine or by hand; it may also be an existing general-purpose dictionary. This embodiment introduces the following three kinds of dictionary as examples, without being limited to them:
1. The first preset dictionary, which stores M entries and the category to which each of the M entries belongs, where the category of each entry is labeled manually and M is a positive integer. The first preset dictionary is collected manually and covers the following situations: non-standardized terms; new Internet terms, which appear every day and are absent from existing dictionaries, so they need to be labeled and classified by hand; and existing terms that acquire new interpretations and meanings as society develops. All three situations call for manual judgment and labeling, which keeps the dictionary growing and accurate. For example:
Table 1: Example of the first preset dictionary
The words or sentences in the first preset dictionary do not appear in existing (Internet) dictionaries, such as large Chinese dictionaries, Baidu's online dictionaries and reference works, or related classification websites. They are, however, used frequently by Internet users, so they can be collected and labeled with their categories to form a manually labeled dictionary (the first preset dictionary).
2. The second preset dictionary, which stores N entries and the category to which each of the N entries belongs, where the second preset dictionary is provided by preset industry dictionary websites and N is a positive integer. A preset industry dictionary website is a website that collects the vocabulary of various industries; industry vocabulary consists of words recognized or commonly known by practitioners in the industry or by the public, for example vocabulary related to e-commerce, general entertainment, mobile games, PC/software, education, or finance. Websites that provide such industry dictionaries include, but are not limited to: Baidu industry dictionaries, Sogou industry dictionaries, the dictionaries of Baidu's top-search rankings, and the 5118.com industry dictionary. A specific example is as follows:
Table 2: Example of the second preset dictionary
The second preset dictionary can be built by purchasing it or by crawling it with a web crawler (also known as a web spider or web robot).
3. The third preset dictionary, which stores P entries and the category to which each of the P entries belongs, where the third preset dictionary is provided by a rule engine and P is a positive integer. The rule engine in this embodiment supplies entries governed by predefined business rules, for example place names, school names, stock codes, special proprietary numeric terms, and medical terms.
Table 3: Example of the third preset dictionary
Entry | Category |
Recruitment | Recruitment | recruitment |
Security personnel | Safe security | security personnel's security |
Sports lottery ticket | Lottery ticket | welfare lottery ticket |
B2B | E-commerce | B2B |
Tidy street | E-commerce | vertical B2C |
Producer's supply | E-commerce | other |
Tsinghua University | Educational training | the education with record of formal schooling |
Bengbu College | Educational training | the education with record of formal schooling |
Aba | Tourism | other |
900953 | Financial service | equity fund |
Urologic disease | Medical treatment & health | andrology |
360 | IT product | software |
263 | Social | other |
… | … |
Note that Tables 1 to 3 are only examples; their contents are illustrative and do not limit the scope of protection of the present invention.
An entry in the first, second, or third preset dictionary may belong to several categories at once. The same entry may appear in all three dictionaries and may belong to a different category in each. In this embodiment, any of these dictionaries can serve as the preset dictionary for step S20.
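One way to picture these dictionaries is as mappings from an entry to a set of categories, so that one entry can carry several categories and the same entry can be labeled differently in different dictionaries. The Python sketch below is illustrative only: the type aliases, the helper `categories_of`, the reading of each Table 3 category as a (primary, secondary) pair, and the contents of `second_dictionary` are assumptions, not part of the original disclosure.

```python
from typing import Dict, Set, Tuple

# Assumption: a category is read as a (primary, secondary) pair, as the
# rows of Table 3 suggest.
Category = Tuple[str, str]
Dictionary = Dict[str, Set[Category]]

# A few rows taken from Table 3 (third preset dictionary).
third_dictionary: Dictionary = {
    "B2B": {("E-commerce", "B2B")},
    "Aba": {("Tourism", "other")},
    "360": {("IT product", "software")},
}

# Hypothetical content, made up to show one entry labeled differently
# in another dictionary.
second_dictionary: Dictionary = {
    "360": {("IT product", "security software")},
}


def categories_of(entry: str, *dictionaries: Dictionary) -> Set[Category]:
    """Collect every category recorded for an entry across dictionaries."""
    found: Set[Category] = set()
    for dictionary in dictionaries:
        found |= dictionary.get(entry, set())
    return found
```

Keeping a set per entry means an entry with several categories needs no special case, and an entry missing from every dictionary simply yields an empty set.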
In step S20, multiple target entries similar to the text to be classified are selected from the preset dictionary. This may be implemented as follows:
First, the edit distance between the text to be classified and each entry in the preset dictionary is calculated in turn. Edit distance is a quantitative measure of how different two character strings (for example, Chinese words or English words) are: it is the minimum number of editing operations needed to turn one string into the other.
Then, the entries in the preset dictionary whose edit distance is less than (or, optionally, equal to) the preset distance are determined as the target entries. The preset distance can be set as required, for example to 1, 2, or 3. It can also be adjusted through feedback from step S40: for example, when step S40 yields several categories for the text to be classified and the result is therefore not precise enough, the preset distance can be reduced appropriately.
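The selection in step S20 can be sketched in Python as follows. This is a minimal sketch that assumes the edit distance is the Levenshtein distance (single-character insertions, deletions, and substitutions), which the text does not state explicitly; the function names and the `max_distance` parameter are hypothetical, and the comparison uses "less than or equal to", matching the worked example later in this embodiment.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def select_target_entries(text: str, dictionary: Dict[str, str],
                          max_distance: int = 2) -> List[str]:
    """Return the entries whose edit distance to text is at most
    max_distance (the preset distance)."""
    return [entry for entry in dictionary
            if edit_distance(text, entry) <= max_distance]
```

Scanning every entry is linear in the dictionary size, which mirrors the "in turn" wording above; a large dictionary would probably call for an index, which is outside the scope of this sketch.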
In step S30, the category to which each of the multiple target entries belongs is determined according to the preset dictionary. Because every target entry is selected from the preset dictionary, each has a corresponding category stored there.
In step S40, a target category is determined according to the category to which each of the multiple target entries belongs, and the target category is taken as the category to which the text to be classified belongs. Referring to Fig. 2, determining the target category in this step may comprise the following steps:
Step S41: group the multiple target entries by category to obtain Q groups of entries, wherein all entries in the same group belong to the same category and Q is a positive integer.
Step S42: select, from the Q groups, the group containing the most entries, and take the category of that group as the target category.
The target category determined by steps S41 and S42 is taken as the category to which the text to be classified belongs.
In a concrete implementation, the classification can be carried out directly with the k-nearest neighbors (KNN) algorithm.
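Steps S41 and S42 amount to a majority vote over the categories of the target entries. A minimal Python sketch follows, assuming each entry maps to a single category; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_category(target_entries: List[str],
                      dictionary: Dict[str, str]) -> Dict[str, List[str]]:
    """Step S41: group the target entries by the category stored for
    them in the preset dictionary."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in target_entries:
        groups[dictionary[entry]].append(entry)
    return dict(groups)


def pick_target_category(groups: Dict[str, List[str]]) -> str:
    """Step S42: return the category of the group with the most entries.
    On a tie, max() keeps the first such group encountered."""
    return max(groups, key=lambda category: len(groups[category]))
```

Because `max` silently resolves ties in favor of the first group, a caller that cares about ties would need to detect them separately.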
It should be understood that step S42 may find two or more groups tied for the largest number of entries among the Q groups. For this case, this embodiment provides the following two alternatives to step S42:
Alternative step 1: select, from the Q groups, the groups with the most entries, or the top S groups, and take the categories of the selected groups as the target categories, where S is a positive integer greater than or equal to 2.
Alternative step 2: if several of the Q groups are tied for the largest number of entries, adjust the preset distance through feedback, reducing or increasing it, until a target category is obtained.
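Both alternatives can be sketched in Python as below. The function names, the `categories_within` callback (which stands for running steps S20 and S30 at a given preset distance and returning one category per target entry), and the order in which distances are tried are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional


def has_tie(categories: List[str]) -> bool:
    """True when two or more categories share the largest entry count."""
    top_two = Counter(categories).most_common(2)
    return len(top_two) == 2 and top_two[0][1] == top_two[1][1]


def top_s_categories(categories: List[str], s: int = 2) -> List[str]:
    """Alternative step 1: keep the categories of the top s groups."""
    return [category for category, _ in Counter(categories).most_common(s)]


def adjust_until_unique(categories_within: Callable[[int], List[str]],
                        distances: Iterable[int]) -> Optional[str]:
    """Alternative step 2: try preset distances in the given order until
    exactly one category has the most entries; None if none qualifies."""
    for distance in distances:
        categories = categories_within(distance)
        if categories and not has_tie(categories):
            return Counter(categories).most_common(1)[0][0]
    return None
```

Passing the distances as an explicit sequence leaves the choice between reducing and increasing the preset distance to the caller, as the text allows either.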
To ensure classification accuracy while realizing fuzzy matching, the following steps may also be carried out before step S20 in this embodiment:
According to the text to be classified, match an entry corresponding to it in the preset dictionary. Specifically, search the preset dictionary for an entry identical to the text to be classified, that is, an exact (100%) match.
If the matching fails, perform the step of selecting multiple target entries similar to the text to be classified from the preset dictionary.
If the matching succeeds, the category of the matched entry can be taken directly as the category to which the text to be classified belongs.
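The exact-match-first flow described above can be sketched in Python as follows. The function `classify` and its `fuzzy_classify` parameter are hypothetical; `fuzzy_classify` stands for whatever implementation of steps S20 to S40 is in use.

```python
from typing import Callable, Dict


def classify(text: str, dictionary: Dict[str, str],
             fuzzy_classify: Callable[[str, Dict[str, str]], str]) -> str:
    """Try an exact match first; fall back to fuzzy matching
    (steps S20 to S40) only when the exact match fails."""
    if text in dictionary:          # exact (100%) match
        return dictionary[text]
    return fuzzy_classify(text, dictionary)
```

With a hash-based dictionary the exact lookup is cheap, so the more expensive edit-distance scan runs only for texts that are not already stored as entries.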
To make the scheme of this embodiment easier to understand, consider the following example:
Step S10 is executed, and the text to be classified that is obtained is "under Baidu".
First, the preset dictionary is checked for an entry identical to "under Baidu". If none exists (the matching fails), step S20 is executed, taking a preset distance of 2 as an example (that is, an edit distance less than or equal to 2). Matching in the preset dictionary yields the target entries shown below:
Table 4
The entries and categories in Table 4 are illustrative and do not limit the scope of the present invention; they may differ from Table 4 when the present invention is actually implemented.
Executing step S30 determines the category of each target entry.
Then step S40 is executed. The target entries in Table 4 can be divided into 3 groups by category, of which the group with the most entries corresponds to the category "search engine" and contains 3 entries. The category of the text to be classified, "under Baidu", can therefore be determined as "search engine".
In the text classification method and device provided by this embodiment, after the text to be classified is obtained, multiple target entries similar to it are selected from the preset dictionary, which realizes fuzzy matching. The preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries. Through this step, even if the text to be classified contains errors, the probability of finding entries in the preset dictionary that correspond to it is increased, which avoids the situation in which the text cannot be classified. Then, according to the preset dictionary, the category to which each of the multiple target entries belongs is determined. Finally, a target category is determined according to the categories of the target entries and is taken as the category to which the text to be classified belongs. Because that category is determined from all of the target entries rather than from a single entry, it is determined more accurately. In summary, the present invention addresses the problem in the prior art that erroneous text is poorly corrected, so that the text to be classified cannot be classified correctly and classification accuracy is low.
Second embodiment
Referring to Fig. 3, this embodiment provides, based on the same inventive concept, another text classification method. Unlike the first embodiment, the method of this embodiment is executed in the following steps:
Step S201: receive the text to be classified.
Step S202: according to the text to be classified, match a target entry corresponding to it in a preset dictionary, wherein the preset dictionary stores multiple entries and the category to which each entry belongs.
Step S203: if the matching succeeds, determine the category to which the target entry belongs as the category to which the text to be classified belongs.
Step S204: if the matching fails, input the text to be classified into an ensemble classifier to obtain its category, wherein the ensemble classifier contains one or more text classification models.
Compared with the first embodiment, step S201 of this embodiment is the same as step S10, while steps S203 and S204 are added. Steps S20 to S40 of the first embodiment can also be applied in step S202: when the text to be classified is matched against the first, second, and third preset dictionaries, the matching method may include, wholly or in part, the method of the first embodiment, that is, steps S20 to S40. If, in step S40, the group with the most entries among the Q groups is not unique, the classification can be judged inaccurate and step S204 can then be executed.
In step S202, a corresponding target entry may be matched in the preset dictionary by checking whether the preset dictionary stores an entry contained in the text to be classified; preferably, by checking whether it stores an entry identical to the text to be classified.
The matching in step S202 has two possible outcomes, handled by steps S203 and S204:
In step S203, if the matching succeeds, the preset dictionary contains a target entry corresponding to the text to be classified, and the category of that target entry can be determined as the category to which the text to be classified belongs. This completes the classification.
In step S204, if the matching fails, the preset dictionary contains no target entry corresponding to the text to be classified. The text is then input into the ensemble classifier to obtain its category. The ensemble classifier contains multiple text classification models, so the category of the text to be classified can be judged comprehensively from the classification results of the individual models, which improves accuracy.
The present invention first matches a corresponding entry in the preset dictionary and classifies with the text classification models only when the matching fails, which gives higher accuracy than classifying directly with a text classification model. In addition, the integrated classifier contains multiple (two or more) classification models and therefore yields multiple classification results, which avoids the situation where a classification error made by a single text classification model cannot be detected and corrected.
The present embodiment provides a specific implementation of the matching in step S202:
According to the text to be classified, the entry corresponding to the text to be classified is matched in multiple preset dictionaries in turn, in a preset order, wherein the entries stored in the multiple preset dictionaries are different from one another. The preset order is the order in which the preset dictionaries are consulted; it can be customized and is not limited here.
This implementation is described using a first preset dictionary, a second preset dictionary, and a third preset dictionary as an example. That is, when there are 3 preset dictionaries, they are matched in the order: first preset dictionary, second preset dictionary, third preset dictionary.
Matching in the first preset dictionary includes:
Obtaining the first preset dictionary.
According to the text to be classified, matching a first entry corresponding to the text to be classified in the first preset dictionary.
If the matching succeeds, determining the first entry as the target entry.
If the matching in the first preset dictionary fails, matching continues in the second preset dictionary, which includes:
Obtaining the second preset dictionary.
According to the text to be classified, matching a second entry corresponding to the text to be classified in the second preset dictionary.
If the matching in the second preset dictionary succeeds, determining the second entry as the target entry.
It should be noted that, to ensure the classification accuracy of the text to be classified, both the first entry in the first preset dictionary and the second entry in the second preset dictionary may be matched by searching whether the preset dictionary in question (the first or the second) stores an entry identical to the text to be classified.
For example:
If the text to be classified is "dream about others and send mobile phone", an identical first entry can be matched in the first preset dictionary, and the category of the text to be classified can be determined as: amusement and recreation, and Constellation.
If the text to be classified is "Da Er it is excellent", matching fails in the first preset dictionary and continues in the second preset dictionary, where an identical second entry can be matched; the category of the text to be classified is then determined as: IT product, and computer manufacturer.
If the matching in the second preset dictionary fails, matching continues in the third preset dictionary, which includes:
Obtaining the third preset dictionary.
According to the text to be classified, matching a third entry corresponding to the text to be classified in the third preset dictionary.
If the matching succeeds, determining the third entry as the target entry.
If the matching fails, executing the step of inputting the text to be classified into the integrated classifier to obtain the category of the text to be classified.
A rule engine also defines the matching rule, wherein the matching rule includes: searching whether the preset dictionary stores a third entry such that the text to be classified contains an entry identical to the third entry; if such a third entry exists, the category of the text to be classified can be determined as the category to which the third entry belongs.
For example:
The text to be classified is "the products of Chu Chu Jie are good", which cannot be matched successfully in the first preset dictionary or the second preset dictionary. When matching against the third preset dictionary, the third entry "Chu Chu Jie" can be found. Since the category of this third entry is "e-commerce, B2B", the category of the text to be classified can be determined as: e-commerce and B2B.
When multiple third entries matching the text to be classified exist in the third preset dictionary, the categories of those third entries and the number of occurrences of each category can be counted, to ensure that the classification is objective and accurate.
For example, the text to be classified is "the products of Chu Chu Jie are producer supply", which cannot be matched successfully in the first preset dictionary or the second preset dictionary. When matching against the third preset dictionary, the third entries "Chu Chu Jie" and "producer supply" can be found. Since the categories of these third entries are "e-commerce, B2B" and "e-commerce, vertical B2C" respectively, the category of the text to be classified can be determined as: e-commerce.
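The counting described for the third preset dictionary can be sketched as follows. This is only an illustrative sketch: the dictionary contents, the function name `categories_from_third_dictionary`, and the choice to return every label tied for the highest count are assumptions, not details given in the specification.

```python
from collections import Counter

# Hypothetical third preset dictionary: entry -> category labels of that entry.
THIRD_DICT = {
    "Chu Chu Jie": ["e-commerce", "B2B"],
    "producer supply": ["e-commerce", "vertical B2C"],
}

def categories_from_third_dictionary(text, dictionary):
    """Return the category labels that occur most often among the entries
    contained in `text`, or None when no entry is contained (match failure)."""
    hits = [entry for entry in dictionary if entry in text]
    if not hits:
        return None  # the caller would fall back to the integrated classifier
    counts = Counter(label for entry in hits for label in dictionary[entry])
    best = max(counts.values())
    return [label for label, n in counts.items() if n == best]
```

With a single contained entry, both of its labels are returned; with the two entries above, only the shared label "e-commerce" reaches the highest count, which mirrors the two examples in the text.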
If the matching in the third preset dictionary fails, the text to be classified is input into the integrated classifier, i.e., step S204 is executed.
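The ordered matching across the three preset dictionaries, with the integrated classifier as the last resort, might look like the sketch below. The names `exact_match`, `contained_match`, and `cascade_classify` are hypothetical, each dictionary is assumed to map an entry to a single category string, and the fallback is a stand-in callable rather than a real classifier.

```python
def exact_match(text, dictionary):
    """First/second dictionary rule: the entry must be identical to the text."""
    return dictionary.get(text)

def contained_match(text, dictionary):
    """Third dictionary rule: the entry only has to be contained in the text."""
    for entry, category in dictionary.items():
        if entry in text:
            return category
    return None

def cascade_classify(text, stages, fallback):
    """Try each (matcher, dictionary) stage in the preset order; if every
    stage fails, hand the text to the fallback (the integrated classifier)."""
    for matcher, dictionary in stages:
        category = matcher(text, dictionary)
        if category is not None:
            return category
    return fallback(text)
```

A caller would pass `stages` as `[(exact_match, first), (exact_match, second), (contained_match, third)]`, so changing the preset order is just a matter of reordering that list.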
In step S204, if the matching fails, the text to be classified is input into the integrated classifier to obtain the category of the text to be classified, wherein one or more text classification models are provided in the integrated classifier.
Preferably, the integrated classifier contains two or more text classification models. In the present embodiment there are 3 text classification models, each with a different structure, which may specifically include: a text classification model based on SVM (Support Vector Machine); a FastText model (a word vector and text classification tool open-sourced by Facebook AI Research); a text classification model based on deep learning; and so on. The text classification models can be obtained by training existing common models directly.
When training the text classification models in the integrated classifier, any of the preset dictionaries can be used as the learning samples. Preferably, the first preset dictionary is used as the learning samples: since the first preset dictionary is obtained by manual annotation, its semantic understanding of the entries is more accurate, which ensures that the categories of the entries are correct.
In the present embodiment, a matching order over the first, second, and third preset dictionaries is added and combined with the method provided by the first embodiment, so that the text to be classified is classified more accurately and the problem of overly general text classification is avoided. In addition, the integrated classifier is added to classify the text to be classified, which avoids the situation where the category of the text to be classified cannot be determined. In summary, the method provided by the embodiment of the present invention solves the problems of the prior art that erroneous text is poorly corrected, that the text to be classified cannot be classified correctly, and that classification accuracy is low.
Third embodiment
Referring to Fig. 4, based on the same inventive concept, the present embodiment further provides a classification method of text. The detailed process of the method is as follows:
Step S301: receiving a text to be classified.
Step S302: inputting the text to be classified into the integrated classifier as input data, so as to classify the text to be classified by the integrated classifier.
Step S303: if the classification fails, inputting the text to be classified into a search engine, so as to search for the text to be classified by the search engine and obtain search results.
Step S304: adjusting the input data based on the search results to obtain adjusted input data.
Step S305: inputting the adjusted input data into the integrated classifier, so as to classify the text to be classified by the integrated classifier.
With reference to the second embodiment, step S301 of the present embodiment may be implemented in the same way as step S201.
When step S204 is executed, after the text to be classified is input into the integrated classifier, steps S302 to S305 are executed until the category of the text to be classified is obtained.
In step S302, the integrated classifier is used to classify the text to be classified, and multiple text classification models may be integrated in the integrated classifier. In the present embodiment, step S302 may be implemented through the following specific steps:
Obtaining, from the multiple text classification models in the integrated classifier, the model result corresponding to each text classification model.
T text classification models are provided in the integrated classifier, where T is a positive integer. The number of text classification models in the integrated classifier is not limited and may be two or more. In a preferred embodiment, the number of text classification models is odd, for example 3 or 5. The text classification models have different structures and may specifically include: a text classification model based on SVM (Support Vector Machine); a FastText model (a word vector and text classification tool open-sourced by Facebook AI Research); a text classification model based on deep learning; and so on. The text classification models can be obtained by training existing common models directly; see the second embodiment for details.
When classifying the text to be classified, the integrated classifier of the present embodiment may perform the following steps:
Receiving the input data. The input data may be the text to be classified, or the input data adjusted based on the search results.
Based on the input data, classifying the text to be classified by each of the T text classification models to obtain T model classification results, wherein the T model classification results correspond one-to-one to the T text classification models, and each model classification result contains classification information characterizing the category of the text to be classified. That is, after the integrated classifier receives the text to be classified as input, each text classification model produces one model result, which is the output data of that text classification model.
Obtaining a target classification result according to the T model classification results. The multiple model classification results need to be judged comprehensively to determine the target classification result, which falls into one of two cases: 1. a first target classification result characterizing successful classification; 2. a second target classification result characterizing failed classification.
Specifically:
First, grouping is performed: according to the differences in the classification information of the T model classification results, the T model classification results are divided into R groups, i.e., model classification results containing the same classification information are placed in one group. The classification information of all model classification results within a group is identical, and R is a positive integer.
Then, the present embodiment provides two implementations for determining the target classification result:
1. A weight value may be assigned in advance to each of the T text classification models in the integrated classifier. Within the R groups, each model classification result in a group takes the weight value of its corresponding text classification model, and all model classification results in each group are summed with these weights. The classification result is finally determined according to the magnitude of the weighted sums. For example, a group whose weighted sum exceeds a certain preset value is taken as the target group; the preset value may be, for example, 50%, 60%, or 70%.
If such a target group exists, the classification succeeds and a first target classification result characterizing successful classification is obtained; the classification information contained in the model classification results of the target group is taken as the category of the text to be classified. If no such target group exists, the classification fails and a second target classification result characterizing failed classification is obtained.
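The weighted variant can be sketched as below. The function name `weighted_vote` is hypothetical, and the sketch assumes the model weights sum to 1 so that the preset value can be read as a fraction (0.5 for 50%); the specification does not state how the weights are normalized.

```python
from collections import defaultdict

def weighted_vote(results, weights, threshold=0.5):
    """results[i] is the category output by model i, weights[i] its weight.
    Returns the category of the group whose weighted sum exceeds `threshold`,
    or None when no group does (classification failure)."""
    totals = defaultdict(float)
    for label, weight in zip(results, weights):
        totals[label] += weight  # weighted sum per group of identical results
    label, score = max(totals.items(), key=lambda item: item[1])
    return label if score > threshold else None
```

Under the normalization assumption, a threshold of 0.5 or higher guarantees that at most one group can qualify as the target group.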
2. Querying whether a target group exists among the R groups, the number of model classification results in the target group satisfying a preset classification condition. If such a target group exists, a first target classification result characterizing successful classification is obtained, and the first target classification result contains the classification information of the text to be classified; the classification information contained in the model classification results of the target group is taken as the category of the text to be classified. If no such target group exists, the classification fails and a second target classification result characterizing failed classification is obtained. The preset classification condition may be: the number of model classification results in the target group is the largest among the R groups; or the number of model classification results in the target group is the largest among the R groups and that group is unique; or the number of model classification results in the target group exceeds a set value (such as 2, 3, or 4).
For example:
Take the preset classification condition "the number of model classification results in the target group is the largest among the R groups and that group is unique" as an example.
Suppose a text A to be classified (distinguished from the text B to be classified below) is input into the integrated classifier as input data, and there are 3 text classification models in the integrated classifier. The model classification result output by the first text classification model is x (distinguished from the model classification results y and z), that of the second text classification model is y, and that of the third text classification model is z. The model classification results are therefore divided into 3 groups, each containing 1 model classification result, and no target group satisfies the preset classification condition. A classification result characterizing failed classification is therefore obtained.
Suppose a text B to be classified is input into the integrated classifier as input data, and there are 3 text classification models in the integrated classifier. The model classification result output by the first text classification model is x, that of the second text classification model is x, and that of the third text classification model is z. The model classification results are therefore divided into 2 groups: the first group (model classification result x) contains 2 model classification results and the second group (model classification result z) contains 1, so a target group satisfying the preset classification condition (the first group) exists. A classification result characterizing successful classification is therefore obtained.
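The "largest and unique" condition used in the two examples can be sketched as a plain majority vote. The function name `majority_vote` is hypothetical, and model outputs are represented simply as category strings.

```python
from collections import Counter

def majority_vote(results):
    """Group identical model results and return the category of the largest
    group, or None when the largest group is not unique (or there is no input)."""
    if not results:
        return None
    groups = Counter(results).most_common()
    if len(groups) > 1 and groups[0][1] == groups[1][1]:
        return None  # tie for the largest group: classification failure
    return groups[0][0]
```

For text A the outputs x, y, z form three groups of size 1, so the function returns `None`; for text B the outputs x, x, z give the group x a unique maximum and the function returns `"x"`.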
After step S302, when the classification result characterizes successful classification, the category of the text to be classified can be determined based on the first target classification result output by the integrated classifier, wherein the first target classification result characterizes successful classification and contains the classification information of the text to be classified.
Step S303: if the classification fails, inputting the text to be classified into a search engine, so as to search for the text to be classified by the search engine and obtain search results.
In step S303, the search engine may be any existing search engine, such as Baidu Search, 360 Search, Google Search, or Bing Search, without limitation. Each search result should include a corresponding title and abstract, and may further include keywords.
Step S304: adjusting the input data based on the search results to obtain adjusted input data.
This may include the following process:
1. Extracting key information from the search results.
The key information may be title information and/or abstract information extracted from the search results. When extracting the key information, the top N search results may be selected, where N is a positive integer such as 1, 2, 3, or 4. The titles and/or abstracts of the search results can then be input directly into the integrated classifier together as the input data, which extends and explains the text to be classified and improves the classification accuracy of the integrated classifier. If the predetermined number of search results do not contain the text to be classified, the text to be classified and the titles and/or abstracts of the search results may also be used together as the input data.
In addition, keywords in each search result may also be extracted and used as a supplement to and extension of the text to be classified. The keywords may be marked manually or extracted at random, without limitation.
2. Adding the key information to the input data to obtain the adjusted input data; or using the key information as the adjusted input data.
For the same text to be classified, each time the input data is adjusted according to the search results, the search results or keywords already used in the previous adjustment should be excluded.
It should be noted that, in the present embodiment, when the second classification result characterizing failed classification is obtained, steps S302 to S305 may be executed in a loop until a classification result characterizing successful classification is obtained.
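The loop over steps S302 to S305 can be sketched as follows. Everything here is a stand-in: `classify` and `search` are caller-supplied callables, each search result is assumed to be a dict with `"title"` and `"abstract"` keys, and the `max_rounds` cap is an added assumption to guarantee termination (the specification only says the loop runs until classification succeeds).

```python
def classify_with_search(text, classify, search, top_n=3, max_rounds=3):
    """Classify `text`; on failure, extend the input with the titles and
    abstracts of unused search results and classify again."""
    used_titles = set()
    category = classify(text)                      # step S302
    rounds = 0
    while category is None and rounds < max_rounds:
        # Step S303: search, skipping results already used in earlier rounds.
        fresh = [r for r in search(text) if r["title"] not in used_titles][:top_n]
        if not fresh:
            break  # nothing new to add, give up
        used_titles.update(r["title"] for r in fresh)
        # Step S304: append the key information to the original text.
        key_info = " ".join(f'{r["title"]} {r["abstract"]}' for r in fresh)
        category = classify(f"{text} {key_info}")  # step S305
        rounds += 1
    return category
```

Tracking the used titles is one simple way to honor the rule that results consumed in a previous adjustment are excluded from the next one.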
In summary, in the text classification method of the present embodiment, a text to be classified is received and input into the integrated classifier as input data for classification, and a classification result is obtained. The classification result indicates whether the text to be classified was classified successfully. If the classification result characterizes failed classification, the text to be classified is input into a search engine and searched to obtain search results; searching in a search engine yields more text information associated with the text to be classified, so the search results can serve as an extension of the text to be classified. The input data is then adjusted based on the search results to obtain adjusted input data, and the adjusted input data is input into the integrated classifier to be classified again. This improves the integrated classifier's ability to discriminate the text to be classified and further improves its classification accuracy. The method of the present invention therefore combines a search engine to classify the text to be classified a second time, which solves the problem of the prior art that texts to be classified that are semantically complex or uncommon are recognized at a low rate and are classified wrongly or not accurately enough.
Fourth embodiment
Referring to Fig. 5, based on the same inventive concept, the present embodiment further provides a classification device of text. Fig. 5 shows a functional block diagram of the classification device 400 of text; the device includes a receiving module 401, a screening module 402, a first determining module 403, and a second determining module 404.
The device 400 specifically includes:
The receiving module 401, configured to obtain a text to be classified; the screening module 402, configured to select from a preset dictionary multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; the first determining module 403, configured to determine, according to the preset dictionary, the category to which each of the multiple target entries belongs; and the second determining module 404, configured to determine a target category according to the category to which each of the multiple target entries belongs, and to take the target category as the category to which the text to be classified belongs.
As an optional embodiment, the screening module 402 is specifically further configured to: calculate in turn the edit distance between the text to be classified and each entry in the preset dictionary; and determine the entries in the preset dictionary whose edit distance is less than a preset distance as the target entries.
As an optional embodiment, the second determining module 404 is specifically further configured to: group the multiple target entries according to the categories to which they belong to obtain Q groups of entries, wherein the entries in the same group belong to the same category and Q is a positive integer; and select from the Q groups of entries the group with the largest number of entries, and take the category to which that group of entries belongs as the target category.
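The work of the screening module and the second determining module can be sketched together as below. The specification only says "edit distance"; Levenshtein distance is assumed here, the function names are hypothetical, and returning `None` when the largest group is not unique follows the second embodiment's remark that such a classification is considered inaccurate.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def classify_by_similarity(text, dictionary, max_distance=2):
    """dictionary maps entry -> category. Keep entries whose edit distance to
    `text` is below `max_distance`, group them by category, and return the
    category of the largest group (None if there is no unique largest group)."""
    similar = [e for e in dictionary if edit_distance(text, e) < max_distance]
    if not similar:
        return None
    groups = Counter(dictionary[e] for e in similar).most_common()
    if len(groups) > 1 and groups[0][1] == groups[1][1]:
        return None
    return groups[0][0]
```

Computing the distance against every entry is linear in the dictionary size, which matches the "in turn" wording of the specification; a production system would likely need an index to scale.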
As an optional embodiment, the device further includes a matching module, configured to: before the selecting from the preset dictionary of multiple target entries similar to the text to be classified, match, according to the text to be classified, an entry corresponding to the text to be classified in the preset dictionary; and if the matching fails, execute the selecting from the preset dictionary of multiple target entries similar to the text to be classified.
As an optional embodiment, the matching module is specifically further configured to: search, according to the text to be classified, the preset dictionary for an entry identical to the text to be classified.
It should be noted that the implementation of the classification device 400 of text provided by the embodiment of the present invention and the technical effects it produces are the same as those of the foregoing method embodiments. For brevity, for anything not mentioned in the device embodiment, refer to the corresponding content in the foregoing method embodiments.
Fifth embodiment
In addition, based on the same inventive concept, the fifth embodiment of the present invention further provides a user terminal including a processor and a memory, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the user terminal to perform the following operations:
Obtaining a text to be classified.
Selecting from a preset dictionary multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries.
Determining, according to the preset dictionary, the category to which each of the multiple target entries belongs.
Determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs.
It should be noted that, in the user terminal provided by the embodiment of the present invention, the specific implementation of each of the above steps and the technical effects produced are the same as those of the foregoing method embodiments. For brevity, for anything not mentioned in the present embodiment, refer to the corresponding content in the foregoing method embodiments.
In the embodiment of the present invention, an operating system and third-party applications are installed on the user terminal. The user terminal may be a tablet computer, a mobile phone, a laptop, a PC (personal computer), a wearable device, a vehicle-mounted terminal, or another user terminal device.
Fig. 6 shows a module block diagram of an exemplary user terminal 500. As shown in Fig. 6, the user terminal 500 includes a memory 502, a storage controller 504, one or more processors 506 (only one is shown in the figure), a peripheral interface 508, a network module 510, an input/output module 512, a display module 514, and so on. These components communicate with one another through one or more communication buses/signal lines 516.
The memory 502 can be used to store software programs and modules, such as the program instructions/modules corresponding to the text classification method and device in the embodiments of the present invention. By running the software programs and modules stored in the memory 502, the processor 506 executes various functional applications and data processing, such as the text classification method provided by the embodiments of the present invention.
The memory 502 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. Access to the memory 502 by the processor 506 and other possible components can be performed under the control of the storage controller 504.
The peripheral interface 508 couples various input/output devices to the processor 506 and the memory 502. In some embodiments, the peripheral interface 508, the processor 506, and the storage controller 504 may be implemented in a single chip. In some other embodiments, they may each be implemented by a separate chip.
The network module 510 is used to receive and transmit network signals. The network signals may include wireless signals or wired signals.
The input/output module 512 is used to let the user input data so as to realize interaction between the user and the user terminal. The input/output module 512 may be, but is not limited to, a mouse, a keyboard, a touch screen, and the like.
The display module 514 provides an interactive interface (such as a user operation interface) between the user terminal 500 and the user, or is used to display image data for the user's reference. In the present embodiment, the display module 514 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen supporting single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on the touch display and pass the sensed touch operations to the processor for calculation and processing.
It can be understood that the structure shown in Fig. 6 is only illustrative; the user terminal 500 may include more or fewer components than shown in Fig. 6, or have a configuration different from that shown in Fig. 6. Each component shown in Fig. 6 may be implemented in hardware, software, or a combination thereof.
Sixth embodiment
The sixth embodiment of the present invention provides a computer storage medium. If the integrated functional modules of the classification device of text in the fourth embodiment of the present invention are implemented in the form of software functional modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the text classification methods of the first to third embodiments above may also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the description above of a specific language is given to disclose the best mode of the invention.
In the specification provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the text classification device and the user terminal according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
The invention discloses A1, a text classification method, the method comprising:
Obtaining a text to be sorted; selecting, from a default dictionary, multiple target entries similar to the text to be sorted, wherein the default dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; determining, according to the default dictionary, the category to which each of the multiple target entries belongs; and determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be sorted belongs.
A2. The method according to A1, wherein selecting, from the default dictionary, multiple target entries similar to the text to be sorted comprises:
Calculating, in turn, the edit distance between the text to be sorted and each entry in the default dictionary; and determining the entries in the default dictionary whose edit distance is less than a predetermined distance to be the target entries.
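The screening step in A2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say which edit distance is meant, so the Levenshtein distance (insertions, deletions, substitutions) is assumed, and the function names `edit_distance` and `select_target_entries`, the sample dictionary, and the threshold value are all hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed variant of 'edit distance')."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def select_target_entries(text: str,
                          dictionary: Dict[str, str],
                          max_distance: int) -> List[str]:
    """Return the entries whose edit distance to `text` is below `max_distance`."""
    # The dictionary maps each entry to its category (hypothetical layout).
    return [entry for entry in dictionary
            if edit_distance(text, entry) < max_distance]


# Hypothetical sample data, not taken from the source.
sample_dictionary = {"apple": "fruit", "apply": "verb",
                     "ample": "adjective", "banana": "fruit"}
targets = select_target_entries("appla", sample_dictionary, max_distance=2)
```

The comparison is strict (`<`) because the text speaks of a distance "less than" the predetermined distance; scanning every entry in turn mirrors the wording of A2 and costs one distance computation per dictionary entry.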
A3. The method according to A1, wherein determining the target category according to the category to which each target entry belongs comprises:
Grouping the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer; and selecting, from the Q groups, the group containing the largest number of entries and taking the category to which that group belongs as the target category.
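The grouping and selection described in A3 amounts to a majority vote over the categories of the target entries. A minimal sketch, assuming the dictionary is a mapping from entry to category; the function name `determine_target_category` is hypothetical, and the tie-breaking rule (the group encountered first wins) is an assumption because the source does not specify one.

```python
from collections import defaultdict
from typing import Dict, List


def determine_target_category(target_entries: List[str],
                              dictionary: Dict[str, str]) -> str:
    """Group target entries by category and return the largest group's category."""
    if not target_entries:
        # Q is a positive integer in the text, so at least one entry is expected.
        raise ValueError("at least one target entry is required")
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in target_entries:
        groups[dictionary[entry]].append(entry)  # one group per category
    # len(groups) corresponds to Q; pick the group with the most entries.
    # On a tie, max() keeps the group that was created first (assumption).
    return max(groups, key=lambda category: len(groups[category]))


# Hypothetical sample data, not taken from the source.
sample_dictionary = {"apple": "fruit", "grape": "fruit", "apply": "verb"}
category = determine_target_category(["apple", "grape", "apply"], sample_dictionary)
```

Here two of the three target entries belong to the category "fruit", so that category is chosen as the target category.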
A4. The method according to A1, further comprising, before selecting from the default dictionary the multiple target entries similar to the text to be sorted:
Matching, according to the text to be sorted, an entry in the default dictionary that corresponds to the text to be sorted; and, if the matching fails, performing the step of selecting from the default dictionary the multiple target entries similar to the text to be sorted.
A5. The method according to A4, wherein the step of matching, according to the text to be sorted, an entry in the default dictionary that corresponds to the text to be sorted specifically comprises:
Searching the default dictionary, according to the text to be sorted, for an entry identical to the text to be sorted.
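The order of operations in A4 and A5 (exact lookup first, similarity-based selection only when the lookup fails) can be sketched as below. The function name `classify_with_exact_match_first` and the `fuzzy_classify` callback are hypothetical; the callback stands in for the similarity-based steps of A1 to A3 so that the sketch stays self-contained. Returning the matched entry's category on a successful lookup is an assumption, since A4 and A5 only state what happens when matching fails.

```python
from typing import Callable, Dict, Optional


def classify_with_exact_match_first(
        text: str,
        dictionary: Dict[str, str],
        fuzzy_classify: Callable[[str, Dict[str, str]], Optional[str]],
) -> Optional[str]:
    """Try an exact dictionary lookup; fall back to fuzzy classification."""
    # A5: search for an entry identical to the text to be sorted.
    category = dictionary.get(text)
    if category is not None:
        # Assumed behaviour: an exact hit yields that entry's category directly.
        return category
    # A4: matching failed, so run the similarity-based selection instead.
    return fuzzy_classify(text, dictionary)


# Hypothetical sample data and a stub fallback that records its calls.
sample_dictionary = {"apple": "fruit", "apply": "verb"}
fallback_calls = []


def stub_fuzzy_classify(text: str, dictionary: Dict[str, str]) -> Optional[str]:
    fallback_calls.append(text)
    return "fallback-category"


exact_result = classify_with_exact_match_first("apple", sample_dictionary, stub_fuzzy_classify)
fuzzy_result = classify_with_exact_match_first("appla", sample_dictionary, stub_fuzzy_classify)
```

With a hash-based mapping the exact lookup is cheap, so trying it first avoids computing an edit distance against every entry for texts that are already in the dictionary.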
The invention also discloses B6, a text classification device, the device comprising:
A receiving module, configured to obtain a text to be sorted; a screening module, configured to select, from a default dictionary, multiple target entries similar to the text to be sorted, wherein the default dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; a first determining module, configured to determine, according to the default dictionary, the category to which each of the multiple target entries belongs; and a second determining module, configured to determine a target category according to the category to which each of the multiple target entries belongs, and to take the target category as the category to which the text to be sorted belongs.
B7. The device according to B6, wherein the screening module is specifically further configured to:
Calculate, in turn, the edit distance between the text to be sorted and each entry in the default dictionary; and determine the entries in the default dictionary whose edit distance is less than a predetermined distance to be the target entries.
B8. The device according to B6, wherein the second determining module is specifically further configured to:
Group the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer; and select, from the Q groups, the group containing the largest number of entries and take the category to which that group belongs as the target category.
B9. The device according to B6, further comprising: a matching module, configured to match, according to the text to be sorted and before the multiple target entries similar to the text to be sorted are selected from the default dictionary, an entry in the default dictionary that corresponds to the text to be sorted; and, if the matching fails, to trigger the selecting from the default dictionary of the multiple target entries similar to the text to be sorted.
B10. The device according to B9, wherein the matching module is specifically further configured to:
Search the default dictionary, according to the text to be sorted, for an entry identical to the text to be sorted.
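One way to picture the module structure of B6 to B10 is a single class with one method per module. This is a sketch under assumptions, not the device of the source: the class name `TextClassificationDevice`, the method names, the injected `distance` function, and the choice to return `None` when no target entry is found are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TextClassificationDevice:
    dictionary: Dict[str, str]            # entry -> category
    distance: Callable[[str, str], int]   # any edit-distance function (injected)
    max_distance: int                     # predetermined distance threshold

    def receive(self, text: str) -> str:
        """Receiving module: obtain the text to be sorted."""
        return text

    def match(self, text: str) -> Optional[str]:
        """Matching module: look for an entry identical to the text."""
        return self.dictionary.get(text)

    def screen(self, text: str) -> List[str]:
        """Screening module: keep entries closer than the threshold."""
        return [entry for entry in self.dictionary
                if self.distance(text, entry) < self.max_distance]

    def first_determine(self, target_entries: List[str]) -> List[str]:
        """First determining module: category of each target entry."""
        return [self.dictionary[entry] for entry in target_entries]

    def second_determine(self, categories: List[str]) -> Optional[str]:
        """Second determining module: category with the most target entries."""
        if not categories:
            return None  # assumed behaviour when nothing is similar enough
        return Counter(categories).most_common(1)[0][0]

    def classify(self, text: str) -> Optional[str]:
        """Run the modules in the order described in B6 and B9."""
        text = self.receive(text)
        exact = self.match(text)
        if exact is not None:
            return exact
        return self.second_determine(self.first_determine(self.screen(text)))
```

Injecting the distance function keeps the sketch short and lets a Levenshtein implementation or any other edit distance be supplied without changing the class.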
The invention discloses C11, a user terminal comprising a processor and a memory, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the user terminal to perform the steps of the method according to any one of A1-A5.
The invention discloses D12, a computer-readable storage medium on which a computer program is stored, the program implementing the steps of the method according to any one of A1-A5 when executed by a processor.
Claims (10)
1. A text classification method, characterized by comprising:
Obtaining a text to be sorted;
Selecting, from a default dictionary, multiple target entries similar to the text to be sorted, wherein the default dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries;
Determining, according to the default dictionary, the category to which each of the multiple target entries belongs;
Determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be sorted belongs.
2. The method according to claim 1, characterized in that selecting, from the default dictionary, multiple target entries similar to the text to be sorted comprises:
Calculating, in turn, the edit distance between the text to be sorted and each entry in the default dictionary;
Determining the entries in the default dictionary whose edit distance is less than a predetermined distance to be the target entries.
3. The method according to claim 1, characterized in that determining the target category according to the category to which each target entry belongs comprises:
Grouping the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer;
Selecting, from the Q groups, the group containing the largest number of entries, and taking the category to which that group belongs as the target category.
4. The method according to claim 1, characterized by further comprising, before selecting from the default dictionary the multiple target entries similar to the text to be sorted:
Matching, according to the text to be sorted, an entry in the default dictionary that corresponds to the text to be sorted;
If the matching fails, performing the step of selecting from the default dictionary the multiple target entries similar to the text to be sorted.
5. The method according to claim 4, characterized in that the step of matching, according to the text to be sorted, an entry in the default dictionary that corresponds to the text to be sorted specifically comprises:
Searching the default dictionary, according to the text to be sorted, for an entry identical to the text to be sorted.
6. A text classification device, characterized by comprising:
A receiving module, configured to obtain a text to be sorted;
A screening module, configured to select, from a default dictionary, multiple target entries similar to the text to be sorted, wherein the default dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries;
A first determining module, configured to determine, according to the default dictionary, the category to which each of the multiple target entries belongs;
A second determining module, configured to determine a target category according to the category to which each of the multiple target entries belongs, and to take the target category as the category to which the text to be sorted belongs.
7. The device according to claim 6, characterized in that the screening module is specifically further configured to:
Calculate, in turn, the edit distance between the text to be sorted and each entry in the default dictionary; and determine the entries in the default dictionary whose edit distance is less than a predetermined distance to be the target entries.
8. The device according to claim 6, characterized in that the second determining module is specifically further configured to:
Group the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer; and select, from the Q groups, the group containing the largest number of entries and take the category to which that group belongs as the target category.
9. A user terminal, characterized by comprising a processor and a memory, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the user terminal to perform the steps of the method according to any one of claims 1-5.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program implements the steps of the method according to any one of claims 1-5 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368735.4A CN109684467A (en) | 2018-11-16 | 2018-11-16 | A kind of classification method and device of text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368735.4A CN109684467A (en) | 2018-11-16 | 2018-11-16 | A kind of classification method and device of text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109684467A (en) | 2019-04-26 |
Family ID: 66185834
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368735.4A (Pending), published as CN109684467A (en) | 2018-11-16 | 2018-11-16 | A kind of classification method and device of text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109684467A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101984422A (en) * | 2010-10-18 | 2011-03-09 | 百度在线网络技术(北京)有限公司 | Fault-tolerant text query method and equipment |
CN107436875A (en) * | 2016-05-25 | 2017-12-05 | 华为技术有限公司 | File classification method and device |
CN107844559A (en) * | 2017-10-31 | 2018-03-27 | 国信优易数据有限公司 | A kind of file classifying method, device and electronic equipment |
CN108563722A (en) * | 2018-04-03 | 2018-09-21 | 有米科技股份有限公司 | Trade classification method, system, computer equipment and the storage medium of text message |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085040A (en) * | 2019-06-12 | 2020-12-15 | 腾讯科技(深圳)有限公司 | Object tag determination method and device and computer equipment |
CN111782727A (en) * | 2020-06-28 | 2020-10-16 | 平安医疗健康管理股份有限公司 | Data processing method and device based on machine learning |
CN111782727B (en) * | 2020-06-28 | 2022-08-12 | 深圳平安医疗健康科技服务有限公司 | Data processing method and device based on machine learning |
CN111767403A (en) * | 2020-07-07 | 2020-10-13 | 腾讯科技(深圳)有限公司 | Text classification method and device |
CN111767403B (en) * | 2020-07-07 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Text classification method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109684627A (en) | A kind of file classification method and device | |
CN111177569B (en) | Recommendation processing method, device and equipment based on artificial intelligence | |
CN112632385B (en) | Course recommendation method, course recommendation device, computer equipment and medium | |
AU2018383346B2 (en) | Domain-specific natural language understanding of customer intent in self-help | |
CN106649818B (en) | Application search intention identification method and device, application search method and server | |
US8082264B2 (en) | Automated scheme for identifying user intent in real-time | |
CN109582792A (en) | A kind of method and device of text classification | |
CN110532451A (en) | Search method and device for policy text, storage medium, electronic device | |
CN108363790A (en) | For the method, apparatus, equipment and storage medium to being assessed | |
CN111767716B (en) | Method and device for determining enterprise multi-level industry information and computer equipment | |
CN103870001B (en) | A kind of method and electronic device for generating candidates of input method | |
CN109766438A (en) | Biographic information extracting method, device, computer equipment and storage medium | |
CN111324771B (en) | Video tag determination method and device, electronic equipment and storage medium | |
US20210118024A1 (en) | Multi-label product categorization | |
CN112035595B (en) | Method and device for constructing auditing rule engine in medical field and computer equipment | |
CN109800307A (en) | Analysis method, device, computer equipment and the storage medium of product evaluation | |
CN110597978B (en) | Article abstract generation method, system, electronic equipment and readable storage medium | |
US11734322B2 (en) | Enhanced intent matching using keyword-based word mover's distance | |
CN109684467A (en) | A kind of classification method and device of text | |
CN110276009B (en) | Association word recommendation method and device, electronic equipment and storage medium | |
CN114037545A (en) | Client recommendation method, device, equipment and storage medium | |
Aralikatte et al. | Fault in your stars: an analysis of android app reviews | |
CN116796730A (en) | Text error correction method, device, equipment and storage medium based on artificial intelligence | |
CN110647504B (en) | Method and device for searching judicial documents | |
CN113366511B (en) | Named entity identification and extraction using genetic programming |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||