CN109684467A - Text classification method and device - Google Patents

Publication number
CN109684467A
Authority
CN
China
Prior art keywords
entry
text
sorted
classification
default dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811368735.4A
Other languages
Chinese (zh)
Inventor
熊安斌
李倩倩
颜培英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201811368735.4A priority Critical patent/CN109684467A/en
Publication of CN109684467A publication Critical patent/CN109684467A/en
Pending legal-status Critical Current

Abstract

The invention discloses a text classification method and device. The method comprises: obtaining a text to be classified; selecting, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; determining, according to the preset dictionary, the category to which each of the multiple target entries belongs; and determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs. The invention addresses the problems in the prior art that erroneous text is corrected poorly, that the text to be classified cannot be classified correctly, and that classification accuracy is low.

Description

Text classification method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a text classification method and device.
Background
As society develops, people's work and life depend more and more on the internet, through which they can look up information, buy goods, place advertisements, and so on. However, the natural-language text that internet users produce and retrieve every day is growing at an exponential rate. Given the sheer volume of content on the network, information overload easily occurs when content is retrieved through a search engine, so text information needs to be classified. Text classification can also help business departments with traffic analysis, content review, building user and product profiles, precise recommendation, keyword expansion and clustering, CTR estimation, and so on, and is therefore of great significance.
However, massive volumes of text contain a large amount of locally erroneous text. Current text classification methods often deviate considerably in their semantic understanding of such erroneous text, and their text error correction is poor. As a result, they often cannot classify the text to be classified correctly, and classification accuracy is low.
Summary of the invention
In view of the above problems, the invention proposes a text classification method and device that address the prior art's poor correction of erroneous text, its inability to classify the text to be classified correctly, and its low classification accuracy.
In a first aspect, the application provides the following technical solution through an embodiment of the application:
A text classification method, comprising: obtaining a text to be classified; selecting, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; determining, according to the preset dictionary, the category to which each of the multiple target entries belongs; and determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs.
Preferably, selecting multiple target entries similar to the text to be classified from the preset dictionary comprises: calculating, in turn, the edit distance between the text to be classified and each entry in the preset dictionary; and determining the entries in the preset dictionary whose edit distance is less than a preset distance to be the target entries.
Preferably, determining the target category according to the category to which each target entry belongs comprises: grouping the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer; and selecting, from the Q groups, the group with the most entries, and taking the category of that group as the target category.
Preferably, before selecting multiple target entries similar to the text to be classified from the preset dictionary, the method further comprises: matching, according to the text to be classified, a target entry corresponding to the text to be classified in the preset dictionary; and, if the matching fails, performing the step of selecting multiple target entries similar to the text to be classified from the preset dictionary.
Preferably, the step of matching a target entry corresponding to the text to be classified in the preset dictionary specifically comprises: searching the preset dictionary for an entry identical to the text to be classified.
In a second aspect, based on the same inventive concept, the application provides the following technical solution through an embodiment of the application:
A text classification device, comprising: a receiving module for obtaining a text to be classified; a screening module for selecting, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; a first determining module for determining, according to the preset dictionary, the category to which each of the multiple target entries belongs; and a second determining module for determining a target category according to the category to which each of the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs.
Preferably, the screening module is specifically further configured to: calculate, in turn, the edit distance between the text to be classified and each entry in the preset dictionary; and determine the entries in the preset dictionary whose edit distance is less than a preset distance to be the target entries.
Preferably, the second determining module is specifically further configured to: group the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer; and select, from the Q groups, the group with the most entries, and take the category of that group as the target category.
Preferably, the device further comprises a matching module configured to: before the multiple target entries similar to the text to be classified are selected from the preset dictionary, match, according to the text to be classified, a target entry corresponding to the text to be classified in the preset dictionary; and, if the matching fails, trigger the selection of multiple target entries similar to the text to be classified from the preset dictionary.
Preferably, the matching module is specifically further configured to: search the preset dictionary for an entry identical to the text to be classified.
In a third aspect, based on the same inventive concept, the application provides the following technical solution through an embodiment of the application:
A user terminal, comprising a processor and a memory, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the user terminal to perform the steps of the method of any one of the first aspect.
In a fourth aspect, based on the same inventive concept, the application provides the following technical solution through an embodiment of the application:
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of any one of the first aspect.
In the text classification method and device provided by the embodiments of the invention, after the text to be classified is obtained, multiple target entries similar to it are selected from the preset dictionary, which realizes fuzzy matching. The preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to those entries. With this step, even if the text to be classified contains errors, the probability of finding target entries corresponding to it in the preset dictionary is increased, which avoids cases where the text cannot be classified at all. Next, the category to which each of the target entries belongs is determined according to the preset dictionary. Finally, the target category is determined from the categories of the target entries and taken as the category of the text to be classified. Because that category is determined by all of the target entries together rather than by a single entry, it is determined more accurately. In summary, the invention addresses the prior art's poor correction of erroneous text, its inability to classify the text to be classified correctly, and its low classification accuracy.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of the text classification method provided by the first embodiment of the invention;
Fig. 2 shows a detailed flowchart of step S40 in Fig. 1;
Fig. 3 shows a flowchart of the text classification method provided by the second embodiment of the invention;
Fig. 4 shows a flowchart of the text classification method provided by the third embodiment of the invention;
Fig. 5 shows a functional block diagram of the text classification device provided by the fourth embodiment of the invention;
Fig. 6 shows a block diagram of the user terminal provided by the fifth embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
The text classification method provided by the invention can fuzzy-match texts, including texts to be classified that contain erroneous words, and uses multiple matching results together to determine the category of the text to be classified, which safeguards the accuracy of the classification result. The method is explained below through several embodiments. Application scenarios of the invention include, but are not limited to: classifying articles by industry, identifying prohibited advertisements, keyword relevance analysis, web page tagging, user profiling, advertisement display, and information routing.
First embodiment
Referring to Fig. 1, this embodiment provides a text classification method; Fig. 1 shows the flowchart of the method. Each step of this embodiment is described in detail below. The specific steps are as follows:
Step S10: obtain a text to be classified.
Step S20: select, from a preset dictionary, multiple target entries similar to the text to be classified, wherein the preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries.
Step S30: determine, according to the preset dictionary, the category to which each of the multiple target entries belongs.
Step S40: determine a target category according to the category to which each of the multiple target entries belongs, and take the target category as the category to which the text to be classified belongs.
In step S10, the text to be classified is the text information that is to be classified. It may include arbitrary characters, phrases, sentences, dialect words, digits, character strings, and so on. The language of the text is not restricted and may include Chinese, English, German, Russian, and others. Note that text in any existing or future language, or any text used to record information, can serve as the text to be classified. Concretely, the text to be classified may be an article, a news item, an event announcement, a web page (page content), and so on, composed of arbitrary characters, phrases, sentences, dialect words, digits, character strings, and the like.
Specific examples of texts to be classified are as follows:
"Ancient Greece and Rome mythology Chinese ppt", "Jingdone district app kinds of goods lower right corner advertisement", "QQ space two dimensional code", "300450 artificial intelligence", "dream about others and send mobile phone", "my world is automatically repaired bug plug-in unit", "Tsinghua University", "Ah Dam", "urologic disease", etc.
From these examples it can be understood that, because much of the text to be classified on the internet is entered by hand, errors arise, for example from personal habits or from input-method keystrokes, so the text to be classified may contain erroneous entries.
Examples of texts to be classified that contain erroneous entries: "electric sound paradise" (correct: "film paradise"); "360 are safely Four" (correct: "360 security guard"); "my world is automatically repaired bug plug-in unit" (correct: "my world is automatically repaired bug plug-in unit"); "in emerging security" (correct: "CITIC Securities"); etc.
In step S20, the preset dictionary is a dictionary that has been classified or labeled by machine or by hand; it may also be an existing general-purpose dictionary. As examples, this embodiment introduces the following three kinds of dictionary, without being limited to them:
1. The first preset dictionary stores M entries and the category to which each of the M entries belongs. The category of each of the M entries is labeled manually, and M is a positive integer. The first preset dictionary is collected by hand and covers the following situations: the dictionary contains non-standard terms; new internet terms appear every day, and these words or sentences do not exist in existing dictionaries, so they need to be labeled and classified by hand; and, as society develops, existing words are given new interpretations and meanings. All three situations require manual judgment and labeling to ensure that the dictionary keeps growing and stays accurate. For example:
Table 1: Example of the first preset dictionary
The words or sentences in the first dictionary do not exist in existing (internet) dictionaries, such as comprehensive Chinese dictionaries, Baidu's online dictionaries and reference works, or related classification websites. However, internet users use such words or sentences frequently, so they can be collected and labeled with their categories to form a manually labeled dictionary (the first preset dictionary).
2. The second preset dictionary stores N entries and the category to which each of the N entries belongs. The second preset dictionary is provided by preset industry dictionary websites, and N is a positive integer. Preset industry dictionary websites include websites that collect the vocabulary of each industry; industry vocabulary is vocabulary that is recognized by, or well known to, practitioners in that industry or the public. Examples include e-commerce vocabulary, general entertainment vocabulary, mobile game vocabulary, PC/software vocabulary, education vocabulary, and finance vocabulary. Websites that provide such industry dictionaries include, but are not limited to: the Baidu industry dictionary, the Sogou industry dictionary, the word lists in Baidu's trending rankings, and the 5118.com industry dictionary. Specific examples are as follows:
Table 2: Example of the second preset dictionary
The second preset dictionary can be built by purchasing data or by crawling it with a web crawler (also known as a web spider or web robot).
3. The third preset dictionary stores P entries and the category to which each of the P entries belongs. The third preset dictionary is provided by a rule engine, and P is a positive integer. The rule engine in this embodiment provides entries related to business rules that have already been defined, for example: place names, school names, stock codes, special proprietary number lexicons, and medical terms.
Table 3: Example of the third preset dictionary
Entry                   Category
Recruitment             Recruitment | recruitment
Security personnel      Safe security | security personnel's security
Sports lottery ticket   Lottery ticket | welfare lottery ticket
B2B                     E-commerce | B2B
Tidy street             E-commerce | vertical B2C
Producer's supply       E-commerce | other
Tsinghua University     Educational training | the education with record of formal schooling
Bengbu College          Educational training | the education with record of formal schooling
Aba                     Tourism | other
900953                  Financial service | equity fund
Urologic disease        Medical treatment & health | andrology
360                     IT product | software
263                     Social | other
Note that Tables 1 to 3 are merely examples; their contents are illustrative and do not limit the protection scope of the invention.
An entry in the first, second, or third preset dictionary may belong to multiple categories at once. The same entry may also exist in all three dictionaries at the same time, and it may belong to different categories in different dictionaries. In this embodiment, any of the above dictionaries can be used as the preset dictionary for step S20.
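One way to picture the stored data is a mapping from each entry to the set of categories it belongs to, built separately for each dictionary. The sketch below is illustrative only and is not taken from the patent: `build_dictionary` and the variable names are hypothetical, most rows are copied from Table 3, and the second category given to "360" is invented purely to show an entry that belongs to more than one category.

```python
from collections import defaultdict


def build_dictionary(rows):
    """Map each entry to the set of categories it belongs to.

    rows is an iterable of (entry, category) pairs; an entry may appear
    more than once, each time with a different category.
    """
    dictionary = defaultdict(set)
    for entry, category in rows:
        dictionary[entry].add(category)
    return dict(dictionary)


# Rows copied from Table 3, plus one invented row (marked) for illustration.
third_dictionary = build_dictionary([
    ("B2B", "E-commerce | B2B"),
    ("Aba", "Tourism | other"),
    ("360", "IT product | software"),
    ("360", "Financial service | equity fund"),  # hypothetical second category
])

print(sorted(third_dictionary["360"]))
```

Keeping one such mapping per dictionary lets the same entry carry different categories in different dictionaries, as the paragraph above allows.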
In step S20, selecting multiple target entries similar to the text to be classified from the preset dictionary may be implemented as follows:
First, calculate in turn the edit distance between the text to be classified and each entry in the preset dictionary. Edit distance is a quantitative measure of how different two character strings (for example, Chinese words or English words) are: it is the minimum number of edit operations needed to turn one string into the other.
Then, determine the entries in the preset dictionary whose edit distance is less than (or, optionally, equal to) the preset distance to be the target entries. The preset distance can be set as desired, for example to 1, 2, or 3. It can also be adjusted by feedback from step S40: for example, when step S40 yields several categories for the text to be classified and the result is not precise enough, the preset distance can be reduced appropriately.
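The two steps above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: it assumes the edit distance is the standard Levenshtein distance (insertion, deletion, and substitution, each costing 1), and the names `edit_distance`, `select_target_entries`, and `max_distance` are hypothetical. It uses the "less than or equal to" variant of the threshold, and the sample entries are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def select_target_entries(text, entries, max_distance=2):
    """Return the entries whose edit distance to text is <= max_distance."""
    return [e for e in entries if edit_distance(text, e) <= max_distance]


# Invented sample entries; "biadu" stands in for a mistyped query.
print(select_target_entries("biadu", ["baidu", "bing", "baidu map"]))
```

Scanning every entry costs one distance computation per entry, which mirrors the "in turn" wording of the step; a production system would likely need an index to avoid the full scan, but that is outside what the source describes.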
In step S30, the category to which each of the multiple target entries belongs is determined according to the preset dictionary. Because every target entry is selected from the preset dictionary, each target entry has a corresponding category.
In step S40, a target category is determined according to the category to which each of the multiple target entries belongs, and the target category is taken as the category to which the text to be classified belongs. Referring to Fig. 2, determining the target category in this step may comprise the following steps:
Step S41: group the multiple target entries by category to obtain Q groups of entries, wherein the entries in the same group all belong to the same category and Q is a positive integer.
Step S42: select, from the Q groups, the group with the most entries, and take the category of that group as the target category.
The target category determined by steps S41 and S42 is taken as the category to which the text to be classified belongs.
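Steps S41 and S42 amount to a majority vote over the categories of the target entries. A minimal sketch follows; the function names are hypothetical, the category names follow Table 3, and the choice of which entries count as target entries is invented for illustration. Ties are not resolved here: `max` simply returns the first largest group it meets.

```python
from collections import defaultdict


def group_by_category(target_entries, category_of):
    """Step S41: bucket the target entries by category, giving the Q groups."""
    groups = defaultdict(list)
    for entry in target_entries:
        groups[category_of[entry]].append(entry)
    return dict(groups)


def pick_target_category(groups):
    """Step S42: return the category whose group holds the most entries."""
    return max(groups, key=lambda category: len(groups[category]))


# Hypothetical target entries with their categories.
category_of = {
    "Tsinghua University": "Educational training",
    "Bengbu College": "Educational training",
    "Aba": "Tourism",
}
groups = group_by_category(list(category_of), category_of)
print(len(groups), pick_target_category(groups))
```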
In a concrete classification process, the k-nearest-neighbor classification algorithm (KNN, K-NearestNeighbor) can be used directly for the implementation.
It should be understood that, in step S42, two or more of the Q groups may be tied for the largest number of entries. For that case, this embodiment provides the following two alternatives to step S42:
Alternative step 1: select, from the Q groups, the groups with the most entries or the top S groups, and take the categories of the selected groups as the target category, where S is a positive integer greater than or equal to 2.
Alternative step 2: if several of the Q groups share the same, largest number of entries, adjust the preset distance by feedback. The preset distance can be reduced or increased until a target category is obtained.
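The two alternatives can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: `top_categories` and `classify_with_feedback` are hypothetical names, `select` is a hypothetical callable that returns the target entries for a given text and preset distance, and trying a fixed list of distances is just one possible reading of "reduced or increased until a target category is obtained".

```python
from collections import Counter


def top_categories(categories, s=None):
    """Alternative step 1.

    With s=None, return every category tied for the largest group.
    With an integer s, return the categories of the s largest groups.
    """
    ranked = Counter(categories).most_common()
    if not ranked:
        return []
    if s is not None:
        return [category for category, _ in ranked[:s]]
    best = ranked[0][1]
    return [category for category, count in ranked if count == best]


def classify_with_feedback(text, select, category_of, distances):
    """Alternative step 2: retry with other preset distances until one
    category wins outright; return None if none of them settles it."""
    for distance in distances:
        categories = [category_of[e] for e in select(text, distance)]
        if not categories:
            continue
        winners = top_categories(categories)
        if len(winners) == 1:
            return winners[0]
    return None


# Two groups are tied for the most entries, so both categories are returned.
print(top_categories(["A", "B", "A", "B", "C"]))
```

Alternative 1 trades a single answer for a short list of candidate categories, while alternative 2 keeps a single answer at the cost of repeating the similarity search.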
In this embodiment, to preserve the classification accuracy of the text to be classified while still achieving fuzzy matching, the following steps can also be performed before step S20:
Match, according to the text to be classified, a target entry corresponding to the text to be classified in the preset dictionary. Specifically, search the preset dictionary for an entry identical to the text to be classified, that is, a 100% exact match.
If the matching fails, perform the step of selecting multiple target entries similar to the text to be classified from the preset dictionary.
If the matching succeeds, the category of the matched entry can be used directly as the category to which the text to be classified belongs.
To make the scheme of this embodiment easier to understand, consider the following example:
Perform step S10; the text to be classified that is obtained is "under Baidu".
First, check whether the preset dictionary contains an entry identical to "under Baidu". If it does not (the matching fails), step S20 can be performed, taking a preset distance of 2 as an example (that is, an edit distance less than or equal to 2). Matching in the preset dictionary yields the following target entries:
Table 4
The entries and categories in Table 4 are illustrative and do not limit the protection scope of the invention; in an actual implementation of the invention they may differ from those in Table 4.
Performing step S30 determines the category to which each target entry belongs.
Then step S40 is performed. The target entries in Table 4 can be divided into 3 groups by category, and the group with the most entries corresponds to the category "search engine", with 3 entries. The category of the text to be classified, "under Baidu", can therefore be determined to be "search engine".
In the text classification method provided by this embodiment, after the text to be classified is obtained, multiple target entries similar to it are selected from the preset dictionary, which realizes fuzzy matching. The preset dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to those entries. With this step, even if the text to be classified contains errors, the probability of finding target entries corresponding to it in the preset dictionary is increased, which avoids cases where the text cannot be classified at all. Next, the category to which each of the target entries belongs is determined according to the preset dictionary. Finally, the target category is determined from the categories of the target entries and taken as the category of the text to be classified. Because that category is determined by all of the target entries together rather than by a single entry, it is determined more accurately. In summary, the invention addresses the prior art's poor correction of erroneous text, its inability to classify the text to be classified correctly, and its low classification accuracy.
Second embodiment
Referring to Fig. 3, based on the same inventive concept, this embodiment also provides a text classification method. Unlike the first embodiment, the method in this embodiment is performed as follows:
Step S201: receive a text to be classified.
Step S202: match, according to the text to be classified, a target entry corresponding to the text to be classified in a preset dictionary, wherein the preset dictionary stores multiple entries and the category to which each entry belongs.
Step S203: if the matching succeeds, determine the category to which the target entry belongs to be the category to which the text to be classified belongs.
Step S204: if the matching fails, input the text to be classified into an ensemble classifier to obtain the category of the text to be classified, wherein the ensemble classifier contains one or more text classification models.
Relative to the first embodiment, step S201 in this embodiment is the same as step S10, but steps S203 and S204 are added, and steps S20 to S40 of the first embodiment may be applied within step S202. That is, when the text to be classified is matched against the first, second, and third preset dictionaries, the matching method used may include, in whole or in part, the method provided by the first embodiment (steps S20 to S40). If, in step S40, the group with the most entries among the Q groups is not unique, the classification can be deemed inaccurate and step S204 can then be performed.
In step S202, when the corresponding target entry is matched in the preset dictionary, the matching may be done by searching whether the preset dictionary stores an entry contained in the text to be classified; preferably, it is done by searching whether the preset dictionary stores an entry identical to the text to be classified.
The matching in step S202 has two possible outcomes, handled by step S203 and step S204 respectively:
If the matching succeeds (step S203), a target entry corresponding to the text to be classified exists in the preset dictionary, and the category to which that target entry belongs can be determined to be the category of the text to be classified. This completes the classification of the text to be classified.
In step S204, if the matching fails, no target entry corresponding to the text to be classified exists in the preset dictionary. In that case, the text to be classified is input into the ensemble classifier to obtain its category. The ensemble classifier contains multiple text classification models, so the category of the text to be classified can be judged comprehensively from the classification result of each model, which improves accuracy.
The invention first matches the corresponding target entry in the preset dictionary and only then, if the matching fails, classifies with text classification models, which is more accurate than classifying directly with a text classification model. In addition, the ensemble classifier contains multiple (two or more) classification models, so multiple classification results are obtained, which avoids the situation where a single text classification model misclassifies and the result cannot be corrected.
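The flow of steps S202 to S204 can be sketched as below. The source does not say how the ensemble classifier combines the results of its models, so the majority vote used here is an assumption, as are the names `ensemble_classify` and `classify`. Each text classification model is represented by a hypothetical callable that maps a text to a category; the stand-in models and the dictionary contents are invented.

```python
from collections import Counter


def ensemble_classify(text, models):
    """Ask every model for a category and return the most common answer.

    Majority voting is an assumed way of combining the results.
    """
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]


def classify(text, dictionary, models):
    """Exact dictionary match first, ensemble classifier as the fallback."""
    if text in dictionary:                   # steps S202 and S203
        return dictionary[text]
    return ensemble_classify(text, models)   # step S204


# Stand-in models that always answer with a fixed category.
models = [lambda t: "IT product", lambda t: "Tourism", lambda t: "IT product"]
dictionary = {"Aba": "Tourism"}
print(classify("Aba", dictionary, models))
print(classify("some unseen text", dictionary, models))
```

With an odd number of models and two candidate categories a vote always has a winner; with more categories a tie-breaking rule would still have to be chosen.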
A kind of concrete implementation mode is provided to the matching of step S202 in the present embodiment:
According to the text to be sorted, according to preset order, successively in multiple default dictionaries matching with it is described to be sorted The corresponding entry of text, the entry stored in the multiple default dictionary are different.Wherein, preset order refer to it is multiple Default dictionary matches sequencing when corresponding entry, can customize setting, with no restriction.
In this implementation, said so that the first default dictionary, the second default dictionary and third preset dictionary as an example It is bright.When i.e. default dictionary is 3, matching order is followed successively by the first default dictionary, the second default dictionary and third and presets dictionary. First default dictionary match the step of include:
Obtain the first default dictionary.
According to the text to be sorted, matching first word corresponding with the text to be sorted in the first default dictionary Item.
If successful match, first entry is determined as the entry.
In the first default dictionary, if it fails to match, and continuation is matched in the second default dictionary, and matching step includes:
Obtain the second default dictionary.
According to the text to be sorted, match a second entry corresponding to the text to be sorted in the second default dictionary.
If the match succeeds in the second default dictionary, determine the second entry as the target entry.
It should be noted that, to ensure the classification accuracy of the text to be sorted, both the first entry in the first default dictionary and the second entry in the second default dictionary may be matched as follows: search the default dictionary (the first default dictionary or the second default dictionary) for an entry identical to the text to be sorted.
For example:
If the text to be sorted is "dream about others and send mobile phone", an identical first entry can be matched in the first default dictionary, and the category of the text to be sorted can be determined as: amusement and recreation, and Constellation.
If the text to be sorted is "Da Er it is excellent", then after matching fails in the first default dictionary, matching can proceed in the second default dictionary, where an identical second entry can be matched, and the category of the text to be sorted is determined as: IT product and manufacturer computer.
If matching fails in the second default dictionary, matching continues in the third default dictionary, with the following steps:
Obtain the third default dictionary.
According to the text to be sorted, match a third entry corresponding to the text to be sorted in the third default dictionary.
If the match succeeds, determine the third entry as the target entry.
If the match fails, execute the step of inputting the text to be sorted into the integrated classifier to obtain the category of the text to be sorted.
A rule engine also defines a matching rule for the third default dictionary, which includes: searching whether the default dictionary stores a third entry that is identical to an entry contained in the text to be sorted; if such a third entry exists, the category of the text to be sorted can be determined as the category to which the third entry belongs.
For example:
The text to be sorted is "product in tidy street it is good"; it cannot be matched in the first default dictionary or the second default dictionary. When matching in the third default dictionary, the third entry found is "tidy street". Since the category of that third entry is "e-commerce, B2B", the category of the text to be sorted can be determined as: e-commerce and B2B.
When the third default dictionary contains multiple third entries that match the text to be sorted, the categories of those third entries and the number of occurrences of each category can be counted, which ensures the objectivity and accuracy of the classification.
For example, the text to be sorted is "product in tidy street be producer supply"; it cannot be matched in the first default dictionary or the second default dictionary. When matching in the third default dictionary, the third entries found are "tidy street" and "producer's supply". Since the categories of these third entries are "e-commerce, B2B" and "e-commerce, vertical B2C" respectively, the category of the text to be sorted can be determined as: e-commerce.
If matching fails in the third default dictionary, the text to be sorted is input into the integrated classifier, that is, step S204 is executed.
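The three-dictionary cascade described above can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function name `classify_with_dictionaries`, the dict-of-lists representation of a default dictionary, the `fallback` callable standing in for the integrated classifier, and the rule "keep the categories that occur most often among the matched third entries" are all assumptions chosen to reproduce the two "tidy street" examples.

```python
from collections import Counter


def classify_with_dictionaries(text, first_dict, second_dict, third_dict, fallback):
    """Cascade over three default dictionaries, then fall back to a classifier.

    Each dictionary maps an entry (str) to a list of categories (assumed layout).
    `fallback` is a hypothetical callable standing in for the integrated classifier.
    """
    # First and second default dictionaries: the entry must equal the whole text.
    for dictionary in (first_dict, second_dict):
        if text in dictionary:
            return list(dictionary[text])

    # Third default dictionary: collect every entry that occurs inside the text.
    hits = [cats for entry, cats in third_dict.items() if entry in text]
    if hits:
        # Count how often each category appears among the matched third entries
        # and keep the most frequent ones (assumed tie rule: keep all of them).
        counts = Counter(cat for cats in hits for cat in cats)
        top = max(counts.values())
        return [cat for cat, n in counts.items() if n == top]

    # No dictionary matched: step S204, hand the text to the integrated classifier.
    return fallback(text)


third = {
    "tidy street": ["e-commerce", "B2B"],
    "producer's supply": ["e-commerce", "vertical B2C"],
}
print(classify_with_dictionaries("product in tidy street is good", {}, {}, third, lambda t: []))
print(classify_with_dictionaries("tidy street product with producer's supply", {}, {}, third, lambda t: []))
```

With one matched third entry both of its categories are returned; with two matched entries only the shared category survives, which mirrors the examples in the text.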
In step S204, if the match fails, the text to be sorted is input into the integrated classifier to obtain the category of the text to be sorted, where one or more textual classification models are provided in the integrated classifier.
Preferably, the integrated classifier contains two or more textual classification models. In the present embodiment there are 3 textual classification models, each with a different structure, which may specifically include: a textual classification model based on SVM (Support Vector Machine); a FastText model (a word-vector and text classification tool open-sourced by Facebook AI Research); a textual classification model based on deep learning; and so on. The textual classification models can be obtained by training existing, commonly used models.
When training the textual classification models in the integrated classifier, any of the default dictionaries can be used as the learning samples. Preferably, the first default dictionary is used as the learning samples: because the first default dictionary is obtained by manual annotation, the semantic understanding of its entries is more accurate, which ensures that the entry categories are correct.
In the present embodiment, a matching order over the first default dictionary, the second default dictionary and the third default dictionary is added and combined with the method provided by the first embodiment, so that the classification of the text to be sorted is more accurate and the problem of overly broad text classification is avoided. In addition, the integrated classifier is added to classify the text to be sorted, which avoids the situation where the category of the text to be sorted cannot be determined. To sum up, the method provided by this embodiment of the present invention solves the problems of the prior art: poor error correction of erroneous text, inability to classify the text to be sorted correctly, and low classification accuracy.
Third embodiment
Referring to Fig. 4, based on the same inventive concept, the present embodiment also provides a classification method of text. The detailed process of the method is as follows:
Step S301: receive the text to be sorted.
Step S302: input the text to be sorted into the integrated classifier as input data, so as to classify the text to be sorted by means of the integrated classifier.
Step S303: if classification fails, input the text to be sorted into a search engine, so as to search for the text to be sorted by means of the search engine and obtain search results.
Step S304: adjust the input data based on the search results to obtain adjusted input data.
Step S305: input the adjusted input data into the integrated classifier, so as to classify the text to be sorted by means of the integrated classifier.
With respect to the second embodiment, step S301 of the present embodiment can be implemented with reference to step S201. When step S204 is executed, after the text to be sorted is input into the integrated classifier, steps S302 to S305 are carried out until the category of the text to be sorted is obtained.
In step S302, the integrated classifier is used to classify the text to be sorted, and multiple textual classification models can be integrated in the integrated classifier. In the present embodiment, step S302 can be implemented through the following specific steps:
Obtain the model result corresponding to each textual classification model from the multiple textual classification models in the integrated classifier.
Here, T textual classification models are provided in the integrated classifier, T being a positive integer. The number of textual classification models in the integrated classifier is not limited and may be two or more. In a preferred implementation the number of textual classification models is odd, for example 3 or 5. The structure of each textual classification model is different, and the models may specifically include: a textual classification model based on SVM (Support Vector Machine); a FastText model (a word-vector and text classification tool open-sourced by Facebook AI Research); a textual classification model based on deep learning; and so on. The textual classification models can be obtained by training existing, commonly used models; see the second embodiment for details.
When the integrated classifier of the present embodiment classifies the text to be sorted, it may perform the following steps:
Receive the input data. The input data can be the text to be sorted, or the input data adjusted based on the search results.
Based on the input data, classify the text to be sorted with each of the T textual classification models to obtain T model classification results, where the T model classification results correspond one-to-one to the T textual classification models and each model classification result contains classification information characterizing the category of the text to be sorted. That is, after the integrated classifier receives the input, each textual classification model produces one model result, which is the output data of that textual classification model.
Obtain the target classification result from the T model classification results. The multiple model classification results need to be judged jointly to determine the target classification result, which falls into one of two cases: 1. a first target classification result characterizing successful classification; 2. a second target classification result characterizing failed classification.
Specifically:
First, grouping: according to the classification information of the T model classification results, divide the T model classification results into R groups, that is, model classification results containing the same classification information are placed in one group. The classification information of the model classification results within one group is identical, and R is a positive integer.
Then, the present embodiment provides two implementations for obtaining the target classification result:
1. A weight value can be assigned in advance to each of the T textual classification models in the integrated classifier. Within each of the R groups, take the weight value of the textual classification model corresponding to each model classification result and compute the weighted sum over all model classification results in that group. The classification result is finally determined by the magnitude of the weighted sums. For example, a group whose weighted sum exceeds a preset value is taken as the target group; the preset value may be, for example, 50%, 60% or 70%.
If such a target group exists, classification has succeeded: a first target classification result characterizing successful classification is obtained, and the classification information contained in the model classification results of the target group is taken as the category of the text to be sorted. If no such target group exists, classification has failed, and a second target classification result characterizing failed classification is obtained.
2. Query whether the R groups contain a target group in which the number of model classification results meets a default class condition. If such a target group exists, a first target classification result characterizing successful classification is obtained, and it includes the classification information of the text to be sorted: the classification information contained in the model classification results of the target group is taken as the category of the text to be sorted. If no such target group exists, classification has failed, and a second target classification result characterizing failed classification is obtained. The default class condition may be any of the following: the number of model classification results in the target group is the largest among the R groups; the number of model classification results in the target group is the largest among the R groups and that maximum is unique; the number of model classification results in the target group exceeds a set value (such as 2, 3 or 4).
For example:
Take the default class condition "the number of model classification results in the target group is the largest among the R groups and that maximum is unique" as an example.
Suppose the text to be sorted A (as distinct from the text to be sorted B below) is input into the integrated classifier as input data, and the integrated classifier contains 3 textual classification models. The model classification result output by the first textual classification model is x (as distinct from the model classification results y and z), that output by the second textual classification model is y, and that output by the third textual classification model is z. The model classification results are therefore divided into 3 groups, each containing 1 model classification result, and no target group meets the default class condition. A classification result characterizing failed classification is therefore obtained.
Suppose the text to be sorted B is input into the integrated classifier as input data, and the integrated classifier contains 3 textual classification models. The model classification result output by the first textual classification model is x, that output by the second textual classification model is x, and that output by the third textual classification model is z. The model classification results are therefore divided into 2 groups: the first group (model classification result x) contains 2 model classification results and the second group (model classification result z) contains 1, so a target group meeting the default class condition exists (namely the first group). A classification result characterizing successful classification is therefore obtained.
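The two ways of combining the T model classification results can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `vote_unique_majority` and `vote_weighted` are hypothetical, model results are represented as plain category strings, `None` stands for the second target classification result (failure), and the weighted variant normalizes by the total weight so that a preset value such as 50% can be written as 0.5.

```python
from collections import defaultdict


def vote_unique_majority(model_results):
    """Implementation 2: succeed only if one group is strictly the largest."""
    if not model_results:
        return None
    groups = defaultdict(int)          # category -> number of models voting for it
    for category in model_results:
        groups[category] += 1
    ranked = sorted(groups.values(), reverse=True)
    if len(ranked) > 1 and ranked[0] == ranked[1]:
        return None                    # the maximum is not unique: classification fails
    return max(groups, key=groups.get)


def vote_weighted(model_results, weights, threshold=0.5):
    """Implementation 1: succeed if a group's share of the total weight exceeds threshold."""
    if not model_results:
        return None
    sums = defaultdict(float)          # category -> summed weight of its models
    for category, weight in zip(model_results, weights):
        sums[category] += weight
    total = sum(weights)
    category, value = max(sums.items(), key=lambda item: item[1])
    return category if value / total > threshold else None


print(vote_unique_majority(["x", "y", "z"]))   # text A: three groups of size 1
print(vote_unique_majority(["x", "x", "z"]))   # text B: group x has 2 of 3 results
print(vote_weighted(["x", "x", "z"], [1, 1, 1]))
```

Choosing an odd number of models, as the text recommends, makes a tie between two equally sized groups less likely, but it does not rule out the three-way split shown for text A.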
After step S302, when the classification result characterizes successful classification, the category of the text to be sorted can be determined based on the first target classification result output by the integrated classifier, where the first target classification result characterizes successful classification and includes the classification information of the text to be sorted.
Step S303: if classification fails, input the text to be sorted into a search engine, so as to search for the text to be sorted by means of the search engine and obtain search results.
In step S303, any existing search engine can be used, such as Baidu Search, 360 Search, Google Search or Bing Search, without restriction. Each search result should include a corresponding title and abstract, and may also include keywords.
Step S304: adjust the input data based on the search results to obtain adjusted input data. This may include the following process:
1. Extract key information from the search results.
The key information can be the title information and/or abstract information extracted from the search results. When extracting key information, the top N search results may be selected, where N is a positive integer such as 1, 2, 3 or 4. The titles and/or abstracts of the search results can then be input directly into the integrated classifier as the input data, which extends and explains the text to be sorted and improves the classification accuracy of the integrated classifier. If the preset number of search results do not contain the text to be sorted, the text to be sorted and the titles and/or abstracts of the search results can also be used together as the input data.
In addition, the keywords in each search result can also be extracted and used as a supplement to and extension of the text to be sorted. Keywords can be labeled manually or extracted at random, without restriction.
2. Add the key information to the input data to obtain the adjusted input data; or use the key information itself as the adjusted input data.
For the same text to be sorted, each time the input data is adjusted according to the search results, the search results or keywords already used in the previous adjustment should be excluded.
It should be noted that, in the present embodiment, when a second target classification result characterizing failed classification is obtained, steps S302 to S305 can be executed in a loop until a classification result characterizing successful classification is obtained.
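The loop over steps S302 to S305 can be sketched as follows. This is only a sketch under several assumptions that are not in the original text: `classify` and `search` are hypothetical callables (the first returns a category or `None` on failure, the second returns a list of dicts with `title` and `abstract` keys), the search is issued once and its results are consumed in batches of `top_n` so that results used in an earlier adjustment are not reused, and `max_rounds` caps the loop, whereas the text itself only says the steps can repeat until classification succeeds.

```python
def classify_with_search(text, classify, search, top_n=3, max_rounds=3):
    """Classify text; on failure, extend the input with search results and retry."""
    # S301/S302: first attempt with the raw text as input data.
    category = classify(text)
    if category is not None:
        return category

    # S303: query the (hypothetical) search engine once.
    results = search(text)
    used = 0
    for _ in range(max_rounds):
        # Take the next unused batch, so earlier search results are excluded.
        batch = results[used:used + top_n]
        if not batch:
            break
        used += len(batch)

        # S304: key information = titles and abstracts, appended to the text.
        key_info = " ".join(f"{r['title']} {r['abstract']}" for r in batch)
        input_data = f"{text} {key_info}"

        # S305: classify the adjusted input data.
        category = classify(input_data)
        if category is not None:
            return category
    return None  # still unclassified after the allowed rounds


def toy_classify(data):
    # Stand-in for the integrated classifier: succeeds only if it sees "shop".
    return "e-commerce" if "shop" in data else None


def toy_search(query):
    # Stand-in for a search engine returning title/abstract pairs.
    return [
        {"title": "Street guide", "abstract": "a city street"},
        {"title": "Tidy street online shop", "abstract": "buy products"},
    ]


print(classify_with_search("tidy street", toy_classify, toy_search, top_n=1))
```

Appending the key information to the original text corresponds to the first option in item 2; using `key_info` alone as `input_data` would correspond to the second.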
To sum up, in the present embodiment, the method of text classification receives the text to be sorted and inputs it into the integrated classifier as input data to obtain a classification result, where the classification result characterizes whether the classification of the text to be sorted succeeded. If the classification result characterizes failed classification, the text to be sorted is input into a search engine to obtain search results. Searching in a search engine yields more text information associated with the text to be sorted, so the search results can serve as an extension of the text to be sorted. The input data is then adjusted based on the search results to obtain adjusted input data, and the adjusted input data is input into the integrated classifier to be classified again, which improves the integrated classifier's ability to discriminate the text to be sorted and further increases its classification accuracy. The method of the invention therefore combines a search engine to classify the text to be sorted a second time, which addresses the problem that the prior art has a low recognition rate, classifies incorrectly or classifies imprecisely for texts to be sorted that are semantically complex or uncommon.
Fourth embodiment
Referring to Fig. 5, based on the same inventive concept, the present embodiment also provides a classification device of text. Fig. 5 shows a functional block diagram of the classification device 400 of text; specifically, the device includes: a receiving module 401, a screening module 402, a first determining module 403 and a second determining module 404.
The device 400 specifically includes:
a receiving module 401, configured to obtain the text to be sorted;
a screening module 402, configured to select multiple target entries similar to the text to be sorted from the default dictionary, where the default dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries;
a first determining module 403, configured to determine, according to the default dictionary, the category to which each of the multiple target entries belongs;
a second determining module 404, configured to determine a target category according to the category to which each of the multiple target entries belongs, and to take the target category as the category to which the text to be sorted belongs.
As an optional implementation, the screening module 402 is specifically further configured to: compute, in turn, the edit distance between the text to be sorted and each entry in the default dictionary; and determine the entries in the default dictionary whose edit distance is less than a preset distance as the target entries.
As an optional implementation, the second determining module 404 is specifically further configured to: group the multiple target entries according to the category to which they belong to obtain Q groups of target entries, where the target entries in the same group belong to the same category and Q is a positive integer; and select from the Q groups the group containing the most target entries, and take the category to which that group belongs as the target category.
As an optional implementation, the device further includes a matching module, configured to: before the multiple target entries similar to the text to be sorted are selected from the default dictionary, match, according to the text to be sorted, a target entry corresponding to the text to be sorted in the default dictionary; and if the match fails, execute the step of selecting multiple target entries similar to the text to be sorted from the default dictionary.
As an optional implementation, the matching module is specifically further configured to: search the default dictionary, according to the text to be sorted, for an entry identical to the text to be sorted.
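The behaviour of the matching, screening and determining modules can be illustrated with the sketch below. It is not the device's actual implementation: the names `edit_distance` and `classify_by_similar_entries` are hypothetical, the default dictionary is assumed to map each entry to a single category string, the edit distance is taken to be the standard Levenshtein distance, and ties between equally large groups are resolved in favour of the group encountered first.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between strings a and b, using one rolling row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def classify_by_similar_entries(text, dictionary, max_distance=3):
    """Exact match first; otherwise vote among entries within the preset distance."""
    # Matching module: look for an entry identical to the text.
    if text in dictionary:
        return dictionary[text]
    # Screening module: target entries whose edit distance is below the preset distance.
    targets = [entry for entry in dictionary if edit_distance(text, entry) < max_distance]
    if not targets:
        return None
    # Determining modules: group target entries by category, pick the largest group.
    counts = Counter(dictionary[entry] for entry in targets)
    return counts.most_common(1)[0][0]


entries = {"iphone": "phone", "iphine": "phone", "ipad": "tablet"}
print(classify_by_similar_entries("iphon", entries))
```

Here the misspelled input is not an entry itself, but two of its near neighbours share one category, so that category is returned.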
It should be noted that the specific implementation and technical effects of the classification device 400 of text provided by the embodiment of the present invention are the same as those of the preceding method embodiments. For brevity, where the device embodiment does not mention a point, refer to the corresponding content in the preceding method embodiments.
Fifth embodiment
In addition, based on the same inventive concept, the fifth embodiment of the invention also provides a user terminal including a processor and a memory. The memory is coupled to the processor and stores instructions which, when executed by the processor, cause the user terminal to perform the following operations:
Obtain the text to be sorted.
Select multiple target entries similar to the text to be sorted from the default dictionary, where the default dictionary stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries.
According to the default dictionary, determine the category to which each of the multiple target entries belongs.
According to the category to which each of the multiple target entries belongs, determine a target category, and take the target category as the category to which the text to be sorted belongs.
It should be noted that, in the user terminal provided by the embodiment of the present invention, the specific implementation and technical effects of each of the above steps are the same as those of the preceding method embodiments. For brevity, where the present embodiment does not mention a point, refer to the corresponding content in the preceding method embodiments.
In the embodiments of the present invention, an operating system and third-party applications are installed in the user terminal. The user terminal can be a tablet computer, a mobile phone, a laptop, a PC (personal computer), a wearable device, a vehicle-mounted terminal or a similar user terminal device.
Fig. 6 shows a module block diagram of an exemplary user terminal 500. As shown in Fig. 6, the user terminal 500 includes a memory 502, a storage controller 504, one or more processors 506 (only one is shown in the figure), a peripheral interface 508, a network module 510, an input/output module 512, a display module 514 and so on. These components communicate with one another through one or more communication buses/signal lines 516.
The memory 502 can be used to store software programs and modules, such as the program instructions/modules corresponding to the classification method and device of text in the embodiments of the present invention. By running the software programs and modules stored in the memory 502, the processor 506 executes various functional applications and data processing, such as the classification method of text provided by the embodiments of the present invention.
The memory 502 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. Access to the memory 502 by the processor 506 and other possible components can be carried out under the control of the storage controller 504.
The peripheral interface 508 couples various input/output devices to the processor 506 and the memory 502. In some embodiments, the peripheral interface 508, the processor 506 and the storage controller 504 can be implemented in a single chip. In other embodiments, they can each be implemented by a separate chip.
The network module 510 is used to receive and transmit network signals. The network signals may include wireless signals or wired signals.
The input/output module 512 is used to let the user input data and thereby realizes the interaction between the user and the user terminal. The input/output module 512 may be, but is not limited to, a mouse, a keyboard, a touch screen and the like.
The display module 514 provides an interactive interface (such as a user interface) between the user terminal 500 and the user, or is used to display image data for the user's reference. In the present embodiment, the display module 514 can be a liquid crystal display or a touch display. A touch display can be a capacitive or resistive touch screen that supports single-point and multi-point touch operation. Supporting single-point and multi-point touch operation means that the touch display can sense touch operations produced simultaneously at one or more positions on the touch display and hand the sensed touch operations to the processor for calculation and processing.
It can be understood that the structure shown in Fig. 6 is only illustrative; the user terminal 500 may include more or fewer components than shown in Fig. 6, or have a configuration different from that shown in Fig. 6. Each component shown in Fig. 6 can be implemented in hardware, software or a combination thereof.
Sixth embodiment
The sixth embodiment of the invention provides a computer storage medium. If the functional modules integrated in the classification device of text of the fourth embodiment of the invention are implemented in the form of software functional modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the classification methods of text of the first to third embodiments can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of each of the above method embodiments. The computer program includes computer program code, which can be in source code form, object code form, an executable file, certain intermediate forms and so on. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium and so on. It should be noted that the content included in the computer-readable medium can be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems can also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the specification provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together into a single embodiment, figure or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments can be combined into one module or unit or component, and they can furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) can be replaced by an alternative feature serving the same, equivalent or similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all of the functions of some or all of the components of the classification device of text and the user terminal according to the embodiments of the invention. The invention can also be implemented as a device or device program (for example, a computer program and a computer program product) for executing some or all of the method described herein. Such a program implementing the invention can be stored on a computer-readable medium or can take the form of one or more signals. Such a signal can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words can be interpreted as names.
The present invention discloses A1. A text classification method, the method comprising:
Obtaining a text to be classified; selecting, from a preset lexicon, multiple target entries similar to the text to be classified, wherein the preset lexicon stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; determining, according to the preset lexicon, the category to which each target entry among the multiple target entries belongs; and determining a target category according to the category to which each target entry among the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs.
A2. The method according to A1, wherein selecting, from the preset lexicon, multiple target entries similar to the text to be classified comprises:
Computing, in turn, the edit distance between the text to be classified and each entry in the preset lexicon; and determining the entries in the preset lexicon whose edit distance is less than a preset distance as the target entries.
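As a non-limiting illustration, the selection step of A2 might be sketched in Python as follows. The text does not say which edit distance is meant, so the Levenshtein distance (unit-cost insertion, deletion, and substitution) is assumed here. The function names `edit_distance` and `select_target_entries`, the parameter `max_distance`, and the representation of the preset lexicon as a dictionary mapping each entry to its category are all hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed metric), one row at a time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def select_target_entries(text: str,
                          lexicon: Dict[str, str],
                          max_distance: int) -> List[str]:
    """Scan every entry in turn; keep those strictly closer than max_distance."""
    return [entry for entry in lexicon
            if edit_distance(text, entry) < max_distance]
```

With a hypothetical lexicon `{"apple": "fruit", "apply": "verb", "banana": "fruit"}` and a preset distance of 3, the misspelled input "appel" would select "apple" and "apply", each at distance 2, and skip "banana". The comparison is strict, following the "less than a preset distance" wording.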
A3. The method according to A1, wherein determining the target category according to the category to which each target entry belongs comprises:
Grouping the multiple target entries by the category to which they belong, to obtain Q groups of target entries, wherein the target entries in the same group all belong to the same category and Q is a positive integer; and selecting, from the Q groups of target entries, the group containing the largest number of entries, and taking the category to which that group belongs as the target category.
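The grouping and selection of A3 amount to a majority vote over the categories of the target entries. A minimal sketch, again assuming the preset lexicon is a dictionary from entry to category, is given below. The names `group_by_category` and `vote_target_category` are hypothetical. The text does not say how a tie between equally large groups is resolved; this sketch simply returns the group that was formed first, which is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_category(targets: Iterable[str],
                      lexicon: Dict[str, str]) -> Dict[str, List[str]]:
    """Split the target entries into Q groups, one per category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in targets:
        groups[lexicon[entry]].append(entry)
    return dict(groups)


def vote_target_category(targets: Iterable[str],
                         lexicon: Dict[str, str]) -> str:
    """Return the category of the largest group (first-formed group wins ties)."""
    groups = group_by_category(targets, lexicon)
    if not groups:
        raise ValueError("no target entries to vote on")
    return max(groups, key=lambda category: len(groups[category]))
```

For example, if the target entries are "apple", "apply", and "apricot", and the lexicon labels them "fruit", "verb", and "fruit" respectively, then Q is 2 and the target category is "fruit".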
A4. The method according to A1, further comprising, before selecting, from the preset lexicon, multiple target entries similar to the text to be classified:
Matching, according to the text to be classified, an entry in the preset lexicon that corresponds to the text to be classified; and, if the matching fails, performing the step of selecting, from the preset lexicon, multiple target entries similar to the text to be classified.
A5. The method according to A4, wherein the step of matching, according to the text to be classified, an entry in the preset lexicon that corresponds to the text to be classified specifically comprises:
Searching the preset lexicon, according to the text to be classified, for an entry identical to the text to be classified.
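The control flow of A4 and A5 (exact lookup first, similarity-based selection only when the lookup fails) might be sketched as below. The function name `classify` and the callable parameter `fuzzy_classify`, which stands in for the selection and voting steps of A1 to A3, are hypothetical. Returning the stored category directly on an exact match is an assumption about what a successful match implies, since the text only describes the failure branch.

```python
from typing import Callable, Dict


def classify(text: str,
             lexicon: Dict[str, str],
             fuzzy_classify: Callable[[str, Dict[str, str]], str]) -> str:
    """Exact match first (A5); fall back to similarity-based classification (A4)."""
    category = lexicon.get(text)  # look for an entry identical to the text
    if category is not None:
        return category           # assumed behaviour when the match succeeds
    # Matching failed: run the similar-entry selection and voting steps.
    return fuzzy_classify(text, lexicon)
```

Passing the fallback in as a callable keeps the sketch independent of any particular similarity measure; with a dictionary-backed lexicon the exact lookup is a single hash probe, so the full scan of the lexicon is only paid for inputs that are not already entries.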
The invention also discloses B6. A text classification device, the device comprising:
A receiving module, configured to obtain a text to be classified; a screening module, configured to select, from a preset lexicon, multiple target entries similar to the text to be classified, wherein the preset lexicon stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries; a first determining module, configured to determine, according to the preset lexicon, the category to which each target entry among the multiple target entries belongs; and a second determining module, configured to determine a target category according to the category to which each target entry among the multiple target entries belongs, and to take the target category as the category to which the text to be classified belongs.
B7. The device according to B6, wherein the screening module is further specifically configured to:
Compute, in turn, the edit distance between the text to be classified and each entry in the preset lexicon; and determine the entries in the preset lexicon whose edit distance is less than a preset distance as the target entries.
B8. The device according to B6, wherein the second determining module is further specifically configured to:
Group the multiple target entries by the category to which they belong, to obtain Q groups of target entries, wherein the target entries in the same group all belong to the same category and Q is a positive integer; and select, from the Q groups of target entries, the group containing the largest number of entries, and take the category to which that group belongs as the target category.
B9. The device according to B6, further comprising: a matching module, configured to, before the multiple target entries similar to the text to be classified are selected from the preset lexicon, match, according to the text to be classified, an entry in the preset lexicon that corresponds to the text to be classified; and, if the matching fails, to trigger the selecting, from the preset lexicon, of multiple target entries similar to the text to be classified.
B10. The device according to B9, wherein the matching module is further specifically configured to:
Search the preset lexicon, according to the text to be classified, for an entry identical to the text to be classified.
The invention discloses C11. A user terminal, the user terminal comprising a processor and a memory, the memory being coupled to the processor and storing instructions which, when executed by the processor, cause the user terminal to perform the steps of the method according to any one of A1 to A5.
The invention discloses D12. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of A1 to A5.

Claims (10)

1. A text classification method, characterized by comprising:
Obtaining a text to be classified;
Selecting, from a preset lexicon, multiple target entries similar to the text to be classified, wherein the preset lexicon stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries;
Determining, according to the preset lexicon, the category to which each target entry among the multiple target entries belongs;
Determining a target category according to the category to which each target entry among the multiple target entries belongs, and taking the target category as the category to which the text to be classified belongs.
2. The method according to claim 1, characterized in that selecting, from the preset lexicon, multiple target entries similar to the text to be classified comprises:
Computing, in turn, the edit distance between the text to be classified and each entry in the preset lexicon;
Determining the entries in the preset lexicon whose edit distance is less than a preset distance as the target entries.
3. The method according to claim 1, characterized in that determining the target category according to the category to which each target entry belongs comprises:
Grouping the multiple target entries by the category to which they belong, to obtain Q groups of target entries, wherein the target entries in the same group all belong to the same category and Q is a positive integer;
Selecting, from the Q groups of target entries, the group containing the largest number of entries, and taking the category to which that group belongs as the target category.
4. The method according to claim 1, characterized in that, before selecting, from the preset lexicon, multiple target entries similar to the text to be classified, the method further comprises:
Matching, according to the text to be classified, an entry in the preset lexicon that corresponds to the text to be classified;
If the matching fails, performing the step of selecting, from the preset lexicon, multiple target entries similar to the text to be classified.
5. The method according to claim 4, characterized in that the step of matching, according to the text to be classified, an entry in the preset lexicon that corresponds to the text to be classified specifically comprises:
Searching the preset lexicon, according to the text to be classified, for an entry identical to the text to be classified.
6. A text classification device, characterized by comprising:
A receiving module, configured to obtain a text to be classified;
A screening module, configured to select, from a preset lexicon, multiple target entries similar to the text to be classified, wherein the preset lexicon stores multiple entries and the category to which each entry belongs, and the target entries belong to the multiple entries;
A first determining module, configured to determine, according to the preset lexicon, the category to which each target entry among the multiple target entries belongs;
A second determining module, configured to determine a target category according to the category to which each target entry among the multiple target entries belongs, and to take the target category as the category to which the text to be classified belongs.
7. The device according to claim 6, characterized in that the screening module is further specifically configured to:
Compute, in turn, the edit distance between the text to be classified and each entry in the preset lexicon; and determine the entries in the preset lexicon whose edit distance is less than a preset distance as the target entries.
8. The device according to claim 6, characterized in that the second determining module is further specifically configured to:
Group the multiple target entries by the category to which they belong, to obtain Q groups of target entries, wherein the target entries in the same group all belong to the same category and Q is a positive integer; and select, from the Q groups of target entries, the group containing the largest number of entries, and take the category to which that group belongs as the target category.
9. A user terminal, characterized by comprising a processor and a memory, the memory being coupled to the processor and storing instructions which, when executed by the processor, cause the user terminal to perform the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201811368735.4A 2018-11-16 2018-11-16 A kind of classification method and device of text Pending CN109684467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811368735.4A CN109684467A (en) 2018-11-16 2018-11-16 A kind of classification method and device of text

Publications (1)

Publication Number Publication Date
CN109684467A true CN109684467A (en) 2019-04-26

Family

ID=66185834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811368735.4A Pending CN109684467A (en) 2018-11-16 2018-11-16 A kind of classification method and device of text

Country Status (1)

Country Link
CN (1) CN109684467A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984422A (en) * 2010-10-18 2011-03-09 百度在线网络技术(北京)有限公司 Fault-tolerant text query method and equipment
CN107436875A (en) * 2016-05-25 2017-12-05 华为技术有限公司 File classification method and device
CN107844559A (en) * 2017-10-31 2018-03-27 国信优易数据有限公司 A kind of file classifying method, device and electronic equipment
CN108563722A (en) * 2018-04-03 2018-09-21 有米科技股份有限公司 Trade classification method, system, computer equipment and the storage medium of text message

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782727A (en) * 2020-06-28 2020-10-16 平安医疗健康管理股份有限公司 Data processing method and device based on machine learning
CN111782727B (en) * 2020-06-28 2022-08-12 深圳平安医疗健康科技服务有限公司 Data processing method and device based on machine learning
CN111767403A (en) * 2020-07-07 2020-10-13 腾讯科技(深圳)有限公司 Text classification method and device
CN111767403B (en) * 2020-07-07 2023-10-31 腾讯科技(深圳)有限公司 Text classification method and device

Similar Documents

Publication Publication Date Title
CN109684627A (en) A kind of file classification method and device
CN111177569B (en) Recommendation processing method, device and equipment based on artificial intelligence
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
CN106649818B (en) Application search intention identification method and device, application search method and server
CN103514299B (en) Information search method and device
US8082264B2 (en) Automated scheme for identifying user intent in real-time
CN110532451A (en) Search method and device for policy text, storage medium, electronic device
CN109582792A (en) A kind of method and device of text classification
US11507989B2 (en) Multi-label product categorization
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
CN109800307A (en) Analysis method, device, computer equipment and the storage medium of product evaluation
CN110597978B (en) Article abstract generation method, system, electronic equipment and readable storage medium
US11734322B2 (en) Enhanced intent matching using keyword-based word mover's distance
CN112256845A (en) Intention recognition method, device, electronic equipment and computer readable storage medium
Aralikatte et al. Fault in your stars: an analysis of android app reviews
CN109684467A (en) A kind of classification method and device of text
CN110647504B (en) Method and device for searching judicial documents
CN114037545A (en) Client recommendation method, device, equipment and storage medium
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN114742062B (en) Text keyword extraction processing method and system
CN110389963A (en) The recognition methods of channel effect, device, equipment and storage medium based on big data
CN108733702B (en) Method, device, electronic equipment and medium for extracting upper and lower relation of user query
CN115221323A (en) Cold start processing method, device, equipment and medium based on intention recognition model
CN113366511B (en) Named entity identification and extraction using genetic programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination