CN107844559A - A kind of file classifying method, device and electronic equipment - Google Patents

A file classification method, device, and electronic equipment

Info

Publication number
CN107844559A
Authority
CN
China
Prior art keywords
word
dictionary
complaint
matched
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711051376.5A
Other languages
Chinese (zh)
Inventor
张斌德
夏耘海
王甲樑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201711051376.5A
Publication of CN107844559A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

Embodiments of the present invention provide a file classification method, device, and electronic equipment, belonging to the technical field of data processing. The method includes: performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; matching the multiple words to be matched against dictionaries that each characterize a different complaint problem, and obtaining a matching result; and determining, according to the matching result, the complaint category to which the complaint text to be classified belongs, so as to classify that text. The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts. Because the words to be matched are compared with multiple dictionaries obtained by training in advance, a more accurate matching result can be obtained and the complaint text can be classified accurately. Complaint texts concerning different complaint problems are thus classified with higher precision, which improves the performance of text classification.

Description

A file classification method, device, and electronic equipment
Technical field
The present invention relates to the technical field of data processing, and in particular to a file classification method, device, and electronic equipment.
Background
With the development of computer technology, more and more enterprises, organizations, and government agencies rely on computers to handle all kinds of affairs, and in the process they continuously produce large numbers of electronic documents. In routine work and in records management, these electronic documents often need to be sorted into specific categories. However, now that data volumes are growing explosively, some enterprises may produce several terabytes of data in a single day, corresponding to thousands of electronic documents, and identifying and managing them manually is clearly inefficient. Computer-implemented automatic classification has brought great convenience, but because text classification is characterized by high dimensionality and high sparsity, its performance does not yet meet practical needs and leaves considerable room for improvement.
With the rapid development of e-government, the focus of government website construction has also shifted: from mainly publishing news and information resources for each government department in the early stage of construction, to improving the government's supervisory function and service level. Starting from the actual work of the website, the working system of government websites should be standardized and their service awareness and service capability improved; cooperation between websites and government affairs should be strengthened and interaction between government websites and the public expanded; and an efficient complaint system should be established to strengthen supervision. Because a large volume of complaint and suggestion text is received every day, how to classify complaint texts quickly and accurately is a problem that urgently needs to be solved.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a file classification method, device, and electronic equipment that can effectively solve the prior-art problem of low accuracy when classifying complaint texts.
In a first aspect, an embodiment of the present invention provides a file classification method. The method includes: performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; matching the multiple words to be matched against dictionaries that each characterize a different complaint problem, and obtaining a matching result; and determining, according to the matching result, the complaint category to which the complaint text to be classified belongs. The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
In a second aspect, an embodiment of the present invention provides a file classification device. The device includes: a word segmentation module, configured to perform word segmentation on a complaint text to be classified to obtain multiple words to be matched; a matching module, configured to match the multiple words to be matched against dictionaries that each characterize a different complaint problem, and to obtain a matching result; and a classification module, configured to determine, according to the matching result, the complaint category to which the complaint text to be classified belongs. The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
In a third aspect, an embodiment of the present invention provides electronic equipment. The electronic equipment includes a processor and a memory coupled to the processor. The memory stores instructions that, when executed by the processor, cause the electronic equipment to perform the following operations: performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; matching the multiple words to be matched against dictionaries that each characterize a different complaint problem, and obtaining a matching result; and determining, according to the matching result, the complaint category to which the complaint text to be classified belongs. The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium stored in a computer. The readable storage medium includes a plurality of instructions configured to cause the computer to perform the file classification method provided in the first aspect.
Embodiments of the present invention provide a file classification method, device, and electronic equipment. Word segmentation is first performed on a complaint text to be classified to obtain multiple words to be matched. The words to be matched are then matched against dictionaries that each characterize a different complaint problem, and a matching result is obtained; these dictionaries are obtained by training on multiple historical complaint texts. The complaint category to which the complaint text belongs is then determined according to the matching result, so as to classify the text. Because the words to be matched are compared with multiple dictionaries obtained by training in advance, a more accurate matching result can be obtained and the complaint text can be classified accurately. Complaint texts concerning different complaint problems are thus classified with higher precision, which improves the performance of text classification.
Other features and advantages of the present invention will be set forth in the description that follows, and will in part become apparent from the description or be understood by practicing the embodiments of the present invention. The objectives and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required by the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a structural block diagram of electronic equipment that can be used in the embodiments of the present invention;
Fig. 2 is a flow chart of a file classification method provided by the first embodiment of the present invention;
Fig. 3 is a structural block diagram of a file classification device provided by the second embodiment of the present invention;
Fig. 4 is a structural block diagram of a matching module provided by the second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Fig. 1 shows a structural block diagram of electronic equipment 100 that can be used in the embodiments of the present invention. As shown in Fig. 1, the electronic equipment 100 includes a memory 101, a memory controller 102, one or more processors 103 (only one is shown in the figure), a peripheral interface 104, a radio-frequency module 105, an audio module 106, a touch screen 107, and so on. These components communicate with one another over one or more communication buses or signal lines 108.
The memory 101 can be used to store software programs and modules, such as the program instructions and modules corresponding to the file classification method in the embodiments of the present invention. By running the software programs and modules stored in the memory 101, the processor 103 performs various functional applications and data processing, such as the file classification method provided by the embodiments of the present invention.
The memory 101 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. Access to the memory 101 by the processor 103 and other possible components can be carried out under the control of the memory controller 102.
The peripheral interface 104 couples various input/output devices to the processor 103 and the memory 101. In some embodiments, the peripheral interface 104, the processor 103, and the memory controller 102 can be implemented in a single chip. In other embodiments, they can each be implemented by a separate chip.
The radio-frequency module 105 is used to receive and send electromagnetic waves and to convert between electromagnetic waves and electrical signals, so as to communicate with a communication network or other equipment.
The audio module 106 provides an audio interface to the user and may include one or more microphones, one or more speakers, and audio circuitry.
The touch screen 107 provides both an output interface and an input interface between the electronic equipment 100 and the user. Specifically, the touch screen 107 displays video output to the user; the content of this video output may include text, graphics, video, and any combination thereof.
It can be understood that the structure shown in Fig. 1 is only illustrative. The electronic equipment 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 can be implemented in hardware, software, or a combination thereof.
First embodiment
Referring to Fig. 2, Fig. 2 is a flow chart of a file classification method provided by the first embodiment of the present invention. The method is applied to a file classification device that runs on the electronic equipment described above. The method includes:
Step S110: Perform word segmentation on the complaint text to be classified to obtain multiple words to be matched.
For an electronic document, "keywords" can be used to represent all the features involved in analyzing and understanding the document, for example "taxi", "carpooling", and "taximeter". Different entities, such as banks, government agencies, and ordinary enterprises, may rely on different keywords when determining the category of an electronic document. When classifying the electronic documents of a particular enterprise, these keywords can be predefined based on experience.
When, for example, a government agency needs to classify the electronic documents of multiple user complaints it has received, the electronic documents are first preprocessed; that is, word segmentation is performed on each obtained complaint text to be classified, where the complaint text to be classified is the electronic document mentioned above. To segment the complaint text, the smallest semantic units in it must first be identified. As one implementation, the Chinese word segmentation that ships with the Lucene search engine can be used. Lucene provides its own Chinese analyzers, chiefly StandardAnalyzer and CJKAnalyzer: StandardAnalyzer segments text into single characters, while CJKAnalyzer segments text into two-character units (bigrams).
The Chinese word segmentation of the Lucene search engine is most commonly based on string matching. On this basis there is a forward word-by-word maximum matching segmentation algorithm. The idea is to prepare a segmentation dictionary and then scan the input sentence from left to right, matching the character strings in the sentence against the entries in the dictionary one by one. The matching field starts from a single character and is extended by one character at a time until it can be extended no further; at the end of each round, the longest field that matched successfully is taken as the result. For example, suppose a sentence scanned in the complaint text to be classified means "today the weather is gloomy", and the dictionary contains the words "today", "weather", "day", and "gloomy". Starting from the first character, progressively longer strings are taken in turn, from the first character alone up to the whole sentence, and each is matched against the dictionary. The longest matching string in the dictionary is "today", so that word is split off. Scanning then resumes from the next character ("day") and the operation is repeated, giving the result "today/weather/gloomy/". Each word is then tagged with its part of speech: nouns, verbs, numerals, adjectives, prepositions, auxiliary words, conjunctions, and punctuation are tagged with the symbols n, v, m, a, p, u, c, and wp respectively. For example, "today" is tagged as a noun. The words (today, weather, gloomy) are then taken as the initial word set. To improve the accuracy of subsequent matching, common words of little significance, known as stop words, must also be deleted from the initial word set, for example the word "is". The words remaining after stop-word removal in this example are: today, weather, gloomy. These are the words to be matched obtained by segmenting one sentence of the complaint text to be classified; in the same way, the multiple words to be matched for the entire complaint text can be obtained.
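The segmentation and stop-word steps above can be sketched as follows. This is a minimal illustration, not Lucene's implementation: the function names, the tiny lexicon, and the stop-word set are hypothetical, and the Chinese strings are an assumed back-translation of the example sentence (with two stop words inserted so the filtering step has something to remove). Part-of-speech tagging is left out.

```python
def forward_max_match(sentence, lexicon):
    """Scan left to right; at each position keep the longest lexicon entry found."""
    max_len = max((len(word) for word in lexicon), default=1)
    words, i = [], 0
    while i < len(sentence):
        best = 1  # fall back to a single character when nothing matches
        limit = min(max_len, len(sentence) - i)
        # Grow the matching field one character at a time, remembering the longest hit.
        for size in range(1, limit + 1):
            if sentence[i:i + size] in lexicon:
                best = size
        words.append(sentence[i:i + best])
        i += best
    return words


def remove_stop_words(words, stop_words):
    """Drop common words that carry little meaning for classification."""
    return [word for word in words if word not in stop_words]


# Hypothetical lexicon: "today", "weather", "day", "gloomy" (assumed back-translation).
lexicon = {"今天", "天气", "天", "阴沉沉"}
stop_words = {"的", "是"}  # assumed stop words

tokens = forward_max_match("今天的天气是阴沉沉", lexicon)
words_to_match = remove_stop_words(tokens, stop_words)
print(tokens)
print(words_to_match)
```

Capping the candidate length at the longest lexicon entry avoids testing strings that cannot match, while still yielding the same longest match as scanning to the end of the sentence.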
Step S120: Match the multiple words to be matched against the dictionaries that each characterize a different complaint problem, and obtain a matching result.
The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts. Each dictionary corresponds to a different complaint problem category, and the weight of each word in each dictionary lies within a preset weight range.
For example, when classifying government complaint files, the files can be classified by complaint problem, such as taxi problems, communication problems, bus problems, and parking problems. Multiple keywords are then predefined for each complaint problem. For taxi problems, the keywords might be words such as: taxi, ride sharing, taximeter, hailing a taxi, raising the price. For communication problems, they might be words such as: broadband, phone, Unicom, China Unicom, dial. These keywords are the words in the semantic sets determined below.
The multiple historical complaint texts obtained are then used for training. First, the word segmentation described above is performed on each historical complaint text to determine the dictionaries formed by the words characterizing the different complaint problems. For each dictionary, the words are divided into multiple semantic sets according to the semantics of each word in the dictionary and how strongly the word is associated with the complaint problem that the dictionary characterizes. A weight range is assigned to each semantic set, and each semantic set may include multiple keywords under the different complaint problem categories. The weight of each word is determined within the weight range of the semantic set to which it belongs. The more strongly a semantic set is associated with the complaint problem, the higher the weight range assigned to it.
For example, in the taxi problem category, semantic set 1 is (taxi, ride sharing, taximeter) and semantic set 2 is (hailing a taxi, raising the price). The words in semantic set 1 are the most strongly associated with the taxi problem, so the weight range assigned to it can be 0.9-0.98, while the weight range assigned to semantic set 2 is 0.8-0.89. If the calculated weight of a word in semantic set 1 does not fall within 0.9-0.98, the calculated weight is probably inaccurate and may eventually lead to misclassification, so the weight of that word can be reassigned. For example, if the calculated weight of "taxi" is 0.85, which is outside the range 0.9-0.98, the word "taxi" is assigned a new weight that lies within 0.9-0.98. As one approach, a weight can be selected at random within the preset weight range, that is, within 0.9-0.98, and assigned as the new weight; if the selected weight is 0.95, the new weight of "taxi" is set to 0.95.
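A minimal sketch of the random reassignment described above, assuming the weight ranges of the taxi example. The function name, the rounding to two decimals, and the computed weights are illustrative assumptions, not taken from the source.

```python
import random


def reassign_if_out_of_range(weight, low, high, rng=random):
    """Keep `weight` if it lies in [low, high]; otherwise draw a new weight in that range."""
    if low <= weight <= high:
        return weight
    # Assumption: the new weight is drawn uniformly and rounded to two decimals.
    return round(rng.uniform(low, high), 2)


# Weight range of the semantic set each word belongs to (values from the taxi example).
ranges = {"taxi": (0.90, 0.98), "hailing a taxi": (0.80, 0.89)}
# Hypothetical calculated weights: "taxi" is out of range, "hailing a taxi" is in range.
calculated = {"taxi": 0.85, "hailing a taxi": 0.84}

rng = random.Random(0)  # seeded only to make the sketch repeatable
adjusted = {
    word: reassign_if_out_of_range(weight, *ranges[word], rng)
    for word, weight in calculated.items()
}
print(adjusted)
```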
Alternatively, a weight range can be assigned in advance to each individual word in a semantic set. For example, in the taxi problem category, "taxi" can be regarded as a word that occurs frequently in problems of this kind, so it can be assigned a higher weight range such as 0.9-0.98, while the word "carpooling" is assigned the range 0.87-0.89. If the calculated weight of "taxi" is 0.85, the weight is not within its preset range; the calculated weight is probably inaccurate and may eventually lead to misclassification. The word "taxi" can therefore be assigned a new weight that lies within the preset range 0.9-0.98. As one approach, a weight can be selected at random within the preset weight range and assigned as the new weight; if the selected weight is 0.95, the new weight of "taxi" is set to 0.95.
As a further implementation, a calculation rule can be set. For example, if the calculated weight of "taxi" is not within the preset weight range, a preset value is added to the current weight of "taxi" to form a new weight, so that the new weight falls within the preset range. The preset value can be set fairly small, such as 0.1 or 0.05. If the new weight obtained after adding the preset value is still not within the preset weight range, the preset value can be added to the new weight again, until the resulting weight falls within the preset weight range.
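The stepwise rule can be sketched as below. The text only describes adding the preset value to a weight that is too low; how to treat a step that overshoots the upper bound is not specified, so capping at the upper bound here is an assumption, as are the function name and the step limit.

```python
def step_into_range(weight, low, high, step=0.05, max_steps=100):
    """Add `step` repeatedly until `weight` reaches the range [low, high]."""
    steps = 0
    while weight < low and steps < max_steps:
        # Round to suppress floating-point noise such as 0.9000000000000001.
        weight = round(weight + step, 10)
        steps += 1
    # Assumption: a step that overshoots the range is capped at the upper bound.
    return min(weight, high)


print(step_into_range(0.85, 0.90, 0.98, step=0.05))
print(step_into_range(0.85, 0.90, 0.98, step=0.1))
```

A small step keeps the adjusted weight close to the calculated one, which is presumably why the text suggests values such as 0.05 or 0.1.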
As another alternative implementation, the dictionaries formed by the words characterizing the different complaint problems can first be determined from the historical complaint texts. At this point the words in a dictionary have not yet been assigned weights. Each semantic set is assigned a corresponding weight range, and the weight of each word is then determined within the weight range of the semantic set to which it belongs. For example, if the weight range assigned to semantic set 1 (taxi, ride sharing, taximeter) is 0.9-0.98, each word in that semantic set is assigned a random weight within 0.9-0.98, such as 0.97 for "taxi", 0.95 for "ride sharing", and 0.9 for "taximeter".
By the above methods, the new weight of each word in each semantic set can be obtained. Multiple dictionaries are then established according to the different categories, such as the taxi problem and communication problem described above; that is, one dictionary is established under each category, containing multiple words and their corresponding new weights.
In this embodiment, the TF-IDF algorithm can be used to obtain the TF-IDF value of each word to be matched in the complaint text to be classified, and the TF-IDF value of a word to be matched is used as the weight of that word.
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. It is a statistical method for assessing how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears across the corpus.
The main idea of TF-IDF is that if a word or phrase appears with high frequency (TF) in one article and rarely appears in other articles, it is considered to discriminate well between categories and to be suitable for classification. TF-IDF is in effect TF * IDF, where TF is the term frequency and IDF is the inverse document frequency. TF represents how frequently an entry occurs in document d. The main idea of IDF is that the fewer documents contain entry t, the larger the IDF, indicating that entry t discriminates well between categories. Suppose the number of documents containing entry t in a certain class C is m, and the number of documents containing t in the other classes is k; the total number of documents containing t is then n = m + k. When m is large, n is also large, and the IDF value given by the IDF formula is small, which suggests that entry t discriminates poorly between categories. In practice, however, if an entry appears frequently in the documents of one class, it represents the features of that class's texts well; such entries should be given higher weights and selected as feature words of that class to distinguish it from documents of other classes.
Specifically, to obtain the TF-IDF value of each word, the term frequency TF of each word in the complaint text to be classified is calculated first. The term frequency is the number of times a word occurs in the complaint text divided by the total number of words in that text: $\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of times the word occurs in the text, and the denominator is the sum of the occurrence counts of all words in the text. For example, if the word "taxi" occurs 300 times in the complaint text to be classified and the text contains 1200 words in total, the term frequency of "taxi" is TF = 300/1200 = 0.25. The inverse document frequency IDF of each word is then obtained as the logarithm of the total number of documents in the corpus divided by the number of documents containing the word plus 1: $\mathrm{IDF}_i = \log\frac{|D|}{|\{j : t_i \in d_j\}| + 1}$, where $|D|$ is the total number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents containing the word. Finally, the TF-IDF value of each word is obtained from its TF and IDF: TF-IDF = TF * IDF.
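The TF and IDF definitions above can be sketched directly. The function names and the toy corpus are hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """TF: occurrences of each word divided by the total number of words in the text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequency(word, corpus):
    """IDF: log(total documents / (documents containing the word + 1))."""
    containing = sum(1 for document in corpus if word in document)
    return math.log(len(corpus) / (containing + 1))  # natural log is an assumption


def tf_idf(tokens, corpus):
    """TF-IDF weight of every word in `tokens`, relative to `corpus`."""
    tf = term_frequencies(tokens)
    return {word: tf[word] * inverse_document_frequency(word, corpus) for word in tf}


# Hypothetical corpus of already segmented complaint texts.
corpus = [
    ["taxi", "taximeter", "taxi"],
    ["broadband", "phone"],
    ["bus", "late"],
    ["parking", "fee"],
]
scores = tf_idf(corpus[0], corpus)
print(scores)
```

With the "+ 1" in the denominator, a word that appears in every document, or in all but one, receives a zero or negative IDF, so such words contribute little or nothing as features.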
In this way, the TF-IDF value of each word to be matched in the complaint text to be classified can be obtained. For a historical complaint text, the TF-IDF value of each word can likewise be obtained by the above method, and the words in the historical complaint text are sorted in descending order of TF-IDF value. As one approach, the 100 top-ranked words in each historical complaint text can be taken as a semantic set to form the dictionary.
Refer to Table 1 and Table 2, which show multiple dictionaries that a government agency established for different complaint problems. Table 1 lists the multiple historical complaint texts obtained, and Table 2 lists the multiple dictionaries established for the different complaint problem categories.
Table 1
Table 2
The multiple words to be matched obtained above are then matched against each of the dictionaries established above; that is, the words obtained by segmenting the complaint text to be classified are matched against the words in the multiple dictionaries. Specifically, the weight of each word to be matched within the complaint text to be classified is obtained first, and these weights form the first word frequency vector. For example, the sentence "this pair of leather boots is a big size, that size is suitable" is segmented into "this/leather boots/size/big, that/size/suitable", and the term frequency of each word is calculated as its weight. The weights of the words are: this 1, leather boots 1, size 2, big 1, that 1, suitable 1, not 0, small 0, more 0.
Then, for each dictionary, the weights assigned to the words in that dictionary are obtained to form the second word frequency vector corresponding to the dictionary. The multiple second word frequency vectors are thus the word frequency vectors for the different complaint problem categories, one per category, as shown in Table 2 above. According to a preset similarity matching algorithm, the first word frequency vector is then matched for similarity against the second word frequency vector of each dictionary in turn, until a matching second word frequency vector is determined, at which point matching stops and the matching result is obtained.
The similarity calculation may use methods such as KNN (k-Nearest Neighbor), naive Bayes, support vector machines, neural networks, decision trees, or the included angle cosine algorithm. In this embodiment, the preset similarity matching algorithm is the included angle cosine algorithm, which is used as the example below.
The included angle cosine between the first word frequency vector and any second word frequency vector is determined as follows to complete the similarity matching:
If the first word frequency vector is expressed as $A = [A_1, A_2, \ldots, A_n]$ and the second word frequency vector as $B = [B_1, B_2, \ldots, B_n]$, the included angle cosine formula is $\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$. Specifically, the weights calculated above for the sentence "this leather boots number is big, that number is suitable" give the first word frequency vector A = [1, 1, 2, 1, 1, 1, 0, 0, 0]. Suppose the words under a first complaint category are "this / leather boots / number / not / small, that / more / suitable", with the following weights in the dictionary: this 1, leather boots 1, number 1, big 0, that 1, suitable 1, not 1, small 1, more 1. Suppose the words under a second complaint category are "this / BMW / very / have type", with the following weights in the dictionary: this 1, BMW 2, very 0, have type 1. The corresponding second word frequency vectors are then B1 = [1, 1, 1, 0, 1, 1, 1, 1, 1] and B2 = [1, 2, 0, 1], and the included angle cosine of A with each of B1 and B2 is obtained from the formula above. The matching result is the set of values obtained from the included angle cosine formula.
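The included angle cosine can be checked numerically on the example vectors. The following Python sketch is an illustration under stated assumptions, not the patent's implementation: `cosine_similarity` is the standard cosine of two equal-length vectors, and `align` is a hypothetical helper that places two word-to-weight mappings on the union of their vocabularies. The helper is needed because B2 as listed has four components while A has nine, and the text does not say how vectors of different lengths are compared.

```python
import math

def cosine_similarity(a, b):
    """Included angle cosine between two equal-length weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

def align(weights_a, weights_b):
    """Assumption: compare two word->weight mappings on their union vocabulary."""
    vocab = sorted(set(weights_a) | set(weights_b))
    return ([weights_a.get(w, 0) for w in vocab],
            [weights_b.get(w, 0) for w in vocab])

A = [1, 1, 2, 1, 1, 1, 0, 0, 0]
B1 = [1, 1, 1, 0, 1, 1, 1, 1, 1]
sim_b1 = cosine_similarity(A, B1)  # 6 / (3 * sqrt(8)), about 0.707

# Same data as word->weight mappings, so that B2 can be aligned with A.
text_weights = {"this": 1, "leather boots": 1, "number": 2,
                "big": 1, "that": 1, "suitable": 1}
b2_weights = {"this": 1, "BMW": 2, "very": 0, "have type": 1}
sim_b2 = cosine_similarity(*align(text_weights, b2_weights))  # 1 / (3 * sqrt(6)), about 0.136

print(round(sim_b1, 3), round(sim_b2, 3))
```

Under the union-vocabulary assumption, the example sentence is far closer to the first complaint category than to the second, because only the word "this" is shared with the second dictionary.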
Step S130: Determine, according to the matching result, the complaint category to which the complaint text to be classified belongs.
The matching result of the words to be matched against each dictionary is, for example, the similarity between the first word frequency vector and each second word frequency vector obtained in step S120. The multiple similarities obtained are compared. For example, if the similarity between the first word frequency vector A and the second word frequency vector B1 is higher than its similarity with the second word frequency vector B2, the complaint text to be classified is assigned to the first complaint category above, which completes the classification of the complaint text to be classified.
Alternatively, a threshold may be set: if the similarity with a given second word frequency vector reaches the set threshold, the complaint text to be classified is determined to belong to the category corresponding to that word frequency vector.
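The two decision rules described for step S130 can be sketched side by side. In the Python sketch below, the similarity values, the threshold of 0.6 and the function names `classify_by_max` and `classify_by_threshold` are illustrative assumptions; the text does not specify a threshold value.

```python
def classify_by_max(similarities):
    """Return the category whose similarity to the text is highest."""
    return max(similarities, key=similarities.get)

def classify_by_threshold(similarities, threshold):
    """Check categories in order and stop at the first one reaching the threshold."""
    for category, score in similarities.items():
        if score >= threshold:
            return category
    return None  # no category reached the threshold

# Illustrative similarity scores for one complaint text.
similarities = {
    "first complaint category": 0.707,
    "second complaint category": 0.136,
}

print(classify_by_max(similarities))
print(classify_by_threshold(similarities, threshold=0.6))
```

The threshold variant can stop before every dictionary has been compared, which fits the description of matching in turn until a matching second word frequency vector is determined; the maximum variant always compares against all dictionaries.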
For example, the results of classifying complaint texts to be classified by the above method are shown in Table 3 below.
Table 3
It can be seen that establishing dictionaries by the above method and then classifying complaint texts to be classified against those dictionaries achieves high classification accuracy.
The first embodiment of the invention provides a file classification method. The complaint text to be classified is first segmented to obtain multiple words to be matched. The words to be matched are then matched against each of the dictionaries characterizing different complaint problems to obtain a matching result, where the dictionaries are obtained by training on multiple historical complaint texts. The complaint category to which the complaint text to be classified belongs is then determined from the matching result, thereby classifying the complaint text. Because the dictionaries are obtained by training in advance, matching the words to be matched against them yields a more accurate matching result, so the complaint text to be classified can be classified accurately. This gives higher classification accuracy for complaint texts concerning different complaint problems and improves text classification performance.
Second embodiment
Referring to Fig. 3, Fig. 3 is a structural block diagram of a file classification device 200 provided by the second embodiment of the invention. The device is used to perform the file classification method provided by the first embodiment and includes:
A word segmentation processing module 210, configured to perform word segmentation on the complaint text to be classified to obtain multiple words to be matched.
A matching module 220, configured to match the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result.
A classification module 230, configured to determine, according to the matching result, the complaint category to which the complaint text to be classified belongs.
The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
The device further includes:
A dictionary acquisition module, configured to perform word segmentation on each historical complaint text among the multiple historical complaint texts and determine the dictionaries formed by the words characterizing different complaint problems.
A weight distribution module, configured to, for each dictionary, divide the words into semantic sets according to how strongly the semantics of each word in the dictionary are associated with the complaint problem characterized by the dictionary, and assign a corresponding weight range to each semantic set; and a weight determination module, configured to determine the weight of each word from the weight range of the semantic set to which it belongs.
The more strongly a semantic set is associated with the complaint problem, the larger the weights in its assigned weight range.
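The weight distribution and weight determination modules can be pictured with a small lookup. The Python sketch below is purely illustrative: the set names, the numeric ranges, the example words and the choice of the range midpoint as a word's weight are all assumptions, since this passage does not say how a weight is picked within a range.

```python
# Hypothetical semantic sets for one dictionary, from strongest to weakest
# association with the complaint problem. The ranges are illustrative only.
weight_ranges = {
    "strong": (0.7, 1.0),
    "medium": (0.4, 0.7),
    "weak": (0.1, 0.4),
}

# Hypothetical assignment of dictionary words to semantic sets.
semantic_set_of = {"refund": "strong", "delay": "medium", "hello": "weak"}

def word_weight(word):
    """Assumption: use the midpoint of the range of the word's semantic set."""
    low, high = weight_ranges[semantic_set_of[word]]
    return (low + high) / 2

weights = {word: word_weight(word) for word in semantic_set_of}
print(weights)
```

Whatever rule is used inside a range, keeping the ranges ordered by association strength ensures that words more closely tied to the complaint problem contribute more to the second word frequency vector.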
Referring to Fig. 4, the matching module 220 includes:
A first word frequency vector acquisition unit 221, configured to obtain the weight, in the complaint text to be classified, of each of the multiple words to be matched, and to take the weights of the words to be matched as the first word frequency vector.
The first word frequency vector acquisition unit 221 is further configured to use the TF-IDF algorithm to obtain the TF-IDF value of each word to be matched in the complaint text to be classified, take the TF-IDF value of a word to be matched as the weight of that word, and take the weights of the words to be matched as the first word frequency vector.
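TF-IDF weighting can be sketched as follows. The text names the TF-IDF algorithm without fixing a variant, so this Python sketch assumes a common one: term frequency is the word's count divided by the text length, and inverse document frequency is the natural logarithm of the number of texts divided by the number of texts containing the word. The small corpus and the function name `tf_idf_weights` are illustrative.

```python
import math
from collections import Counter

def tf_idf_weights(document, corpus):
    """Return word -> TF-IDF weight for one tokenized text.

    document: list of tokens; corpus: list of token lists that includes document.
    """
    counts = Counter(document)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        tf = count / len(document)
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / df) if df else 0.0
        weights[word] = tf * idf
    return weights

# Illustrative corpus of already segmented texts.
corpus = [
    ["this", "leather boots", "number", "big", "that", "number", "suitable"],
    ["this", "BMW", "very", "have type"],
    ["leather boots", "number", "small"],
]
weights = tf_idf_weights(corpus[0], corpus)
print(weights)
```

In this corpus "big" appears in only one text, so it receives a higher weight than "this", which appears in two; that is the effect TF-IDF is meant to achieve compared with raw word counts.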
A second word frequency vector acquisition unit 222, configured to, for each dictionary, obtain the weights assigned to the words in the dictionary to get the second word frequency vector corresponding to that dictionary.
A matching unit 223, configured to, according to a preset similarity matching algorithm, match the first word frequency vector for similarity against the second word frequency vector of each dictionary in turn, stop matching once a matching second word frequency vector is determined, and obtain the matching result.
The preset similarity matching algorithm is the included angle cosine algorithm, and the matching unit 223 further includes an included angle cosine algorithm unit, configured to determine the included angle cosine between the first word frequency vector and any second word frequency vector as follows to complete the similarity matching:
The first word frequency vector is expressed as $A = [A_1, A_2, \ldots, A_n]$ and the second word frequency vector as $B = [B_1, B_2, \ldots, B_n]$, and similarity matching is performed based on the included angle cosine formula $\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$ to obtain the matching result.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method and is not repeated here.
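To show how the three modules of the device fit together, the following Python sketch composes them into one class. It is a simplified illustration under several assumptions: the class name `FileClassifier` is hypothetical, the segmentation step only splits text that is already delimited with "/" and commas (a real system would use a Chinese word segmenter), raw word counts are used as weights, and vectors are compared on the union of the text and dictionary vocabularies.

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Included angle cosine of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class FileClassifier:
    """Hypothetical composition of modules 210, 220 and 230."""

    def __init__(self, dictionaries):
        # dictionaries: complaint category -> {word: assigned weight}
        self.dictionaries = dictionaries

    def segment(self, text):
        # Stand-in for module 210: the input is assumed to be pre-segmented.
        return [w for w in re.split(r"[/,]\s*", text) if w]

    def match(self, words):
        # Stand-in for module 220: cosine against every dictionary.
        counts = Counter(words)
        result = {}
        for category, weights in self.dictionaries.items():
            vocab = sorted(set(counts) | set(weights))
            a = [counts.get(w, 0) for w in vocab]
            b = [weights.get(w, 0) for w in vocab]
            result[category] = cosine(a, b)
        return result

    def classify(self, text):
        # Stand-in for module 230: pick the best-matching category.
        result = self.match(self.segment(text))
        return max(result, key=result.get)

classifier = FileClassifier({
    "first complaint category": {"this": 1, "leather boots": 1, "number": 1,
                                 "that": 1, "suitable": 1, "not": 1,
                                 "small": 1, "more": 1},
    "second complaint category": {"this": 1, "BMW": 2, "very": 0, "have type": 1},
})
category = classifier.classify("this/leather boots/number/big, that/number/suitable")
print(category)  # first complaint category
```

Keeping segmentation, matching and classification as separate methods mirrors the module split of the device, so any one step (for example, swapping word counts for TF-IDF weights) can be replaced without touching the others.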
In summary, the embodiments of the invention provide a file classification method, device and electronic equipment. The complaint text to be classified is first segmented to obtain multiple words to be matched. The words to be matched are then matched against each of the dictionaries characterizing different complaint problems to obtain a matching result, where the dictionaries are obtained by training on multiple historical complaint texts. The complaint category to which the complaint text to be classified belongs is then determined from the matching result, thereby classifying the complaint text. Because the dictionaries are obtained by training in advance, matching the words to be matched against them yields a more accurate matching result, so the complaint text to be classified can be classified accurately. This gives higher classification accuracy for complaint texts concerning different complaint problems and improves text classification performance.
Corresponding to the file classification method in Fig. 2, an embodiment of the present application further provides an electronic device. As shown in Fig. 5, the device includes a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and executable on the processor 2000, where the processor 2000 implements the steps of the above file classification method when executing the computer program.
Specifically, the memory 1000 and the processor 2000 may be a general-purpose memory and processor and are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, the above file classification method can be performed, so that the probability that two city nodes among multiple city nodes are planned as a combined scenic-spot data point can be seen clearly and intuitively. This improves the tourism data analysis efficiency of industries and enterprises, and further allows urban tourism planning to be guided scientifically and reasonably, promoting the development of the tourism industry.
Corresponding to the file classification method in Fig. 1, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program performs the steps of the above file classification method when run by a processor.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above file classification method can be performed, so that the probability that two city nodes among multiple city nodes are planned as a combined scenic-spot data point can be seen clearly and intuitively. This improves the tourism data analysis efficiency of industries and enterprises, and further allows urban tourism planning to be guided scientifically and reasonably, promoting the development of the tourism industry.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the architecture, functions and operations of possible implementations of devices, methods and computer program products according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the invention and are not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall be covered by the scope of protection of the invention. Therefore, the scope of protection of the invention shall be subject to the scope of protection of the claims.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

Claims (10)

1. A file classification method, characterized in that the method comprises:
performing word segmentation on a complaint text to be classified to obtain multiple words to be matched;
matching the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result;
determining, according to the matching result, the complaint category to which the complaint text to be classified belongs;
wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
2. The method according to claim 1, characterized by further comprising:
performing word segmentation on each historical complaint text among the multiple historical complaint texts, and determining the dictionaries formed by the words characterizing different complaint problems;
for each dictionary, dividing the words into semantic sets according to how strongly the semantics of each word in the dictionary are associated with the complaint problem characterized by the dictionary, and assigning a corresponding weight range to each semantic set; and
determining the weight of each word from the weight range of the semantic set to which it belongs;
wherein the more strongly a semantic set is associated with the complaint problem, the larger the weights in its assigned weight range.
3. The method according to claim 2, characterized in that matching the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result specifically comprises:
obtaining the weight, in the complaint text to be classified, of each of the multiple words to be matched, and taking the weights of the words to be matched as a first word frequency vector;
for each dictionary, obtaining the weights assigned to the words in the dictionary to get a second word frequency vector corresponding to that dictionary;
according to a preset similarity matching algorithm, matching the first word frequency vector for similarity against the second word frequency vector of each dictionary in turn, stopping once a matching second word frequency vector is determined, and obtaining the matching result.
4. The method according to claim 3, characterized in that obtaining the weight, in the complaint text to be classified, of each of the multiple words to be matched specifically comprises:
using the TF-IDF algorithm to obtain the TF-IDF value of each word to be matched in the complaint text to be classified, and taking the TF-IDF value of a word to be matched as the weight of that word.
5. The method according to any one of claims 3-4, characterized in that the preset similarity matching algorithm is the included angle cosine algorithm;
the included angle cosine between the first word frequency vector and any second word frequency vector is determined as follows to complete the similarity matching:
the first word frequency vector is expressed as $A = [A_1, A_2, \ldots, A_n]$ and the second word frequency vector as $B = [B_1, B_2, \ldots, B_n]$, and similarity matching is performed based on the included angle cosine formula $\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$ to obtain the matching result.
6. A file classification device, characterized in that the device comprises:
a word segmentation processing module, configured to perform word segmentation on a complaint text to be classified to obtain multiple words to be matched;
a matching module, configured to match the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result;
a classification module, configured to determine, according to the matching result, the complaint category to which the complaint text to be classified belongs;
wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
7. The device according to claim 6, characterized in that the device further comprises:
a dictionary acquisition module, configured to perform word segmentation on each historical complaint text among the multiple historical complaint texts and determine the dictionaries formed by the words characterizing different complaint problems;
a weight distribution module, configured to, for each dictionary, divide the words into semantic sets according to how strongly the semantics of each word in the dictionary are associated with the complaint problem characterized by the dictionary, and assign a corresponding weight range to each semantic set; and
a weight determination module, configured to determine the weight of each word from the weight range of the semantic set to which it belongs;
wherein the more strongly a semantic set is associated with the complaint problem, the larger the weights in its assigned weight range.
8. The device according to claim 7, characterized in that the matching module comprises:
a first word frequency vector acquisition unit, configured to obtain the weight, in the complaint text to be classified, of each of the multiple words to be matched, and to take the weights of the words to be matched as a first word frequency vector;
a second word frequency vector acquisition unit, configured to, for each dictionary, obtain the weights assigned to the words in the dictionary to get a second word frequency vector corresponding to that dictionary;
a matching unit, configured to, according to a preset similarity matching algorithm, match the first word frequency vector for similarity against the second word frequency vector of each dictionary in turn, stop matching once a matching second word frequency vector is determined, and obtain the matching result.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being coupled to the processor and storing instructions which, when executed by the processor, cause the electronic device to perform the following operations:
performing word segmentation on a complaint text to be classified to obtain multiple words to be matched;
matching the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result;
determining, according to the matching result, the complaint category to which the complaint text to be classified belongs;
wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
10. A readable storage medium, characterized in that the readable storage medium is stored in a computer and includes a plurality of instructions configured to cause the computer to perform the method according to any one of claims 1-5.
CN201711051376.5A 2017-10-31 2017-10-31 A kind of file classifying method, device and electronic equipment Pending CN107844559A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711051376.5A CN107844559A (en) 2017-10-31 2017-10-31 A kind of file classifying method, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711051376.5A CN107844559A (en) 2017-10-31 2017-10-31 A kind of file classifying method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN107844559A true CN107844559A (en) 2018-03-27

Family

ID=61682098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711051376.5A Pending CN107844559A (en) 2017-10-31 2017-10-31 A kind of file classifying method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107844559A (en)

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710627A (en) * 2018-03-29 2018-10-26 广东欧珀移动通信有限公司 The device and electronic equipment that method that feedback information is shown, feedback information are shown
CN108710651A (en) * 2018-05-08 2018-10-26 华南理工大学 A kind of large scale customer complaint data automatic classification method
CN108710651B (en) * 2018-05-08 2022-03-25 华南理工大学 Automatic classification method for large-scale customer complaint data
CN108681819A (en) * 2018-05-21 2018-10-19 中国平安人寿保险股份有限公司 Employee's image grade is classified method, apparatus, computer equipment and storage medium
CN110598200A (en) * 2018-06-13 2019-12-20 北京百度网讯科技有限公司 Semantic recognition method and device
CN110598200B (en) * 2018-06-13 2023-05-23 北京百度网讯科技有限公司 Semantic recognition method and device
CN109146395A (en) * 2018-06-29 2019-01-04 阿里巴巴集团控股有限公司 A kind of method, device and equipment of data processing
CN109146395B (en) * 2018-06-29 2022-04-05 创新先进技术有限公司 Data processing method, device and equipment
CN110705245A (en) * 2018-07-09 2020-01-17 中国移动通信集团有限公司 Method and device for acquiring reference processing scheme and storage medium
CN110705245B (en) * 2018-07-09 2023-04-28 中移(成都)信息通信科技有限公司 Method and device for acquiring reference processing scheme and storage medium
CN109271481A (en) * 2018-08-31 2019-01-25 国网河北省电力有限公司沧州供电分公司 Classification method, system and terminal device for electric power demand information
CN110888977A (en) * 2018-09-05 2020-03-17 广州视源电子科技股份有限公司 Text classification method and device, computer equipment and storage medium
CN110895703A (en) * 2018-09-12 2020-03-20 北京国双科技有限公司 Legal document routing identification method and device
CN109189890A (en) * 2018-09-12 2019-01-11 张连祥 Intelligent coordination and handling system and method for investment-promotion complaints
CN110895703B (en) * 2018-09-12 2023-05-23 北京国双科技有限公司 Legal document case recognition method and device
CN109446321A (en) * 2018-10-11 2019-03-08 深圳前海达闼云端智能科技有限公司 Text classification method, text classification device, terminal and computer readable storage medium
WO2020087774A1 (en) * 2018-10-31 2020-05-07 平安科技(深圳)有限公司 Concept-tree-based intention recognition method and apparatus, and computer device
CN109670843A (en) * 2018-11-12 2019-04-23 平安科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium for complaint business
CN111191445B (en) * 2018-11-15 2024-04-19 京东科技控股股份有限公司 Advertisement text classification method and device
CN111191445A (en) * 2018-11-15 2020-05-22 北京京东金融科技控股有限公司 Advertisement text classification method and device
CN109684467A (en) * 2018-11-16 2019-04-26 北京奇虎科技有限公司 Text classification method and device
CN109684475A (en) * 2018-11-21 2019-04-26 斑马网络技术有限公司 Complaint processing method, device, equipment and storage medium
CN111339290A (en) * 2018-11-30 2020-06-26 北京嘀嘀无限科技发展有限公司 Text classification method and system
CN109815709A (en) * 2018-12-11 2019-05-28 顺丰科技有限公司 Method, device, equipment and storage medium for identifying illegal copies of sensitive information
CN109815709B (en) * 2018-12-11 2023-10-10 顺丰科技有限公司 Method, device, equipment and storage medium for identifying illegal copies of sensitive information
CN109739985A (en) * 2018-12-26 2019-05-10 斑马网络技术有限公司 Automatic document classification method, equipment and storage medium
CN111831286A (en) * 2019-04-12 2020-10-27 中国移动通信集团河南有限公司 User complaint processing method and device
CN111831286B (en) * 2019-04-12 2023-11-14 中国移动通信集团河南有限公司 User complaint processing method and device
CN110134957A (en) * 2019-05-14 2019-08-16 云南电网有限责任公司电力科学研究院 Scientific and technological achievement warehousing method and system based on semantic analysis
CN110134957B (en) * 2019-05-14 2023-06-13 云南电网有限责任公司电力科学研究院 Scientific and technological achievement warehousing method and system based on semantic analysis
CN110347840B (en) * 2019-07-18 2023-06-13 携程计算机技术(上海)有限公司 Prediction method, system, equipment and storage medium for complaint text category
CN110347840A (en) * 2019-07-18 2019-10-18 携程计算机技术(上海)有限公司 Prediction method, system, equipment and storage medium for complaint text category
CN112445910A (en) * 2019-09-02 2021-03-05 上海哔哩哔哩科技有限公司 Information classification method and system
CN112445910B (en) * 2019-09-02 2022-12-27 上海哔哩哔哩科技有限公司 Information classification method and system
CN110728142B (en) * 2019-09-09 2023-12-22 上海斑马来拉物流科技有限公司 Method and device for identifying stream file, computer storage medium and electronic equipment
CN110728142A (en) * 2019-09-09 2020-01-24 上海凯京信达科技集团有限公司 Method and device for identifying running files, computer storage medium and electronic equipment
CN111126055A (en) * 2019-10-28 2020-05-08 国电南瑞科技股份有限公司 Power grid equipment name matching method and system
CN111159355A (en) * 2019-12-31 2020-05-15 中国银行股份有限公司 Customer complaint order processing method and device
CN113111898A (en) * 2020-02-13 2021-07-13 北京明亿科技有限公司 Vehicle type determination method and device based on support vector machine
CN111460098A (en) * 2020-03-27 2020-07-28 深圳价值在线信息科技股份有限公司 Text matching method and device and terminal equipment
CN111460098B (en) * 2020-03-27 2023-08-25 深圳价值在线信息科技股份有限公司 Text matching method and device and terminal equipment
CN111506727B (en) * 2020-04-16 2023-10-03 腾讯科技(深圳)有限公司 Text content category acquisition method, apparatus, computer device and storage medium
CN111506727A (en) * 2020-04-16 2020-08-07 腾讯科技(深圳)有限公司 Text content category acquisition method and device, computer equipment and storage medium
CN113779973A (en) * 2020-06-09 2021-12-10 杭州晨熹多媒体科技有限公司 Text data processing method and device
CN111860657A (en) * 2020-07-23 2020-10-30 中国建设银行股份有限公司 Image classification method and device, electronic equipment and storage medium
CN112507113A (en) * 2020-09-18 2021-03-16 青岛海洋科学与技术国家实验室发展中心 Ocean big data text classification method and system
CN112184027B (en) * 2020-09-29 2023-12-26 壹链盟生态科技有限公司 Task progress updating method, device and storage medium
CN112184027A (en) * 2020-09-29 2021-01-05 壹链盟生态科技有限公司 Task progress updating method and device and storage medium
CN112463929A (en) * 2020-12-11 2021-03-09 广东电网有限责任公司佛山供电局 Automatic classification method of fault information
CN112528673A (en) * 2020-12-14 2021-03-19 中国联合网络通信集团有限公司 Text batch processing method, system, terminal equipment and computer storage medium
CN113010669A (en) * 2020-12-24 2021-06-22 华戎信息产业有限公司 News classification method and system
CN112528031A (en) * 2021-02-09 2021-03-19 中关村科学城城市大脑股份有限公司 Work order intelligent distribution method and system
CN112989761A (en) * 2021-05-20 2021-06-18 腾讯科技(深圳)有限公司 Text classification method and device
CN113705200B (en) * 2021-08-31 2023-09-15 中国平安财产保险股份有限公司 Analysis method, analysis device, analysis equipment and analysis storage medium for complaint behavior data
CN113705200A (en) * 2021-08-31 2021-11-26 中国平安财产保险股份有限公司 Method, device and equipment for analyzing complaint behavior data and storage medium
CN113934848A (en) * 2021-10-22 2022-01-14 马上消费金融股份有限公司 Data classification method and device and electronic equipment
CN114090620A (en) * 2022-01-19 2022-02-25 支付宝(杭州)信息技术有限公司 Query request processing method and device

Similar Documents

Publication Publication Date Title
CN107844559A (en) A kind of file classifying method, device and electronic equipment
CN107609121B (en) News text classification method based on LDA and word2vec algorithm
WO2019214245A1 (en) Information pushing method and apparatus, and terminal device and storage medium
CN101794311B (en) Fuzzy data mining based automatic classification method of Chinese web pages
CN106649818B (en) Application search intention identification method and device, application search method and server
Li et al. Twiner: named entity recognition in targeted twitter stream
CN109815314B (en) Intent recognition method, recognition device and computer readable storage medium
US10565233B2 (en) Suffix tree similarity measure for document clustering
CN106776574B (en) User comment text mining method and device
CN106156372B (en) Internet site classification method and device
CN111797239B (en) Application program classification method and device and terminal equipment
CN103886108B (en) Feature selection and weight calculation method for an unbalanced text set
CN102194013A (en) Domain-knowledge-based short text classification method and text classification system
CN110543595B (en) In-station searching system and method
CN109885688A (en) File classification method, device, computer readable storage medium and electronic equipment
CN111767716A (en) Method and device for determining enterprise multilevel industry information and computer equipment
CN109840532A (en) K-means-based similar-case recommendation method for courts
CN113254643B (en) Text classification method and device, electronic equipment and text classification program
CN114003721A (en) Construction method, device and application of dispute event type classification model
CN111090994A (en) Chinese-internet-forum-text-oriented event place attribution province identification method
CN110910175A (en) Tourist ticket product portrait generation method
CN111753526A (en) Similar competitive product data analysis method and system
CN108681977A (en) Lawyer information processing method and system
CN108614860A (en) Lawyer information processing method and system
Guadie et al. Amharic text summarization for news items posted on social media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing

Applicant after: Guoxin Youe Data Co., Ltd.

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 2018-03-27