CN107844559A - A kind of file classifying method, device and electronic equipment - Google Patents
- Publication number
- CN107844559A (CN201711051376.5A)
- Authority
- CN
- China
- Prior art keywords
- word
- dictionary
- complaint
- matched
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
Embodiments of the present invention provide a document classification method, device and electronic equipment, belonging to the technical field of data processing. The method includes: performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; matching the multiple words to be matched against dictionaries that characterize different complaint problems, respectively, to obtain a matching result; and determining, according to the matching result, the complaint category to which the complaint text to be classified belongs, thereby classifying that text. The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts. Because the dictionaries are obtained by training in advance, the words to be matched can be matched against them to give a more accurate matching result, so the complaint text to be classified can be classified accurately. Complaint texts concerning different complaint problems are thus classified with higher precision, improving text classification performance.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a document classification method, device and electronic equipment.
Background
With the development of computer technology, more and more enterprises, organizations and government agencies rely on computers to handle all kinds of affairs, and in the process they continuously produce large numbers of electronic documents. In daily work or archive management, these electronic documents often need to be divided into specific categories. However, now that data volume is growing explosively, some enterprises may produce several TB of data in a single day, corresponding to thousands of electronic documents, and identifying and managing them manually is undoubtedly inefficient. Computer-implemented automatic classification has brought great convenience, but because text classification data is high-dimensional and highly sparse, text classification performance still cannot meet practical needs and leaves much room for improvement.
With the rapid development of e-government, the focus of government website construction has shifted from mainly publishing the news and information resources of government departments, as in the early stage of construction, to improving the government's supervisory function and service level. Starting from the actual work of the website, this means establishing a standardized working system for government websites and raising their service awareness and capacity to handle affairs; strengthening cooperation between websites and government affairs and expanding interaction between government websites and the public; and establishing an efficient complaint system to strengthen supervision. A large amount of complaint and suggestion text arrives every day, so how to classify complaint text quickly and accurately is a problem in urgent need of a solution.
Summary of the invention
In view of this, the purpose of embodiments of the present invention is to provide a document classification method, device and electronic equipment that can effectively solve the prior-art problem of low accuracy in classifying complaint text.
In a first aspect, an embodiment of the present invention provides a document classification method, the method including: performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; matching the multiple words to be matched against dictionaries characterizing different complaint problems, respectively, to obtain a matching result; and determining, according to the matching result, the complaint category to which the complaint text to be classified belongs; wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
In a second aspect, an embodiment of the present invention provides a document classification device, the device including: a word segmentation module for performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; a matching module for matching the multiple words to be matched against dictionaries characterizing different complaint problems, respectively, to obtain a matching result; and a classification module for determining, according to the matching result, the complaint category to which the complaint text to be classified belongs; wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
In a third aspect, an embodiment of the present invention provides electronic equipment including a processor and a memory, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the electronic equipment to perform the following operations: performing word segmentation on a complaint text to be classified to obtain multiple words to be matched; matching the multiple words to be matched against dictionaries characterizing different complaint problems, respectively, to obtain a matching result; and determining, according to the matching result, the complaint category to which the complaint text to be classified belongs; wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium, characterized in that the readable storage medium is stored in a computer and includes a plurality of instructions configured to cause the computer to perform the document classification method provided in the first aspect.
Embodiments of the present invention provide a document classification method, device and electronic equipment. Word segmentation is first performed on a complaint text to be classified to obtain multiple words to be matched. The words to be matched are then matched against dictionaries characterizing different complaint problems, respectively, to obtain a matching result, where the dictionaries are obtained by training on multiple historical complaint texts. The complaint category to which the complaint text belongs is then determined according to the matching result, thereby classifying the text. Because the dictionaries are obtained by training in advance, matching the words to be matched against them gives a more accurate matching result, so the complaint text can be classified accurately; complaint texts concerning different complaint problems are classified with higher precision, improving text classification performance.
Other features and advantages of the present invention will be set forth in the description that follows, and will in part be apparent from the description or be understood by practicing the embodiments of the present invention. The objectives and other advantages of the present invention can be realized and obtained through the structures particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a structural block diagram of electronic equipment applicable to embodiments of the present invention;
Fig. 2 is a flow chart of a document classification method provided by the first embodiment of the present invention;
Fig. 3 is a structural block diagram of a document classification device provided by the second embodiment of the present invention;
Fig. 4 is a structural block diagram of a matching module provided by the second embodiment of the present invention;
Fig. 5 is a structural schematic diagram of another electronic equipment provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it need not be defined or explained again in subsequent drawings. In the description of the present invention, the terms "first", "second" and so on are used only to distinguish items and are not to be understood as indicating or implying relative importance.
Fig. 1 shows a structural block diagram of electronic equipment 100 applicable to embodiments of the present invention. As shown in Fig. 1, the electronic equipment 100 includes a memory 101, a memory controller 102, one or more processors 103 (only one is shown in the figure), a peripheral interface 104, a radio frequency module 105, an audio module 106, a touch screen 107 and so on. These components communicate with one another over one or more communication buses/signal lines 108.
The memory 101 can be used to store software programs and modules, such as the program instructions/modules corresponding to the document classification method in the embodiments of the present invention. The processor 103 runs the software programs and modules stored in the memory 101 to execute various functional applications and data processing, such as the document classification method provided by the embodiments of the present invention.
The memory 101 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. Access to the memory 101 by the processor 103 and other possible components can be carried out under the control of the memory controller 102.
The peripheral interface 104 couples various input/output devices to the processor 103 and the memory 101. In some embodiments, the peripheral interface 104, the processor 103 and the memory controller 102 can be implemented in a single chip. In other embodiments, they can each be implemented by a separate chip.
The radio frequency module 105 receives and sends electromagnetic waves and converts between electromagnetic waves and electrical signals, so as to communicate with a communication network or other equipment.
The audio module 106 provides an audio interface to the user, and may include one or more microphones, one or more loudspeakers and an audio circuit.
The touch screen 107 provides both an output interface and an input interface between the electronic equipment 100 and the user. Specifically, the touch screen 107 displays video output to the user, and the content of this video output may include text, graphics, video and any combination thereof.
It should be understood that the structure shown in Fig. 1 is only illustrative; the electronic equipment 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 can be implemented in hardware, software or a combination of the two.
First embodiment
Refer to Fig. 2, which is a flow chart of a document classification method provided by the first embodiment of the present invention. The method is applied to a document classification device that runs on the electronic equipment described above, and the method includes:
Step S110: Perform word segmentation on the complaint text to be classified to obtain multiple words to be matched.
For an electronic document, "keywords" can be used to represent all the features involved in analyzing and understanding the document, for example "taxi", "carpooling" and "taximeter". Different entities, such as banks, government agencies and ordinary enterprises, may of course rely on different keywords when determining the category of an electronic document; when classifying the electronic documents of a particular enterprise, these keywords can be predefined based on experience.
For example, when a government agency needs to classify the electronic documents of multiple user complaints it has received, the electronic documents are first preprocessed, that is, word segmentation is performed on the obtained complaint text to be classified, which is the electronic document mentioned above. To segment the complaint text to be classified, the smallest semantic units in it must first be identified. As one implementation, the Chinese word segmentation that ships with the Lucene search engine can be used. Lucene has its own analyzers for Chinese, chiefly StandardAnalyzer and CJKAnalyzer: StandardAnalyzer segments character by character, while CJKAnalyzer segments into two-character units (bigrams).
Chinese word segmentation in the Lucene search engine is most commonly based on string matching, and on this basis there is a forward maximum matching segmentation algorithm. Its idea is to prepare a segmentation dictionary and then scan the input sentence from left to right, matching the character strings in the sentence against the dictionary entries one by one. The matching field starts from a single character and grows by one character at a time until matching can go no further; at the end of each round, the longest field that matched successfully is taken as the result. For example, suppose a sentence scanned in the complaint text to be classified reads "today's weather is gloomy" and the dictionary contains the words "today", "weather", "day" and "gloomy". Scanning starts from the first character of "today" and proceeds through the sentence, taking successively longer candidate strings, from that single character up to the whole sentence, and matching each against the dictionary. The longest string matched in the dictionary is "today", so that word is split off; scanning then resumes from the following character, "day", and the operation is repeated, giving the result "today/weather/gloomy/".
Each word is then tagged with its part of speech, where nouns, verbs, numerals, adjectives, prepositions, auxiliary words, conjunctions and punctuation are marked n, v, m, a, p, u, c and wp respectively; for example, "today" is tagged as a noun. The set (today, weather, gloomy) is then taken as the initial word set. For the accuracy of subsequent matching, words in the initial word set that are common but carry little meaning, called stop words (for example "is" and similar words), also need to be deleted. After the stop words are removed, the words obtained are: today, weather, gloomy. These serve as the words to be matched obtained by segmenting one sentence of the complaint text to be classified, and in the same way the multiple words to be matched for the entire complaint text can be obtained.
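The forward maximum matching scan and stop-word filtering described above can be sketched as follows. This is a minimal illustration rather than Lucene's implementation: the function names, the `max_len` parameter, the sample Chinese sentence (a guess at the example behind the translated "today's weather is gloomy") and the stop-word set are all assumptions.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Segment `sentence` left to right, taking the longest dictionary word
    that starts at the current position (forward maximum matching)."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        best = sentence[i]  # fall back to a single character if nothing matches
        # Grow the candidate field one character at a time, remembering the
        # longest candidate found in the dictionary.
        for j in range(i + 1, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in dictionary:
                best = sentence[i:j]
        tokens.append(best)
        i += len(best)
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop common, low-information words before matching."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical data: dictionary entries for "today", "weather", "day", "gloomy".
sample_dictionary = {"今天", "天气", "天", "阴沉沉"}
sample_tokens = forward_max_match("今天天气阴沉沉", sample_dictionary)
words_to_match = remove_stop_words(sample_tokens, stop_words={"是"})
```

Capping the candidate length at the longest dictionary entry avoids testing prefixes that can never match; the segmentation result is the same as scanning every prefix up to the end of the sentence.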
Step S120: Match the multiple words to be matched against the dictionaries characterizing different complaint problems, respectively, to obtain a matching result.
The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts. Each dictionary corresponds to a different complaint problem category, and the weight of each word in each dictionary lies within a preset weight range.
For example, when classifying government complaint documents, classification can follow the different complaint problems, such as taxi problems, communication media problems, bus problems and parking problems. Multiple keywords are then predefined for each complaint problem: for taxi problems the keywords might be taxi, ride sharing, taximeter, hailing a taxi, price increase and similar words; for communication media problems they might be broadband, telephone, Unicom, China Unicom, dialing and similar words. These keywords are the words in the semantic sets determined below.
The multiple historical complaint texts obtained are then used for training. First, each historical complaint text is segmented as described above to determine the dictionaries formed by the words that characterize the different complaint problems. For each dictionary, the words are divided into multiple semantic sets according to the semantics of each word in the dictionary and how strongly it is associated with the complaint problem that the dictionary characterizes, and each semantic set is assigned a corresponding weight range; each semantic set may include multiple keywords under the different complaint problem categories. A weight is then determined for each word from the weight range of the semantic set it belongs to, where a semantic set more strongly associated with the complaint problem is assigned a weight range with larger weights.
For example, in the taxi problem category, semantic set 1 is (taxi, ride sharing, taximeter) and semantic set 2 is (hailing a taxi, price increase). The words in semantic set 1 are the most strongly associated with the taxi problem, so semantic set 1 can be assigned the weight range 0.9-0.98, while semantic set 2 is assigned 0.8-0.89. If the computed weight of a word in semantic set 1 is not within 0.9-0.98, the computed weight is probably inaccurate and may ultimately lead to a misclassification, so the weight of that word can be reassigned. For instance, if the computed weight of "taxi" is 0.85, which is outside 0.9-0.98, "taxi" is given a new weight that lies within 0.9-0.98. As one approach, a weight can be selected at random from the preset weight range 0.9-0.98 and assigned to "taxi" as its new weight; if 0.95 is selected, the new weight of "taxi" is redefined as 0.95.
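A minimal sketch of this reassignment rule, assuming the weight is redrawn uniformly from the preset range; the function name and the choice of a uniform distribution are illustrative assumptions, since the text only says the new weight is selected at random.

```python
import random


def enforce_weight_range(weight, low, high, rng=random):
    """Keep `weight` if it lies in [low, high]; otherwise replace it with a
    weight drawn at random from that range (uniform draw is an assumption)."""
    if low <= weight <= high:
        return weight
    return rng.uniform(low, high)


# "taxi" was computed as 0.85, outside the 0.9-0.98 range of its semantic set.
new_taxi_weight = enforce_weight_range(0.85, 0.9, 0.98)
```

Passing a seeded `random.Random` instance as `rng` makes the reassignment reproducible, which helps when a dictionary has to be rebuilt identically.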
In addition, a weight range can be assigned in advance to each individual word in a determined semantic set. For example, in the taxi problem category, the word "taxi" can be regarded as one that occurs with high frequency in such problems, so it can be assigned a larger weight range such as 0.9-0.98, while the word "carpooling" is assigned the range 0.87-0.89. If the computed weight of "taxi" is then 0.85, its weight is not within the preset weight range, which probably means the computed weight is inaccurate and may ultimately lead to a misclassification. The weight of "taxi" can therefore be reassigned so that its new weight lies within the preset range 0.9-0.98. As one approach, a weight can be selected at random from the preset weight range and assigned to "taxi" as its new weight; if 0.95 is selected, the new weight of "taxi" is redefined as 0.95.
As another implementation, a computation rule can be set. For example, if the computed weight of "taxi" is not within the preset weight range, a preset value is added to the current weight of "taxi" to give a new weight, so that the new weight falls within the preset range. The preset value can be set small, such as 0.1 or 0.05. If the new weight obtained after adding the preset value is still not within the preset weight range, the preset value is added to the new weight again, and so on until the resulting weight lies within the preset range.
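The additive rule can be sketched as a small loop. The function name is hypothetical, and two behaviours are assumptions not covered by the text: the sum is rounded to suppress floating-point drift, and an error is raised if a step carries the weight past the upper bound (or if the weight starts above it).

```python
def raise_into_range(weight, low, high, step=0.05):
    """Add `step` repeatedly until `weight` reaches the preset range [low, high]."""
    if step <= 0:
        raise ValueError("step must be positive")
    while weight < low:
        # Rounding keeps repeated float additions from drifting (assumption).
        weight = round(weight + step, 10)
    if weight > high:
        # The source does not say what to do here; failing loudly is an assumption.
        raise ValueError("weight cannot be brought into the preset range")
    return weight


adjusted = raise_into_range(0.85, 0.9, 0.98, step=0.05)
```

A step smaller than the width of the range guarantees that an initially low weight lands inside the range rather than jumping over it.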
Alternatively, the dictionaries formed by the words characterizing the different complaint problems can first be determined from the historical complaint texts without assigning any weight to the words at that point. Each semantic set is assigned its corresponding weight range, and a weight is then determined for each word from the weight range of the semantic set it belongs to. For example, if semantic set 1 (taxi, ride sharing, taximeter) is assigned the weight range 0.9-0.98, each word in the set is randomly assigned a weight within 0.9-0.98, such as 0.97 for "taxi", 0.95 for "ride sharing" and 0.9 for "taximeter".
By the above methods, the new weight of each word in the semantic sets can be obtained. Multiple dictionaries are then established for the different categories, such as the taxi problem and the communication media problem described above: one dictionary is established under each category, and each dictionary contains multiple words and their corresponding new weights.
In this embodiment, the TF-IDF algorithm can be used to obtain the TF-IDF value of each word to be matched in the complaint text to be classified, and the TF-IDF value of a word to be matched is used as its weight.
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. It is a statistical method for assessing how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus.
The main idea of TF-IDF is that if a word or phrase occurs with a high frequency TF in one article and rarely occurs in other articles, it is considered to have good power to distinguish between categories and to be suitable for classification. TF-IDF is in fact TF*IDF, where TF is the term frequency and IDF is the inverse document frequency. TF represents the frequency with which an entry occurs in document d. The main idea of IDF is that the fewer the documents containing entry t, the larger the IDF, which indicates that entry t has good power to distinguish between categories. If the number of documents containing entry t in some class C is m, and the total number of documents containing t in the other classes is k, then the number of all documents containing t is clearly n = m + k; when m is large, n is also large, and the IDF value given by the IDF formula is small, which indicates that entry t distinguishes between categories poorly. In practical applications, if an entry occurs frequently in the documents of one class, the entry represents the features of the texts of that class well; such entries should be given higher weights and selected as feature words of that class of text to distinguish it from documents of other classes.
Specifically, to obtain the TF-IDF value of each word, first compute the term frequency TF of each word of the complaint text to be classified within the text it belongs to: TF = number of occurrences of the word in that text / total number of words in that text. The formula is

$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$

where $n_{i,j}$ is the number of times the word occurs in the text it belongs to, and the denominator is the sum of the occurrence counts of all words in that text. For example, if the word "taxi" occurs 300 times in the complaint text it belongs to and the total number of words in that text is 1200, the term frequency of "taxi" is TF = 300/1200 = 0.25. Next, obtain the inverse document frequency IDF of each word: IDF = log(total number of documents in the corpus / (number of documents containing the word + 1)). The formula is

$$\mathrm{IDF}_{i} = \log\frac{|D|}{1 + |\{j : t_i \in d_j\}|}$$

where $|D|$ is the total number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents containing the word. Finally, the TF-IDF value of each word is obtained from its term frequency TF and inverse document frequency IDF: TF-IDF value = TF * IDF.
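The TF, IDF and TF-IDF definitions above translate directly into a few lines of Python. This is a sketch under stated assumptions: documents are represented as lists of tokens, the function names are illustrative, and the natural logarithm is used because the text does not specify the base.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """TF of each word: occurrences in the text / total words in the text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequency(word, corpus):
    """IDF = log(|D| / (number of documents containing the word + 1))."""
    containing = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / (containing + 1))


def tf_idf(tokens, corpus):
    """TF-IDF value of every distinct word in `tokens`."""
    tf = term_frequencies(tokens)
    return {w: tf[w] * inverse_document_frequency(w, corpus) for w in tf}


# Worked example from the text: "taxi" occurs 300 times in a 1200-word text.
example_tokens = ["taxi"] * 300 + ["other"] * 900
example_tf = term_frequencies(example_tokens)["taxi"]  # 300 / 1200 = 0.25
```

With the +1 in the denominator, a word that occurs in every document of the corpus gets a slightly negative IDF, so very common words end up with the lowest weights.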
In this way the TF-IDF value of each word to be matched in the complaint text to be classified can be obtained. Likewise, for a historical complaint text, the TF-IDF value of each word in each dictionary can be obtained by the same method, and the words in that historical complaint text are sorted in descending order of TF-IDF value. As one approach, the top 100 words of each historical complaint text can be taken as a semantic set to form the dictionary.
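Selecting the highest-ranked words is a sort followed by a cut-off. The sketch below assumes the scores are held in a dict mapping each word to its TF-IDF value; the function name and the sample scores are hypothetical.

```python
def top_words(scores, k=100):
    """Return the `k` words with the highest TF-IDF values, in descending order."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores for one historical complaint text.
sample_scores = {"taxi": 0.31, "taximeter": 0.22, "driver": 0.12, "today": 0.01}
semantic_set = top_words(sample_scores, k=3)
```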
Refer to Table 1 and Table 2, which concern the multiple dictionaries that a government agency established for different complaint problems: Table 1 shows the multiple historical complaint texts obtained, and Table 2 shows the multiple dictionaries established based on the different complaint problem categories.
Table 1
Table 2
The multiple words to be matched obtained above are then matched against each of the dictionaries established above, that is, the words to be matched obtained by segmenting the complaint text to be classified are matched against the words in the multiple dictionaries. Specifically, the weight of each word to be matched in the complaint text to be classified is obtained first, and the weights of the words to be matched are taken as the first word frequency vector. For example, the sentence "this leather boots number is big, and that number is suitable" is segmented into "this / leather boots / number / big, that / number / suitable", and the word frequency, i.e. the weight, of each word is computed. The weights of the words are: this 1, leather boots 1, number 2, big 1, that 1, suitable 1, not 0, small 0, more 0.
Then, for each dictionary, the weights assigned to the words in that dictionary are obtained, giving the second word frequency vector corresponding to that dictionary. The multiple second word frequency vectors are thus the word frequency vectors for the different complaint problem categories, one per category, as shown in Table 2 above. According to a preset similarity matching algorithm, the first word frequency vector is then matched for similarity against the second word frequency vector of each dictionary in turn; matching stops once a matching second word frequency vector has been determined, and the matching result is obtained.
The similarity can be computed with methods such as KNN (k-nearest neighbor), naive Bayes, support vector machines, neural networks, decision trees or the cosine of the included angle. In this embodiment, the preset similarity matching algorithm is the cosine algorithm, which is used as the example below.
The cosine of the angle between the first word frequency vector and any second word frequency vector is determined as follows to complete the similarity matching:
If it is A=[A by the first word frequency vector representation1,A2...An], the second word frequency vector representation is B=[B1,
B2...Bn], included angle cosine formula isSpecifically, if such as
The weight of each word in the above-mentioned sentence " this leather boots number is big, and that number is suitable " calculated, as the first word frequency to
Measure A=[1,1,2,1,1,1,0,0,0], if a certain first complain the lower word of classification be " this/small leather boots/number/or not, that
Only/more/suitable ", its each self-corresponding weight in dictionary is:This 1, leather boots 1, number 1 is big by 0, that 1, suitable 1, no
1, it is small by 1, more 1, if it is " this/BMW/very/have type " to complain the word under classification another second, its in dictionary each
Corresponding weight is:This 1, BMW 2, very 0, there is type 1;Then corresponding second word frequency vector can be B1=[1,1,1,0,
1,1,1,1,1] and B2=[1,2,0,1], then tried to achieve respectively according to above-mentioned included angle cosine formulaSo, it can be deduced that matching result is above-mentioned
The value tried to achieve according to included angle cosine formula.
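As a worked illustration of the included-angle cosine step, the following minimal Python sketch evaluates the standard cosine formula on the vectors A and B1 from the example. The function name `cosine_similarity` and the handling of all-zero vectors are assumptions made for illustration and are not taken from the embodiment. B2 is left out because it has four components while A has nine, and the text does not say how vectors of different lengths are aligned.

```python
import math


def cosine_similarity(a, b):
    """Included-angle cosine of two equal-length weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors from the example in the description.
A = [1, 1, 2, 1, 1, 1, 0, 0, 0]   # first word frequency vector
B1 = [1, 1, 1, 0, 1, 1, 1, 1, 1]  # second word frequency vector, first class

sim_a_b1 = cosine_similarity(A, B1)
print(round(sim_a_b1, 4))  # 0.7071
```

Here the dot product is 6, the norm of A is 3 and the norm of B1 is the square root of 8, which gives roughly 0.7071.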
Step S130: Determine, according to the matching result, the complaint class to which the complaint text to be classified belongs.
The matching result of matching the multiple words to be matched against each dictionary is the similarity between the first word frequency vector and each second word frequency vector obtained in step S120. The multiple similarities obtained are compared. For example, if the similarity between the first word frequency vector A and the second word frequency vector B1 is higher than its similarity with the second word frequency vector B2, the complaint text to be classified is assigned to the first complaint class, which completes the classification of the complaint text to be classified.
Alternatively, a threshold may be set: if the similarity obtained with a certain word frequency vector reaches the set threshold, the complaint text to be classified is determined to belong to the class corresponding to that word frequency vector.
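The two decision rules of step S130 (pick the class with the highest similarity, or stop at the first class whose similarity reaches a threshold) can be sketched as follows. This is an illustrative sketch only: the names `cosine` and `classify`, the class labels, and the threshold values are hypothetical. The sketch also assumes that vectors are aligned by word, so that a word absent from a dictionary has weight 0; the description does not state how vectors of different lengths such as A and B2 are aligned.

```python
import math


def cosine(weights_a, weights_b):
    """Cosine of two word-to-weight mappings; a missing word counts as weight 0."""
    dot = sum(w * weights_b.get(word, 0) for word, w in weights_a.items())
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text_weights, dictionaries, threshold=None):
    """Return (category, similarity).

    Without a threshold, every dictionary is compared and the best one wins.
    With a threshold, matching stops at the first dictionary that reaches it;
    if none does, the category is None.
    """
    best_category, best_sim = None, -1.0
    for category, dict_weights in dictionaries.items():
        sim = cosine(text_weights, dict_weights)
        if threshold is not None and sim >= threshold:
            return category, sim
        if sim > best_sim:
            best_category, best_sim = category, sim
    if threshold is not None:
        return None, best_sim
    return best_category, best_sim


# Word weights taken from the example in the description.
text_weights = {"this": 1, "leather boots": 1, "number": 2,
                "big": 1, "that": 1, "suitable": 1}
dictionaries = {
    "first complaint class": {"this": 1, "leather boots": 1, "number": 1,
                              "big": 0, "that": 1, "suitable": 1,
                              "not": 1, "small": 1, "more": 1},
    "second complaint class": {"this": 1, "BMW": 2, "very": 0, "stylish": 1},
}

category, similarity = classify(text_weights, dictionaries)
print(category, round(similarity, 4))
```

Under the word-alignment assumption the first complaint class scores higher than the second, which agrees with the comparison of B1 and B2 described in the text.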
For example, the result of classifying complaint texts to be classified with the above method is shown in Table 3 below.
Table 3
It can be seen that building the dictionaries with the above method and then classifying the complaint texts to be classified yields higher classification accuracy.
The first embodiment of the invention provides a file classifying method. First, the complaint text to be classified is subjected to word segmentation to obtain multiple words to be matched. The multiple words to be matched are then matched against each of the dictionaries characterizing different complaint problems to obtain a matching result, where the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts. Finally, the complaint class to which the complaint text to be classified belongs is determined according to the matching result, so as to classify the complaint text. Because the method uses multiple dictionaries obtained by training in advance, the words to be matched can be matched against the dictionaries to obtain a more accurate matching result, the complaint text to be classified can be classified accurately, complaint texts for different complaint problems are classified with higher accuracy, and the performance of text classification is improved.
Second embodiment
Referring to Fig. 3, Fig. 3 is a structural block diagram of a file classifying device 200 provided by the second embodiment of the invention. The device is used to perform the file classifying method provided by the first embodiment, and includes:
A word segmentation processing module 210, configured to perform word segmentation on the complaint text to be classified to obtain multiple words to be matched.
A matching module 220, configured to match the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result.
A classification module 230, configured to determine, according to the matching result, the complaint class to which the complaint text to be classified belongs.
The dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
The device further includes:
A dictionary acquisition module, configured to perform word segmentation on each historical complaint text among the multiple historical complaint texts and to determine the dictionaries formed by the words characterizing different complaint problems.
A weight distribution module, configured to, for each dictionary, divide the words into semantic sets according to how closely the semantics of each word in the dictionary are associated with the complaint problem characterized by the dictionary, and to assign a corresponding weight range to each semantic set; and a weight determination module, configured to determine the weight of each word from the weight range corresponding to the semantic set to which the word belongs.
The higher the degree of association between a semantic set and the complaint problem, the larger the weights in the weight range assigned to that semantic set.
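The relationship between semantic sets, weight ranges and word weights described for the weight distribution and weight determination modules can be illustrated with the hypothetical Python sketch below. The set names, the numeric ranges, and the choice of the midpoint of a range as the word weight are all assumptions made for illustration; the embodiment only requires that a more closely associated semantic set receive a range of larger weights and that each word take a weight from the range of its set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticSet:
    """A group of words sharing one weight range (all values are hypothetical)."""
    name: str
    low: float
    high: float


# Assumption: two sets, the more closely associated one has the larger range.
STRONG = SemanticSet("strongly associated", 2.0, 3.0)
WEAK = SemanticSet("weakly associated", 0.0, 1.0)


def word_weight(semantic_set):
    # Assumption: take the midpoint of the range; the description does not
    # say how a weight is chosen within the range.
    return (semantic_set.low + semantic_set.high) / 2


def build_dictionary(word_to_set):
    """Map every word of one complaint-problem dictionary to its weight."""
    return {word: word_weight(s) for word, s in word_to_set.items()}


# Hypothetical dictionary for a size-related complaint problem.
size_dictionary = build_dictionary({
    "number": STRONG,
    "small": STRONG,
    "this": WEAK,
})
print(size_dictionary)
```

With this choice, words in the strongly associated set always outweigh words in the weakly associated set, which is the ordering the modules are meant to guarantee.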
Referring to Fig. 4, the matching module 220 includes:
A first word frequency vector acquiring unit 221, configured to obtain the weight, in the complaint text to be classified, of each word to be matched among the multiple words to be matched, and to take the weights of the words to be matched as the first word frequency vector.
The first word frequency vector acquiring unit 221 is further configured to obtain the TF-IDF value of each word to be matched in the complaint text to be classified using the TF-IDF algorithm, to take the TF-IDF value of a word to be matched as the weight of that word, and to take the weights of the words to be matched as the first word frequency vector.
A second word frequency vector acquiring unit 222, configured to, for each dictionary, retrieve the weight assigned to each word in the dictionary and obtain the second word frequency vector corresponding to the dictionary.
A matching unit 223, configured to match, according to the preset similarity matching algorithm, the first word frequency vector for similarity against the second word frequency vector of each dictionary in turn, to stop matching once a matching second word frequency vector is determined, and to obtain the matching result.
The preset similarity matching algorithm is the included-angle cosine algorithm, and the matching unit 223 further includes an included-angle cosine algorithm unit, configured to determine the included-angle cosine between the first word frequency vector and any second word frequency vector as follows, completing the similarity matching:
The first word frequency vector is expressed as A = [A_1, A_2, ..., A_n] and the second word frequency vector as B = [B_1, B_2, ..., B_n]. Similarity matching is performed based on the included-angle cosine formula
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
to obtain the matching result.
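The TF-IDF weighting performed by the first word frequency vector acquiring unit 221 can be sketched as below. The description does not specify which TF-IDF variant is used or which corpus supplies the document frequencies, so this sketch assumes that the historical complaint texts serve as the corpus and uses a smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, so that the weights stay positive. The function name `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(tokens, corpus):
    """TF-IDF weight of each word in `tokens`.

    tokens: segmented words of the complaint text to be classified.
    corpus: list of token lists (assumed here to be historical complaint texts).
    """
    counts = Counter(tokens)
    total = len(tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        tf = count / total
        df = sum(1 for doc in corpus if word in doc)
        # Assumption: smoothed IDF, which stays positive for every word.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = tf * idf
    return weights


# Hypothetical segmented texts.
history = [["number", "small"], ["BMW", "stylish"], ["number", "suitable"]]
weights = tf_idf(["number", "number", "big"], history)
print(weights)
```

The resulting mapping from word to weight plays the role of the first word frequency vector and can be compared with each dictionary's second word frequency vector by the included-angle cosine.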
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method, and is not repeated here.
In summary, the embodiments of the invention provide a file classifying method, device and electronic equipment. First, the complaint text to be classified is subjected to word segmentation to obtain multiple words to be matched. The multiple words to be matched are then matched against each of the dictionaries characterizing different complaint problems to obtain a matching result, where the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts. Finally, the complaint class to which the complaint text to be classified belongs is determined according to the matching result, so as to classify the complaint text. Because the method uses multiple dictionaries obtained by training in advance, the words to be matched can be matched against the dictionaries to obtain a more accurate matching result, the complaint text to be classified can be classified accurately, complaint texts for different complaint problems are classified with higher accuracy, and the performance of text classification is improved.
Corresponding to the file classifying method in Fig. 2, the embodiment of the present application further provides an electronic device. As shown in Fig. 5, the device includes a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and executable on the processor 2000, where the processor 2000 implements the steps of the above file classifying method when executing the computer program.
Specifically, the memory 1000 and the processor 2000 may be a general-purpose memory and processor, and are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, the above file classifying method can be performed, so that the probability that two urban nodes among multiple urban nodes are planned as a combined scenic-spot data point can be seen clearly and intuitively, thereby improving the tourism data analysis efficiency of industries and enterprises, and further providing scientific and reasonable guidance for urban tourism planning and promoting the development of the tourism industry.
Corresponding to the file classifying method in Fig. 1, the embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; the computer program performs the steps of the above file classifying method when run by a processor.
Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above file classifying method can be performed, so that the probability that two urban nodes among multiple urban nodes are planned as a combined scenic-spot data point can be seen clearly and intuitively, thereby improving the tourism data analysis efficiency of industries and enterprises, and further providing scientific and reasonable guidance for urban tourism planning and promoting the development of the tourism industry.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the invention and are not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
The above are only specific embodiments of the invention, but the scope of protection of the invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of the claims.
It should be noted that herein, such as first and second or the like relational terms are used merely to a reality
Body or operation make a distinction with another entity or operation, and not necessarily require or imply and deposited between these entities or operation
In any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to
Nonexcludability includes, so that process, method, article or equipment including a series of elements not only will including those
Element, but also the other element including being not expressly set out, or it is this process, method, article or equipment also to include
Intrinsic key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that
Other identical element also be present in process, method, article or equipment including the key element.
Claims (10)
1. A file classifying method, characterized in that the method comprises:
performing word segmentation on a complaint text to be classified to obtain multiple words to be matched;
matching the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result;
determining, according to the matching result, the complaint class to which the complaint text to be classified belongs;
wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
2. The method according to claim 1, characterized by further comprising:
performing word segmentation on each historical complaint text among the multiple historical complaint texts, and determining the dictionaries formed by the words characterizing different complaint problems;
for each dictionary, dividing the words into semantic sets according to how closely the semantics of each word in the dictionary are associated with the complaint problem characterized by the dictionary, and assigning a corresponding weight range to each semantic set; and
determining the weight of each word from the weight range corresponding to the semantic set to which the word belongs;
wherein the higher the degree of association between a semantic set and the complaint problem, the larger the weights in the weight range assigned to that semantic set.
3. The method according to claim 2, characterized in that matching the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result specifically comprises:
obtaining the weight, in the complaint text to be classified, of each word to be matched among the multiple words to be matched, and taking the weights of the words to be matched as a first word frequency vector;
for each dictionary, retrieving the weight assigned to each word in the dictionary to obtain a second word frequency vector corresponding to the dictionary;
matching, according to a preset similarity matching algorithm, the first word frequency vector for similarity against the second word frequency vector of each dictionary in turn, stopping once a matching second word frequency vector is determined, and obtaining the matching result.
4. The method according to claim 3, characterized in that obtaining the weight, in the complaint text to be classified, of each word to be matched among the multiple words to be matched specifically comprises:
obtaining the TF-IDF value of each word to be matched in the complaint text to be classified using the TF-IDF algorithm, and taking the TF-IDF value of a word to be matched as the weight of that word.
5. The method according to any one of claims 3-4, characterized in that the preset similarity matching algorithm is the included-angle cosine algorithm;
the included-angle cosine between the first word frequency vector and any second word frequency vector is determined as follows, completing the similarity matching:
the first word frequency vector is expressed as A = [A_1, A_2, ..., A_n] and the second word frequency vector as B = [B_1, B_2, ..., B_n], and similarity matching is performed based on the included-angle cosine formula
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
to obtain the matching result.
6. A file classifying device, characterized in that the device comprises:
a word segmentation processing module, configured to perform word segmentation on a complaint text to be classified to obtain multiple words to be matched;
a matching module, configured to match the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result;
a classification module, configured to determine, according to the matching result, the complaint class to which the complaint text to be classified belongs;
wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
7. The device according to claim 6, characterized in that the device further comprises:
a dictionary acquisition module, configured to perform word segmentation on each historical complaint text among the multiple historical complaint texts and to determine the dictionaries formed by the words characterizing different complaint problems;
a weight distribution module, configured to, for each dictionary, divide the words into semantic sets according to how closely the semantics of each word in the dictionary are associated with the complaint problem characterized by the dictionary, and to assign a corresponding weight range to each semantic set; and
a weight determination module, configured to determine the weight of each word from the weight range corresponding to the semantic set to which the word belongs;
wherein the higher the degree of association between a semantic set and the complaint problem, the larger the weights in the weight range assigned to that semantic set.
8. The device according to claim 7, characterized in that the matching module comprises:
a first word frequency vector acquiring unit, configured to obtain the weight, in the complaint text to be classified, of each word to be matched among the multiple words to be matched, and to take the weights of the words to be matched as a first word frequency vector;
a second word frequency vector acquiring unit, configured to, for each dictionary, retrieve the weight assigned to each word in the dictionary to obtain a second word frequency vector corresponding to the dictionary;
a matching unit, configured to match, according to a preset similarity matching algorithm, the first word frequency vector for similarity against the second word frequency vector of each dictionary in turn, to stop once a matching second word frequency vector is determined, and to obtain the matching result.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being coupled to the processor and storing instructions which, when executed by the processor, cause the electronic device to perform the following operations:
performing word segmentation on a complaint text to be classified to obtain multiple words to be matched;
matching the multiple words to be matched against each of the dictionaries characterizing different complaint problems to obtain a matching result;
determining, according to the matching result, the complaint class to which the complaint text to be classified belongs;
wherein the dictionaries characterizing different complaint problems are obtained by training on multiple historical complaint texts.
10. A readable storage medium, characterized in that the readable storage medium is stored in a computer and includes a plurality of instructions, the plurality of instructions being configured to cause the computer to perform the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711051376.5A CN107844559A (en) | 2017-10-31 | 2017-10-31 | A kind of file classifying method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107844559A true CN107844559A (en) | 2018-03-27 |
Family
ID=61682098
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711051376.5A Pending CN107844559A (en) | 2017-10-31 | 2017-10-31 | A kind of file classifying method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107844559A (en) |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108681819A (en) * | 2018-05-21 | 2018-10-19 | 中国平安人寿保险股份有限公司 | Employee's image grade is classified method, apparatus, computer equipment and storage medium |
CN108710627A (en) * | 2018-03-29 | 2018-10-26 | 广东欧珀移动通信有限公司 | The device and electronic equipment that method that feedback information is shown, feedback information are shown |
CN108710651A (en) * | 2018-05-08 | 2018-10-26 | 华南理工大学 | A kind of large scale customer complaint data automatic classification method |
CN109146395A (en) * | 2018-06-29 | 2019-01-04 | 阿里巴巴集团控股有限公司 | A kind of method, device and equipment of data processing |
CN109189890A (en) * | 2018-09-12 | 2019-01-11 | 张连祥 | Complaint of inviting outside investment coordinates intelligence and handles system and method |
CN109271481A (en) * | 2018-08-31 | 2019-01-25 | 国网河北省电力有限公司沧州供电分公司 | A kind of classification method, system and the terminal device of electric power demand information |
CN109446321A (en) * | 2018-10-11 | 2019-03-08 | 深圳前海达闼云端智能科技有限公司 | Text classification method, text classification device, terminal and computer readable storage medium |
CN109670843A (en) * | 2018-11-12 | 2019-04-23 | 平安科技(深圳)有限公司 | Data processing method, device, computer equipment and the storage medium of complaint business |
CN109684467A (en) * | 2018-11-16 | 2019-04-26 | 北京奇虎科技有限公司 | A kind of classification method and device of text |
CN109684475A (en) * | 2018-11-21 | 2019-04-26 | 斑马网络技术有限公司 | Processing method, device, equipment and the storage medium of complaint |
CN109739985A (en) * | 2018-12-26 | 2019-05-10 | 斑马网络技术有限公司 | Automatic document classification method, equipment and storage medium |
CN109815709A (en) * | 2018-12-11 | 2019-05-28 | 顺丰科技有限公司 | Recognition methods, device, equipment and the storage medium that sensitive information illegally copies |
CN110134957A (en) * | 2019-05-14 | 2019-08-16 | 云南电网有限责任公司电力科学研究院 | A kind of scientific and technological achievement storage method and system based on semantic analysis |
CN110347840A (en) * | 2019-07-18 | 2019-10-18 | 携程计算机技术(上海)有限公司 | Complain prediction technique, system, equipment and the storage medium of text categories |
CN110598200A (en) * | 2018-06-13 | 2019-12-20 | 北京百度网讯科技有限公司 | Semantic recognition method and device |
CN110705245A (en) * | 2018-07-09 | 2020-01-17 | 中国移动通信集团有限公司 | Method and device for acquiring reference processing scheme and storage medium |
CN110728142A (en) * | 2019-09-09 | 2020-01-24 | 上海凯京信达科技集团有限公司 | Method and device for identifying running files, computer storage medium and electronic equipment |
CN110888977A (en) * | 2018-09-05 | 2020-03-17 | 广州视源电子科技股份有限公司 | Text classification method and device, computer equipment and storage medium |
CN110895703A (en) * | 2018-09-12 | 2020-03-20 | 北京国双科技有限公司 | Legal document routing identification method and device |
WO2020087774A1 (en) * | 2018-10-31 | 2020-05-07 | 平安科技(深圳)有限公司 | Concept-tree-based intention recognition method and apparatus, and computer device |
CN111126055A (en) * | 2019-10-28 | 2020-05-08 | 国电南瑞科技股份有限公司 | Power grid equipment name matching method and system |
CN111159355A (en) * | 2019-12-31 | 2020-05-15 | 中国银行股份有限公司 | Customer complaint order processing method and device |
CN111191445A (en) * | 2018-11-15 | 2020-05-22 | 北京京东金融科技控股有限公司 | Advertisement text classification method and device |
CN111339290A (en) * | 2018-11-30 | 2020-06-26 | 北京嘀嘀无限科技发展有限公司 | Text classification method and system |
CN111460098A (en) * | 2020-03-27 | 2020-07-28 | 深圳价值在线信息科技股份有限公司 | Text matching method and device and terminal equipment |
CN111506727A (en) * | 2020-04-16 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Text content category acquisition method and device, computer equipment and storage medium |
CN111831286A (en) * | 2019-04-12 | 2020-10-27 | 中国移动通信集团河南有限公司 | User complaint processing method and device |
CN111860657A (en) * | 2020-07-23 | 2020-10-30 | 中国建设银行股份有限公司 | Image classification method and device, electronic equipment and storage medium |
CN112184027A (en) * | 2020-09-29 | 2021-01-05 | 壹链盟生态科技有限公司 | Task progress updating method and device and storage medium |
CN112445910A (en) * | 2019-09-02 | 2021-03-05 | 上海哔哩哔哩科技有限公司 | Information classification method and system |
CN112463929A (en) * | 2020-12-11 | 2021-03-09 | 广东电网有限责任公司佛山供电局 | Automatic classification method of fault information |
CN112507113A (en) * | 2020-09-18 | 2021-03-16 | 青岛海洋科学与技术国家实验室发展中心 | Ocean big data text classification method and system |
CN112528031A (en) * | 2021-02-09 | 2021-03-19 | 中关村科学城城市大脑股份有限公司 | Work order intelligent distribution method and system |
CN112528673A (en) * | 2020-12-14 | 2021-03-19 | 中国联合网络通信集团有限公司 | Text batch processing method, system, terminal equipment and computer storage medium |
CN112989761A (en) * | 2021-05-20 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Text classification method and device |
CN113010669A (en) * | 2020-12-24 | 2021-06-22 | 华戎信息产业有限公司 | News classification method and system |
CN113111898A (en) * | 2020-02-13 | 2021-07-13 | 北京明亿科技有限公司 | Vehicle type determination method and device based on support vector machine |
CN113705200A (en) * | 2021-08-31 | 2021-11-26 | 中国平安财产保险股份有限公司 | Method, device and equipment for analyzing complaint behavior data and storage medium |
CN113779973A (en) * | 2020-06-09 | 2021-12-10 | 杭州晨熹多媒体科技有限公司 | Text data processing method and device |
CN113934848A (en) * | 2021-10-22 | 2022-01-14 | 马上消费金融股份有限公司 | Data classification method and device and electronic equipment |
CN114090620A (en) * | 2022-01-19 | 2022-02-25 | 支付宝(杭州)信息技术有限公司 | Query request processing method and device |
CN111191445B (en) * | 2018-11-15 | 2024-04-19 | 京东科技控股股份有限公司 | Advertisement text classification method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541958A (en) * | 2010-12-30 | 2012-07-04 | 百度在线网络技术(北京)有限公司 | Method, device and computer equipment for identifying short text category information |
US20140337349A1 (en) * | 2013-05-09 | 2014-11-13 | Hon Hai Precision Industry Co., Ltd. | Electronic device and document classification method |
CN104182388A (en) * | 2014-07-21 | 2014-12-03 | 安徽华贞信息科技有限公司 | Semantic analysis based text clustering system and method |
CN104424279A (en) * | 2013-08-30 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Text relevance calculating method and device |
CN104750835A (en) * | 2015-04-03 | 2015-07-01 | 浪潮集团有限公司 | Text classification method and device |
CN106250398A (en) * | 2016-07-19 | 2016-12-21 | 北京京东尚科信息技术有限公司 | A kind of complaint classifying content decision method complaining event and device |
-
2017
- 2017-10-31 CN CN201711051376.5A patent/CN107844559A/en active Pending
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710627A (en) * | 2018-03-29 | 2018-10-26 | 广东欧珀移动通信有限公司 | The device and electronic equipment that method that feedback information is shown, feedback information are shown |
CN108710651A (en) * | 2018-05-08 | 2018-10-26 | 华南理工大学 | A kind of large scale customer complaint data automatic classification method |
CN108710651B (en) * | 2018-05-08 | 2022-03-25 | 华南理工大学 | Automatic classification method for large-scale customer complaint data |
CN108681819A (en) * | 2018-05-21 | 2018-10-19 | 中国平安人寿保险股份有限公司 | Employee's image grade is classified method, apparatus, computer equipment and storage medium |
CN110598200A (en) * | 2018-06-13 | 2019-12-20 | 北京百度网讯科技有限公司 | Semantic recognition method and device |
CN110598200B (en) * | 2018-06-13 | 2023-05-23 | 北京百度网讯科技有限公司 | Semantic recognition method and device |
CN109146395A (en) * | 2018-06-29 | 2019-01-04 | 阿里巴巴集团控股有限公司 | A kind of method, device and equipment of data processing |
CN109146395B (en) * | 2018-06-29 | 2022-04-05 | 创新先进技术有限公司 | Data processing method, device and equipment |
CN110705245A (en) * | 2018-07-09 | 2020-01-17 | 中国移动通信集团有限公司 | Method and device for acquiring reference processing scheme and storage medium |
CN110705245B (en) * | 2018-07-09 | 2023-04-28 | 中移(成都)信息通信科技有限公司 | Method and device for acquiring reference processing scheme and storage medium |
CN109271481A (en) * | 2018-08-31 | 2019-01-25 | 国网河北省电力有限公司沧州供电分公司 | A kind of classification method, system and the terminal device of electric power demand information |
CN110888977A (en) * | 2018-09-05 | 2020-03-17 | 广州视源电子科技股份有限公司 | Text classification method and device, computer equipment and storage medium |
CN110895703A (en) * | 2018-09-12 | 2020-03-20 | 北京国双科技有限公司 | Legal document routing identification method and device |
CN109189890A (en) * | 2018-09-12 | 2019-01-11 | 张连祥 | Intelligent coordination and handling system and method for investment-promotion complaints |
CN110895703B (en) * | 2018-09-12 | 2023-05-23 | 北京国双科技有限公司 | Legal document case recognition method and device |
CN109446321A (en) * | 2018-10-11 | 2019-03-08 | 深圳前海达闼云端智能科技有限公司 | Text classification method, text classification device, terminal and computer readable storage medium |
WO2020087774A1 (en) * | 2018-10-31 | 2020-05-07 | 平安科技(深圳)有限公司 | Concept-tree-based intention recognition method and apparatus, and computer device |
CN109670843A (en) * | 2018-11-12 | 2019-04-23 | 平安科技(深圳)有限公司 | Data processing method, device, computer equipment and storage medium for complaint business |
CN111191445B (en) * | 2018-11-15 | 2024-04-19 | 京东科技控股股份有限公司 | Advertisement text classification method and device |
CN111191445A (en) * | 2018-11-15 | 2020-05-22 | 北京京东金融科技控股有限公司 | Advertisement text classification method and device |
CN109684467A (en) * | 2018-11-16 | 2019-04-26 | 北京奇虎科技有限公司 | Text classification method and device |
CN109684475A (en) * | 2018-11-21 | 2019-04-26 | 斑马网络技术有限公司 | Complaint processing method, device, equipment and storage medium |
CN111339290A (en) * | 2018-11-30 | 2020-06-26 | 北京嘀嘀无限科技发展有限公司 | Text classification method and system |
CN109815709A (en) * | 2018-12-11 | 2019-05-28 | 顺丰科技有限公司 | Method, device, equipment and storage medium for identifying illegal copies of sensitive information |
CN109815709B (en) * | 2018-12-11 | 2023-10-10 | 顺丰科技有限公司 | Method, device, equipment and storage medium for identifying illegal copies of sensitive information |
CN109739985A (en) * | 2018-12-26 | 2019-05-10 | 斑马网络技术有限公司 | Automatic document classification method, equipment and storage medium |
CN111831286A (en) * | 2019-04-12 | 2020-10-27 | 中国移动通信集团河南有限公司 | User complaint processing method and device |
CN111831286B (en) * | 2019-04-12 | 2023-11-14 | 中国移动通信集团河南有限公司 | User complaint processing method and device |
CN110134957A (en) * | 2019-05-14 | 2019-08-16 | 云南电网有限责任公司电力科学研究院 | Scientific and technological achievement warehousing method and system based on semantic analysis |
CN110134957B (en) * | 2019-05-14 | 2023-06-13 | 云南电网有限责任公司电力科学研究院 | Scientific and technological achievement warehousing method and system based on semantic analysis |
CN110347840B (en) * | 2019-07-18 | 2023-06-13 | 携程计算机技术(上海)有限公司 | Prediction method, system, equipment and storage medium for complaint text category |
CN110347840A (en) * | 2019-07-18 | 2019-10-18 | 携程计算机技术(上海)有限公司 | Prediction method, system, equipment and storage medium for complaint text category |
CN112445910A (en) * | 2019-09-02 | 2021-03-05 | 上海哔哩哔哩科技有限公司 | Information classification method and system |
CN112445910B (en) * | 2019-09-02 | 2022-12-27 | 上海哔哩哔哩科技有限公司 | Information classification method and system |
CN110728142B (en) * | 2019-09-09 | 2023-12-22 | 上海斑马来拉物流科技有限公司 | Method and device for identifying stream file, computer storage medium and electronic equipment |
CN110728142A (en) * | 2019-09-09 | 2020-01-24 | 上海凯京信达科技集团有限公司 | Method and device for identifying running files, computer storage medium and electronic equipment |
CN111126055A (en) * | 2019-10-28 | 2020-05-08 | 国电南瑞科技股份有限公司 | Power grid equipment name matching method and system |
CN111159355A (en) * | 2019-12-31 | 2020-05-15 | 中国银行股份有限公司 | Customer complaint order processing method and device |
CN113111898A (en) * | 2020-02-13 | 2021-07-13 | 北京明亿科技有限公司 | Vehicle type determination method and device based on support vector machine |
CN111460098A (en) * | 2020-03-27 | 2020-07-28 | 深圳价值在线信息科技股份有限公司 | Text matching method and device and terminal equipment |
CN111460098B (en) * | 2020-03-27 | 2023-08-25 | 深圳价值在线信息科技股份有限公司 | Text matching method and device and terminal equipment |
CN111506727B (en) * | 2020-04-16 | 2023-10-03 | 腾讯科技(深圳)有限公司 | Text content category acquisition method, apparatus, computer device and storage medium |
CN111506727A (en) * | 2020-04-16 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Text content category acquisition method and device, computer equipment and storage medium |
CN113779973A (en) * | 2020-06-09 | 2021-12-10 | 杭州晨熹多媒体科技有限公司 | Text data processing method and device |
CN111860657A (en) * | 2020-07-23 | 2020-10-30 | 中国建设银行股份有限公司 | Image classification method and device, electronic equipment and storage medium |
CN112507113A (en) * | 2020-09-18 | 2021-03-16 | 青岛海洋科学与技术国家实验室发展中心 | Ocean big data text classification method and system |
CN112184027B (en) * | 2020-09-29 | 2023-12-26 | 壹链盟生态科技有限公司 | Task progress updating method, device and storage medium |
CN112184027A (en) * | 2020-09-29 | 2021-01-05 | 壹链盟生态科技有限公司 | Task progress updating method and device and storage medium |
CN112463929A (en) * | 2020-12-11 | 2021-03-09 | 广东电网有限责任公司佛山供电局 | Automatic classification method of fault information |
CN112528673A (en) * | 2020-12-14 | 2021-03-19 | 中国联合网络通信集团有限公司 | Text batch processing method, system, terminal equipment and computer storage medium |
CN113010669A (en) * | 2020-12-24 | 2021-06-22 | 华戎信息产业有限公司 | News classification method and system |
CN112528031A (en) * | 2021-02-09 | 2021-03-19 | 中关村科学城城市大脑股份有限公司 | Work order intelligent distribution method and system |
CN112989761A (en) * | 2021-05-20 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Text classification method and device |
CN113705200B (en) * | 2021-08-31 | 2023-09-15 | 中国平安财产保险股份有限公司 | Analysis method, analysis device, analysis equipment and analysis storage medium for complaint behavior data |
CN113705200A (en) * | 2021-08-31 | 2021-11-26 | 中国平安财产保险股份有限公司 | Method, device and equipment for analyzing complaint behavior data and storage medium |
CN113934848A (en) * | 2021-10-22 | 2022-01-14 | 马上消费金融股份有限公司 | Data classification method and device and electronic equipment |
CN114090620A (en) * | 2022-01-19 | 2022-02-25 | 支付宝(杭州)信息技术有限公司 | Query request processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107844559A (en) | A kind of file classifying method, device and electronic equipment | |
CN107609121B (en) | News text classification method based on LDA and word2vec algorithm | |
WO2019214245A1 (en) | Information pushing method and apparatus, and terminal device and storage medium | |
CN101794311B (en) | Fuzzy data mining based automatic classification method of Chinese web pages | |
CN106649818B (en) | Application search intention identification method and device, application search method and server | |
Li et al. | Twiner: named entity recognition in targeted twitter stream | |
CN109815314B (en) | Intent recognition method, recognition device and computer readable storage medium | |
US10565233B2 (en) | Suffix tree similarity measure for document clustering | |
CN106776574B (en) | User comment text mining method and device | |
CN106156372B (en) | Internet site classification method and device | |
CN111797239B (en) | Application program classification method and device and terminal equipment | |
CN103886108B (en) | Feature selection and weight calculation method for an unbalanced text set | |
CN102194013A (en) | Domain-knowledge-based short text classification method and text classification system | |
CN110543595B (en) | In-station searching system and method | |
CN109885688A (en) | File classification method, device, computer readable storage medium and electronic equipment | |
CN111767716A (en) | Method and device for determining enterprise multilevel industry information and computer equipment | |
CN109840532A (en) | K-means-based similar-case recommendation method for courts | |
CN113254643B (en) | Text classification method and device, electronic equipment and text classification program | |
CN114003721A (en) | Construction method, device and application of dispute event type classification model | |
CN111090994A (en) | Chinese-internet-forum-text-oriented event place attribution province identification method | |
CN110910175A (en) | Tourist ticket product portrait generation method | |
CN111753526A (en) | Similar competitive product data analysis method and system | |
CN108681977A (en) | Lawyer information processing method and system | |
CN108614860A (en) | Lawyer information processing method and system | |
Guadie et al. | Amharic text summarization for news items posted on social media |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing; Applicant after: Guoxin Youyi Data Co., Ltd; Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing; Applicant before: SIC YOUE DATA Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180327 |