CN108228622A - Classification method and apparatus for business questions - Google Patents

Classification method and apparatus for business questions

Info

Publication number
CN108228622A
Authority
CN
China
Prior art keywords
text
sorted
probability
class
dimensional feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611159082.XA
Other languages
Chinese (zh)
Inventor
韩茂琨
王健宗
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201611159082.XA
Publication of CN108228622A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/24155 - Bayesian classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The present invention is applicable to the field of information processing and provides a classification method and apparatus for business questions, including: obtaining a text to be classified that is input by a user; extracting first question features from the text to be classified; generating an N-dimensional feature vector of the text to be classified, where the N-dimensional feature vector describes N attributes of the text and each attribute represents one first question feature; inputting the N-dimensional feature vector into a classification model; and obtaining an output parameter of the classification model, the output parameter being the classification result of the text to be classified. In embodiments of the present invention, once the text representing a user's question is input into the classification apparatus, the classification result of the question can be output directly. Business questions are thus classified automatically, without relying on traditional manual classification, which improves classification efficiency, meets users' day-to-day business classification needs, and further improves the efficiency of business consultation and business handling.

Description

Classification method and apparatus for business questions
Technical field
The invention belongs to the field of information processing, and in particular relates to a classification method and apparatus for business questions.
Background technology
With the rapid development of financial services, the number of financial business categories keeps growing. Therefore, if a client wants to consult on the procedure of a specific business item, the corresponding business category usually has to be selected first, so that a professional of that business line can answer. For a client without professional knowledge, it is difficult to tell which category the financial business to be handled belongs to, so the category of the business must be asked about before any question on the concrete procedure can be raised.
In practice, questions about business categories are usually answered by dedicated staff, such as a bank lobby manager. Handling the category questions of a large volume of clients purely by manual classification is highly inefficient, far from meeting daily demand, and seriously constrains the development of financial services.
Summary of the invention
In view of this, embodiments of the present invention provide a classification method and apparatus for business questions, to solve the problem that existing classification methods for business questions are inefficient and cannot meet users' daily needs.
In a first aspect, a classification method for business questions is provided, including:
obtaining a text to be classified that is input by a user;
extracting first question features from the text to be classified;
generating an N-dimensional feature vector of the text to be classified, where the N-dimensional feature vector describes N attributes of the text to be classified and each attribute represents one first question feature;
inputting the N-dimensional feature vector into a classification model;
obtaining an output parameter of the classification model, the output parameter being the classification result of the text to be classified;
where N is an integer greater than zero.
In a second aspect, a classification apparatus for business questions is provided, including:
an acquiring unit, configured to obtain a text to be classified that is input by a user;
an extraction unit, configured to extract first question features from the text to be classified;
a generation unit, configured to generate an N-dimensional feature vector of the text to be classified, where the N-dimensional feature vector describes N attributes of the text to be classified and each attribute represents one first question feature;
an input unit, configured to input the N-dimensional feature vector into the classification model;
a classification unit, configured to obtain an output parameter of the classification model, the output parameter being the classification result of the text to be classified;
where N is an integer greater than zero.
In embodiments of the present invention, a feature vector of the text to be classified is generated, and each attribute of the feature vector represents one first question feature of the text, so that the text can be analyzed and processed by the classification model in a machine-readable form. Once the text representing a user's question is input into the classification apparatus, the classification result of the question can be output directly. Business questions are thus classified automatically, without relying on traditional manual classification, which improves classification efficiency, meets users' day-to-day business classification needs, and further improves the efficiency of business consultation and business handling.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the classification method for business questions provided by an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step S102 of the classification method for business questions provided by an embodiment of the present invention;
Fig. 3 is a detailed flowchart of step S202 of the classification method for business questions provided by an embodiment of the present invention;
Fig. 4 is a flowchart of building and training the classification model in the classification method for business questions provided by another embodiment of the present invention;
Fig. 5 is a detailed flowchart of step S405 of building and training the classification model in another embodiment of the present invention;
Fig. 6 is a structural diagram of the classification apparatus for business questions provided by an embodiment of the present invention.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted so that unnecessary details do not obscure the description of the invention.
Embodiments of the present invention are implemented based on a classification model for business questions. A feature vector of the text to be classified is generated, and each attribute of the feature vector represents one first question feature of the text, so that the text can be analyzed and processed by the classification model in a machine-readable form. Once the text representing a user's question is input into the classification apparatus, the classification result of the question can be output directly. Business questions are thus classified automatically, without relying on traditional manual classification, which improves classification efficiency, meets users' day-to-day business classification needs, and further improves the efficiency of business consultation and business handling.
In order to illustrate the technical solutions of the present invention, specific embodiments are described below.
Fig. 1 shows the implementation flow of the classification method for business questions provided by an embodiment of the present invention, described in detail as follows:
In S101, a text to be classified that is input by a user is obtained.
The text to be classified is either a question about the category of the business the user needs to handle or a specific professional question about that business; it is input directly by the user and stored in the classification apparatus.
In particular, the text to be classified may also be a question record stored in a third-party database or in another system.
The classification apparatus reads the pre-stored question text locally, or retrieves question records from other systems through techniques such as database links or database mapping, thereby obtaining the text to be classified that was input by the user.
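Purely as an illustration of this retrieval step, the following minimal sketch assumes the question records live in a local SQLite table named question_log with a text column; the table, column and database are hypothetical, since the patent only requires a local store or a database link:

```python
import sqlite3

def load_texts_to_classify(db_path: str) -> list[str]:
    """Read pre-stored user question texts from a hypothetical local question_log table."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT text FROM question_log").fetchall()
    return [row[0] for row in rows]
```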
In S102, the first question features in the text to be classified are extracted.
In the embodiments of the present invention, the terms "first question feature" and "second question feature" differ only in the text from which the features are extracted; the naming merely helps the reader tell them apart.
Each text to be classified is in essence a question text composed of multiple words. Each word has its own meaning and can express some attribute of the question text. Such an attribute, namely a first question feature, is an abstraction of the physical characteristics of the text to be classified.
As an embodiment of the present invention, Fig. 2 shows the detailed implementation flow of step S102 of the classification method for business questions, described as follows:
In S201, the text to be classified is pre-processed to obtain the effective text in the text to be classified.
Because various stop words cause interference, the text to be classified must first be examined to determine whether it contains one or more stop words, and any stop words found are removed from the text. This removal process is the pre-processing of the text to be classified.
Stop words are words without explicit semantic meaning, such as adverbs, prepositions, modal particles and conjunctions. After these words are identified and removed from the text to be classified, an effective text of higher utility is obtained.
Specifically, the pre-processing includes, but is not limited to, the following implementation: the text to be classified is segmented by a word-segmentation algorithm, and each token of the segmented text is matched in turn against the stop words in a stop-word dictionary. If a token matches, it is judged to be a stop word and deleted; if a token matches no stop word, it is judged to be an effective word and retained in the text to be classified. After all tokens have been matched and processed, the effective text of the text to be classified is obtained.
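As an illustration only, a minimal sketch of this pre-processing in Python follows; the jieba segmenter and the in-memory stop-word list are assumptions, since the patent only requires a word-segmentation algorithm and a stop-word dictionary:

```python
import jieba  # assumed third-party segmenter; any word-segmentation algorithm would do

# Hypothetical stop-word dictionary; in practice it would be loaded from a file.
STOP_WORDS = {"的", "了", "吗", "呢", "和", "与", "在", "是"}

def preprocess(text: str) -> list[str]:
    """Segment the text to be classified and drop stop words, returning the effective text as tokens."""
    tokens = jieba.lcut(text)
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]

# Example: preprocess("请问信用卡的申请流程是什么") keeps only the content words.
```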
In S202, the first question features of the effective text in the text to be classified are extracted.
Every word in the effective text is a component of the text to be classified. Therefore, in this embodiment the extraction of question features from the text to be classified is refined to extracting question features from its effective text. This ensures that the extracted question features are the most relevant and effective indicators, reduces the dimensionality of the feature vector in subsequent data processing, and improves the accuracy and efficiency of business question classification.
In S103, the N-dimensional feature vector of the text to be classified is generated, where the N-dimensional feature vector describes N attributes of the text to be classified and each attribute represents one first question feature.
The text to be classified is converted into an N-dimensional feature vector A = (a_1, a_2, ..., a_N) that describes N attributes of this text, where each attribute represents one question feature of the text and N is an integer greater than zero.
In this embodiment, a complex, changeable and objectively existing text to be classified is converted into a simple feature vector composed of basic physical characteristics, and the overall properties of the text expressed by this feature vector provide the basis for the subsequent data analysis performed by the classification model.
As an embodiment of the present invention, Fig. 3 shows the detailed implementation flow of step S202 of the classification method for business questions, as follows:
In S301, a word-segmentation operation is performed on the effective text to obtain multiple words.
In S302, each word is processed by the term frequency-inverse document frequency (TF-IDF) algorithm to obtain a metric of the importance of each word in the effective text.
Among the words obtained by segmenting the effective text there may be one or several keywords. These keywords do not appear in every text to be classified; they usually appear only in certain texts, so they have high discriminative power. They therefore occupy an important position in the effective text and carry greater weight than the other words.
To measure the weight, that is, the importance, of each word in this text to be classified, each word is processed with the TF-IDF algorithm, which yields a concrete metric value of its importance.
Preferably, if the metric of any word is denoted DF, the maximum allowed DF is 0.5. That is, when the DF value computed for a word exceeds 0.5, it is reset to 0.5 so that it satisfies this upper bound. This avoids excessively large DF values in the text to be classified, keeps the metric from being biased toward longer texts, reduces the chance of error, and improves the accuracy of business question classification.
In S303, each metric value is output as one first question feature of the text to be classified.
In this embodiment, each metric value DF is one first question feature of the text to be classified. If the effective text of a text to be classified has N DF values, the text correspondingly has N first question features.
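A minimal sketch of S301 to S303, computing one capped TF-IDF metric per word and returning them as the N question features; the 0.5 cap follows the description above, while the exact TF-IDF variant, the smoothing terms and all names are assumptions:

```python
import math
from collections import Counter

def tfidf_features(effective_tokens: list[str],
                   corpus: list[list[str]],
                   cap: float = 0.5) -> dict[str, float]:
    """Map each word of the effective text to a capped TF-IDF metric; the values form the
    N-dimensional feature vector A = (a_1, ..., a_N), one attribute per distinct word."""
    if not effective_tokens:
        return {}
    tf = Counter(effective_tokens)
    n_docs = len(corpus)
    features = {}
    for word, count in tf.items():
        term_freq = count / len(effective_tokens)
        doc_freq = 1 + sum(1 for doc in corpus if word in doc)   # +1 avoids division by zero
        idf = math.log(1 + n_docs / doc_freq)
        features[word] = min(term_freq * idf, cap)               # constrain the metric to at most 0.5
    return features
```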
In S104, the N-dimensional feature vector is input into the classification model.
In S105, the output parameter of the classification model is obtained, the output parameter being the classification result of the text to be classified.
The N-dimensional feature vector containing the question features of the text to be classified is used as the input parameter of the business question classification model; through the model's automatic processing, the classification result of the text corresponding to this feature vector is output.
Embodiments of the present invention are implemented based on a classification model for business questions. A feature vector of the text to be classified is generated, and each attribute of the feature vector represents one first question feature of the text, so that the text can be analyzed and processed by the classification model in a machine-readable form. Once the text representing a user's question is input into the classification apparatus, the classification result of the question can be output directly. Business questions are thus classified automatically, without relying on traditional manual classification, which improves classification efficiency, meets users' day-to-day business classification needs, and further improves the efficiency of business consultation and business handling.
As another embodiment of the present invention, on the basis of the above embodiment, before step S104 the method further includes: building and training the classification model based on training texts.
In this embodiment, the classification model is built and trained from question training texts collected under various conditions, or from a sufficiently large number of question training texts. A training text is a labeled question sample whose business category is known; it is used to generate the classification model and to adjust the model's parameters, so that, based on supervised learning, the model reaches the classification performance required in practical applications.
Specifically, as shown in Fig. 4, the process of building and training the classification model includes:
In S401, multiple training texts of known business categories are obtained.
A training text may be, for example, a historical record of a real user question.
Preferably, the training texts are multiple question samples extracted from the typical-question database of each business system. Each question sample carries a category label, so it is known from which category's business system the current training text was obtained.
In S402, the second question features in each training text are extracted.
In S403, an M-dimensional feature vector is generated for each training text, where the M-dimensional feature vector describes M attributes of the training text and each attribute represents one second question feature.
The content described in steps S102 and S103 of the above embodiment, and in their corresponding detailed embodiments, applies equally to S402 and S403 of this embodiment. The difference is that the original texts processed here are multiple training texts of known business categories, whereas the original text processed in S102 to S103 is the text to be classified; the remaining implementation principles are the same and are not repeated here.
In S404, for each attribute in the M-dimensional feature vector, its occurrence probability in each business category is obtained.
In S405, the posterior probability of the M-dimensional feature vector in each business category is calculated according to the occurrence probabilities.
In S406, a naive Bayes classification model is generated according to the posterior probabilities, and the naive Bayes classification model is determined as the classification model.
In this embodiment, the model is built and trained using, as the classification basis, the probability distribution of each question feature of each training text over the known business categories. The principle is as follows:
Suppose the question categories that the training texts may finally be assigned to are c_1, c_2, ..., c_k, where k is an integer greater than zero, c_i is one of the business question categories, i ≤ k and i ∈ Z. For a feature vector x representing one training text, let P(x | c_i) be the conditional probability that x belongs to class c_i, and let P(c_i) be the prior probability of category c_i, whose value is the number of training texts of class c_i divided by the total number of training texts over all categories. P(x) is the occurrence probability of the feature vector x over all business categories.
According to Bayes' theorem, the posterior probability of category c_i is:
P(c_i | x) = P(x | c_i) · P(c_i) / P(x)
If the attributes in each feature vector are conditionally independent, then:
P(x | c_i) = P(a_1 | c_i) · P(a_2 | c_i) · ... · P(a_d | c_i)
where d is the total number of attributes in the feature vector x and a_j (j = 1, ..., d) is each attribute value in x.
For every possible value of i, the corresponding value of P(c_i | x) is calculated from the above formulas, and the largest P(c_i | x) is selected; the category c_i with the largest P(c_i | x) is the decided category of the training text.
Since the denominator P(x) is a constant, it suffices to maximize the numerator P(c_i) · P(a_1 | c_i) · ... · P(a_d | c_i) to obtain the maximum of P(c_i | x), and thereby the decided category of the training text.
According to the decided category of each training text and its difference from the true business category to which the text belongs, the model parameters of the naive Bayes classification model are built and adjusted so that the decided category of each training text agrees with its true business category as closely as possible, which yields the final classification model.
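A minimal sketch of this training and decision step: class priors and per-attribute conditionals are estimated from labeled feature vectors, and a text is assigned to the category with the largest numerator P(c_i) · ∏ P(a_j | c_i), computed in log space. The weighting of conditionals by the metric values, the smoothing constant alpha and all names are assumptions for illustration only:

```python
import math
from collections import defaultdict

def train_naive_bayes(samples: list[dict[str, float]], labels: list[str], alpha: float = 1.0):
    """Estimate class priors P(c_i) and per-word conditionals P(a_j | c_i) from labeled feature vectors.
    Each sample maps a word (attribute) to its metric value; alpha is an assumed smoothing constant."""
    vocab = {w for s in samples for w in s}
    class_counts = defaultdict(int)
    word_weight = defaultdict(lambda: defaultdict(float))
    class_weight = defaultdict(float)
    for sample, label in zip(samples, labels):
        class_counts[label] += 1
        for word, value in sample.items():
            word_weight[label][word] += value
            class_weight[label] += value
    priors = {c: n / len(samples) for c, n in class_counts.items()}
    conditionals = {c: {w: (word_weight[c][w] + alpha) / (class_weight[c] + alpha * len(vocab))
                        for w in vocab}
                    for c in class_counts}
    return priors, conditionals, vocab

def classify(sample: dict[str, float], priors, conditionals, vocab) -> str:
    """Decide the category with the largest numerator P(c_i) * prod_j P(a_j | c_i), in log space."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for word in sample:
            if word in vocab:
                score += math.log(conditionals[c][word])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```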
In this embodiment, the parameters of the classification model are learned from a certain number of training texts according to the naive Bayes theorem, so that the resulting classification model fits real business classification scenarios and the accuracy of question classification is improved. When the number of training samples is insufficient to cover all the real situations in which questions may arise, the naive Bayes classification algorithm adopted in this embodiment still performs well even when the texts in the data set are extremely short and the number of question features is severely insufficient, which further improves the accuracy of question classification.
As an embodiment of the present invention, Fig. 5 shows the detailed implementation flow of step S405 in the process of building and training the classification model, described as follows:
In S501, the prior probability of each business category is obtained.
In S502, the prior probability of each business category is corrected using a probability smoothing algorithm, so that for each attribute in the M-dimensional feature vector the sum of its occurrence probabilities over all business categories is less than one.
In real question language, most words are low-frequency words. If some attribute in the feature vector of a training text never occurs, then under maximum likelihood estimation the probability of that event is zero; however, the true probability of such events is not necessarily zero. Therefore, when calculating the probability of each attribute in the feature vector, part of the probability mass is reserved for attributes unknown in the real scenario; that is, the sum of the occurrence probabilities of each attribute over all business categories is corrected to a value less than 1.0.
In this embodiment, this effect is achieved by adjusting the prior probabilities of the business categories. The probability smoothing algorithm is specifically the Lidstone algorithm, and the prior probability of each business category is corrected as follows:
P(c_i) = (|D_{c_i}| + λ) / (|D| + λ · T)
where P(c_i) is the prior probability of category c_i, |D_{c_i}| is the number of training samples of class c_i in the training set D, |D| is the total number of training samples in D, T is the total number of business categories, and λ is the smoothing parameter of the Lidstone algorithm.
In S503, the posterior probability of the M-dimensional feature vector in each business category is calculated according to the occurrence probabilities and the prior probability of each business category obtained after the correction.
The content described in steps S404 to S406 of the above embodiment applies equally to this embodiment. From the prior probability of each category c_i obtained after the probability smoothing, the posterior probability P(c_i | x) of the feature vector x can be calculated; the implementation principle is the same as in the above embodiment and is not repeated here.
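A minimal sketch of the Lidstone correction of S501 and S502; the smoothing parameter lam and the dictionary-based interface are assumptions for illustration:

```python
from collections import Counter

def smoothed_priors(labels: list[str], num_categories: int, lam: float = 0.5) -> dict[str, float]:
    """Lidstone-smoothed prior P(c_i) = (|D_ci| + lam) / (|D| + lam * T); the smoothing keeps
    some probability mass for events not observed in the training texts."""
    counts = Counter(labels)
    total = len(labels)
    return {c: (n + lam) / (total + lam * num_categories) for c, n in counts.items()}

# Example: smoothed_priors(["card", "card", "loan"], num_categories=5, lam=0.5)
```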
This embodiment moderately lowers the probability of attribute events that occur excessively often in the training texts, flattens the resulting probability density distribution, and assigns probability to events that never occurred in the training texts. This prevents failures when attributes that did not appear in the training texts occur in the practical question-classification scenario, solves the problem that the estimated category of the text to be classified is unreliable because of an insufficient number of training texts, and thereby improves the classification accuracy of the model.
It should be understood that the sequence numbers of the steps above do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Corresponding to the classification method for business questions described in the foregoing embodiments, Fig. 6 shows a structural diagram of the classification apparatus for business questions provided by an embodiment of the present invention. The classification apparatus may be a software unit, a hardware unit, or a unit combining software and hardware. For convenience of description, only the parts related to this embodiment are shown.
Referring to Fig. 6, the apparatus includes:
an acquiring unit 61, configured to obtain a text to be classified that is input by a user;
an extraction unit 62, configured to extract the first question features in the text to be classified;
a generation unit 63, configured to generate the N-dimensional feature vector of the text to be classified, where the N-dimensional feature vector describes N attributes of the text to be classified and each attribute represents one first question feature;
an input unit 64, configured to input the N-dimensional feature vector into the classification model;
a classification unit 65, configured to obtain the output parameter of the classification model, the output parameter being the classification result of the text to be classified;
where N is an integer greater than zero.
Optionally, the extraction unit 62 includes:
a pre-processing subunit, configured to pre-process the text to be classified to obtain the effective text in the text to be classified;
a first extraction subunit, configured to extract the first question features of the effective text in the text to be classified.
Optionally, the first extraction subunit is specifically configured to:
perform a word-segmentation operation on the effective text to obtain multiple words;
process each word with the term frequency-inverse document frequency (TF-IDF) algorithm to obtain a metric of the importance of each word in the effective text;
output each metric value as one first question feature of the text to be classified.
Optionally, the apparatus further includes:
a training unit, configured to build and train the classification model based on training texts.
The training unit further includes:
an obtaining subunit, configured to obtain multiple training texts of known business categories;
a second extraction subunit, configured to extract the second question features in each training text;
a generation subunit, configured to generate the M-dimensional feature vector of each training text, where the M-dimensional feature vector describes M attributes of the training text and each attribute represents one second question feature;
a first computation subunit, configured to obtain, for each attribute in the M-dimensional feature vector, its occurrence probability in each business category;
a second computation subunit, configured to calculate, according to the occurrence probabilities, the posterior probability of the M-dimensional feature vector in each business category;
a determination subunit, configured to generate a naive Bayes classification model according to the posterior probabilities and to determine the naive Bayes classification model as the classification model.
Optionally, the second computation subunit is specifically configured to:
obtain the prior probability of each business category;
correct the prior probability of each business category using a probability smoothing algorithm, so that for each attribute in the M-dimensional feature vector the sum of its occurrence probabilities over all business categories is less than one;
calculate the posterior probability of the M-dimensional feature vector in each business category according to the occurrence probabilities and the corrected prior probability of each business category.
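The units above mirror steps S101 to S105. Purely as an illustration of how such an apparatus could be composed in software, the following sketch reuses the hypothetical helpers preprocess, tfidf_features and classify from the earlier sketches; the class name and interface are assumptions, not the patented apparatus itself:

```python
class QuestionClassifier:
    """Hypothetical composition of the acquiring, extraction, generation, input and classification units."""

    def __init__(self, corpus: list[list[str]], priors, conditionals, vocab):
        self.corpus = corpus          # segmented training texts, used for the TF-IDF statistics
        self.model = (priors, conditionals, vocab)

    def classify_text(self, user_text: str) -> str:
        tokens = preprocess(user_text)                   # extraction unit: effective text
        sample = tfidf_features(tokens, self.corpus)     # generation unit: N-dimensional feature vector
        return classify(sample, *self.model)             # input unit + classification unit
```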
Embodiments of the present invention are implemented based on a classification model for business questions. A feature vector of the text to be classified is generated, and each attribute of the feature vector represents one first question feature of the text, so that the text can be analyzed and processed by the classification model in a machine-readable form. Once the text representing a user's question is input into the classification apparatus, the classification result of the question can be output directly. Business questions are thus classified automatically, without relying on traditional manual classification, which improves classification efficiency, meets users' day-to-day business classification needs, and further improves the efficiency of business consultation and business handling.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only used as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only intended to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. For example, the system embodiment described above is merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation, for example multiple units or components can be combined or integrated into another system, or some features can be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units can be selected according to actual needs to achieve the objectives of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A classification method for business questions, characterized by comprising:
obtaining a text to be classified that is input by a user;
extracting first question features from the text to be classified;
generating an N-dimensional feature vector of the text to be classified, wherein the N-dimensional feature vector describes N attributes of the text to be classified and each attribute represents one first question feature;
inputting the N-dimensional feature vector into a classification model;
obtaining an output parameter of the classification model, the output parameter being the classification result of the text to be classified;
wherein N is an integer greater than zero.
2. The method according to claim 1, characterized in that extracting the first question features from the text to be classified comprises:
pre-processing the text to be classified to obtain the effective text in the text to be classified;
extracting the first question features of the effective text in the text to be classified.
3. The method according to claim 2, characterized in that extracting the first question features of the effective text in the text to be classified comprises:
performing a word-segmentation operation on the effective text to obtain multiple words;
processing each word with the term frequency-inverse document frequency (TF-IDF) algorithm to obtain a metric of the importance of each word in the effective text;
outputting each metric value as one first question feature of the text to be classified.
4. The method according to claim 1, characterized in that, before inputting the N-dimensional feature vector into the classification model, the method further comprises:
building and training the classification model based on training texts, including:
obtaining multiple training texts of known business categories;
extracting the second question features in each training text;
generating an M-dimensional feature vector of each training text respectively, wherein the M-dimensional feature vector describes M attributes of the training text and each attribute represents one second question feature;
for each attribute in the M-dimensional feature vector, obtaining its occurrence probability in each business category respectively;
calculating, according to the occurrence probabilities, the posterior probability of the M-dimensional feature vector in each business category respectively;
generating a naive Bayes classification model according to the posterior probabilities, and determining the naive Bayes classification model as the classification model.
5. The method according to claim 4, characterized in that calculating, according to the occurrence probabilities, the posterior probability of the M-dimensional feature vector in each business category comprises:
obtaining the prior probability of each business category;
correcting the prior probability of each business category using a probability smoothing algorithm, so that for each attribute in the M-dimensional feature vector the sum of its occurrence probabilities over all business categories is less than one;
calculating the posterior probability of the M-dimensional feature vector in each business category according to the occurrence probabilities and the corrected prior probability of each business category.
6. A classification apparatus for business questions, characterized by comprising:
an acquiring unit, configured to obtain a text to be classified that is input by a user;
an extraction unit, configured to extract first question features from the text to be classified;
a generation unit, configured to generate an N-dimensional feature vector of the text to be classified, wherein the N-dimensional feature vector describes N attributes of the text to be classified and each attribute represents one first question feature;
an input unit, configured to input the N-dimensional feature vector into a classification model;
a classification unit, configured to obtain an output parameter of the classification model, the output parameter being the classification result of the text to be classified;
wherein N is an integer greater than zero.
7. The apparatus according to claim 6, characterized in that the extraction unit comprises:
a pre-processing subunit, configured to pre-process the text to be classified to obtain the effective text in the text to be classified;
a first extraction subunit, configured to extract the first question features of the effective text in the text to be classified.
8. The apparatus according to claim 7, characterized in that the first extraction subunit is specifically configured to:
perform a word-segmentation operation on the effective text to obtain multiple words;
process each word with the term frequency-inverse document frequency (TF-IDF) algorithm to obtain a metric of the importance of each word in the effective text;
output each metric value as one first question feature of the text to be classified.
9. The apparatus according to claim 1, characterized in that the apparatus further comprises:
a training unit, configured to build and train the classification model based on training texts;
wherein the training unit further comprises:
an obtaining subunit, configured to obtain multiple training texts of known business categories;
a second extraction subunit, configured to extract the second question features in each training text;
a generation subunit, configured to generate the M-dimensional feature vector of each training text respectively, wherein the M-dimensional feature vector describes M attributes of the training text and each attribute represents one second question feature;
a first computation subunit, configured to obtain, for each attribute in the M-dimensional feature vector, its occurrence probability in each business category respectively;
a second computation subunit, configured to calculate, according to the occurrence probabilities, the posterior probability of the M-dimensional feature vector in each business category respectively;
a determination subunit, configured to generate a naive Bayes classification model according to the posterior probabilities and to determine the naive Bayes classification model as the classification model.
10. The apparatus according to claim 9, characterized in that the second computation subunit is specifically configured to:
obtain the prior probability of each business category;
correct the prior probability of each business category using a probability smoothing algorithm, so that for each attribute in the M-dimensional feature vector the sum of its occurrence probabilities over all business categories is less than one;
calculate the posterior probability of the M-dimensional feature vector in each business category according to the occurrence probabilities and the corrected prior probability of each business category.
CN201611159082.XA 2016-12-15 2016-12-15 Classification method and apparatus for business questions Pending CN108228622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611159082.XA CN108228622A (en) 2016-12-15 2016-12-15 Classification method and apparatus for business questions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611159082.XA CN108228622A (en) 2016-12-15 2016-12-15 Classification method and apparatus for business questions

Publications (1)

Publication Number Publication Date
CN108228622A true CN108228622A (en) 2018-06-29

Family

ID=62650388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611159082.XA Pending CN108228622A (en) Classification method and apparatus for business questions

Country Status (1)

Country Link
CN (1) CN108228622A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996241A (en) * 2010-10-22 2011-03-30 东南大学 Bayesian algorithm-based content filtering method
CN104598579A (en) * 2015-01-14 2015-05-06 北京京东尚科信息技术有限公司 Automatic question and answer method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
董守斌 et al.: 《网络信息检索》 (Web Information Retrieval), Xidian University Press, 30 April 2010 *
蒋良孝 et al.: 《贝叶斯网络分类器：算法与应用》 (Bayesian Network Classifiers: Algorithms and Applications), China University of Geosciences Press, 31 December 2015 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959568A (en) * 2018-07-04 2018-12-07 重庆华龙网海数科技有限公司 Intelligent file dissemination system and distribution method
WO2020034880A1 (en) * 2018-08-17 2020-02-20 菜鸟智能物流控股有限公司 Logistics object information processing method, device and computer system
WO2020073507A1 (en) * 2018-10-11 2020-04-16 平安科技(深圳)有限公司 Text classification method and terminal
CN111597326A (en) * 2019-02-21 2020-08-28 北京京东尚科信息技术有限公司 Method and device for generating commodity description text
CN111597326B (en) * 2019-02-21 2024-03-05 北京汇钧科技有限公司 Method and device for generating commodity description text
WO2020244336A1 (en) * 2019-06-04 2020-12-10 深圳前海微众银行股份有限公司 Alarm classification method and device, electronic device, and storage medium
CN110704618A (en) * 2019-09-20 2020-01-17 阿里巴巴集团控股有限公司 Method and device for determining standard problem corresponding to dialogue data
CN110704618B (en) * 2019-09-20 2023-06-27 创新先进技术有限公司 Method and device for determining standard problem corresponding to dialogue data
CN112131369A (en) * 2020-09-29 2020-12-25 中国银行股份有限公司 Service class determination method and device
CN112131369B (en) * 2020-09-29 2024-02-02 中国银行股份有限公司 Service class determining method and device
CN113766069A (en) * 2021-09-06 2021-12-07 中国银行股份有限公司 Crank call interception method and device
CN113766069B (en) * 2021-09-06 2023-08-18 中国银行股份有限公司 Harassment call interception method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180629)