CN110019777A - Method and apparatus for information classification - Google Patents

Method and apparatus for information classification

Info

Publication number
CN110019777A
CN110019777A (application CN201710794992.3A)
Authority
CN
China
Prior art keywords
classification
weight
information
query information
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710794992.3A
Other languages
Chinese (zh)
Other versions
CN110019777B (en)
Inventor
王兴光
林芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710794992.3A
Publication of CN110019777A
Application granted
Publication of CN110019777B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for information classification. The method of the embodiments includes: receiving query information to be classified; obtaining category information that contains multiple categories; obtaining a first weight for the query information under each of the multiple categories; training according to the category information, the first weights, and a trained first model to obtain a target classifier, where the categories recorded by the target classifier include a target category and a non-target category, and the first model is a classification model obtained by training on corpora related to the query information; classifying the query information with the target classifier to obtain the weight of the query information under the target category and its weight under the non-target category; and, if the weight corresponding to the target category is greater than the weight corresponding to the non-target category, determining that the query information belongs to the target category. The embodiments of the present application also provide an information classification device, for improving the accuracy of information classification.

Description

Method and apparatus for information classification
Technical field
The present invention relates to the computer field, and more particularly to a method and apparatus for information classification.
Background technique
In the semantic classes identification of artificial intelligence (Artificial Intelligence, abbreviation: AI), it is intended that analysis The understanding of the underlying semantics in stage beginning of conversation, it is intended that analysis generally can be used as dialogue control entrance, it is intended that result It will affect the Entity recognition in dialog procedure, context is intended to and the semantic relevance of dialogue.
Currently, many methods are used for intent classification, such as AdaBoost, random forest, and logistic regression, but the classification results obtained by these methods are often unsatisfactory and their accuracy is not high. For example, without enough interactive corpora, many queries expressing an "on-demand" (song request) intent overlap heavily with categories such as "chat" and "data", and many non-"on-demand" queries that merely contain a "listen" demand are wrongly assigned to the "on-demand" class. For instance, "I want to listen to you talk" and "Your voice is really pleasant" are, after intent analysis, wrongly assigned to the "on-demand" class. As another example, if the system does not contain the "on-demand" category but does contain the "chat" and "data" categories, and the query is "I want to listen to sincere words", the user actually wants to listen to the song "Sincere Words"; conventional methods wrongly assign the intent to "chat", so the accuracy is low.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for information classification, for improving the accuracy of information classification.
In a first aspect, an embodiment of the present application provides a method of information classification, comprising:
receiving query information to be classified;
obtaining category information, the category information containing multiple categories;
obtaining a first weight for the query information under each of the multiple categories;
training according to the category information, the first weights, and a trained first model to obtain a target classifier, the categories recorded by the target classifier including a target category and a non-target category, the first model being a classification model obtained by training on corpora related to the query information;
classifying the query information with the target classifier to obtain the weight of the query information under the target category and the weight of the query information under the non-target category;
if the weight corresponding to the target category is greater than the weight corresponding to the non-target category, determining that the query information belongs to the target category.
In a second aspect, an embodiment of the present application provides an information classification device, comprising:
a receiving module, for receiving query information to be classified;
a first obtaining module, for obtaining category information, the category information containing multiple categories;
a second obtaining module, for obtaining a first weight for the query information received by the receiving module under each of the multiple categories obtained by the first obtaining module;
a training module, for training according to the category information obtained by the first obtaining module, the first weights obtained by the second obtaining module, and a trained first model, to obtain a target classifier, the categories recorded by the target classifier including a target category and a non-target category, the first model being a classification model obtained by training on corpora related to the query information;
a classification module, for classifying the query information with the target classifier trained by the training module, to obtain the weight of the query information under the target category and the weight of the query information under the non-target category;
a first determining module, for determining that the query information belongs to the target category when the classification module determines that the weight corresponding to the target category is greater than the weight corresponding to the non-target category.
In a third aspect, an embodiment of the present application provides an information classification device, comprising:
a memory, for storing computer-executable program code; and
a processor, coupled to the memory and a transceiver;
wherein the program code comprises instructions which, when executed by the processor, cause the information classification device to perform the method of the above aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the information classification method of the above aspect.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantage:
In the embodiments of the present application, the information classification device receives query information to be classified; obtains category information containing multiple categories; and obtains a first weight for the query information under each of the multiple categories. It can be understood that when the first layer classifies the query information, the first weight of the query information under each of the multiple categories can be obtained. Then, new knowledge (e.g., the first model) is added when training the target classifier at the next layer: training is performed according to the category information, the first weights, and the trained first model to obtain the target classifier, the categories recorded by the target classifier including a target category and a non-target category, the first model being a classification model obtained by training on corpora related to the query information. The query information is classified with the target classifier to obtain its weight under the target category and its weight under the non-target category, and it is then determined which category the query information belongs to, improving the accuracy of information recognition.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art may derive other drawings from these drawings.
Fig. 1 is a schematic diagram of a scenario of an information classification method in an embodiment of the present application;
Fig. 2 is a schematic diagram of the principle of the stacking algorithm in an embodiment of the present application;
Fig. 3 is a flow diagram of the steps of an embodiment of an information classification method in an embodiment of the present application;
Fig. 4 is a schematic architecture diagram of the principle of an information classification method in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a classification tree in an embodiment of the present application;
Fig. 6 is a schematic diagram of the steps of an information classification method in an embodiment of the present application;
Fig. 7 is a schematic diagram of the steps of an information classification method in an embodiment of the present application;
Fig. 8 is a schematic architecture diagram of the principle of an information classification method in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an embodiment of an information classification device in an embodiment of the present application;
Fig. 10 is a schematic structural diagram of another embodiment of an information classification device in an embodiment of the present application;
Fig. 11 is a schematic structural diagram of another embodiment of an information classification device in an embodiment of the present application;
Fig. 12 is a schematic structural diagram of another embodiment of an information classification device in an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an embodiment of an information classification device in an embodiment of the present application.
Specific embodiment
The embodiments of the present invention provide a method and apparatus for information classification, for improving classification accuracy.
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, the claims, and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in an order other than the one illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
In semantic recognition in AI, the intent of the user needs to be understood from the natural language input by the user. Owing to the complexity of written and spoken language, polysemous words, and characters with multiple pronunciations, the meaning (intent) expressed by a sentence or a word may differ across language environments. Currently, in intent classification, determining the user's intent from the words in a sentence with traditional methods makes semantic recognition inaccurate. For example, the query information (query) input by the user is: "I want to listen to sincere words." The semantics are: I want to listen to the song "Sincere Words". It can be understood that this query is a song-listening intent; however, it may be wrongly assigned to the "chat" class, and the result fed back to the user according to the corpora in the "chat" class tends not to satisfy the user.
To solve the above problems, an embodiment of the present application provides a more accurate semantic recognition method. The method is applied to an information classification device, which is used to receive the query information (query) input by the user. The query information may be text information, voice information, picture information, audio/video information, and so on; in the embodiments of the present application the query information is illustrated with voice information as an example.
The specific form of the query information is not limited in the embodiments of the present application. The information classification device includes, but is not limited to, a mobile phone, an intelligent voice device, a palmtop computer, a PDA, and so on. In the embodiments of the present application the information classification device is illustrated with a smart speaker as an example; the smart speaker can be understood as the entry point of AI human-computer interaction, which may be understood with reference to Fig. 1. Fig. 1 is a schematic diagram of a scenario of an information classification method in an embodiment of the present application. The smart speaker 101 is a device with which the user accesses the Internet by voice, for example to chat, request songs, shop online, or check the weather forecast; it can also control smart home devices, for example opening curtains, setting the refrigerator temperature, or heating the water heater in advance.
The traditional ensemble algorithm (e.g., stacking) refers to training one model to combine several other models. First, multiple different models are trained; then, another model is trained with the outputs of the previously trained models as input, to obtain a final output. The specific steps may be: 1. train multiple learners on a first set; 2. test these learners on a second set; 3. use the prediction results obtained in the second step as input, with the correct responses as output, to train a higher-level learner. This may be understood with reference to Fig. 2, a schematic diagram of the principle of the stacking algorithm. Each training set is first obtained by sampling from the entire training data set, yielding a series of classification models called Tier-1 classifiers, whose outputs are then used to train the Tier-2 classifier. The traditional stacking algorithm uses the outputs for the 105 categories as the input of the higher-level classifier, and outputs the final probability distribution of the query over these 105 categories. Although the learning process of this end-to-end approach has low complexity, the result is inaccurate and often unsatisfactory. There is also another issue: the "on-demand" intent is usually heavily confused only with colloquial categories such as "chat" and "data", and is not easily confused with other categories; further optimizing the relationship between "on-demand" and all the other categories would increase the burden on the model.
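The three-step stacking procedure above can be sketched in a few lines. This is a minimal illustration with scikit-learn on invented toy data, not the patent's actual Tier-1/Tier-2 models (which are trained on a 105-category, million-scale corpus):

```python
# Minimal stacking sketch: Tier-1 models trained on bootstrap samples,
# Tier-2 model trained on their predicted class probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs standing in for two query categories.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Tier 1: train several different base models, each on a bootstrap sample.
tier1 = []
for model in (LogisticRegression(), GaussianNB()):
    idx = rng.integers(0, len(X), len(X))  # sample a training set
    model.fit(X[idx], y[idx])
    tier1.append(model)

# Tier 2: the base models' class probabilities become the new features.
meta_features = np.hstack([m.predict_proba(X) for m in tier1])
tier2 = LogisticRegression().fit(meta_features, y)

pred = tier2.predict(meta_features)
print(round((pred == y).mean(), 2))
```

In the patent's setting, the Tier-1 outputs would be probabilities over the 105 categories rather than over two toy classes, but the data flow is the same.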
The embodiments of the present application are based on the stacking method and, on this basis, according to different business demands and based on all prior information, continuously add new knowledge when training each new layer of classifiers. By passing information layer by layer and iteratively training the target classifier, the category to which the query belongs is determined by the target classifier. The classification results determined by the finally trained classifier are greatly improved in accuracy compared with conventional methods. It should be noted that the prior information in this embodiment refers to the probability of the query under each category in the category information.
Information classification refers to assigning speech, according to its content, into pre-specified subject categories; the categories are generally set manually as needed, for example "on-demand", "finance", "data", "environment". For example, after reading an article, a person forms an understanding of its gist according to his or her prior knowledge or experience. A computer, however, can only recognize 0 or 1, so some way is needed to convert information composed of numerous characters into a form that the computer can process; this process is information classification. For example, under the Bayesian assumption, the contribution of each word in the query information to the classification is independent of the others, so the query information can be regarded as a set of words, with each word serving as a feature of the text vector to represent the text. Although this assumption does not consider semantic effects, it yields a relatively stable classification effect. For example, the Boolean model is the simplest text representation model and the basis of other models. In the matrix form composed of documents and vocabulary, "1" indicates that the word occurs in the text, whereas "0" indicates that it does not; the matrix therefore contains only the two values "0" and "1", from which the probability of the query under each category is calculated.
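The Boolean document-vocabulary matrix described above can be built directly; the two example queries below are invented for illustration:

```python
# Boolean (bag-of-words) text representation: a document-vocabulary
# matrix holding only 0s and 1s, as described in the Boolean model.
docs = [
    "i want to listen to sincere words",  # an on-demand-style query
    "what is the weather today",          # a weather-style query
]
vocab = sorted({w for d in docs for w in d.split()})

# Row i, column j is 1 iff word j occurs in document i.
matrix = [[1 if w in d.split() else 0 for w in vocab] for d in docs]

print(vocab)
print(matrix)
```

Each row of `matrix` is a binary feature vector for one query, which a classifier can then use to estimate the query's probability under each category.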
In an application scenario, the information classification device receives the query information input by the user, which can be understood as a sentence input by the user's voice. The information classification device needs to recognize the correct semantics of the query in order to respond correctly to the user's demand. For example, the user inputs the query: "I want to listen to sincere words." The query is classified into the "on-demand" category, so the information classification device determines that the intent of the user's query is to listen to a song; further, "Sincere Words" is sent to the on-demand server, and the server then feeds the song "Sincere Words" back to the information classification device. The semantic recognition is correct and the accuracy is improved.
An embodiment of the information classification method provided in the embodiments of the present application is described in detail below, with reference to Fig. 3 and Fig. 4. Fig. 3 is a flow diagram of the steps of an embodiment of the information classification method in this embodiment; Fig. 4 is a schematic architecture diagram of the principle of the information classification method in this embodiment.
Step 301: receive query information to be classified.
The query information (query) to be classified input by the user is received. The query information may, for example, be voice information and may be a sentence, e.g., the query information is: "I want to listen to sincere words."
Step 302: obtain category information, the category information containing multiple categories.
In one implementation, this may be understood with reference to Fig. 5, a schematic structural diagram of the classification tree in an embodiment of the present application. The information classification device stores category information containing multiple categories; the category information is a classification tree structure with weak mutual exclusion between the categories, for example 105 categories in total across 11 vertical classification systems. For example, the categories may be: weather, TV series, movies, finance, shopping, chat, data, on-demand, and so on. It should be noted that the types and numbers of categories in the embodiments of the present application are examples for convenience of description and do not limit the application.
In another possible implementation, the information classification device may also obtain the category information from a storage server: the information classification device sends a request to the server, and the server feeds the category information back to the information classification device according to the request.
Step 303: obtain a first weight for the query information under each of the multiple categories.
The query information is classified by a meta classifier in the system. The meta classifier can be understood as the first-layer classifier in stacking and may be trained on a corpus of millions of samples. The meta classifier determines a first weight for the query information under each of the multiple categories; this first weight can be understood as a feature of the query in some feature space, which can be understood with the example in Table 1 below:
Table 1
Category     First weight
Weather      0
TV series    0
Finance      0
Shopping     0
Chat         0.051
Data         0.032
It should be noted that the categories and corresponding weights in Table 1 above are exemplary and do not limit the application.
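A first-layer meta classifier producing per-category probabilities like those in Table 1 might be sketched as follows. The tiny corpus, the two categories, and the naive Bayes model are assumptions for illustration only; the patent's meta classifier is trained on a million-scale corpus over 105 categories:

```python
# Sketch of a meta (first-layer) classifier: given a query, output a
# probability (first weight) for each category.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

corpus = ["let us chat for a while", "tell me a fun fact about data",
          "chat with me please", "show me some data"]
labels = ["chat", "data", "chat", "data"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(corpus), labels)

query = "chat with me"
probs = dict(zip(clf.classes_, clf.predict_proba(vec.transform([query]))[0]))
print({k: round(v, 3) for k, v in probs.items()})
```

The resulting probability vector plays the role of the first weights: one value per category, later fed into the next layer.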
Step 304: train according to the category information, the first weights, and the first model to obtain a target classifier, the categories recorded by the target classifier including a target category and a non-target category, the first model being a classification model obtained by training on corpora related to the query information.
This may be understood with reference to Fig. 4. The first model may be a supervised-learning classification model trained on a large number of corpora related to the query. Supervised learning refers to training a model (which belongs to some set of functions) from existing training samples, i.e., given data and their corresponding outputs, and then using this model to map all inputs to corresponding outputs; classification is achieved by simple judgments on the outputs. That is, a supervised first model has the ability to classify unknown data. In this embodiment of the present application, the first model is a supervised-learning classification model, and adding the supervised first model amounts to a correction of the prior information, that is, a correction of the weight corresponding to each category.
Specifically, the query information may be represented in vector form, and the first weights of the categories may be expressed as a probability vector. The query information in vector form and the probability vector are concatenated and then trained together with the first model to obtain the target classifier. For example, the first model may be a fast text classification model (a FastText model), and the target classifier may be a support vector machine classifier (e.g., LibSVM). It can be understood that, on the basis of the category information and the first weights, new knowledge is added, namely the first model (e.g., the FastText model), which is trained according to different business demands: if the business is the "on-demand" class, the first model is trained on a large number of corpora related to the "on-demand" class. It should be noted that taking the FastText model as the first model and the LibSVM classifier as the target classifier in this embodiment is only for convenience of description and example, and does not limit the application.
Step 305: classify the query information with the target classifier to obtain the weight of the query information under the target category and the weight of the query information under the non-target category.
In one scenario, suppose the query is: "I want to listen to sincere words." The query should belong to the "on-demand" class, but the category information in the system does not contain the "on-demand" class; that is, the 105 categories do not include "on-demand". The target classifier records a target category and a non-target category, for example the target category is denoted "1" and the non-target category "-1". It can be understood that the target category is "on-demand" and the non-target category is "non-on-demand": the weights of the other categories such as "chat" and "data" are mapped, through learning by the target classifier (e.g., LibSVM), to the two categories "on-demand" and "non-on-demand".
If the weight corresponding to the target category is greater than the weight corresponding to the non-target category, step 306 is performed; if the weight corresponding to the target category is less than the weight corresponding to the non-target category, step 307 is performed.
Step 306: determine that the query information belongs to the target category.
For example, if the weight of the query under the "on-demand" class is 0.05 and its weight under the "non-on-demand" class is 0.03, it is determined that the query belongs to the "on-demand" class.
In this embodiment of the present application, when the target category is not contained in the original category information of the system, a new category is added by adding new knowledge (a new classification model), and the query is classified into the target category to which it belongs, which can improve the accuracy of semantic analysis. Moreover, the categories recorded by the target classifier are only "target category" and "non-target category", so when the query belongs to the target category the target classifier only has to decide between these two categories, which greatly reduces the burden on the target classifier.
Step 307: if the weight corresponding to the target category is less than the weight corresponding to the non-target category, the query information belongs to the non-target category, and the non-target category includes multiple subcategories. The weights corresponding to the subcategories are compared, and it is then further determined that the query information belongs to the subcategory corresponding to the maximum among the weights of the subcategories.
For example, suppose the query is "I want you to have a talk with me" and the weight corresponding to the "on-demand" class is less than that of the "non-on-demand" class; the query then belongs to the non-on-demand class. However, the non-on-demand class further includes multiple subcategories, for example "chat", "information", and "weather", each with a corresponding weight value. The weight values of the subcategories are compared; if the weight value corresponding to "chat" is the largest, it is determined that the query belongs to the "chat" class, whose weight value is the maximum.
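The two-stage decision just described (first compare the target and non-target weights, then take the largest-weight subcategory of the non-target class) can be sketched in Python. The class names and weight values below are illustrative assumptions, not values taken from the embodiment:

```python
def decide_category(target_weight, non_target_weight, sub_weights):
    """Two-stage decision: target vs. non-target first; if non-target
    wins, pick the subcategory with the largest weight.
    sub_weights maps each subcategory name to its weight."""
    if target_weight > non_target_weight:
        return "on-demand"
    # The query belongs to the non-target class; choose the subcategory
    # whose weight value is the maximum.
    return max(sub_weights, key=sub_weights.get)

# Illustrative weights for a chat-like query:
subs = {"chat": 0.4, "information": 0.2, "weather": 0.1}
print(decide_category(0.03, 0.05, subs))  # chat
```

This mirrors steps 306/307: the binary comparison keeps the object classifier's job small, and the subcategory argmax is only computed when needed.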
In the embodiment of the present application, if the original category information of the system already contains the target category, the accuracy for the non-target category can be improved. For example, with the target category still being "on-demand", the category information in the system contains the "on-demand" class, that is, the 105 categories contain the "on-demand" class. The query is classified by the object classifier; if the weight of the non-on-demand class is greater than the weight of the on-demand class, the query belongs to the non-on-demand class, and it can then be further determined, according to the weight values, which specific subcategory of the non-on-demand class the query belongs to.
In the embodiment of the present application, the information classification device receives query information to be classified and obtains category information that includes multiple categories. It obtains the first weight corresponding to the query information under each of the multiple categories; it can be understood that when the first layer classifies the query information, the first weight of the query information under each of the multiple categories is available. Then, when the next layer of the object classifier is trained, new knowledge (for example, a first model) is added: training is performed according to the category information, the first weight, and the trained first model to obtain the object classifier, whose recorded categories include a target category and a non-target category, the first model being a classification model obtained by training with a corpus related to the query information. The query information is classified by the object classifier to obtain its weight under the target category and its weight under the non-target category, so as to determine which category the query belongs to, which improves the accuracy of information recognition.
To further improve the accuracy of semantic recognition, refer to Fig. 6 to Fig. 8. Fig. 6 and Fig. 7 are schematic step diagrams of the information classification method in the embodiment of the present application, and Fig. 8 is a schematic diagram of the principle architecture of the method in this embodiment. Another embodiment of the information classification method provided in the embodiment of the present application includes:
Step 601: receive query information to be classified.
The query information (query) input by the user is received. The query information may, for example, be voice information, and may be a sentence, such as: I want to listen to "Sincere Words".
Step 602: obtain category information, where the category information includes multiple categories.
In one implementation, refer to Fig. 5, which is a schematic structural diagram of the classification tree in the embodiment of the present application. Category information is stored in the information classification device; the category information includes multiple categories and takes the form of a classification tree in which the categories are weakly mutually exclusive. For example, the classification tree contains 105 categories in total, organized into 11 vertical classification systems; the categories may include weather, TV series, film, finance and economics, shopping, chat, information, on-demand, and so on. It should be noted that the types and numbers of categories in the embodiment of the present application are examples given for convenience of description and do not constitute a limiting explanation of the application.
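A classification tree of this kind, with leaf categories grouped under vertical systems, might be represented as a simple nested mapping. The vertical names and groupings below are illustrative assumptions; the embodiment's actual tree has 105 categories under 11 verticals:

```python
# Minimal sketch of a classification tree: vertical systems at the
# first level, weakly mutually exclusive leaf categories at the second.
category_tree = {
    "media":   ["TV series", "film", "on-demand"],
    "life":    ["weather", "shopping"],
    "dialog":  ["chat", "information"],
    "finance": ["finance and economics"],
}

def all_categories(tree):
    """Flatten the tree into the list of leaf categories."""
    return [leaf for leaves in tree.values() for leaf in leaves]

print(len(all_categories(category_tree)))  # 8 leaf categories in this toy tree
```

The meta classifier described below assigns one first weight per leaf category of such a tree.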
In another possible implementation, the information classification device may obtain the category information from a storage server: the device sends a request to the server, and the server feeds the category information back to the device according to the request.
Step 603: obtain the first weight corresponding to the query information under each of the multiple categories.
The query information is classified by a meta classifier in the system, which can be understood as the first-layer classifier in the stacking; the meta classifier may be obtained by training on a corpus of millions of samples. The first weight corresponding to the query information under each of the multiple categories is determined by the meta classifier; the first weight can be understood as a feature of the query in some feature space and may be denoted W1.
Step 604: train according to the category information, the first weight, and a first model to obtain a first classifier, where the first model is a supervised classification model.
Refer to Fig. 7. The first model may be a supervised learning classification model trained on a large corpus related to the query; that is, the supervised first model has the ability to classify unknown data. In the embodiment of the present application, adding the supervised first model allows the prior information to be corrected, that is, the weights corresponding to the categories to be revised. Specifically, the query information may be expressed in vector form, and the first weights of the categories may be expressed as a probability vector; the vector-form query information and the probability vector are spliced together and trained with the first model to obtain the first classifier. The first classifier may be a LibSVM classifier. It can be understood that, on the basis of the category information and the first weight, new knowledge is added in the form of the first model (for example, a FastText model), which is trained according to a particular business requirement; if the business is the "on-demand" class, the first model is trained on a large corpus related to the "on-demand" class. It should be noted that the first model being a FastText model and the object classifier being a LibSVM classifier in this embodiment are merely examples for convenience of description and do not constitute a limiting explanation of the application.
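The splicing in step 604, namely the vector-form query concatenated with the probability vector of first weights before training, can be sketched as follows. The toy vector dimensions and values are assumptions for illustration; the actual embodiment would feed the spliced vector to LibSVM or a similar trainer rather than merely printing it:

```python
def splice_features(query_vec, prob_vec):
    """Concatenate the vector-form query information with the
    probability vector of per-category first weights (W1),
    as described in step 604."""
    return list(query_vec) + list(prob_vec)

# Toy example: a 4-dimensional query representation and first weights
# over three categories produced by the meta classifier.
query_vec = [0.1, 0.0, 0.7, 0.2]
w1 = [0.5, 0.3, 0.2]          # one entry per category, sums to 1
features = splice_features(query_vec, w1)
print(len(features))  # 7: the spliced feature fed to the first classifier
```

The design point is that each layer's training input always keeps the original W1 alongside any new evidence, so later layers can correct the prior rather than discard it.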
Step 605: classify the query information by the first classifier to obtain a second weight of the query information under the target category and a third weight of the query information under the non-target category.
The query is classified by the first classifier (for example, a LibSVM classifier) to obtain the weight of the query information under the target category and its weight under the non-target category; the weights obtained after classification by the first classifier (that is, the second weight and the third weight) are denoted W2.
Step 606: train according to the category information, the first weight, the second weight, the third weight, and a second model to obtain a second classifier, where the second model is an unsupervised classification model, the second classifier is a classification model obtained by training with a corpus related to the query information, and the second classifier is the object classifier.
It should be noted that after the query has been classified by the first classifier, in order to improve the accuracy of semantic recognition, the obtained weights (the second weight and the third weight) are combined with the initial weights of all categories (the first weight), and the classifier is iteratively trained to obtain the second classifier; during the training of the second classifier, a model trained with new knowledge (the second model) is added.
The second model may be an unsupervised classification model, or an unsupervised word clustering or text clustering model, which can compensate for the oversights of the supervised first model. For example, the second model may be a document topic generation model (latent Dirichlet allocation, LDA), which has a three-layer structure of words, topics, and documents. In such a generative model, each word of a piece of text is considered to be obtained through the process of "selecting a topic with a certain probability, and selecting a word from that topic with a certain probability". Documents follow a multinomial distribution over topics, and topics follow a multinomial distribution over words. LDA is an unsupervised machine learning technique that can be used to identify the latent topic information hidden in large-scale document collections or corpora. It adopts the bag-of-words method, which treats each document as a word-frequency vector, thereby transforming text information into numbers that are easy to model. The bag-of-words method does not consider the order between words, which simplifies the complexity of the problem and also provides an opportunity for improving the model. Each document represents a probability distribution formed by some topics, and each topic represents a probability distribution formed by many words.
For example, the information classification device receives the user's query "play 'lustily water' for me", but no corpus for "lustily water" is found in the corpus. However, "lustily water" and "providence" belong to the same topic, and "providence" belongs to the "on-demand" class, so the weight of "lustily water" belonging to the "on-demand" class can be determined.
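The generalization in this example, where a word absent from the corpus still maps to the "on-demand" class because it shares a topic with a known word, can be sketched with a toy word-to-topic table. The table entries are assumptions standing in for what a trained LDA model would actually learn:

```python
# Toy stand-in for an LDA word-topic assignment: each word is mapped
# to the topic it is most probable under.
word_topic = {
    "providence": "song-request",
    "lustily water": "song-request",  # never seen with a class label
    "umbrella": "weather",
}
# Topics associated with a class by the supervised layers.
topic_class = {"song-request": "on-demand", "weather": "weather"}

def class_of(word):
    """Map a word to a class via its topic, if the topic is known."""
    topic = word_topic.get(word)
    return topic_class.get(topic)

print(class_of("lustily water"))  # on-demand, although this word itself
                                  # never occurred in the labeled corpus
```

This is exactly the gap-filling role the embodiment assigns to the unsupervised second model: it supplies topic-level links that the supervised first model's labeled corpus lacks.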
The second classifier may be a logistic regression model. It should be noted that the second model being LDA and the second classifier being a logistic regression model are merely examples and do not constitute a limiting explanation of the application.
Step 607: classify the query information by the second classifier to obtain a fourth weight of the query information under the target category and a fifth weight of the query information under the non-target category.
The weights obtained by the second classifier, that is, the fourth weight and the fifth weight, may be denoted W3.
Step 608: train according to the category information, the first weight, the second weight, the third weight, the fourth weight, the fifth weight, and a third model to obtain a third classifier, where the third model is a supervised classification model whose complexity is greater than that of the second model.
After the query has been classified by the second classifier, in order to improve the accuracy of semantic recognition, the obtained weights (the second weight, the third weight, the fourth weight, and the fifth weight) are combined with the initial weights of all categories (the first weight), and the classifier is iteratively trained to obtain the third classifier; during the training of the third classifier, a model trained with new knowledge (the third model) is added.
The third model may be a convolutional neural network (CNN) classification model. In a preferred mode, in order to improve classification accuracy, the classifiers are trained layer by layer, and during the training of each new layer's classifier, the complexity of the classification model is gradually increased; for example, the complexity of the first model is less than that of the second model, and the complexity of the second model is less than that of the third model.
Step 609: classify the query information by the third classifier to obtain a sixth weight of the query information under the target category and a seventh weight of the query information under the non-target category.
The weights obtained by the third classifier, that is, the sixth weight and the seventh weight, may be denoted W4.
If the weight corresponding to the target category is greater than the weight corresponding to the non-target category, step 610 is performed; if the weight corresponding to the target category is less than the weight corresponding to the non-target category, step 611 is performed.
For example, the third classifier may be a CNN softmax classifier, the third model being a CNN classification model. The third classifier being a CNN softmax classifier is merely an example and does not constitute a limiting explanation of the application.
Step 610: determine that the query information belongs to the target category.
For example, if the weight of the query in the "on-demand" class is 0.05 and its weight in the "non-on-demand" class is 0.03, it is determined that the query belongs to the "on-demand" class.
Step 611: if the weight corresponding to the target category is less than the weight corresponding to the non-target category, the query information belongs to the non-target category, and the non-target category includes multiple subcategories. The weights corresponding to the subcategories are compared, and it is then further determined that the query information belongs to the subcategory corresponding to the maximum among the weights of the subcategories.
It should be noted that the iterative training of classifiers in the embodiment of the present application is not limited to three layers; in order to improve accuracy, an Nth classification model trained with new knowledge can be added during the training of each new layer's classifier, and an Nth classifier can be trained to further improve accuracy.
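The layer-by-layer scheme, where each new layer is trained on the original first weights plus every previous layer's output weights and model complexity increases per layer, can be sketched as a loop. The stand-in "models" below are simple callables returning fixed values, an assumption replacing the embodiment's LibSVM, logistic regression, and CNN layers:

```python
def stack_layers(w1, models):
    """Iteratively apply layers: each layer sees W1 plus all weights
    produced by earlier layers, mirroring steps 604 to 609."""
    features = list(w1)
    for model in models:
        target_w, non_target_w = model(features)  # this layer's two outputs
        features += [target_w, non_target_w]      # fed to the next layer
    return features

# Toy stand-in models: each returns (target weight, non-target weight).
layer1 = lambda f: (0.4, 0.6)    # e.g. LibSVM over spliced features
layer2 = lambda f: (0.45, 0.55)  # e.g. logistic regression after LDA
layer3 = lambda f: (0.7, 0.3)    # e.g. CNN softmax, the most complex

out = stack_layers([0.5, 0.3, 0.2], [layer1, layer2, layer3])
print(len(out))  # 9: 3 first weights + 2 weights from each of 3 layers
```

Extending to N layers is just a longer `models` list, which matches the note that the stack is not limited to three layers.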
In the embodiment of the present application, when the target category is not included in the original category information of the system, new knowledge (a new classification model) is added so that a new category is introduced and the query is classified into the target category to which it belongs, which can improve the accuracy of semantic analysis. Moreover, the categories recorded by the object classifier are only "target category" and "non-target category"; when the query belongs to the "target category", the object classifier only needs to determine the query's category from between "target category" and "non-target category", which greatly relieves the burden on the object classifier.
The classifier-based information classification method in the application is not limited to text classification; it is also applicable to speech classification, sentiment classification of text information, picture classification, audio-video classification, and the classification of other information.
The information classification method has been described above; the information classification device to which the method is applied is described in detail below. Referring to Fig. 9, one embodiment of an information classification device 900 to which the method is applied, provided in the embodiment of the present application, includes:
a receiving module 901, configured to receive query information to be classified;
a first obtaining module 902, configured to obtain category information, where the category information includes multiple categories;
a second obtaining module 903, configured to obtain the first weight corresponding to the query information received by the receiving module 901 under each of the multiple categories obtained by the first obtaining module 902;
a training module 904, configured to train according to the category information obtained by the first obtaining module 902, the first weight obtained by the second obtaining module 903, and a trained first model, to obtain an object classifier, where the categories recorded by the object classifier include a target category and a non-target category, and the first model is a classification model obtained by training with a corpus related to the query information;
a classification module 905, configured to classify the query information by the object classifier trained by the training module 904, to obtain the weight of the query information under the target category and the weight of the query information under the non-target category; and
a first determining module 906, configured to determine that the query information belongs to the target category when the classification module 905 determines that the weight corresponding to the target category is greater than the weight corresponding to the non-target category.
Referring to Fig. 10, on the basis of the embodiment corresponding to Fig. 9, another embodiment of an information classification device 1000 provided in the embodiment of the present application includes:
the information classification device further includes a second determining module 907, a comparison module 908, and a third determining module 909;
the second determining module 907 is configured to determine that the query information belongs to the non-target category when the classification module 905 determines that the weight corresponding to the target category is less than the weight corresponding to the non-target category, where the non-target category includes multiple subcategories;
the comparison module 908 is configured to compare the weights corresponding to the subcategories determined by the second determining module 907; and
the third determining module 909 is configured to determine that the query information belongs to the subcategory corresponding to the maximum among the weights of the subcategories compared by the comparison module 908.
Referring to Fig. 11, on the basis of the embodiment corresponding to Fig. 9, another embodiment of an information classification device 1100 provided in the embodiment of the present application includes:
the training module 904 of the information classification device further includes a first training unit 9041 and a second training unit 9042;
the first training unit 9041 is configured to train according to the category information obtained by the first obtaining module 902, the first weight obtained by the second obtaining module 903, and the trained first model, to obtain a first classifier, where the first model is a supervised classification model;
the classification module 905 is further configured to classify the query information by the first classifier trained by the first training unit 9041, to obtain a second weight of the query information under the target category and a third weight of the query information under the non-target category;
the second training unit 9042 is configured to train according to the category information obtained by the first obtaining module 902, the first weight obtained by the second obtaining module 903, the second weight and the third weight obtained by the classification module 905, and a trained second model, to obtain a second classifier, where the second model is an unsupervised classification model, the second classifier is a classification model obtained by training with a corpus related to the query information, and the second classifier is the object classifier; and
the classification module 905 is further configured to classify the query information by the second classifier obtained through training by the second training unit 9042, to obtain a fourth weight of the query information under the target category and a fifth weight of the query information under the non-target category.
Referring to Fig. 12, on the basis of the embodiment corresponding to Fig. 11, another embodiment of an information classification device 1200 provided in the embodiment of the present application includes:
the training module 904 of the information classification device further includes a third training unit 9043; the third training unit 9043 is configured to train according to the category information obtained by the first obtaining module 902, the first weight obtained by the second obtaining module 903, the second weight, the third weight, the fourth weight, and the fifth weight obtained by the classification module 905, and a trained third model, to obtain a third classifier, where the third model is a supervised classification model whose complexity is greater than that of the second model; and
the classification module 905 is further configured to classify the query information by the third classifier obtained through training by the third training unit 9043, to obtain a sixth weight of the query information under the target category and a seventh weight of the query information under the non-target category.
Optionally, the classification module 905 is further configured to express the query information in vector form and express the first weight of each category as a probability vector, and to splice the vector-form query information and the probability vector together with the first model and then perform training to obtain the object classifier.
Further, the information classification devices in Fig. 9 to Fig. 12 are presented in the form of functional modules. A "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above functions. The information classification devices in Fig. 9 to Fig. 12 may take the form shown in Fig. 13.
An embodiment of the present invention further provides another information classification device. As shown in Fig. 13, for convenience of description, only the parts related to the embodiment of the present invention are shown; for specific technical details not disclosed, refer to the method part of the embodiments of the present invention. The information classification device may be any device capable of information classification, including a smart speaker, a tablet computer, a PDA (Personal Digital Assistant), a vehicle-mounted computer, and the like:
Fig. 13 is a block diagram of the partial structure of a smart speaker related to the information classification device provided in the embodiment of the present invention. Referring to Fig. 13, the smart speaker includes components such as a transceiver 1310, a memory 1320, an input unit 1330, a display unit 1340, a sensor 1350, an audio circuit 1360, a wireless fidelity (WiFi) module 1370, a processor 1380, and a power supply 1390. Those skilled in the art will understand that the smart speaker structure shown in Fig. 13 does not constitute a limitation on the smart speaker, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
Each component of the device is specifically introduced below with reference to Fig. 13:
The transceiver 1310 may be used to receive and send messages and to receive and send signals.
The memory 1320 may be used to store software programs and modules; the processor 1380 executes the various functional applications and data processing of the device by running the software programs and modules stored in the memory 1320. The memory 1320 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the device (such as audio data and a phone book). In addition, the memory 1320 may include a high-speed random access memory, and may also include a nonvolatile memory, for example, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 1330 may be used to receive the query information input by the user and to generate key signal inputs related to the user settings and function control of the device. Specifically, the input unit 1330 may include a touch panel 1331 and other input devices 1332. The touch panel 1331, also called a touch screen, collects the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1331 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection apparatus according to a preset program. Optionally, the touch panel 1331 may include two parts, a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 1380, and can receive and execute commands sent by the processor 1380. In addition, the touch panel 1331 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1331, the input unit 1330 may further include other input devices 1332, which may specifically include but are not limited to one or more of a physical keyboard, function keys (such as a volume control key and a switch key), a trackball, a mouse, a joystick, and the like.
The audio circuit 1360, the loudspeaker 1361, and the microphone 1362 may provide an audio interface between the user and the device. The microphone 1362 may be used to receive the query information input by the user. The audio circuit 1360 may transmit the electrical signal converted from received audio data to the loudspeaker 1361, which converts it into a sound signal for output; on the other hand, the microphone 1362 converts a collected sound signal into an electrical signal, which is received by the audio circuit 1360 and converted into audio data. After the audio data is output to and processed by the processor 1380, it is sent via the transceiver 1310 to, for example, another device, or the audio data is output to the memory 1320 for further processing.
The display unit 1340 may be used to display information input by the user, information provided to the user, and the various menus of the device. The display unit 1340 may include a display panel 1341; optionally, the display panel 1341 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1331 may cover the display panel 1341; after detecting a touch operation on or near it, the touch panel 1331 transmits the operation to the processor 1380 to determine the type of the touch event, and the processor 1380 then provides a corresponding visual output on the display panel 1341 according to the type of the touch event. Although in Fig. 13 the touch panel 1331 and the display panel 1341 implement the input and output functions of the device as two independent components, in some embodiments the touch panel 1331 and the display panel 1341 may be integrated to implement the input and output functions of the device.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1370, the device can help the user access a server and the like, providing the user with wireless broadband Internet access. Although Fig. 13 shows the WiFi module 1370, it can be understood that it is not a necessary component of the device and may be omitted as needed without changing the essence of the invention.
The processor 1380 is the control center of the device. It connects the parts of the entire device through various interfaces and lines, and executes the various functions and data processing of the device by running or executing the software programs and/or modules stored in the memory 1320 and calling the data stored in the memory 1320, thereby monitoring the device as a whole. Optionally, the processor 1380 may include one or more processing units; preferably, the processor 1380 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1380.
The device further includes a power supply 1390 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 1380 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
Although not shown, the device may further include a camera, a Bluetooth module, and the like, which are not described herein.
In the embodiment of the present invention, the processor 1380 included in the information classification device is further configured to execute the methods performed by the devices in Fig. 3 and Fig. 6 above.
Specifically, the microphone 1362 or the input unit 1330 is configured to receive query information to be classified.
Processor 1380 is also used to obtain classification information, includes multiple classifications in the classification information;Obtain the inquiry First weight corresponding to each classification of the information in the multiple classification;According to the classification information, first weight And the first model trained is trained, and obtains object classifiers, the classification of the object classifiers record includes target class Not with non-targeted classification, first model is the classification mould obtained after being trained with the relevant corpus of the query information Type;Classified by the object classifiers to the query information, obtains the query information under the target category Weight under the non-targeted classification of weight and the query information;Described in being greater than when the corresponding weight of the target category When the corresponding weight of non-targeted classification, determine that the query information belongs to the target category.
Optionally, processor 1380 are also used to be less than the non-targeted classification pair when the corresponding weight of the target category When the weight answered, determine that the query information belongs to the non-targeted classification, the non-targeted classification includes multiple subclass; Weight corresponding to each subclass in more the multiple subclass;Determine that the query information belongs to every height Subclass corresponding to weight maximum value in weight corresponding to classification.
Optionally, the processor 1380 is further configured to: train according to the classification information, the first weight, and the first model to obtain a first classifier, the first model being a supervised classification model; classify the query information with the first classifier to obtain a second weight of the query information under the target category and a third weight of the query information under the non-target category; train according to the classification information, the first weight, the second weight, the third weight, and a trained second model to obtain a second classifier, the second model being an unsupervised classification model, the second classifier being a classification model obtained by training on a corpus related to the query information, and the second classifier being the target classifier; and classify the query information with the second classifier to obtain a fourth weight of the query information under the target category and a fifth weight of the query information under the non-target category.
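The two-stage arrangement above is a cascade: the second stage consumes the query features together with the weights emitted by the first stage. A toy sketch under stated assumptions (the linear scoring rules, feature layout, and function names are invented for illustration; the patent does not specify the models' internals):

```python
def first_classifier(features):
    # Toy supervised first stage: a fixed linear rule standing in for a
    # trained model. Returns (second weight, third weight).
    score = 0.8 * features[0] + 0.2 * features[1]
    return score, 1.0 - score

def second_classifier(features, second_weight, third_weight):
    # Second stage takes the query features *and* the first stage's
    # weights as input, mirroring the cascaded training described above.
    score = 0.5 * features[0] + 0.5 * second_weight
    return score, 1.0 - score

w2, w3 = first_classifier([0.9, 0.1])
w4, w5 = second_classifier([0.9, 0.1], w2, w3)
```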
Optionally, the processor 1380 is further configured to: train according to the classification information, the first weight, the second weight, the third weight, the fourth weight, the fifth weight, and a trained third model to obtain a third classifier, the third model being a supervised classification model whose complexity is greater than that of the second classifier; and classify the query information with the third classifier to obtain a sixth weight of the query information under the target category and a seventh weight of the query information under the non-target category.
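The third, more complex supervised stage trains on the query features together with every weight produced by the earlier stages. A hedged sketch of assembling that augmented input (the function name and feature layout are assumptions for illustration):

```python
def build_third_stage_features(query_vec, w1, w2, w3, w4, w5):
    # Concatenate the query's feature vector with the first through fifth
    # weights, forming the training input for the third model.
    return list(query_vec) + [w1, w2, w3, w4, w5]

feats = build_third_stage_features([0.9, 0.1], 0.6, 0.74, 0.26, 0.82, 0.18)
print(len(feats))  # 7
```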
Optionally, the processor 1380 is further configured to represent the query information as a vector and represent the first weight of each category as a probability vector, and to concatenate the vector-form query information with the probability vector and train the first model on the concatenated input to obtain the target classifier.
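The splicing step above amounts to concatenating the two vectors into one model input. A minimal sketch (the function name and example values are illustrative assumptions):

```python
def concatenate_inputs(query_vec, category_probs):
    # Splice the query's vector representation with the per-category
    # probability vector to form the first model's training input.
    return query_vec + category_probs

x = concatenate_inputs([0.1, 0.4, 0.5], [0.7, 0.3])
print(x)  # [0.1, 0.4, 0.5, 0.7, 0.3]
```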
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described in detail herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some technical features therein, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for information classification, characterized by comprising:
receiving query information to be classified;
obtaining classification information, the classification information comprising multiple categories;
obtaining a first weight corresponding to the query information under each category of the multiple categories;
training according to the classification information, the first weight, and a trained first model to obtain a target classifier, wherein the categories recorded by the target classifier comprise a target category and a non-target category, and the first model is a classification model obtained by training on a corpus related to the query information;
classifying the query information with the target classifier to obtain a weight of the query information under the target category and a weight of the query information under the non-target category; and
if the weight corresponding to the target category is greater than the weight corresponding to the non-target category, determining that the query information belongs to the target category.
2. The method according to claim 1, characterized in that the method further comprises:
if the weight corresponding to the target category is less than the weight corresponding to the non-target category, determining that the query information belongs to the non-target category, the non-target category comprising multiple subcategories;
comparing the weights corresponding to the subcategories of the multiple subcategories; and
determining that the query information belongs to the subcategory corresponding to the largest weight among the weights corresponding to the subcategories.
3. The method according to claim 1, characterized in that the training according to the classification information, the first weight, and the first model to obtain the target classifier comprises:
training according to the classification information, the first weight, and the first model to obtain a first classifier, the first model being a supervised classification model;
the classifying the query information with the target classifier to obtain the weight of the query information under the target category and the weight of the query information under the non-target category comprises:
classifying the query information with the first classifier to obtain a second weight of the query information under the target category and a third weight of the query information under the non-target category; and
training according to the classification information, the first weight, the second weight, the third weight, and a trained second model to obtain a second classifier, the second model being an unsupervised classification model obtained by training on a corpus related to the query information, and the second classifier being the target classifier;
the classifying the query information with the target classifier to obtain the weight of the query information under the target category and the weight of the query information under the non-target category comprises:
classifying the query information with the second classifier to obtain a fourth weight of the query information under the target category and a fifth weight of the query information under the non-target category.
4. The method according to claim 3, characterized in that after the classifying the query information with the second classifier to obtain the fourth weight of the query information under the target category and the fifth weight of the query information under the non-target category, the method further comprises:
training according to the classification information, the first weight, the second weight, the third weight, the fourth weight, the fifth weight, and a trained third model to obtain a third classifier, the third model being a supervised classification model obtained by training on a corpus related to the query information;
the classifying the query information with the target classifier to obtain the weight of the query information under the target category and the weight of the query information under the non-target category comprises:
classifying the query information with the third classifier to obtain a sixth weight of the query information under the target category and a seventh weight of the query information under the non-target category.
5. The method according to any one of claims 1 to 4, characterized in that the training according to the query information, the first weight corresponding to each category, and the first model to obtain the target classifier comprises:
representing the query information as a vector, and representing the first weight of each category as a probability vector; and
concatenating the vector-form query information with the probability vector and training the first model on the concatenated input to obtain the target classifier.
6. A device for information classification, characterized by comprising:
a receiving module, configured to receive query information to be classified;
a first obtaining module, configured to obtain classification information, the classification information comprising multiple categories;
a second obtaining module, configured to obtain a first weight corresponding to the query information received by the receiving module under each category of the multiple categories obtained by the first obtaining module;
a training module, configured to train according to the classification information obtained by the first obtaining module, the first weight obtained by the second obtaining module, and a trained first model to obtain a target classifier, wherein the categories recorded by the target classifier comprise a target category and a non-target category, and the first model is a classification model obtained by training on a corpus related to the query information;
a classification module, configured to classify the query information with the target classifier trained by the training module to obtain a weight of the query information under the target category and a weight of the query information under the non-target category; and
a first determining module, configured to determine that the query information belongs to the target category when the classification module determines that the weight corresponding to the target category is greater than the weight corresponding to the non-target category.
7. The information classification device according to claim 6, characterized by further comprising a second determining module, a comparison module, and a third determining module;
the second determining module is configured to determine that the query information belongs to the non-target category when the classification module determines that the weight corresponding to the target category is less than the weight corresponding to the non-target category, the non-target category comprising multiple subcategories;
the comparison module is configured to compare the weights corresponding to the subcategories of the multiple subcategories determined by the second determining module; and
the third determining module is configured to determine that the query information belongs to the subcategory corresponding to the largest weight among the weights, corresponding to the subcategories, compared by the comparison module.
8. The information classification device according to claim 6, characterized in that the training module further comprises a first training unit and a second training unit;
the first training unit is configured to train according to the classification information obtained by the first obtaining module, the first weight obtained by the second obtaining module, and a trained first model to obtain a first classifier, the first model being a supervised classification model;
the classification module is further configured to classify the query information with the first classifier trained by the first training unit to obtain a second weight of the query information under the target category and a third weight of the query information under the non-target category;
the second training unit is configured to train according to the classification information obtained by the first obtaining module, the first weight obtained by the second obtaining module, the second weight and the third weight obtained by the classification module, and a trained second model to obtain a second classifier, the second model being an unsupervised model obtained by training on a corpus related to the query information, and the second classifier being the target classifier; and
the classification module is further configured to classify the query information with the second classifier obtained by the second training unit to obtain a fourth weight of the query information under the target category and a fifth weight of the query information under the non-target category.
9. The information classification device according to claim 8, characterized in that the training module further comprises a third training unit; the third training unit is configured to train according to the classification information obtained by the first obtaining module, the first weight obtained by the second obtaining module, the second weight, the third weight, the fourth weight, and the fifth weight obtained by the classification module, and a trained third model to obtain a third classifier, the third model being a supervised classification model obtained by training on a corpus related to the query information; and
the classification module is further configured to classify the query information with the third classifier obtained by the third training unit to obtain a sixth weight of the query information under the target category and a seventh weight of the query information under the non-target category.
10. The information classification device according to any one of claims 6 to 9, characterized in that the classification module is further configured to represent the query information as a vector and represent the first weight of each category as a probability vector, and to concatenate the vector-form query information with the probability vector and train the first model on the concatenated input to obtain the target classifier.
11. A device for information classification, characterized by comprising:
a memory, configured to store computer-executable program code; and
a processor coupled to the memory;
wherein the program code comprises instructions which, when executed by the processor, cause the information classification device to perform the information classification method according to any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the information classification method according to any one of claims 1 to 5.
CN201710794992.3A 2017-09-05 2017-09-05 Information classification method and equipment Active CN110019777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710794992.3A CN110019777B (en) 2017-09-05 2017-09-05 Information classification method and equipment


Publications (2)

Publication Number Publication Date
CN110019777A true CN110019777A (en) 2019-07-16
CN110019777B CN110019777B (en) 2022-08-19

Family

ID=67186205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710794992.3A Active CN110019777B (en) 2017-09-05 2017-09-05 Information classification method and equipment

Country Status (1)

Country Link
CN (1) CN110019777B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140101119A1 (en) * 2012-10-05 2014-04-10 Microsoft Corporation Meta classifier for query intent classification
CN102982344A (en) * 2012-11-12 2013-03-20 浙江大学 Support vector machine sorting method based on simultaneously blending multi-view features and multi-label information
CN103365997A (en) * 2013-07-12 2013-10-23 华东师范大学 Opinion mining method based on ensemble learning
US20170076224A1 (en) * 2015-09-15 2017-03-16 International Business Machines Corporation Learning of classification model
CN106649694A (en) * 2016-12-19 2017-05-10 北京云知声信息技术有限公司 Method and device for identifying user's intention in voice interaction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. OKAN BILGE ÖZDEMIR: "Active Learning for Hyperspectral Image Classification with a Stacked Autoencoders Based Neural Network", 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing *
ZHOU XING: "Research on Classifier Ensemble Algorithms", Journal of Wuhan University *
CHEN WEN: "Research on Cross-Domain Sentiment Classification Algorithms for Chinese Short Texts", China Masters' Theses Full-text Database *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110505144A (en) * 2019-08-09 2019-11-26 世纪龙信息网络有限责任公司 Process for sorting mailings, device, equipment and storage medium
CN112632269A (en) * 2019-09-24 2021-04-09 北京国双科技有限公司 Method and related device for training document classification model
CN110531632A (en) * 2019-09-27 2019-12-03 北京声智科技有限公司 Control method and system
CN110851607A (en) * 2019-11-19 2020-02-28 中国银行股份有限公司 Training method and device for information classification model
CN111260435A (en) * 2020-01-10 2020-06-09 京东数字科技控股有限公司 Multi-factor weight assignment correction method and device, computer equipment and storage medium
CN111597804A (en) * 2020-05-15 2020-08-28 腾讯科技(深圳)有限公司 Entity recognition model training method and related device
CN111597804B (en) * 2020-05-15 2023-03-10 腾讯科技(深圳)有限公司 Method and related device for training entity recognition model
CN112269860A (en) * 2020-08-10 2021-01-26 北京沃东天骏信息技术有限公司 Automatic response processing method and device, electronic equipment and readable storage medium
CN112269860B (en) * 2020-08-10 2024-03-05 北京汇钧科技有限公司 Automatic response processing method, device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN110019777B (en) 2022-08-19

Similar Documents

Publication Publication Date Title
CN110019777A (en) A kind of method and apparatus of information classification
CN107943860B (en) Model training method, text intention recognition method and text intention recognition device
US11302337B2 (en) Voiceprint recognition method and apparatus
CN110945513B (en) Domain addition system and method for language understanding system
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
CN110276075A (en) Model training method, name entity recognition method, device, equipment and medium
WO2018072663A1 (en) Data processing method and device, classifier training method and system, and storage medium
US11511436B2 (en) Robot control method and companion robot
WO2021093821A1 (en) Intelligent assistant evaluation and recommendation methods, system, terminal, and readable storage medium
US20180061421A1 (en) Personalization of experiences with digital assistants in communal settings through voice and query processing
CN106792003B (en) Intelligent advertisement insertion method and device and server
CN110853618A (en) Language identification method, model training method, device and equipment
KR20190094314A (en) An artificial intelligence apparatus for generating text or speech having content-based style and method for the same
CN111444357B (en) Content information determination method, device, computer equipment and storage medium
CN110334344A (en) A kind of semanteme intension recognizing method, device, equipment and storage medium
CN106528859A (en) Data pushing system and method
WO2017183242A1 (en) Information processing device and information processing method
CN110462676A (en) Electronic device, its control method and non-transient computer readable medium recording program performing
US11881209B2 (en) Electronic device and control method
CN108228720B (en) Identify method, system, device, terminal and the storage medium of target text content and original image correlation
CN111353299B (en) Dialog scene determining method based on artificial intelligence and related device
WO2019212729A1 (en) Generating response based on user's profile and reasoning on contexts
CN108345612A (en) A kind of question processing method and device, a kind of device for issue handling
KR20190076870A (en) Device and method for recommeding contact information
CN110431547A (en) Electronic equipment and control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant