CN107507613A - Scene-oriented Chinese instruction recognition method, apparatus, device and storage medium - Google Patents
- Publication number
- CN107507613A (application CN201710620448.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention provides a scene-oriented Chinese instruction recognition method, apparatus, device and storage medium. The scene-oriented Chinese instruction recognition method includes: correcting the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label. With this technical solution, the prediction weight of each prediction model is trained and corrected using a sample set that includes misclassified samples, which effectively improves the accuracy of Chinese instruction recognition; in addition, pre-judging the scene effectively saves back-end computing resources and raises the level of intelligence of Chinese instruction recognition.
Description
Technical field
The present invention relates to the technical field of intelligent human-computer interaction, and in particular to a scene-oriented Chinese instruction recognition method, a scene-oriented Chinese instruction recognition apparatus, a computer device and a computer-readable storage medium.
Background art
A modern intelligent question-answering system generally comprises multiple technical stages such as speech recognition, text parsing, syntactic analysis, semantic analysis, topic identification and answer parsing. Among these, scene-oriented Chinese instruction recognition in syntactic analysis (mainly the recognition of interrogative sentence patterns) acts as the entry check for the whole intelligent question-answering system.
In the related art, scene-oriented Chinese instruction recognition in syntactic analysis is mainly implemented by two broad classes of methods, interrogative-word pattern rule matching and transformational-generative syntactic analysis, which have the following technical defects:
(1) Interrogative-word pattern rule matching requires a very cumbersome list of interrogative words that is difficult to make exhaustive, its understanding of Chinese instructions is relatively superficial, and its recognition accuracy is relatively low.
(2) Transformational-generative syntactic analysis requires a corresponding dictionary set to be established and syntactic patterns to be formulated in advance, which demands excessive manual intervention, so its degree of intelligence is relatively low.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art or the related art.
To this end, one object of the present invention is to provide a scene-oriented Chinese instruction recognition method.
Another object of the present invention is to provide a scene-oriented Chinese instruction recognition apparatus.
Yet another object of the present invention is to provide a computer device.
A further object of the present invention is to provide a computer-readable storage medium.
To achieve the above objects, the technical solution of the first aspect of the present invention provides a scene-oriented Chinese instruction recognition method, including: correcting the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this technical solution, the prediction weight of each prediction model is corrected according to the sample set that includes misclassified samples and the first preset formula, so that test samples whose predicted class label does not match the actual class label are used to correct the prediction weight of each prediction model. This trains the prediction models effectively and improves prediction accuracy, which in turn improves the accuracy of Chinese instruction recognition. When the predicted class label of a test sample does not match its actual class label, the sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are preferentially drawn both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to some extent and raises the level of intelligence of both prediction model training and Chinese instruction recognition.
In addition, the sample set that includes misclassified samples may consist entirely of misclassified samples, or partly of misclassified samples and partly of correctly predicted samples. The sample set should be relatively large in order to serve the purpose of correcting the prediction weight of each prediction model.
In the above technical solution, preferably, correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula specifically includes: cross-validating each prediction model on the sample set that includes misclassified samples to determine the prediction accuracy of each prediction model; and correcting the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, where the first preset formula includes:
$$\omega_i = \frac{p_i}{\sum_i p_i}$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction accuracy of the i-th prediction model, and $\sum_i p_i$ denotes the sum of the prediction accuracies of all prediction models.
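The symbol definitions above suggest that each weight is a model's accuracy divided by the sum of all models' accuracies. The following is a minimal sketch under that assumption; the function name `correct_weights` and the accuracy values are hypothetical and do not come from the patent.

```python
def correct_weights(accuracies):
    """Return one weight per model: its accuracy divided by the sum of all accuracies.

    Assumption: this is how the first preset formula combines p_i and the sum of p_i.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one model must have a positive accuracy")
    return [p / total for p in accuracies]


# Hypothetical cross-validated accuracies for four prediction models.
accuracies = [0.90, 0.80, 0.70, 0.60]
weights = correct_weights(accuracies)
```

Under this reading the weights always sum to 1, and a more accurate model receives a proportionally larger share of the vote.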
In this technical solution, each prediction model is cross-validated on the sample set that includes misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation may be used: the sample set that includes misclassified samples is divided into 10 parts, of which 9 parts are used as training data and 1 part as test data, and each run yields a corresponding accuracy. The mean of the accuracies of the 10 runs is taken as the prediction accuracy of the prediction model. The 10-fold cross-validation may also be repeated several times, for example 10 times, and the results averaged, to make the determined prediction accuracy of the prediction model more reliable.
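The repeated 10-fold procedure described above can be sketched in plain Python. This is an illustration only: `k_fold_accuracy`, `majority_class_trainer` and the toy samples are hypothetical names and data, and the majority-class "model" merely stands in for a real prediction model.

```python
import random


def k_fold_accuracy(samples, train_fn, k=10, repeats=1, seed=0):
    """Mean accuracy over `repeats` rounds of k-fold cross-validation.

    `samples` is a list of (x, y) pairs; `train_fn(train)` returns a predict(x) callable.
    """
    if len(samples) < k:
        raise ValueError("need at least k samples")
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        data = samples[:]
        rng.shuffle(data)
        folds = [data[i::k] for i in range(k)]  # k disjoint parts
        for i in range(k):
            test = folds[i]  # 1 part as test data
            train = [s for j, f in enumerate(folds) if j != i for s in f]  # other k-1 parts
            predict = train_fn(train)
            correct = sum(1 for x, y in test if predict(x) == y)
            scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class_trainer(train):
    """Stand-in model: always predicts the most frequent label in the training data."""
    counts = {}
    for _, y in train:
        counts[y] = counts.get(y, 0) + 1
    majority = max(counts, key=counts.get)
    return lambda x: majority


# Toy data: 30 samples of class 1 and 10 samples of class 4.
samples = [("q%d" % i, 1) for i in range(30)] + [("d%d" % i, 4) for i in range(10)]
accuracy = k_fold_accuracy(samples, majority_class_trainer, k=10, repeats=10)
```

Averaging over repeated shuffles reduces the dependence of the estimate on any single partition of the sample set.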
The prediction weight of each prediction model is calculated from the first preset formula and the prediction accuracy, giving the corrected prediction weight of each prediction model. This makes the determined prediction weights more accurate and further improves the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, before correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula, the method further includes: determining the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula; if the actual class label of the test sample does not match the predicted class label, determining that the test sample is a misclassified sample; and raising the sampling probability of the misclassified sample, so as to draw the sample set that includes misclassified samples and to draw the misclassified sample as a new test sample, where the second preset formula includes:
$$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label appears among all prediction models, and $\mathrm{pred}$ denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, that is, the predicted class label.
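One way to read the second preset formula is as a weighted vote: each prediction model votes for a class label, votes are scaled by the model's prediction weight, and the label with the highest score wins. The sketch below assumes that reading (the score of a label is the sum of the weights of the models that output it); the patent text does not spell out how $\omega_i$ and $n_j$ are paired, and the function names are hypothetical.

```python
from collections import defaultdict


def predict_label(model_predictions, weights):
    """Return the class label with the highest weighted vote.

    Assumption: a label's score is the sum of the weights of the models voting for it.
    """
    scores = defaultdict(float)
    for label, weight in zip(model_predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def is_misclassified(predicted_label, actual_label):
    """A test sample is misclassified when the two labels do not match."""
    return predicted_label != actual_label


# Four models vote 1, 4, 4, 2; the first model carries half of the total weight.
weighted = predict_label([1, 4, 4, 2], [0.5, 0.2, 0.2, 0.1])
# With equal weights the vote reduces to a simple majority.
unweighted = predict_label([1, 4, 4, 2], [0.25, 0.25, 0.25, 0.25])
```

With the corrected weights a single highly accurate model can outvote two weaker ones, which is the practical effect of weighting the vote.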
In this technical solution, the predicted class label of the test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the prediction models and prepares the next step of their training. Raising the sampling probability of misclassified samples means they are preferentially drawn both into the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to some extent, raises the level of intelligence of prediction model training, and helps further improve the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, before determining the predicted class label of the test sample according to the prediction weight of each prediction model and the second preset formula, the method further includes: determining whether the test sample contains a word that matches a preset scene lexicon; if it is determined that the test sample contains no word matching the preset scene lexicon, issuing a prompt signal and not determining the predicted class label of the test sample; and if it is determined that the test sample contains a word matching the preset scene lexicon, replacing the corresponding word in the test sample with the matching word from the preset scene lexicon, and then determining the predicted class label of the test sample.
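The scene pre-check and word replacement described above can be sketched as a lookup against a small lexicon that maps each canonical word to its synonyms. This is a simplified illustration using substring matching; the lexicon contents, the function name `normalize_for_scene` and the convention of returning `None` to stand for "issue a prompt signal" are all assumptions.

```python
# Hypothetical kitchen-scene lexicon: canonical word -> list of synonyms.
SCENE_LEXICON = {
    "土豆": ["马铃薯", "洋芋"],  # potato
    "番茄": ["西红柿"],          # tomato
}


def normalize_for_scene(text, lexicon):
    """Replace synonyms with canonical lexicon words.

    Returns the normalized text, or None when no lexicon word occurs in the
    text (the caller would then issue a prompt signal and skip prediction).
    """
    matched = False
    for canonical, synonyms in lexicon.items():
        if canonical in text:
            matched = True
        for synonym in synonyms:
            if synonym in text:
                text = text.replace(synonym, canonical)
                matched = True
    return text if matched else None


in_scene = normalize_for_scene("马铃薯怎么做好吃", SCENE_LEXICON)
out_of_scene = normalize_for_scene("今天天气怎么样", SCENE_LEXICON)
```

Only inputs that pass this check reach the prediction models, so out-of-scene inputs cost a dictionary lookup rather than a full classification.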
In this technical solution, whether the test sample contains a word matching the preset scene lexicon is determined before the predicted class label of the test sample is determined. This pre-judges the scene, so that Chinese instruction recognition is scene-oriented and more targeted, and effectively saves back-end computing resources. If the test sample contains no word matching the preset scene lexicon, a prompt signal is issued and the predicted class label is not determined, so irrelevant test samples are filtered out and back-end computing resources are saved further. If the test sample does contain a matching word, the corresponding word in the test sample is replaced with the matching word from the preset scene lexicon before the predicted class label is determined. This standardizes the test samples fed into the prediction models, helps the prediction models output a predicted class label that matches the actual class label, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary:
(1) common ingredients (450 selected common ingredients such as apple, celery and potato, and their synonyms);
(2) common recipes (10,000 selected common recipes such as fish with pickled cabbage and fish-flavored shredded pork, and their synonyms);
(3) tastes and flavors (multiple subclasses such as sour, spicy and light, and their synonyms);
(4) seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, and their synonyms);
(5) nutritional effects (multiple subclasses such as weight loss, insomnia and slimming, and their synonyms);
(6) special groups of people (multiple subclasses such as drivers, teachers and examinees, and their synonyms);
(7) dietary care for ailments (multiple subclasses such as hypertension, colds and toothache, and their synonyms);
(8) beauty and slimming (multiple subclasses such as whitening, anti-acne and freckle removal, and their synonyms);
(9) cuisines and dish types (multiple subclasses such as snacks, barbecue and late-night snacks, and their synonyms);
(10) occasions (multiple subclasses such as singles, afternoon tea and promotion, and their synonyms).
In any of the above technical solutions, preferably, raising the sampling probability of the misclassified sample specifically includes: redetermining the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula includes:
$y_k$ denotes the actual class label of test sample k, $h(k)$ denotes the predicted class label of test sample k, $W_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h(k))$ denotes the total number of misclassified samples.
In this technical solution, the sampling probability of misclassified samples is redetermined by the third preset formula, so that it is raised according to a definite rule. This helps draw a sample set that includes misclassified samples for correcting the prediction weight of each prediction model, and also helps draw misclassified samples as new test samples. The sampling probability calculated by the third preset formula rises step by step: a sample misclassified for the first time has a higher sampling probability than an ordinary sample, and if that sample is drawn as a new test sample and misclassified again, its sampling probability rises further, so a sample misclassified twice has a higher sampling probability than a sample misclassified once. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
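The step-by-step rise in sampling probability can be illustrated with a simple re-weighting loop. The update rule below is a stand-in and is not the patent's third preset formula: it multiplies the probability of each misclassified sample by a hypothetical factor `boost` and renormalizes. All names are assumptions for illustration.

```python
import random


def raise_sampling_probability(probs, misclassified, boost=2.0):
    """Scale up the probabilities of misclassified sample indices, then renormalize.

    Assumed rule for illustration only; `boost` is a hypothetical parameter.
    """
    raised = [p * boost if k in misclassified else p for k, p in enumerate(probs)]
    total = sum(raised)
    return [p / total for p in raised]


def draw_sample_set(samples, probs, size, seed=0):
    """Draw `size` samples with replacement according to the sampling probabilities."""
    return random.Random(seed).choices(samples, weights=probs, k=size)


initial_probs = [0.25, 0.25, 0.25, 0.25]
once_probs = raise_sampling_probability(initial_probs, {2})  # sample 2 misclassified once
twice_probs = raise_sampling_probability(once_probs, {2})    # sample 2 misclassified again
drawn = draw_sample_set(["s0", "s1", "s2", "s3"], twice_probs, size=8)
```

Any rule with the same monotone behavior would reproduce the property stated in the text: a sample misclassified twice is more likely to be drawn than one misclassified once, which in turn is more likely to be drawn than an ordinary sample.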
In any of the above technical solutions, preferably, before correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula, the method further includes: building the prediction models based on preset rules and according to a preset corpus, and presetting the prediction weight of each prediction model.
In this technical solution, the prediction models are built based on the preset rules and according to the preset corpus, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models. For example, if there are 4 prediction models, the prediction weight of each prediction model may be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the k-nearest neighbors (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds a prediction model, and combining these prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus provides the language material for building and training the prediction models; the test samples and the sample set that includes misclassified samples are all drawn from the preset corpus. Specifically, corpora of four classes of sentences (interrogative, imperative, exclamatory and declarative) are collected, organized and labeled as the preset corpus, forming the prediction model training and test set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x \in \mathcal{X}$, the instance space $\mathcal{X} \subseteq \mathbb{R}^n$, and $y_n$ belongs to the label set $\{1, 2, 3, 4\}$, whose elements correspond to the four class labels interrogative sentence, imperative sentence, exclamatory sentence and declarative sentence, respectively. Each class of corpus contains related subclasses: interrogative sentences include 4 subclasses (wh-questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibition and dissuasion imperatives); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamatives); and declarative sentences include 2 subclasses (negative statements and affirmative statements).
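The labeled set $T$ can be represented as a list of (sample, label) pairs with labels drawn from {1, 2, 3, 4}. The sketch below keeps the raw sentences in place of feature vectors, since the text does not describe feature extraction; the example sentences and helper names are hypothetical.

```python
# Label set from the text: 1..4 map to the four sentence classes.
LABELS = {1: "interrogative", 2: "imperative", 3: "exclamatory", 4: "declarative"}

# Hypothetical labeled corpus T = [(x, y), ...]; x is kept as raw text here.
corpus = [
    ("土豆怎么做好吃", 1),      # "How do I cook potatoes well?"
    ("请推荐一道清淡的菜", 2),  # "Please recommend a light dish."
    ("这道汤真好喝啊", 3),      # "This soup is so tasty!"
    ("番茄含有维生素C", 4),     # "Tomatoes contain vitamin C."
]


def validate_corpus(pairs):
    """Check that every label belongs to the label set {1, 2, 3, 4}."""
    return all(y in LABELS for _, y in pairs)


def count_by_label(pairs):
    """Count how many samples carry each class label."""
    counts = {label: 0 for label in LABELS}
    for _, y in pairs:
        counts[y] += 1
    return counts


label_counts = count_by_label(corpus)
```

Counting samples per label is a quick way to check that all four sentence classes are represented before samples are drawn for training and testing.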
The technical solution of the second aspect of the present invention provides a scene-oriented Chinese instruction recognition apparatus, including: a correction unit, configured to correct the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this technical solution, the prediction weight of each prediction model is corrected according to the sample set that includes misclassified samples and the first preset formula, so that test samples whose predicted class label does not match the actual class label are used to correct the prediction weight of each prediction model. This trains the prediction models effectively and improves prediction accuracy, which in turn improves the accuracy of Chinese instruction recognition. When the predicted class label of a test sample does not match its actual class label, the sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are preferentially drawn both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to some extent and raises the level of intelligence of both prediction model training and Chinese instruction recognition.
In addition, the sample set that includes misclassified samples may consist entirely of misclassified samples, or partly of misclassified samples and partly of correctly predicted samples. The sample set should be relatively large in order to serve the purpose of correcting the prediction weight of each prediction model.
In the above technical solution, preferably, the apparatus further includes: a validation unit, configured to cross-validate each prediction model on the sample set that includes misclassified samples to determine the prediction accuracy of each prediction model. The correction unit is further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, where the first preset formula includes:
$$\omega_i = \frac{p_i}{\sum_i p_i}$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction accuracy of the i-th prediction model, and $\sum_i p_i$ denotes the sum of the prediction accuracies of all prediction models.
In this technical solution, each prediction model is cross-validated on the sample set that includes misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation may be used: the sample set that includes misclassified samples is divided into 10 parts, of which 9 parts are used as training data and 1 part as test data, and each run yields a corresponding accuracy. The mean of the accuracies of the 10 runs is taken as the prediction accuracy of the prediction model. The 10-fold cross-validation may also be repeated several times, for example 10 times, and the results averaged, to make the determined prediction accuracy of the prediction model more reliable.
The prediction weight of each prediction model is calculated from the first preset formula and the prediction accuracy, giving the corrected prediction weight of each prediction model. This makes the determined prediction weights more accurate and further improves the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, the apparatus further includes: a determination unit, configured to determine the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula, the determination unit being further configured to determine that the test sample is a misclassified sample when the actual class label of the test sample does not match the predicted class label; and a raising unit, configured to raise the sampling probability of the misclassified sample, so as to draw the sample set that includes misclassified samples and to draw the misclassified sample as a new test sample, where the second preset formula includes:
$$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label appears among all prediction models, and $\mathrm{pred}$ denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, that is, the predicted class label.
In this technical solution, the predicted class label of the test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the prediction models and prepares the next step of their training. Raising the sampling probability of misclassified samples means they are preferentially drawn both into the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to some extent, raises the level of intelligence of prediction model training, and helps further improve the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, the determination unit is further configured to determine whether the test sample contains a word that matches a preset scene lexicon. The Chinese instruction recognition apparatus further includes: a prompting unit, configured to issue a prompt signal, without the predicted class label of the test sample being determined, when it is determined that the test sample contains no word matching the preset scene lexicon; and a replacement unit, configured to replace the corresponding word in the test sample with the matching word from the preset scene lexicon when it is determined that the test sample contains a word matching the preset scene lexicon, after which the predicted class label of the test sample is determined.
In this technical solution, whether the test sample contains a word matching the preset scene lexicon is determined before the predicted class label of the test sample is determined. This pre-judges the scene, so that Chinese instruction recognition is scene-oriented and more targeted, and effectively saves back-end computing resources. If the test sample contains no word matching the preset scene lexicon, a prompt signal is issued and the predicted class label is not determined, so irrelevant test samples are filtered out and back-end computing resources are saved further. If the test sample does contain a matching word, the corresponding word in the test sample is replaced with the matching word from the preset scene lexicon before the predicted class label is determined. This standardizes the test samples fed into the prediction models, helps the prediction models output a predicted class label that matches the actual class label, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary:
(1) common ingredients (450 selected common ingredients such as apple, celery and potato, and their synonyms);
(2) common recipes (10,000 selected common recipes such as fish with pickled cabbage and fish-flavored shredded pork, and their synonyms);
(3) tastes and flavors (multiple subclasses such as sour, spicy and light, and their synonyms);
(4) seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, and their synonyms);
(5) nutritional effects (multiple subclasses such as weight loss, insomnia and slimming, and their synonyms);
(6) special groups of people (multiple subclasses such as drivers, teachers and examinees, and their synonyms);
(7) dietary care for ailments (multiple subclasses such as hypertension, colds and toothache, and their synonyms);
(8) beauty and slimming (multiple subclasses such as whitening, anti-acne and freckle removal, and their synonyms);
(9) cuisines and dish types (multiple subclasses such as snacks, barbecue and late-night snacks, and their synonyms);
(10) occasions (multiple subclasses such as singles, afternoon tea and promotion, and their synonyms).
In any of the above technical solutions, preferably, the determination unit is further configured to redetermine the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula includes:
$y_k$ denotes the actual class label of test sample k, $h(k)$ denotes the predicted class label of test sample k, $W_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h(k))$ denotes the total number of misclassified samples.
In this technical solution, the sampling probability of misclassified samples is redetermined by the third preset formula, so that it is raised according to a definite rule. This helps draw a sample set that includes misclassified samples for correcting the prediction weight of each prediction model, and also helps draw misclassified samples as new test samples. The sampling probability calculated by the third preset formula rises step by step: a sample misclassified for the first time has a higher sampling probability than an ordinary sample, and if that sample is drawn as a new test sample and misclassified again, its sampling probability rises further, so a sample misclassified twice has a higher sampling probability than a sample misclassified once. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, the apparatus further includes: a presetting unit, configured to build the prediction models based on preset rules and according to a preset corpus, and to preset the prediction weight of each prediction model.
In this technical solution, the prediction models are built based on the preset rules and according to the preset corpus, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models. For example, if there are 4 prediction models, the prediction weight of each prediction model may be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the k-nearest neighbors (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds a prediction model, and combining these prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus provides the language material for building and training the prediction models; the test samples and the sample set that includes misclassified samples are all drawn from the preset corpus. Specifically, corpora of four classes of sentences (interrogative, imperative, exclamatory and declarative) are collected, organized and labeled as the preset corpus, forming the prediction model training and test set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x \in \mathcal{X}$, the instance space $\mathcal{X} \subseteq \mathbb{R}^n$, and $y_n$ belongs to the label set $\{1, 2, 3, 4\}$, whose elements correspond to the four class labels interrogative sentence, imperative sentence, exclamatory sentence and declarative sentence, respectively. Each class of corpus contains related subclasses: interrogative sentences include 4 subclasses (wh-questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibition and dissuasion imperatives); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamatives); and declarative sentences include 2 subclasses (negative statements and affirmative statements).
The technical solution of the third aspect of the present invention provides a computer device. The computer device includes a processor, and when the processor executes a computer program stored in a memory, it implements the steps of the scene-oriented Chinese instruction recognition method of any one of the technical solutions of the first aspect of the present invention.
In this technical solution, the computer device includes a processor which, when executing the computer program stored in the memory, implements the steps of the scene-oriented Chinese instruction recognition method of any one of the technical solutions of the first aspect of the present invention. The computer device therefore has all the beneficial effects of that method, which are not repeated here.
The technical solution of the fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the steps of the scene-oriented Chinese instruction recognition method of any one of the technical solutions of the first aspect of the present invention.
In this technical solution, a computer program is stored on the computer-readable storage medium and, when executed by a processor, implements the steps of the scene-oriented Chinese instruction recognition method of any one of the technical solutions of the first aspect of the present invention. The storage medium therefore has all the beneficial effects of that method, which are not repeated here.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from that description, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic flowchart of a scene-oriented Chinese instruction recognition method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of a scene-oriented Chinese instruction recognition device according to an embodiment of the present invention;
Fig. 3 shows a schematic flowchart of a scene-oriented Chinese instruction recognition method according to another embodiment of the present invention.
Detailed description of the embodiments
In order that the above objects, features and advantages of the present invention may be understood more clearly, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, and the protection scope of the present invention is therefore not limited by the specific embodiments described below.
Embodiment 1
As shown in Fig. 1, a scene-oriented Chinese instruction recognition method according to an embodiment of the present invention includes: step S102, correcting the prediction weight of each prediction model according to a sample set including misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this embodiment, the prediction weight of each prediction model is corrected according to the sample set including misclassified samples and the first preset formula, so that test samples whose predicted class label does not match the actual class label are used to correct the prediction weights. This trains the prediction models effectively and improves prediction accuracy, and in turn the accuracy of Chinese instruction recognition. When the predicted class label of a test sample does not match its actual class label, the test sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are drawn preferentially, both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to some extent and raises the degree of intelligence of both prediction model training and Chinese instruction recognition.
In addition, the sample set including misclassified samples may consist entirely of misclassified samples, or may consist partly of misclassified samples and partly of correctly predicted samples. The sample set is relatively large, so as to achieve the purpose of correcting the prediction weight of each prediction model.
In the above embodiment, preferably, correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula specifically includes: cross-validating each prediction model on the sample set including misclassified samples to determine the prediction accuracy of each prediction model; and correcting the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, where the first preset formula includes:
$$\omega_i = \frac{p_i}{\sum_{j} p_j}$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction accuracy of the i-th prediction model, and $\sum_{j} p_j$ denotes the sum of the prediction accuracies of all prediction models.
In this embodiment, each prediction model is cross-validated on the sample set including misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation can be used: the sample set including misclassified samples is divided into 10 parts, and in each run 9 parts serve as training data and 1 part as test data. Each run yields an accuracy, and the average of the 10 accuracies is taken as the prediction accuracy of the prediction model. The 10-fold cross-validation can also be repeated several times, for example 10 times, and the results averaged, to make the determined prediction accuracy more reliable.
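The 10-fold procedure described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the implementation of the invention: the names `k_fold_accuracy` and `train_majority` are hypothetical, and the majority-label model merely stands in for a real prediction model such as a support vector machine.

```python
import random
from statistics import mean


def k_fold_accuracy(samples, train_fn, k=10, seed=0):
    """Mean accuracy over k folds; `samples` is a list of (x, y) pairs.

    `train_fn` (hypothetical interface) takes training pairs and returns
    a function mapping an input x to a predicted class label.
    """
    data = list(samples)
    if len(data) < k:
        raise ValueError("need at least k samples")
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k roughly equal parts
    scores = []
    for i, held_out in enumerate(folds):
        # k-1 parts are training data, the remaining part is test data.
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train_fn(training)
        correct = sum(1 for x, y in held_out if predict(x) == y)
        scores.append(correct / len(held_out))
    return mean(scores)


def train_majority(training):
    """Stand-in model (assumption): always predicts the most common label."""
    labels = [y for _, y in training]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority
```

Repeating the call with different `seed` values and averaging the results corresponds to the repeated 10-fold cross-validation mentioned in the text.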
The prediction weight of each prediction model is calculated from the first preset formula and the prediction accuracy to obtain the corrected prediction weights, which makes the prediction weight of each prediction model more accurate and further improves the accuracy of Chinese instruction recognition.
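As a small worked illustration, the weight correction can be sketched as a normalization of the prediction accuracies. The sketch assumes the reading suggested by the symbol definitions of the first preset formula, namely that each weight is the model's accuracy divided by the sum of all accuracies; the function name `prediction_weights` is hypothetical.

```python
def prediction_weights(accuracies):
    """Normalize per-model accuracies so that the weights sum to 1.

    Assumption: weight_i = p_i / sum of all p, as read from the symbol
    definitions of the first preset formula.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must have a positive sum")
    return [p / total for p in accuracies]


# Four models with equal accuracy receive the equal weights 0.25 each,
# which matches the preset initial weights mentioned in the description.
equal = prediction_weights([0.5, 0.5, 0.5, 0.5])
```

Under this reading, a model that cross-validates better than its peers receives a proportionally larger say in the combined prediction.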
In any of the above embodiments, preferably, before correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula, the method further includes: determining the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula; if the actual class label of the test sample does not match the predicted class label, determining that the test sample is a misclassified sample; and raising the sampling probability of the misclassified sample, so that it can be drawn into the sample set including misclassified samples and drawn as a new test sample, where the second preset formula includes:
$$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label appears among the outputs of all prediction models, and pred denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, that is, the predicted class label.
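One way to read the second preset formula is as a weighted vote: each prediction model votes for a class label with its prediction weight, and the label with the largest total is the predicted class label. The sketch below follows that reading, which is an interpretation rather than a statement of the claimed method; the function name `weighted_vote` is hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the class label with the largest total weight.

    `predictions[i]` is the label output by model i and `weights[i]` is
    that model's prediction weight. Ties go to the label seen first
    (an arbitrary choice made for this sketch).
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

With equal weights this reduces to a plain majority vote over the class labels; with unequal weights a single accurate model can outvote several weaker ones.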
In this embodiment, the predicted class label of the test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the prediction models and prepares the next step of their training. Raising the sampling probability of misclassified samples allows them to be drawn preferentially, both into the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to some extent, raises the degree of intelligence of prediction model training, and helps further improve the accuracy of Chinese instruction recognition.
In any of the above embodiments, preferably, before determining the predicted class label of the test sample according to the preset weight of each prediction model and the second preset formula, the method further includes: determining whether the test sample includes a word that matches the preset scene lexicon; if it is determined that the test sample does not include a word that matches the preset scene lexicon, sending a prompt signal and not determining the predicted class label of the test sample; and if it is determined that the test sample includes a word that matches the preset scene lexicon, replacing the corresponding word in the test sample with the matching word in the preset scene lexicon and determining the predicted class label of the test sample.
In this embodiment, before the predicted class label of the test sample is determined, it is determined whether the test sample includes a word that matches the preset scene lexicon. This pre-judges the scene, makes Chinese instruction recognition scene-oriented and more targeted, and effectively saves back-end computing resources. If it is determined that the test sample does not include a word that matches the preset scene lexicon, a prompt signal is sent and the predicted class label of the test sample is not determined, so that irrelevant test samples are filtered out and back-end computing resources are saved further. If it is determined that the test sample includes a word that matches the preset scene lexicon, the corresponding word in the test sample is replaced with the matching word in the preset scene lexicon and the predicted class label of the test sample is determined. This makes the test samples entering the prediction models more standardized, helps the prediction models output a predicted class label that matches the actual class label, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary: class 1, common ingredients (450 common ingredients such as apple, celery and potato, and their synonyms, are defined and selected); class 2, common recipes (10,000 common recipes such as fish with pickled cabbage and fish-flavored shredded pork, and their synonyms, are defined and selected); class 3, tastes and flavors (multiple subclasses such as sour, spicy and light, and their synonyms); class 4, seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, and their synonyms); class 5, nutritional effects (multiple subclasses such as fat reduction, insomnia and slimming, and their synonyms); class 6, special groups (multiple subclasses such as drivers, teachers and examinees, and their synonyms); class 7, dietary care for diseases (multiple subclasses such as hypertension, cold and toothache, and their synonyms); class 8, beauty and slimming (multiple subclasses such as whitening, anti-acne and freckle removal, and their synonyms); class 9, cuisines and dishes (multiple subclasses such as snacks, barbecue and late-night snacks, and their synonyms); class 10, occasions (multiple subclasses such as single, afternoon tea and promotion, and their synonyms).
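The scene pre-judgment and word replacement described above can be sketched as a dictionary lookup over the segmented words of a test sample. The sketch rests on several assumptions: the lexicon is modeled as a mapping from each word or synonym to a canonical lexicon word, the entries of `KITCHEN_LEXICON` are English stand-ins invented for illustration, and the function name `normalize_for_scene` is hypothetical.

```python
# Hypothetical miniature lexicon: synonym -> canonical lexicon word.
KITCHEN_LEXICON = {
    "apple": "apple",
    "celery": "celery",
    "potato": "potato",
    "spud": "potato",  # invented synonym entry, for illustration only
}


def normalize_for_scene(tokens, lexicon):
    """Return tokens with matched words replaced by canonical lexicon
    words, or None when no token matches the scene lexicon (the case in
    which a prompt is sent and no class label is predicted).
    """
    if not any(token in lexicon for token in tokens):
        return None
    return [lexicon.get(token, token) for token in tokens]
```

For instance, a segmented request containing "spud" would be passed on with "potato" in its place, while a request with no lexicon word would return `None` and be filtered out before any prediction model runs.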
In any of the above embodiments, preferably, raising the sampling probability of the misclassified sample specifically includes: redetermining the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula includes:
In the third preset formula, $y_k$ denotes the actual class label of test sample k, $h(k)$ denotes the predicted class label of test sample k, $W_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h(k))$ denotes the total number of misclassified samples.
In this embodiment, the sampling probability of a misclassified sample is redetermined by the third preset formula, so that the sampling probability of misclassified samples is raised according to a definite rule. This helps draw a sample set including misclassified samples for correcting the prediction weight of each prediction model, and also helps draw misclassified samples as new test samples. The sampling probability calculated by the third preset formula rises step by step: the sampling probability of a sample misclassified for the first time is greater than that of an ordinary sample, and if the misclassified sample, drawn as a new test sample, is misclassified again, its sampling probability continues to rise, that is, the sampling probability of a sample misclassified for the second time is greater than that of a sample misclassified for the first time. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each prediction model, which can effectively improve the accuracy of Chinese instruction recognition.
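The stepwise increase in sampling probability can be illustrated with a small sketch. The third preset formula itself is not reproduced in this text, so the sketch substitutes a simple multiplicative boost followed by renormalization; that rule, the `factor` parameter and the function names are assumptions made purely for illustration and are not the formula of the invention.

```python
import random


def boost_misclassified(probabilities, misclassified, factor=2.0):
    """Raise the sampling probability of misclassified samples.

    Stand-in rule (assumption): multiply the probability of each
    misclassified index by `factor`, then renormalize to sum to 1.
    """
    boosted = [
        p * factor if i in misclassified else p
        for i, p in enumerate(probabilities)
    ]
    total = sum(boosted)
    return [p / total for p in boosted]


def draw_sample_set(samples, probabilities, size, seed=None):
    """Draw `size` samples with replacement according to the probabilities."""
    rng = random.Random(seed)
    return rng.choices(samples, weights=probabilities, k=size)
```

Applying `boost_misclassified` twice to the same index raises its probability twice, which mirrors the behavior described above: a sample misclassified a second time is more likely to be drawn than one misclassified only once.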
In any of the above embodiments, preferably, before correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula, the method further includes: building prediction models from a preset corpus based on preset rules, and presetting the prediction weight of each prediction model.
In this embodiment, the prediction models are built from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models. For example, with 4 prediction models, the prediction weight of each prediction model can be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the k-nearest neighbors (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds a prediction model, and combining these prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus provides the language material for building and training the prediction models; the test samples and the sample set including misclassified samples are all drawn from the preset corpus. Specifically, corpora of 4 sentence classes (interrogative, imperative, exclamatory and declarative) are collected, organized and labeled as the preset corpus, forming the training and test set T = {(x1, y1), (x2, y2), …, (xn, yn)}, where x ∈ χ, the instance space χ ⊆ R^n, and each label yn belongs to the label set {1, 2, 3, 4}, whose elements correspond to the interrogative, imperative, exclamatory and declarative class labels respectively. Each class of corpus contains related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibitive and dissuasive imperatives); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamatives); and declarative sentences include 2 subclasses (negative and affirmative declaratives).
Embodiment 2
As shown in Fig. 2, a scene-oriented Chinese instruction recognition device 200 according to an embodiment of the present invention includes: a correction unit 201, configured to correct the prediction weight of each prediction model according to a sample set including misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this embodiment, the prediction weight of each prediction model is corrected according to the sample set including misclassified samples and the first preset formula, so that test samples whose predicted class label does not match the actual class label are used to correct the prediction weights. This trains the prediction models effectively and improves prediction accuracy, and in turn the accuracy of Chinese instruction recognition. When the predicted class label of a test sample does not match its actual class label, the test sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are drawn preferentially, both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to some extent and raises the degree of intelligence of both prediction model training and Chinese instruction recognition.
In addition, the sample set including misclassified samples may consist entirely of misclassified samples, or may consist partly of misclassified samples and partly of correctly predicted samples. The sample set is relatively large, so as to achieve the purpose of correcting the prediction weight of each prediction model.
In the above embodiment, preferably, the device further includes: a verification unit 202, configured to cross-validate each prediction model on the sample set including misclassified samples to determine the prediction accuracy of each prediction model;
the correction unit 201 is further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, where the first preset formula includes:
$$\omega_i = \frac{p_i}{\sum_{j} p_j}$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction accuracy of the i-th prediction model, and $\sum_{j} p_j$ denotes the sum of the prediction accuracies of all prediction models.
In this embodiment, each prediction model is cross-validated on the sample set including misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation can be used: the sample set including misclassified samples is divided into 10 parts, and in each run 9 parts serve as training data and 1 part as test data. Each run yields an accuracy, and the average of the 10 accuracies is taken as the prediction accuracy of the prediction model. The 10-fold cross-validation can also be repeated several times, for example 10 times, and the results averaged, to make the determined prediction accuracy more reliable.
The prediction weight of each prediction model is calculated from the first preset formula and the prediction accuracy to obtain the corrected prediction weights, which makes the prediction weight of each prediction model more accurate and further improves the accuracy of Chinese instruction recognition.
In any of the above embodiments, preferably, the device further includes: a determining unit 206, configured to determine the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula, the determining unit 206 being further configured to determine that the test sample is a misclassified sample when the actual class label of the test sample does not match the predicted class label; and a raising unit 208, configured to raise the sampling probability of the misclassified sample, so that it can be drawn into the sample set including misclassified samples and drawn as a new test sample, where the second preset formula includes:
$$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label appears among the outputs of all prediction models, and pred denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, that is, the predicted class label.
In this embodiment, the predicted class label of the test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the prediction models and prepares the next step of their training. Raising the sampling probability of misclassified samples allows them to be drawn preferentially, both into the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to some extent, raises the degree of intelligence of prediction model training, and helps further improve the accuracy of Chinese instruction recognition.
In any of the above embodiments, preferably, the determining unit 206 is further configured to determine whether the test sample includes a word that matches the preset scene lexicon. The Chinese instruction recognition device further includes: a prompt unit 210, configured to send a prompt signal, without determining the predicted class label of the test sample, when it is determined that the test sample does not include a word that matches the preset scene lexicon; and a replacement unit 212, configured to replace the corresponding word in the test sample with the matching word in the preset scene lexicon, and to proceed to the determination of the predicted class label of the test sample, when it is determined that the test sample includes a word that matches the preset scene lexicon.
In this embodiment, before the predicted class label of the test sample is determined, it is determined whether the test sample includes a word that matches the preset scene lexicon. This pre-judges the scene, makes Chinese instruction recognition scene-oriented and more targeted, and effectively saves back-end computing resources. If it is determined that the test sample does not include a word that matches the preset scene lexicon, a prompt signal is sent and the predicted class label of the test sample is not determined, so that irrelevant test samples are filtered out and back-end computing resources are saved further. If it is determined that the test sample includes a word that matches the preset scene lexicon, the corresponding word in the test sample is replaced with the matching word in the preset scene lexicon and the predicted class label of the test sample is determined. This makes the test samples entering the prediction models more standardized, helps the prediction models output a predicted class label that matches the actual class label, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary: class 1, common ingredients (450 common ingredients such as apple, celery and potato, and their synonyms, are defined and selected); class 2, common recipes (10,000 common recipes such as fish with pickled cabbage and fish-flavored shredded pork, and their synonyms, are defined and selected); class 3, tastes and flavors (multiple subclasses such as sour, spicy and light, and their synonyms); class 4, seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, and their synonyms); class 5, nutritional effects (multiple subclasses such as fat reduction, insomnia and slimming, and their synonyms); class 6, special groups (multiple subclasses such as drivers, teachers and examinees, and their synonyms); class 7, dietary care for diseases (multiple subclasses such as hypertension, cold and toothache, and their synonyms); class 8, beauty and slimming (multiple subclasses such as whitening, anti-acne and freckle removal, and their synonyms); class 9, cuisines and dishes (multiple subclasses such as snacks, barbecue and late-night snacks, and their synonyms); class 10, occasions (multiple subclasses such as single, afternoon tea and promotion, and their synonyms).
In any of the above embodiments, preferably, the determining unit 206 is further configured to redetermine the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula includes:
In the third preset formula, $y_k$ denotes the actual class label of test sample k, $h(k)$ denotes the predicted class label of test sample k, $W_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h(k))$ denotes the total number of misclassified samples.
In this embodiment, the sampling probability of a misclassified sample is redetermined by the third preset formula, so that the sampling probability of misclassified samples is raised according to a definite rule. This helps draw a sample set including misclassified samples for correcting the prediction weight of each prediction model, and also helps draw misclassified samples as new test samples. The sampling probability calculated by the third preset formula rises step by step: the sampling probability of a sample misclassified for the first time is greater than that of an ordinary sample, and if the misclassified sample, drawn as a new test sample, is misclassified again, its sampling probability continues to rise, that is, the sampling probability of a sample misclassified for the second time is greater than that of a sample misclassified for the first time. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each prediction model, which can effectively improve the accuracy of Chinese instruction recognition.
In any of the above embodiments, preferably, the device further includes: a preset unit 214, configured to build prediction models from a preset corpus based on preset rules, and to preset the prediction weight of each prediction model.
In this embodiment, the prediction models are built from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models. For example, with 4 prediction models, the prediction weight of each prediction model can be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the k-nearest neighbors (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds a prediction model, and combining these prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus provides the language material for building and training the prediction models; the test samples and the sample set including misclassified samples are all drawn from the preset corpus. Specifically, corpora of 4 sentence classes (interrogative, imperative, exclamatory and declarative) are collected, organized and labeled as the preset corpus, forming the training and test set T = {(x1, y1), (x2, y2), …, (xn, yn)}, where x ∈ χ, the instance space χ ⊆ R^n, and each label yn belongs to the label set {1, 2, 3, 4}, whose elements correspond to the interrogative, imperative, exclamatory and declarative class labels respectively. Each class of corpus contains related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibitive and dissuasive imperatives); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamatives); and declarative sentences include 2 subclasses (negative and affirmative declaratives).
Embodiment 3
A computer device according to an embodiment of the present invention includes a processor. When the processor executes a computer program stored in a memory, it implements the steps of the scene-oriented Chinese instruction recognition method of any one of the above embodiments of the present invention.
In this embodiment, the computer device includes a processor which, when executing the computer program stored in the memory, implements the steps of the scene-oriented Chinese instruction recognition method of any one of the above embodiments of the present invention. The computer device therefore has all the beneficial effects of that method, which are not repeated here.
Embodiment 4
A computer-readable storage medium according to an embodiment of the present invention has a computer program stored on it. When the computer program is executed by a processor, it implements the steps of the scene-oriented Chinese instruction recognition method of any one of the above embodiments of the present invention.
In this embodiment, a computer program is stored on the computer-readable storage medium and, when executed by a processor, implements the steps of the scene-oriented Chinese instruction recognition method of any one of the above embodiments of the present invention. The storage medium therefore has all the beneficial effects of that method, which are not repeated here.
Embodiment 5
As shown in Fig. 3, in the scene-oriented Chinese instruction recognition method according to an
embodiment of the invention, 4 prediction models are first built from the corpus using the support
vector machine algorithm, the random forest algorithm, the KNN (k-nearest neighbors) algorithm, and the
naive Bayes algorithm, and the weights ω1, ω2, ω3, ω4 are preset for them respectively.
A test sample is then drawn from the corpus and read, and the text string returned by speech recognition
is obtained. In the text parsing layer, natural language processing techniques are applied to the text:
Chinese word segmentation, stop-word filtering, custom-dictionary processing, and text deduplication.
This yields the processed text-string array of the test sample.
Next, in the scene topic layer, it is determined whether the array contains vocabulary from the preset
scene lexicon. If it does not, the prediction result that is output is that the question is unrelated to
the scene. If it does, the 4 prediction models each predict the class label of the test text, and their
predictions are integrated according to the preset weights ω1, ω2, ω3, ω4 to obtain the predicted class
label of the test text.
A misclassification check follows. If the actual class label of the test text does not match the
predicted class label, the text is judged to be misclassified, is designated a misclassified text, and
the prediction weight of each prediction model is corrected. If the actual class label matches the
predicted class label, the text is judged not to be misclassified and the prediction result is output,
i.e. the predicted class label, which is also the actual class label. The prediction weights are
corrected on the basis of the misclassified samples, and correcting the prediction weight of each
prediction model can effectively improve the accuracy of Chinese instruction recognition.
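The scene pre-check and the weighted integration of the 4 predictions in Embodiment 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `in_scene`, `weighted_vote`, and `recognize` are hypothetical, each model is assumed to be a callable that maps a token list to a class label, and the integration step is assumed to be a weighted vote in which each model's vote counts with that model's weight.

```python
from collections import defaultdict


def in_scene(tokens, scene_lexicon):
    """Return True if any token appears in the preset scene lexicon."""
    return any(tok in scene_lexicon for tok in tokens)


def weighted_vote(labels, weights):
    """Combine per-model class labels; each vote counts with its model's weight."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    # On a tie, max() keeps the label that received a vote first.
    return max(scores, key=scores.get)


def recognize(tokens, models, weights, scene_lexicon):
    """Scene pre-check, then weighted vote. None means 'unrelated to the scene'."""
    if not in_scene(tokens, scene_lexicon):
        return None  # skip the 4 models entirely, saving back-end computation
    labels = [model(tokens) for model in models]
    return weighted_vote(labels, weights)
```

Under these assumptions, a text whose tokens miss the scene lexicon never reaches the classifiers, which corresponds to the "unrelated to the scene" branch described above.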
The technical solution of the invention has been described in detail above with reference to the
accompanying drawings. The invention proposes a scene-oriented Chinese instruction recognition method,
device, equipment, and storage medium. Correcting the prediction weight of each prediction model
according to a sample set that includes misclassified samples and the first preset formula effectively
improves the accuracy of Chinese instruction recognition, and pre-judging the scene saves back-end
computing resources and raises the intelligence level of Chinese instruction recognition.
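As a rough illustration of the weight correction summarized above, the following Python sketch normalizes each model's prediction precision by the sum of all precisions, i.e. ω_i = p_i / Σ p_i as in the first preset formula. The helper names are hypothetical, the precision is measured here on a single labeled sample set rather than by cross-validation, and the equal-weight fallback for an all-zero precision vector is an assumption that the source does not state.

```python
def prediction_precision(model, samples):
    """Fraction of (tokens, label) pairs that the model labels correctly."""
    if not samples:
        raise ValueError("sample set must not be empty")
    hits = sum(1 for tokens, label in samples if model(tokens) == label)
    return hits / len(samples)


def corrected_weights(precisions):
    """First preset formula: omega_i = p_i / sum of all p_i."""
    total = sum(precisions)
    if total == 0:
        # Assumption (not in the source): equal weights if every model scores 0.
        return [1.0 / len(precisions)] * len(precisions)
    return [p / total for p in precisions]
```

Because the weights are normalized, they always sum to 1, so a model that does better on the sample set containing misclassified samples gains influence at the expense of the others.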
The steps of the method of the invention may be reordered, merged, or deleted according to actual needs.
The units of the device of the invention may be combined, divided, or deleted according to actual needs.
One of ordinary skill in the art will appreciate that all or part of the steps in the methods of the
above embodiments may be completed by a program instructing the relevant hardware. The program may be
stored in a computer-readable storage medium, including read-only memory (Read-Only Memory, ROM), random
access memory (Random Access Memory, RAM), programmable read-only memory (Programmable Read-Only Memory,
PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), one-time
programmable read-only memory (One-Time Programmable Read-Only Memory, OTPROM), electrically erasable
programmable read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM), compact
disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage, magnetic
disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or
store data.
The foregoing describes only the preferred embodiments of the invention and is not intended to limit the
invention. For those skilled in the art, the invention may have various modifications and variations.
Any modification, equivalent substitution, or improvement made within the spirit and principles of the
invention shall fall within the scope of protection of the invention.
Claims (14)
- 1. A scene-oriented Chinese instruction recognition method, characterized by comprising: correcting the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula, wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label.
- 2. The scene-oriented Chinese instruction recognition method according to claim 1, characterized in that correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula specifically comprises: cross-validating each prediction model on the sample set that includes misclassified samples, to determine the prediction precision of each prediction model; and correcting the prediction weight of each prediction model according to the first preset formula and the prediction precision, wherein the first preset formula comprises: $\omega_i = \frac{p_i}{\sum_{i=1}^{n} p_i}$, where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction precision of the i-th prediction model, and $\sum_{i=1}^{n} p_i$ denotes the sum of the prediction precisions of all prediction models.
- 3. The scene-oriented Chinese instruction recognition method according to claim 1, characterized in that, before correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula, the method further comprises: determining the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula; if the actual class label of the test sample does not match the predicted class label, determining that the test sample is a misclassified sample; and increasing the sampling probability of the misclassified sample, so as to extract the sample set that includes misclassified samples and to extract the misclassified sample as a new test sample, wherein the second preset formula comprises: $\mathrm{pred} = \max(\omega_i \cdot n_j)$, where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label occurs among all prediction models, and $\mathrm{pred}$ denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, i.e. the predicted class label.
- 4. The scene-oriented Chinese instruction recognition method according to claim 3, characterized in that, before determining the predicted class label of the test sample according to the preset weight of each prediction model and the second preset formula, the method further comprises: determining whether the test sample contains vocabulary that matches a preset scene lexicon; if it is determined that the test sample contains no vocabulary that matches the preset scene lexicon, issuing a prompt signal and not determining the predicted class label of the test sample; and if it is determined that the test sample contains vocabulary that matches the preset scene lexicon, replacing the corresponding vocabulary in the test sample with the matching vocabulary in the preset scene lexicon, and determining the predicted class label of the test sample.
- 5. The scene-oriented Chinese instruction recognition method according to claim 3, characterized in that increasing the sampling probability of the misclassified sample specifically comprises: redetermining the sampling probability of the misclassified sample according to a third preset formula, wherein the third preset formula comprises: $w_{k+1} = \frac{1}{\sum (y_k \neq h_{(k)})}$, where $y_k$ denotes the actual class label of test sample k, $h_{(k)}$ denotes the predicted class label of test sample k, $w_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h_{(k)})$ denotes the total number of misclassified samples.
- 6. The scene-oriented Chinese instruction recognition method according to claim 1, characterized in that, before correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula, the method further comprises: based on a preset rule, building the prediction models according to a preset corpus, and presetting the prediction weight of each prediction model.
- 7. A scene-oriented Chinese instruction recognition device, characterized by comprising: a correction unit, configured to correct the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula, wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label.
- 8. The scene-oriented Chinese instruction recognition device according to claim 7, characterized by further comprising: a validation unit, configured to cross-validate each prediction model on the sample set that includes misclassified samples, to determine the prediction precision of each prediction model; the correction unit being further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction precision, wherein the first preset formula comprises: $\omega_i = \frac{p_i}{\sum_{i=1}^{n} p_i}$, where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction precision of the i-th prediction model, and $\sum_{i=1}^{n} p_i$ denotes the sum of the prediction precisions of all prediction models.
- 9. The scene-oriented Chinese instruction recognition device according to claim 7, characterized by further comprising: a determining unit, configured to determine the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula; the determining unit being further configured to determine that the test sample is a misclassified sample when the actual class label of the test sample does not match the predicted class label; and an increasing unit, configured to increase the sampling probability of the misclassified sample, so as to extract the sample set that includes misclassified samples and to extract the misclassified sample as a new test sample, wherein the second preset formula comprises: $\mathrm{pred} = \max(\omega_i \cdot n_j)$, where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label occurs among all prediction models, and $\mathrm{pred}$ denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, i.e. the predicted class label.
- 10. The scene-oriented Chinese instruction recognition device according to claim 9, characterized in that the determining unit is further configured to determine whether the test sample contains vocabulary that matches a preset scene lexicon; and the Chinese instruction recognition device further comprises: a prompting unit, configured to issue a prompt signal, without the predicted class label of the test sample being determined, when it is determined that the test sample contains no vocabulary that matches the preset scene lexicon; and a replacement unit, configured to replace the corresponding vocabulary in the test sample with the matching vocabulary in the preset scene lexicon, and to proceed with determining the predicted class label of the test sample, when it is determined that the test sample contains vocabulary that matches the preset scene lexicon.
- 11. The scene-oriented Chinese instruction recognition device according to claim 9, characterized in that the determining unit is further configured to redetermine the sampling probability of the misclassified sample according to a third preset formula, wherein the third preset formula comprises: $w_{k+1} = \frac{1}{\sum (y_k \neq h_{(k)})}$, where $y_k$ denotes the actual class label of test sample k, $h_{(k)}$ denotes the predicted class label of test sample k, $w_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h_{(k)})$ denotes the total number of misclassified samples.
- 12. The scene-oriented Chinese instruction recognition device according to claim 7, characterized by further comprising: a presetting unit, configured to build the prediction models according to a preset corpus based on a preset rule, and to preset the prediction weight of each prediction model.
- 13. Computer equipment, characterized in that the computer equipment comprises a processor, and the processor, when executing a computer program stored in memory, implements the steps of the scene-oriented Chinese instruction recognition method according to any one of claims 1 to 6.
- 14. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the scene-oriented Chinese instruction recognition method according to any one of claims 1 to 6.
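The resampling step recited in claims 3 and 5 can be read as follows: every misclassified sample receives the sampling probability 1 / (number of misclassified samples), and new test samples are then drawn from the misclassified ones. The Python sketch below illustrates that reading only. The function names are hypothetical, and drawing with replacement through `random.Random.choices` is an assumption rather than something the claims specify.

```python
import random


def misclassified_indices(actual, predicted):
    """Indices k where the actual label y_k differs from the predicted label h(k)."""
    return [k for k, (y, h) in enumerate(zip(actual, predicted)) if y != h]


def sampling_probabilities(actual, predicted):
    """Third preset formula: each misclassified sample gets 1 / (misclassified count)."""
    wrong = misclassified_indices(actual, predicted)
    if not wrong:
        return {}
    prob = 1.0 / len(wrong)
    return {k: prob for k in wrong}


def draw_new_test_samples(samples, actual, predicted, size, seed=None):
    """Draw `size` new test samples from the misclassified ones (with replacement)."""
    probs = sampling_probabilities(actual, predicted)
    if not probs:
        return []
    rng = random.Random(seed)
    indices = list(probs)
    picks = rng.choices(indices, weights=[probs[k] for k in indices], k=size)
    return [samples[k] for k in picks]
```

Correctly classified samples get no entry in the probability table under this reading, so only misclassified samples are redrawn as new test samples.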
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710620448.7A CN107507613B (en) | 2017-07-26 | 2017-07-26 | Scene-oriented Chinese instruction identification method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107507613A (en) | 2017-12-22 |
CN107507613B (en) | 2021-03-16 |
Family ID: 60689769
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710620448.7A Active CN107507613B (en) | 2017-07-26 | 2017-07-26 | Scene-oriented Chinese instruction identification method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107507613B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110602307A (en) * | 2018-06-12 | 2019-12-20 | 范世汶 | Data processing method, device and equipment |
CN110689135A (en) * | 2019-09-05 | 2020-01-14 | 第四范式(北京)技术有限公司 | Anti-money laundering model training method and device and electronic equipment |
CN111651686A (en) * | 2019-09-24 | 2020-09-11 | 北京嘀嘀无限科技发展有限公司 | Test processing method and device, electronic equipment and storage medium |
CN113096642A (en) * | 2021-03-31 | 2021-07-09 | 南京地平线机器人技术有限公司 | Speech recognition method and device, computer readable storage medium, electronic device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070208494A1 (en) * | 2006-03-03 | 2007-09-06 | Inrix, Inc. | Assessing road traffic flow conditions using data obtained from mobile data sources |
CN104361010A (en) * | 2014-10-11 | 2015-02-18 | 北京中搜网络技术股份有限公司 | Automatic classification method for correcting news classification |
CN104573013A (en) * | 2015-01-09 | 2015-04-29 | 上海大学 | Category weight combined integrated learning classifying method |
CN106548210A (en) * | 2016-10-31 | 2017-03-29 | 腾讯科技(深圳)有限公司 | Machine learning model training method and device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110602307A (en) * | 2018-06-12 | 2019-12-20 | 范世汶 | Data processing method, device and equipment |
CN110689135A (en) * | 2019-09-05 | 2020-01-14 | 第四范式(北京)技术有限公司 | Anti-money laundering model training method and device and electronic equipment |
CN110689135B (en) * | 2019-09-05 | 2022-10-11 | 第四范式(北京)技术有限公司 | Anti-money laundering model training method and device and electronic equipment |
CN111651686A (en) * | 2019-09-24 | 2020-09-11 | 北京嘀嘀无限科技发展有限公司 | Test processing method and device, electronic equipment and storage medium |
CN113096642A (en) * | 2021-03-31 | 2021-07-09 | 南京地平线机器人技术有限公司 | Speech recognition method and device, computer readable storage medium, electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN107507613B (en) | 2021-03-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | | Address after: 230088 Building No. 198, building No. 198, Mingzhu Avenue, Anhui high tech Zone, Anhui. Applicant after: Hefei Hualing Co.,Ltd. Address before: 230601 R & D building, No. 176, Jinxiu Road, Hefei economic and Technological Development Zone, Anhui 501. Applicant before: Hefei Hualing Co.,Ltd. |
GR01 | Patent grant | ||