CN107507613A - Scene-oriented Chinese instruction recognition method, device, equipment and storage medium - Google Patents

Scene-oriented Chinese instruction recognition method, device, equipment and storage medium


Publication number
CN107507613A
Authority
CN
China
Prior art keywords
sample
prediction
forecast model
mistake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710620448.7A
Other languages
Chinese (zh)
Other versions
CN107507613B (en)
Inventor
闫永刚
沈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Midea Intelligent Technologies Co Ltd
Original Assignee
Hefei Midea Intelligent Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Midea Intelligent Technologies Co Ltd
Priority to CN201710620448.7A
Publication of CN107507613A
Application granted
Publication of CN107507613B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 Probabilistic grammars, e.g. word n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention provides a scene-oriented Chinese instruction recognition method, device, equipment and storage medium. The scene-oriented Chinese instruction recognition method includes: correcting the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula, wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label. With this technical solution, training on a sample set that includes misclassified samples to correct the prediction weight of each prediction model effectively improves the accuracy of Chinese instruction recognition, and scene pre-judgment effectively saves back-end computing resources and raises the level of intelligence of Chinese instruction recognition.

Description

Scene-oriented Chinese instruction recognition method, device, equipment and storage medium
Technical field
The present invention relates to the technical field of human-computer interaction, and in particular to a scene-oriented Chinese instruction recognition method, a scene-oriented Chinese instruction recognition device, a computer device and a computer-readable storage medium.
Background
Modern intelligent question-answering systems generally comprise multiple technical stages such as speech recognition, text parsing, syntactic analysis, semantic analysis, topic identification and answer resolution. Within syntactic analysis, scene-oriented Chinese instruction recognition (mainly the recognition of interrogative sentence patterns) serves as the entry-point check for the whole question-answering system.
In the related art, scene-oriented Chinese instruction recognition in syntactic analysis is mainly realized by two broad classes of method, interrogative-word rule pattern matching and transformational-generative syntactic analysis, which have the following technical defects:
(1) Interrogative-word rule pattern matching requires a very cumbersome interrogative vocabulary that is difficult to enumerate exhaustively; its understanding of Chinese instructions is shallow, and its recognition accuracy is low.
(2) Transformational-generative syntactic analysis requires the corresponding dictionary set and syntactic patterns to be established in advance, which demands excessive manual intervention and yields a low degree of intelligence.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art or the related art.
To this end, one object of the present invention is to provide a scene-oriented Chinese instruction recognition method.
Another object of the present invention is to provide a scene-oriented Chinese instruction recognition device.
A further object of the present invention is to provide a computer device.
Yet another object of the present invention is to provide a computer-readable storage medium.
To achieve the above objects, the technical solution of the first aspect of the present invention provides a scene-oriented Chinese instruction recognition method, including: correcting the prediction weight of each prediction model according to a sample set including misclassified samples and a first preset formula, wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this technical solution, correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula means that test samples whose predicted class labels do not match their actual class labels are used to correct the weights. This trains the prediction models effectively and improves prediction accuracy, which in turn improves the accuracy of Chinese instruction recognition. Moreover, when the predicted class label of a test sample does not match its actual class label, the sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are drawn preferentially, both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to some extent and raises the level of intelligence of both prediction model training and Chinese instruction recognition.
In addition, the sample set including misclassified samples may consist entirely of misclassified samples, or may consist partly of misclassified samples and partly of correctly predicted samples. The sample set should be relatively large so that the prediction weight of each prediction model can be corrected.
In the above technical solution, preferably, correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula specifically includes: cross-validating each prediction model on the sample set including misclassified samples to determine the prediction accuracy of each prediction model; and correcting the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, wherein the first preset formula includes:
ωi = pi / Σ pi
where ωi denotes the prediction weight of the i-th prediction model, pi denotes the prediction accuracy of the i-th prediction model, and Σ pi denotes the sum of the prediction accuracies of all prediction models.
In this technical solution, each prediction model is cross-validated on the sample set including misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation may be used: the sample set including misclassified samples is divided into 10 parts, and in each round 9 parts serve as training data and 1 part as test data. Each round yields an accuracy, and the mean accuracy over the 10 rounds is taken as the prediction accuracy of the prediction model. The 10-fold cross-validation may also be repeated several times, for example 10 times, and the results averaged, so that the prediction accuracy of the model is determined more reliably.
Calculating the prediction weight of each prediction model from the first preset formula and the prediction accuracy yields the corrected prediction weights, makes the determination of each model's prediction weight more accurate, and further improves the accuracy of Chinese instruction recognition.
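The cross-validation and weight-correction step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the first preset formula is the normalization ωi = pi / Σ pi suggested by the symbol definitions, and the helper names `k_fold_indices`, `cross_val_precision`, `normalize_weights` and the `fit` callback are hypothetical.

```python
from statistics import mean


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_precision(fit, samples, labels, k=10):
    """Mean accuracy over k folds.

    `fit(train_x, train_y)` is a hypothetical callback that trains one
    prediction model and returns a function mapping a sample to a label.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k):
        if not test_idx:
            continue  # fewer samples than folds: skip empty folds
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        predict = fit(train_x, train_y)
        hits = sum(predict(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return mean(accuracies)


def normalize_weights(precisions):
    """Assumed first preset formula: weight_i = p_i / sum of all p."""
    total = sum(precisions)
    if total == 0:
        raise ValueError("at least one model must have non-zero accuracy")
    return [p / total for p in precisions]
```

With four models whose cross-validated accuracies are 0.9, 0.8, 0.7 and 0.6, `normalize_weights` gives the most accurate model a weight of about 0.3, and the weights sum to 1.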
In any of the above technical solutions, preferably, before correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula, the method further includes: determining the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula; if the actual class label of the test sample does not match the predicted class label, determining that the test sample is a misclassified sample; and raising the sampling probability of the misclassified sample, so as to draw the sample set including misclassified samples and to use the drawn misclassified samples as new test samples, wherein the second preset formula includes:
pred = Max(ωi·nj)
where ωi denotes the prediction weight of the i-th prediction model, nj denotes the number of times the j-th class label appears among the outputs of all prediction models, and pred denotes the class label corresponding to Max(ωi·nj), that is, the predicted class label.
In this technical solution, the predicted class label of a test sample is determined according to the prediction weight of each prediction model and the second preset formula, and any test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the prediction models and prepares the next step of their training. Raising the sampling probability of misclassified samples lets them be drawn preferentially, both into the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to some extent, raises the level of intelligence of prediction model training, and helps further improve the accuracy of Chinese instruction recognition.
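A minimal Python sketch of this weighted vote is given below. It rests on one reading of the second preset formula, namely that the score of a class label is the sum of the prediction weights of the models that output it, and the winner is the label with the highest score; the function names `weighted_vote` and `is_misclassified` are hypothetical.

```python
from collections import defaultdict


def weighted_vote(model_predictions, model_weights):
    """Return the class label with the largest total weight.

    model_predictions[i] is the label output by model i and
    model_weights[i] is that model's prediction weight.
    """
    if len(model_predictions) != len(model_weights):
        raise ValueError("one weight is required per model prediction")
    scores = defaultdict(float)
    for label, weight in zip(model_predictions, model_weights):
        scores[label] += weight  # accumulate weight per voted label
    return max(scores, key=scores.get)


def is_misclassified(predicted_label, actual_label):
    """A test sample is misclassified when the two labels do not match."""
    return predicted_label != actual_label
```

For example, if four models output labels 1, 2, 2, 3 with weights 0.4, 0.25, 0.25, 0.1, label 2 collects a score of 0.5 and wins even though the single most trusted model voted for label 1.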
In any of the above technical solutions, preferably, before determining the predicted class label of the test sample according to the prediction weight of each prediction model and the second preset formula, the method further includes: determining whether the test sample contains vocabulary that matches a preset scene lexicon; if it is determined that the test sample contains no vocabulary matching the preset scene lexicon, issuing a prompt signal and not determining the predicted class label of the test sample; and if it is determined that the test sample contains vocabulary matching the preset scene lexicon, replacing the corresponding vocabulary in the test sample with the matching vocabulary from the preset scene lexicon, and then determining the predicted class label of the test sample.
In this technical solution, checking whether the test sample contains vocabulary matching the preset scene lexicon before determining its predicted class label realizes a pre-judgment of the scene, so that Chinese instruction recognition is scene-oriented and targeted, which effectively saves back-end computing resources. If the test sample contains no vocabulary matching the preset scene lexicon, a prompt signal is issued and the predicted class label is not determined, so irrelevant test samples are filtered out and back-end computing resources are saved further. If the test sample does contain matching vocabulary, the corresponding vocabulary in the test sample is replaced with the matching vocabulary from the preset scene lexicon before the predicted class label is determined. This standardizes the test samples entering the prediction models, helps the predicted class label output by the prediction models match the actual class label, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary:
(1) Common ingredients (450 common ingredients such as apple, celery and potato, and their synonyms).
(2) Common recipes (10,000 common recipes such as fish with pickled cabbage and fish-flavored shredded pork, and their synonyms).
(3) Tastes and flavors (multiple subclasses such as sour, spicy and light, and their synonyms).
(4) Seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, and their synonyms).
(5) Nutritional effects (multiple subclasses such as weight loss, insomnia and slimming, and their synonyms).
(6) Special groups (multiple subclasses such as drivers, teachers and examinees, and their synonyms).
(7) Disease conditioning (multiple subclasses such as hypertension, cold and toothache, and their synonyms).
(8) Beauty and slimming (multiple subclasses such as whitening, anti-acne and freckle removal, and their synonyms).
(9) Cuisines and dishes (multiple subclasses such as snacks, barbecue and midnight snacks, and their synonyms).
(10) Occasions (multiple subclasses such as being single, afternoon tea and promotion, and their synonyms).
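The scene pre-judgment and vocabulary replacement can be sketched as below. This is only an illustration under stated assumptions: the lexicon is modelled as a dictionary mapping each synonym to one canonical term, matching is plain substring search, and the entries and the function name `prejudge_and_normalize` are hypothetical rather than taken from the patent.

```python
# Hypothetical kitchen-scene lexicon: each entry maps to one canonical term.
SCENE_LEXICON = {
    "土豆": "土豆",    # potato (canonical form)
    "马铃薯": "土豆",  # potato (synonym)
    "洋芋": "土豆",    # potato (synonym)
    "芹菜": "芹菜",    # celery (canonical form)
}


def prejudge_and_normalize(text, lexicon=SCENE_LEXICON):
    """Return the normalized text, or None when no lexicon entry occurs in it."""
    # Longest entries first, so a short entry never splits a longer one.
    matches = sorted((w for w in lexicon if w in text), key=len, reverse=True)
    if not matches:
        # Out of scene: the caller should prompt the user and skip prediction.
        return None
    for word in matches:
        text = text.replace(word, lexicon[word])
    return text
```

A caller would run the prediction models only when the function returns a string; a `None` result corresponds to issuing the prompt signal and skipping the class-label prediction.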
In any of the above technical solutions, preferably, raising the sampling probability of the misclassified sample specifically includes: redetermining the sampling probability of the misclassified sample according to a third preset formula, wherein the third preset formula includes:
where yk denotes the actual class label of test sample k, h(k) denotes the predicted class label of test sample k, Wk+1 denotes the redetermined sampling probability of misclassified sample k, and Σ(yk≠h(k)) denotes the total number of misclassified samples.
In this technical solution, redetermining the sampling probability of misclassified samples by the third preset formula raises that probability according to a fixed rule. This helps to draw a sample set containing misclassified samples for correcting the prediction weight of each prediction model, and also helps to draw misclassified samples as new test samples. The sampling probability calculated by the third preset formula increases step by step: a sample misclassified for the first time has a higher sampling probability than an ordinary sample, and if that sample, used as a new test sample, is misclassified again, its sampling probability continues to rise, so a sample misclassified for the second time has a higher sampling probability than one misclassified for the first time. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
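The step-by-step increase described above can be illustrated with a boosting-style update. The sketch below does not reproduce the third preset formula, which is not available in this text; it assumes a simple stand-in rule in which the probability of every misclassified sample is multiplied by a constant `factor` and all probabilities are then renormalized. The names `boost_sampling_probabilities`, `draw_sample_indices` and `factor` are hypothetical.

```python
import random


def boost_sampling_probabilities(probabilities, misclassified, factor=2.0):
    """Raise the probability of misclassified samples, then renormalize.

    Stand-in rule (an assumption, not the patent's formula): multiply the
    probability of each misclassified sample by `factor`.
    """
    boosted = [
        p * factor if wrong else p
        for p, wrong in zip(probabilities, misclassified)
    ]
    total = sum(boosted)
    return [p / total for p in boosted]


def draw_sample_indices(probabilities, size, seed=None):
    """Draw `size` sample indices with replacement according to the probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(len(probabilities)), weights=probabilities, k=size)
```

Applying the update twice to the same sample reproduces the qualitative behaviour in the text: its probability after the second misclassification is higher than after the first, which is in turn higher than that of an ordinary sample.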
In any of the above technical solutions, preferably, before correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula, the method further includes: building the prediction models from a preset corpus based on preset rules, and presetting the prediction weight of each prediction model.
In this technical solution, the prediction models are built from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models. For example, if there are 4 prediction models, the prediction weight of each may be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the K-nearest-neighbor (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds a prediction model, and combining these prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus supplies the language material for building and training the prediction models; both the test samples and the sample set including misclassified samples are drawn from it. Specifically, corpora of four classes (interrogative, imperative, exclamatory and declarative sentences) are collected, organized and labeled as the preset corpus, forming the training and test set T = {(x1, y1), (x2, y2), …, (xn, yn)}, where x ∈ χ, the instance space χ ⊆ R^n, and yn belongs to the label set {1, 2, 3, 4}, whose elements correspond respectively to the four class labels interrogative, imperative, exclamatory and declarative. Each class of corpus contains related subclasses: interrogative sentences include four subclasses (specific questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include four subclasses (command, request, prohibition and dissuasion imperatives); exclamatory sentences include four subclasses (interjection, noun, colloquial and adverb exclamatives); and declarative sentences include two subclasses (negative statements and affirmative statements).
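The label set and the equal initial weights can be written down compactly. The following sketch only illustrates the bookkeeping described above; the English label names, the model identifiers and the function `preset_weights` are hypothetical, and the models themselves are not built here.

```python
# Class labels 1-4 as described for the preset corpus.
SENTENCE_LABELS = {
    1: "interrogative",
    2: "imperative",
    3: "exclamatory",
    4: "declarative",
}

# One prediction model per preset rule (identifiers are illustrative only).
MODEL_NAMES = ["svm", "random_forest", "knn", "naive_bayes"]


def preset_weights(model_names):
    """Give every prediction model the same initial prediction weight."""
    if not model_names:
        raise ValueError("at least one prediction model is required")
    return {name: 1.0 / len(model_names) for name in model_names}
```

With the four models listed, `preset_weights(MODEL_NAMES)` assigns 0.25 to each, matching the example in the text.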
The technical solution of the second aspect of the present invention provides a scene-oriented Chinese instruction recognition device, including: a correction unit, configured to correct the prediction weight of each prediction model according to a sample set including misclassified samples and a first preset formula, wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this technical solution, correcting the prediction weight of each prediction model according to the sample set including misclassified samples and the first preset formula means that test samples whose predicted class labels do not match their actual class labels are used to correct the weights. This trains the prediction models effectively and improves prediction accuracy, which in turn improves the accuracy of Chinese instruction recognition. Moreover, when the predicted class label of a test sample does not match its actual class label, the sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are drawn preferentially, both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to some extent and raises the level of intelligence of both prediction model training and Chinese instruction recognition.
In addition, the sample set including misclassified samples may consist entirely of misclassified samples, or may consist partly of misclassified samples and partly of correctly predicted samples. The sample set should be relatively large so that the prediction weight of each prediction model can be corrected.
In the above technical solution, preferably, the device further includes: a verification unit, configured to cross-validate each prediction model on the sample set including misclassified samples to determine the prediction accuracy of each prediction model; the correction unit is further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, wherein the first preset formula includes:
ωi = pi / Σ pi
where ωi denotes the prediction weight of the i-th prediction model, pi denotes the prediction accuracy of the i-th prediction model, and Σ pi denotes the sum of the prediction accuracies of all prediction models.
In this technical solution, each prediction model is cross-validated on the sample set including misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation may be used: the sample set including misclassified samples is divided into 10 parts, and in each round 9 parts serve as training data and 1 part as test data. Each round yields an accuracy, and the mean accuracy over the 10 rounds is taken as the prediction accuracy of the prediction model. The 10-fold cross-validation may also be repeated several times, for example 10 times, and the results averaged, so that the prediction accuracy of the model is determined more reliably.
Calculating the prediction weight of each prediction model from the first preset formula and the prediction accuracy yields the corrected prediction weights, makes the determination of each model's prediction weight more accurate, and further improves the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, the device further includes: a determining unit, configured to determine the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula, the determining unit being further configured to determine that the test sample is a misclassified sample when its actual class label does not match its predicted class label; and a raising unit, configured to raise the sampling probability of the misclassified sample, so as to draw the sample set including misclassified samples and to use the drawn misclassified samples as new test samples, wherein the second preset formula includes:
pred = Max(ωi·nj)
where ωi denotes the prediction weight of the i-th prediction model, nj denotes the number of times the j-th class label appears among the outputs of all prediction models, and pred denotes the class label corresponding to Max(ωi·nj), that is, the predicted class label.
In this technical solution, the predicted class label of a test sample is determined according to the prediction weight of each prediction model and the second preset formula, and any test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the prediction models and prepares the next step of their training. Raising the sampling probability of misclassified samples lets them be drawn preferentially, both into the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to some extent, raises the level of intelligence of prediction model training, and helps further improve the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, the determining unit is further configured to determine whether the test sample contains vocabulary that matches a preset scene lexicon; the Chinese instruction recognition device further includes: a prompt unit, configured to issue a prompt signal, without determining the predicted class label of the test sample, when it is determined that the test sample contains no vocabulary matching the preset scene lexicon; and a replacement unit, configured to replace the corresponding vocabulary in the test sample with the matching vocabulary from the preset scene lexicon when it is determined that the test sample contains vocabulary matching the preset scene lexicon, after which the predicted class label of the test sample is determined.
In this technical solution, checking whether the test sample contains vocabulary matching the preset scene lexicon before determining its predicted class label realizes a pre-judgment of the scene, so that Chinese instruction recognition is scene-oriented and targeted, which effectively saves back-end computing resources. If the test sample contains no vocabulary matching the preset scene lexicon, a prompt signal is issued and the predicted class label is not determined, so irrelevant test samples are filtered out and back-end computing resources are saved further. If the test sample does contain matching vocabulary, the corresponding vocabulary in the test sample is replaced with the matching vocabulary from the preset scene lexicon before the predicted class label is determined. This standardizes the test samples entering the prediction models, helps the predicted class label output by the prediction models match the actual class label, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary:
(1) Common ingredients (450 common ingredients such as apple, celery and potato, and their synonyms).
(2) Common recipes (10,000 common recipes such as fish with pickled cabbage and fish-flavored shredded pork, and their synonyms).
(3) Tastes and flavors (multiple subclasses such as sour, spicy and light, and their synonyms).
(4) Seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, and their synonyms).
(5) Nutritional effects (multiple subclasses such as weight loss, insomnia and slimming, and their synonyms).
(6) Special groups (multiple subclasses such as drivers, teachers and examinees, and their synonyms).
(7) Disease conditioning (multiple subclasses such as hypertension, cold and toothache, and their synonyms).
(8) Beauty and slimming (multiple subclasses such as whitening, anti-acne and freckle removal, and their synonyms).
(9) Cuisines and dishes (multiple subclasses such as snacks, barbecue and midnight snacks, and their synonyms).
(10) Occasions (multiple subclasses such as being single, afternoon tea and promotion, and their synonyms).
In any of the above technical solutions, preferably, the determining unit is further configured to redetermine the sampling probability of the misclassified sample according to a third preset formula, wherein the third preset formula includes:
where yk denotes the actual class label of test sample k, h(k) denotes the predicted class label of test sample k, Wk+1 denotes the redetermined sampling probability of misclassified sample k, and Σ(yk≠h(k)) denotes the total number of misclassified samples.
In this technical solution, redetermining the sampling probability of misclassified samples by the third preset formula raises that probability according to a fixed rule. This helps to draw a sample set containing misclassified samples for correcting the prediction weight of each prediction model, and also helps to draw misclassified samples as new test samples. The sampling probability calculated by the third preset formula increases step by step: a sample misclassified for the first time has a higher sampling probability than an ordinary sample, and if that sample, used as a new test sample, is misclassified again, its sampling probability continues to rise, so a sample misclassified for the second time has a higher sampling probability than one misclassified for the first time. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
In any of the above technical solutions, preferably, the device further includes: a preset unit, configured to build the prediction models from a preset corpus based on preset rules, and to preset the prediction weight of each prediction model.
In this technical solution, the prediction models are built from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models. For example, if there are 4 prediction models, the prediction weight of each may be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the K-nearest-neighbor (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds a prediction model, and combining these prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus provides the language material for building and training the forecast models; the test samples and the sample set that includes misclassified samples are all drawn from the preset corpus. Specifically, corpora of 4 classes (interrogative, imperative, exclamatory and declarative sentences) are collected, arranged and labeled as the preset corpus, forming the forecast model training and test set T = {(x1, y1), (x2, y2), …, (xn, yn)}, where x ∈ χ, the instance space χ ⊆ R^n, and yn belongs to the label set {1, 2, 3, 4}, whose elements correspond to the interrogative, imperative, exclamatory and declarative class labels respectively. Each class of corpus includes related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (commands, requests, prohibitions and dissuasions); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamations); and declarative sentences include 2 subclasses (negative statements and affirmative statements).
The technical scheme of the third aspect of the present invention proposes a computer device. The computer device includes a processor which, when executing a computer program stored in a memory, implements the steps of the scene-oriented Chinese instruction identification method proposed in any one of the technical schemes of the first aspect of the present invention.
In this technical scheme, the computer device includes a processor which, when executing the computer program stored in the memory, implements the steps of the scene-oriented Chinese instruction identification method proposed in any one of the technical schemes of the first aspect of the present invention. The computer device therefore has all the beneficial effects of that method, which are not repeated here.
The technical scheme of the fourth aspect of the present invention proposes a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the scene-oriented Chinese instruction identification method proposed in any one of the technical schemes of the first aspect of the present invention.
In this technical scheme, the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the scene-oriented Chinese instruction identification method proposed in any one of the technical schemes of the first aspect of the present invention. The storage medium therefore has all the beneficial effects of that method, which are not repeated here.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from the following description, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic flow chart of the scene-oriented Chinese instruction identification method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the scene-oriented Chinese instruction identification device according to an embodiment of the present invention;
Fig. 3 shows a schematic flow chart of the scene-oriented Chinese instruction identification method according to another embodiment of the present invention.
Detailed description of the embodiments
In order that the above objects, features and advantages of the present invention may be understood more clearly, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a thorough understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, and therefore the protection scope of the present invention is not limited by the specific embodiments described below.
Embodiment 1
As shown in Fig. 1, the scene-oriented Chinese instruction identification method according to an embodiment of the present invention includes: step S102, correcting the prediction weight of each forecast model according to a sample set that includes misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this embodiment, the prediction weight of each forecast model is corrected according to the sample set that includes misclassified samples and the first preset formula. The prediction weights are thus corrected using test samples whose predicted class label does not match the actual class label, which trains the forecast models effectively, improves prediction accuracy, and in turn improves the accuracy of Chinese instruction identification. When the predicted class label of a test sample does not match its actual class label, the sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are drawn preferentially, both into the sample set used to correct the prediction weight of each forecast model and as new test samples. This reduces manual intervention to some extent and raises the level of intelligence of both forecast model training and Chinese instruction identification.
In addition, the sample set that includes misclassified samples may consist entirely of misclassified samples, or may consist partly of misclassified samples and partly of correctly predicted samples. The sample set should be large enough to serve the purpose of correcting the prediction weight of each forecast model.
In the above embodiment, preferably, correcting the prediction weight of each forecast model according to the sample set that includes misclassified samples and the first preset formula specifically includes: cross-validating each forecast model on the sample set that includes misclassified samples, to determine the prediction accuracy of each forecast model; and correcting the prediction weight of each forecast model according to the first preset formula and the prediction accuracy, where the first preset formula includes:
$$\omega_i = \frac{p_i}{\sum_{j} p_j}$$
where $\omega_i$ denotes the prediction weight of the i-th forecast model, $p_i$ denotes the prediction accuracy of the i-th forecast model, and $\sum_{j} p_j$ denotes the sum of the prediction accuracies of all forecast models.
In this embodiment, each forecast model is cross-validated on the sample set that includes misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation may be used: the sample set is divided into 10 parts, and in each trial 9 parts serve as training data and 1 part as test data. Each trial yields an accuracy, and the average of the 10 accuracies is taken as the prediction accuracy of the forecast model. The 10-fold cross-validation may also be repeated several times, for example 10 times, and the results averaged, to make the estimated prediction accuracy more reliable.
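The 10-fold procedure described above can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function name `cross_validated_accuracy` and the callback `fit` (which is assumed to take training samples and labels and return a prediction function) are hypothetical, and the folds are formed by a simple shuffled split.

```python
import random
from statistics import mean


def cross_validated_accuracy(samples, labels, fit, k=10, repeats=1, seed=0):
    """Average accuracy over `repeats` rounds of k-fold cross-validation.

    `fit(train_x, train_y)` is a hypothetical callback that trains one
    forecast model and returns a function mapping a sample to a class label.
    """
    if len(samples) < k:
        raise ValueError("need at least k samples")
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        for f in range(k):
            # Fold f holds out every k-th shuffled index starting at position f.
            held_out = indices[f::k]
            held = set(held_out)
            train = [i for i in indices if i not in held]
            predict = fit([samples[i] for i in train],
                          [labels[i] for i in train])
            hits = sum(predict(samples[i]) == labels[i] for i in held_out)
            scores.append(hits / len(held_out))
    # One accuracy per trial; the mean is taken as the prediction accuracy.
    return mean(scores)
```

Passing `repeats=10` corresponds to the repeated 10-fold validation mentioned in the text.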
The prediction weight of each forecast model is computed from the first preset formula and the prediction accuracy to obtain the corrected prediction weights. This makes the prediction weights more accurate and further improves the accuracy of Chinese instruction identification.
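As a small worked example of this correction step, the sketch below assumes that the first preset formula normalizes each model's prediction accuracy by the sum of all accuracies, which is the reading suggested by the symbol definitions (a weight, an accuracy, and a sum of accuracies). The function name `correct_weights` is hypothetical.

```python
def correct_weights(accuracies):
    """Return one weight per forecast model, proportional to its accuracy.

    Assumes the normalized form: weight_i = accuracy_i / sum of accuracies.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return [p / total for p in accuracies]


# Four models with hypothetical cross-validated accuracies.
weights = correct_weights([0.9, 0.8, 0.7, 0.6])
```

Under this assumption the weights always sum to 1, and four equally accurate models fall back to the preset value of 0.25 each.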
In any of the above embodiments, preferably, before correcting the prediction weight of each forecast model according to the sample set that includes misclassified samples and the first preset formula, the method further includes: determining the predicted class label of a test sample according to the prediction weight of each forecast model and a second preset formula; if the actual class label of the test sample does not match the predicted class label, determining that the test sample is a misclassified sample; and raising the sampling probability of the misclassified sample, so as to draw the sample set that includes misclassified samples and to draw misclassified samples as new test samples, where the second preset formula includes:
$$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
where $\omega_i$ denotes the prediction weight of the i-th forecast model, $n_j$ denotes the number of times the j-th class label appears among the outputs of all forecast models, and pred denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, that is, the predicted class label.
In this embodiment, the predicted class label of the test sample is determined according to the prediction weight of each forecast model and the second preset formula, and a test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the forecast models and prepares for the next step of training. Raising the sampling probability of misclassified samples lets them be drawn preferentially, both into the sample set used to correct the prediction weight of each forecast model and as new test samples, which reduces manual intervention to some extent, raises the level of intelligence of forecast model training, and helps further improve the accuracy of Chinese instruction identification.
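One way to picture this step is as a weighted vote. The sketch below is an interpretation rather than the patent's exact formula: it assumes that each class label is scored by the summed weights of the forecast models that output it (which equals weight times count when all weights are equal), and that the highest-scoring label is the predicted one. The names `weighted_vote` and `is_misclassified` are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick the class label with the largest total weight.

    predictions[i] is the label output by forecast model i,
    weights[i] is that model's prediction weight.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def is_misclassified(predicted_label, actual_label):
    """A test sample is misclassified when the two labels differ."""
    return predicted_label != actual_label


# Four models with equal preset weights; labels are 1..4.
pred = weighted_vote([1, 2, 2, 3], [0.25, 0.25, 0.25, 0.25])
```

With equal weights the vote reduces to a plain majority; after the weights are corrected, a single accurate model can outvote two weaker ones.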
In any of the above embodiments, preferably, before determining the predicted class label of the test sample according to the preset weight of each forecast model and the second preset formula, the method further includes: determining whether the test sample includes a word that matches the preset scene lexicon; if the test sample includes no word that matches the preset scene lexicon, sending a prompt signal and not determining the predicted class label of the test sample; and if the test sample includes a word that matches the preset scene lexicon, replacing the corresponding word in the test sample with the matching word in the preset scene lexicon, and then determining the predicted class label of the test sample.
In this embodiment, before the predicted class label of the test sample is determined, it is determined whether the test sample includes a word that matches the preset scene lexicon. This pre-judges the scene, makes Chinese instruction identification scene-oriented and more targeted, and can effectively save back-end computing resources. If the test sample includes no word that matches the preset scene lexicon, a prompt signal is sent and the predicted class label is not determined, so that irrelevant test samples are filtered out and back-end computing resources are saved further. If the test sample includes a word that matches the preset scene lexicon, the corresponding word in the test sample is replaced with the matching word in the lexicon before the predicted class label is determined. This standardizes the test samples that enter the forecast models, helps the forecast models output a predicted class label that matches the actual class label, and further improves the accuracy of Chinese instruction identification.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary: class 1, common ingredients (450 common ingredients such as apple, celery and potato, together with their synonyms); class 2, common recipes (10000 common recipes such as fish with pickled cabbage and fish-flavored shredded pork, together with their synonyms); class 3, tastes and flavors (multiple subclasses such as sour, spicy and light, together with their synonyms); class 4, seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, together with their synonyms); class 5, nutritional effects (multiple subclasses such as weight loss, insomnia and slimming, together with their synonyms); class 6, special groups (multiple subclasses such as drivers, teachers and examinees, together with their synonyms); class 7, dietary care for ailments (multiple subclasses such as hypertension, colds and toothache, together with their synonyms); class 8, beauty and slimming (multiple subclasses such as skin whitening, acne removal and freckle removal, together with their synonyms); class 9, cuisines and dish types (multiple subclasses such as snacks, barbecue and late-night snacks, together with their synonyms); class 10, occasions (multiple subclasses such as single life, afternoon tea and promotion, together with their synonyms).
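The scene check and synonym replacement can be illustrated with a small lookup table. This is a sketch under stated assumptions: the test sample is assumed to be already segmented into words, the lexicon is assumed to map every listed word (including synonyms) to one standard form, and the entries in `KITCHEN_LEXICON` as well as the function name `normalize_to_scene` are hypothetical.

```python
# Hypothetical fragment of a kitchen scene lexicon: each known word,
# including synonyms, maps to the standard word used by the models.
KITCHEN_LEXICON = {
    "土豆": "土豆",    # potato (standard form)
    "马铃薯": "土豆",  # potato (synonym)
    "洋芋": "土豆",    # potato (regional synonym)
    "番茄": "番茄",    # tomato (standard form)
    "西红柿": "番茄",  # tomato (synonym)
}


def normalize_to_scene(words, lexicon):
    """Replace matched words with their standard form.

    Returns None when no word matches the lexicon, in which case the
    caller would send a prompt and skip class label prediction.
    """
    if not any(word in lexicon for word in words):
        return None
    return [lexicon.get(word, word) for word in words]


normalized = normalize_to_scene(["马铃薯", "怎么", "做"], KITCHEN_LEXICON)
```

Returning None for out-of-scene input keeps unrelated text away from the forecast models, which is where the saving in back-end computation comes from.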
In any of the above embodiments, preferably, raising the sampling probability of the misclassified sample specifically includes: redetermining the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula includes:
$y_k$ denotes the actual class label of test sample k, $h(k)$ denotes the predicted class label of test sample k, $W_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h(k))$ denotes the total number of misclassified samples.
In this embodiment, the sampling probability of a misclassified sample is redetermined by the third preset formula, so that the sampling probability of misclassified samples is raised according to a fixed rule. This makes it easier to draw a sample set that includes misclassified samples for correcting the prediction weight of each forecast model, and also easier to draw misclassified samples as new test samples. The sampling probability computed by the third preset formula rises step by step: a sample misclassified for the first time has a higher sampling probability than an ordinary sample, and if that sample is drawn as a new test sample and misclassified again, its sampling probability rises further, so that a sample misclassified twice has a higher sampling probability than a sample misclassified once. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each forecast model, which can effectively improve the accuracy of Chinese instruction identification.
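The behavior described above (the probability rises each time a sample is misclassified) can be illustrated with a simple stand-in rule. The sketch below does not implement the third preset formula, whose expression is not reproduced in this text; it swaps in a plain multiplicative boost followed by renormalization. The function names and the `factor` parameter are hypothetical.

```python
import random


def boost_misclassified(probabilities, misclassified, factor=2.0):
    """Multiply the probability of each misclassified index by `factor`
    (a hypothetical constant), then renormalize so the total is 1."""
    boosted = [p * factor if i in misclassified else p
               for i, p in enumerate(probabilities)]
    total = sum(boosted)
    return [p / total for p in boosted]


def draw_samples(probabilities, size, seed=None):
    """Draw sample indices with replacement, favoring high probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(len(probabilities)), weights=probabilities, k=size)


start = [0.25, 0.25, 0.25, 0.25]
once = boost_misclassified(start, {0})   # sample 0 misclassified once
twice = boost_misclassified(once, {0})   # and misclassified again
```

Under this stand-in rule, sample 0 goes from 0.25 to 0.4 after one misclassification and rises again after the second, matching the step-by-step increase the text describes.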
In any of the above embodiments, preferably, before correcting the prediction weight of each forecast model according to the sample set that includes misclassified samples and the first preset formula, the method further includes: building forecast models from a preset corpus based on preset rules, and presetting the prediction weight of each forecast model.
In this embodiment, forecast models are built from the preset corpus based on the preset rules, and the prediction weight of each forecast model is then preset, which facilitates training of the forecast models. For example, with 4 forecast models, the prediction weight of each forecast model can be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the k-nearest neighbors (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds one forecast model, and combining these forecast models can further improve the accuracy of Chinese instruction identification.
The preset corpus provides the language material for building and training the forecast models; the test samples and the sample set that includes misclassified samples are all drawn from the preset corpus. Specifically, corpora of 4 classes (interrogative, imperative, exclamatory and declarative sentences) are collected, arranged and labeled as the preset corpus, forming the forecast model training and test set T = {(x1, y1), (x2, y2), …, (xn, yn)}, where x ∈ χ, the instance space χ ⊆ R^n, and yn belongs to the label set {1, 2, 3, 4}, whose elements correspond to the interrogative, imperative, exclamatory and declarative class labels respectively. Each class of corpus includes related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (commands, requests, prohibitions and dissuasions); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamations); and declarative sentences include 2 subclasses (negative statements and affirmative statements).
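The labeled set T and the equal preset weights can be pictured with the following sketch. The example sentences, the English class names and the function names are hypothetical, and each x is kept as raw text here for readability, whereas the text above places x in a feature space R^n.

```python
from collections import Counter

# Label set {1, 2, 3, 4} as described in the text.
SENTENCE_CLASSES = {
    1: "interrogative",
    2: "imperative",
    3: "exclamatory",
    4: "declarative",
}

# Hypothetical labeled corpus T = [(x1, y1), (x2, y2), ...].
corpus = [
    ("这道菜怎么做", 1),  # "How is this dish made?"
    ("打开烤箱", 2),      # "Turn on the oven."
    ("太好吃了", 3),      # "So delicious!"
    ("我不吃辣", 4),      # "I do not eat spicy food."
]


def validate_corpus(pairs):
    """Check every label is in the label set; return per-class counts."""
    for _, label in pairs:
        if label not in SENTENCE_CLASSES:
            raise ValueError(f"unknown class label: {label}")
    return Counter(label for _, label in pairs)


def initial_weights(model_names):
    """Give every forecast model the same preset prediction weight."""
    return {name: 1 / len(model_names) for name in model_names}


weights = initial_weights(["svm", "random_forest", "knn", "naive_bayes"])
```

With four models, each starts at 0.25, as in the example given in the text.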
Embodiment 2
As shown in Fig. 2, the scene-oriented Chinese instruction identification device 200 according to an embodiment of the present invention includes: a correction unit 201, configured to correct the prediction weight of each forecast model according to a sample set that includes misclassified samples and a first preset formula, where a misclassified sample is a test sample whose predicted class label does not match its actual class label.
In this embodiment, the prediction weight of each forecast model is corrected according to the sample set that includes misclassified samples and the first preset formula. The prediction weights are thus corrected using test samples whose predicted class label does not match the actual class label, which trains the forecast models effectively, improves prediction accuracy, and in turn improves the accuracy of Chinese instruction identification. When the predicted class label of a test sample does not match its actual class label, the sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are drawn preferentially, both into the sample set used to correct the prediction weight of each forecast model and as new test samples. This reduces manual intervention to some extent and raises the level of intelligence of both forecast model training and Chinese instruction identification.
In addition, the sample set that includes misclassified samples may consist entirely of misclassified samples, or may consist partly of misclassified samples and partly of correctly predicted samples. The sample set should be large enough to serve the purpose of correcting the prediction weight of each forecast model.
In the above embodiment, preferably, the device further includes: a verification unit 202, configured to cross-validate each forecast model on the sample set that includes misclassified samples, to determine the prediction accuracy of each forecast model;
the correction unit 201 is further configured to: correct the prediction weight of each forecast model according to the first preset formula and the prediction accuracy, where the first preset formula includes:
$$\omega_i = \frac{p_i}{\sum_{j} p_j}$$
where $\omega_i$ denotes the prediction weight of the i-th forecast model, $p_i$ denotes the prediction accuracy of the i-th forecast model, and $\sum_{j} p_j$ denotes the sum of the prediction accuracies of all forecast models.
In this embodiment, each forecast model is cross-validated on the sample set that includes misclassified samples to determine its prediction accuracy. Specifically, 10-fold cross-validation may be used: the sample set is divided into 10 parts, and in each trial 9 parts serve as training data and 1 part as test data. Each trial yields an accuracy, and the average of the 10 accuracies is taken as the prediction accuracy of the forecast model. The 10-fold cross-validation may also be repeated several times, for example 10 times, and the results averaged, to make the estimated prediction accuracy more reliable.
The prediction weight of each forecast model is computed from the first preset formula and the prediction accuracy to obtain the corrected prediction weights. This makes the prediction weights more accurate and further improves the accuracy of Chinese instruction identification.
In any of the above embodiments, preferably, the device further includes: a determining unit 206, configured to determine the predicted class label of a test sample according to the prediction weight of each forecast model and a second preset formula; the determining unit 206 is further configured to: determine that the test sample is a misclassified sample when the actual class label of the test sample does not match the predicted class label; and a raising unit 208, configured to raise the sampling probability of the misclassified sample, so as to draw the sample set that includes misclassified samples and to draw misclassified samples as new test samples, where the second preset formula includes:
$$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
where $\omega_i$ denotes the prediction weight of the i-th forecast model, $n_j$ denotes the number of times the j-th class label appears among the outputs of all forecast models, and pred denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, that is, the predicted class label.
In this embodiment, the predicted class label of the test sample is determined according to the prediction weight of each forecast model and the second preset formula, and a test sample whose predicted class label does not match its actual class label is marked as a misclassified sample. This tests the forecast models and prepares for the next step of training. Raising the sampling probability of misclassified samples lets them be drawn preferentially, both into the sample set used to correct the prediction weight of each forecast model and as new test samples, which reduces manual intervention to some extent, raises the level of intelligence of forecast model training, and helps further improve the accuracy of Chinese instruction identification.
In any of the above embodiments, preferably, the determining unit 206 is further configured to: determine whether the test sample includes a word that matches the preset scene lexicon. The Chinese instruction identification device further includes: a prompt unit 210, configured to send a prompt signal, without determining the predicted class label of the test sample, when the test sample includes no word that matches the preset scene lexicon; and a replacement unit 212, configured to, when the test sample includes a word that matches the preset scene lexicon, replace the corresponding word in the test sample with the matching word in the preset scene lexicon, after which the predicted class label of the test sample is determined.
In this embodiment, before the predicted class label of the test sample is determined, it is determined whether the test sample includes a word that matches the preset scene lexicon. This pre-judges the scene, makes Chinese instruction identification scene-oriented and more targeted, and can effectively save back-end computing resources. If the test sample includes no word that matches the preset scene lexicon, a prompt signal is sent and the predicted class label is not determined, so that irrelevant test samples are filtered out and back-end computing resources are saved further. If the test sample includes a word that matches the preset scene lexicon, the corresponding word in the test sample is replaced with the matching word in the lexicon before the predicted class label is determined. This standardizes the test samples that enter the forecast models, helps the forecast models output a predicted class label that matches the actual class label, and further improves the accuracy of Chinese instruction identification.
For example, if the scene is set to a kitchen scene, the preset scene lexicon may include the following vocabulary: class 1, common ingredients (450 common ingredients such as apple, celery and potato, together with their synonyms); class 2, common recipes (10000 common recipes such as fish with pickled cabbage and fish-flavored shredded pork, together with their synonyms); class 3, tastes and flavors (multiple subclasses such as sour, spicy and light, together with their synonyms); class 4, seasons and festivals (multiple subclasses such as the Dragon Boat Festival and Valentine's Day, together with their synonyms); class 5, nutritional effects (multiple subclasses such as weight loss, insomnia and slimming, together with their synonyms); class 6, special groups (multiple subclasses such as drivers, teachers and examinees, together with their synonyms); class 7, dietary care for ailments (multiple subclasses such as hypertension, colds and toothache, together with their synonyms); class 8, beauty and slimming (multiple subclasses such as skin whitening, acne removal and freckle removal, together with their synonyms); class 9, cuisines and dish types (multiple subclasses such as snacks, barbecue and late-night snacks, together with their synonyms); class 10, occasions (multiple subclasses such as single life, afternoon tea and promotion, together with their synonyms).
In any of the above embodiments, preferably, the determining unit 206 is further configured to: redetermine the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula includes:
$y_k$ denotes the actual class label of test sample k, $h(k)$ denotes the predicted class label of test sample k, $W_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h(k))$ denotes the total number of misclassified samples.
In this embodiment, the sampling probability of a misclassified sample is redetermined by the third preset formula, so that the sampling probability of misclassified samples is raised according to a fixed rule. This makes it easier to draw a sample set that includes misclassified samples for correcting the prediction weight of each forecast model, and also easier to draw misclassified samples as new test samples. The sampling probability computed by the third preset formula rises step by step: a sample misclassified for the first time has a higher sampling probability than an ordinary sample, and if that sample is drawn as a new test sample and misclassified again, its sampling probability rises further, so that a sample misclassified twice has a higher sampling probability than a sample misclassified once. After multiple rounds of training, a relatively suitable prediction weight can be obtained for each forecast model, which can effectively improve the accuracy of Chinese instruction identification.
In any of the above embodiments, preferably, the device further includes: a preset unit 214, configured to build forecast models from a preset corpus based on preset rules, and to preset the prediction weight of each forecast model.
In this embodiment, forecast models are built from the preset corpus based on the preset rules, and the prediction weight of each forecast model is then preset, which facilitates training of the forecast models. For example, with 4 forecast models, the prediction weight of each forecast model can be preset to 0.25.
The preset rules are the support vector machine algorithm, the random forest algorithm, the k-nearest neighbors (KNN) algorithm and the naive Bayes algorithm. Each algorithm independently builds one forecast model, and combining these forecast models can further improve the accuracy of Chinese instruction identification.
The preset corpus provides the language material for building and training the forecast models; the test samples and the sample set that includes misclassified samples are all drawn from the preset corpus. Specifically, corpora of 4 classes (interrogative, imperative, exclamatory and declarative sentences) are collected, arranged and labeled as the preset corpus, forming the forecast model training and test set T = {(x1, y1), (x2, y2), …, (xn, yn)}, where x ∈ χ, the instance space χ ⊆ R^n, and yn belongs to the label set {1, 2, 3, 4}, whose elements correspond to the interrogative, imperative, exclamatory and declarative class labels respectively. Each class of corpus includes related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, A-not-A questions and yes-no questions); imperative sentences include 4 subclasses (commands, requests, prohibitions and dissuasions); exclamatory sentences include 4 subclasses (interjection, noun, colloquial and adverb exclamations); and declarative sentences include 2 subclasses (negative statements and affirmative statements).
Embodiment 3
A computer device according to an embodiment of the present invention includes a processor which, when executing a computer program stored in a memory, implements the steps of the scene-oriented Chinese instruction identification method proposed in any one of the above embodiments of the present invention.
In this embodiment, the computer device includes a processor which, when executing the computer program stored in the memory, implements the steps of the scene-oriented Chinese instruction identification method proposed in any one of the above embodiments of the present invention. The computer device therefore has all the beneficial effects of that method, which are not repeated here.
Embodiment 4
A computer-readable storage medium according to an embodiment of the invention has a computer program stored thereon. When executed by a processor, the computer program implements the steps of the scene-oriented Chinese instruction identification method of any one of the above embodiments of the invention.
In this embodiment, because the computer program stored on the computer-readable storage medium implements the steps of the scene-oriented Chinese instruction identification method of any one of the above embodiments when executed by a processor, the storage medium has all the beneficial effects of that method, which are not repeated here.
Embodiment 5
As shown in Fig. 3, in the scene-oriented Chinese instruction identification method according to an embodiment of the invention, four prediction models are first built from the corpus using the support vector machine algorithm, the random forest algorithm, the KNN nearest-neighbor algorithm and the naive Bayes (NB) algorithm, and their weights ω1, ω2, ω3 and ω4 are preset. A test sample is then drawn from the corpus and read, yielding the text string returned by speech recognition. In the text parsing layer, natural language processing techniques are applied to the text (Chinese word segmentation, stop-word filtering, custom-dictionary processing and text deduplication) to obtain the array of text strings of the processed test sample. Next, in the scene subject layer, it is determined whether the text includes vocabulary from the default scene lexicon. If not, the prediction result is output, namely that the sentence is unrelated to the scene. If so, the four prediction models each predict the class label of the test text, and their prediction results are combined according to the preset weights ω1, ω2, ω3 and ω4 to obtain the predicted class label of the test text. A misclassification check follows. If the actual class label of the test text does not match the predicted class label, the text is judged to be misclassified, it is recorded as misclassified text, and the prediction weight of each prediction model is corrected. If the actual class label matches the predicted class label, the text is judged not to be misclassified and the prediction result is output, that is, the predicted class label, which equals the actual class label. The prediction weights are corrected on the basis of the misclassified samples, and correcting the prediction weight of each prediction model can effectively improve the accuracy of Chinese instruction identification.
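The flow of this embodiment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the input is taken to be already segmented into tokens, the four trained models are replaced by stub functions, the combination of results is read as a weighted vote (one interpretation of the second preset formula in claim 3), and all function names, weights, lexicon entries and stop words are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Optional, Sequence, Set

# A prediction model maps a token list to a class label in {1, 2, 3, 4}.
Model = Callable[[Sequence[str]], int]


def preprocess(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop stop words and repeated tokens, keeping first-seen order."""
    seen: Set[str] = set()
    kept: List[str] = []
    for tok in tokens:
        if tok in stop_words or tok in seen:
            continue
        seen.add(tok)
        kept.append(tok)
    return kept


def in_scene(tokens: Sequence[str], scene_lexicon: Set[str]) -> bool:
    """Scene pre-check: True if any token is in the scene lexicon."""
    return any(tok in scene_lexicon for tok in tokens)


def weighted_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Add each model's weight to the label it predicted; return the best label."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    # Highest score wins; ties go to the smallest label (a choice made here).
    return min(scores, key=lambda lab: (-scores[lab], lab))


def identify(tokens: Iterable[str], models: Sequence[Model],
             weights: Sequence[float], scene_lexicon: Set[str],
             stop_words: Set[str]) -> Optional[int]:
    """Return the predicted class label, or None if the text is off-scene."""
    cleaned = preprocess(tokens, stop_words)
    if not in_scene(cleaned, scene_lexicon):
        return None
    return weighted_vote([m(cleaned) for m in models], weights)


def is_misclassified(predicted: Optional[int], actual: int) -> bool:
    """Misclassified: a prediction exists and differs from the actual label."""
    return predicted is not None and predicted != actual


# Stubs standing in for the SVM, random forest, KNN and naive Bayes models.
models = [lambda t: 2, lambda t: 2, lambda t: 1, lambda t: 1]
weights = [0.4, 0.3, 0.2, 0.1]
scene_lexicon = {"空调", "冰箱"}
stop_words = {"的", "请"}

print(identify(["请", "打开", "空调"], models, weights, scene_lexicon, stop_words))
print(identify(["今天", "天气"], models, weights, scene_lexicon, stop_words))
```

With these stub values the first call prints `2` (label 2 collects weight 0.7 against 0.3 for label 1) and the second prints `None`, because no token is in the scene lexicon and the models are never invoked. Skipping the models for off-scene text is what lets the scene pre-check save computation.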
The technical solution of the invention has been described in detail above with reference to the accompanying drawings. The invention proposes a scene-oriented Chinese instruction identification method, device, equipment and storage medium. Correcting the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula effectively improves the accuracy of Chinese instruction identification, and the scene pre-check effectively saves back-end computing resources and raises the intelligence level of Chinese instruction identification.
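The weight correction summarized here, together with the resampling of misclassified samples, follows the first and third preset formulas given in claims 2 and 5. The Python sketch below is one straightforward reading of those two formulas; the function names and the accuracy values are hypothetical, and the cross-validation that would produce the accuracies is not shown.

```python
from typing import List, Optional, Sequence


def corrected_weights(accuracies: Sequence[float]) -> List[float]:
    """First preset formula: weight_i = p_i / (sum of all p)."""
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("the accuracies must sum to a positive value")
    return [p / total for p in accuracies]


def misclassified_sampling_probability(
    actual: Sequence[int], predicted: Sequence[int]
) -> Optional[float]:
    """Third preset formula: 1 / (number of samples whose labels disagree).

    Returns None when nothing is misclassified, a case the formula
    leaves undefined.
    """
    wrong = sum(1 for y, h in zip(actual, predicted) if y != h)
    return 1.0 / wrong if wrong else None


# Hypothetical cross-validation accuracies of the four prediction models.
weights = corrected_weights([0.9, 0.8, 0.7, 0.6])
print([round(w, 3) for w in weights])

# Two of the four samples are misclassified, so each gets probability 1/2.
print(misclassified_sampling_probability([1, 2, 3, 4], [1, 2, 4, 3]))
```

For these inputs the script prints `[0.3, 0.267, 0.233, 0.2]` and `0.5`. Normalizing by the sum keeps the weights summing to 1, so a more accurate model gains influence in the vote only at the expense of the less accurate ones.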
The steps of the method of the invention may be reordered, combined or deleted as actually needed.
The units of the device of the invention may be combined, divided or deleted as actually needed.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
The foregoing are merely preferred embodiments of the invention and are not intended to limit it. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (14)

  1. A scene-oriented Chinese instruction identification method, characterized by comprising:
    correcting the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula,
    wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label.
  2. The scene-oriented Chinese instruction identification method according to claim 1, characterized in that correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula specifically comprises:
    cross-validating each prediction model on the sample set that includes misclassified samples, to determine the prediction accuracy of each prediction model;
    correcting the prediction weight of each prediction model according to the first preset formula and the prediction accuracy,
    wherein the first preset formula comprises:
    $$\omega_i = \frac{p_i}{\sum_{i=1}^{n} p_i}$$
    where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction accuracy of the i-th prediction model, and $\sum_{i=1}^{n} p_i$ denotes the sum of the prediction accuracies of all prediction models.
  3. The scene-oriented Chinese instruction identification method according to claim 1, characterized in that, before correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula, the method further comprises:
    determining the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula;
    if the actual class label of the test sample does not match the predicted class label, determining that the test sample is a misclassified sample;
    raising the sampling probability of the misclassified sample, so that the sample set that includes misclassified samples can be drawn and the misclassified sample can be drawn as a new test sample,
    wherein the second preset formula comprises:
    $$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
    where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label occurs among all prediction models, and $\mathrm{pred}$ denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, i.e. the predicted class label.
  4. The scene-oriented Chinese instruction identification method according to claim 3, characterized in that, before determining the predicted class label of the test sample according to the preset weight of each prediction model and the second preset formula, the method further comprises:
    determining whether the test sample includes vocabulary that matches a default scene lexicon;
    if it is determined that the test sample does not include vocabulary that matches the default scene lexicon, sending a prompt signal and not determining the predicted class label of the test sample;
    if it is determined that the test sample includes vocabulary that matches the default scene lexicon, replacing the corresponding vocabulary in the test sample with the matching vocabulary from the default scene lexicon, and determining the predicted class label of the test sample.
  5. The scene-oriented Chinese instruction identification method according to claim 3, characterized in that raising the sampling probability of the misclassified sample specifically comprises:
    redetermining the sampling probability of the misclassified sample according to a third preset formula,
    wherein the third preset formula comprises:
    $$w_{k+1} = \frac{1}{\sum \left( y_k \neq h_{(k)} \right)}$$
    where $y_k$ denotes the actual class label of test sample k, $h_{(k)}$ denotes the predicted class label of test sample k, $w_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h_{(k)})$ denotes the total number of misclassified samples.
  6. The scene-oriented Chinese instruction identification method according to claim 1, characterized in that, before correcting the prediction weight of each prediction model according to the sample set that includes misclassified samples and the first preset formula, the method further comprises:
    building the prediction models from a default corpus on the basis of preset rules, and presetting the prediction weight of each prediction model.
  7. A scene-oriented Chinese instruction identification device, characterized by comprising:
    a correction unit, configured to correct the prediction weight of each prediction model according to a sample set that includes misclassified samples and a first preset formula,
    wherein a misclassified sample is a test sample whose predicted class label does not match its actual class label.
  8. The scene-oriented Chinese instruction identification device according to claim 7, characterized by further comprising:
    a verification unit, configured to cross-validate each prediction model on the sample set that includes misclassified samples, to determine the prediction accuracy of each prediction model;
    the correction unit being further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction accuracy,
    wherein the first preset formula comprises:
    $$\omega_i = \frac{p_i}{\sum_{i=1}^{n} p_i}$$
    where $\omega_i$ denotes the prediction weight of the i-th prediction model, $p_i$ denotes the prediction accuracy of the i-th prediction model, and $\sum_{i=1}^{n} p_i$ denotes the sum of the prediction accuracies of all prediction models.
  9. The scene-oriented Chinese instruction identification device according to claim 7, characterized by further comprising:
    a determining unit, configured to determine the predicted class label of a test sample according to the prediction weight of each prediction model and a second preset formula;
    the determining unit being further configured to determine that the test sample is a misclassified sample when the actual class label of the test sample does not match the predicted class label;
    a raising unit, configured to raise the sampling probability of the misclassified sample, so that the sample set that includes misclassified samples can be drawn and the misclassified sample can be drawn as a new test sample,
    wherein the second preset formula comprises:
    $$\mathrm{pred} = \max(\omega_i \cdot n_j)$$
    where $\omega_i$ denotes the prediction weight of the i-th prediction model, $n_j$ denotes the number of times the j-th class label occurs among all prediction models, and $\mathrm{pred}$ denotes the class label corresponding to $\max(\omega_i \cdot n_j)$, i.e. the predicted class label.
  10. The scene-oriented Chinese instruction identification device according to claim 9, characterized in that
    the determining unit is further configured to determine whether the test sample includes vocabulary that matches a default scene lexicon;
    the Chinese instruction identification device further comprises:
    a prompt unit, configured to send a prompt signal, without determining the predicted class label of the test sample, when it is determined that the test sample does not include vocabulary that matches the default scene lexicon;
    a replacement unit, configured to replace the corresponding vocabulary in the test sample with the matching vocabulary from the default scene lexicon, and to determine the predicted class label of the test sample, when it is determined that the test sample includes vocabulary that matches the default scene lexicon.
  11. The scene-oriented Chinese instruction identification device according to claim 9, characterized in that
    the determining unit is further configured to redetermine the sampling probability of the misclassified sample according to a third preset formula,
    wherein the third preset formula comprises:
    $$w_{k+1} = \frac{1}{\sum \left( y_k \neq h_{(k)} \right)}$$
    where $y_k$ denotes the actual class label of test sample k, $h_{(k)}$ denotes the predicted class label of test sample k, $w_{k+1}$ denotes the redetermined sampling probability of misclassified sample k, and $\sum (y_k \neq h_{(k)})$ denotes the total number of misclassified samples.
  12. The scene-oriented Chinese instruction identification device according to claim 7, characterized by further comprising:
    a preset unit, configured to build the prediction models from a default corpus on the basis of preset rules, and to preset the prediction weight of each prediction model.
  13. Computer equipment, characterized in that the computer equipment comprises a processor, the processor being configured to implement the steps of the scene-oriented Chinese instruction identification method according to any one of claims 1 to 6 when executing a computer program stored in a memory.
  14. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the scene-oriented Chinese instruction identification method according to any one of claims 1 to 6.
CN201710620448.7A 2017-07-26 2017-07-26 Scene-oriented Chinese instruction identification method, device, equipment and storage medium Active CN107507613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710620448.7A CN107507613B (en) 2017-07-26 2017-07-26 Scene-oriented Chinese instruction identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN107507613A true CN107507613A (en) 2017-12-22
CN107507613B CN107507613B (en) 2021-03-16

Family

ID=60689769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710620448.7A Active CN107507613B (en) 2017-07-26 2017-07-26 Scene-oriented Chinese instruction identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN107507613B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070208494A1 (en) * 2006-03-03 2007-09-06 Inrix, Inc. Assessing road traffic flow conditions using data obtained from mobile data sources
CN104361010A (en) * 2014-10-11 2015-02-18 北京中搜网络技术股份有限公司 Automatic classification method for correcting news classification
CN104573013A (en) * 2015-01-09 2015-04-29 上海大学 Category weight combined integrated learning classifying method
CN106548210A (en) * 2016-10-31 2017-03-29 腾讯科技(深圳)有限公司 Machine learning model training method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602307A (en) * 2018-06-12 2019-12-20 范世汶 Data processing method, device and equipment
CN110689135A (en) * 2019-09-05 2020-01-14 第四范式(北京)技术有限公司 Anti-money laundering model training method and device and electronic equipment
CN110689135B (en) * 2019-09-05 2022-10-11 第四范式(北京)技术有限公司 Anti-money laundering model training method and device and electronic equipment
CN111651686A (en) * 2019-09-24 2020-09-11 北京嘀嘀无限科技发展有限公司 Test processing method and device, electronic equipment and storage medium
CN113096642A (en) * 2021-03-31 2021-07-09 南京地平线机器人技术有限公司 Speech recognition method and device, computer readable storage medium, electronic device

Also Published As

Publication number Publication date
CN107507613B (en) 2021-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 230088 Building No. 198, building No. 198, Mingzhu Avenue, Anhui high tech Zone, Anhui

Applicant after: Hefei Hualing Co.,Ltd.

Address before: 230601 R & D building, No. 176, Jinxiu Road, Hefei economic and Technological Development Zone, Anhui 501

Applicant before: Hefei Hualing Co.,Ltd.

GR01 Patent grant