CN109325236A - Method for a service robot to aurally perceive family members' dietary information - Google Patents

Method for a service robot to aurally perceive family members' dietary information

Info

Publication number
CN109325236A
CN109325236A, CN201811217808.XA, CN201811217808A
Authority
CN
China
Prior art keywords
diet
information
speaker
people
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811217808.XA
Other languages
Chinese (zh)
Other versions
CN109325236B (en)
Inventor
杨观赐
苏志东
李杨
陈占杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou University
Original Assignee
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou University filed Critical Guizhou University
Priority to CN201811217808.XA priority Critical patent/CN109325236B/en
Publication of CN109325236A publication Critical patent/CN109325236A/en
Application granted granted Critical
Publication of CN109325236B publication Critical patent/CN109325236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/08Use of distortion metrics or a particular distance between probe pattern and reference templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for a service robot to aurally perceive family members' dietary information, comprising the following steps: (1) acquiring a speaker's voice data through the service robot and determining the system operating mode; (2) when the system is in mode one, obtaining the speaker identity and the corresponding text data of the current voice data by means of voiceprint recognition and speech recognition; (3) when the system is in mode two, segmenting the voice data at speaker change points by voice segmentation and joining adjacent speech segments of the same identity by voiceprint recognition, so as to obtain the speaker identity and the corresponding text data of each joined speech segment; (4) extracting the dietary information of different people by means of Chinese word segmentation, part-of-speech tagging and dependency parsing, and storing the dietary information according to identity. The invention can autonomously acquire the dietary information of different users and establish dietary information archives according to each user's identity.

Description

Method for a service robot to aurally perceive family members' dietary information
Technical field
The present invention relates to the field of service robots, and in particular to a method for a service robot to aurally perceive family members' dietary information.
Background art
With technological progress in fields such as machinery, information, materials, control and medicine, home-service robot technology is advancing at an unprecedented pace. According to 2016 data from the International Federation of Robotics, about 42 million home-service robots will have entered people's households by 2019. As population aging worsens, more and more elderly people who live alone need care, and home-service robots for elderly care and assistance have drawn close attention from scholars at home and abroad. At the same time, with rising health awareness, a healthy diet has become a growing public concern. A scientific and reasonable diet not only helps maintain physical health but also plays an important role in the treatment of disease. At present, existing home-service robots have made considerable progress in services such as chat and conversation, home control, intelligent interaction and health-index management; although their applicability and intelligence have improved, services for healthy-diet management still need to be strengthened. Some service robots detect the health state of the elderly through regular question answering and can establish health records for them, but they ignore the significant influence of diet on human health and fail to automatically establish dietary information archives for the elderly; service robots that intelligently manage personal dietary data are lacking.
Summary of the invention
The object of the present invention is to overcome the above shortcomings and propose a method for a service robot to aurally perceive family members' dietary information. Based on the auditory system of the service robot, the method autonomously acquires the dietary information of different users from the users' answers to diet questions posed by the robot and from the users' spontaneous conversations, establishes dietary information archives according to each user's identity, and intelligently manages personal dietary data.
A method for a service robot to aurally perceive family members' dietary information according to the present invention comprises the following steps:
(1) Acquire the speaker's voice data through the service robot and determine the system operating mode. The system has two operating modes: in mode one the user actively describes his or her diet to the service robot so that the dietary information is recorded; in mode two the service robot autonomously obtains the user's dietary information from the user's everyday conversation.
(2) When the system is in mode one, obtain the speaker identity of the current voice data and the corresponding text data by means of voiceprint recognition and speech recognition.
(3) When the system is in mode two, first slide a detection window over the voice data and detect speaker change points with the Bayesian information criterion so as to segment the voice data; then join adjacent voice segments of the same identity by voiceprint recognition; finally perform speech recognition on the segmentation result to obtain the identity of each speaker in the multi-person dialogue and the text of what each speaker said.
(4) Pre-process the identity information and text data obtained in the above steps and construct a custom diet dictionary.
(5) According to the two operating modes, divide diet text information extraction into two classes, single-person diet answers and multi-person everyday diet dialogues, and formulate diet information extraction rules for each class.
For single-person diet answers, which involve no reference to contextual information, establish an extraction rule for single-person diet expressions. For multi-person diet dialogues, following Chinese language habits, establish an extraction rule for directly expressed diet in multi-person dialogue and an extraction rule for indirectly expressed diet in multi-person dialogue, according to whether the previous speaker in the dialogue asks about the current speaker's diet and whether the current speaker expresses the diet directly.
(6) Using Chinese word segmentation, part-of-speech tagging and dependency parsing, after merging the general dictionary with the custom diet dictionary, perform Chinese word segmentation, part-of-speech tagging and dependency parsing on the free text; then, following Chinese expression habits and using part-of-speech information, the part of speech and position of context words, and the syntactic structure between language units, apply the extraction rules formulated for the required dietary information, extract the dietary information of different people, and store it according to identity.
In the above method, in step (2), when the system is in mode one, 26-dimensional static Mel-frequency cepstral coefficient (MFCC) features together with 13-dimensional first-order and second-order MFCC difference features are used as the features of the voice data; with the LBG algorithm, the speaker's identity is determined from the Euclidean distance between the 52-dimensional MFCC features of the speaker's voice and each codebook; meanwhile the iFLYTEK speech recognition SDK converts the voice data into text data, yielding the identity information and text data of the single expression.
In the above method, in step (3), voice segmentation based on the Bayesian-information-criterion speaker change point detection algorithm is carried out as follows:
1) Initialize the detection window windows = [ws, we];
2) Use the BIC detection algorithm to judge whether there is a speaker change point wi within the window;
3) If there is a change point wi, move the window start to the change point, windows = [ws+wi, we+wi], keeping the window size unchanged; if there is no change point, increase the window length by w and change only the window end position, windows = [ws, we+w];
4) Repeat 2) and 3) until the window has traversed the entire voice signal sample.
In the above method, in step (4), the custom diet dictionary is constructed as follows: first, with reference to the food classification system of Annex E of the national standard GB 2760-2014 of the People's Republic of China, food names common in daily life are collected for each category to build the vocabulary; second, food names are gathered from diet-related websites by a web crawler to expand the diet vocabulary.
In the above method, in step (5), the extraction rule for a single-person diet expression (rule 1) is: first determine whether a diet action really occurred, then extract the diet nouns associated with that action, thereby obtaining the user's dietary information.
The extraction rule for diet expressed directly in multi-person dialogue (rule 2) is: if the first language unit of the current speaker's answer is a noun, extract the nouns that follow, starting from that noun, until the next verb is reached, and then apply rule 1 from the position where the extraction ended; otherwise apply rule 1 directly.
The extraction rule for diet expressed indirectly in multi-person dialogue (rule 3) is: indirect expression means the current speaker does not state the dietary information directly but refers, in whole or in part, to what earlier speakers said in order to express his or her own diet; rule 1 is then applied for further extraction.
Compared with the prior art, the present invention has obvious beneficial effects. As can be seen from the above scheme, voiceprint recognition, voice segmentation and speech recognition are combined to pre-process the diet voice data collected by the service robot and obtain the text of each speaker's utterances; extraction rules are designed for single-person expressions and multi-person dialogues; and dietary information is extracted from the features obtained after Chinese word segmentation, part-of-speech tagging and dependency parsing of the text, completing the automatic acquisition of dietary information for different identities from the raw voice data.
The beneficial effects of the present invention are further illustrated below by specific embodiments.
Description of the drawings
Fig. 1 shows the voice data processing method of the invention;
Fig. 2 shows the diet information extraction framework of the invention;
Fig. 3 shows the diet text information extraction flow of the invention.
Specific embodiment
The specific embodiments, features and effects of the method for a service robot to aurally perceive family members' dietary information proposed by the present invention are described in detail below with reference to the drawings and preferred embodiments.
Referring to Figs. 1 to 3, a method for a service robot to aurally perceive family members' dietary information according to the invention comprises the following steps:
(1) Acquire the speaker's voice data through the service robot and determine the system operating mode. The system has two operating modes: in mode one the user actively describes his or her diet to the service robot so that the dietary information is recorded; in mode two the service robot autonomously obtains the user's dietary information from the user's everyday conversation.
(2) When the system is in mode one, obtain the speaker identity of the current voice data and the corresponding text data by means of voiceprint recognition and speech recognition.
When the system is in mode one, 26-dimensional static Mel-frequency cepstral coefficient (MFCC) features together with 13-dimensional first-order and second-order MFCC difference features are used as the features of the voice data; with the LBG algorithm, the speaker's identity is determined from the Euclidean distance between the 52-dimensional MFCC features of the speaker's voice and each codebook; meanwhile the iFLYTEK speech recognition SDK converts the voice data into text data, yielding the identity information and text data of the single expression.
(3) When the system is in mode two, first slide a detection window over the voice data and detect speaker change points with the Bayesian information criterion so as to segment the voice data; then join adjacent voice segments of the same identity by voiceprint recognition; finally perform speech recognition on the segmentation result to obtain the identity of each speaker in the multi-person dialogue and the text of what each speaker said.
Voice segmentation based on the Bayesian-information-criterion speaker change point detection algorithm is carried out as follows:
1) Initialize the detection window windows = [ws, we];
2) Use the BIC detection algorithm to judge whether there is a speaker change point wi within the window;
3) If there is a change point wi, move the window start to the change point, windows = [ws+wi, we+wi], keeping the window size unchanged; if there is no change point, increase the window length by w and change only the window end position, windows = [ws, we+w];
4) Repeat 2) and 3) until the window has traversed the entire voice signal sample.
(4) Pre-process the identity information and text data obtained in the above steps and construct a custom diet dictionary: first, with reference to the food classification system of Annex E of the national standard GB 2760-2014 of the People's Republic of China, collect food names common in daily life for each category to build the vocabulary; second, gather food names from diet-related websites with a web crawler to expand the diet vocabulary.
(5) According to the two operating modes, divide diet text information extraction into two classes, single-person diet answers and multi-person everyday diet dialogues, and formulate diet information extraction rules for each class.
For single-person diet answers, which involve no reference to contextual information, establish an extraction rule for single-person diet expressions. For multi-person diet dialogues, following Chinese language habits, establish an extraction rule for directly expressed diet and an extraction rule for indirectly expressed diet, according to whether the previous speaker asks about the current speaker's diet and whether the current speaker expresses the diet directly.
Extraction rule for a single-person diet expression (rule 1): first determine whether a diet action really occurred, then extract the diet nouns associated with that action to obtain the user's dietary information.
Extraction rule for diet expressed directly in multi-person dialogue (rule 2): if the first language unit of the current speaker's answer is a noun, extract the nouns that follow, starting from that noun, until the next verb is reached, and then apply rule 1 from the position where the extraction ended; otherwise apply rule 1 directly.
Extraction rule for diet expressed indirectly in multi-person dialogue (rule 3): indirect expression means the current speaker does not state the dietary information directly but refers, in whole or in part, to what earlier speakers said in order to express his or her own diet; rule 1 is then applied for further extraction.
(6) Using Chinese word segmentation, part-of-speech tagging and dependency parsing, after merging the general dictionary with the custom diet dictionary, perform Chinese word segmentation, part-of-speech tagging and dependency parsing on the free text; then, following Chinese expression habits and using part-of-speech information, the part of speech and position of context words, and the syntactic structure between language units, apply the extraction rules formulated for the required dietary information, extract the dietary information of different people, and store it according to identity.
The embodiment is as follows:
The method of the invention consists of two parts. First, the voice data are pre-processed with voice segmentation, voiceprint recognition and speech recognition to obtain the text of each speaker's utterances. Then, taking the pre-processing results as input, a rule-based diet text information extraction method obtains the dietary information of each speaker. The diet information understanding method based on the service robot's hearing has two operating modes: in mode one the user actively describes his or her diet to the service robot so that the dietary information is recorded; in mode two the service robot autonomously obtains the user's dietary information from everyday conversation. Based on these two modes, the method is described below in terms of voice data pre-processing and diet text information extraction.
1 Pre-processing of voice data
The voice data are pre-processed to obtain the user's identity information and text data for subsequent diet text information extraction. As shown in Fig. 1, in mode one the user actively describes his or her diet to the service robot, so the voice data obtained by the auditory system contain only a single user and no multi-person dialogue; this mode therefore uses voiceprint recognition to obtain the user's identity and speech recognition to obtain the text of the user's answer. In mode two the service robot automatically records the user's everyday dialogue audio, which contains multiple users; this mode therefore first uses voice segmentation and voiceprint recognition to split the audio by speaker identity, and then uses speech recognition to obtain the text of the segmented data. The iFLYTEK online speech recognition SDK is used here for speech recognition.
1.1 Multi-person dialogue voice segmentation based on the Bayesian information criterion
For a segment of multi-person dialogue voice data, in order to recognize what different people said, the voice data must first be segmented by speaker. Here the Bayesian information criterion (BIC) is used, and speaker change points are detected with a growing-window change point detection method to segment the multi-person dialogue voice.
Assume X = {x1, x2, ..., xN} is a voice signal sample that obeys a multivariate Gaussian distribution, and perform the following hypothesis test on X:
H0: X ~ N(μ, Σ)
H1: X1 ~ N(μ1, Σ1), X2 ~ N(μ2, Σ2), X1 = {x1, x2, ..., xi}, X2 = {xi+1, xi+2, ..., xN}
where Σ, Σ1, Σ2 and μ, μ1, μ2 are the covariance matrices and means of X, X1, X2 respectively, and N is the number of samples. H0 means the speech sample contains only one speaker; H1 means there is a speaker change point in the sample. Let N1 and N2 be the sample counts of X1 and X2; then the maximum likelihood ratio between H0 and H1 is defined as:
R(i) = N ln|Σ| - N1 ln|Σ1| - N2 ln|Σ2|   (1)
The difference between the BIC values of hypotheses H0 and H1 is therefore:
ΔBIC = -R(i) + λP   (2)
where P = (1/2)(d + d(d+1)/2) ln N, d is the dimension of the sample space, and λ is a penalty factor. If ΔBIC < 0, hypothesis H1 holds, i.e., the speaker changes at time i; otherwise H0 holds and there is no speaker change at time i.
According to the above BIC speaker change point detection algorithm, voice segmentation is carried out as follows:
1) Initialize the detection window windows = [ws, we];
2) Use the BIC detection algorithm to judge whether there is a speaker change point wi within the window;
3) If there is a change point wi, move the window start to the change point, windows = [ws+wi, we+wi], keeping the window size unchanged; if there is no change point, increase the window length by w and change only the window end position, windows = [ws, we+w];
4) Repeat 2) and 3) until the window has traversed the entire voice signal sample.
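As a concrete illustration of steps 1) to 4), a minimal Python sketch of the sliding-window ΔBIC detection is given below; it assumes the input is a frames x d NumPy feature array, and the window size, growth step w and penalty factor λ are illustrative values rather than ones specified by the patent.

```python
import numpy as np

def delta_bic(X, i, lam=1.0):
    """Compute ΔBIC = -R(i) + λP for a split of window X at frame i (formula (2))."""
    N, d = X.shape
    X1, X2 = X[:i], X[i:]
    def logdet_cov(Z):
        # small regularization keeps short windows well-conditioned
        C = np.cov(Z, rowvar=False) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(C)[1]
    R = N * logdet_cov(X) - len(X1) * logdet_cov(X1) - len(X2) * logdet_cov(X2)
    P = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return -R + lam * P

def find_change_point(X, margin=10, lam=1.0):
    """Return the frame with the most negative ΔBIC inside the window, or None."""
    cands = [(delta_bic(X, i, lam), i) for i in range(margin, len(X) - margin)]
    best, i_best = min(cands)
    return i_best if best < 0 else None

def segment(features, win=300, grow=100, lam=1.0):
    """Slide a growing window over the feature sequence and collect change points."""
    ws, we, points = 0, win, []
    while we <= len(features):
        cp = find_change_point(features[ws:we], lam=lam)
        if cp is not None:                 # change point found: shift the window start
            points.append(ws + cp)
            ws, we = ws + cp, we + cp      # keep the window size unchanged
        else:                              # no change point: grow the window end
            we += grow
    return points
```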
1.2 Voiceprint recognition based on MFCC features and vector quantization
Voiceprint recognition based on MFCC features and vector quantization identifies the user to whom a segment of voice data belongs, providing the basis for joining voice segments, for text information extraction in multi-person dialogue, and for storing the dietary information. Mel-frequency cepstral coefficients (MFCC) are used here as the speech feature: because the loudness perceived by the human ear is not linearly proportional to the frequency of the sound, the Mel frequency scale, which is based on human auditory characteristics, is the mainstream voiceprint feature at present. The relationship between Mel frequency and actual frequency f can be expressed by formula (3):
Mel(f) = 2595 lg(1 + f/700)   (3)
Standard MFCC parameters only reflect the static characteristics of the speech; the dynamic characteristics can be described by the difference spectrum of the static features, and combining dynamic and static features effectively improves recognition performance. Here the first-order and second-order differences are combined with the standard MFCC parameters as the acoustic feature: the static feature has 26 dimensions, and the first-order and second-order difference features have 13 dimensions each, 52 dimensions in total.
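A short sketch of assembling this 52-dimensional feature vector (26 static MFCCs plus 13-dimensional first- and second-order difference features) is shown below; it assumes the librosa library, and the sampling rate is an illustrative choice rather than one stated in the patent.

```python
import numpy as np
import librosa

def mfcc_52(wav_path, sr=16000):
    """26 static MFCCs + 13-dim 1st/2nd order deltas per frame -> (frames, 52)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=26)   # (26, frames)
    d1 = librosa.feature.delta(mfcc[:13], order=1)       # (13, frames)
    d2 = librosa.feature.delta(mfcc[:13], order=2)       # (13, frames)
    return np.vstack([mfcc, d1, d2]).T                   # (frames, 52)
```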
Common models in voiceprint recognition include the vector quantization model, the Gaussian mixture model (GMM) and the hidden Markov model (HMM); among them the vector quantization model has a high recognition rate and fast computation, making it an ideal recognition model. During training, the LBG algorithm clusters the training sample sequence of a speaker's voice into a codebook. Considering the randomness of the sampled speech, the initial codebook is chosen at random. During recognition, the feature vector sequence Q1, Q2, ..., QT is extracted from the speech to be identified and quantized against each codebook stored in the system, and the codebook whose distribution is closest to the vector sequence is determined by the average quantization distortion defined in formula (4):
Di = (1/T) Σt=1..T min_j d(Ot, Yij)   (4)
where Yij denotes the j-th codeword of the i-th speaker's codebook and T is the length of the feature vector sequence, i.e., the number of frames contained in the speech to be identified. In the formula, d(Ot, Yij) is measured by the Euclidean distance. The final recognition result is the i-th speaker corresponding to the minimum Di.
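The following is a minimal sketch of LBG codebook training and VQ-based identification over such feature sequences; the codebook size, split perturbation and stopping threshold are illustrative assumptions, not values given in the patent.

```python
import numpy as np

def lbg_codebook(features, size=32, eps=0.01, tol=1e-4):
    """Train a codebook with the LBG split-and-refine algorithm (features: (n, d))."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])  # split
        prev = np.inf
        while True:                                   # k-means style refinement
            d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
            nearest = d.argmin(axis=1)
            distortion = d.min(axis=1).mean()
            codebook = np.array([features[nearest == k].mean(axis=0)
                                 if np.any(nearest == k) else codebook[k]
                                 for k in range(len(codebook))])
            if prev - distortion < tol:
                break
            prev = distortion
    return codebook

def avg_distortion(features, codebook):
    """Average quantization distortion Di of formula (4), with Euclidean distance."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Return the speaker id whose codebook gives the smallest Di."""
    return min(codebooks, key=lambda spk: avg_distortion(features, codebooks[spk]))
```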
2 Rule-based diet information extraction
Users express themselves in many different ways, both when answering the service robot's diet questions and in everyday exchanges between multiple users, and the utterances are often mixed with information irrelevant to diet. Information extraction methods generally fall into statistical and rule-based approaches. Although statistical methods can capture statistical regularities, they may ignore certain properties of the language; and since the unstructured text of the users' answers is domain-limited and fairly regular, rule-based extraction achieves a better result here. Therefore a rule-based text information extraction method is used to process the result of the voice data pre-processing and obtain each individual's dietary information.
To make the collected data fit real-life scenarios and give the system high robustness in actual use, the system runs on the service robot platform described below. Following the two modes above, answers from different users are collected both by having the robot ask diet questions that the user answers and by autonomously recording multi-user everyday conversation, and these answers serve as the basis for abstracting the extraction rules. As shown in Fig. 2, the diet information extraction method consists of two parts: text data pre-processing and diet text information extraction.
2.1 Pre-processing of text data
The diet text expressed by users contains many specialized terms such as fruits, vegetables, prepared dishes, dairy and meats. Relying only on the general dictionary of a word segmentation system makes it difficult to segment diet text accurately; inaccurate segmentation leads to inaccurate part-of-speech tagging and in turn to inaccurate extraction. A custom diet dictionary therefore needs to be constructed.
The diet dictionary (Dietetic Lexicon) is built from the following sources. First, with reference to the food classification system of Annex E of the national standard GB 2760-2014 of the People's Republic of China, food names common in daily life are collected for each category to build the vocabulary. Second, a web crawler collects food names from diet-related websites such as take-out sites, recipe sites and food forums to expand the diet vocabulary.
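A small sketch of assembling such a lexicon from a seed list and crawled food names might look as follows; the one-term-per-line file format is an assumption for illustration rather than a format specified in the patent.

```python
def build_diet_lexicon(seed_terms, crawled_terms, path="diet_lexicon.txt"):
    """Merge seed food names (e.g. from the GB 2760-2014 categories) with crawled
    names, drop duplicates and blanks, and write one term per line."""
    terms = {t.strip() for t in list(seed_terms) + list(crawled_terms) if t.strip()}
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(terms)))
    return path

# usage: build_diet_lexicon(["苹果", "红烧肉"], names_from_crawler)
```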
Based on the HIT LTP Chinese word segmentation, part-of-speech tagging and dependency parsing algorithms, after the general dictionary is merged with the custom diet dictionary, the free text is segmented, tagged with parts of speech, and parsed to obtain the syntactic structure between language units in each sentence. LTP uses the 863 part-of-speech tag set; the common part-of-speech tags are shown in Table 1 and the dependency relations in Table 2.
Table 1 Part-of-speech tags
Table 2 Dependency relations
2.2 Diet text information extraction module
Using the HIT LTP Chinese word segmentation, part-of-speech tagging and dependency parsing algorithms with the added dictionary, the free text is segmented, tagged and parsed; then, following Chinese communication habits and using part-of-speech information, the part of speech and position of context words, and the syntactic structure between language units, extraction rules are formulated for the dietary information to be extracted and the dietary information is extracted.
Fig. 3 shows the processing flow of the diet text information extraction module. According to the two operating modes above (the robot regularly asking the user about his or her diet, and the robot autonomously acquiring dialogue about the user's diet), diet text information extraction is divided into single-person diet answers and multi-person diet dialogues. A single-person diet expression involves no reference to contextual information, and rule 1 is established for it. For multi-person diet dialogues, following Chinese language habits, the exchanges fall into two types: in one, a speaker expresses his or her dietary behaviour directly, without referring to the previous speaker's diet expression or influencing the next speaker's; in the other, a speaker quotes information expressed by a previous speaker or questions the next speaker, thereby influencing the next speaker's expression. Rules 2 and 3 are therefore established according to whether the previous speaker in the dialogue asks about the current speaker's diet and whether the current speaker expresses the diet directly. "Directly expressed" refers to whether the current speaker quotes the diet content expressed by a previous speaker; direct expression means no such content is quoted. A single-person answer to the robot's diet question and a direct answer to a previous speaker's question in multi-person dialogue share the same pattern, so they share rule 1. Rules 1, 2 and 3 are described below.
(1) Extraction rule for a single-person diet expression: rule 1
The system runs on a home-service robot platform. When a user actively describes his or her diet to the robot, the robot acts as a listener being confided in, and the utterance may contain information other than dietary information, such as "I bought apples at the supermarket", "I like eating apples", "I really want to eat an apple now", or topics unrelated to diet. This rule therefore first determines whether a diet action really occurred and then extracts the diet nouns associated with that action, obtaining the user's dietary information.
Let sentence = {word1, word2, ..., wordi, ..., wordN} denote the content of a single user's diet expression, where wordi is a language unit after word segmentation and N is the number of language units in the expression. Let V_eat = {"eat", "drink", "savour", "taste", "try"} be the set of diet action feature words; when a word from V_eat appears in the sentence, a diet action may have occurred, and the words before and after it must be analysed to judge whether the action actually took place. Let V_eat_negative = {"want to", "like", "love", "did not", "have not", "not"} be the diet action negation words; when a word from V_eat_negative appears before the V_eat word, the diet action is judged, according to language habits, not to have occurred. Let V_eat_enhance be the diet action reinforcement words (particles that follow the eating verb and confirm that the action took place); when a word from V_eat_enhance appears after the V_eat word, the judgement that the diet action occurred is reinforced. Using first-order predicate logic, the above analysis is converted into formula (5) to judge the diet action.
Here Contains(A, a) judges whether set A contains element a, former_word(wordi) denotes the word before wordi, next_word(wordi) denotes the word after wordi, and Real_Eat(wordi) denotes the judgement that wordi is an occurrence of a diet action.
For each wordi for which Real_Eat(wordi) is true in the first-order predicate reasoning, dependency parsing is applied to wordi and the rest of the sentence, and the extraction rule is formulated from the parsing result. The general framework of the rule is as follows. Depending on the particle immediately following wordi, extraction starts either at wordi itself (word_start = wordi) or at the word wordj that forms an SBV (subject-verb) relation with wordi (word_start = wordj). The noun wordk that forms a VOB (verb-object) relation with word_start, and any noun wordm that forms a COO (coordination) relation with wordk, are extracted as dietary information. At the same time, any verb wordn that forms a COO relation with word_start is extracted, and the noun wordkn forming a VOB relation with wordn, together with any noun wordmn forming a COO relation with wordkn, are also taken as dietary information. The above analysis is expressed with the following first-order predicate logic formulas.
Formulas (6) and (7) find the dietary information directly related to wordi, while formulas (8) and (9) find the dietary information coordinated with wordi. Relation_VOB and Relation_COO judge the verb-object and coordination relations between two words, and Eat_Food marks a word as dietary information.
On the basis of this framework, two supplements improve the completeness of the rule. First, the coordinated words of wordi are examined to exclude coordinated verbs unrelated to the diet action. Let V_COO be the set of coordinated words related to diet; when a coordinated word of wordi is not in V_COO, the current extraction is skipped and the next diet action word is selected for extraction. Because of the diversity of diet, a diet noun may itself contain a cooking verb, such as "scrambled eggs with tomato" or "stir-fried pork with green peppers"; when such a cooking verb is coordinated with wordi it should not be skipped. Let V_cuisine be the cooking verbs, V_cuisine = {"stir-fried", "quick-fried", "steamed", "clear-steamed", "boiled", "fried", "deep-fried", "stewed", "pan-fried", "fry"}, and add V_cuisine to V_COO. When the word "also" is coordinated with wordi it indicates a supplementary statement, so "also" is added to V_COO. V_eat itself expresses obvious dietary behaviour, so it is also added to V_COO. In summary, V_COO = V_cuisine + {"also"} + V_eat. Second, considering that cooking verbs may also appear inside food nouns, and that their appearance would affect the accuracy of extraction, cooking verbs are handled specially: when wordn appears in V_cuisine, the nouns directly adjacent to wordn are recorded as dietary information.
The above analysis is converted into the following processing flow:
where pos(wordx) denotes the part of speech of wordx and Food_eat is the dietary information of a single speaker.
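The following is a minimal sketch of rule 1 under stated assumptions: the input is a list of already segmented, POS-tagged and dependency-parsed tokens (a hypothetical Token structure rather than any particular parser's API), the Chinese word sets are placeholders for the feature-word sets above, and only the core negation check plus VOB/COO traversal of the rule is reproduced.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    text: str          # surface form
    pos: str           # part-of-speech tag, e.g. "n", "v"
    head: int          # index of the dependency head in the token list
    rel: str           # dependency relation, e.g. "VOB", "COO", "SBV"

V_EAT = {"吃", "喝", "品尝", "尝", "尝试"}           # placeholders for the diet action words
V_EAT_NEG = {"想", "喜欢", "爱", "没", "没有", "不"}  # placeholders for the negation words
V_CUISINE = {"炒", "蒸", "煮", "炸", "炖", "煎"}      # placeholders for cooking verbs

def rule1(tokens: List[Token]) -> List[str]:
    """Extract diet nouns governed by a non-negated eating verb (VOB), plus nouns
    coordinated with them and the objects of diet-related coordinated verbs."""
    foods = []
    for i, tok in enumerate(tokens):
        if tok.text not in V_EAT:
            continue
        prev = tokens[i - 1].text if i > 0 else ""
        if prev in V_EAT_NEG:          # e.g. "没吃": the diet action did not occur
            continue
        foods += _objects_of(tokens, i)
        # only diet-related coordinated verbs (the V_COO set of the patent) are followed
        for j, t in enumerate(tokens):
            if t.head == i and t.rel == "COO" and (t.text in V_CUISINE or t.text in V_EAT):
                foods += _objects_of(tokens, j)
    return foods

def _objects_of(tokens: List[Token], verb_idx: int) -> List[str]:
    """Nouns in a VOB relation with the verb, plus nouns coordinated with them."""
    out = []
    for k, t in enumerate(tokens):
        if t.head == verb_idx and t.rel == "VOB" and t.pos.startswith("n"):
            out.append(t.text)
            out += [u.text for u in tokens
                    if u.head == k and u.rel == "COO" and u.pos.startswith("n")]
    return out
```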
(2) Diet question judgement and direct-expression judgement in multi-person dialogue
According to the diet text information extraction flow, before applying the extraction rules in multi-person diet dialogue it is necessary to judge whether the previous speaker asked a question and whether the current speaker answers directly.
When the previous speaker asks directly, with a question such as "What did you eat?" or "What have you eaten?", it can be judged immediately as a diet question. Likewise, when the previous speaker describes his or her own diet and then asks back "What about you?", this is also judged as a diet question. This judgement is expressed with the following formula,
where Regex(A, B) means matching A against the regular-expression pattern B.
When the previous speaker asks a question and the current speaker does not state his or her dietary information directly but expresses it in some indirect way, the answer is judged to be indirect, for example "I also ate what you mentioned." or "The braised pork tasted good, I ate it too." Whether an answer is indirect is judged with the following formulas,
where in formula (12) Undirect(model1(current_sentence)) means that current_sentence matches the diet expression pattern model1 of the type "(I also ate) .* (that | this) (one | some)", and formulas (13), (14) and (15) are analogous. When any one of formulas (12) to (15) is true, the current speaker's answer is indirect; otherwise it is a direct answer.
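A small sketch of these two regular-expression judgements is given below; the patterns are illustrative stand-ins for the Chinese patterns of formulas (10) to (15), which are not reproduced in full in this text.

```python
import re

# Illustrative stand-ins for the diet-question and indirect-answer patterns.
QUESTION_PATTERNS = [r"你.*吃.*(什么|啥)", r"你呢"]     # "What did you eat?", "What about you?"
INDIRECT_PATTERNS = [r"我也吃.*(那|这)(个|些)",          # model1-like: "I also ate that/this ..."
                     r".*不错.*我也吃",                  # model2-like: "... was good, I ate it too"
                     r"跟你一样"]                        # model4-like: "same as you"

def is_diet_question(prev_sentence: str) -> bool:
    return any(re.search(p, prev_sentence) for p in QUESTION_PATTERNS)

def is_indirect_answer(current_sentence: str) -> bool:
    return any(re.search(p, current_sentence) for p in INDIRECT_PATTERNS)
```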
(3) Extraction rule for diet expressed directly in multi-person dialogue: rule 2
When the previous speaker asks about the current speaker's diet with "What did you eat?", the current speaker's direct answer may simply state diet nouns, e.g. "Braised pork, scrambled eggs with tomato", or may answer in a way similar to a single-person diet expression, e.g. "I ate the braised pork my mother made, and the watermelon I bought at the supermarket". Therefore let V_jump = V_cuisine + {"also"}; if the first language unit of the current speaker's answer is a noun, nouns are extracted from that noun onward, skipping words in V_jump, until the next verb is reached, and rule 1 is then applied from the position where the extraction ended; otherwise rule 1 is applied directly. The above analysis is converted into the following processing flow:
where rule1(sub_sentencet(speakeri)) indicates that the sentence remaining after the loop exits is processed with rule 1, and Food_eat is the dietary information finally extracted for each speaker.
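Continuing the rule 1 sketch above (same hypothetical Token structure and placeholder word sets), rule 2 could be sketched as follows; the reparse argument stands in for re-running segmentation and parsing on the remainder of the answer.

```python
def rule2(tokens: List[Token], reparse) -> List[str]:
    """Rule 2: if the answer starts with a noun, collect the leading nouns
    (skipping cooking verbs and the 'also'-type word) until the first other verb,
    then apply rule 1 to the remainder. reparse maps raw text to parsed Tokens."""
    foods, i = [], 0
    if tokens and tokens[0].pos.startswith("n"):
        while i < len(tokens):
            t = tokens[i]
            if t.pos.startswith("n"):
                foods.append(t.text)
            elif t.text in V_CUISINE or t.text == "还":   # V_jump words: skip and continue
                pass
            elif t.pos == "v":                            # next real verb: stop here
                break
            i += 1
    rest = "".join(t.text for t in tokens[i:])
    return foods + rule1(reparse(rest))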
(4) Extraction rule for diet expressed indirectly in multi-person dialogue: rule 3
Indirect expression means that the current speaker does not state his or her dietary information directly but refers, in whole or in part, to what other speakers said earlier. For expressions of type model1 or model2, such as "I also ate those" or "The braised pork tasted good, I ate it too", if no diet noun expressed by the previous speaker appears before the diet action word, then "those" or "it" refers to all of the previous speaker's diet content; if such a noun does appear, only the diet information that is present is quoted. model3 differs from model1 and model2 in that it contains no referring pronoun, so if no diet noun expressed by the previous speaker appears before the diet action word under model3, the expression is ignored. Explicit expressions of type model4 such as "I ate the same as you" are treated as full quotation, and the previous speaker's diet content is attributed entirely to the current speaker. The diet expression content following any of the four patterns is then further extracted with rule 1. According to this analysis, the extraction of diet expressed across speakers is expressed as follows:
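Continuing the same sketches, a compact illustration of rule 3 is given below; the reference patterns are placeholders, and the partial-reference check is simplified to the whole sentence rather than only the span before the eating verb.

```python
def rule3(current_sentence: str, current_tokens: List[Token],
          prev_foods: List[str]) -> List[str]:
    """Rule 3: resolve full or partial reference to the previous speaker's diet,
    then apply rule 1 to whatever the current speaker states explicitly."""
    own = rule1(current_tokens)
    if re.search(r"跟你一样|和你一样", current_sentence):   # model4-like: full quotation
        return prev_foods + own
    if re.search(r"(那|这)(个|些)", current_sentence):       # model1/2-like pronoun reference
        mentioned = [f for f in prev_foods if f in current_sentence]
        return (mentioned or prev_foods) + own               # partial, else full reference
    return own
```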
According to rules 1, 2 and 3 above and the judgements of diet questions and of direct versus indirect answers, the result of the voice data pre-processing is processed according to the diet text information extraction flow, yielding the user's dietary information.
3 Service robot experiment platform for diet information understanding
The service robot experiment platform consists of an Intel NUC host, data acquisition devices and a mechanical frame. The display used for showing data and images is a 7.9-inch iPad mini 4; the auditory system uses an iFLYTEK six-microphone circular array board, which provides sound source localization, echo cancellation and noise filtering for the acquisition of audio signals; the vision system uses a Microsoft Kinect V2 depth camera for acquiring RGB colour images; the service robot host is an Intel NUC mini host with an i7-6770HQ processor and Intel Iris Pro graphics; the mobile chassis is an EAI DashGO B1. The service robot host runs the Ubuntu 16.04 operating system, on which the Kinetic version of ROS (Robot Operating System), the CPU version of the TensorFlow deep learning framework and OpenCV 3.3.0 are installed. A Dell Tower 5810 workstation is used to reduce the computational load of the service robot and is mainly used for model training and data analysis. The workstation runs Ubuntu 16.04 with ROS Kinetic, the GPU version of TensorFlow and OpenCV 3.3.0 installed. Both the service robot and the workstation have wireless communication modules for end-to-end communication.
3.1 Data sets and experimental design
3.1.1 Construction of the data sets
(1) Collecting voice data to train the voiceprint recognition model
Considering that dialogues with more than three people differ from three-person dialogues only in the number of participants and not in the diet information obtained, the speech of 3 testers is taken as the test object. The six-microphone circular array board of the home-service robot is used to collect the voice data of the 3 testers for training the voiceprint recognition model. Ten voice recordings are collected from each tester, each about 10 seconds long.
(2) Collecting diet dialogue test voice data
The test data set consists of voice data of single-person diet expressions and of two-person and three-person diet dialogues, together with diet labels for the corresponding speakers; it is recorded by the 3 testers above. Two different types of test data sets are collected.
1) Diet dialogue test voice data built under laboratory conditions
According to the diet information rules, single-person expression texts and multi-person dialogue texts are composed with simple expression patterns, redundant expression patterns, indirect expressions and context-referring expressions, to test the effect of the established rules. 60, 20 and 20 groups of text data are designed for single-person, two-person and three-person diet expressions with the simple expression pattern, respectively.
According to the collected text dialogues, the test voice data are recorded by the 3 testers registered in the voiceprint recognition model. Denoting the 3 testers as A, B and C, the recording scheme is shown in Table 3, where "A" means tester A records a single-person diet expression, "AB" means testers A and B record a two-person diet dialogue, and so on.
Table 3 Recording scheme of the test voice data
The collected voice data are labelled manually to obtain the text data and diet information labels corresponding to each speaker identity.
2) Diet dialogue test voice data freely expressed by users under simulated real scenes
Without being told the expression patterns or shown the rules, 60 participants provide their own diet expression texts; the collection scheme is shown in Table 4. Three single-person diet expressions are collected from each of 20 of the participants; from the remaining 40 participants, each group of 20 provides one two-person diet dialogue text and one three-person diet dialogue text. According to the collected dialogue texts, the voice data are recorded following the schemes of Table 3.
Table 4 Collection scheme of the diet expression text data
Since data acquired in real time are not convenient for comparison and testing, the data used below are acquired in advance, and during testing the data are fed to the system in a way that simulates the real-time operation of the microphone array.
3.1.2 Experimental design
To test the system performance, the following 2 experiments are designed.
Experiment 1: diet information acquisition test on different types of voice test data. Test data a and b are the voice data recorded from the self-built corpus and from the freely collected corpus, respectively. This experiment tests the diet information acquisition performance of the system for the preset expression patterns and for the expression patterns of real scenes.
Experiment 2: diet information acquisition test performed directly on the text corpora. Test data c and d are the self-built and freely collected text corpora, respectively. In comparison with experiment 1, this experiment tests the performance of the diet text information extraction itself and, by contrast, the voice pre-processing capability of the system.
3.2 Experiments and analysis of results
In a segment of voice, if the recognition result contains a speaker identity and a food item for that speaker, and the labels simultaneously contain the same identity and the same food item for that identity, the food item is judged to be correctly recognized. Let A be the number of labelled food items, B the number of recognized food items and C the number of correctly recognized food items; the performance of the system is evaluated with the F-score.
The algorithms involved are deployed on the service robot platform and tested in the above 2 experiments. The F-scores of the system are shown in Table 5. The following can be observed from these data:
Table 5 F-scores of the system for the different test data
1) From test data a of experiment 1, for the self-built voice data specially designed for this work, the F-score of single-person expression is 0.8960, while the F-scores of two-person and three-person dialogue are 0.7023 and 0.7328 respectively; the F-score of single-person expression is higher than those of two-person and three-person dialogue by 0.1937 and 0.1632, so the effect is much better than for multi-person dialogue, while the difference between two-person and three-person dialogue is only 0.0305, a small gap in recognition effect. Test data b, recorded under real scenes, show a similar result: the result of single-person expression is much better than that of multi-person expression, and the gap between two-person and three-person expression is small. Comparing the average F-scores of test data a and b, the difference is only 0.021, showing that the two kinds of test data differ little in performance. Combining test data a and b, the gap between two-person and three-person expression is 0.0078, while single-person expression is higher than two-person and three-person expression by 0.1693 and 0.1771 respectively. This shows that the system has a strong ability to extract information from single-person diet expression voice; when multiple speakers appear in the voice data the recognition performance drops, and the change in the number of speakers has a larger influence on system performance.
2) From the data of experiment 2, the performance of the system in acquiring diet information directly from text is ordered: single person > two persons > three persons. The average F-scores of test data c and d differ by only 0.0038, showing that the gap between the self-built text diet data and the text diet data collected under real conditions is small. In the combined F-scores, the difference between two persons and three persons is 0.016, and single-person expression is higher than two-person and three-person expression by 0.0554 and 0.0660 respectively, much smaller than the differences in experiment 1. This shows that when the system extracts information directly from diet text, the single-person recognition result is better than the two-person and three-person results, and in multi-person dialogue the growth in the number of speakers does not greatly affect the recognition result.
Comparing the results of experiments 1 and 2, the combined F-scores of single-person, two-person and three-person data in experiment 2 are higher than those of experiment 1 by 0.0410, 0.1549 and 0.1491 respectively. These data show that, because the original audio data undergo both voiceprint recognition and speech recognition, the performance of single-person voice data recognition drops by 0.0410 compared with single-person text data; and because the voice data of two-person and three-person dialogue additionally require voice segmentation and clustering compared with single-person data processing, the performance for two-person and three-person voice data drops sharply, by 0.1549 and 0.1491 compared with text data. There are three reasons for this:
First, in a voice dialogue the lengths of different speakers' utterances vary, so the segmented voice pieces also vary in length and pieces of about 1 ± 0.5 s may appear; the voiceprint recognition system needs sufficient data to identify an identity, and such short speech severely affects its performance. Voiceprint recognition is used here to join the voice pieces produced by segmentation according to the identity of each person, and a wrongly identified piece affects the joining and in turn scrambles the dialogue.
Second, when voice segmentation fails to detect a speaker change point, two or more speakers appear in one voice piece; whichever identity the piece is assigned to, one speaker's utterance becomes redundant and another speaker's utterance is lost.
Third, the incomplete or redundant utterances caused by short speech and the disordered cross-identity dialogue caused by segmentation errors reduce the language model's understanding of sentence meaning within the speech recognition model, lowering the accuracy of speech recognition. The diet information understanding methods for two-person and three-person dialogue reason over the current speaker's utterance and the speaker identities of the context; when the corresponding speaker identity is missing or the utterance is redundant or incomplete, the diet information is recognized incorrectly.
In short, the autonomous acquisition of diet information based on the service robot's hearing can improve the healthy-diet management function of service robots and manage people's health starting from diet. Voiceprint recognition, voice segmentation and speech recognition are fused to process the voice data, extraction rules are designed for diet dialogue texts with different numbers of speakers, and an autonomous voice-based diet information acquisition method based on the service robot's auditory system is formed.
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention in any form. Any simple modification, equivalent variation or alteration made to the above embodiment according to the technical essence of the present invention without departing from the content of the technical solution of the present invention still falls within the scope of the technical solution of the present invention.

Claims (7)

1. a kind of method of service robot Auditory Perception kinsfolk's diet information, comprising the following steps:
(1) speaker's voice data is obtained by service robot, and judges system operating mode;System is set there are two types of Working mould Formula: operating mode one is that user actively expresses diet to service robot to record diet information;Mode two is server Device people independently obtains the diet information of user according to user's every-day language;
(2) when system is in mode for the moment, Application on Voiceprint Recognition and speech recognition technology is utilized to obtain the speaker of current speech data Identity information and corresponding text data;
(3) when system is in mode two, detection window detection is slided on voice data first with bayesian information criterion Speaker's change point carries out voice segmentation to voice data, Application on Voiceprint Recognition is recycled to connect adjacent same identity voice data, Speech recognition is carried out to the result of segmentation, obtains the identity and its expression content of different speakers in more people's dialogic voice data Text data;
(4) it is directed to above-mentioned steps identity information obtained and text data, after being pre-processed, constructs a customized drink Eat dictionary;
(5) according to the two operating modes described above, dividing diet text information extraction into two classes, namely single-person diet replies and multi-person everyday diet dialogues, and establishing diet information extraction rules for each class respectively;
for a single-person diet reply, which involves no reference to contextual information, establishing a diet information extraction rule for single diet expressions; for multi-person diet dialogues, according to language habits, and according to whether a preceding speaker in the dialogue asks about the current speaker's diet and whether the current speaker expresses the diet directly, establishing a diet information extraction rule for directly expressed diet information in multi-person dialogues and a diet information extraction rule for indirectly expressed diet information in multi-person dialogues;
(6) using Chinese word segmentation, part-of-speech tagging and dependency parsing algorithms: after merging the general dictionary with the customized diet dictionary, performing Chinese word segmentation, part-of-speech tagging and dependency parsing on the free text; according to Chinese communication habits, using the part-of-speech information, the part of speech and position of context vocabulary, and the syntactic structure between linguistic units, formulating extraction rules for the diet information to be extracted, extracting the diet information of different persons, and storing the diet information according to identity information.
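To make the dictionary-merging and tagging part of step (6) concrete, here is a minimal sketch assuming the open-source jieba toolkit is used for Chinese word segmentation and part-of-speech tagging; the patent does not name a specific toolkit, and the dictionary file name and example sentence are illustrative assumptions. Dependency parsing would operate on the same tagged output.

```python
# Sketch: merge a custom diet dictionary into the general dictionary and tag free text.
# Assumes the jieba toolkit; "diet_dict.txt" and the example sentence are hypothetical.
import jieba
import jieba.posseg as pseg

try:
    jieba.load_userdict("diet_dict.txt")   # one diet term per line, merged with jieba's default dictionary
except FileNotFoundError:
    pass                                   # hypothetical path; skip if the custom dictionary is absent

def tag(sentence):
    """Return (word, part-of-speech) pairs for one utterance."""
    return [(w.word, w.flag) for w in pseg.cut(sentence)]

if __name__ == "__main__":
    # e.g. "This morning I ate an egg and drank a glass of milk"
    print(tag("我今天早上吃了一个鸡蛋，喝了一杯牛奶"))
```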
2. the method for service robot Auditory Perception kinsfolk's diet information as described in claim 1, it is characterised in that: institute It states in step (2), when system is in mode for the moment, using static 26 Jan Vermeer frequency cepstral coefficient (MFCC) features and 13 dimensions one The feature of rank, second order MFCC Differential Characteristics as voice data ties up MFCC features according to the 52 of speaker's voice with LBG algorithm The identity information of speaker is calculated to the Euclidean distance of code book;Divide speech recognition SDK by voice data using University of Science and Technology's news simultaneously Text data is converted to, the identity information and text data of the single expression are obtained.
3. the method for service robot Auditory Perception kinsfolk's diet information as described in claim 1, it is characterised in that: institute It states in step (3), is based on bayesian information criterion speaker change point detection algorithm, carry out voice segmentation as follows:
1) detection window windows=[ws, we] is initialized;
2) judge whether there is speaker's change point wi in window windows using BIC detection algorithm;
3) if there is change point wi, then window initial position is moved at change point, windows=[ws+wi, we+wi], Keep window size constant;If there is no change point, then window length is increased into w, only change window terminal position, windows= [ws, we+w];
4) it repeats 2), 3), until window traverses entire voice signal sample.
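Claim 3 specifies the window-update logic but not the ΔBIC test itself; the sketch below fills that gap with a common single-Gaussian formulation. The penalty weight lam, the margin parameter and the helper names are assumptions for illustration, not part of the claim.

```python
# Sketch: BIC-based speaker change-point test inside one detection window.
# Frames are feature vectors (e.g. MFCCs); a change point is reported where splitting
# the window into two Gaussians lowers the Bayesian information criterion.
import numpy as np

def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for splitting `frames` at index i (positive => likely change point)."""
    n, d = frames.shape
    def logdet(x):
        return np.linalg.slogdet(np.cov(x, rowvar=False) + 1e-6 * np.eye(d))[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(frames) - i * logdet(frames[:i]) - (n - i) * logdet(frames[i:])) - penalty

def bic_change_point(frames, margin=10):
    """Return the best in-window change point index, or None if no split is supported."""
    if len(frames) <= 2 * margin:
        return None
    scores = [(delta_bic(frames, i), i) for i in range(margin, len(frames) - margin)]
    best, idx = max(scores)
    return idx if best > 0 else None
```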
4. the method for service robot Auditory Perception kinsfolk's diet information as described in claim 1, it is characterised in that: institute It states in step (4), constructs customized diet dictionary: referring initially to " National Standard of the People's Republic of China GB 2760- 2014 " the food-classifying system of annex E in collects food common in daily life under different classifications according to the classification standard Title constructs vocabulary;Secondly by web crawlers, diet related web site food title is collected, expands diet vocabulary.
5. the method for service robot Auditory Perception kinsfolk's diet information according to any one of claims 1 to 4, It is characterized in that: in the step (5), the diet information decimation rule (rule 1) of single diet expression: determining diet movement first Whether really occur, then the corresponding diet noun of the diet movement to generation extracts, and obtains the diet information of user.
6. the method for service robot Auditory Perception kinsfolk's diet information as claimed in claim 5, it is characterised in that: institute It states in step (5), the diet information decimation rule (rule 2) that more people's dialogues are directly expressed: if in the answer of current speaker First linguistic unit is noun, then directly noun is extracted backward since the noun, until next verb terminates, from end Place starts that rule 1 is recycled to be extracted;Otherwise it is directly extracted using rule 1.
7. the method for service robot Auditory Perception kinsfolk's diet information as claimed in claim 6, it is characterised in that: institute It states in step (5), more people talk with the diet information decimation rule (rule 3) of indirect expression: the meaning of indirect expression is currently to say Words people does not give expression to diet information directly, according to the content of other speakers expression above, carries out all or part of reference and comes The diet content of speaker is expressed, rule 1 is recycled further to be extracted.
CN201811217808.XA 2018-10-18 2018-10-18 Method for service robot to sense family member diet information in auditory sense mode Active CN109325236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811217808.XA CN109325236B (en) 2018-10-18 2018-10-18 Method for service robot to sense family member diet information in auditory sense mode

Publications (2)

Publication Number Publication Date
CN109325236A true CN109325236A (en) 2019-02-12
CN109325236B CN109325236B (en) 2022-11-29

Family

ID=65261994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811217808.XA Active CN109325236B (en) 2018-10-18 2018-10-18 Method for service robot to sense family member diet information in auditory sense mode

Country Status (1)

Country Link
CN (1) CN109325236B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761695A (en) * 2014-01-24 2014-04-30 成都万先自动化科技有限责任公司 Robot capable of providing diet health consultation service
CN103811009A (en) * 2014-03-13 2014-05-21 华东理工大学 Smart phone customer service system based on speech analysis
CN104571114A (en) * 2015-01-28 2015-04-29 深圳市赛梅斯凯科技有限公司 Intelligent home robot
CN107803838A (en) * 2016-09-09 2018-03-16 九阳股份有限公司 A kind of dietary management method and service robot based on service robot
CN106845081A (en) * 2016-12-26 2017-06-13 深圳前海勇艺达机器人有限公司 The robot device that health diet is instructed is provided
CN107292345A (en) * 2017-07-03 2017-10-24 贵州大学 Privacy situation detection method
CN108665980A (en) * 2018-04-12 2018-10-16 苏州科技城医院 Doctors and patients' interactive system based on APP platforms

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FRANCISCO M. COUTO et al.: "A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations", https://doi.org/10.1371/journal.pone.0179488 *
NILOOFAR HEZARJARIBI et al.: "Speech2Health: A Mobile Framework for Monitoring Dietary Composition From Spoken Data", IEEE Journal of Biomedical and Health Informatics *
ROBERTA E. GOLDMAN et al.: "Developing an automated speech-recognition telephone diabetes intervention", International Journal for Quality in Health Care *
ZHANG Xiang et al.: "Research on the development status of diet nursing robots at home and abroad", Chinese Journal of Rehabilitation Medicine *
SU Zhidong: "Research on an autonomous dietary-composition perception system based on service robot hearing", China Master's Theses Full-text Database, Engineering Science and Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597580A (en) * 2020-05-13 2020-08-28 贵州大学 Robot hearing privacy information monitoring processing method
CN111597580B (en) * 2020-05-13 2023-04-14 贵州大学 Robot hearing privacy information monitoring processing method
CN111883175A (en) * 2020-06-09 2020-11-03 河北悦舒诚信息科技有限公司 Voiceprint library-based oil station service quality improving method
CN112466287A (en) * 2020-11-25 2021-03-09 出门问问(苏州)信息科技有限公司 Voice segmentation method and device and computer readable storage medium
CN112466287B (en) * 2020-11-25 2023-06-27 出门问问(苏州)信息科技有限公司 Voice segmentation method, device and computer readable storage medium
CN115659162A (en) * 2022-09-15 2023-01-31 云南财经大学 Method, system and equipment for extracting features in radar radiation source signal pulse
CN115659162B (en) * 2022-09-15 2023-10-03 云南财经大学 Method, system and equipment for extracting intra-pulse characteristics of radar radiation source signals

Also Published As

Publication number Publication date
CN109325236B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
Lake et al. Word meaning in minds and machines.
Cahn CHATBOT: Architecture, design, & development
Harwath et al. Learning word-like units from joint audio-visual analysis
Chen et al. Codah: An adversarially authored question-answer dataset for common sense
CN109325236A (en) The method of service robot Auditory Perception kinsfolk&#39;s diet information
Ramakrishnan Recognition of emotion from speech: A review
Pfister et al. Real-time recognition of affective states from nonverbal features of speech and its application for public speaking skill analysis
CN108899050A (en) Speech signal analysis subsystem based on multi-modal Emotion identification system
CN108805087A (en) Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN108877801A (en) More wheel dialog semantics based on multi-modal Emotion identification system understand subsystem
CN108805088A (en) Physiological signal analyzing subsystem based on multi-modal Emotion identification system
CN108805089A (en) Based on multi-modal Emotion identification method
Ren Affective information processing and recognizing human emotion
CN110033029A (en) A kind of emotion identification method and device based on multi-modal emotion model
CN104036776A (en) Speech emotion identification method applied to mobile terminal
CN109635207A (en) A kind of social network user personality prediction technique based on Chinese text analysis
CN106599110A (en) Artificial intelligence-based voice search method and device
KR20190083143A (en) Sensory evaluation method and apparatus
Khan Improved multi-lingual sentiment analysis and recognition using deep learning
Novielli et al. The role of affect analysis in dialogue act identification
KR20130083092A (en) Summary information generating system and method for review of product and service
Tseng et al. Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings.
Yordanova et al. Automatic detection of everyday social behaviours and environments from verbatim transcripts of daily conversations
Roy et al. A trainable spoken language understanding system for visual object selection.
CN109727091B (en) Product recommendation method, device, medium and server based on conversation robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant