CN107146619A

CN107146619A - An intelligent voice interaction robot

Info

Publication number: CN107146619A
Application number: CN201710579708.0A
Authority: CN
Inventors: 臧红彬; 周颖玥
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2017-07-17
Filing date: 2017-07-17
Publication date: 2017-09-08
Anticipated expiration: 2037-07-17
Also published as: CN107146619B

Abstract

The invention discloses an intelligent voice interaction robot. It aims to solve the problem that existing intelligent voice interaction robots can only be controlled in a question-and-answer manner, so that the friendliness and safety of human-machine interaction cannot be guaranteed. Through improvements to its structure, the robot of the invention can effectively perform two-way speech recognition, overcoming a defect of the prior art. In addition, the improved internal structure effectively resolves the voice interaction problems caused by background noise from equipment such as stepper motors while the robot is moving. The invention enables two-way interactive communication between a person and the robot and effectively improves the friendliness of human-machine interaction. In actual tests, the recognition accuracy of the invention exceeded 95%, effectively achieving simultaneous two-way voice input and output between speaker and robot, so that the friendliness and interactivity between speaker and robot are greatly enhanced.

Description

An intelligent voice interaction robot

Technical field

The present invention relates to the field of robots, in particular to voice interaction robots, and specifically to an intelligent voice interaction robot. By improving the structure of the robot, the invention provides a new intelligent voice interaction robot that adopts a panda-like appearance and, through improvements to its internal structure, effectively solves the problem that existing voice interaction robots can only input or output speech separately. It is of great significance for promoting the development of voice interaction robots and advancing robot voice interaction technology.

Background art

Speech, as an ability unique to humans, is an important tool and channel for communication between people and for acquiring external information, and has been of great significance to the development of human civilization. Speech recognition technology is an important branch of human-machine interaction and an important interface for it, and has practical significance for the development of artificial intelligence. After decades of development, speech recognition technology has made significant progress and is gradually moving from the laboratory to the market. At present, speaker-dependent speech recognition systems already achieve high recognition accuracy and are widely used in fields such as industry, household appliances, communications, automotive electronics, medical care, home services and consumer electronics.

In recent years, as speech recognition technology has been applied to robot control, the application fields of robots have kept expanding. Meanwhile, research at home and abroad on robot control technology based on speech recognition has also made some progress. For example, in China, Bai Lin, in research on robot control technology based on speech recognition, improved the speech feature parameter extraction method by combining traditional MFCC feature parameters with formant parameters, proposing a new speech feature parameter extraction method.

At present, existing voice interaction products are mostly based on dedicated speech recognition chips whose core is a single-chip microcomputer or a digital signal processor. In essence, the voice signal input through a microphone is sampled and encoded, then matched by the internal processor against voice information recorded in advance, and the corresponding voice information is output through an external loudspeaker by an on-chip module. For example, Chinese patent CN201620720668.8 discloses a robot system with a voice interaction function. It includes a robot consisting of a robot head, a robot body and a base. A PCB board is provided in the robot body and connected to a single-chip microcomputer, which is connected to a signal transmitting circuit. The robot head is provided with an image acquisition sensor and a voice receiver; the signal transmitting circuit is connected to the voice receiver and the image acquisition sensor, and is also connected to a mobile terminal. The single-chip microcomputer is further connected to a signal receiving circuit and a voice player; the signal receiving circuit is connected to the mobile terminal and to the voice player respectively. The signal transmitting circuit and the signal receiving circuit are each connected to a filter. The robot body includes robot arms, a display device and input buttons, the input buttons being connected to the display device. This system can realize a voice interaction function.

However, the applicant has found that existing speech recognition robots have good one-way recognition capability but weak two-way speech recognition capability, mainly because of the following two problems:

1) While the robot is moving, background noise from equipment such as stepper motors interferes with it and can produce unpredictable results for the voice interaction robot;

2) While the robot is speaking or playing music, even if the user issues an instruction, the robot has difficulty recognizing it, and its two-way speech recognition capability is almost entirely lost. This is also the main reason why existing robots are mainly controlled in a question-and-answer manner.

Because of the above defects in existing voice interaction robots, the friendliness and safety of human-machine interaction cannot be guaranteed, which runs counter to the Three Laws of Robotics. A new device is therefore urgently needed to solve the above problems.

Summary of the invention

The object of the invention is to provide an intelligent voice interaction robot that addresses the problem that existing intelligent voice interaction robots can only be controlled in a question-and-answer manner, so that the friendliness and safety of human-machine interaction cannot be guaranteed. Through improvements to its structure, the robot of the invention can effectively perform two-way speech recognition, overcoming a defect of the prior art. In addition, the improved internal structure effectively resolves the voice interaction problems caused by background noise from equipment such as stepper motors while the robot is moving. The invention enables two-way interactive communication between a person and the robot and effectively improves the friendliness of human-machine interaction.

To achieve the above object, the invention adopts the following technical solution:

An intelligent voice interaction robot includes a bottom support frame, a drive mechanism, a first cavity, a second cavity and a control system. The drive mechanism is arranged on the bottom support frame and can drive the robot to move through the bottom support frame. The first cavity and the second cavity are connected to form a robot body, and the robot body is arranged on the bottom support frame;

Two third cavities are symmetrically arranged on the second cavity, and the first cavity, the second cavity and the third cavities are each hollow structures;

A first support frame is provided in the interior of the first cavity and is connected to the bottom support frame. A first voice playing device and a first opening are respectively provided on the wall of the first cavity, and a first sound insulation panel is provided below the first cavity. An upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence, from bottom to top, in the interior of the first cavity, and the first support frame supports the upper sound insulation drawer and the lower sound insulation drawer respectively. The first sound insulation panel is located between the bottom support frame and the lower sound insulation drawer;

A second sound insulation panel is provided between the first cavity and the second cavity. Third voice playing devices, speaker holes matched with the third voice playing devices, and a speech recognition device are respectively provided on the third cavities. The third cavities are spherical; there are two third voice playing devices, one on each third cavity; there are several speaker holes, arranged in a fan-shaped annular band; and the speech recognition device is located between the third voice playing devices;

The control system is connected to the first voice playing device, the third voice playing devices and the speech recognition device respectively.

Several heat dissipation holes are provided below the robot body.

The heat dissipation holes are arranged in a rectangle below the body.

A groove is further provided on the first cavity, and one or more of a signal receiver connected to the control system and a handrail are provided in the groove.

The signal receiver is arranged on the first support frame.

The robot further includes a display connected to the control system. The display is arranged on the side wall of the second cavity, between the two third cavities, and the speech recognition device is arranged below the display.

The angle between the display and the horizontal plane is 15° to 90°.

A third sound insulation panel is provided between the upper sound insulation drawer and the lower sound insulation drawer.

The speech recognition device is located on the center line between the third voice playing devices.

The robot further includes a camera following mechanism and an obstacle avoidance mechanism, each arranged on the robot body and each connected to the control system. The control system can receive and process the image signal transmitted by the camera following mechanism and the position signal detected by the obstacle avoidance mechanism, and then control the action of the drive mechanism.

The robot further includes a navigation mechanism connected to the control system.

A method for the interaction system of the aforementioned intelligent voice interaction robot includes the following steps:

(One) Determine the voice input type

1) Determine the voice input type: if it is the input-output two-way recognition system, perform step (Two); if it is the input one-way recognition system, perform step (Three);

(Two) Predefine the input-output two-way recognition system;

2) Predefine a voice output table, and according to this table collect samples from the voice playing device to form an output sample set and an output test set;

3) Predefine a voice vocabulary table, and according to this table collect voice sample data to form an input sample set and an input test set;

4) Fully permute the N speech samples in the output sample set and the M speech samples in the input sample set respectively, obtaining N!M! arrangements; input each arrangement into the training system to obtain one trained speech vector center; finally compute the mean vector and variance parameter of the N!M! speech vector centers to obtain the final voice training template; where N and M are integers greater than 1;

5) Use the speech samples in the output test set and the input test set simultaneously as speech under test, and obtain the robustness of the different speech samples, including the correct recognition rate of each speech sample and the average correct recognition rate of the speech samples;

6) Sort the speech samples by correct recognition rate, and select the speech samples whose word correct recognition rate is greater than the average correct recognition rate to form a two-way candidate vocabulary;

7) For the two-way candidate vocabulary, train the voice templates again using step 4), obtaining the mean vector μ1 and mean square deviation σ1 of each voice template;

8) When speech under test is input, calculate the matching distance between it and each voice template, and select the voice template with the minimum matching distance as the recognition result;

9) Output the recognition result of the speech under test;

(Three) Predefine the input one-way recognition system;

10) Fully permute the M speech samples in the input sample set of step 3), obtaining M! arrangements; input each arrangement into the training system to obtain one trained speech vector center; finally compute the mean vector and variance parameter of the M! speech vector centers to obtain the final voice training template; where M is an integer greater than 1;

11) Use the speech samples in the input test set as speech under test, and obtain the robustness of the corresponding speech samples, including the correct recognition rate of each speech sample and the average correct recognition rate of the speech samples;

12) Sort the speech samples by correct recognition rate, and select the speech samples whose word correct recognition rate is greater than the average correct recognition rate to form a one-way candidate vocabulary;

13) For the one-way candidate vocabulary, train the voice templates again using step 10), obtaining the mean vector μ2 and mean square deviation σ2 of each voice template;

14) When speech under test is input, calculate the matching distance between it and each voice template, and select the voice template with the minimum matching distance as the recognition result;

15) Output the recognition result of the speech under test.
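The template training and minimum-distance matching in the steps above can be illustrated with a minimal Python sketch. The specification does not say how the training system produces a speech vector center or how the matching distance is defined, so everything concrete here is an assumption made for illustration: samples are plain feature vectors, `train_center` is a hypothetical order-dependent running update standing in for the training system, and `matching_distance` is a variance-scaled Euclidean distance. All function names are hypothetical.

```python
import itertools
import math

def train_center(ordered_samples, rate=0.5):
    """Hypothetical stand-in for the training system: an order-dependent
    running update over one arrangement of feature vectors."""
    center = list(ordered_samples[0])
    for sample in ordered_samples[1:]:
        center = [c + rate * (x - c) for c, x in zip(center, sample)]
    return center

def arrangements(input_samples, output_samples=None):
    """M! arrangements for one-way training, N!*M! for two-way training."""
    in_perms = list(itertools.permutations(input_samples))
    if output_samples is None:
        return [list(p) for p in in_perms]
    out_perms = itertools.permutations(output_samples)
    return [list(o) + list(i) for o in out_perms for i in in_perms]

def build_template(input_samples, output_samples=None):
    """Mean vector and per-dimension variance of all trained centers."""
    centers = [train_center(a) for a in arrangements(input_samples, output_samples)]
    dims = list(zip(*centers))
    mean = [sum(d) / len(d) for d in dims]
    var = [sum((v - m) ** 2 for v in d) / len(d) for d, m in zip(dims, mean)]
    return mean, var

def matching_distance(feature, template, floor=1e-6):
    """Assumed distance: Euclidean distance scaled by the template variance."""
    mean, var = template
    return math.sqrt(sum((x - m) ** 2 / (v + floor)
                         for x, m, v in zip(feature, mean, var)))

def recognize(feature, templates):
    """Return the word whose template has the minimum matching distance."""
    return min(templates, key=lambda word: matching_distance(feature, templates[word]))

# Usage with made-up two-dimensional features for two vocabulary words.
templates = {
    "forward": build_template([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]),
    "stop": build_template([[5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]),
}
print(recognize([1.1, 0.9], templates))
```

Because the number of arrangements grows factorially, a sketch like this is only practical for small N and M; the specification does not state how many samples are used.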

In existing designs, control is mainly by question and answer, chiefly because the robot's own output greatly affects speech recognition. At present, this problem is generally addressed by improving the chip. In the present invention, the overall structure of the robot is improved instead, effectively reducing the interference of voice output with voice input and thereby achieving two-way interaction between voice input and output.

The structure includes a bottom support frame, a drive mechanism, a first cavity, a second cavity and a control system. The bottom support frame supports the other components; the drive mechanism is connected to the bottom support frame and drives it, and the components on it, to move. The drive mechanism includes a set of driving wheels and a driven wheel, the driving wheels each being connected to the control system. In the invention, the driven wheel may be a universal wheel, and there are two driving wheels, each rotated by its own motor. Further, the driving wheels may be Mecanum wheels, and the driving wheels and the driven wheel are arranged in an isosceles triangle.
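For the variant with two independently motorized driving wheels and one universal wheel, the standard differential-drive relations show how a control system could turn a desired body motion into wheel speeds. This is a minimal sketch using textbook kinematics, not a drive law taken from the specification; the track width and all function names are assumptions, and the Mecanum-wheel variant is not covered.

```python
import math

def wheel_speeds(linear_mps, angular_radps, track_width_m):
    """Left and right wheel speeds for a desired body velocity.
    Positive angular velocity turns the robot counter-clockwise."""
    half = angular_radps * track_width_m / 2.0
    return linear_mps - half, linear_mps + half

def body_velocity(left_mps, right_mps, track_width_m):
    """Inverse relation: body linear and angular velocity from wheel speeds."""
    linear = (left_mps + right_mps) / 2.0
    angular = (right_mps - left_mps) / track_width_m
    return linear, angular

# Usage: turning on the spot with an assumed 0.3 m track width.
left, right = wheel_speeds(0.0, 1.0, 0.3)
print(left, right)
```

Equal wheel speeds give straight-line motion, and opposite speeds rotate the robot in place, with the universal wheel following passively.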

The first cavity and the second cavity are connected to form the robot body and are arranged in sequence from bottom to top. Two third cavities are symmetrically arranged on the second cavity; the first, second and third cavities are each hollow structures, and the third cavities are spherical. The first cavity is larger than the second cavity, and the second cavity is larger than the third cavities. This structure forms a panda-shaped intelligent robot that is large at the bottom and small at the top, with two ears on top.

A first voice playing device and a first opening are respectively provided on the wall of the first cavity, and a first support frame is provided in the interior of the first cavity to support the other components inside it. The first voice playing device provides the robot's voice output.

In the invention, an upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence, from bottom to top, in the interior of the first cavity, and provide sound insulation and vibration damping. A first sound insulation panel is provided below the first cavity, between the bottom support frame and the lower sound insulation drawer, and a second sound insulation panel is provided between the first cavity and the second cavity. Third voice playing devices, speaker holes matched with the third voice playing devices, and a speech recognition device are respectively provided on the third cavities. There are two third voice playing devices, one on each third cavity; the speaker holes are distributed in a fan-shaped annular band; and the speech recognition device is located between the two third voice playing devices, preferably on the center line.

After analysis, the applicant believes that the reason existing reception-type or small household humanoid or animal-shaped intelligent robots cannot achieve two-way voice input and output lies in the structure of the robot itself. Such robots use a single-cavity structure, which forms one large sound chamber inside the robot, and this sound chamber seriously degrades speech recognition. The invention therefore makes the following structural improvements:

1) The single-cavity structure of the prior art is replaced by two separate cavities, the first cavity and the second cavity;

2) A second sound insulation panel is provided between the first cavity and the second cavity, blocking the influence of the sound chamber of the first cavity on the recognition device;

3) An upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence, from bottom to top, in the interior of the first cavity. On the one hand, the drawers provide storage for the user's articles; on the other hand, they break up the original sound chamber of the first cavity and reduce as far as possible the influence of the first voice playing device on the speech recognition device;

4) The third voice playing devices are arranged symmetrically on the third cavities, and the speaker holes form a fan-shaped annular band. In this way, the third voice playing devices produce a symmetrical voice output, which greatly reduces their influence on the speech recognition device.

Based on these structural improvements, the invention can realize two-way interaction between the robot's voice output and the user's voice input, greatly improves the efficiency of two-way speech recognition for the user, and effectively solves the problems and defects of the prior art.

Further, several heat dissipation holes are provided below the robot body. The heat dissipation holes effectively dissipate the heat inside the robot and ensure its normal operation.

Further, a groove is provided on the first cavity, and one or more of a signal receiver connected to the control system and a handrail are provided in the groove. In this way, the user can operate the robot by the handrail; besides using voice directly, the user can also send corresponding control instructions to the robot through the signal receiver.

Further, the robot also includes a display connected to the control system. The display is arranged on the side wall of the second cavity, between the two third voice playing devices, and the speech recognition device is arranged below the display. In this way, the display forms a cute panda-like face and the third cavities form the panda's ears, which gives the robot a friendlier appearance and makes interaction more approachable for the user.

Further, the angle between the display and the horizontal plane is 15° to 90°. This makes the display easy for the user to view.

Further, to improve sound insulation, a third sound insulation panel is provided between the upper sound insulation drawer and the lower sound insulation drawer, further reducing the influence of the first voice playing device on the speech recognition device.

Further, the robot also includes a camera following mechanism and an obstacle avoidance mechanism, each arranged on the robot body and each connected to the control system. The control system can receive and process the image signal transmitted by the camera following mechanism and the position signal detected by the obstacle avoidance mechanism, and then control the action of the drive mechanism. In this way, the robot identifies the user's movement trajectory through the camera following mechanism and passes the image information to the control system; meanwhile, the obstacle avoidance mechanism passes the detected position signal to the control system. The control system receives and processes both signals and controls the action of the drive mechanism, so that the robot intelligently follows the user.
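The following behaviour described above can be pictured as a small control rule that combines the two signals. The specification does not define the signal formats or the control law, so this sketch rests entirely on assumptions: the camera following mechanism is taken to report the user's horizontal offset in the image (from -1 at the far left to 1 at the far right, or `None` when the user is not visible), the obstacle avoidance mechanism is taken to report a distance in meters, and the gains and thresholds are illustrative values.

```python
def follow_command(target_offset, obstacle_distance_m,
                   cruise_speed=0.4, turn_gain=1.0, stop_distance_m=0.5):
    """Return (linear, angular) velocity for following a user.

    target_offset: assumed camera signal, -1 (far left) to 1 (far right),
                   or None when the user is not in view.
    obstacle_distance_m: assumed obstacle avoidance signal in meters.
    """
    if target_offset is None:
        # User not visible: stay still rather than move blindly.
        return 0.0, 0.0
    # Steer toward the user; a positive angular value means turning left.
    angular = -turn_gain * target_offset
    # Stop forward motion when something is closer than the stop distance.
    linear = 0.0 if obstacle_distance_m <= stop_distance_m else cruise_speed
    return linear, angular

# Usage: user slightly to the right, path clear.
print(follow_command(0.5, 2.0))
```

A real controller would also need to regulate the distance to the user and smooth the commands; those details are left out because the source does not describe them.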

Further, the robot also includes a navigation mechanism connected to the control system. Through the navigation mechanism, the invention can provide navigation guidance for the user; based on the navigation mechanism, the robot can also move automatically to a set position.

The invention also provides a method for implementing an interaction system based on the aforementioned intelligent voice interaction robot. In this method, the voice scene is determined first, and the corresponding recognition operation is performed according to the result. The method uses a single speech recognition device to recognize speech, without needing multiple speech recognition devices for voice noise reduction. Moreover, the method does not depend on a specific speaker: by defining the voice output table and the voice vocabulary table, and through subsequent processing such as the voice training templates, the invention is improved in noise resistance and speaker independence, and the individual characteristics of different speakers are weakened. In addition, the method can be used for online recognition over a network and also for offline recognition without a network, with a high recognition rate and good recognition results.
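The candidate vocabulary selection used in the method (keeping only the words whose correct recognition rate is above the average) can be sketched in a few lines of Python. The input format is an assumption: each vocabulary word maps to a list of booleans recording whether each test utterance was recognized correctly. The function name and the example words are hypothetical.

```python
def select_candidates(test_results):
    """Rank words by correct recognition rate and keep those above average.

    test_results: dict mapping each word to a list of booleans, one per
                  test utterance (True means recognized correctly).
    Returns (candidate_words, average_rate).
    """
    rates = {word: sum(hits) / len(hits) for word, hits in test_results.items()}
    average = sum(rates.values()) / len(rates)
    ranked = sorted(rates, key=rates.get, reverse=True)
    return [word for word in ranked if rates[word] > average], average

# Usage with made-up test outcomes for four command words.
results = {
    "forward": [True, True, True, True],
    "stop": [True, True, True, False],
    "left": [True, False, False, False],
    "right": [True, True, False, False],
}
candidates, average = select_candidates(results)
print(candidates, average)
```

Words that fall at or below the average rate are dropped, so the retrained templates cover only the more robust part of the vocabulary.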

With this method, recognition results can be effectively corrected, both one-way voice input and two-way voice input and output are realized, and good recognition results are obtained in both cases. In actual tests, the recognition accuracy of the invention exceeded 95%, effectively achieving simultaneous two-way voice input and output between speaker and robot, so that the friendliness and interactivity between speaker and robot are greatly enhanced.

Brief description of the drawings

Examples of the present invention will be described with reference to the accompanying drawings, in which:

Fig. 1 is a side view of the device in Embodiment 1.

Fig. 2 is a rear view of the device in Embodiment 1.

Reference numerals in the figures: 1 is the drive mechanism, 2 is the first cavity, 3 is the second cavity, 4 is the third cavity, 6 is the upper sound insulation drawer, 7 is the lower sound insulation drawer, 8 is the speaker hole, 9 is the signal receiver, 10 is the display, 11 is the heat dissipation hole.

Detailed description of the embodiments

All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.

Any feature disclosed in this specification may, unless specifically stated otherwise, be replaced by other equivalent alternative features or by alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.

Embodiment 1

The intelligent voice interaction robot of this embodiment includes a bottom support frame, a drive mechanism, a first cavity, a second cavity and a control system. The drive mechanism is arranged on the bottom support frame; the first cavity and the second cavity are connected to form a robot body, and the robot body is arranged on the bottom support frame. Two third cavities are symmetrically arranged on the second cavity, and the first cavity, the second cavity and the third cavities are each hollow structures.

In this embodiment, the drive mechanism includes one driven wheel, two driving wheels, and two drive motors connected to the driving wheels. With this structure, the drive mechanism can move the robot relative to the ground.

A first support frame is provided in the interior of the first cavity and is connected to the bottom support frame. A first voice playing device and a first opening are respectively provided on the wall of the first cavity. A first sound insulation panel is provided below the first cavity, between the bottom support frame and the lower sound insulation drawer. An upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence, from bottom to top, in the interior of the first cavity, and the first support frame supports the upper sound insulation drawer and the lower sound insulation drawer respectively.

In this embodiment, a second sound insulation panel is provided between the first cavity and the second cavity. Third voice playing devices, speaker holes matched with the third voice playing devices, and a speech recognition device are respectively provided on the third cavities. The third cavities are spherical; there are two third voice playing devices, one on each third cavity; there are several speaker holes, arranged in a fan-shaped annular band (as shown in the figures); and the speech recognition device is located between the third voice playing devices.

In this embodiment, the drive motors, the first voice playing device, the third voice playing devices and the speech recognition device are each connected to the control system.

In this embodiment, several heat dissipation holes are also provided below the first cavity, arranged in a rectangle. A groove is provided on the first cavity, and a signal receiver connected to the control system is provided in the groove. A display connected to the control system is provided on the side wall of the second cavity, between the two third cavities, and the speech recognition device is located below the display. In this embodiment, the angle between the display and the horizontal plane is 45°, and the speech recognition device is located on the center line between the two third voice playing devices.

This embodiment also includes a camera following mechanism, an obstacle avoidance mechanism and a navigation mechanism. The camera following mechanism and the obstacle avoidance mechanism are each arranged on the robot body. The camera following mechanism, the obstacle avoidance mechanism and the navigation mechanism are each connected to the control system. The control system can receive and process the image signal transmitted by the camera following mechanism and the position signal detected by the obstacle avoidance mechanism, and then control the action of the drive mechanism.

In this way, the robot of this embodiment identifies the user's movement trajectory through the camera following mechanism and passes the image information to the control system; meanwhile, the obstacle avoidance mechanism passes the detected position signal to the control system. The control system receives and processes both signals and controls the action of the drive mechanism, so that the robot intelligently follows the user. Based on the navigation mechanism, the control system can also make the robot of this embodiment move automatically to a set position.

In this embodiment, speech recognition is performed using the iFlytek speech recognition interface. The accuracy of two-way voice input and output recognition exceeds 88%, and the accuracy of one-way voice input recognition is about 95%, which is a good result.

Embodiment 2

Based on the device of Embodiment 1, this embodiment provides a method for implementing different voice interaction systems, which includes the following steps:

(One) Determine the voice input type

(Two) Predefine the input-output two-way recognition system;

9) Output the recognition result of the speech under test;

(Three) Predefine the input one-way recognition system;

15) Output the recognition result of the speech under test.

In this embodiment, the accuracy of two-way voice input and output recognition exceeds 95%, and the accuracy of one-way voice input recognition is about 97%, which is a good result.

The invention is not limited to the foregoing embodiments. The invention extends to any new feature or any new combination disclosed in this specification, and to the steps of any new method or process, or any new combination thereof, disclosed herein.

Claims

1. An intelligent voice interaction robot, comprising a bottom support frame, a drive mechanism, a first cavity, a second cavity and a control system, wherein the drive mechanism is arranged on the bottom support frame and can drive the robot to move through the bottom support frame, the first cavity and the second cavity are connected to form a robot body, and the robot body is arranged on the bottom support frame;

characterized in that two third cavities are symmetrically arranged on the second cavity, and the first cavity, the second cavity and the third cavities are each hollow structures;

2. The intelligent voice interaction robot according to claim 1, characterized in that a groove is further provided on the first cavity, and one or more of a signal receiver connected to the control system and a handrail are provided in the groove.

3. The intelligent voice interaction robot according to claim 2, characterized in that the signal receiver is arranged on the first support frame.

4. The intelligent voice interaction robot according to claim 1, characterized in that it further comprises a display connected to the control system, the display being arranged on the side wall of the second cavity and located between the two third cavities, and the speech recognition device being arranged below the display.

5. The intelligent voice interaction robot according to claim 4, characterized in that the angle between the display and the horizontal plane is 15° to 90°.

6. The intelligent voice interaction robot according to any one of claims 1 to 5, characterized in that a third sound insulation panel is provided between the upper sound insulation drawer and the lower sound insulation drawer.

7. The intelligent voice interaction robot according to any one of claims 1 to 6, characterized in that the speech recognition device is located on the center line between the third voice playing devices.

8. The intelligent voice interaction robot according to any one of claims 1 to 7, characterized in that it further comprises a camera following mechanism and an obstacle avoidance mechanism, each arranged on the robot body and each connected to the control system, wherein the control system can receive and process the image signal transmitted by the camera following mechanism and the position signal detected by the obstacle avoidance mechanism, and then control the action of the drive mechanism.

9. The intelligent voice interaction robot according to any one of claims 1 to 8, characterized in that it further comprises a navigation mechanism connected to the control system.

10. A method for the interaction system of the intelligent voice interaction robot according to any one of claims 1 to 9, characterized by comprising the following steps:

(One) Determine the voice input type

(Two) Predefine the input-output two-way recognition system;

9) Output the recognition result of the speech under test;

(Three) Predefine the input one-way recognition system;

15) Output the recognition result of the speech under test.