CN107146619B - Intelligent voice interaction robot - Google Patents

Intelligent voice interaction robot

Info

Publication number
CN107146619B
Authority
CN
China
Prior art keywords
voice
cavity
robot
input
recognition
Prior art date
Legal status
Active
Application number
CN201710579708.0A
Other languages
Chinese (zh)
Other versions
CN107146619A (en)
Inventor
臧红彬 (Zang Hongbin)
周颖玥 (Zhou Yingyue)
Current Assignee
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN201710579708.0A
Publication of CN107146619A
Application granted
Publication of CN107146619B

Classifications

    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue (under G10L15/00 Speech recognition)
    • G10L15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L2015/221: Announcement of recognition results
    • G10L2015/225: Feedback of the input speech
    • B25J11/0005: Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Manipulator (AREA)
  • Toys (AREA)

Abstract

The invention discloses an intelligent voice interaction robot, aiming to solve two problems of existing voice interaction robots: they can only be controlled in a question-and-answer mode, and the friendliness and safety of human-computer interaction cannot be guaranteed. By improving its structure, the robot of the invention can effectively perform bidirectional voice recognition, overcoming the defects of the prior art. The improvement of the robot's internal structure also effectively solves the voice interaction problems caused by the substrate noise of equipment such as stepping motors while the robot is moving. The invention realizes two-way interactive communication between people and the robot and markedly improves the friendliness of human-computer interaction. Practical tests show that the recognition accuracy of the invention can exceed 95% and that bidirectional voice input and output between the speaker and the robot is effectively realized, greatly enhancing the friendliness and interactivity between them.

Description

Intelligent voice interaction robot
Technical Field
The invention relates to the field of robots, in particular to voice interaction robots, and specifically to an intelligent voice interaction robot. By improving the structure of the robot, the invention provides a brand-new intelligent voice interaction robot that adopts a panda-like appearance with an improved internal structure, effectively solving the problem that existing voice interaction robots can only input or output voice independently. This is significant for promoting the development of voice interaction robots and advancing robot voice interaction technology.
Background
Speech is a uniquely human ability and an important tool and channel for communication between people and for acquiring external information; it has been significant for the development of human civilization. Speech recognition is an important branch of human-computer interaction and a key human-machine interface, and it is of practical significance for the development of artificial intelligence. Speech recognition technology has advanced substantially over the past decades and has gradually moved from the laboratory to the market. At present, speaker-dependent speech recognition systems achieve high recognition accuracy and are widely applied in industry, household appliances, communications, automotive electronics, medical care, home services, consumer electronics and other fields.
In recent years, as voice recognition has been applied to robot control, the application field of robots has continued to expand, and research on voice-recognition-based robot control at home and abroad has also made some progress. For example, Bailin in China improved the voice feature parameter extraction method in research on voice-recognition-based robot control, combining the traditional MFCC feature parameters with formant parameters to propose a new voice feature parameter extraction method.
At present, most existing voice interaction products are based on a dedicated speech recognition chip whose core is a single-chip microcomputer or a digital signal processor. In essence, the voice signal picked up by a microphone is sampled and encoded, matched by the internal processor against pre-recorded voice information, and the corresponding voice information is then output through an on-chip module and an external loudspeaker. For example, Chinese patent CN201620720668.8 discloses a robot system with a voice interaction function. The robot consists of a head, a body and a base. A PCB in the body is connected to a single-chip microcomputer, which is connected to a signal transmitting circuit. The head carries an image acquisition sensor and a voice receiver, both connected to the signal transmitting circuit, which in turn is connected to a mobile terminal. The single-chip microcomputer is also connected to a signal receiving circuit and a voice player; the signal receiving circuit is respectively connected to the mobile terminal and the voice player, and both the transmitting and receiving circuits are connected to a filter. The body includes a robot arm, a display device and an input button, the input button being connected to the display device, so that voice interaction can be realized.
However, the applicant has found through research that existing voice recognition robots have good one-way recognition capability but weak two-way voice recognition capability, mainly for two reasons:
1) while the robot is moving, the substrate noise of equipment such as stepping motors interferes with voice interaction and can produce unpredictable results;
2) when the robot is speaking or playing music, it can hardly recognize an instruction given by the user, so its bidirectional voice recognition capability is almost lost; this is the main reason why existing robots are mainly controlled in a question-and-answer mode.
Because of these defects, the friendliness and safety of human-computer interaction cannot be guaranteed, which also runs counter to the Three Laws of Robotics. A new device is therefore urgently needed to solve the above problems.
Disclosure of Invention
The invention aims to provide an intelligent voice interaction robot that addresses the problems that existing intelligent voice interaction robots can only be controlled in a question-and-answer mode and that the friendliness and safety of human-computer interaction cannot be guaranteed. By improving its structure, the robot of the invention can effectively perform bidirectional voice recognition, overcoming the defects of the prior art. The improvement of the robot's internal structure also effectively solves the voice interaction problems caused by the substrate noise of equipment such as stepping motors while the robot is moving. The invention realizes two-way interactive communication between people and the robot and markedly improves the friendliness of human-computer interaction.
In order to achieve the purpose, the invention adopts the following technical scheme:
An intelligent voice interaction robot comprises a bottom support frame, a driving mechanism, a first cavity, a second cavity and a control system, wherein the driving mechanism is arranged on the bottom support frame and can drive the robot to move through the bottom support frame;
two third cavities are symmetrically arranged on the second cavity, and the first cavity, the second cavity and the third cavities are each of a hollow structure;
a first support frame connected with the bottom support frame is arranged in the cavity of the first cavity; a first voice playing device and a first opening are respectively arranged on the side wall of the first cavity; a first sound insulation plate is arranged below the first cavity; an upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence from bottom to top in the cavity of the first cavity; the first support frame provides support for the upper and lower sound insulation drawers respectively, and the first sound insulation plate is located between the bottom support frame and the lower sound insulation drawer;
a second sound insulation plate is arranged between the first cavity and the second cavity; each third cavity is provided with a third voice playing device, speaker holes matched with the third voice playing device, and a voice recognition device; the third cavities are spherical; there are two third voice playing devices, one on each third cavity; the speaker holes are multiple and distributed in a fan-shaped annular band; and the voice recognition device is located between the third voice playing devices;
the control system is respectively connected with the first voice playing device, the third voice playing device and the voice recognition device.
A plurality of heat dissipation holes are formed in the lower portion of the robot main body.
The plurality of heat dissipation holes form a rectangle and are arranged below the main body.
The first cavity is further provided with a groove, and one or more of a signal receiver and a handrail which are connected with a control system are arranged in the groove.
The signal receiver is arranged on the first support frame.
The voice recognition device is arranged below the display, and the display is positioned between the two third cavities.
The included angle between the display and the horizontal plane is 15-90 degrees.
And a third sound insulation plate is arranged between the upper sound insulation drawer and the lower sound insulation drawer.
The voice recognition device is positioned on a center line between the third voice playing devices.
The robot further comprises a camera following mechanism and an obstacle avoidance mechanism, which are respectively arranged on the robot main body and respectively connected with the control system; the control system can receive and process the image signals transmitted by the camera following mechanism and the position signals detected by the obstacle avoidance mechanism, and thereby control the action of the driving mechanism.
The robot further comprises a navigation mechanism connected with the control system.
The implementation method of the interaction system of the intelligent voice interaction robot comprises the following steps:
(I) Judging the type of voice input
1) judge the voice input type: if it is the input-output bidirectional recognition system, execute step (II); if it is the input-only unidirectional recognition system, execute step (III);
(II) Predefined input-output bidirectional recognition system
2) predefine a voice output table, and collect the output of the voice playing device according to the predefined voice output table to form an output sample set and an output test set;
3) predefine a voice vocabulary, and collect voice sample data according to the voice vocabulary to form an input sample set and an input test set;
4) fully permute the N voice samples in the output sample set and the M voice samples in the input sample set to obtain N!·M! arrangements; feed each arrangement into the training system to obtain a trained voice vector center; finally, compute the average vector and variance parameters of the N!·M! voice vector centers to obtain the final voice training template, where N and M are integers greater than 1 (a code sketch of this training step follows these steps);
5) at the same time, test with the voice samples in the output test set and the input test set as the voices to be tested, so as to obtain the robustness under different voice samples, including the correct recognition rate of each voice sample and the average correct recognition rate of the voice samples;
6) sort the voice samples by their correct recognition rate, and select the voice samples whose correct word recognition rate is greater than the average correct word recognition rate to form a bidirectional candidate vocabulary;
7) for the bidirectional candidate vocabulary, apply the voice template training of step 4) again to obtain the average vector μ1 and the average variance σ1 of each voice template;
8) when a voice to be detected is input, calculate the matching distance between the voice to be detected and each voice template, and select the voice template with the minimum matching distance as the recognition result;
9) output the recognition result of the voice to be detected;
(III) Predefined input-only unidirectional recognition system
10) fully permute the M voice samples in the input sample set of step 3) to obtain M! arrangements; feed each arrangement into the training system to obtain a trained voice vector center; finally, compute the average vector and variance parameters of the M! voice vector centers to obtain the final voice training template, where M is an integer greater than 1;
11) test with the voice samples in the input test set as the voices to be tested to obtain the robustness of the corresponding voice samples, including the correct recognition rate of each voice sample and the average correct recognition rate of the voice samples;
12) sort the voice samples by their correct recognition rate, and select the voice samples whose correct word recognition rate is greater than the average correct word recognition rate to form a unidirectional candidate vocabulary;
13) for the unidirectional candidate vocabulary, apply the voice template training of step 10) again to obtain the average vector μ2 and the average variance σ2 of each voice template;
14) when a voice to be detected is input, calculate the matching distance between the voice to be detected and each voice template, and select the voice template with the minimum matching distance as the recognition result;
15) output the recognition result of the voice to be detected.
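The permutation-based template training of steps 4) and 10) can be illustrated with a short sketch. The patent does not specify the internals of the "training system", so an order-sensitive running update stands in for it here, and the feature extraction (e.g. MFCC) is reduced to fixed-length vectors; the function names, the alpha parameter and the toy data below are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of step 4): build a voice training template from all
# permutations of the output and input sample sets (step 10) is the same
# with only the input set).
from itertools import permutations
import numpy as np

def vector_center(ordered_samples, alpha=0.3):
    """Order-sensitive stand-in for the patent's 'training system':
    feeds the samples one by one and returns a trained voice vector center."""
    center = np.array(ordered_samples[0], dtype=float)
    for sample in ordered_samples[1:]:
        center = (1.0 - alpha) * center + alpha * np.asarray(sample, dtype=float)
    return center

def build_template(output_samples, input_samples, alpha=0.3):
    """Average vector and variance over the centers obtained from every
    permutation of the (output, input) sample sets."""
    centers = []
    for out_perm in permutations(output_samples):        # N! orderings
        for in_perm in permutations(input_samples):      # M! orderings
            ordered = list(out_perm) + list(in_perm)
            centers.append(vector_center(ordered, alpha))
    centers = np.stack(centers)                          # shape: (N!*M!, feature_dim)
    return centers.mean(axis=0), centers.var(axis=0)     # (mu, sigma) training template

# Toy usage: N = M = 3 feature vectors of dimension 4, i.e. 3! * 3! = 36 arrangements.
rng = np.random.default_rng(0)
output_set = rng.normal(size=(3, 4))
input_set = rng.normal(size=(3, 4)) + 1.0
mu, sigma = build_template(output_set, input_set)
print(mu.shape, sigma.shape)   # (4,) (4,)
```

Note that the number of arrangements grows factorially, so a sketch like this is only practical for small N and M; the patent does not state how larger sample sets are handled.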
In existing structures, control is mainly performed in a question-and-answer mode, chiefly because the robot's own voice output severely degrades the voice recognition effect. At present, this is generally addressed by improving the chip. In the invention, the interference of the voice output with the voice input is effectively reduced by improving the overall structure of the robot, thereby achieving bidirectional interaction of voice input and output.
The structure comprises a bottom support frame, a driving mechanism, a first cavity, a second cavity and a control system. The bottom support frame supports the other parts; the driving mechanism is connected with the bottom support frame and drives the bottom support frame and the parts mounted on it. The driving mechanism comprises a set of driving wheels and a driven wheel, and the driving wheels are respectively connected with the control system. In the invention, the driven wheel can be a universal wheel, and the two driving wheels are each driven to rotate by a motor. Furthermore, the driving wheels can be Mecanum wheels, and the driving wheels and the driven wheel are arranged in an isosceles triangle.
The first cavity and the second cavity are connected to form the robot main body and are arranged in sequence from bottom to top. Two third cavities are symmetrically arranged on the second cavity; the first, second and third cavities are each hollow, and the third cavities are spherical. The first cavity is larger than the second cavity, and the second cavity is larger than the third cavities. This structure forms a panda-shaped intelligent robot that is large at the bottom, small at the top, and carries two ears on its upper part.
The side wall of the first cavity is provided with a first voice playing device and a first opening, and a first support frame is arranged in the cavity of the first cavity to provide support for the other components inside it. Through the first voice playing device, the voice output of the robot is realized.
In the invention, an upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence from bottom to top in the cavity of the first cavity; they provide sound insulation and shock absorption. A first sound insulation plate is arranged below the first cavity, between the bottom support frame and the lower sound insulation drawer, and a second sound insulation plate is arranged between the first cavity and the second cavity. Each third cavity is provided with a third voice playing device, speaker holes matched with the third voice playing device, and a voice recognition device; there are two third voice playing devices, one on each third cavity, the speaker holes are distributed in a fan-shaped annular band, and the voice recognition device is located between the two third voice playing devices, preferably on their center line.
According to the applicant's analysis, the reason why existing welcoming, household anthropomorphic or animal-like intelligent robots cannot realize bidirectional voice input and output is structural: they adopt a single independent cavity, which forms a large acoustic cavity inside the robot and seriously degrades the voice recognition effect. The invention therefore makes the following structural improvements: 1) the single cavity of the prior art is split into two independent cavities, a first cavity and a second cavity; 2) a second sound insulation plate is arranged between the first cavity and the second cavity to block the influence of the acoustic cavity of the first cavity on the recognition device; 3) an upper and a lower sound insulation drawer are arranged in sequence from bottom to top in the cavity of the first cavity, which on the one hand provides storage for the user's articles and on the other hand breaks up the original acoustic cavity of the first cavity, reducing the influence of the first voice playing device on the voice recognition device as much as possible; 4) the third voice playing devices are arranged symmetrically on the third cavities and the speaker holes form a fan-shaped annular band, so that the third voice playing devices produce a symmetrical voice output, greatly reducing their influence on the voice recognition device. Based on these structural improvements, the invention realizes bidirectional interaction between the robot's voice output and the user's voice input, greatly improves the efficiency of bidirectional voice recognition, and effectively overcomes the problems and defects of the prior art.
Further, a plurality of heat dissipation holes are formed in the lower portion of the robot body; the heat inside the robot can be effectively dissipated through the heat dissipation holes, and the normal operation of the robot is guaranteed.
Furthermore, a groove is formed in the first cavity, and one or more of a signal receiver and a handrail are arranged in the groove, the signal receiver being connected with the control system. In this way, the user can operate the robot through the handrail and, besides direct voice control, can send control instructions to the robot through the signal receiver.
Further, a display is arranged on the side wall of the second cavity between the two third voice playing devices, and the voice recognition device is arranged below the display. In this way, the display forms a lovable panda-like face and the third cavities form the panda's ears, which gives people a stronger sense of affinity and enhances the friendliness of user interaction.
Further, the included angle between the display and the horizontal plane is 15° to 90°, which makes it convenient for the user to view the display.
Furthermore, in order to improve the sound insulation effect, a third sound insulation plate is arranged between the upper sound insulation drawer and the lower sound insulation drawer so as to further reduce the influence of the first voice playing device on the voice recognition device.
Further, the robot comprises a camera following mechanism and an obstacle avoidance mechanism, both arranged on the robot main body and respectively connected with the control system; the control system can receive and process the image signals transmitted by the camera following mechanism and the position signals detected by the obstacle avoidance mechanism, and thereby control the action of the driving mechanism. In this way, the robot recognizes the motion track of the user through the camera following mechanism and transmits the image information to the control system; at the same time, the obstacle avoidance mechanism transmits the detected position signals to the control system; the control system processes both and controls the driving mechanism, realizing intelligent following of the user.
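As an illustration of this following behaviour, the sketch below fuses a camera-derived user offset with an obstacle distance and turns them into speeds for the two driving wheels. The patent gives no gains, thresholds, signal formats or interfaces to the actual mechanisms, so the FollowController class, its parameters and the differential-drive mixing are all illustrative assumptions.

```python
# Minimal sketch of the following control loop: combine the user position
# reported by the camera-following mechanism with the obstacle distance
# reported by the obstacle-avoidance mechanism and produce wheel speeds.
from dataclasses import dataclass

@dataclass
class FollowController:
    k_turn: float = 0.8            # proportional gain on horizontal offset
    k_forward: float = 0.5         # proportional gain on distance error
    follow_distance: float = 1.0   # desired distance to the user, metres
    stop_distance: float = 0.4     # obstacle distance that forces a stop

    def step(self, user_offset: float, user_distance: float,
             obstacle_distance: float) -> tuple[float, float]:
        """user_offset: -1..1 horizontal position of the user in the image;
        user_distance / obstacle_distance in metres.
        Returns (left_wheel_speed, right_wheel_speed)."""
        if obstacle_distance < self.stop_distance:
            return 0.0, 0.0                       # obstacle avoidance overrides following
        forward = self.k_forward * (user_distance - self.follow_distance)
        turn = self.k_turn * user_offset
        return forward - turn, forward + turn     # differential-drive mixing

# Example tick: user slightly to the right, 2 m away, path clear.
controller = FollowController()
print(controller.step(user_offset=0.2, user_distance=2.0, obstacle_distance=1.5))
```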
Further, the robot comprises a navigation mechanism connected with the control system. Through the navigation mechanism, the invention can provide navigation guidance for users and can automatically move to a set position.
The invention also provides an implementation method of the interaction system based on the intelligent voice interaction robot, in which the voice scene is judged and the corresponding recognition operation is executed according to the judgment result. In this method, speech recognition can be realized with a single voice recognition device, without using multiple voice recognition devices for noise reduction. The method also does not depend on a specific speaker: by defining the voice output table and the voice vocabulary and relying on subsequent processing such as the voice training templates, it improves noise immunity and speaker independence and weakens the influence of the personal characteristics of different speakers. In addition, the method supports both online recognition and offline recognition without a network, with a high recognition rate and a good recognition effect.
With this method, recognition results can be effectively corrected, and both unidirectional voice input and bidirectional voice input and output are realized with good recognition performance. Practical tests show that the recognition accuracy of the invention can exceed 95% and that bidirectional voice input and output between the speaker and the robot is effectively realized, greatly enhancing the friendliness and interactivity between the speaker and the robot.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
fig. 1 is a rear view of the apparatus of example 1.
FIG. 2 is a side view of the apparatus of example 1.
Reference numerals in the figures: 1, driving mechanism; 2, first cavity; 3, second cavity; 4, third cavity; 6, upper sound insulation drawer; 7, lower sound insulation drawer; 8, speaker hole; 9, signal receiver; 10, display; 11, heat dissipation hole.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
Example 1
The intelligent voice interaction robot of the embodiment comprises a bottom support frame, a driving mechanism, a first cavity, a second cavity and a control system. The driving mechanism is arranged on the bottom support frame, the first cavity and the second cavity are connected to form a robot main body, and the robot main body is arranged on the bottom support frame. Two third cavities are symmetrically arranged on the second cavity, and the first cavity, the second cavity and the third cavity are respectively of a hollow structure.
In this embodiment, the driving mechanism includes a driven wheel, two driving wheels, and two driving motors connected to the driving wheels. By adopting the structure, the driving mechanism can drive the robot to move relative to the ground.
Meanwhile, a first support frame connected with the bottom support frame is arranged in the cavity of the first cavity. A first voice playing device is arranged on the side wall of the first cavity, a first sound insulation plate located between the bottom support frame and the lower sound insulation drawer is arranged below the first cavity, and an upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence from bottom to top in the cavity of the first cavity; the first support frame provides support for the upper and lower sound insulation drawers respectively.
In this embodiment, a second sound insulation plate is arranged between the first cavity and the second cavity. Each third cavity is provided with a third voice playing device, speaker holes matched with the third voice playing device, and a voice recognition device. The third cavities are spherical; there are two third voice playing devices, one on each third cavity; the speaker holes are multiple and distributed in a fan-shaped annular band (as shown in the figures); and the voice recognition device is located between the third voice playing devices.
In this embodiment, the driving motor, the first voice playing device, the third voice playing device, and the voice recognition device are respectively connected to the control system.
In this embodiment, a plurality of heat dissipation holes are further formed below the first cavity, and the heat dissipation holes are arranged in a rectangular shape; the first cavity is also provided with a groove, and a signal receiver connected with a control system is arranged in the groove; the side wall of the second cavity is also provided with a display connected with the control system, the display is positioned between the two third cavities, and the voice recognition device is positioned below the display. In this embodiment, an angle between the display and the horizontal plane is 45 °, and the voice recognition device is located on a center line between the two third voice playing devices.
This embodiment further comprises a camera following mechanism, an obstacle avoidance mechanism and a navigation mechanism, the camera following mechanism and the obstacle avoidance mechanism being respectively arranged on the robot main body. The camera following mechanism, the obstacle avoidance mechanism and the navigation mechanism are respectively connected with the control system, and the control system can receive and process the image signals transmitted by the camera following mechanism and the position signals detected by the obstacle avoidance mechanism so as to control the action of the driving mechanism.
In this way, the robot of this embodiment recognizes the motion track of the user through the camera following mechanism and transmits the image information to the control system; meanwhile, the obstacle avoidance mechanism transmits the detected position signals to the control system; the control system processes both and controls the driving mechanism, realizing intelligent following of the user. Based on the navigation mechanism, the control system can also control the robot of this embodiment to move automatically to a set position.
In this embodiment, the iFLYTEK (Xunfei) voice recognition interface is used for voice recognition; the accuracy of bidirectional voice input-output recognition reaches over 88%, and the accuracy of unidirectional voice input recognition reaches about 95%, which is a good result.
Example 2
Based on the apparatus in embodiment 1, this embodiment provides a method for implementing different voice interaction systems, which includes the following steps:
(I) Judging the type of voice input
1) judge the voice input type: if it is the input-output bidirectional recognition system, execute step (II); if it is the input-only unidirectional recognition system, execute step (III);
(II) Predefined input-output bidirectional recognition system
2) predefine a voice output table, and collect the output of the voice playing device according to the predefined voice output table to form an output sample set and an output test set;
3) predefine a voice vocabulary, and collect voice sample data according to the voice vocabulary to form an input sample set and an input test set;
4) fully permute the N voice samples in the output sample set and the M voice samples in the input sample set to obtain N!·M! arrangements; feed each arrangement into the training system to obtain a trained voice vector center; finally, compute the average vector and variance parameters of the N!·M! voice vector centers to obtain the final voice training template, where N and M are integers greater than 1;
5) at the same time, test with the voice samples in the output test set and the input test set as the voices to be tested, so as to obtain the robustness under different voice samples, including the correct recognition rate of each voice sample and the average correct recognition rate of the voice samples;
6) sort the voice samples by their correct recognition rate, and select the voice samples whose correct word recognition rate is greater than the average correct word recognition rate to form a bidirectional candidate vocabulary;
7) for the bidirectional candidate vocabulary, apply the voice template training of step 4) again to obtain the average vector μ1 and the average variance σ1 of each voice template;
8) when a voice to be detected is input, calculate the matching distance between the voice to be detected and each voice template, and select the voice template with the minimum matching distance as the recognition result;
9) output the recognition result of the voice to be detected;
(III) Predefined input-only unidirectional recognition system
10) fully permute the M voice samples in the input sample set of step 3) to obtain M! arrangements; feed each arrangement into the training system to obtain a trained voice vector center; finally, compute the average vector and variance parameters of the M! voice vector centers to obtain the final voice training template, where M is an integer greater than 1;
11) test with the voice samples in the input test set as the voices to be tested to obtain the robustness of the corresponding voice samples, including the correct recognition rate of each voice sample and the average correct recognition rate of the voice samples;
12) sort the voice samples by their correct recognition rate, and select the voice samples whose correct word recognition rate is greater than the average correct word recognition rate to form a unidirectional candidate vocabulary;
13) for the unidirectional candidate vocabulary, apply the voice template training of step 10) again to obtain the average vector μ2 and the average variance σ2 of each voice template;
14) when a voice to be detected is input, calculate the matching distance between the voice to be detected and each voice template, and select the voice template with the minimum matching distance as the recognition result (the candidate selection and minimum-distance matching are sketched in code after these steps);
15) output the recognition result of the voice to be detected.
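The candidate-vocabulary selection of steps 6) and 12) and the minimum-distance matching of steps 8) and 14) can be sketched as follows. The patent does not define the matching distance, so a variance-normalised Euclidean distance to the (μ, σ) template is used here as an assumption, and the vocabulary words, recognition rates and feature vectors in the usage example are purely hypothetical.

```python
# Minimal sketch: keep the words whose recognition rate exceeds the average,
# then classify incoming speech by minimum matching distance to the templates.
import numpy as np

def matching_distance(feature, template):
    """Variance-normalised squared distance to a (mu, sigma) template (assumed form)."""
    mu, sigma = template
    return float(np.sum((feature - mu) ** 2 / (sigma + 1e-8)))

def select_candidates(recognition_rate):
    """Steps 6)/12): keep the words whose recognition rate exceeds the average."""
    avg = sum(recognition_rate.values()) / len(recognition_rate)
    return {word: rate for word, rate in recognition_rate.items() if rate > avg}

def recognise(feature, templates):
    """Steps 8)/14): return the word whose template has the minimum matching distance."""
    return min(templates, key=lambda word: matching_distance(feature, templates[word]))

# Toy usage with hypothetical vocabulary words and 2-dimensional features.
templates = {
    "forward": (np.array([0.0, 1.0]), np.array([0.1, 0.1])),
    "stop":    (np.array([1.0, 0.0]), np.array([0.1, 0.1])),
}
rates = {"forward": 0.97, "stop": 0.93, "back": 0.80}
print(select_candidates(rates))                      # {'forward': 0.97, 'stop': 0.93}
print(recognise(np.array([0.1, 0.9]), templates))    # 'forward'
```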
In this embodiment, the recognition accuracy of bidirectional voice input and output exceeds 95%, and the recognition accuracy of unidirectional voice input reaches about 97%, which is a good result.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims (9)

1. An intelligent voice interaction robot comprising a bottom support frame, a driving mechanism, a first cavity, a second cavity and a control system, wherein the driving mechanism is arranged on the bottom support frame and can drive the robot to move through the bottom support frame;
the robot being characterized in that two third cavities are symmetrically arranged on the second cavity, and the first cavity, the second cavity and the third cavities are each of a hollow structure;
a first support frame connected with the bottom support frame is arranged in the cavity of the first cavity; a first voice playing device and a first opening are respectively arranged on the side wall of the first cavity; a first sound insulation plate is arranged below the first cavity; an upper sound insulation drawer and a lower sound insulation drawer are arranged in sequence from bottom to top in the cavity of the first cavity; the first support frame provides support for the upper and lower sound insulation drawers respectively, and the first sound insulation plate is located between the bottom support frame and the lower sound insulation drawer;
a second sound insulation plate is arranged between the first cavity and the second cavity; each third cavity is provided with a third voice playing device, speaker holes matched with the third voice playing device, and a voice recognition device; the third cavities are spherical; there are two third voice playing devices, one on each third cavity; the speaker holes are multiple and distributed in a fan-shaped annular band; and the voice recognition device is located between the third voice playing devices;
the control system is respectively connected with the first voice playing device, the third voice playing device and the voice recognition device;
the method for realizing voice interaction by the intelligent voice interaction robot comprises the following steps:
(I) Judging the type of voice input
1) judge the voice input type: if it is the input-output bidirectional recognition system, execute step (II); if it is the input-only unidirectional recognition system, execute step (III);
(II) Predefined input-output bidirectional recognition system
2) predefine a voice output table, and collect the output of the voice playing device according to the predefined voice output table to form an output sample set and an output test set;
3) predefine a voice vocabulary, and collect voice sample data according to the voice vocabulary to form an input sample set and an input test set;
4) fully permute the N voice samples in the output sample set and the M voice samples in the input sample set to obtain N!·M! arrangements; feed each arrangement into the training system to obtain a trained voice vector center; finally, compute the average vector and variance parameters of the N!·M! voice vector centers to obtain the final voice training template, where N and M are integers greater than 1;
5) at the same time, test with the voice samples in the output test set and the input test set as the voices to be tested, so as to obtain the robustness under different voice samples, including the correct recognition rate of each voice sample and the average correct recognition rate of the voice samples;
6) sort the voice samples by their correct recognition rate, and select the voice samples whose correct word recognition rate is greater than the average correct word recognition rate to form a bidirectional candidate vocabulary;
7) for the bidirectional candidate vocabulary, apply the voice template training of step 4) again to obtain the average vector μ1 and the average variance σ1 of each voice template;
8) when a voice to be detected is input, calculate the matching distance between the voice to be detected and each voice template, and select the voice template with the minimum matching distance as the recognition result;
9) output the recognition result of the voice to be detected;
(III) Predefined input-only unidirectional recognition system
10) fully permute the M voice samples in the input sample set of step 3) to obtain M! arrangements; feed each arrangement into the training system to obtain a trained voice vector center; finally, compute the average vector and variance parameters of the M! voice vector centers to obtain the final voice training template, where M is an integer greater than 1;
11) test with the voice samples in the input test set as the voices to be tested to obtain the robustness of the corresponding voice samples, including the correct recognition rate of each voice sample and the average correct recognition rate of the voice samples;
12) sort the voice samples by their correct recognition rate, and select the voice samples whose correct word recognition rate is greater than the average correct word recognition rate to form a unidirectional candidate vocabulary;
13) for the unidirectional candidate vocabulary, apply the voice template training of step 10) again to obtain the average vector μ2 and the average variance σ2 of each voice template;
14) when a voice to be detected is input, calculate the matching distance between the voice to be detected and each voice template, and select the voice template with the minimum matching distance as the recognition result;
15) output the recognition result of the voice to be detected.
2. The intelligent voice interaction robot of claim 1, wherein the first cavity is further provided with a groove, one or more of a signal receiver and an armrest are arranged in the groove, and the signal receiver is connected with a control system.
3. The intelligent voice interactive robot of claim 2, wherein the signal receiver is disposed on the first support frame.
4. The intelligent voice interaction robot of claim 1, further comprising a display connected to the control system, the display being disposed on a side wall of the second cavity, the display being located between the two third cavities and the voice recognition device being disposed below the display.
5. The intelligent voice interaction robot of claim 4, wherein the angle between the display and the horizontal plane is 15-90 °.
6. The intelligent voice interaction robot as claimed in any one of claims 1 to 5, wherein a third sound insulation plate is arranged between the upper sound insulation drawer and the lower sound insulation drawer.
7. The intelligent voice interaction robot as claimed in any one of claims 1 to 5, wherein the voice recognition device is located on a center line between the third voice playing devices.
8. The intelligent voice interaction robot according to any one of claims 1 to 5, further comprising a camera following mechanism and an obstacle avoidance mechanism, wherein the camera following mechanism and the obstacle avoidance mechanism are respectively arranged on the robot main body, the camera following mechanism and the obstacle avoidance mechanism are respectively connected with a control system, and the control system can receive and process image signals transmitted by the camera following mechanism and position signals detected by the obstacle avoidance mechanism, so as to control the action of the driving mechanism.
9. The intelligent voice interaction robot of any one of claims 1-5, further comprising a navigation mechanism connected to the control system.
CN201710579708.0A 2017-07-17 2017-07-17 Intelligent voice interaction robot Active CN107146619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710579708.0A CN107146619B (en) 2017-07-17 2017-07-17 Intelligent voice interaction robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710579708.0A CN107146619B (en) 2017-07-17 2017-07-17 Intelligent voice interaction robot

Publications (2)

Publication Number Publication Date
CN107146619A CN107146619A (en) 2017-09-08
CN107146619B 2020-11-13

Family

ID=59776377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710579708.0A Active CN107146619B (en) 2017-07-17 2017-07-17 Intelligent voice interaction robot

Country Status (1)

Country Link
CN (1) CN107146619B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101577118A (en) * 2009-06-12 2009-11-11 北京大学 Implementation method of voice interaction system facing intelligent service robot
KR101651493B1 (en) * 2010-07-15 2016-08-26 현대모비스 주식회사 Apparatus for two way voice recognition
WO2012020858A1 (en) * 2010-08-11 2012-02-16 (주) 퓨처로봇 Intelligent driving robot for providing customer service and calculation in restaurants
CN203522988U (en) * 2013-09-26 2014-04-02 深圳市金立通信设备有限公司 Microphone apparatus and terminal
CN105425799A (en) * 2015-12-03 2016-03-23 昆山穿山甲机器人有限公司 Bank self-service robot system and automatic navigation method thereof
CN106378786B (en) * 2016-11-30 2018-12-21 北京百度网讯科技有限公司 Robot based on artificial intelligence
CN106737760A (en) * 2017-03-01 2017-05-31 深圳市爱维尔智能科技有限公司 A kind of human-like intelligent robot and man-machine communication's system

Also Published As

Publication number Publication date
CN107146619A (en) 2017-09-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant