WO2018205083A1 - Robot wake-up method, apparatus, and robot - Google Patents
Robot wake-up method, apparatus, and robot
- Publication number
- WO2018205083A1 (application PCT/CN2017/083424)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice command
- line
- information
- sight
- robot
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J11/00—Manipulators not otherwise provided for
- B25J11/0005—Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J13/00—Controls for manipulators
- B25J13/003—Controls for manipulators by means of an audio-responsive input
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/18—Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
- G05B19/4155—Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form characterised by programme execution, i.e. part programme or machine function execution, e.g. selection of a programme
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/36—Nc in input of data, input key till input tape
- G05B2219/36017—Graphic assisted robot programming, display projection of surface
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/39—Robotics, robotics to robotics hand
- G05B2219/39441—Voice command, camera detects object, grasp, move
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/40—Robotics, robotics mapping to robotics vision
- G05B2219/40002—Camera, robot follows direction movement of operator head, helmet, headstick
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/2803—Home automation networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/2803—Home automation networks
- H04L12/2816—Controlling appliance services of a home automation network by calling their functionalities
- H04L12/2821—Avoiding conflicts related to the use of home appliances
Description
- Embodiments of the present invention relate to the field of automatic control in artificial intelligence, and, for example, to a robot wake-up method, apparatus, and robot.
- Robots bring great convenience to human production and daily life.
- A robot can be pre-configured with wake-up words.
- A specific wake-up word may be, for example, the robot's name.
- For example, the user can set the wake-up word "Alexa" or "Mike" for the robot.
- When the user says "Alexa" or "Mike", the robot knows that the user is calling it.
- The inventors found that the related art has at least the following problems: the user may not remember the robot's name at a given moment; when the user owns multiple robots, the user may not remember the name of each one; or, because some robots look alike, the user may be unable to tell them apart. In these cases the user cannot wake up the intended robot, or wakes up the wrong robot, and the user's actual need cannot be fulfilled.
- An embodiment of the present invention provides a robot wake-up method applied to a robot. The method includes:
- acquiring line-of-sight range information of a voice command issuer at the time the voice command is issued; and, if the line-of-sight range information at the time the voice command issuer issues the voice command is acquired, confirming, according to the line-of-sight range information, whether the voice command issuer was looking at the robot when the command was issued, and, if so, confirming that the robot is being called.
- An embodiment of the present invention further provides a robot wake-up apparatus applied to a robot. The apparatus includes:
- a line-of-sight range acquisition module, configured to acquire line-of-sight range information at the time a voice command issuer issues a voice command; and a call confirmation module, configured to, if the line-of-sight range information at the time the voice command issuer issues the voice command is acquired, confirm, according to that information, whether the voice command issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called.
- An embodiment of the present invention further provides a robot, including:
- at least one processor; and
- a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described above.
- In the embodiments of the present invention, the robot determines, from the line-of-sight range information captured when the voice command issuer issues the voice command, whether the issuer was looking at the robot at that moment, and confirms that it is being called if so.
- In this way, a pronoun can be used as the wake-up word to wake up the robot, without the user having to remember the name of each robot, which improves the user experience.
- FIG. 1 is a schematic diagram of an application scenario of the method and apparatus of the present invention.
- FIG. 2 is a schematic diagram of an application scenario of the method and apparatus of the present invention.
- FIG. 3 is a flow chart of one embodiment of a wake-up method of the present invention.
- FIG. 4 is a flow chart of one embodiment of a wake-up method of the present invention.
- FIG. 5 is a flowchart of a step of acquiring line-of-sight range information when a voice command issuer issues a voice command in an embodiment of the wake-up method of the present invention.
- FIG. 6 is a flow chart of the steps of inquiring the voice command issuer in one embodiment of the wake-up method of the present invention.
- FIG. 7 is a flow chart of one embodiment of a wake-up method of the present invention.
- Figure 8 is a schematic structural view of an embodiment of the wake-up device of the present invention.
- Figure 9 is a schematic structural view of an embodiment of the wake-up device of the present invention.
- Figure 10 is a schematic structural view of an embodiment of the wake-up device of the present invention.
- FIG. 11 is a schematic structural diagram of a voice command acquisition submodule in an embodiment of the wakeup device of the present invention.
- FIG. 12 is a schematic structural diagram of an inquiry module in an embodiment of the wake-up device of the present invention.
- FIG. 13 is a schematic structural diagram of a line-of-sight range acquisition sub-module in an embodiment of the wake-up device of the present invention.
- Figure 14 is a block diagram showing the structure of a call confirmation module in an embodiment of the wake-up device of the present invention.
- FIG. 15 is a schematic diagram showing the hardware structure of a robot that performs the wake-up method according to an embodiment of the present invention.
- The robot wake-up method and apparatus provided by the present invention are applicable to the application scenario shown in FIG. 1 and FIG. 2, which includes one or more robots 20 that can communicate with each other through a network 30, where the network 30 may be, for example, a home or company LAN or another dedicated network.
- The robot 20 has at least one network interface that establishes a communication connection with the network 30 to retrieve data or instructions from the network 30.
- The user 10 can be a group of any size with the same or similar operational behavior, such as a family or a work group, or an individual.
- The user 10 can configure the plurality of robots 20 or issue commands to them.
- Each robot has its corresponding wake-up word for waking itself from a sleep state or responding to a user's call, which can be preset by the user.
- The wake-up word can be the robot's name, an identification code, or any other word.
- In that case the user needs to remember the specific wake-up word of each robot, and cannot wake up a robot whose specific wake-up word has been forgotten.
- If a unified pronoun can be used in place of each specific wake-up word, the user is spared the trouble of remembering each particular wake-up word.
- The robot wake-up method and apparatus provided by the invention allow a robot to be woken up using a unified pronoun.
- The robot wake-up method and apparatus provided by the present invention are applicable to the case where a unified pronoun is used as the wake-up word, and equally to the case where a specific wake-up word is used.
- The unified pronoun may be a pronoun that indicates a call, such as "you", or a user-defined term such as "dear" or "robot". Singular and plural attributes may also be defined for the pronouns; for example, "you" (singular) and "robot" may be treated as pronouns indicating a single robot, while "you" (plural) and "robots" may be treated as pronouns indicating multiple robots.
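- As an illustration only, such a wake-up vocabulary and its singular/plural attribute could be kept in a simple lookup table; the sketch below is a hypothetical Python example, and the specific words and structure are not part of the disclosure.

```python
# Hypothetical wake-up word table: each entry marks whether the pronoun
# addresses a single robot or several robots at once.
WAKE_WORDS = {
    "you": "singular",
    "robot": "singular",
    "dear": "singular",      # user-defined pronoun
    "you all": "plural",
    "robots": "plural",
}

def classify_wake_word(word: str):
    """Return 'singular', 'plural', or None if the word is not a wake-up word."""
    return WAKE_WORDS.get(word.strip().lower())
```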
- The robot wake-up method provided by the embodiments of the present invention may be executed by any of the robots shown in FIG. 1 or FIG. 2. FIG. 3 is a flowchart of an embodiment of the wake-up method, which includes:
- Step 101: Acquire line-of-sight range information at the time a voice command issuer issues a voice command.
- Step 102: If the line-of-sight range information at the time the voice command issuer issues the voice command is obtained, confirm, according to that information, whether the voice command issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called.
- A typical robot has a camera or a 360-degree panoramic camera.
- The camera records images of the robot's surroundings and stores them in a visual information cache. By recalling images from the visual information cache and locating the user in them, the robot can judge whether the user is facing it, and thereby confirm whether the user is looking at it.
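- A minimal sketch of such a visual information cache is shown below, assuming timestamped camera frames and a fixed retention window of a few seconds; the class and method names are illustrative and not taken from the patent.

```python
import collections
import time

class VisualCache:
    """Ring buffer of (timestamp, frame) pairs covering the last few seconds."""

    def __init__(self, retention_s: float = 5.0):
        self.retention_s = retention_s
        self._frames = collections.deque()

    def add_frame(self, frame, timestamp=None):
        now = timestamp if timestamp is not None else time.time()
        self._frames.append((now, frame))
        # Drop frames that fall outside the retention window.
        while self._frames and now - self._frames[0][0] > self.retention_s:
            self._frames.popleft()

    def frames_between(self, start: float, end: float):
        """Return frames captured between the start and end of a voice command."""
        return [f for t, f in self._frames if start <= t <= end]
```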
- With this method, the robot determines, from the line-of-sight range information captured when the voice command issuer issues the voice command, whether the issuer was looking at the robot at that moment, and confirms that it is being called if so.
- In this way, a pronoun can be used as the wake-up word to wake up the robot, without the user having to remember the name of each robot, which improves the user experience.
- In another embodiment, as shown in FIG. 4, the method includes:
- Step 201: Acquire line-of-sight range information at the time a voice command issuer issues a voice command.
- Step 202: If the line-of-sight range information at the time the voice command issuer issues the voice command is obtained, confirm, according to that information, whether the voice command issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called.
- Step 203: If the line-of-sight range information at the time the voice command issuer issues the voice command is not obtained, query the voice command issuer.
- The robot may be busy with something else at a location far from the user.
- The user does not have to walk over to the robot to give it a command; the user can issue a voice command from where they are, and the robot can hear it.
- If a robot that hears the voice command judges that the user was not looking at it when the command was issued, it actively asks the user to confirm whether the user is calling it.
- The call can thus be made in place, further improving the user experience.
- In some embodiments, the line-of-sight range information obtained in step 102 or step 202 is acquired by the robot itself, i.e., the robot itself captures the line-of-sight range information of the voice command issuer.
- In other embodiments, all robots that hear the same voice command are placed in a candidate group, and each robot in the candidate group broadcasts to the other robots the line-of-sight range information it acquired when the user issued the voice command.
- In this way, all robots in the candidate group can share the line-of-sight range information acquired by the other robots in the group; even if some robots did not capture the user's line of sight, or captured it only partially, they can obtain the user's line of sight from other robots and confirm whether they are being looked at.
- In these embodiments, the line-of-sight range information obtained in step 102 or step 202 is therefore either information the robot acquired itself or information the robot received from another robot's broadcast.
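- The sharing step might look like the following sketch, in which each robot publishes whatever line-of-sight record it produced and keeps the records received from its peers; the message format and transport are assumptions, not details given in the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class GazeRecord:
    """Line-of-sight range information for one voice command, as seen by one robot."""
    command_id: str   # identifies the voice command (time information + issuer)
    robot_id: str     # which robot produced this record
    sightlines: list  # e.g. [(timestamp, gaze_direction), ...]

@dataclass
class CandidateGroupMember:
    robot_id: str
    shared_records: dict = field(default_factory=dict)

    def broadcast(self, record: GazeRecord, peers: list):
        # Send this robot's own record to every other robot in the candidate group.
        for peer in peers:
            peer.receive(record)

    def receive(self, record: GazeRecord):
        # Keep peer records so gaze can still be checked when this robot's
        # own camera missed the user or captured the gaze only partially.
        self.shared_records[record.robot_id] = record
```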
- As shown in FIG. 5, acquiring the line-of-sight range information at the time the voice command issuer issues the voice command includes:
- Step 301: Acquire voice command information, where the voice command information includes time information of the voice command and voice command issuer information.
- A microphone can be mounted on the robot to receive voice signals in real time.
- The voice command may be a voice signal received in real time.
- When the user speaks, the utterance is not necessarily a voice command for the robot, so the voice signal must be judged further, and it is recorded only when it is a voice command issued by the user. In some cases the user is far from the robot; even if the robot can receive a distant voice signal, a signal whose sound pressure level is too low may not be resolved correctly. Therefore, a voice command whose sound pressure level is below a certain value is not recorded.
- Acquiring the voice command information includes:
- if the sound pressure level of the voice signal exceeds a preset threshold, recording the start time and end time of the voice signal as the time information of the voice command, and recording the sound pressure level of the voice signal as the sound pressure level of the voice command; and
- identifying the issuer of the voice signal from the voice signal, and recording the issuer as the voice command issuer information.
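- A rough sketch of this acquisition logic is shown below; the decibel formula (relative to a 20 µPa reference) is a common convention rather than something specified in the patent, the threshold value is assumed, and the voiceprint lookup is a placeholder.

```python
import numpy as np

SPL_THRESHOLD_DB = 45.0   # assumed threshold; the patent leaves the value open
P_REF = 20e-6             # 20 micropascal reference pressure

def sound_pressure_level(samples: np.ndarray) -> float:
    """Estimate the SPL of a calibrated pressure signal, in dB."""
    rms = np.sqrt(np.mean(np.square(samples)))
    return 20.0 * np.log10(max(rms, 1e-12) / P_REF)

def record_voice_command(samples, start_t, end_t, identify_speaker):
    """Return voice command information, or None if the signal is too quiet."""
    spl = sound_pressure_level(samples)
    if spl < SPL_THRESHOLD_DB:
        return None                              # too faint to resolve reliably
    return {
        "start": start_t,                        # time information
        "end": end_t,
        "spl_db": spl,                           # sound pressure level
        "issuer": identify_speaker(samples),     # e.g. a voiceprint match
    }
```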
- Whether the occurrence of the wake-up word constitutes a call can be determined by checking whether the time interval between the wake-up word and the speech content that follows it exceeds a preset duration; if it does, the occurrence of the wake-up word is a call. Alternatively, it can be determined by checking whether there is any other speech content before the first wake-up word; if there is none, the occurrence of the wake-up word is a call.
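- The two heuristics could be combined roughly as in the sketch below; the pause length is an assumed value, not one given in the patent.

```python
PAUSE_AFTER_WAKE_WORD_S = 0.8   # assumed preset interval

def wake_word_is_call(wake_word_end: float,
                      next_speech_start,
                      speech_before_wake_word: bool) -> bool:
    """Treat the wake-word occurrence as a call if it stands alone."""
    # Heuristic 1: a long enough pause follows the wake-up word.
    if (next_speech_start is None or
            next_speech_start - wake_word_end > PAUSE_AFTER_WAKE_WORD_S):
        return True
    # Heuristic 2: nothing was spoken before the first wake-up word.
    return not speech_before_wake_word
```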
- The time information of the voice command may also be the start time and end time of the wake-up word within the voice command; which option is used can be chosen according to the user's speaking habits.
- Each person's voiceprint features are unique, so the identity of the person who produced the voice signal can be determined by recognizing the voiceprint features.
- The voiceprint features of the user can be stored in advance in the robot's storage. When a robot has multiple owners, the correspondence between the voiceprint features and each owner's information should also be stored, so that the robot can identify an owner's identity from the voiceprint features.
- Step 302: Broadcast the voice command information.
- Step 303: Confirm whether there are robots that heard the same voice command, and, if so, join those robots to the same candidate group.
- Each robot can broadcast the time information and issuer information of the voice command it heard.
- One robot then determines which robots heard the same voice command, establishes a candidate group, and notifies those robots to join the candidate group.
- Confirming whether there are robots that heard the same voice command can be done by checking whether their recorded time information and voice command issuer information match: if the records show a voice command issued at the same time by the same person, it is the same voice command.
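- In code, two reports might be treated as describing the same command when their issuers match and their time windows agree to within a small tolerance (exact equality is unlikely across independent clocks); the tolerance value below is an assumption.

```python
TIME_TOLERANCE_S = 0.3   # assumed clock/detection tolerance between robots

def same_voice_command(report_a: dict, report_b: dict) -> bool:
    """Reports describe the same command if issuer and time information match."""
    if report_a["issuer"] != report_b["issuer"]:
        return False
    return (abs(report_a["start"] - report_b["start"]) <= TIME_TOLERANCE_S and
            abs(report_a["end"] - report_b["end"]) <= TIME_TOLERANCE_S)
```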
- Step 304: Acquire the robot's own visual information and position information matching the time information, and obtain the line-of-sight range information of the voice command issuer from that visual information and position information.
- When the time information of the voice command is the start time and end time of the voice command, the robot retrieves, from its own visual information cache (which buffers historical visual input for a certain period, for example the 5 s before the current time), the visual information from the start time to the end time of the voice command.
- When the time information is instead the start time and end time of the wake-up word, the robot retrieves from its visual information cache the visual information from the start time to the end time of the wake-up word. As can be seen, when the start and end times of the wake-up word are used, the amount of data to be processed is smaller and execution is faster.
- The user's facial features and voiceprint features, and the correspondence between those features and the user's identity, may be stored in the robot's storage in advance.
- The robot can identify the voice command issuer with the help of the user's facial features. The time information of the voice command is divided evenly into a plurality of moments; at each moment, the angle between the face of the voice command issuer and the robot is determined from the visual information at that moment, and the line-of-sight direction at that moment is obtained from the robot's own position information at that moment and that angle.
- The line-of-sight direction is an equation of the line along which the user's gaze points toward its target. When the user calls several robots, several line-of-sight directions, i.e., several equations, may be generated.
- The obtained line-of-sight directions and the moments corresponding to them are taken as the line-of-sight range information.
- The line-of-sight range information may thus be a single line-of-sight direction and its corresponding moment, or a plurality of line-of-sight directions and their corresponding moments.
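- As a sketch, each sampled moment could yield a planar gaze ray anchored at the user's estimated position. In the hypothetical example below, the user's position, the observing robot's position, and the face-angle offset are assumed inputs, and the geometry is simplified to two dimensions.

```python
import math
from dataclasses import dataclass

@dataclass
class SightLine:
    """A 2-D gaze ray at one moment: origin (user position) and unit direction."""
    t: float
    origin: tuple      # (x, y) of the user in a shared map frame
    direction: tuple   # unit vector along the user's gaze

def sight_line_at(t, user_xy, robot_xy, face_offset_rad) -> SightLine:
    # Bearing from the user toward the observing robot.
    bearing = math.atan2(robot_xy[1] - user_xy[1], robot_xy[0] - user_xy[0])
    # face_offset_rad: how far the user's face is turned away from looking
    # straight at this robot, estimated from the image at moment t.
    gaze = bearing + face_offset_rad
    return SightLine(t=t, origin=user_xy,
                     direction=(math.cos(gaze), math.sin(gaze)))
```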
- Dividing the time information into a plurality of moments means obtaining a number of evenly spaced moments between the start time and the end time, which can be done directly using the system timestamps.
- Alternatively, a line-of-sight interval and the time range corresponding to that interval may be derived from the obtained line-of-sight directions and their corresponding moments, and the line-of-sight interval and the time range are then used as the line-of-sight range information.
- The line-of-sight interval can be bounded by the line-of-sight direction at the start and the line-of-sight direction at the end; when there is only a single line-of-sight direction, the interval is simply that direction.
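- Aggregating per-moment directions into such an interval could be as simple as the sketch below, which reuses the SightLine rays from the previous sketch and treats headings as planar angles in radians; angle wrap-around is ignored for brevity.

```python
import math

def sight_interval(sight_lines):
    """Return ((min_heading, max_heading), (t_start, t_end)) from per-moment rays."""
    headings = [math.atan2(s.direction[1], s.direction[0]) for s in sight_lines]
    times = [s.t for s in sight_lines]
    if len(sight_lines) == 1:
        # Single direction: the interval degenerates to that one direction.
        return (headings[0], headings[0]), (times[0], times[0])
    return (min(headings), max(headings)), (min(times), max(times))
```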
- Step 305: If line-of-sight range information of the voice command issuer exists, broadcast that line-of-sight range information within the candidate group.
- Steps 301-305 are not necessarily performed by every robot.
- Each robot that hears the voice signal performs steps 301 and 302, and the robots in the candidate group perform steps 304 and 305.
- Step 303 may be performed by only one robot or a few robots; for example, each robot can broadcast its own working state to the other robots, the most idle robot performs the step, and that robot then shares the result with the other robots over the network.
- In the case where the line-of-sight range information is a line-of-sight direction and a moment corresponding to that direction,
- confirming, according to the line-of-sight range information, whether the voice command issuer was looking at the robot when the voice command was issued includes the following.
- The position information of the robot is generally stored in a position information cache (which stores historical position information for a certain period, for example the 5 s before the current time), and the robot can retrieve its position information for, say, the last 5 s or the last 3 s.
- For each moment in the line-of-sight range information, the robot determines whether its own position lies on the line-of-sight equation for that moment. To tolerate errors in the line-of-sight direction introduced by face recognition, a certain angular margin is allowed when deciding whether the robot's position matches the line-of-sight direction; for example, the line-of-sight direction is treated as the central axis, and positions within 2° on either side of it are accepted.
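- A sketch of this check, using the 2° margin mentioned above, might look like the following; positions are planar coordinates and the ray representation follows the earlier sketch, so the details are illustrative rather than prescribed by the patent.

```python
import math

ANGLE_TOLERANCE_RAD = math.radians(2.0)   # margin on either side of the gaze axis

def robot_is_looked_at(robot_positions, sight_lines) -> bool:
    """True if, at some moment, the robot lies within that moment's gaze cone.

    robot_positions: list of (timestamp, (x, y)) samples from the position cache.
    sight_lines:     list of SightLine rays, one per sampled moment.
    """
    for line in sight_lines:
        # Robot position recorded closest in time to this moment.
        _, (rx, ry) = min(robot_positions, key=lambda p: abs(p[0] - line.t))
        bearing_to_robot = math.atan2(ry - line.origin[1], rx - line.origin[0])
        gaze_heading = math.atan2(line.direction[1], line.direction[0])
        diff = (bearing_to_robot - gaze_heading + math.pi) % (2 * math.pi) - math.pi
        if abs(diff) <= ANGLE_TOLERANCE_RAD:
            return True
    return False
```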
- In the case where the line-of-sight range information is a line-of-sight interval and a time range corresponding to that interval,
- confirming, according to the line-of-sight range information, whether the voice command issuer was looking at the robot includes: acquiring the robot's own position information for a preset period, and confirming whether, within the time range, the robot's own position falls within the line-of-sight interval; if it does, the voice command issuer was looking at the robot.
- If the robot confirms that the user is looking at it, it may respond to the user, for example by turning toward the sound direction determined by its microphone array and asking the user by voice for the next command. If the robot confirms that the user is not calling it, it automatically exits the candidate group; and if no other robots remain in the group when it exits, the candidate group is dissolved.
- As shown in FIG. 6, querying the voice command issuer includes:
- Step 401: Confirm whether the wake-up word is a singular pronoun.
- Step 402: If it is a singular pronoun, determine the robot in the candidate group for which the voice command has the highest sound pressure level, and have that robot ask the voice command issuer whether the issuer is calling it.
- The robot with the highest sound pressure level is likely to be the robot closest to the user and the most probable target of the user's command. When asking the user, it can turn toward the user according to the sound direction obtained by its own microphone array.
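- Selecting which robot should ask the user could be as simple as comparing the sound pressure levels the robots reported for the command, as in the sketch below; the report format follows the earlier sketches, and turn_towards and say are placeholder actuator calls, not APIs named in the patent.

```python
def robot_to_ask(reports: dict) -> str:
    """Pick the robot whose copy of the command had the highest sound pressure level.

    reports maps robot_id -> {"spl_db": ..., "sound_direction": ...}.
    """
    return max(reports, key=lambda rid: reports[rid]["spl_db"])

def ask_user(robot_id: str, reports: dict, turn_towards, say):
    # Face the user using the direction estimated by the microphone array,
    # then ask whether this robot is the one being called.
    turn_towards(reports[robot_id]["sound_direction"])
    say("Are you calling me?")
```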
- Step 403: If the issuer is calling that robot, respond to the voice command issuer.
- Responding may mean further querying the user or executing the user's instruction. A message may also be broadcast to the candidate group so that every member exits and the candidate group is dissolved.
- Step 404: If the issuer is not calling that robot, then when the voice command issuer issues a new voice command, obtain the line-of-sight range information at the time the new voice command is issued.
- Step 405: If the wake-up word is not a singular pronoun, determine the robots in the candidate group for which the voice command has the highest and second-highest sound pressure levels, and have those robots ask the voice command issuer whether the issuer is calling only them.
- Step 406: If the issuer is calling only the robots with the highest and second-highest sound pressure levels, respond to the voice command issuer.
- Responding to the voice command issuer may mean further querying the user for a command or executing the user's command. A message may also be broadcast to the candidate group so that every member exits and the candidate group is dissolved.
- Step 407: Otherwise, when the voice command issuer issues a new voice command, obtain the line-of-sight range information at the time the new voice command is issued.
- Step 408: Broadcast, within the candidate group, the line-of-sight range information at the time the voice command issuer issues the new voice command.
- The method further includes:
- if the line-of-sight range information at the time the voice command issuer issues the new voice command is obtained, confirming, according to that information, whether the issuer was looking at the robot when the command was issued, and, if so, confirming that the robot is being called and responding to the voice command issuer.
- For how the line-of-sight range information at the time of the new voice command is obtained, and how it is determined from that information whether the voice command issuer was looking at the robot, refer to the explanation above; the details are not repeated here.
- Confirming whether the wake-up word is a singular pronoun in step 401, confirming the robot in the candidate group for which the voice command has the highest sound pressure level in step 402, and confirming the robots for which the voice command has the highest and second-highest sound pressure levels in step 405
- may each be performed by only one robot or a few robots. For example, each robot can broadcast its own working state to the other robots, one robot performs the step, and that robot then shares the result with the other robots over the network.
- Querying the voice command issuer in step 402 and performing steps 403 and 404 are done by the robot with the highest sound pressure level, while querying the voice command issuer in step 405 and performing steps 406 and 407 are done by the robots with the highest and second-highest sound pressure levels.
- Step 408 is performed by the robots with the highest and second-highest sound pressure levels.
- FIG. 7 is a flowchart of another embodiment of the method. The method includes:
- Step 501: Monitor the voice signal, parse out the wake-up word in the voice signal, and confirm the sound pressure level of the voice signal.
- Step 502: If the occurrence of the wake-up word is a call and the sound pressure level of the voice signal exceeds a preset threshold, record the start time and end time of the wake-up word as the time information of the voice command, and record the sound pressure level of the voice signal as the sound pressure level of the voice command.
- Step 503: Identify the issuer of the voice signal from the voice signal, record the issuer as the voice command issuer information, and broadcast the sound pressure level, the time information, and the voice command issuer information.
- Step 504: Confirm whether there are robots that heard the same voice command, and, if so, join those robots to the same candidate group.
- Step 505: Acquire the robot's own visual information and position information matching the time information, and obtain the line-of-sight range information of the voice command issuer from that visual information and position information.
- Step 506: If line-of-sight range information of the voice command issuer exists, broadcast it within the candidate group.
- Step 507: If the line-of-sight range information at the time the voice command issuer issues the voice command is obtained, confirm, according to that information, whether the issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called.
- The line-of-sight range information here may be information the robot acquired itself or information it received from other robots.
- Step 508: If the line-of-sight range information at the time the voice command issuer issues the voice command is not obtained, proceed to step 509.
- Step 509: Confirm whether the wake-up word is a singular pronoun.
- Step 510: If it is a singular pronoun, determine the robot in the candidate group for which the voice command has the highest sound pressure level, and have that robot ask the voice command issuer whether the issuer is calling it; if so, go to step 512, otherwise go to step 513.
- Step 511: If it is not a singular pronoun, determine the robots in the candidate group for which the voice command has the highest and second-highest sound pressure levels, and have those robots ask the voice command issuer whether the issuer is calling only them; if so, go to step 512, otherwise go to step 513.
- Step 512: Respond to the voice command issuer.
- Step 513: When the voice command issuer issues a new voice command, obtain the line-of-sight range information at the time the new voice command is issued.
- Step 514: Broadcast, within the candidate group, the line-of-sight range information at the time the voice command issuer issues the new voice command.
- Step 515: If the line-of-sight range information at the time the voice command issuer issues the new voice command is obtained, confirm, according to that information, whether the issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called and respond to the voice command issuer.
- An embodiment of the present invention further provides a robot wake-up apparatus, which may be provided in any of the robots shown in FIG. 1 or FIG. 2.
- As shown in FIG. 8, the wake-up apparatus 600 includes:
- a line-of-sight range acquisition module 601, configured to acquire line-of-sight range information at the time a voice command issuer issues a voice command; and
- a call confirmation module 602, configured to, if the line-of-sight range information at the time the voice command issuer issues the voice command is obtained, confirm, according to that information, whether the issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called.
- With this apparatus, the robot determines, from the line-of-sight range information captured when the voice command issuer issues the voice command, whether the issuer was looking at the robot at that moment, and confirms that it is being called if so.
- In this way, a pronoun can be used as the wake-up word to wake up the robot, without the user having to remember the name of each robot, which improves the user experience.
- FIG. 9 is a schematic structural diagram of another embodiment of the device.
- The apparatus 700 includes:
- a line-of-sight range acquisition module 701, a call confirmation module 702, and a query module 703, where the query module 703 is configured to query the voice command issuer if the line-of-sight range information at the time the voice command issuer issues the voice command is not acquired.
- As shown in FIG. 10, the line-of-sight range acquisition module 801 includes:
- a voice command acquisition sub-module 8011, configured to acquire voice command information, where the voice command information includes time information of the voice command and voice command issuer information;
- a voice command broadcast module 8012, configured to broadcast the voice command information;
- a candidate group joining sub-module 8013, configured to confirm whether there are robots that heard the same voice command and, if so, to join those robots to the same candidate group;
- a line-of-sight range acquisition sub-module 8014, configured to acquire the robot's own visual information and position information matching the time information, and to obtain, from that visual information and position information, the line-of-sight range information of the voice command issuer; and
- a broadcast sub-module 8015, configured to broadcast, within the candidate group, the line-of-sight range information of the voice command issuer at the time the voice command was issued.
- In some embodiments, the voice command information further includes the sound pressure level of the voice command.
- As shown in FIG. 11, the voice command acquisition sub-module 900 includes:
- a voice monitoring sub-unit 901, configured to monitor a voice signal;
- a wake-up word parsing sub-unit 902, configured to parse out the wake-up word in the voice signal;
- a sound pressure level confirmation sub-unit 903, configured to confirm the sound pressure level of the voice signal;
- a first voice command recording sub-unit 904, configured to, if the wake-up word is a pronoun indicating a call, the occurrence of the wake-up word is a call, and the sound pressure level of the voice signal exceeds a preset threshold, record the start time and end time of the voice signal as the time information of the voice command and record the sound pressure level of the voice signal as the sound pressure level of the voice command; and
- a second voice command recording sub-unit 905, configured to identify the issuer of the voice signal from the voice signal and record the issuer as the voice command issuer information.
- In other embodiments, the voice command information further includes the sound pressure level of the voice command,
- and the voice command acquisition sub-module includes:
- a voice monitoring sub-unit, configured to monitor a voice signal;
- a wake-up word parsing sub-unit, configured to parse out the wake-up word in the voice signal;
- a sound pressure level confirmation sub-unit, configured to confirm the sound pressure level of the voice signal;
- a third voice command recording sub-unit, configured to, if the wake-up word is a pronoun indicating a call, the occurrence of the wake-up word is a call, and the sound pressure level of the voice signal exceeds a preset threshold, record the start time and end time of the wake-up word as the time information of the voice command and record the sound pressure level of the voice signal as the sound pressure level of the voice command; and
- a second voice command recording sub-unit, configured to identify the issuer of the voice signal from the voice signal and record the issuer as the voice command issuer information.
- As shown in FIG. 12, the query module 1000 includes:
- a wake-up word confirmation sub-module 1001, configured to confirm whether the wake-up word is a singular pronoun;
- a first query sub-module 1002, configured to, if the wake-up word is a singular pronoun, determine the robot in the candidate group for which the voice command has the highest sound pressure level, and have that robot ask the voice command issuer whether the issuer is calling it;
- a first response sub-module 1003, configured to respond to the voice command issuer if the issuer is calling the robot with the highest sound pressure level;
- a first new line-of-sight range acquisition sub-module 1004, configured to, if the issuer is not calling that robot, acquire, when the voice command issuer issues a new voice command, the line-of-sight range information at the time the new voice command is issued;
- a second query sub-module 1005, configured to, if the wake-up word is not a singular pronoun, determine the robots in the candidate group for which the voice command has the highest and second-highest sound pressure levels, and have those robots ask the voice command issuer whether the issuer is calling only them;
- a second response sub-module 1006, configured to respond to the voice command issuer if the issuer is calling only the robots with the highest and second-highest sound pressure levels;
- a second new line-of-sight range acquisition sub-module 1007, configured to, if the issuer is not calling only those robots, acquire, when the voice command issuer issues a new voice command, the line-of-sight range information at the time the new voice command is issued; and
- a new line-of-sight range broadcast sub-module 1008, configured to broadcast, within the candidate group, the line-of-sight range information at the time the voice command issuer issues the new voice command.
- The apparatus further includes:
- a call re-confirmation module, configured to, if the line-of-sight range information at the time the voice command issuer issues the new voice command is obtained, confirm, according to that information, whether the issuer was looking at the robot when the command was issued, and, if so, confirm that the robot is being called and respond to the voice command issuer.
- As shown in FIG. 13, the line-of-sight range acquisition sub-module 1100 includes:
- a time division sub-unit 1101, configured to divide the time information of the voice command evenly into a plurality of moments;
- a line-of-sight direction confirmation sub-unit 1102, configured to confirm, at each moment, the angle between the face of the voice command issuer and the robot from the visual information at that moment, and to obtain the line-of-sight direction at that moment from the robot's own position information at that moment and the angle; and
- a line-of-sight range acquisition sub-unit 1103, configured to use the obtained line-of-sight directions and the moments corresponding to them as the line-of-sight range information.
- In another embodiment, the line-of-sight range acquisition sub-module includes:
- a time division sub-unit, configured to divide the time information of the voice command evenly into a plurality of moments;
- a line-of-sight direction confirmation sub-unit, configured to confirm, at each moment, the angle between the face of the voice command issuer and the robot from the visual information at that moment, and to obtain the line-of-sight direction at that moment from the robot's own position information at that moment and the angle; and
- a second line-of-sight range acquisition sub-unit, configured to acquire a line-of-sight interval and a time range corresponding to the line-of-sight interval from the obtained line-of-sight directions and their corresponding moments, and to use the line-of-sight interval and the time range as the line-of-sight range information.
- As shown in FIG. 14, the call confirmation module 1200 includes:
- an own-position acquisition sub-module 1201, configured to acquire the robot's own position information for a preset period; and
- a first gaze confirmation sub-module 1202, configured to confirm whether there is a moment in the line-of-sight range information at which the robot's own position information matches the line-of-sight direction and, if so, to confirm that the voice command issuer was looking at the robot.
- In another embodiment, the call confirmation module includes:
- an own-position acquisition sub-module, configured to acquire the robot's own position information for a preset period; and
- a second gaze confirmation sub-module, configured to confirm whether, within the time range of the line-of-sight range information, the robot's own position information falls within the line-of-sight interval and, if so, to confirm that the voice command issuer was looking at the robot.
- The above wake-up apparatus can perform the wake-up method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to performing the method.
- For technical details not described in this embodiment, reference may be made to the wake-up method provided by the embodiments of the present invention.
- FIG. 15 is a schematic diagram of the hardware structure of the robot 20 that performs the robot wake-up method according to an embodiment of the present invention. As shown in FIG. 15, the robot 20 includes:
- one or more processors 21 and a memory 22; one processor 21 is taken as an example in FIG. 15.
- The processor 21 and the memory 22 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 15.
- The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the wake-up method in the embodiments of the present invention (for example, the line-of-sight range acquisition module 601 and the call confirmation module 602 shown in FIG. 8).
- The processor 21 runs the non-volatile software programs, instructions, and modules stored in the memory 22 to perform the various functional applications and data processing of the robot, thereby implementing the wake-up method of the above method embodiments.
- The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the use of the wake-up apparatus, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and such remote memory may be connected to the wake-up apparatus over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- The one or more modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the wake-up method in any of the above method embodiments, for example, method steps 101-102 in FIG. 3, method steps 201 to 203 in FIG. 4, method steps 301 to 305 in FIG. 5, method steps 401 to 408 in FIG. 6, and method steps 501 to 515 in FIG. 7.
- The above product can perform the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to performing the method.
- Embodiments of the present invention further provide a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by one or more processors (for example, the processor 21 in FIG. 15), cause the one or more processors to perform the wake-up method in any of the above method embodiments, for example, to perform method steps 101-102 in FIG. 3, method steps 201 to 203 in FIG. 4, method steps 301 to 305 in FIG. 5, method steps 401 to 408 in FIG. 6, and method steps 501 to 515 in FIG. 7, and to implement the functions of modules 601 and 602 in FIG. 8, modules 701, 702, and 703 in FIG. 9, modules 801-803 and sub-modules 8011-8015 in FIG. 10, sub-units 901-905 in FIG. 11, sub-modules 1001-1008 in FIG. 12, sub-units 1101-1103 in FIG. 13, and sub-modules 1201-1202 in FIG. 14.
- The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
- The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Signal Processing (AREA)
- Manufacturing & Machinery (AREA)
- Automation & Control Theory (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Manipulator (AREA)
- Image Analysis (AREA)
Claims (23)
- 一种机器人唤醒方法,所述唤醒方法应用于机器人,其特征在于,所述方法包括:获取语音命令发布者发布语音命令时的视线范围信息;如果获取到语音命令发布者发布语音命令时的视线范围信息,则根据所述视线范围信息确认语音命令被发布时,语音命令发布者是否注视自己,如果注视自己,则确认自己被呼唤。
- 根据权利要求1所述的方法,其特征在于,所述方法还包括:如果未获取到语音命令发布者发布语音命令时的视线范围信息,则询问所述语音命令发布者。
- 根据权利要求1或2所述的方法,其特征在于,所述获取语音命令发布者发布语音命令时的视线范围信息,包括:获取语音命令信息,所述语音命令信息包括语音命令的时间信息和语音命令发布者信息;广播所述语音命令信息;确认是否存在听到同一语音命令的机器人,如果存在,则使所述机器人加入相同的候选组;获取符合所述时间信息的自身的视觉信息和位置信息,根据自身的视觉信息和位置信息,获取语音命令发布者发布语音命令时的视线范围信息;如果存在语音命令发布者发布语音命令时的视线范围信息,则在候选组内广播所述语音命令发布者发布语音命令时的视线范围信息。
- 根据权利要求3所述的方法,其特征在于,所述语音命令信息还包括语音命令的声压级,所述获取语音命令信息,包括:监听语音信号;解析出所述语音信号中的唤醒词;确认所述语音信号的声压级;如果所述唤醒词的出现为呼唤,且所述语音信号的声压级超过预设阀值,则记录所述语音信号的起始时刻和终止时刻作为语音命令的时间信息,记录所述语音信号的声压级作为语音命令的声压级;根据所述语音信号识别出语音信号发出者,记录所述语音信号发出者作为语音命令发布者信息。
- 根据权利要求3所述的方法,其特征在于,所述语音命令信息还包括语音命令的声压级,所述获取语音命令信息,包括:监听语音信号;解析出所述语音信号中的唤醒词;确认所述语音信号的声压级;如果所述唤醒词的出现为呼唤,且所述语音信号的声压级超过预设阀值,则记录所述唤醒词的起始时刻和终止时刻作为语音命令的时间信息,记录所述语音信号的声压级作为语音命令的声压级;根据所述语音信号识别出语音信号发出者,记录所述语音信号发出者作为语音命令发布者信息。
- 根据权利要求4或5所述的方法,其特征在于,所述唤醒词为表示呼唤的代词;所述询问所述语音命令发布者,包括:确认所述唤醒词是否是表示单数的代词;如果是表示单数的代词,则确认候选组中语音命令的声压级最大的机器人,使声压级最大的机器人询问语音命令发布者是否在呼唤声压级最大的机器人;如果是在呼唤声压级最大的机器人,则响应语音命令发布者;否则,则在语音命令发布者发布新的语音命令的场合,获取语音命令发布者发布新的语音命令时的视线范围信息;如果不是表示单数的代词,则确认候选组中语音命令声压级最大和第二大的机器人,使声压级最大和第二大的机器人询问语音命令发布者是否仅在呼唤声压级最大和第二大的机器人;如果是仅在呼唤声压级最大和第二大的机器人,则响应语音命令发布者;否则,在语音命令发布者发布新的语音命令的场合,获取语音命令发布者发布新的语音命令时的视线范围信息;在候选组内广播所述语音命令发布者发布新的语音命令时的视觉范围信息;所述方法还包括:如果获取到所述语音命令发布者发布新的语音命令时的视觉范围信息,则根据所述视线范围信息确认语音命令被发布时,语音命令发布者是否注视自己,如果注视自己,则确认自己被呼唤,响应语音命令发布者。
- 根据权利要求3-6的任意一项所述的方法,其特征在于,所述根据自身的视觉信息和位置信息,获取语音命令发布者的视线范围信息,包括:将所述语音命令的时间信息均分成多个时刻;在每一时刻,根据这一时刻的视觉信息确认语音命令发布者面部与自身的角度,并根据这一时刻自身的位置信息和所述角度获得这一时刻的视线方向;将获得的视线方向和与所述视线方向对应的时刻作为视线范围信息。
- 根据权利要求3-6的任意一项所述的方法,其特征在于,所述根据自身的视觉信息和位置信息获取语音命令发布者的视线范围信息,包括:将所述语音命令的时间信息均分成多个时刻;在每一时刻,根据这一时刻的视觉信息确认语音命令发布者面部与自身的角度,并根据这一时刻自身的位置信息和所述角度获得这一时刻的视线方向;根据获得的视线方向以及与所述视线方向对应的时刻获取视线区间以及与所述视线区间对应的时间范围,将所述视线区间和所述时间范围作为视线范围信息。
- 根据权利要求7所述的方法,其特征在于,所述根据所述视线范围信息确认语音命令被发布时,语音命令发布者是否注视自己,包括:获取预设时段的自身位置信息;确认是否存在在所述视线范围信息的一个时刻,自身位置信息符合所述视线方向,如果存在,则确认语音命令发布者注视自己。
- 根据权利要求8所述的方法,其特征在于,所述根据所述视线范围信息确认语音命令被发布时,语音命令发布者是否注视自己,包括:获取预设时段的自身位置信息;确认在所述视线范围信息的时间范围内,自身位置信息是否符合所述视线区间,如果符合,则确认语音命令发布者注视自己。
- 一种机器人唤醒装置,所述唤醒装置应用于机器人,其特征在于,所述装置包括:视线范围获取模块,用于获取语音命令发布者发布语音命令时的视线范围信息;呼唤确认模块,用于如果获取到语音命令发布者发布语音命令时的视线范围信息,则根据所述视线范围信息确认语音命令被发布时,语音命令发布者是否注视自己,如果注视自己,则确认自己被呼唤。
- 根据权利要求11所述的装置,其特征在于,所述装置还包括:询问模块,用于如果未获取到语音命令发布者发布语音命令时的视线范围信息,则询问所述语音命令发布者。
- 根据权利要求11或12所述的装置,其特征在于,所述视线范围获取模块包括:语音命令获取子模块,用于获取语音命令信息,所述语音命令信息包括语音命令的时间信息和语音命令发布者信息;语音命令广播模块,用于广播所述语音命令信息;候选组加入子模块,用于确认是否存在听到同一语音命令的机器人,如果存在,则使所述机器人加入相同的候选组视线范围获取子模块,用于获取符合所述时间信息的自身的视觉信息和位置信息,根据自身的视觉信息和位置信息,获取语音命令发布者发布语音命令时的视线范围信息;广播子模块,用于如果存在语音命令发布者发布语音命令时的视线范围信息,则在候选组内广播所述语音命令发布者发布语音命令时的视线范围信息。
- 根据权利要求13所述的装置,其特征在于,所述语音命令信息还包括语音命令的声压级,所述语音命令获取子模块包括:语音监听子单元,用于监听语音信号;唤醒词解析子单元,用于解析出所述语音信号中的唤醒词;声压级确认子单元,用于确认所述语音信号的声压级;第一语音命令记录子单元,用于如果所述唤醒词为表示呼唤的代词且该唤醒词的出现为呼唤,而且所述语音信号的声压级超过预设阀值,则记录所述语音信号的起始时刻和终止时刻作为语音命令的时间信息,记录所述语音信号的声压级作为语音命令的声压级;第二语音命令记录子单元,用于根据所述语音信号识别出语音信号发出者,记录所述语音信号发出者作为语音命令发布者信息。
- 根据权利要求13所述的装置,其特征在于,所述语音命令信息还包括语音命令的声压级,所述语音命令获取子模块包括:语音监听子单元,用于监听语音信号;唤醒词解析子单元,用于解析出所述语音信号中的唤醒词;声压级确认子单元,用于确认所述语音信号的声压级;第三语音命令记录子单元,用于如果所述唤醒词为表示呼唤的代词且该唤醒词的出现为呼唤,而且所述语音信号的声压级超过预设阀值,则记录所述唤醒词的起始时刻和终止时刻作为语音命令的时间信息,记录所述语音信号的声压级作为语音命令的声压级;第二语音命令记录子单元,用于根据所述语音信号识别出语音信号发出者,记录所述语音信号发出者作为语音命令发布者信息。
- 根据权利要求14或15所述的装置,其特征在于,所述询问模块包括:唤醒词确认子模块,用于确认所述唤醒词是否是表示单数的代词;第一询问子模块,用于如果是表示单数的代词,则确认候选组中语音命令的声压级最大的机器人,使声压级最大的机器人询问语音命令发布者是否在呼唤声压级最大的机器人;第一响应子模块,用于如果是在呼唤声压级最大的机器人,则响应语音命令发布者;第一新视线范围获取子模块,用于如果不是在呼唤声压级最大的机器人, 则在语音命令发布者发布新的语音命令的场合,获取语音命令发布者发布新的语音命令时的视线范围信息;第二询问子模块,用于如果不是表示单数的代词,则确认候选组中语音命令声压级最大和第二大的机器人,使声压级最大和第二大的机器人询问语音命令发布者是否仅在呼唤声压级最大和第二大的机器人;第二响应子模块,用于如果是仅在呼唤声压级最大和第二大的机器人,则响应语音命令发布者;第二新视线范围获取子模块,用于如果不是仅在呼唤声压级最大和第二大的机器人,在语音命令发布者发布新的语音命令的场合,获取语音命令发布者发布新的语音命令时的视线范围信息;新视觉范围广播子模块,用于在候选组内广播所述语音命令发布者发布新的语音命令时的视觉范围信息所述装置还包括:呼唤再次确认模块,用于如果获取到所述语音命令发布者发布新的语音命令时的视觉范围信息,则根据所述视线范围信息确认语音命令被发布时,语音命令发布者是否注视自己,如果注视自己,则确认自己被呼唤,响应语音命令发布者。
- The apparatus according to any one of claims 13 to 16, wherein the line-of-sight range acquisition sub-module comprises: a time division sub-unit configured to divide the time information of the voice command evenly into a plurality of moments; a line-of-sight direction confirmation sub-unit configured to confirm, at each moment and from the visual information at that moment, the angle between the face of the voice command issuer and the robot itself, and to obtain the line-of-sight direction at that moment from the robot's own position information at that moment and that angle; and a line-of-sight range acquisition sub-unit configured to take the obtained line-of-sight directions, together with the moments corresponding to them, as the line-of-sight range information.
- The apparatus according to any one of claims 13 to 16, wherein the line-of-sight range acquisition sub-module comprises: a time division sub-unit configured to divide the time information of the voice command evenly into a plurality of moments; a line-of-sight direction confirmation sub-unit configured to confirm, at each moment and from the visual information at that moment, the angle between the face of the voice command issuer and the robot itself, and to obtain the line-of-sight direction at that moment from the robot's own position information at that moment and that angle; and a second line-of-sight range acquisition sub-unit configured to obtain, from the obtained line-of-sight directions and the moments corresponding to them, a line-of-sight interval and the time range corresponding to that interval, and to take the line-of-sight interval and the time range as the line-of-sight range information.
- The apparatus according to claim 17, wherein the call confirmation module comprises: an own-position acquisition sub-module configured to acquire the robot's own position information for a preset period; and a first gaze confirmation sub-module configured to confirm whether there is a moment in the line-of-sight range information at which the robot's own position information matches the line-of-sight direction and, if so, confirm that the voice command issuer was gazing at the robot itself.
- The apparatus according to claim 18, wherein the call confirmation module comprises: an own-position acquisition sub-module configured to acquire the robot's own position information for a preset period; and a second gaze confirmation sub-module configured to confirm whether, within the time range of the line-of-sight range information, the robot's own position information falls within the line-of-sight interval and, if so, confirm that the voice command issuer was gazing at the robot itself.
- A robot, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 10.
- A non-volatile computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a robot, cause the robot to perform the method according to any one of claims 1 to 10.
- A computer program product, comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a robot, cause the robot to perform the method according to any one of claims 1 to 10.
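The candidate-group exchange described in claim 3 above can be pictured with the following minimal sketch. The in-memory "broadcast", the VoiceCommandInfo/Robot classes, and the matching rule (same issuer, overlapping time) are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical in-memory model of claim 3: robots that heard the same command form a
# candidate group and share whatever line-of-sight range information exists in the group.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VoiceCommandInfo:
    issuer: str      # voice command issuer information
    start: float     # start time of the command, in seconds
    end: float       # end time of the command, in seconds

@dataclass
class Robot:
    name: str
    heard: Optional[VoiceCommandInfo] = None   # what this robot heard, if anything
    sight_info: Optional[list] = None          # line-of-sight range info it derived, if any

def form_candidate_group(robots, reference):
    """Robots whose heard command matches the reference (same issuer, overlapping time)
    join one group; every member then receives the group's line-of-sight range info."""
    def same_command(info):
        return (info.issuer == reference.issuer
                and info.start < reference.end and reference.start < info.end)

    group = [r for r in robots if r.heard is not None and same_command(r.heard)]
    shared = [r.sight_info for r in group if r.sight_info is not None]
    for r in group:
        r.received_sight_info = shared          # "broadcast" within the candidate group
    return group

# Usage: two robots heard the command; only robot-a could see the issuer.
cmd = VoiceCommandInfo(issuer="alice", start=1.0, end=2.5)
a = Robot("robot-a", heard=cmd, sight_info=[(1.0, 30.0), (2.0, 35.0)])
b = Robot("robot-b", heard=VoiceCommandInfo("alice", 1.1, 2.4))
group = form_candidate_group([a, b], cmd)
print([r.name for r in group], b.received_sight_info)
```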
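The voice-command acquisition of claims 4 and 5 can be sketched as below, assuming wake-word spotting and speaker identification are available as black boxes: detect_wake_word, identify_speaker, and the 60 dB threshold are placeholders, not values from the patent.

```python
# Sketch of claims 4 and 5: record time info, SPL and issuer only when a wake word
# is heard above a preset sound pressure level threshold.
import math

def sound_pressure_level(samples, reference=20e-6):
    """Crude SPL estimate in dB from raw samples, assuming the samples are in pascals."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12) / reference)

def acquire_voice_command(samples, sample_rate, detect_wake_word, identify_speaker,
                          spl_threshold=60.0):
    """Return voice command information, or None if no loud-enough wake word was heard."""
    hit = detect_wake_word(samples, sample_rate)          # (start_index, end_index) or None
    if hit is None:
        return None
    spl = sound_pressure_level(samples)
    if spl <= spl_threshold:
        return None                                       # too quiet to be treated as a call
    start_idx, end_idx = hit
    return {
        # claim 5 records the wake word's own start/end; claim 4 would use the whole signal
        "time_info": (start_idx / sample_rate, end_idx / sample_rate),
        "spl": spl,
        "issuer": identify_speaker(samples),
    }

# Usage with dummy stand-ins for the black boxes.
fake_audio = [0.05] * 16000
print(acquire_voice_command(fake_audio, 16000,
                            detect_wake_word=lambda s, sr: (2000, 6000),
                            identify_speaker=lambda s: "alice"))
```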
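The fallback query of claim 6 (used when no line-of-sight information is available) reduces to a small arbitration rule, sketched below. The ask_issuer callback stands in for whatever dialogue mechanism the robots actually use and is an assumption of this sketch.

```python
# Sketch of claim 6: the loudest robot (or the two loudest, for a non-singular wake word)
# asks the issuer directly whether it is being called.
def query_issuer(candidate_group, wake_word_is_singular, ask_issuer):
    """candidate_group: list of (robot_name, spl). Returns the robots that should respond;
    an empty list means: wait for a new command and redo the line-of-sight step."""
    ranked = sorted(candidate_group, key=lambda item: item[1], reverse=True)
    if wake_word_is_singular:
        loudest = [ranked[0][0]]
        return loudest if ask_issuer(loudest) else []      # "are you calling me?"
    top_two = [name for name, _ in ranked[:2]]
    return top_two if ask_issuer(top_two) else []          # "are you calling only us two?"

# Usage: three robots heard the command at different sound pressure levels.
group = [("robot-a", 72.0), ("robot-b", 65.5), ("robot-c", 58.0)]
print(query_issuer(group, wake_word_is_singular=True, ask_issuer=lambda names: True))
```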
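The line-of-sight range computation of claims 7 and 8 can be sketched under one plausible geometric reading: the command duration is split evenly into moments, and at each moment the issuer's face angle (from the camera) is combined with the robot's own heading to give a world-frame line-of-sight direction. The face_angle_at and robot_heading_at callbacks are assumed stand-ins for the robot's perception and localization outputs.

```python
# Sketch of claims 7 and 8: per-moment line-of-sight directions, and their collapse
# into an angular interval plus a time range. Angles are in degrees.
def line_of_sight_range(start, end, steps, face_angle_at, robot_heading_at):
    """Claim 7 variant: list of (time, direction) pairs sampled over the command duration."""
    directions = []
    for i in range(steps):
        t = start + (end - start) * i / max(steps - 1, 1)
        direction = (robot_heading_at(t) + face_angle_at(t)) % 360.0
        directions.append((t, direction))
    return directions

def line_of_sight_interval(directions):
    """Claim 8 variant: the angular interval covered by the directions and its time range."""
    angles = [d for _, d in directions]
    times = [t for t, _ in directions]
    return (min(angles), max(angles)), (min(times), max(times))

# Usage with constant dummy estimates.
dirs = line_of_sight_range(1.0, 2.5, 4, face_angle_at=lambda t: 20.0,
                           robot_heading_at=lambda t: 90.0)
print(dirs, line_of_sight_interval(dirs))
```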
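Finally, the gaze check of claims 9 and 10 amounts to comparing the robot's own position against the issuer's line of sight, either moment by moment or over the interval. In the sketch below, own_bearing_at (the robot's bearing as seen from the issuer) and the 10-degree tolerance are illustrative assumptions.

```python
# Sketch of claims 9 and 10: was the robot on the issuer's line of sight when the
# command was issued? Angles are in degrees.
def bearing_matches(own_bearing, sight_direction, tolerance=10.0):
    """True if the robot lies within the tolerance cone around the gaze direction."""
    diff = abs((own_bearing - sight_direction + 180.0) % 360.0 - 180.0)
    return diff <= tolerance

def gazed_at_per_moment(directions, own_bearing_at, tolerance=10.0):
    """Claim 9: match the robot's bearing against the direction at any recorded moment."""
    return any(bearing_matches(own_bearing_at(t), d, tolerance) for t, d in directions)

def gazed_at_over_interval(interval, time_range, own_bearing_at, samples=10, tolerance=10.0):
    """Claim 10: check whether the robot's bearing falls inside the line-of-sight
    interval at some sampled moment of the time range."""
    (lo, hi), (t0, t1) = interval, time_range
    for i in range(samples):
        t = t0 + (t1 - t0) * i / max(samples - 1, 1)
        if lo - tolerance <= own_bearing_at(t) <= hi + tolerance:
            return True
    return False

# Usage with the output shape of the previous sketch and a robot at bearing 112 degrees.
dirs = [(1.0, 110.0), (1.5, 112.0), (2.0, 115.0)]
print(gazed_at_per_moment(dirs, own_bearing_at=lambda t: 112.0))
```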
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019561852A JP6994292B2 (ja) | 2017-05-08 | 2017-05-08 | ロボットのウェイクアップ方法、装置及びロボット |
PCT/CN2017/083424 WO2018205083A1 (zh) | 2017-05-08 | 2017-05-08 | 机器人唤醒方法、装置和机器人 |
CN201780000608.6A CN108235745B (zh) | 2017-05-08 | 2017-05-08 | 机器人唤醒方法、装置和机器人 |
US16/678,267 US11276402B2 (en) | 2017-05-08 | 2019-11-08 | Method for waking up robot and robot thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/083424 WO2018205083A1 (zh) | 2017-05-08 | 2017-05-08 | 机器人唤醒方法、装置和机器人 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/678,267 Continuation US11276402B2 (en) | 2017-05-08 | 2019-11-08 | Method for waking up robot and robot thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018205083A1 true WO2018205083A1 (zh) | 2018-11-15 |
Family
ID=62643181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/083424 WO2018205083A1 (zh) | 2017-05-08 | 2017-05-08 | 机器人唤醒方法、装置和机器人 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11276402B2 (zh) |
JP (1) | JP6994292B2 (zh) |
CN (1) | CN108235745B (zh) |
WO (1) | WO2018205083A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109945407A (zh) * | 2019-03-13 | 2019-06-28 | 青岛海尔空调器有限总公司 | 空调器 |
CN110928583A (zh) * | 2019-10-10 | 2020-03-27 | 珠海格力电器股份有限公司 | 一种终端唤醒方法、装置、设备和计算机可读存储介质 |
WO2021076164A1 (en) * | 2019-10-15 | 2021-04-22 | Google Llc | Detection and/or enrollment of hot commands to trigger responsive action by automated assistant |
CN113359538A (zh) * | 2020-03-05 | 2021-09-07 | 东元电机股份有限公司 | 语音控制机器人 |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10621983B2 (en) * | 2018-04-20 | 2020-04-14 | Spotify Ab | Systems and methods for enhancing responsiveness to utterances having detectable emotion |
KR102628211B1 (ko) * | 2018-08-29 | 2024-01-23 | 삼성전자주식회사 | 전자 장치 및 그 제어 방법 |
CN109065060B (zh) * | 2018-10-23 | 2021-05-07 | 维沃移动通信有限公司 | 一种语音唤醒方法及终端 |
CN109358751A (zh) * | 2018-10-23 | 2019-02-19 | 北京猎户星空科技有限公司 | 一种机器人的唤醒控制方法、装置及设备 |
CN110164433A (zh) * | 2019-04-03 | 2019-08-23 | 美国乐歌有限公司 | 一种用于升降立柱的语音控制系统及方法 |
US11482217B2 (en) * | 2019-05-06 | 2022-10-25 | Google Llc | Selectively activating on-device speech recognition, and using recognized text in selectively activating on-device NLU and/or on-device fulfillment |
CN110737335B (zh) * | 2019-10-11 | 2021-03-23 | 深圳追一科技有限公司 | 机器人的交互方法、装置、电子设备及存储介质 |
CN113032017B (zh) * | 2019-12-25 | 2024-02-02 | 大众问问(北京)信息科技有限公司 | 一种设备唤醒方法、装置及电子设备 |
CN111443801B (zh) * | 2020-03-25 | 2023-10-13 | 北京百度网讯科技有限公司 | 人机交互方法、装置、设备及存储介质 |
CN112786044A (zh) * | 2020-12-30 | 2021-05-11 | 乐聚(深圳)机器人技术有限公司 | 语音控制方法、装置、主控制器、机器人及存储介质 |
US11934203B2 (en) * | 2021-05-06 | 2024-03-19 | Bear Robotics, Inc. | Method, system, and non-transitory computer-readable recording medium for controlling a robot |
US20230081605A1 (en) * | 2021-09-16 | 2023-03-16 | Apple Inc. | Digital assistant for moving and copying graphical elements |
CN113814981B (zh) * | 2021-10-18 | 2023-06-20 | 北京云迹科技股份有限公司 | 机器人运行方法、装置、存储介质和机器人 |
CN114227698B (zh) * | 2022-01-27 | 2024-04-26 | 上海擎朗智能科技有限公司 | 一种机器人的控制方法、装置、设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1423228A (zh) * | 2002-10-17 | 2003-06-11 | 南开大学 | 识别人眼注视方向的装置和方法及其应用 |
KR20080019834A (ko) * | 2006-08-29 | 2008-03-05 | (주)제이투디자인 | 로봇을 이용한 음성 경보 시스템 및 방법 |
CN106203259A (zh) * | 2016-06-27 | 2016-12-07 | 旗瀚科技股份有限公司 | 机器人的交互方向调整方法及装置 |
CN106292732A (zh) * | 2015-06-10 | 2017-01-04 | 上海元趣信息技术有限公司 | 基于声源定位和人脸检测的智能机器人转动方法 |
CN106448663A (zh) * | 2016-10-17 | 2017-02-22 | 海信集团有限公司 | 语音唤醒方法及语音交互装置 |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6721706B1 (en) * | 2000-10-30 | 2004-04-13 | Koninklijke Philips Electronics N.V. | Environment-responsive user interface/entertainment device that simulates personal interaction |
EP1215658A3 (en) * | 2000-12-05 | 2002-08-14 | Hewlett-Packard Company | Visual activation of voice controlled apparatus |
KR20070029794A (ko) * | 2004-07-08 | 2007-03-14 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 유저와 시스템 사이에 통신을 하기 위한 방법 및 시스템 |
JP4204541B2 (ja) * | 2004-12-24 | 2009-01-07 | 株式会社東芝 | 対話型ロボット、対話型ロボットの音声認識方法および対話型ロボットの音声認識プログラム |
FR2963132A1 (fr) * | 2010-07-23 | 2012-01-27 | Aldebaran Robotics | Robot humanoide dote d'une interface de dialogue naturel, methode d'utilisation et de programmation de ladite interface |
US20150109191A1 (en) * | 2012-02-16 | 2015-04-23 | Google Inc. | Speech Recognition |
US20130238326A1 (en) * | 2012-03-08 | 2013-09-12 | Lg Electronics Inc. | Apparatus and method for multiple device voice control |
US9823742B2 (en) | 2012-05-18 | 2017-11-21 | Microsoft Technology Licensing, Llc | Interaction and management of devices using gaze detection |
US9143880B2 (en) * | 2013-08-23 | 2015-09-22 | Tobii Ab | Systems and methods for providing audio to a user based on gaze input |
US10430150B2 (en) * | 2013-08-23 | 2019-10-01 | Tobii Ab | Systems and methods for changing behavior of computer program elements based on gaze input |
US10317992B2 (en) * | 2014-09-25 | 2019-06-11 | Microsoft Technology Licensing, Llc | Eye gaze for spoken language understanding in multi-modal conversational interactions |
US9318107B1 (en) | 2014-10-09 | 2016-04-19 | Google Inc. | Hotword detection on multiple devices |
KR102387567B1 (ko) * | 2015-01-19 | 2022-04-18 | 삼성전자주식회사 | 음성 인식 방법 및 음성 인식 장치 |
US9652035B2 (en) * | 2015-02-23 | 2017-05-16 | International Business Machines Corporation | Interfacing via heads-up display using eye contact |
US20170262051A1 (en) * | 2015-03-20 | 2017-09-14 | The Eye Tribe | Method for refining control by combining eye tracking and voice recognition |
JP6739907B2 (ja) | 2015-06-18 | 2020-08-12 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 機器特定方法、機器特定装置及びプログラム |
CN105204628A (zh) * | 2015-09-01 | 2015-12-30 | 涂悦 | 一种基于视觉唤醒的语音控制方法 |
JP6447578B2 (ja) * | 2016-05-27 | 2019-01-09 | トヨタ自動車株式会社 | 音声対話装置および音声対話方法 |
CN106155326A (zh) * | 2016-07-26 | 2016-11-23 | 北京小米移动软件有限公司 | 虚拟现实通讯中的对象识别方法和装置、虚拟现实设备 |
US10534429B2 (en) * | 2017-01-10 | 2020-01-14 | International Business Machines Corporation | Method of instant sharing invoked from wearable devices |
2017
- 2017-05-08 JP JP2019561852A patent/JP6994292B2/ja active Active
- 2017-05-08 CN CN201780000608.6A patent/CN108235745B/zh active Active
- 2017-05-08 WO PCT/CN2017/083424 patent/WO2018205083A1/zh active Application Filing

2019
- 2019-11-08 US US16/678,267 patent/US11276402B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1423228A (zh) * | 2002-10-17 | 2003-06-11 | 南开大学 | 识别人眼注视方向的装置和方法及其应用 |
KR20080019834A (ko) * | 2006-08-29 | 2008-03-05 | (주)제이투디자인 | 로봇을 이용한 음성 경보 시스템 및 방법 |
CN106292732A (zh) * | 2015-06-10 | 2017-01-04 | 上海元趣信息技术有限公司 | 基于声源定位和人脸检测的智能机器人转动方法 |
CN106203259A (zh) * | 2016-06-27 | 2016-12-07 | 旗瀚科技股份有限公司 | 机器人的交互方向调整方法及装置 |
CN106448663A (zh) * | 2016-10-17 | 2017-02-22 | 海信集团有限公司 | 语音唤醒方法及语音交互装置 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109945407A (zh) * | 2019-03-13 | 2019-06-28 | 青岛海尔空调器有限总公司 | 空调器 |
CN110928583A (zh) * | 2019-10-10 | 2020-03-27 | 珠海格力电器股份有限公司 | 一种终端唤醒方法、装置、设备和计算机可读存储介质 |
WO2021076164A1 (en) * | 2019-10-15 | 2021-04-22 | Google Llc | Detection and/or enrollment of hot commands to trigger responsive action by automated assistant |
US11948556B2 (en) | 2019-10-15 | 2024-04-02 | Google Llc | Detection and/or enrollment of hot commands to trigger responsive action by automated assistant |
CN113359538A (zh) * | 2020-03-05 | 2021-09-07 | 东元电机股份有限公司 | 语音控制机器人 |
Also Published As
Publication number | Publication date |
---|---|
JP6994292B2 (ja) | 2022-01-14 |
CN108235745B (zh) | 2021-01-08 |
CN108235745A (zh) | 2018-06-29 |
US20200090653A1 (en) | 2020-03-19 |
US11276402B2 (en) | 2022-03-15 |
JP2020521997A (ja) | 2020-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018205083A1 (zh) | 机器人唤醒方法、装置和机器人 | |
US20220012470A1 (en) | Multi-user intelligent assistance | |
CN107223280B (zh) | 机器人唤醒方法、装置和机器人 | |
EP3611724A1 (en) | Voice response method and device, and smart device | |
WO2019007245A1 (zh) | 一种处理方法、控制方法、识别方法及其装置和电子设备 | |
CN109473092B (zh) | 一种语音端点检测方法及装置 | |
EP3896691A1 (en) | Speech interaction method, device and system | |
US20200374630A1 (en) | Human-machine interaction method and device, computer apparatus, and storage medium | |
WO2017059815A1 (zh) | 一种快速识别方法及家庭智能机器人 | |
CN107390851A (zh) | 支持准始终聆听的智能聆听模式 | |
WO2019227370A1 (zh) | 一种多语音助手控制方法、装置、系统及计算机可读存储介质 | |
CN110705356B (zh) | 功能控制方法及相关设备 | |
CN104078045A (zh) | 一种识别的方法及电子设备 | |
CN111251307A (zh) | 应用于机器人的语音采集方法和装置、一种机器人 | |
CN113593544A (zh) | 设备的控制方法和装置、存储介质及电子装置 | |
CN111339881A (zh) | 基于情绪识别的宝宝成长监护方法及系统 | |
WO2022089131A1 (zh) | 智能终端自动回信方法、装置、计算机设备和存储介质 | |
CN113138559A (zh) | 设备交互方法、装置、电子设备及存储介质 | |
WO2018014265A1 (zh) | 一种智能加湿器控制系统 | |
CN110958348B (zh) | 语音处理方法、装置、用户设备及智能音箱 | |
CN111354144A (zh) | 安防预警系统、方法、设备及存储介质 | |
US11804213B2 (en) | Systems and methods for training a control system based on prior audio inputs | |
CN112837694B (zh) | 设备唤醒方法、装置、存储介质及电子装置 | |
KR102134860B1 (ko) | 인공지능 스피커 및 이의 비언어적 요소 기반 동작 활성화 방법 | |
CN112820273A (zh) | 唤醒判别方法和装置、存储介质及电子设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17909066 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019561852 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.04.2020) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17909066 Country of ref document: EP Kind code of ref document: A1 |