CN117021083A - Robot, control method and device thereof, and storage medium - Google Patents

Robot, control method and device thereof, and storage medium

Info

Publication number
CN117021083A
Authority
CN
China
Prior art keywords
robot
text information
capability
information
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311002234.5A
Other languages
Chinese (zh)
Inventor
尚子涵
杜坤
刘凯
文林风
丁松
易鹏
白忠星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Robot Technology Co ltd
Original Assignee
Beijing Xiaomi Robot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Robot Technology Co ltd filed Critical Beijing Xiaomi Robot Technology Co ltd
Priority to CN202311002234.5A
Publication of CN117021083A
Legal status: Pending

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/003Controls for manipulators by means of an audio-responsive input
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The disclosure relates to the technical field of robots, and in particular provides a robot, a control method and device thereof, and a storage medium. A robot control method includes: in response to detecting a voice instruction of a target user, converting the voice instruction into incremental text information in text form; performing semantic recognition according to the incremental text information and cached historical text information of the target user to obtain semantic information corresponding to the voice instruction; invoking the capability set of the robot based on the semantic information to generate a target control task; and controlling the robot to run according to the target control task. In the embodiments of the disclosure, voice programming reduces the difficulty and complexity of programmed robot control; semantic recognition of user voice instructions allows control tasks to be composed freely from instructions, making robot control more flexible; and context association with historical instructions allows ambiguous user instructions to be recognized, improving the control effect of the robot.

Description

Robot, control method and device thereof, and storage medium
Technical Field
The disclosure relates to the technical field of robots, and in particular relates to a robot, a control method and device thereof, and a storage medium.
Background
With the development of robot technology, more and more research is being conducted on bionic robots, which have excellent motion balancing capability and include, for example, bipedal robots and quadruped robots. In the related art, behavior control of a bionic robot requires professional personnel to program tasks; the technical threshold is high and the programming operation is complex.
Disclosure of Invention
In order to improve the robot control effect, embodiments of the disclosure provide a robot control method, a robot control device, a robot, and a storage medium.
In a first aspect, embodiments of the present disclosure provide a robot control method applied to a robot, the method including:
in response to detecting a voice instruction of a target user, converting the voice instruction into incremental text information in text form;
performing semantic recognition according to the incremental text information and cached historical text information of the target user to obtain semantic information corresponding to the voice instruction;
invoking one or more capability modules in a capability set of the robot based on the semantic information, and generating a target control task corresponding to the voice instruction;
and controlling the robot to run according to the target control task.
In some embodiments, the converting the voice instruction into incremental text information in text form in response to detecting the voice instruction of the target user includes:
in response to detecting a voice instruction, extracting first voiceprint information of the voice instruction;
matching the first voiceprint information with prestored second voiceprint information of the target user, and determining that the voice instruction of the target user is detected if the matching succeeds;
and performing text conversion on the voice instruction to obtain the incremental text information.
In some embodiments, the performing semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain the semantic information corresponding to the voice instruction includes:
obtaining full text information according to the incremental text information and a preset number of pieces of cached historical text information preceding the incremental text information;
and performing semantic recognition on the full text information to obtain the semantic information corresponding to the voice instruction.
In some embodiments, the obtaining the full text information according to the incremental text information and the preset number of pieces of cached historical text information preceding it includes:
caching the incremental text information at the tail of an instruction list corresponding to the target user;
acquiring, from the instruction list, the incremental text information and the preset number of pieces of historical text information preceding it;
and performing text splicing on the incremental text information and the preceding historical text information to obtain the full text information.
In some embodiments, the invoking one or more capability modules in the capability set of the robot based on the semantic information and generating the target control task corresponding to the voice instruction includes:
determining, based on the semantic information, one or more target capability modules corresponding to the voice instruction from the capability set of the robot;
and invoking the target capability modules to perform code programming based on a preset code format to obtain the target control task.
In some embodiments, the semantic recognition according to the incremental text information and the cached historical text information of the target user is performed when the voice instruction is determined, according to the incremental text information, to be a non-built-in instruction of the robot.
In some embodiments, the method further comprises:
when the voice instruction is determined, according to the incremental text information, to be a built-in instruction of the robot, determining the target control task corresponding to the voice instruction according to the mapping relation of the built-in instruction.
In some embodiments, the process of pre-establishing the capability set of the robot comprises:
classifying the basic capability of the robot according to preset categories to obtain the capability modules corresponding to each category;
and constructing and obtaining the capability set according to the capability modules corresponding to the categories.
In some embodiments, the classifying the basic capabilities of the robot according to preset categories to obtain the capability module corresponding to each category includes:
for each category, constructing one or more capability sub-modules according to at least one basic capability corresponding to the category;
and obtaining the capability module corresponding to the category according to the basic capabilities and the capability sub-modules.
In a second aspect, embodiments of the present disclosure provide a robot control device applied to a robot, the device including:
a text conversion module configured to, in response to detecting a voice instruction of a target user, convert the voice instruction into incremental text information in text form;
a semantic recognition module configured to perform semantic recognition according to the incremental text information and cached historical text information of the target user to obtain semantic information corresponding to the voice instruction;
a task programming module configured to invoke one or more capability modules in the capability set of the robot based on the semantic information and generate a target control task corresponding to the voice instruction;
and a task running module configured to control the robot to run according to the target control task.
In some embodiments, the text conversion module is configured to:
in response to detecting a voice instruction, extract first voiceprint information of the voice instruction;
match the first voiceprint information with prestored second voiceprint information of the target user, and determine that the voice instruction of the target user is detected if the matching succeeds;
and perform text conversion on the voice instruction to obtain the incremental text information.
In some embodiments, the semantic recognition module is configured to:
obtain full text information according to the incremental text information and a preset number of pieces of cached historical text information preceding the incremental text information;
and perform semantic recognition on the full text information to obtain the semantic information corresponding to the voice instruction.
In some embodiments, the semantic recognition module is configured to:
cache the incremental text information at the tail of an instruction list corresponding to the target user;
acquire, from the instruction list, the incremental text information and the preset number of pieces of historical text information preceding it;
and perform text splicing on the incremental text information and the preceding historical text information to obtain the full text information.
In some embodiments, the task programming module is configured to:
determine, based on the semantic information, one or more target capability modules corresponding to the voice instruction from the capability set of the robot;
and invoke the target capability modules to perform code programming based on a preset code format to obtain the target control task.
In some embodiments, the semantic recognition module is configured to:
perform the semantic recognition according to the incremental text information and the cached historical text information of the target user when the voice instruction is determined, according to the incremental text information, to be a non-built-in instruction of the robot.
In some embodiments, the semantic recognition module is configured to:
when the voice instruction is determined, according to the incremental text information, to be a built-in instruction of the robot, determine the target control task corresponding to the voice instruction according to the mapping relation of the built-in instruction.
In some embodiments, the task programming module is configured to:
classify the basic capabilities of the robot according to preset categories to obtain the capability module corresponding to each category;
and construct the capability set according to the capability modules corresponding to the categories.
In some embodiments, the task programming module is configured to:
for each category, construct one or more capability sub-modules according to at least one basic capability corresponding to the category;
and obtain the capability module corresponding to the category according to the basic capabilities and the capability sub-modules.
In a third aspect, embodiments of the present disclosure provide a robot comprising:
a processor;
a memory storing computer instructions for causing the processor to perform the method according to any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a storage medium storing computer instructions for causing a computer to perform the method according to any embodiment of the first aspect.
The robot control method of the embodiments of the disclosure includes: in response to detecting a voice instruction of a target user, converting the voice instruction into incremental text information in text form; performing semantic recognition according to the incremental text information and cached historical text information of the target user to obtain semantic information corresponding to the voice instruction; invoking the capability set of the robot based on the semantic information to generate a target control task; and controlling the robot to run according to the target control task. In the embodiments of the disclosure, voice programming reduces the difficulty and complexity of programmed robot control; semantic recognition of user voice instructions allows control tasks to be composed freely from instructions, making robot control more flexible; and context association with historical instructions allows ambiguous user instructions to be recognized, improving the control effect of the robot.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings required in the detailed description are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present disclosure, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of a scenario of a robot control method according to some embodiments of the present disclosure.
Fig. 2 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 3 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 4 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 5 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 6 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 7 is a schematic diagram of a robot control method according to some embodiments of the present disclosure.
Fig. 8 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 9 is a flow chart of a robot control method in accordance with some embodiments of the present disclosure.
Fig. 10 is a block diagram of a robot control device according to some embodiments of the present disclosure.
Fig. 11 is a block diagram of a robot in accordance with some embodiments of the present disclosure.
Detailed Description
The following describes the embodiments of the present disclosure clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure, are intended to be within the scope of this disclosure. In addition, technical features in the different embodiments of the present disclosure described below may be combined with each other as long as they do not conflict.
The bionic robot has excellent motion balance capability and rich operability, so research on the bionic robot is one of important directions in the field of robots.
For example, a biped robot is a humanoid robot: its lower limbs can swing like a human's to realize operations such as walking, running, squatting, and jumping, and its upper limbs can imitate human arms to realize operations such as arm swinging and grabbing. As another example, a quadruped robot imitates four-legged animals: its four limbs can swing like an animal's to realize operations such as walking, running, and jumping, and it has good motion balance capability.
Both biped and quadruped robots are bionic robots, and behavior control of bionic robots has always been a key focus and difficulty of robot research.
In the prior art, behavior control of a bionic robot mainly relies on a worker performing task programming and then running the robot according to the programmed task. Taking a robot demonstration scenario as an example, to demonstrate the motion capability or other capabilities of the robot, a worker needs to perform task programming on site. Task programming combines a series of robot behaviors, based on the robot's basic capabilities, into a programmed task, which the robot then executes to realize the corresponding behavior control.
The capabilities of a robot are the behavior functions provided by the robot itself; functions such as articulation, lights, navigation, radar, and voice may all be referred to as capabilities of the robot. A basic capability is the minimum unit module of robot behavior that cannot be split further, also called an atomic module of the robot.
Each basic capability of the robot is an independent program module built into the robot. When a worker performs task programming, the one or more basic capabilities required by the task can be called through the API (Application Programming Interface) provided by each basic capability, so as to realize robot behavior control.
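For illustration, the following minimal sketch shows what such manual task programming might look like; the class and method names are hypothetical stand-ins, since the disclosure does not name concrete APIs:

    # Hypothetical sketch of conventional task programming: a worker hand-writes
    # a task by calling the API exposed by each basic capability.
    class MotionAPI:                      # stands in for a basic-capability API
        def walk(self, distance_m: float):
            print(f"walk {distance_m} m")
        def squat(self):
            print("squat")

    def demo_task(motion: MotionAPI):
        # every behavior must be sequenced by hand, call by call
        motion.walk(distance_m=10.0)      # basic capability: locomotion
        motion.squat()                    # basic capability: squatting

    demo_task(MotionAPI())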
It can be seen that the technical threshold of this traditional programmed robot control is high and the programming operation is complex, so the robot offers low playability and practicability and is difficult to deploy in real applications.
In the field of software development, voice programming is an emerging approach in which program code is produced from voice instructions. For robot control, voice programming can effectively lower the programming threshold of robot control code: the user only needs to issue a voice instruction, and the robot autonomously completes the corresponding task programming.
In the related art, the conventional voice programming scheme is implemented through command mapping: a developer presets, inside the robot, control codes corresponding to its different capabilities and establishes a mapping relation between text commands and control codes. After the robot collects a user's voice command, it obtains a text command through speech conversion, looks up the control code corresponding to that text command via the built-in mapping relation to realize voice programming, and completes voice control of the robot by executing the control code.
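As a rough sketch of such a command-mapping scheme (all names are hypothetical; the disclosure does not specify the mapping structure):

    # Hypothetical sketch of the command-mapping voice programming scheme.
    def walk_forward():  # stands in for a preset control code
        print("executing built-in 'forward' control code")

    def squat():         # stands in for a preset control code
        print("executing built-in 'down' control code")

    COMMAND_MAP = {"forward": walk_forward, "down": squat}

    def on_voice_command(text_command: str):
        # text_command is assumed to come from a speech-to-text step
        action = COMMAND_MAP.get(text_command)  # look up the mapped control code
        if action is not None:
            action()                            # execute the control code
        else:
            print("unrecognized command")       # no mapping entry exists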
However, the control instructions available in this mode are very limited; often only basic instructions such as "forward" and "down" can be realized. Because the built-in instructions are program instructions written in advance and built into the robot, the robot can only provide behaviors in fixed patterns and cannot combine behaviors freely according to user instructions, so flexibility is insufficient and the control effect is poor.
To address these defects in the related art, embodiments of the present disclosure provide a robot control method and device, a robot, and a storage medium. They aim to reduce the difficulty and complexity of programmed robot control through voice programming; through semantic recognition of user voice instructions, control tasks can be composed freely from instructions, making robot control more flexible; and through context association with historical instructions, ambiguous user instructions can be recognized, improving the control effect of the robot.
Fig. 1 shows a schematic view of an application scenario of a robot according to some embodiments of the present disclosure, and is described below with reference to fig. 1.
As shown in fig. 1, in a scenario of a robot and a control method thereof of an example of the present disclosure, a user 100 and a robot 200 are included.
The user 100 is the party who issues voice instructions. In the embodiments of the present disclosure, the user 100 is not limited to professionals with programming skills; by means of voice programming, a person without any expert knowledge can also realize complex control of the robot 200.
The robot 200 is the target object of behavior control. Robot behavior in the embodiments of the present disclosure means any state change occurring on the robot; for example, it may include movement of the robot, as well as its voice, vibration, light effects, etc., which the present disclosure does not limit. The robot 200 may be any type of robot, such as a biped robot or a quadruped robot, which the present disclosure likewise does not limit.
On the basis of the above-described scenario, a robot control method according to an embodiment of the present disclosure will be described below with reference to fig. 2.
In some embodiments, the disclosed embodiments provide a robot control method, which may be applied to the aforementioned robot 200, the control method including:
s210, responding to the detection of the voice instruction of the target user, and converting the voice instruction into incremental text information in a text form.
In the embodiments of the disclosure, the robot is capable of picking up environmental sound; for example, it may be equipped with one or more microphones. When a user issues a voice instruction, the robot picks up the environmental sound and can extract the user's voice instruction from it.
It should be noted that, in the embodiments of the present disclosure, the robot may perform voiceprint recognition on the picked-up voice instruction to determine the identity of the user who issued it. For example, as shown in fig. 1, when user A issues a voice instruction, the robot picks it up and performs voiceprint recognition, thereby determining that the user identity corresponding to the voice instruction is "user A".
In the embodiments of the disclosure, a target user is a user who has control authority over the robot, for example a user whose identity information has been entered into the robot in advance; a target user may be an administrator, owner, developer, or visitor of the robot, etc.
In some embodiments, after the robot detects a voice instruction issued by a user, it can extract the voiceprint information of the voice instruction and match it against the pre-entered voiceprint information of target users; if the matching passes, the detected voice instruction is determined to be issued by a target user.
When the voice instruction is determined to be issued by the target user, the instruction information corresponding to the voice instruction can be converted into incremental text information in text form through speech-to-text technology. Specific speech-to-text algorithms are well known to those skilled in the art and are not repeated in this disclosure.
It should be noted that, considering human language habits, a user performing voice control of a robot may issue several successive voice instructions that depend on context. For example, in a first control process the user's voice instruction 1 is "You walk 10 meters forward", and after the robot performs the corresponding action, the user's voice instruction 2 in the second control is "Forget it, go back".
In a coherent human dialog, a person obviously combines the context to understand that "go back" in voice instruction 2 means "the robot moves back 10 meters". For a robot voice programming task, however, the robot has no context-association capability for instructions; from voice instruction 2 alone it cannot recognize the user's real intention, and it may misrecognize the instruction as some other semantics and execute the wrong action, or simply report an error because the voice instruction cannot be recognized.
Therefore, in the embodiments of the present disclosure, each time a voice instruction of the target user is detected and converted to text, the resulting text is defined as incremental text information, that is, the text newly added in the current control process. Defining the text of the current voice instruction as incremental text information allows it to be combined with the historical text information of previously issued voice instructions, so that semantic recognition of the voice instruction is performed in combination with its context and the correct semantics are obtained.
S220, carrying out semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice instruction.
In the embodiments of the disclosure, each time the target user issues a voice instruction and it is converted to text, the robot caches that text in time order. Thus, for a given control process, the text converted from the current voice instruction is the incremental text information, while the text converted from earlier voice instructions is the historical text information.
For example, continuing the previous example, in the first control process the text converted from the voice instruction is "You walk 10 meters forward", and the robot caches this text in memory while completing the task programming and executing the task.
In the second control process, the text converted from the voice instruction is "Forget it, go back". At this moment, the previously cached text "You walk 10 meters forward" is the historical text information, and the newly added text "Forget it, go back" is the incremental text information.
In the embodiments of the disclosure, full text information can be assembled from the historical text information and the incremental text information. For example, combining the historical text information "You walk 10 meters forward" with the incremental text information "Forget it, go back" in time order yields the full text information "You walk 10 meters forward; forget it, go back".
Semantic recognition can then be performed on the full text information. Semantic recognition is an NLP (Natural Language Processing) technology in which a computer simulates human language understanding, effectively recognizing the meaning contained in each sentence and analyzing the user's real intention. In the related art, semantic recognition techniques include, for example, semantic extraction based on grammar rules and semantic extraction based on language models generated by deep neural networks, which the present disclosure does not limit.
Continuing the previous example, semantic recognition of the full text information "You walk 10 meters forward; forget it, go back" determines that the user intention of voice instruction 2 is "move back 10 meters", that is, the semantic information corresponding to the voice instruction is "move back 10 meters".
S230, invoking one or more capability modules in the capability set of the robot based on the semantic information, and generating a target control task corresponding to the voice instruction.
In the embodiment of the disclosure, each basic capability of the robot can be integrated, classified and packaged in advance to obtain a capability set of the robot. The capability set may include a plurality of different capability modules according to capability categories, each capability module including at least one underlying capability.
In some embodiments, a plurality of preset categories may be divided in advance according to different capabilities of the robot, then all basic capabilities of the robot are classified and processed into each preset category, then corresponding capability modules are constructed according to the basic capabilities included in each preset category, and finally the capability modules corresponding to all the categories are integrated to obtain a capability set of the robot.
For instance, in one example, the capability set described in this disclosure includes a "Motion" capability module that contains all of the basic capabilities for controlling the motion of the robot's joints, for example lumbar joint, hip joint, knee joint, and ankle joint motion capabilities. When programming a robot motion task, the basic capabilities in the "Motion" capability module can be invoked and freely combined through the API interface provided by the capability set, thereby realizing motion control of the robot.
The process of pre-constructing the robot capability set is described in later embodiments of the present disclosure and is not detailed here.
After the semantic information corresponding to the voice instruction is obtained through the foregoing process, it can be used as the request instruction for robot task programming. It will be appreciated that task programming requires invoking one or more basic capabilities of the robot; in the disclosed embodiments these basic capabilities are packaged by category into the individual capability modules of the capability set, and the capability set provides APIs for invoking the individual capability modules.
Therefore, in the embodiments of the disclosure, the one or more capability modules that task programming needs to invoke, namely the target capability modules of this disclosure, can be determined according to the semantic information; the target capability modules can then be invoked through the APIs provided by the capability set to perform task programming, generating the robot control task corresponding to the semantic information, namely the target control task of this disclosure.
For example, in the foregoing example the semantic information concerns robot movement, so the task to be programmed is a movement control task. The robot determines the "Motion" capability module as the target capability module according to the semantic information, then invokes the basic capabilities in the "Motion" capability module according to the specific task parameters of the semantic information to complete programming of the movement task; the resulting movement task is the target control task described in this disclosure.
S240, controlling the robot to run according to the target control task.
In the embodiment of the disclosure, after the target control task is obtained by programming, the robot can execute the target control task, so that the robot behavior control is realized.
It should be noted that, in some embodiments, after obtaining the target control task the robot may execute it immediately, or may store it in the robot's task list to be invoked and run by the user later, which the present disclosure does not limit.
As can be seen from the foregoing example, in the robot behavior control scenario the user only needs to issue a voice instruction to control the robot's behavior, without mastering any professional programming skills, which greatly reduces the difficulty and complexity of robot control.
In addition, the robot performs semantic recognition on the user's voice instructions, so the user can achieve richer robot behavior control through natural-language communication with the robot. Compared with the traditional command-mapping scheme, the method in the embodiments of the disclosure does not require the user to speak rigidly specified voice commands, better matches natural language habits, is not limited to simple built-in instructions, and can realize more complex robot behavior control according to the semantic information.
In addition, combining the incremental voice instruction with historical voice instructions through a caching mechanism provides context-association capability for voice instructions, enabling accurate recognition of the user's ambiguous instructions. For example, in the previous example the robot accurately recognizes the real intention behind the ambiguous instruction "go back" by combining it with the historical voice instruction, giving extremely high control flexibility and a strong control effect.
In summary, in the embodiments of the disclosure, voice programming reduces the difficulty and complexity of programmed robot control; semantic recognition of user voice instructions allows control tasks to be composed freely from instructions, making robot control more flexible; and context association with historical instructions allows ambiguous user instructions to be recognized, improving the robot control effect.
As shown in fig. 3, in some embodiments of the robot control method of the present disclosure, the process of converting a detected voice instruction of a target user into incremental text information in text form includes:
S211, in response to detecting a voice instruction, extracting first voiceprint information of the voice instruction.
S212, matching the first voiceprint information with the prestored second voiceprint information of the target user, and determining that the voice instruction of the target user is detected under the condition that the matching is successful.
S213, performing text conversion on the voice instruction to obtain incremental text information.
Voiceprint recognition is one of the biometric technologies; a voiceprint is the spectrum of a person's sound waves and is unique to that person. Therefore, in the embodiments of the disclosure, the robot may enter in advance the voiceprint information of one or more users who have control authority over the robot; this voiceprint information is the second voiceprint information.
For example, in the example scenario of fig. 1, user A and user B may each enter their second voiceprint information into the robot in advance, and the robot stores each user identity (for example, a user ID) together with the corresponding second voiceprint information.
In the robot control scenario, when the robot detects environmental sound and extracts the voice instruction carried in it, voiceprint extraction technology can be used to extract the first voiceprint information corresponding to the voice instruction.
And then matching the first voiceprint information with one or more pieces of prestored second voiceprint information one by one, and if the first voiceprint information is successfully matched with a certain piece of stored second voiceprint information, indicating that the user sending the voice instruction corresponding to the first voiceprint information is the target user corresponding to the second voiceprint information.
For example, in the example of fig. 1, the robot detects a voice command sent by the user a, and extracts first voiceprint information. And then, the first voiceprint information is matched with the prestored second voiceprint information of the user A and the prestored second voiceprint information of the user B respectively, and the voice instruction can be determined to be sent by the target user (namely the user A) under the condition that the matching with the second voiceprint information of the user A is successful.
When the user issuing the voice instruction is a pre-stored target user, that user has control authority over the robot, so the voice instruction can be text-converted according to the foregoing method steps to obtain the incremental text information in text form. Details can be found above and are not repeated in this disclosure.
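A minimal sketch of this voiceprint-gated text conversion is given below; the threshold value is an assumption, and the voiceprint and speech-to-text algorithms are passed in as stand-ins rather than implemented:

    # Hypothetical sketch: only voice instructions whose voiceprint matches a
    # pre-stored target user are converted into incremental text information.
    MATCH_THRESHOLD = 0.8  # assumed similarity threshold

    def handle_voice(audio, enrolled, extract_voiceprint, match_score, speech_to_text):
        first_vp = extract_voiceprint(audio)         # first voiceprint information
        for user_id, second_vp in enrolled.items():  # pre-stored second voiceprints
            if match_score(first_vp, second_vp) >= MATCH_THRESHOLD:
                # matching succeeded: the instruction comes from a target user
                return user_id, speech_to_text(audio)  # incremental text information
        return None, None  # no target user matched; the instruction is ignored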
As shown in fig. 4, in some embodiments, a robot control method according to an example of the present disclosure, a process of obtaining semantic information of a voice instruction includes:
s410, obtaining full text information according to the increment text information and the cached historical text information which is positioned in front of the increment text information and is in a preset quantity.
S420, carrying out semantic recognition on the full text information to obtain semantic information corresponding to the voice instruction.
In the embodiment of the disclosure, after the incremental text information is obtained, the historical text information cached before the incremental text information needs to be combined to obtain the full text information. In some embodiments, caching of incremental text information as well as historical text information may be implemented using instruction lists, as described below in connection with FIG. 5.
As shown in fig. 5, in some embodiments, the robot control method of the examples of the present disclosure, a process of determining a full text message, includes:
S411, caching the incremental text information at the tail of the instruction list corresponding to the target user.
S412, acquiring, from the instruction list, the incremental text information and the preset number of pieces of historical text information preceding it.
S413, performing text splicing on the incremental text information and the preceding historical text information to obtain the full text information.
In the embodiment of the disclosure, the robot may generate, in advance, a corresponding instruction list for the target user having the control authority, that is, each target user corresponds to one instruction list, where the instruction list is used to store text information of a voice instruction sent by the target user.
For example, taking the example scenario of fig. 1, assume the target users include user A and user B. When the target users' information is entered, the robot may allocate memory for each target user and generate a corresponding instruction list. For example, in one example, the instruction list may be as shown in Table 1 below:
Table 1
Instruction occurrence time | User A | User B
In Table 1, the first column "instruction occurrence time" records the time at which a target user uttered a voice instruction, and the second and third columns record the voice instructions issued by each target user.
In the embodiments of the disclosure, after detecting a voice instruction from a target user, the robot obtains the corresponding text information through the foregoing method process and stores it, in time order, in that user's instruction list.
For example, in one exemplary scenario, at time T1 the instruction list of user A is as shown in Table 2 below:
Table 2
Instruction occurrence time | User A
10:52 | You walk 10 meters forward
After the robot executes the corresponding action, at time T2 = 10:55 user A issues another voice instruction, "Forget it, go back". Through the foregoing method process, the robot obtains the corresponding text "Forget it, go back", which is the incremental text information generated at the current moment.
After the incremental text information "Forget it, go back" is obtained, it can be cached at the tail of the instruction list in time order, giving the instruction list shown in Table 3 below:
Table 3
Instruction occurrence time | User A
10:52 | You walk 10 meters forward
10:55 | Forget it, go back
In the embodiments of the disclosure, semantic recognition combines the newly generated incremental text information with the historical text information stored in the instruction list. In this example, the incremental text information is "Forget it, go back" and the historical text information is the earlier stored "You walk 10 meters forward".
It should be noted that this example shows only one piece of historical text information in the instruction list; in fact, text accumulates in user A's instruction list over time, so in a given control process there may be many pieces of historical text information relative to the current incremental text information.
Thus, in some embodiments of the present disclosure, the full text information may be assembled from the incremental text information and all of the historical text information stored in user A's instruction list. Alternatively, the full text information may be obtained by combining the incremental text information with a preset number of the most recent pieces of historical text information.
For example, if the incremental text information is preceded by 10 pieces of historical text information, the incremental text information may be assembled with all 10 of them to obtain the full text information, or with only the previous n pieces, where n may take a value of, for example, 1 to 3; the present disclosure does not limit this.
Taking the Table 3 scenario as an example, assembling the historical text information "You walk 10 meters forward" with the incremental text information "Forget it, go back" yields the full text information "You walk 10 meters forward; forget it, go back". Semantic recognition technology can then be applied to the full text information, determining that the user intention of voice instruction 2, "Forget it, go back", is "move back 10 meters", that is, the semantic information corresponding to the voice instruction is "move back 10 meters".
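A minimal sketch of this caching and splicing logic (S411 to S413) follows, assuming n = 3 history entries and a hypothetical recognize_semantics step:

    from collections import deque

    # One instruction list per target user; each entry is (time, text).
    instruction_lists = {}

    def assemble_full_text(user_id, timestamp, incremental_text, n=3):
        q = instruction_lists.setdefault(user_id, deque(maxlen=100))
        q.append((timestamp, incremental_text))         # S411: cache at the queue tail
        texts = [text for _, text in q]
        history = texts[-(n + 1):-1]                    # S412: up to n preceding entries
        return "; ".join(history + [incremental_text])  # S413: splice into full text

    # assemble_full_text("user_A", "10:52", "You walk 10 meters forward")
    # assemble_full_text("user_A", "10:55", "Forget it, go back")
    # -> "You walk 10 meters forward; Forget it, go back"
    # semantic_info = recognize_semantics(full_text)    # hypothetical NLP step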
After the semantic information is determined, the target capability modules in the robot's capability set can be invoked based on the semantic information to perform task programming and obtain the target control task. For ease of understanding, the process of pre-building the robot capability set is described below in connection with the fig. 6 embodiment.
As shown in fig. 6, in some embodiments, in a robot control method of an example of the present disclosure, a process of pre-establishing a capability set of a robot includes:
s610, classifying the basic capability of the robot according to preset categories to obtain capability modules corresponding to each category.
S620, constructing and obtaining a capability set according to the capability modules corresponding to the categories.
In the embodiment of the disclosure, all basic capabilities of a robot can be integrated first, then the basic capabilities are divided into a plurality of preset categories according to different capability types, each category at least comprises one basic capability, and then a capability module corresponding to each category is constructed according to the basic capability included in each category.
For instance, in one example, the preset categories of robot capabilities may include: audio capability, light capability, touch pad capability, positioning capability, power management capability, depth detection capability, radar detection capability, inertial data detection capability, odometer capability, ultrasound communication capability, skin control capability, network capability, motion capability, navigation capability, target following capability, face recognition capability, voiceprint recognition capability, gesture recognition capability, personnel information management capability, and task management capability, 20 capability categories in total.
Thus, all basic capabilities can be classified into the 20 preset categories, each category comprising at least one basic capability.
After integrating and classifying the basic capabilities of the robot, constructing and obtaining a capability module corresponding to each category according to the basic capabilities included in each category, namely obtaining the capability modules corresponding to the 20 categories, wherein the capability modules are respectively as follows:
1. Audio (Audio) module
May include a series of audio-related basic capabilities, such as entering sound effects, playing audio, playing text, online dialog, etc.
2. Light effect (Led) module
A series of basic capabilities related to light effects, such as play, pause, replacement, etc., of the light effects of the robot head, tail, eye, etc., may be included.
3. Touch pad (Touch) module
A series of basic capabilities associated with the robotic touchpad may be included, such as querying the touchpad status and data.
4. Positioning (GPS) module
A series of basic capabilities related to robotic satellite positioning GPS functions, such as querying GPS status and data, may be included.
5. Battery system (BMS) module
A series of basic capabilities related to the robot BMS (Battery Management System ) function, such as battery level, voltage, current, temperature, etc. detection may be included.
6. Depth of field (ToF) module
A robotic depth detection capability may be included, such as ToF (Time of flight) data detection, etc.
7. Radar (Lidar) module
A series of basic capabilities related to the robot radar function, such as querying radar status and detecting data, etc., may be included.
8. Inertial navigation (IMU) module
A series of basic capabilities related to the robotic IMU (Inertial Measurement Unit ) may be included, such as querying IMU status and detection data, etc.
9. Odometer (Odometer) module
A series of basic capabilities related to the robotic odometer may be included, such as querying odometer status and data, etc.
10. Ultrasonic module
May include a series of basic capabilities related to the robot's ultrasound, such as querying ultrasonic sensor status and data, etc.
11. Skin (Skin) module
Basic capabilities related to the robotic skin, such as controlling skin condition, etc., may be included.
12. Network (Network) module
A series of basic capabilities related to the robot network information, such as controlling the robot network, etc., may be included.
13. Motion module
A series of basic capabilities related to the articulation of the robot may be included, such as controlling individual articulation, etc.
14. Navigation module
A series of basic capabilities related to the robot navigation functions may be included, such as controlling the robot to navigate to a target location in an existing map, etc.
15. Follow (Follow) module
Basic capabilities of the robot to follow the target, such as following the target object movement, etc., may be included.
16. Face recognition (Face) module
A series of basic capabilities associated with robotic face recognition may be included, such as acquiring face images, recognizing faces based on image libraries, identifying target persons, etc.
17. Voiceprint recognition (Voiceprint) module
A series of basic capabilities related to robotic voiceprint recognition may be included, such as voiceprint acquisition, voiceprint recognition based on a voiceprint library, identifying a target person, and the like.
18. Gesture recognition (Gesture) module
A series of basic capabilities related to robotic gesture recognition may be included, such as gesture image acquisition, gesture recognition based on a gesture library, identifying a target person, and so forth.
19. Personnel information management (Personnel) module
A series of basic capabilities related to robot information acquisition may be included, such as acquiring database information as described above, etc.
20. Task management (Task) module
Basic capabilities for robotic task control, such as controlling the running, pausing, ending, etc., of the current task may be included.
In this example, through each capability module constructed as described above, the capability set of the robot is obtained by performing the encapsulation process, and the architecture of the capability set of the robot may be as shown in fig. 7.
In the capability set architecture shown in fig. 7, the capability set includes a robot module and a capability set interface module in addition to the 20 capability modules described above. The robot module is mainly responsible for providing the whole capability of the robot and covers the 20 capability modules. The capability set interface module is responsible for providing an API interface of the whole capability set, and also relates to the above 20 capability modules, so that one or more basic capabilities of any one capability module in the capability set can be called through the API interface.
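The following sketch illustrates one possible shape of such a capability set, with an API facade exposing the capability modules; all class and method names are illustrative assumptions:

    # Hypothetical sketch of the capability-set architecture of fig. 7.
    class MotionModule:
        def move_joint(self, joint: str, angle: float):
            ...  # basic capability: control a single joint

    class AudioModule:
        def play_text(self, text: str):
            ...  # basic capability: play text as speech

    class CapabilitySet:
        """API facade: every capability module is reachable from one entry point."""
        def __init__(self):
            self.motion = MotionModule()
            self.audio = AudioModule()
            # ... plus the remaining capability modules (Led, Touch, GPS, BMS, etc.)

    robot = CapabilitySet()
    robot.motion.move_joint("left_knee", 30.0)  # combined invocation via the facade
    robot.audio.play_text("Task started")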
In the embodiments of the present disclosure, the robot capability set is obtained by integrating, classifying, and encapsulating the robot's capabilities. All capabilities of the robot can be invoked through the capability set, enabling flexible combined invocation of basic capabilities, simplifying task programming, reducing the robot resources occupied by running tasks, keeping task running and control logic simple, and improving the robot control effect.
In some embodiments of the present disclosure, when each capability module is packaged, higher-layer capability sub-modules may be built in advance from the basic capabilities, so as to simplify the user's task programming; this is described below in connection with the fig. 8 embodiment.
As shown in fig. 8, in some embodiments, the robot control method of the examples of the present disclosure, the process of constructing the capability module of each category includes:
s611, for each category, constructing and obtaining one or more capability sub-modules according to at least one basic capability corresponding to the category.
S612, obtaining the capability module corresponding to each category according to the basic capabilities and the capability sub-modules.
In the embodiments of the disclosure, as stated above, a basic capability of the robot is a minimum behavior module that cannot be split further, while a capability sub-module in this disclosure is a higher-layer capability constructed from one or more basic capabilities; it can be understood as a combination of several basic capabilities of the robot.
In the embodiment of the disclosure, for each category, a capability sub-module can be built according to one or more basic capabilities of the category, and then the capability sub-modules and the original basic capabilities are packaged into a capability module.
For example, in one example, turning the robot's eye lamps on and turning them off are each a basic capability. Suppose the user wants the robot to run a task in which the eye lamps alternately turn on and off at a fixed interval. In the related art, the task programming would have to invoke these two basic capabilities alternately at fixed times. In the embodiment of the disclosure, the two basic capabilities that respectively control the eye lamps on and off can be built into a capability sub-module; when programming the task, the robot directly invokes this capability sub-module, which greatly simplifies programming and lowers the technical threshold of robot control.
The foregoing uses only one capability module as an example; any capability module may be constructed through the same process, which is not described in detail in this disclosure. After each capability module is obtained, the final robot capability set can be constructed through the foregoing process.
As can be seen from the foregoing, in the embodiment of the present disclosure, each capability module in the capability set includes capability sub-modules constructed from the basic capabilities. A capability sub-module sits at a higher layer than the basic capabilities, which further simplifies task programming and improves the control effect of the robot.
In the embodiment of the present disclosure, after obtaining the semantic information, the capability module of the capability set constructed as described above may be called to implement task programming, which is described below in connection with the embodiment of fig. 9.
As shown in fig. 9, in some embodiments of the robot control method of the present disclosure, the process of generating the target control task includes:
S231, one or more target capability modules corresponding to the voice instruction are determined from the capability set of the robot based on the semantic information.
S232, the target capability modules are called to program code based on a preset code format, and the target control task is obtained.
In the embodiment of the present disclosure, the pre-built capability set provides multiple program interfaces, such as the API interfaces for the three languages C/C++/Python in the foregoing examples. In other words, task programming may be performed in multiple programming languages, which is not limited by the present disclosure.
In the embodiment of the disclosure, after the semantic information corresponding to the voice instruction is determined, the one or more capability modules in the capability set that need to be called can be determined according to the semantic information; a capability module to be called is a target capability module in the disclosure.
In addition, in some embodiments, the preset code format of task programming, that is, the programming language in which the task is programmed, may be preconfigured; for example, the preset code format may be any one or more of C/C++/Python, which is not limited by the present disclosure.
After the target capability modules required by task programming are determined, the APIs provided by the target capability modules can be used to call one or more underlying basic capabilities, and the task code is programmed in the preset code format to obtain the final target control task.
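The following sketch illustrates this step under stated assumptions: semantic information arrives as a list of (action, parameters) pairs, the mapping tables and stub APIs are hypothetical, and the preset code format is plain Python:

    # Hypothetical sketch: composing a target control task from semantic
    # information by selecting target capability modules and binding their
    # API calls into one executable task.

    CAPABILITY_APIS = {
        # (module, capability) -> callable; stubs stand in for real APIs
        ("motion", "forward"): lambda distance=1.0: print("forward", distance, "m"),
        ("voice", "broadcast"): lambda text="": print("say:", text),
    }

    SEMANTIC_TO_CAPABILITY = {
        "move": ("motion", "forward"),
        "speak": ("voice", "broadcast"),
    }

    def program_task(semantic_actions):
        """semantic_actions: list of (action, kwargs) from semantic recognition."""
        steps = [(CAPABILITY_APIS[SEMANTIC_TO_CAPABILITY[action]], kwargs)
                 for action, kwargs in semantic_actions]
        def target_control_task():
            for api, kwargs in steps:
                api(**kwargs)
        return target_control_task

    # Usage: "move forward two meters and say hello".
    task = program_task([("move", {"distance": 2.0}),
                         ("speak", {"text": "hello"})])
    task()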
In the embodiment of the present disclosure, an algorithm for performing task programming based on semantic information may be implemented by using a programming script, which is not described in detail in this disclosure.
As can be seen from the foregoing, in the embodiments of the present disclosure, the capability set can provide program interfaces in multiple languages, so that it is suitable for implementing robot control in different programming languages, improving the practicability and control effect of the robot.
It should be noted that, in the above embodiments of the present disclosure, by combining the context-associated semantic recognition method, robot control under more complex or fuzzy instructions can be implemented, for example complex tasks and timed tasks. For example, in one example scenario, the user voice command may be "at 2:50 in the afternoon, go find Zhang San and ask him to come to the office". After the target control task is generated through the foregoing method process, the robot sequentially invokes the timing capability, navigation capability, face recognition capability, voice broadcast capability and the like to execute the target control task and complete the task.
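A sketch of what the generated target control task for this scenario might look like is given below; the stub functions merely stand in for the timing, navigation, face recognition and voice broadcast capabilities, and every name and location string is an assumption for illustration:

    import datetime
    import time

    def wait_until(hour, minute):  # timing capability (stub)
        target = datetime.datetime.now().replace(hour=hour, minute=minute,
                                                 second=0, microsecond=0)
        while datetime.datetime.now() < target:
            time.sleep(1)

    def navigate_to(place):  # navigation capability (stub)
        print("navigating to", place)

    def find_person(name):  # face recognition capability (stub)
        print("recognizing face of", name)
        return True

    def broadcast(text):  # voice broadcast capability (stub)
        print("say:", text)

    def target_control_task():
        wait_until(14, 50)  # 2:50 in the afternoon
        navigate_to("Zhang San's workstation")
        if find_person("Zhang San"):
            broadcast("Please come to the office.")

    # target_control_task()  # would block until 2:50 pm before running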
For some simple instructions, however, such as the basic instructions "forward" or "down", continuing to perform context association and semantic recognition through the above method process would needlessly increase the computational burden of the robot.
Therefore, in some embodiments of the present disclosure, after text conversion is performed on the voice command of the target user to obtain the incremental text information, it is first determined from the incremental text information whether the voice command is a built-in command of the robot.
It can be understood that built-in instructions are basic instructions built into the robot in advance, mainly implementing some of its basic capabilities. In some embodiments of the present disclosure, a certain number of built-in instructions may be stored in the robot in advance, and corresponding task codes may be pre-configured for the built-in instructions by way of an instruction mapping.
Thus, in some embodiments of the present disclosure, whether the voice instruction is a built-in instruction may be determined from the incremental text information by, for example, keyword recognition of the incremental text information. For example, in one exemplary scenario, the incremental text information is "forward"; text recognition matches the keyword "forward" in the incremental text information, so the voice command is determined to be a built-in command.
When the voice instruction is determined to be a built-in instruction, the instruction is very simple, so continuing semantic recognition against the historical text information of the instruction list would waste computing resources. Therefore, the task code corresponding to the built-in instruction can be determined directly according to the pre-stored mapping relation, and this task code is the target control task of the present disclosure.
Conversely, when the voice command is not a built-in command, semantic recognition can be performed according to the foregoing method steps in combination with the historical text information in the command list, and task programming is completed as described above, which is not repeated in the disclosure.
As can be seen from the above, in the embodiment of the present disclosure, performing the built-in instruction determination on the voice instruction saves the robot the extra computing resources that context-based recognition would consume on basic instructions, improving the control effect.
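A minimal sketch of this fast path follows; the keyword table, task stubs and dispatch function are assumptions for illustration, not the disclosure's actual mapping:

    # Hypothetical sketch: keyword recognition on the incremental text,
    # then a pre-stored mapping from built-in instruction to task code;
    # only non-built-in commands fall through to context-based semantic
    # recognition and task programming.

    BUILT_IN_TASKS = {
        "forward": lambda: print("moving forward"),
        "stop": lambda: print("stopping"),
    }

    def dispatch(incremental_text, full_pipeline):
        for keyword, task_code in BUILT_IN_TASKS.items():
            if keyword in incremental_text:  # keyword recognition
                return task_code  # built-in: mapped task code, no recognition
        return full_pipeline(incremental_text)  # non-built-in: full process

    # Usage: "forward" hits the built-in fast path.
    task = dispatch("forward",
                    full_pipeline=lambda text: (lambda: print("complex:", text)))
    task()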
In some embodiments, the present disclosure provides a robot control device that may be applied to the aforementioned robot 200. As shown in fig. 10, the control device includes:
a text conversion module 10 configured to convert a voice instruction of a target user into incremental text information in text form in response to detecting the voice instruction;
the semantic recognition module 20 is configured to perform semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice instruction;
a task programming module 30 configured to invoke one or more capability modules in the capability set of the robot based on the semantic information, generating a target control task corresponding to the voice instruction;
a task execution module 40 configured to control the robot to execute according to the target control task.
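How the four modules chain together can be sketched as follows; each function is a hypothetical stand-in for the corresponding module, not the device's actual implementation:

    # Hypothetical sketch of the control device pipeline of fig. 10.

    def text_conversion(voice_command):  # text conversion module 10
        return voice_command  # speech is assumed already transcribed here

    def semantic_recognition(incremental_text, history):  # module 20
        return {"action": "speak", "text": incremental_text}

    def task_programming(semantic_info):  # task programming module 30
        return lambda: print("executing:", semantic_info)

    def task_execution(task):  # task execution module 40
        task()

    history = []
    incremental = text_conversion("hello")
    semantic = semantic_recognition(incremental, history)
    task_execution(task_programming(semantic))
    history.append(incremental)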
In the embodiment of the disclosure, the difficulty and complexity of robot programming control are reduced by voice programming; through semantic recognition of the user's voice instructions, tasks can be controlled according to free combinations of instructions, making robot control more flexible; and through context association of historical instructions, fuzzy user instructions can be recognized, improving the robot control effect.
In some embodiments, the text conversion module 10 is configured to:
in response to detecting a voice command, extracting first voiceprint information of the voice command;
matching the first voiceprint information with the prestored second voiceprint information of the target user, and determining that the voice instruction of the target user is detected under the condition that the matching is successful;
and performing text conversion on the voice command to obtain the incremental text information.
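One plausible way to realize the voiceprint matching step, assuming voiceprints are embedding vectors compared by cosine similarity against a threshold (the representation and the threshold value are assumptions, not specified by the disclosure):

    import math

    SIMILARITY_THRESHOLD = 0.8  # illustrative value only

    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def is_target_user(first_voiceprint, second_voiceprint):
        # Matching succeeds only when similarity clears the threshold,
        # so commands from other speakers are ignored.
        return cosine_similarity(first_voiceprint,
                                 second_voiceprint) >= SIMILARITY_THRESHOLD

    print(is_target_user([0.9, 0.1, 0.3], [0.88, 0.12, 0.29]))  # True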
In some embodiments, the semantic recognition module 20 is configured to:
caching the incremental text information in the queue tail of an instruction list corresponding to the target user;
acquiring, from the instruction list, the incremental text information and a preset number of pieces of historical text information located before it;
and carrying out semantic recognition on the incremental text information and the historical text information with the preset quantity, and determining the semantic information corresponding to the voice instruction.
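A minimal sketch of this instruction-list cache, assuming a bounded queue and an illustrative preset number of three history entries (both values are assumptions):

    from collections import deque

    PRESET_NUMBER = 3  # how many history entries join the context

    # Instruction list for one target user; a bounded deque keeps only
    # the most recent commands.
    instruction_list = deque(maxlen=32)

    def build_context(incremental_text):
        # Take the preset number of history entries located before the
        # new command, then cache the new command at the queue tail.
        history = list(instruction_list)[-PRESET_NUMBER:]
        instruction_list.append(incremental_text)
        # Splice history and increment into the text used for recognition.
        return " ".join(history + [incremental_text])

    build_context("go to the meeting room")
    print(build_context("turn the lights on there"))
    # -> "go to the meeting room turn the lights on there"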
In some embodiments, the task programming module 30 is configured to:
determining one or more target capability modules corresponding to the voice instruction from a capability set of the robot based on the semantic information;
and calling the target capability module to perform code programming based on a preset code format to obtain the target control task.
In some embodiments, the semantic recognition module 20 is configured to:
under the condition that the voice command is determined to be a non-built-in command of the robot according to the incremental text information, executing a process of carrying out semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice command;
and under the condition that the voice command is determined to be a built-in command of the robot according to the incremental text information, determining the target control task corresponding to the voice command according to the mapping relation corresponding to the built-in command.
In some embodiments, the task programming module 30 is configured to:
classifying the basic capability of the robot according to preset categories to obtain the capability modules corresponding to each category;
and constructing and obtaining the capability set according to the capability modules corresponding to the categories.
In some embodiments, the task programming module 30 is configured to:
for each category, constructing one or more capability sub-modules according to at least one basic capability corresponding to the category;
And obtaining the capability module corresponding to the category according to the basic capability and the capability sub-module.
In some embodiments, examples of the present disclosure provide a robot comprising:
a processor;
a memory storing computer instructions for causing a processor to perform the method of any of the embodiments described above.
In some embodiments, the disclosed examples provide a storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments described above.
Specifically, fig. 11 shows a schematic structural diagram of a robot 600 suitable for implementing the method of the present disclosure, and by means of the robot shown in fig. 11, the above-described corresponding functions of the processor and the storage medium may be implemented.
As shown in fig. 11, the robot 600 includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a memory 602 or a program loaded into the memory 602 from a storage section 608. In the memory 602, various programs and data required for the operation of the robot 600 are also stored. The processor 601 and the memory 602 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the above method processes may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method described above. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be apparent that the above embodiments are merely examples given for clarity of illustration and are not limiting. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of the present disclosure.

Claims (12)

1. A robot control method, applied to a robot, comprising:
in response to detecting a voice command of a target user, converting the voice command into incremental text information in a text form;
carrying out semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice instruction;
invoking one or more capability modules in the capability set of the robot based on the semantic information, and generating a target control task corresponding to the voice instruction;
and controlling the robot to run according to the target control task.
2. The method of claim 1, wherein the converting the voice command into incremental text information in text form in response to detecting a voice command of a target user comprises:
in response to detecting a voice command, extracting first voiceprint information of the voice command;
matching the first voiceprint information with the prestored second voiceprint information of the target user, and determining that the voice instruction of the target user is detected under the condition that the matching is successful;
And performing text conversion on the voice command to obtain the incremental text information.
3. The method according to claim 1, wherein the performing semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice command includes:
obtaining full text information according to the incremental text information and a preset number of pieces of cached historical text information located before the incremental text information;
and carrying out semantic recognition on the full text information to obtain the semantic information corresponding to the voice instruction.
4. The method of claim 3, wherein the obtaining full text information according to the incremental text information and the preset number of pieces of cached historical text information located before the incremental text information comprises:
caching the incremental text information in the queue tail of an instruction list corresponding to the target user;
acquiring, from the instruction list, the incremental text information and a preset number of pieces of historical text information located before it;
and performing text splicing on the incremental text information and the preset number of pieces of historical text information located before it to obtain the full text information.
5. The method of any of claims 1 to 4, wherein invoking one or more capability modules in the capability set of the robot based on the semantic information generates a target control task corresponding to the voice instruction, comprising:
determining one or more target capability modules corresponding to the voice instruction from a capability set of the robot based on the semantic information;
and calling the target capability module to perform code programming based on a preset code format to obtain the target control task.
6. The method according to any one of claims 1 to 4, wherein the performing semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice command includes:
and under the condition that the voice command is determined to be a non-built-in command of the robot according to the incremental text information, executing a process of carrying out semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice command.
7. The method of claim 6, wherein the method further comprises:
And under the condition that the voice command is determined to be a built-in command of the robot according to the incremental text information, determining the target control task corresponding to the voice command according to the mapping relation corresponding to the built-in command.
8. The method of claim 1, wherein pre-establishing the capability set of the robot comprises:
classifying the basic capability of the robot according to preset categories to obtain the capability modules corresponding to each category;
and constructing and obtaining the capability set according to the capability modules corresponding to the categories.
9. The method of claim 8, wherein the classifying the basic capabilities of the robot according to the preset categories to obtain the capability modules corresponding to each category includes:
for each category, constructing one or more capability sub-modules according to at least one basic capability corresponding to the category;
and obtaining the capability module corresponding to the category according to the basic capability and the capability sub-module.
10. A robot control device, characterized by being applied to a robot, the device comprising:
A text conversion module configured to convert a voice instruction of a target user into incremental text information in text form in response to detecting the voice instruction;
the semantic recognition module is configured to perform semantic recognition according to the incremental text information and the cached historical text information of the target user to obtain semantic information corresponding to the voice instruction;
a task programming module configured to invoke one or more capability modules in the capability set of the robot based on the semantic information, generating a target control task corresponding to the voice instruction;
and the task running module is configured to control the robot to run according to the target control task.
11. A robot, comprising:
a processor;
memory storing computer instructions for causing a processor to perform the method according to any one of claims 1 to 9.
12. A storage medium having stored thereon computer instructions for causing a computer to perform the method according to any one of claims 1 to 9.
CN202311002234.5A 2023-08-09 2023-08-09 Robot, control method and device thereof, and storage medium Pending CN117021083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311002234.5A CN117021083A (en) 2023-08-09 2023-08-09 Robot, control method and device thereof, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311002234.5A CN117021083A (en) 2023-08-09 2023-08-09 Robot, control method and device thereof, and storage medium

Publications (1)

Publication Number Publication Date
CN117021083A true CN117021083A (en) 2023-11-10

Family

ID=88623972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311002234.5A Pending CN117021083A (en) 2023-08-09 2023-08-09 Robot, control method and device thereof, and storage medium

Country Status (1)

Country Link
CN (1) CN117021083A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160351194A1 (en) * 2015-05-27 2016-12-01 Google Inc. Context-sensitive dynamic update of voice to text model in a voice-enabled electronic device
CN107220292A (en) * 2017-04-25 2017-09-29 上海庆科信息技术有限公司 Intelligent dialogue device, reaction type intelligent sound control system and method
CN107943458A (en) * 2017-11-20 2018-04-20 上海木爷机器人技术有限公司 A kind of robot development system
CN108509619A (en) * 2018-04-04 2018-09-07 科大讯飞股份有限公司 A kind of voice interactive method and equipment
CN114662500A (en) * 2022-03-18 2022-06-24 支付宝(杭州)信息技术有限公司 Man-machine interaction method and device and electronic equipment
CN114785842A (en) * 2022-06-22 2022-07-22 北京云迹科技股份有限公司 Robot scheduling method, device, equipment and medium based on voice exchange system
WO2022252946A1 (en) * 2021-06-03 2022-12-08 广州小鹏汽车科技有限公司 Voice control method, voice control device, server, and storage medium
CN116072119A (en) * 2023-03-31 2023-05-05 北京华录高诚科技有限公司 Voice control system, method, electronic equipment and medium for emergency command

Similar Documents

Publication Publication Date Title
CN111432989B (en) Artificial enhancement cloud-based robot intelligent framework and related methods
CN110490213B (en) Image recognition method, device and storage medium
US11430171B2 (en) Explainable artificial intelligence
Scheutz et al. An overview of the distributed integrated cognition affect and reflection diarc architecture
US11568855B2 (en) System and method for defining dialog intents and building zero-shot intent recognition models
JP7191987B2 (en) Speaker diarization using speaker embeddings and trained generative models
US11645444B2 (en) Systems and methods enabling online one-shot learning and generalization by intelligent systems of task-relevant features and transfer to a cohort of intelligent systems
CN110599557B (en) Image description generation method, model training method, device and storage medium
Zhu et al. AR-mentor: Augmented reality based mentoring system
Oviatt et al. Designing the user interface for multimodal speech and pen-based gesture applications: State-of-the-art systems and future research directions
KR102656620B1 (en) Electronic apparatus, controlling method of thereof and non-transitory computer readable recording medium
Rossi et al. An extensible architecture for robust multimodal human-robot communication
Schiffer et al. Caesar: an intelligent domestic service robot
KR102490916B1 (en) Electronic apparatus, method for controlling thereof, and non-transitory computer readable recording medium
CN112199486A (en) Task type multi-turn conversation method and system for office scene
Maurtua et al. Enhancing safe human-robot collaboration through natural multimodal communication
Tang et al. Real-time robot localization, vision, and speech recognition on Nvidia Jetson TX1
Pineda et al. Ioca: Interaction-oriented cognitive architecture
CN111177346B (en) Man-machine interaction method and device, electronic equipment and storage medium
Cacace et al. A robust multimodal fusion framework for command interpretation in human-robot cooperation
CN109002498B (en) Man-machine conversation method, device, equipment and storage medium
CN117021083A (en) Robot, control method and device thereof, and storage medium
Giachos et al. A contemporary survey on intelligent human-robot interfaces focused on natural language processing
US20240096093A1 (en) Ai-driven augmented reality mentoring and collaboration
CN112242139B (en) Voice interaction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination