WO2020125252A1 - Robot conversation switching method and apparatus, and computing device

Info

Publication number: WO2020125252A1
Application number: PCT/CN2019/116087
Authority: WO
Inventors: 徐文浩; 马世奎; 孙文豹
Original assignee: 达闼科技（北京）有限公司
Priority date: 2018-12-20
Filing date: 2019-11-06
Publication date: 2020-06-25
Also published as: CN109648573B; CN109648573A

Abstract

A robot conversation switching method, comprising: acquiring an environment image in front of a robot; determining candidate conversation partners from the environment image; determining whether a switching condition for switching a current conversation partner is satisfied; if so, selecting a target conversation partner from the candidate conversation partners; and determining the selected target conversation partner as a current conversation partner. According to the method, a robot is able to actively switch conversation partners. The present invention further relates to a robot conversation switching apparatus, a computing device, and a computer readable storage medium.

Description

Robot session switching method, device and computing equipment

Technical field

The embodiments of the present application relate to the technical field of intelligent robots, and in particular, to a robot session switching method, device, and computing equipment.

Background technique

With the development of the Internet and deep learning technology, robot technology has made great progress, from the original independent robot individual to the current cloud robot. Cloud robot means that the robot body is only responsible for data collection, data preprocessing and data transmission, and complex calculation and judgment work are transmitted to the cloud processor for execution through data transmission.

Robot conversation refers to collecting the user's audio information, analyzing the semantic meaning of the user's audio information, and returning the answer information to the user according to the semantic meaning to realize the user's dialogue with the robot.

In the course of implementing this application, the inventors found that current robot conversation modes cannot actively switch the conversation person and cannot actively initiate a conversation.

Application content

In view of the above problems, the present application is proposed in order to provide a method, apparatus and computing device for robot session switching that overcome the above problems or at least partially solve the above problems.

To solve the above technical problems, one technical solution adopted by the embodiments of the present application is to provide a method for robot session switching, which includes: collecting an environment image located in front of the robot; determining candidate conversation persons from the environment image; judging whether the switching condition for switching the current conversation person is satisfied; if it is satisfied, selecting a target conversation person from the candidate conversation persons; and determining the selected target conversation person as the current conversation person.

Optionally, judging whether the switching condition for switching the current conversation person is satisfied includes: determining whether there is a current conversation person; if not, determining that the switching condition is satisfied; if so, determining whether the current conversation person is in a conversation state; if the current conversation person is in the conversation state, determining that the switching condition is not satisfied; and if the current conversation person is not in the conversation state, determining that the switching condition is satisfied.

Optionally, judging whether the current conversation person is in a conversation state includes: judging whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, determining whether an end command for ending the conversation has been returned by the current conversation person; if such a command exists, determining that the current conversation person is not in the conversation state; if it does not exist, determining whether the current conversation person is absent from the candidate conversation persons corresponding to all of the most recently and continuously collected environment images, where these are a preset number of images collected before the environment image and consecutive with it; if the current conversation person is absent from all of them, determining that the current conversation person is not in a conversation state; and if the current conversation person is included in the candidate conversation persons corresponding to any of them, determining that the current conversation person is in a conversation state.

Optionally, selecting the target conversation person from the candidate conversation persons includes: extracting the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include lip language, face size and position parameters extracted from the environment image; calculating a conversation score for each candidate conversation person based on that person's conversation parameters; and taking the candidate conversation person with the highest conversation score as the target conversation person.

Optionally, the method further includes: extracting the face image of the current conversation person; identifying whether a user matching the face image exists in a preset information library; if so, extracting the background information corresponding to the user from the preset information library; and pushing the face image and the background information to the human agent assistant terminal.

Another technical solution adopted by the embodiments of the present application is to provide a robot session switching device, which includes: a collection module, used to collect an environment image located in front of the robot; a first determination module, used to determine candidate conversation persons from the environment image; a judgment module, used to judge whether the switching condition for switching the current conversation person is satisfied; a selection module, used to select the target conversation person from the candidate conversation persons when the switching condition is satisfied; and a second determination module, used to determine the selected target conversation person as the current conversation person.

Optionally, the judgment module includes: a first judgment unit, used to judge whether there is a current conversation person; a first determination unit, used to determine that the switching condition for switching the current conversation person is satisfied when there is no current conversation person; a second judgment unit, used to judge whether the current conversation person is in the conversation state when there is a current conversation person; a second determination unit, used to determine that the switching condition is not satisfied when the current conversation person is in the conversation state; and a third determination unit, used to determine that the switching condition is satisfied when the current conversation person is not in the conversation state.

Optionally, the second judgment unit judging whether the current conversation person is in a conversation state when there is a current conversation person includes: determining whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, determining whether an end command for ending the conversation has been returned by the current conversation person; if such a command exists, determining that the current conversation person is not in a conversation state; if it does not exist, determining whether the current conversation person is absent from the candidate conversation persons corresponding to all of the most recently and continuously collected environment images, where these are a preset number of images collected before the environment image and consecutive with it; if the current conversation person is absent from all of them, determining that the current conversation person is not in the conversation state; and if the current conversation person is included in the candidate conversation persons corresponding to any of them, determining that the current conversation person is in a conversation state.

Optionally, the selection module includes: an extraction unit, used to extract the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include lip language, face size and position parameters extracted from the environment image; a calculation unit, used to calculate the conversation score of each candidate conversation person based on that person's conversation parameters; and a selection unit, used to select the candidate conversation person with the highest conversation score as the target conversation person.

Optionally, the device further includes: a first extraction module, used to extract the face image of the current conversation person; an identification module, used to identify whether a user matching the face image exists in the preset information library; a second extraction module, used to extract the background information corresponding to the user from the preset information library when such a user exists; and a push module, used to push the face image and the background information to the human agent assistant terminal.

Another technical solution adopted in the embodiments of the present application is to provide a computing device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method for robot session switching.

Another technical solution adopted by the embodiments of the present application is to provide a computer-readable storage medium in which at least one executable instruction is stored, and the executable instruction causes a processor to perform the operations corresponding to the method for robot session switching.

The beneficial effects of the embodiments of the present application are as follows: unlike the prior art, the embodiments collect the environment image in front of the robot, determine whether to switch the current conversation person, and select the target conversation person from the candidate conversation persons. By using the embodiments of the present application, the robot can therefore actively switch the conversation person.

The above description is only an overview of the technical solutions of this application. So that the technical means of this application can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other purposes, features and advantages of this application are more obvious and understandable, specific embodiments of the present application are set out below.

Brief description of the drawings

By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of showing the preferred embodiments, and are not considered as limitations to the present application. Furthermore, the same reference numerals are used to denote the same parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart of a method for robot session switching according to an embodiment of the present application;

FIG. 2A is a flowchart of judging the switching condition for switching the current conversation person in a robot session switching method according to an embodiment of the present application;

FIG. 2B is a flowchart of determining whether the current conversation person is in a conversation state in the embodiment of the present application;

FIG. 2C is a flowchart of selecting a target conversation person from the candidate conversation persons in the embodiment of the present application;

FIG. 3 is a flowchart of another embodiment of a method for robot session switching of this application;

FIG. 4 is a functional block diagram of a robot session switching device of the present application;

FIG. 5 is a schematic diagram of a computing device of the present application.

Detailed description

Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although the exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

FIG. 1 shows a flowchart of an embodiment of a method for robot session switching according to the present application. As shown in Figure 1, the method includes the following steps:

Step S1: Acquire the environment image located in front of the robot.

In this step, the user usually stands in front of the robot when talking to it, and the environment image is the image of the scene in front of the robot. Therefore, when a user talks to the robot, the environment image contains the user's face image.

Step S2: Determine candidate conversation persons from the environment image.

In this step, when multiple human faces appear in the environment image in front of the robot, faces farther from the robot appear blurred or small. Users who are farther away are usually not the ones conversing with the robot; they may be passers-by or onlookers standing nearby. Therefore, in this embodiment, before the candidate conversation persons are determined from the environment image, blurred faces and faces whose size is below a threshold may be removed. The candidate conversation persons are the faces that remain after blurred faces and faces smaller than the threshold have been excluded from the environment image.
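The filtering described in this step can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the `Face` record, the `sharpness` score and both threshold values are hypothetical, since the text does not say how blur is measured or which threshold is used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Face:
    face_id: str
    area_ratio: float  # face pixel area divided by image pixel area
    sharpness: float   # hypothetical blur measure; higher means sharper


def candidate_conversation_persons(
    faces: List[Face],
    min_area_ratio: float = 0.02,  # assumed threshold, not from the source
    min_sharpness: float = 0.5,    # assumed threshold, not from the source
) -> List[Face]:
    """Keep only faces that are neither blurred nor smaller than the threshold."""
    return [
        f for f in faces
        if f.area_ratio >= min_area_ratio and f.sharpness >= min_sharpness
    ]


faces = [
    Face("near_user", area_ratio=0.20, sharpness=0.9),
    Face("passer_by", area_ratio=0.01, sharpness=0.8),  # too small
    Face("blurred", area_ratio=0.10, sharpness=0.2),    # too blurred
]
kept = [f.face_id for f in candidate_conversation_persons(faces)]
```

With these illustrative values, only the face that is both large enough and sharp enough remains as a candidate.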

Step S3: It is judged whether the switching condition for switching the current conversation person is satisfied, if it is satisfied, step S4 is performed, if not, step S5 is performed.

Step S4: Select a target conversation person from the candidate conversation persons.

Step S5: Continue the conversation with the current conversation person.

In this step, when the switching condition for switching the current conversation person is not satisfied, it indicates that the current conversation person is in conversation with the robot, and the conversation with the current conversation person is continued.

Step S6: Determine the selected target conversation person as the current conversation person.

In this step, when the switching condition for switching the current conversation person is satisfied, the current conversation person is switched to the selected target conversation person, and the robot actively initiates a conversation with the target conversation person. Actively initiating the conversation includes the robot obtaining the face image of the target conversation person, displaying it on the conversation screen, obtaining the name of the person corresponding to the face image, and actively speaking to the target conversation person. For example, if the name of the target conversation person is Zhang San, then after switching the current conversation person to Zhang San, the robot displays Zhang San's face image on the conversation screen and issues the voice prompt "Hello Zhang San, may I help you" to start the conversation. It can be understood that the face image displayed on the conversation screen may be an image pre-stored in a preset information library, or a face image of the target conversation person collected in real time by the camera of the robot.

FIG. 2A shows a flowchart of the judgment of the switching condition for switching the current conversation person in the embodiment of the present application. As shown in FIG. 2A, the judgment whether the switching condition for switching the current conversation person is satisfied includes the following steps:

Step S31: determine whether there is a current conversation person, if not, go to step S32, if yes, go to step S33.

The current conversation person is the object of the current conversation as recorded by the robot. The state of the current conversation person is stored in the robot. For example, when a current conversation person exists, the state is recorded as 1; when no current conversation person exists, the state is recorded as 0.

Step S32: It is determined that the switching condition for switching the current conversation person is satisfied.

Step S33: determine whether the current conversation person is in a conversation state, if yes, perform step S34, if not, perform step S35.

Step S34: It is determined that the switching condition for switching the current conversation person is not satisfied.

Step S35: It is determined that the switching condition for switching the current conversation person is satisfied.
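Steps S31 to S35 amount to a small decision function. The sketch below assumes, for illustration only, that the current conversation person is represented by an identifier string (or `None` when absent) and that the result of the conversation-state check of step S33 is passed in as a boolean.

```python
from typing import Optional


def switching_condition_satisfied(
    current_person: Optional[str],
    in_conversation_state: bool,
) -> bool:
    # S31 -> S32: no current conversation person, so switching is allowed.
    if current_person is None:
        return True
    # S33 -> S34 / S35: switch only if the current person is no longer conversing.
    return not in_conversation_state
```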

FIG. 2B shows a flowchart of determining whether the current conversation person is in the conversation state in the embodiment of the present application. As shown in FIG. 2B, the determining whether the current conversation person is in the conversation state includes the following steps:

Step S331: determine whether the current conversational person is included in the candidate conversational person, if yes, perform step S332, if not, perform step S333.

In this step, the face image of the current conversation person is compared with the face image in the environment image. If the comparison is successful, the current conversation person is considered to be included in the candidate conversation person.

Step S332: It is determined that the current conversation person is in a conversation state.

In this step, when the face image of the current conversation person is successfully compared with the face image in the environment image, the current conversation person is considered to be in conversation with the robot.

Step S333: determine whether there is an end command to end the conversation returned by the current conversation person, if yes, perform step S334, if not, perform step S335.

In this step, when the current conversation person ends the conversation with the robot, a conversation end command is returned to the robot. The conversation end command is a voice command initiated by the current conversation person, for example, "Goodbye", "See you next time."

In some embodiments, the robot is provided with an end session button. When the current conversation person has completed the conversation with the robot and wants to end the session, the person clicks the end session button to end the current session.

Step S334: It is determined that the current conversation person is not in a conversation state.

Step S335: determine whether the current conversation person is absent from the candidate conversation persons corresponding to all of the most recently and continuously collected environment images. If yes, perform step S336; if not, perform step S337.

In this step, when the current conversation person is not included in the candidate conversation persons and the robot has not received a session end command, the current conversation person may still be in a conversation state. For example, the current conversation person may have looked down or looked back, so that no face of the current conversation person was collected. To reduce the robot's judgment errors, it is determined whether the current conversation person is included in the candidate conversation persons corresponding to the most recent N frames of collected environment images, where N is a preset constant greater than 0. For example, if N is set to 5, it is determined whether the current conversation person is absent from the candidate conversation persons corresponding to all of the last five consecutively collected environment images. When the face of the current conversation person has not been collected in N consecutive frames, it can be considered that the conversation person has left without sending an end command, and the current conversation person is not in the conversation state.

Step S336: It is determined that the current conversation person is not in a conversation state.

Step S337: It is determined that the current conversation person is in a conversation state.
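The decision chain of steps S331 to S337 can be sketched as below. The representation is an assumption made for illustration: persons are identified by strings, each frame's candidates are a set of identifiers, and `recent_candidate_ids` holds the candidate sets of the last N frames. With an empty history this sketch treats the person as absent, which is a choice of the sketch and is not stated in the text.

```python
from typing import Iterable, Set


def is_in_conversation_state(
    current_id: str,
    candidate_ids: Set[str],
    end_command_received: bool,
    recent_candidate_ids: Iterable[Set[str]],
) -> bool:
    # S331 -> S332: the current person appears in the current frame.
    if current_id in candidate_ids:
        return True
    # S333 -> S334: the person explicitly ended the conversation.
    if end_command_received:
        return False
    # S335: absent from every one of the last N frames -> S336 (False);
    # present in at least one of them -> S337 (True).
    return any(current_id in ids for ids in recent_candidate_ids)


# N = 5: the person looked away in the current frame but was seen two frames ago.
history = [{"b"}, {"a", "b"}, {"b"}, {"b"}, {"b"}]
still_talking = is_in_conversation_state("a", {"b"}, False, history)
```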

In some embodiments, when a recently collected environment image includes the current conversation person, the face of the current conversation person in that recently collected environment image replaces the corresponding face in the face information database, to make the next face comparison easier.

It should be noted that the scene in front of the camera changes as the person moves; the person may turn their head or change their expression. Given the frequency at which the camera collects environment images, a person's action may last for some time, so in the next frame there is a high probability that the person's action or expression remains the same as in the current frame. The face of the current conversation person in the most recently collected environment image therefore has the highest similarity to the face of the current conversation person in the next environment image to be collected. For this reason, the most recently collected face image of the current conversation person replaces the face image in the face information database, so that the robot can compare face images more conveniently and quickly.

FIG. 2C shows a flowchart of selecting a target conversation person from the candidate conversation persons in an embodiment of the present application. As shown in FIG. 2C, the selection of the target conversation person from the candidate conversation persons includes the following steps:

Step S41: extract the conversation parameters of each candidate conversation person from the environment image.

In this step, the conversation parameters include lip language, face size and position parameters extracted from the environment image. The lip language parameter indicates whether each candidate conversation person is speaking. In some embodiments, the value of the lip language parameter is recorded as 1 for a candidate conversation person who is speaking and as 0 for a candidate conversation person who is not speaking.

The face size parameter represents the distance between each candidate conversation person and the robot. In some embodiments, the face size parameter is calculated by dividing the pixel area of the face in the environment image by the pixel area of the whole environment image; the resulting ratio of the face to the entire environment image is used as the face size parameter.
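As a minimal sketch of this ratio, the function below approximates the face pixel area by the area of a rectangular bounding box. The bounding-box approximation and the function name are assumptions for illustration; the text only specifies the ratio of the two pixel areas.

```python
def face_size_parameter(
    face_width: int,
    face_height: int,
    image_width: int,
    image_height: int,
) -> float:
    """Ratio of the face area to the whole environment image area."""
    face_area = face_width * face_height      # bounding-box area (assumption)
    image_area = image_width * image_height   # must be positive
    return face_area / image_area


# A 320x240 face box in a 640x480 image covers one quarter of the image.
ratio = face_size_parameter(320, 240, 640, 480)
```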

The position parameter represents the distance between each candidate conversation person and the robot center line. In some embodiments, when calculating the position parameter, it is first determined whether the candidate conversation person is located to the left or to the right of the robot center line. If the candidate conversation person is to the left of the robot center line, the left edge of the environment image is used as the starting point; if to the right, the right edge of the environment image is used as the starting point. The distance from the starting point to the robot center line is used as the denominator and the distance from the candidate conversation person to the starting point as the numerator, which gives the position parameter of the candidate conversation person in the environment image.
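The calculation can be sketched as follows, under two assumptions that are not stated in the text: the robot center line coincides with the vertical center line of the environment image, and the position of a candidate is taken as the horizontal pixel coordinate of the face center. Under these assumptions the parameter is 1 on the center line and falls to 0 at either image edge.

```python
def position_parameter(face_center_x: float, image_width: float) -> float:
    """Distance from the nearer-side image edge, relative to the half width."""
    center_line = image_width / 2  # assumed to be the robot center line
    # Left of (or on) the center line: start from the left edge;
    # right of the center line: start from the right edge.
    start = 0.0 if face_center_x <= center_line else float(image_width)
    numerator = abs(face_center_x - start)    # candidate to starting point
    denominator = abs(center_line - start)    # starting point to center line
    return numerator / denominator


on_center = position_parameter(320, 640)
right_half = position_parameter(480, 640)
```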

Step S42: Calculate the session score of each candidate conversation person according to the conversation parameters of each candidate conversation person.

The conversation score is calculated from the conversation parameters as: conversation score = lip language weight * lip language parameter + face size weight * face size parameter + position weight * position parameter.

Different conversation parameters differ in how accurately they reflect whether a candidate conversation person is in a conversation state. Therefore, when calculating the conversation score of a candidate conversation person, different weights may be preset for the conversation parameters, and the parameters are weighted according to these weights to obtain the conversation scores of the different candidates. For example, among the conversation parameters, lip language best reflects whether the candidate conversation person is in a conversation state, so lip language is given the highest weight. For example, the lip language weight is set to 0.7, and the face size and position weights are set to 0.2 and 0.1 respectively. If a candidate conversation person is speaking, with a face size parameter of 20% and a position parameter of 2/3, then that candidate's conversation score is 0.7*1+0.2*20%+0.1*2/3≈0.8.
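The worked example can be checked with a short sketch. The function name and the use of default arguments for the weights are illustrative choices; the weights 0.7, 0.2 and 0.1 and the parameter values are the ones given in the example.

```python
def conversation_score(
    lip: float,
    face_size: float,
    position: float,
    w_lip: float = 0.7,
    w_face: float = 0.2,
    w_pos: float = 0.1,
) -> float:
    """Weighted sum of the three conversation parameters."""
    return w_lip * lip + w_face * face_size + w_pos * position


# Speaking candidate, face size 20%, position parameter 2/3.
score = conversation_score(lip=1, face_size=0.20, position=2 / 3)
```

The three terms are 0.7, 0.04 and about 0.067, so the sum is about 0.807, which rounds to the 0.8 stated in the example.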

Step S43: Use the candidate conversation person with the highest conversation score as the target conversation person.
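Step S43 is a selection of the maximum over the computed scores. In the sketch below the scores are assumed to be held in a dictionary keyed by a candidate identifier; returning `None` for an empty set of candidates and taking the first candidate on a tie are choices of the sketch, since the text does not address either case.

```python
from typing import Dict, Optional


def select_target(scores: Dict[str, float]) -> Optional[str]:
    """Return the candidate with the highest conversation score."""
    if not scores:
        return None  # no candidates in the environment image (assumption)
    return max(scores, key=lambda person_id: scores[person_id])


target = select_target({"zhang_san": 0.81, "li_si": 0.12, "wang_wu": 0.25})
```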

In the embodiment of the present application, whether to switch the current conversation person is determined by judging whether the switching condition is satisfied, and the target conversation person is selected from the candidate conversation persons by means of the conversation parameters. When the switching condition is satisfied, the current conversation person is switched to the target conversation person, so the robot can actively switch the current conversation person.

FIG. 3 shows a flowchart of another embodiment of a method for robot session switching of the present application. Compared with the previous embodiment, the embodiment of the present application further includes the following steps:

Step S7: Extract the face image of the current conversation person.

Step S8: Identify whether there is a user who matches the face image in the preset information library. If it exists, perform step S9; if not, perform step S11.

In this step, the face image of the current conversation person is matched against the face images in the preset information library. The preset information library pre-stores the faces of a large number of users of the robot together with their background information, with each user's face in one-to-one correspondence with its background information.

Step S9: Extract background information corresponding to the user from the preset information library.

Background information refers to the user's personal information, such as name, occupation, position, etc.

Step S10: Push the face image and background information to the human agent assistant terminal.

The human agent assistant terminal is the terminal device used by staff to assist the robot. After receiving the face image and background information sent by the robot, the terminal can display them so that the staff can get to know the current conversation person, and when the robot cannot answer the current conversation person's question, the staff can accurately assist the robot in answering.

Step S11: Push the face image to the human agent assistant terminal.

In this step, when the robot cannot answer the current conversation person's question, the staff can assist the robot in answering.
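The branch between steps S9 to S10 and step S11 can be sketched as building the message that is pushed to the terminal. Everything concrete here is hypothetical: the dictionary-based information library, the payload keys, and the assumption that face matching has already produced a user identifier (or `None` when no match exists). The face matching itself and the transport to the terminal are outside the sketch.

```python
from typing import Dict, Optional


def build_push_payload(
    face_image: bytes,
    matched_user: Optional[str],
    info_library: Dict[str, dict],
) -> dict:
    """Assemble what is pushed to the assistant terminal."""
    payload = {"face_image": face_image}
    # S8 -> S9, S10: a matching user exists, so attach the background information.
    if matched_user is not None and matched_user in info_library:
        payload["background_info"] = info_library[matched_user]
    # S8 -> S11: otherwise only the face image is pushed.
    return payload


library = {"zhang_san": {"name": "Zhang San", "occupation": "engineer"}}
known = build_push_payload(b"jpeg-bytes", "zhang_san", library)
unknown = build_push_payload(b"jpeg-bytes", None, library)
```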

In the embodiment of the present application, the human agent assistant terminal enables staff-assisted conversation: when the robot cannot resolve the current conversation person's question, a staff member resolves it, which improves the working efficiency of the robot.

FIG. 4 shows a functional block diagram of a robot session switching device of the present application. As shown in FIG. 4, the device includes: a collection module 401, a first determination module 402, a judgment module 403, a selection module 404, and a second determination module 405. The collection module 401 is used to collect an environment image located in front of the robot; the first determination module 402 is used to determine candidate conversation persons from the environment image; the judgment module 403 is used to judge whether the switching condition for switching the current conversation person is satisfied; the selection module 404 is used to select the target conversation person from the candidate conversation persons when the switching condition is satisfied; and the second determination module 405 is used to determine the selected target conversation person as the current conversation person.

The judgment module 403 includes: a first judgment unit 4031, a first determination unit 4032, a second judgment unit 4033, a second determination unit 4034, and a third determination unit 4035. The first judgment unit 4031 is used to judge whether there is a current conversation person; the first determination unit 4032 is used to determine that the switching condition for switching the current conversation person is satisfied when there is no current conversation person; the second judgment unit 4033 is used to judge whether the current conversation person is in a conversation state when there is a current conversation person; the second determination unit 4034 is used to determine that the switching condition is not satisfied when the current conversation person is in the conversation state; and the third determination unit 4035 is used to determine that the switching condition is satisfied when the current conversation person is not in the conversation state.

The second judgment unit 4033 judges whether the current conversation person is in the conversation state, when there is a current conversation person, as follows: determine whether the current conversation person is included in the candidate conversation persons; if so, determine that the current conversation person is in the conversation state; if not, determine whether there is an end command, returned by the current conversation person, for ending the conversation; if such a command exists, determine that the current conversation person is not in the conversation state; if no such command exists, determine whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the recently continuously collected environment images, where the recently continuously collected environment images are a preset number of images collected before the environment image and consecutive with it; if the current conversation person is absent from the candidate conversation persons of all of those images, determine that the current conversation person is not in the conversation state; if the current conversation person is included in the candidate conversation persons of any one of those images, determine that the current conversation person is in the conversation state.
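The three-stage check described above can be sketched as follows. This is a minimal illustration under stated assumptions: candidate conversation persons are represented as sets of hypothetical string identifiers, and `recent_candidates` holds the candidate sets of the preset number of preceding consecutive images. Treating an empty history as "not in conversation" is an assumption of this sketch, not something the text specifies.

```python
from typing import Sequence, Set


def is_in_conversation(
    current_person: str,
    candidates: Set[str],
    end_command_received: bool,
    recent_candidates: Sequence[Set[str]],
) -> bool:
    """Judge whether the current conversation person is in the conversation state."""
    # Stage 1: present in the candidates of the current image.
    if current_person in candidates:
        return True
    # Stage 2: absent, and an end command was returned by that person.
    if end_command_received:
        return False
    # Stage 3: absent and no end command. The person is still in the
    # conversation state if any recent consecutive image contained them,
    # and is not if all of those images lacked them.
    return any(current_person in frame for frame in recent_candidates)
```

The look-back over recent images keeps a brief occlusion or a turned head from ending the conversation immediately.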

The selection module 404 includes an extraction unit 4041, a calculation unit 4042, and a selection unit 4043. The extraction unit 4041 is used to extract the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include lip language, face size, and position parameters extracted from the environment image; the calculation unit 4042 is used to calculate the conversation score of each candidate conversation person according to that person's conversation parameters; and the selection unit 4043 is used to select the candidate conversation person with the highest conversation score as the target conversation person.
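One possible way to turn the three conversation parameters into a score is a weighted sum, sketched below. The weights, the normalization of each parameter to the range 0 to 1, and all names (`ConversationParams`, `conversation_score`, `select_target`) are assumptions made for illustration; this passage does not fix a particular scoring formula.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConversationParams:
    lip_moving: bool        # lip language: whether lip movement was detected
    face_area_ratio: float  # face size: face area / image area, in [0, 1]
    center_offset: float    # position: normalized distance from image center, in [0, 1]


def conversation_score(
    p: ConversationParams,
    w_lip: float = 0.5,   # hypothetical weight
    w_size: float = 0.3,  # hypothetical weight
    w_pos: float = 0.2,   # hypothetical weight
) -> float:
    """Weighted sum: speaking, larger (closer), and more central faces score higher."""
    return (
        w_lip * (1.0 if p.lip_moving else 0.0)
        + w_size * p.face_area_ratio
        + w_pos * (1.0 - p.center_offset)
    )


def select_target(candidates: Dict[str, ConversationParams]) -> Optional[str]:
    """Return the candidate with the highest conversation score, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda name: conversation_score(candidates[name]))
```

With these example weights, a candidate whose lips are moving tends to outrank a silent candidate who is merely closer to the camera.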

In the embodiment of the present application, the device further includes a first extraction module 406, a recognition module 407, a second extraction module 408, and a push module 409. The first extraction module 406 is used to extract the face image of the current conversation person; the recognition module 407 is used to identify whether a user matching the face image exists in the preset information library; the second extraction module 408 is used to extract the background information corresponding to the user from the preset information library when a user matching the face image exists in the preset information library; and the push module 409 is used to push the face image and the background information to the human agent assistance terminal.
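The lookup-and-push flow of modules 406 to 409 might look like the sketch below. Everything concrete here is assumed for illustration: faces are represented by feature vectors produced by some upstream model, matching is a nearest-neighbor search with a hypothetical distance threshold, the preset information library is an in-memory dictionary, and `push` stands in for whatever transport reaches the assistance terminal.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical library layout: user id -> (reference face feature, background info).
Library = Dict[str, Tuple[Sequence[float], str]]


def find_matching_user(
    feature: Sequence[float], library: Library, max_distance: float = 0.6
) -> Optional[str]:
    """Return the closest user within max_distance, or None if nobody matches."""
    best_user, best_dist = None, max_distance
    for user, (reference, _info) in library.items():
        dist = math.dist(feature, reference)  # Euclidean distance (Python 3.8+)
        if dist <= best_dist:
            best_user, best_dist = user, dist
    return best_user


def push_background(
    face_image: bytes,
    feature: Sequence[float],
    library: Library,
    push: Callable[[bytes, str], None],
) -> bool:
    """Push the face image and background info if a matching user exists."""
    user = find_matching_user(feature, library)
    if user is None:
        return False  # no match in the library: nothing is pushed
    push(face_image, library[user][1])
    return True
```

Passing the transport in as a callable keeps the matching logic independent of how the terminal is actually reached.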

The embodiment of the present application judges, through the judgment module, whether the switching condition for switching the current conversation person is satisfied, and selects the target conversation person from the candidate conversation persons through the selection module, so that when the switching condition is satisfied, the current conversation person is switched to the target conversation person. In addition, the push module pushes the face image of the current conversation person, together with the corresponding background information from the preset information library, to the human agent assistance terminal, enabling human-assisted conversation. Through the embodiment of the present application, the robot can actively switch the current conversation person, and robot conversations can be handled with human assistance, which improves the working efficiency of the robot.

An embodiment of the present application provides a non-volatile computer-readable storage medium that stores at least one computer-executable instruction, and the computer-executable instruction can perform the robot session switching method in any of the foregoing method embodiments.

FIG. 5 is a schematic structural diagram of an embodiment of a computing device of the present application, and specific embodiments of the present application do not limit the specific implementation of the computing device.

As shown in FIG. 5, the computing device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.

Wherein:

The processor 502, the communication interface 504, and the memory 506 communicate with each other through the communication bus 508.

The communication interface 504 is used to communicate with other devices.

The processor 502 is used to execute the program 510, and specifically may execute the relevant steps in the foregoing embodiments of the robot session switching method.

Specifically, the program 510 may include a program code, and the program code includes a computer operation instruction.

The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or may be processors of different types, such as one or more CPUs and one or more ASICs.

The memory 506 is used to store the program 510. The memory 506 may include a high-speed RAM, and may also include a non-volatile memory, for example, at least one magnetic disk memory.

The program 510 may specifically be used to cause the processor 502 to perform the following operations: collect an environment image located in front of the robot; determine candidate conversation persons from the environment image; judge whether the switching condition for switching the current conversation person is satisfied; if satisfied, select a target conversation person from the candidate conversation persons; and determine the selected target conversation person as the current conversation person.
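The five operations above form a per-image loop, which can be sketched as below. This is an illustrative skeleton only: the class name `SessionSwitcher` is hypothetical, and face detection, the switching-condition judgment, and target selection are passed in as callables so that the sketch makes no claim about how they are implemented.

```python
from typing import Any, Callable, List, Optional


class SessionSwitcher:
    """Tracks the current conversation person across successive environment images."""

    def __init__(
        self,
        detect: Callable[[Any], List[str]],
        condition_met: Callable[[Optional[str], List[str]], bool],
        select: Callable[[List[str]], Optional[str]],
    ) -> None:
        self.detect = detect                # image -> candidate conversation persons
        self.condition_met = condition_met  # (current, candidates) -> switch allowed?
        self.select = select                # candidates -> target conversation person
        self.current: Optional[str] = None

    def step(self, image: Any) -> Optional[str]:
        """Process one environment image and return the current conversation person."""
        candidates = self.detect(image)
        if self.condition_met(self.current, candidates):
            # The selected target becomes the current conversation person
            # (None when there are no candidates, an assumption of this sketch).
            self.current = self.select(candidates)
        return self.current
```

A caller would invoke `step` once per collected environment image and direct the robot's dialogue at whichever person it returns.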

In an optional manner, the program 510 may be further specifically configured to cause the processor 502 to perform the following operations: determine whether there is a current conversation person; if not, determine that the switching condition for switching the current conversation person is satisfied; if so, determine whether the current conversation person is in the conversation state; if the current conversation person is in the conversation state, determine that the switching condition is not satisfied; if the current conversation person is not in the conversation state, determine that the switching condition is satisfied.

In an optional manner, the program 510 may be further specifically configured to cause the processor 502 to perform the following operations: determine whether the current conversation person is included in the candidate conversation persons; if so, determine that the current conversation person is in the conversation state; if not, determine whether there is an end command, returned by the current conversation person, for ending the conversation; if such a command exists, determine that the current conversation person is not in the conversation state; if no such command exists, determine whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the recently continuously collected environment images, where the recently continuously collected environment images are a preset number of images collected before the environment image and consecutive with it; if the current conversation person is absent from the candidate conversation persons of all of those images, determine that the current conversation person is not in the conversation state; if the current conversation person is included in the candidate conversation persons of any one of those images, determine that the current conversation person is in the conversation state.

In an optional manner, the program 510 may be further specifically configured to cause the processor 502 to perform the following operations: extract the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include lip language, face size, and position parameters extracted from the environment image; calculate the conversation score of each candidate conversation person according to that person's conversation parameters; and take the candidate conversation person with the highest conversation score as the target conversation person.

In an optional manner, the program 510 may be further specifically configured to cause the processor 502 to perform the following operations: extract the face image of the current conversation person; identify whether a user matching the face image exists in the preset information library; if so, extract the background information corresponding to the user from the preset information library; and push the face image and the background information to the human agent assistance terminal.

The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. In addition, this application is not directed to any specific programming language. It should be understood that various programming languages can be used to implement the content of the present application described herein, and that any description of a specific language above is given to disclose the best mode of the present application.

The specification provided here explains a lot of specific details. However, it can be understood that the embodiments of the present application can be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present application, various features of the present application are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than those explicitly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. Therefore, the claims that follow the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present application.

Those skilled in the art can understand that the modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and in addition, they may be divided into a plurality of submodules or subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.

In addition, those skilled in the art can understand that although some of the embodiments described herein include certain features that are included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of the present application and to form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.

Each component embodiment of the present application may be implemented by hardware, or implemented by a software module running on one or more processors, or implemented by a combination thereof. Those skilled in the art should understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all functions of some or all components in a robot session switching device according to an embodiment of the present application. The present application may also be implemented as an apparatus or apparatus program (e.g., a computer program and a computer program product) for performing a part or all of the method described herein. Such a program for implementing the present application may be stored on a computer-readable medium, or may have the form of one or more signals. Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate the present application rather than limit the present application, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs between parentheses should not be construed as limitations on the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "one" before an element does not exclude the presence of multiple such elements. The application can be realized by means of hardware including several different elements and by means of a suitably programmed computer. In the unit claims enumerating several devices, several of these devices may be embodied by the same hardware item. The use of the words first, second, and third does not indicate any order. These words can be interpreted as names.

Claims

A method for robot session switching, characterized by comprising:

Collecting an environment image located in front of the robot;

Determining candidate conversation persons from the environment image;

Judging whether the switching condition for switching the current conversation person is satisfied;

If satisfied, selecting a target conversation person from the candidate conversation persons;

Determining the selected target conversation person as the current conversation person.
The method of claim 1, wherein:

The judging whether the switching condition for switching the current conversation person is satisfied includes:

Determine whether there is a current conversation person;

If not, it is determined that the switching conditions for switching the current conversation person are satisfied;

If yes, it is determined whether the current conversation person is in a conversation state;

If it is in the conversation state, it is determined that the switching condition for switching the current conversation person is not satisfied;

If it is not in the conversation state, it is determined that the switching condition for switching the current conversation person is satisfied.
The method according to claim 2, wherein the judging whether the current conversation person is in a conversation state includes:

Determine whether the current conversational person is included in the candidate conversational person;

If it is included, it is determined that the current conversation person is in a conversation state;

If not, it is judged whether there is an end command to end the conversation returned by the current conversation person;

If it exists, it is determined that the current conversation person is not in a conversation state;

If it does not exist, determine whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the recently continuously collected environment images, wherein the recently continuously collected environment images are a preset number of images collected before the environment image and consecutive with it;

If the current conversation person is absent from the candidate conversation persons corresponding to all of the recently continuously collected environment images, it is determined that the current conversation person is not in the conversation state;

If the current conversation person is included in the candidate conversation persons corresponding to any one of the recently continuously collected environment images, it is determined that the current conversation person is in a conversation state.
The method of claim 1, wherein:

The selecting the target conversation person from the candidate conversation persons includes:

Extracting the conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters include lip language, face size and position parameters extracted from the environment image;

Calculating the conversation score of each candidate conversation person according to the conversation parameters of each candidate conversation person;

The candidate conversation person with the highest conversation score is taken as the target conversation person.
The method according to any one of claims 1-4, wherein the method further comprises:

Extract the face image of the current conversation person;

Identify whether there is a user matching the face image in the preset information library;

If it exists, extract the background information corresponding to the user from the preset information library;

Push the face image and background information to the human agent assistance terminal.
A robot conversation switching device, characterized by comprising:

Acquisition module: used to collect environmental images located in front of the robot;

A first determination module: used to determine candidate conversational persons from the environment image;

Judgment module: used to judge whether the switching conditions for switching the current conversation person are satisfied;

Selection module: used to select the target conversation person from the candidate conversation persons when the switching condition for switching the current conversation person is satisfied;

Second determination module: used to determine the selected target conversation person as the current conversation person.
The apparatus according to claim 6, wherein the judgment module comprises:

The first judgment unit: used to judge whether there is a current conversation person;

The first determining unit: used to determine that the switching condition for switching the current conversation person is satisfied when there is no current conversation person;

The second judgment unit: used to judge whether the current conversation person is in a conversation state when there is a current conversation person;

A second determining unit: used to determine that the switching condition for switching the current conversation person is not satisfied when the current conversation person is in the conversation state;

The third determining unit is used to determine that the switching condition for switching the current conversation person is satisfied when the current conversation person is not in the conversation state.
The apparatus according to claim 7, wherein the second judgment unit is configured to judge whether the current conversation person is in a conversation state when there is a current conversation person, including:

Determine whether the current conversational person is included in the candidate conversational person;

If it is included, it is determined that the current conversation person is in a conversation state;

If not, it is judged whether there is an end command to end the conversation returned by the current conversation person;

If it exists, it is determined that the current conversation person is not in a conversation state;

If it does not exist, determine whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the recently continuously collected environment images, wherein the recently continuously collected environment images are a preset number of images collected before the environment image and consecutive with it;

If the current conversation person is absent from the candidate conversation persons corresponding to all of the recently continuously collected environment images, it is determined that the current conversation person is not in the conversation state;

If the current conversation person is included in the candidate conversation persons corresponding to any one of the recently continuously collected environment images, it is determined that the current conversation person is in a conversation state.
The apparatus according to claim 6, wherein the selection module comprises:

Extraction unit: used to extract conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters include lip language, face size and position parameters extracted from the environment image;

Calculation unit: used to calculate the conversation score of each candidate conversation person according to the conversation parameters of each candidate conversation person;

Selection unit: used to select the candidate conversation person with the highest conversation score as the target conversation person.
The device according to claim 6, wherein the device further comprises:

The first extraction module: used to extract the face image of the current conversation person;

Recognition module: used to identify whether there is a user who matches the face image in the preset information library;

A second extraction module: used to extract background information corresponding to the user from the preset information library when there is a user who matches the face image in the preset information library;

Pushing module: used to push the face image and background information to the human agent assistance terminal.
A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus; and the memory is used to store at least one executable instruction that causes the processor to perform operations corresponding to the method for robot session switching according to any one of claims 1-5.
A computer-readable storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the method for robot session switching according to any one of claims 1-5.