WO2020125252A1 - Robot conversation switching method and apparatus, and computing device - Google Patents

Info

Publication number
WO2020125252A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversation
person
current
candidate
conversation person
Prior art date
Application number
PCT/CN2019/116087
Other languages
French (fr)
Chinese (zh)
Inventor
徐文浩
马世奎
孙文豹
Original Assignee
达闼科技(北京)有限公司
Application filed by 达闼科技(北京)有限公司
Publication of WO2020125252A1

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00: Manipulators not otherwise provided for
    • B25J11/0005: Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/20: Scenes; Scene-specific elements in augmented reality scenes

Definitions

  • The embodiments of the present application relate to the technical field of intelligent robots, and in particular to a robot session switching method, apparatus, and computing device.
  • A cloud robot is a robot whose body is responsible only for data collection, data preprocessing and data transmission; complex calculation and judgment work is transmitted to a cloud processor for execution.
  • Robot conversation refers to collecting the user's audio information, analyzing the semantic meaning of the user's audio information, and returning the answer information to the user according to the semantic meaning to realize the user's dialogue with the robot.
  • The inventor of the present application found that the current robot conversation mode cannot actively switch the conversation person, nor can it actively open a conversation.
  • The present application is proposed in order to provide a method, apparatus and computing device for robot session switching that overcome the above problems or at least partially solve them.
  • A technical solution adopted by the embodiments of the present application is to provide a method for robot session switching, which includes: collecting an environment image located in front of the robot; determining candidate conversation persons from the environment image; judging whether the switching condition for switching the current conversation person is satisfied; if it is satisfied, selecting a target conversation person from the candidate conversation persons; and determining the selected target conversation person as the current conversation person.
  • Judging whether the switching condition for switching the current conversation person is satisfied includes: determining whether there is a current conversation person; if not, determining that the switching condition is satisfied; if so, determining whether the current conversation person is in a conversation state; if the current conversation person is in the conversation state, determining that the switching condition is not satisfied; if the current conversation person is not in the conversation state, determining that the switching condition is satisfied.
  • Judging whether the current conversation person is in a conversation state includes: judging whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, judging whether there is an end command for ending the conversation returned by the current conversation person; if such a command exists, determining that the current conversation person is not in the conversation state; if it does not exist, judging whether the current conversation person is absent from all of the candidate conversation persons corresponding to the most recently and continuously collected environment images, where those images are a preset number of images collected before, and consecutive with, the environment image; if the current conversation person is absent from all of them, determining that the current conversation person is not in a conversation state; if the current conversation person is included in the candidate conversation persons corresponding to any of them, determining that the current conversation person is in a conversation state.
  • Selecting the target conversation person from the candidate conversation persons includes: extracting the conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters include lip language, face size and position parameters extracted from the environment image; calculating a conversation score for each candidate conversation person based on that person's conversation parameters; and taking the candidate conversation person with the highest conversation score as the target conversation person.
  • The method further includes: extracting the face image of the current conversation person; identifying whether a user matching the face image exists in a preset information library; if so, extracting the background information corresponding to the user from the preset information library; and pushing the face image and background information to the human agent assistant terminal.
  • The embodiments of the present application also provide a robot session switching device, which includes: a collection module, used to collect an environment image located in front of the robot; a first determination module, used to determine candidate conversation persons from the environment image; a judgment module, used to judge whether the switching condition for switching the current conversation person is satisfied; a selection module, used to select the target conversation person from the candidate conversation persons when the switching condition is satisfied; and a second determination module, used to determine the selected target conversation person as the current conversation person.
  • The judgment module includes: a first judgment unit, used to judge whether there is a current conversation person; a first determination unit, used to determine that the switching condition for switching the current conversation person is satisfied when there is no current conversation person; a second judgment unit, used to judge whether the current conversation person is in the conversation state when there is a current conversation person; a second determination unit, used to determine that the switching condition is not satisfied when the current conversation person is in the conversation state; and a third determination unit, used to determine that the switching condition is satisfied when the current conversation person is not in the conversation state.
  • The second judgment unit judging whether the current conversation person is in a conversation state when there is a current conversation person includes: determining whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, determining whether there is an end command for ending the conversation returned by the current conversation person; if such a command exists, determining that the current conversation person is not in a conversation state.
  • The selection module includes: an extraction unit, used to extract the conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters include lip language, face size and position parameters extracted from the environment image; a calculation unit, used to calculate the conversation score of each candidate conversation person based on that person's conversation parameters; and a selection unit, used to select the candidate conversation person with the highest conversation score as the target conversation person.
  • The device further includes: a first extraction module, used to extract the face image of the current conversation person; an identification module, used to identify whether a user matching the face image exists in the preset information library; a second extraction module, used to extract the background information corresponding to the user from the preset information library when such a user exists; and a push module, used to push the face image and background information to the human agent assistant terminal.
  • Another technical solution adopted in the embodiments of the present application is to provide a computing device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method for robot session switching.
  • Another technical solution adopted by the embodiments of the present application is to provide a computer-readable storage medium in which at least one executable instruction is stored, and the executable instruction causes a processor to perform the operations corresponding to the method for robot session switching.
  • By collecting the environment image in front of the robot, the embodiments of the present application determine whether to switch the current conversation person and can select the target conversation person from the candidate conversation persons. It can be seen that, by using the embodiments of the present application, the robot can actively switch the conversation person.
  • FIG. 1 is a flowchart of a method for robot session switching according to an embodiment of the present application
  • FIG. 2A is a flowchart of judging the switching condition for switching the current conversation person in a robot session switching method according to an embodiment of the present application
  • FIG. 2B is a flowchart of determining whether the current conversation person is in a conversation state in the embodiment of the present application
  • FIG. 2C is a flowchart of selecting a target conversation person from the candidate conversation persons in the embodiment of the present application.
  • FIG. 3 is a flowchart of another embodiment of a method for robot session switching of this application.
  • FIG. 4 is a functional block diagram of a robot session switching device of the present application.
  • FIG. 5 is a schematic diagram of a computing device of the present application.
  • FIG. 1 shows a flowchart of an embodiment of a method for robot session switching according to the present application. As shown in Figure 1, the method includes the following steps:
  • Step S1: Acquire the environment image located in front of the robot.
  • A user usually stands in front of the robot when talking to it, and the environment image is the image of the area in front of the robot. Therefore, when a user talks to the robot, the environment image contains the user's face image.
  • Step S2: Determine candidate conversation persons from the environment image.
  • Before the candidate conversation persons are determined from the environment image, blurred faces and faces whose size is less than a threshold may also be identified and removed; the candidate conversation persons are then the faces remaining in the environment image after this exclusion.
  • Step S3: Judge whether the switching condition for switching the current conversation person is satisfied. If it is satisfied, perform step S4; if not, perform step S5.
  • Step S4: Select a target conversation person from the candidate conversation persons.
  • Step S5: Continue the conversation with the current conversation person.
  • Step S6: Determine the selected target conversation person as the current conversation person.
  • Actively opening a conversation includes the robot obtaining the face image of the target conversation person, displaying it on the conversation screen, obtaining the name of the person corresponding to the face image, and actively speaking to the target conversation person.
  • For example, if the name of the target conversation person is Zhang San, then after switching the current conversation person to Zhang San, the robot displays Zhang San's face image on the conversation screen and issues the voice prompt "Hello Zhang San, may I help you" to start the conversation.
  • The face image displayed on the conversation screen may be an image pre-stored in a preset information library, or a face image of the target conversation person collected in real time by the camera of the robot.
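The flow of steps S1 to S6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Face` record, the `min_area_ratio` threshold value and the two callbacks `should_switch` and `select_target` are hypothetical names introduced here, and face detection and blur detection are assumed to happen upstream.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Face:
    """One detected face (hypothetical record produced by an upstream detector)."""
    person_id: str
    area_ratio: float      # face pixel area / whole image pixel area
    blurred: bool = False  # result of an assumed upstream blur check


def determine_candidates(faces: List[Face], min_area_ratio: float = 0.01) -> List[Face]:
    """Step S2: keep faces that are neither blurred nor smaller than the threshold."""
    return [f for f in faces if not f.blurred and f.area_ratio >= min_area_ratio]


def switch_step(
    faces: List[Face],
    current: Optional[Face],
    should_switch: Callable[[Optional[Face], List[Face]], bool],
    select_target: Callable[[List[Face]], Optional[Face]],
) -> Optional[Face]:
    """Process one environment image; return the (possibly new) current conversation person."""
    candidates = determine_candidates(faces)  # S2
    if should_switch(current, candidates):    # S3
        return select_target(candidates)      # S4, then S6
    return current                            # S5: keep talking to the same person
```

Passing the switching test and the selection rule in as callbacks keeps this loop independent of how those two decisions are actually made.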
  • FIG. 2A shows a flowchart of the judgment of the switching condition for switching the current conversation person in the embodiment of the present application. As shown in FIG. 2A, the judgment whether the switching condition for switching the current conversation person is satisfied includes the following steps:
  • Step S31: Determine whether there is a current conversation person. If not, go to step S32; if so, go to step S33.
  • The current conversation person is the object of the current conversation recorded by the robot.
  • The state of the current conversation person is stored in the robot. For example, when a current conversation person exists, the state is recorded as 1; when no current conversation person exists, the state is recorded as 0.
  • Step S32: Determine that the switching condition for switching the current conversation person is satisfied.
  • Step S33: Determine whether the current conversation person is in a conversation state. If so, perform step S34; if not, perform step S35.
  • Step S34: Determine that the switching condition for switching the current conversation person is not satisfied.
  • Step S35: Determine that the switching condition for switching the current conversation person is satisfied.
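Steps S31 to S35 reduce to a two-level decision, sketched below in Python. The function name and the use of `None` to represent "no current conversation person" are illustrative assumptions; the text itself only describes a 1/0 state flag.

```python
from typing import Optional


def switching_condition_met(current_person: Optional[str], in_conversation: bool) -> bool:
    """Return True when the robot may switch to a new conversation person."""
    if current_person is None:   # S31 -> S32: nobody is being talked to
        return True
    return not in_conversation   # S33 -> S34 (still talking) or S35 (no longer talking)
```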
  • FIG. 2B shows a flowchart of determining whether the current conversation person is in the conversation state in the embodiment of the present application. As shown in FIG. 2B, the determining whether the current conversation person is in the conversation state includes the following steps:
  • Step S331: Determine whether the current conversation person is included in the candidate conversation persons. If so, perform step S332; if not, perform step S333.
  • The face image of the current conversation person is compared with the face images in the environment image. If the comparison succeeds, the current conversation person is considered to be included in the candidate conversation persons.
  • Step S332: Determine that the current conversation person is in a conversation state.
  • Step S333: Determine whether there is an end command for ending the conversation returned by the current conversation person. If so, perform step S334; if not, perform step S335.
  • When the current conversation person ends the conversation, a conversation end command is returned to the robot.
  • The conversation end command may be a voice command initiated by the current conversation person, for example, "Goodbye" or "See you next time."
  • Alternatively, the robot is provided with an end session button, and the current conversation person clicks the end session button to end the current session.
  • Step S334: Determine that the current conversation person is not in a conversation state.
  • Step S335: Determine whether the current conversation person is absent from all of the candidate conversation persons corresponding to the environment images that have been continuously collected most recently. If so, perform step S336; if not, perform step S337.
  • When the current conversation person is not included in the candidate conversation persons and the robot has not received a session end command, the current conversation person may still be in a conversation state; for example, the current conversation person may have looked down or looked back, so that no face of the current conversation person was collected. To reduce the robot's judgment errors, it is determined whether the current conversation person is included in the candidate conversation persons corresponding to the N most recently collected frames of the environment image, where N is a preset constant greater than 0. For example, if N is set to 5, it is determined whether the current conversation person is absent from the candidate conversation persons corresponding to all of the last five consecutively collected environment images.
  • Step S336: Determine that the current conversation person is not in a conversation state.
  • Step S337: Determine that the current conversation person is in a conversation state.
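The conversation-state check of steps S331 to S337 can be sketched with a sliding window over the candidate sets of the last N frames. The class and method names below are hypothetical, persons are represented by plain string identifiers, and one behaviour is an assumption not stated in the text: until N earlier frames have been collected, the person is treated as still in conversation.

```python
from collections import deque
from typing import Deque, FrozenSet, Iterable


class ConversationStateTracker:
    """Tracks candidate sets of the N most recent frames (N = 5 in the text's example)."""

    def __init__(self, n_frames: int = 5) -> None:
        self.history: Deque[FrozenSet[str]] = deque(maxlen=n_frames)

    def in_conversation(self, current: str, candidates: Iterable[str], end_command: bool) -> bool:
        candidate_set = frozenset(candidates)
        previous = list(self.history)        # frames collected before this one
        self.history.append(candidate_set)   # remember this frame for later calls
        if current in candidate_set:         # S331 -> S332
            return True
        if end_command:                      # S333 -> S334
            return False
        if len(previous) < self.history.maxlen:
            return True                      # assumption: not enough history to decide yet
        # S335: in conversation only if seen in at least one of the N previous frames
        return any(current in frame for frame in previous)  # S337 if True, S336 if False
```

With `n_frames=3`, a person who is seen once and then disappears is still reported as in conversation for the next three frames and only drops out on the fourth frame without them.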
  • The face of the current conversation person in the most recently collected environment image replaces the corresponding face in the face information database, to make the next face comparison easier.
  • While the robot collects environment images, the scene in front of the camera changes with people's actions: a person may turn their head or change their expression, and that action or expression may last for a period of time. When the next frame of the environment image is collected, there is a high probability that the person's action or expression is the same as in the current frame.
  • Therefore, the face of the current conversation person in the most recently collected environment image has the highest similarity to the face of the current conversation person in the next frame to be collected. Replacing the face image in the face information database with the most recently collected face image of the current conversation person allows the robot to compare face images more conveniently and quickly.
  • FIG. 2C shows a flowchart of selecting a target conversation person from the candidate conversation persons in an embodiment of the present application. As shown in FIG. 2C, the selection of the target conversation person from the candidate conversation persons includes the following steps:
  • Step S41: Extract the conversation parameters of each candidate conversation person from the environment image.
  • The conversation parameters include lip language, face size and position parameters extracted from the environment image. The lip language parameter indicates whether each candidate conversation person is speaking.
  • For example, the lip language parameter of a candidate conversation person who is speaking may be recorded as 1, and that of a candidate conversation person who is not speaking as 0.
  • The face size is used to represent the distance between each candidate conversation person and the robot.
  • The pixel area of the face in the environment image is divided by the pixel area of the whole environment image to obtain the proportion of the image occupied by the face, and this ratio is used as the face size parameter.
  • The position parameter is used to represent the distance between each candidate conversation person and the robot's center line.
  • Step S42: Calculate the conversation score of each candidate conversation person according to that person's conversation parameters.
  • The conversation score is calculated as: lip language weight × lip language parameter + face size weight × face size parameter + position weight × position parameter.
  • Different conversation parameters reflect with different accuracy whether a candidate conversation person is in the conversation state. Therefore, when calculating the conversation score, different weights may be preset for the conversation parameters, and the parameters are combined as a weighted sum to obtain the conversation score of each candidate. For example, among the conversation parameters, lip language best reflects whether the candidate conversation person is in the conversation state, so it is given the highest weight: the lip language weight is set to 0.7, and the weights of the face size and position parameters are set to 0.2 and 0.1 respectively. If one of the candidate conversation persons is speaking, with a face size parameter of 20% and a position parameter of 2/3, then that candidate's conversation score is 0.7 × 1 + 0.2 × 20% + 0.1 × 2/3 ≈ 0.8.
  • Step S43: Use the candidate conversation person with the highest conversation score as the target conversation person.
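The weighted scoring of steps S41 to S43 can be sketched as below. The weights 0.7, 0.2 and 0.1 and the face-area ratio come from the example in the text; the `ConversationParams` record and the function names are hypothetical, and how the position parameter is derived from the image is left as an input because the text does not specify it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConversationParams:
    person_id: str
    lip: int          # 1 if the candidate is speaking, 0 otherwise
    face_size: float  # face pixel area / image pixel area
    position: float   # position parameter (derivation not specified in the text)


# (lip weight, face size weight, position weight) from the example in the text
WEIGHTS: Tuple[float, float, float] = (0.7, 0.2, 0.1)


def face_size_ratio(face_w: int, face_h: int, img_w: int, img_h: int) -> float:
    """Face size parameter: share of the environment image covered by the face box."""
    return (face_w * face_h) / (img_w * img_h)


def conversation_score(p: ConversationParams, weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """Step S42: weighted sum of the three conversation parameters."""
    w_lip, w_size, w_pos = weights
    return w_lip * p.lip + w_size * p.face_size + w_pos * p.position


def select_target(candidates: List[ConversationParams]) -> Optional[ConversationParams]:
    """Step S43: candidate with the highest score, or None if there are no candidates."""
    return max(candidates, key=conversation_score) if candidates else None
```

For the worked example in the text (speaking, face size 20%, position 2/3), `conversation_score` gives 0.7 + 0.04 + 0.0667, which is approximately 0.8.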
  • Whether to switch the current conversation person is determined by judging whether the switching condition is satisfied, and the target conversation person is selected from the candidate conversation persons using the conversation parameters, so that when the switching condition is satisfied the robot can actively switch the current conversation person.
  • FIG. 3 shows a flowchart of another embodiment of a method for robot session switching of the present application. Compared with the previous embodiment, the embodiment of the present application further includes the following steps:
  • Step S7: Extract the face image of the current conversation person.
  • Step S8: Identify whether there is a user who matches the face image in the preset information library. If so, perform step S9; if not, perform step S11.
  • The face image of the current conversation person is matched against the face images in the preset information library. The preset information library pre-stores the faces of a large number of users of the robot together with their background information, and each user's face corresponds one-to-one with that user's background information.
  • Step S9: Extract the background information corresponding to the user from the preset information library.
  • Background information refers to the user's personal information, such as name, occupation and position.
  • Step S10: Push the face image and background information to the human agent assistant terminal.
  • The human agent assistant terminal is the terminal device used by staff who assist the robot. After receiving the face image and background information sent by the robot, the terminal can display them so that the staff can understand the current conversation person; when the robot cannot answer a question asked by the current conversation person, the staff can accurately assist the robot in answering.
  • Step S11: Push the face image to the human agent assistant terminal.
  • In this case the staff can still assist the robot in answering.
  • Pushing this information to the human agent assistant terminal enables human-assisted conversation: when the robot cannot resolve the current conversation person's question, a human agent resolves it, which improves the robot's working efficiency.
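Steps S7 to S11 amount to a lookup followed by a push whose content depends on whether the face was matched. The sketch below only builds the message to be pushed; the `match_face` callback, the dictionary-based `info_library` and the payload keys are hypothetical, and the actual face matching and transport to the terminal are outside its scope.

```python
from typing import Callable, Dict, Optional


def build_push_payload(
    face_image: bytes,
    info_library: Dict[str, dict],
    match_face: Callable[[bytes], Optional[str]],
) -> dict:
    """Assemble what the robot pushes to the assistant terminal for the current person."""
    payload: dict = {"face_image": face_image}         # always pushed (S10 and S11)
    user_id = match_face(face_image)                   # S8: look for a matching user
    if user_id is not None and user_id in info_library:
        payload["background"] = info_library[user_id]  # S9: name, occupation, position, ...
    return payload
```

An unmatched face yields a payload with only the image, which corresponds to step S11.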
  • FIG. 4 shows a functional block diagram of a robot session switching device of the present application.
  • The device includes: a collection module 401, a first determination module 402, a judgment module 403, a selection module 404, and a second determination module 405. The collection module 401 is used to collect an environment image located in front of the robot; the first determination module 402 is used to determine candidate conversation persons from the environment image; the judgment module 403 is used to judge whether the switching condition for switching the current conversation person is satisfied; the selection module 404 is used to select the target conversation person from the candidate conversation persons when the switching condition is satisfied; and the second determination module 405 is used to determine the selected target conversation person as the current conversation person.
  • The judgment module 403 includes: a first judgment unit 4031, a first determination unit 4032, a second judgment unit 4033, a second determination unit 4034, and a third determination unit 4035. The first judgment unit 4031 is used to judge whether there is a current conversation person; the first determination unit 4032 is used to determine that the switching condition for switching the current conversation person is satisfied when there is no current conversation person; the second judgment unit 4033 is used to judge whether the current conversation person is in a conversation state when there is a current conversation person; the second determination unit 4034 is used to determine that the switching condition is not satisfied when the current conversation person is in the conversation state; and the third determination unit 4035 is used to determine that the switching condition is satisfied when the current conversation person is not in the conversation state.
  • The second judgment unit 4033 judging whether the current conversation person is in the conversation state when there is a current conversation person includes: determining whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, determining whether there is an end command for ending the conversation returned by the current conversation person; if such a command exists, determining that the current conversation person is not in a conversation state; if it does not exist, determining whether the current conversation person is absent from all of the candidate conversation persons corresponding to the most recently and continuously collected environment images, where those images are a preset number of images collected before, and consecutive with, the environment image.
  • The selection module 404 includes: an extraction unit 4041, a calculation unit 4042, and a selection unit 4043. The extraction unit 4041 is used to extract the conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters include lip language, face size and position parameters extracted from the environment image; the calculation unit 4042 is used to calculate the conversation score of each candidate conversation person according to that person's conversation parameters; and the selection unit 4043 is used to select the candidate conversation person with the highest conversation score as the target conversation person.
  • The device further includes: a first extraction module 406, an identification module 407, a second extraction module 408, and a push module 409. The first extraction module 406 is used to extract the face image of the current conversation person; the identification module 407 is used to identify whether there is a user who matches the face image in the preset information library; the second extraction module 408 is used to extract the background information corresponding to the user from the preset information library when such a user exists; and the push module 409 is used to push the face image and the background information to the human agent assistant terminal.
  • The embodiment of the present application judges through the judgment module whether the switching condition for the conversation person is satisfied, and selects the target conversation person from the candidate conversation persons through the selection module, so that when the switching condition is satisfied, the current conversation person is switched to the target conversation person.
  • The face image of the current conversation person and the corresponding background information from the preset information library are pushed to the human agent assistant terminal through the push module to realize human-assisted conversation.
  • In this way, the robot can actively switch the current conversation person, and questions the robot cannot handle are resolved with human assistance, which improves the robot's working efficiency.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium, where the storage medium stores at least one executable instruction, and the executable instruction can perform the method for robot session switching in any of the foregoing method embodiments.
  • FIG. 5 is a schematic structural diagram of an embodiment of a computing device of the present application, and specific embodiments of the present application do not limit the specific implementation of the computing device.
  • the computing device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
  • the processor 502, the communication interface 504, and the memory 506 communicate with each other through the communication bus 508.
  • the communication interface 504 is used to communicate with other devices.
  • the processor 502 is used to execute the program 510, and specifically can execute relevant steps in the foregoing embodiment of a method for switching a robot session.
  • the program 510 may include a program code, and the program code includes a computer operation instruction.
  • The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or may be processors of different types, such as one or more CPUs and one or more ASICs.
  • the memory 506 is used to store the program 510.
  • The memory 506 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one magnetic disk memory.
  • The program 510 may specifically be used to cause the processor 502 to perform the following operations: collect an environment image located in front of the robot; determine candidate conversation persons from the environment image; determine whether the switching condition for switching the current conversation person is satisfied; if satisfied, select a target conversation person from the candidate conversation persons; determine the selected target conversation person as the current conversation person.
  • The program 510 may be further specifically configured to cause the processor 502 to perform the following operations: determine whether there is a current conversation person; if not, determine that the switching condition for switching the current conversation person is satisfied; if so, determine whether the current conversation person is in a conversation state; if in a conversation state, determine that the switching condition for switching the current conversation person is not satisfied; if not in a conversation state, determine that the switching condition for switching the current conversation person is satisfied.
  • The program 510 may be further specifically configured to cause the processor 502 to perform the following operations: determine whether the current conversation person is included in the candidate conversation persons; if so, determine that the current conversation person is in a conversation state; if not, determine whether there is an end command, returned by the current conversation person, for ending the conversation; if the end command exists, determine that the current conversation person is not in a conversation state.
  • The program 510 may be further specifically configured to cause the processor 502 to perform the following operations: extract the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include the lip language, face size, and position parameters extracted from the environment image; calculate the conversation score of each candidate conversation person based on that person's conversation parameters; take the candidate conversation person with the highest conversation score as the target conversation person.
  • The program 510 may be further specifically configured to cause the processor 502 to perform the following operations: extract the face image of the current conversation person; identify whether a user matching the face image exists in the preset information library; if so, extract the background information corresponding to the user from the preset information library; push the face image and the background information to the human agent assistance terminal.
  • The modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment.
  • The modules or units or components in the embodiments may be combined into one module or unit or component, and in addition, they may be divided into a plurality of submodules or subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.
  • Each component embodiment of the present application may be implemented by hardware, or implemented by a software module running on one or more processors, or implemented by a combination thereof.
  • A microprocessor or a digital signal processor (DSP) may be used to implement some or all functions of some or all components in a robot conversation switching device according to an embodiment of the present application.
  • The present application may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing part or all of the method described herein.
  • Such a program for implementing the present application may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.

Abstract

A robot conversation switching method, comprising: acquiring an environment image in front of a robot; determining candidate conversation partners from the environment image; determining whether a switching condition for switching a current conversation partner is satisfied; if so, selecting a target conversation partner from the candidate conversation partners; and determining the selected target conversation partner as a current conversation partner. According to the method, a robot is able to actively switch conversation partners. The present invention further relates to a robot conversation switching apparatus, a computing device, and a computer readable storage medium.

Description

Robot conversation switching method and apparatus, and computing device
Technical field
The embodiments of the present application relate to the technical field of intelligent robots, and in particular to a robot conversation switching method, device, and computing device.
Background
With the development of the Internet and deep learning technology, robot technology has made great progress, evolving from standalone individual robots to today's cloud robots. In a cloud robot, the robot body is responsible only for data collection, data preprocessing, and data transmission, while complex computation and decision-making are sent to a cloud processor for execution.
A robot conversation refers to collecting a user's audio information, analyzing its semantics, and returning an answer to the user according to those semantics, thereby realizing a dialogue between the user and the robot.
In the process of implementing the present application, the inventors found that current robot conversation approaches cannot actively switch the conversation person or actively start a conversation.
Summary of the application
In view of the above problems, the present application is proposed to provide a robot conversation switching method, device, and computing device that overcome the above problems or at least partially solve them.
To solve the above technical problems, one technical solution adopted by the embodiments of the present application is to provide a robot conversation switching method, including: collecting an environment image located in front of the robot; determining candidate conversation persons from the environment image; determining whether the switching condition for switching the current conversation person is satisfied; if so, selecting a target conversation person from the candidate conversation persons; and determining the selected target conversation person as the current conversation person.
Optionally, determining whether the switching condition for switching the current conversation person is satisfied includes: determining whether there is a current conversation person; if not, determining that the switching condition is satisfied; if so, determining whether the current conversation person is in a conversation state; if the current conversation person is in a conversation state, determining that the switching condition is not satisfied; and if the current conversation person is not in a conversation state, determining that the switching condition is satisfied.
Optionally, determining whether the current conversation person is in a conversation state includes: determining whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, determining whether there is an end command, returned by the current conversation person, for ending the conversation; if the end command exists, determining that the current conversation person is not in a conversation state; if it does not exist, determining whether the current conversation person is absent from the candidate conversation persons corresponding to all of the most recent consecutively collected environment images, where the most recent consecutive environment images are a preset number of previously collected images that are consecutive with the environment image; if the current conversation person is absent from all of them, determining that the current conversation person is not in a conversation state; and if the current conversation person is included in the candidate conversation persons corresponding to any one of the most recent consecutively collected environment images, determining that the current conversation person is in a conversation state.
Optionally, selecting the target conversation person from the candidate conversation persons includes: extracting the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include the lip language, face size, and position parameters extracted from the environment image; calculating the conversation score of each candidate conversation person according to that person's conversation parameters; and taking the candidate conversation person with the highest conversation score as the target conversation person.
Optionally, the method further includes: extracting the face image of the current conversation person; identifying whether a user matching the face image exists in a preset information library; if so, extracting the background information corresponding to the user from the preset information library; and pushing the face image and the background information to the human agent assistance terminal.
Another technical solution adopted by the embodiments of the present application is to provide a robot conversation switching device, including: a collection module, used to collect an environment image located in front of the robot; a first determination module, used to determine candidate conversation persons from the environment image; a judgment module, used to determine whether the switching condition for switching the current conversation person is satisfied; a selection module, used to select a target conversation person from the candidate conversation persons when the switching condition is satisfied; and a second determination module, used to determine the selected target conversation person as the current conversation person.
Optionally, the judgment module includes: a first judgment unit, used to determine whether there is a current conversation person; a first determination unit, used to determine that the switching condition for switching the current conversation person is satisfied when there is no current conversation person; a second judgment unit, used to determine whether the current conversation person is in a conversation state when there is a current conversation person; a second determination unit, used to determine that the switching condition is not satisfied when the current conversation person is in a conversation state; and a third determination unit, used to determine that the switching condition is satisfied when the current conversation person is not in a conversation state.
Optionally, the second judgment unit being used to determine whether the current conversation person is in a conversation state when there is a current conversation person includes: determining whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, determining whether there is an end command, returned by the current conversation person, for ending the conversation; if the end command exists, determining that the current conversation person is not in a conversation state; if it does not exist, determining whether the current conversation person is absent from the candidate conversation persons corresponding to all of the most recent consecutively collected environment images, where the most recent consecutive environment images are a preset number of previously collected images that are consecutive with the environment image; if the current conversation person is absent from all of them, determining that the current conversation person is not in a conversation state; and if the current conversation person is included in the candidate conversation persons corresponding to any one of the most recent consecutively collected environment images, determining that the current conversation person is in a conversation state.
Optionally, the selection module includes: an extraction unit, used to extract the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include the lip language, face size, and position parameters extracted from the environment image; a calculation unit, used to calculate the conversation score of each candidate conversation person according to that person's conversation parameters; and a selection unit, used to take the candidate conversation person with the highest conversation score as the target conversation person.
Optionally, the device further includes: a first extraction module, used to extract the face image of the current conversation person; a recognition module, used to identify whether a user matching the face image exists in a preset information library; a second extraction module, used to extract the background information corresponding to the user from the preset information library when such a matching user exists; and a push module, used to push the face image and the background information to the human agent assistance terminal.
A further technical solution adopted by the embodiments of the present application is to provide a computing device, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the robot conversation switching method described above.
Yet another technical solution adopted by the embodiments of the present application is to provide a computer-readable storage medium in which at least one executable instruction is stored, and the executable instruction causes a processor to perform the operations corresponding to the robot conversation switching method described above.
The beneficial effects of the embodiments of the present application are as follows: unlike the prior art, the embodiments collect an environment image in front of the robot, determine whether to switch the current conversation person, and can select a target conversation person from the candidate conversation persons. The embodiments therefore enable the robot to actively switch the conversation person.
The above description is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features, and advantages of the present application are more apparent, specific embodiments of the present application are set out below.
Brief description of the drawings
By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of showing the preferred embodiments and are not to be considered a limitation of the present application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 is a flowchart of a robot conversation switching method according to an embodiment of the present application;
FIG. 2A is a flowchart of determining the switching condition for switching the current conversation person in a robot conversation switching method according to an embodiment of the present application;
FIG. 2B is a flowchart of determining whether the current conversation person is in a conversation state in an embodiment of the present application;
FIG. 2C is a flowchart of selecting a target conversation person from the candidate conversation persons in an embodiment of the present application;
FIG. 3 is a flowchart of another embodiment of a robot conversation switching method of the present application;
FIG. 4 is a functional block diagram of a robot conversation switching device of the present application;
FIG. 5 is a schematic diagram of a computing device of the present application.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
FIG. 1 shows a flowchart of an embodiment of a robot conversation switching method of the present application. As shown in FIG. 1, the method includes the following steps:
Step S1: Collect an environment image located in front of the robot.
In this step, a user talking with the robot usually stands in front of it, and the environment image in front of the robot is the image of the area in front of the robot. Therefore, when a user is talking with the robot, the user's face image can be captured.
Step S2: Determine candidate conversation persons from the environment image.
In this step, when multiple faces appear in the environment image in front of the robot, faces farther from the robot appear blurred or small, and users who are far away are usually not the ones talking with the robot; they may be passers-by or people standing nearby. Therefore, in this embodiment, before the candidate conversation persons are determined from the environment image, blurred faces and faces whose size is smaller than a threshold may be removed. The candidate conversation persons are the faces that remain in the environment image after that removal.
Step S3: Determine whether the switching condition for switching the current conversation person is satisfied. If it is satisfied, perform step S4; if not, perform step S5.
Step S4: Select a target conversation person from the candidate conversation persons.
Step S5: Continue the conversation with the current conversation person.
In this step, when the switching condition for switching the current conversation person is not satisfied, the current conversation person is still conversing with the robot, so the conversation with the current conversation person continues.
Step S6: Determine the selected target conversation person as the current conversation person.
In this step, when the switching condition for switching the current conversation person is satisfied, the current conversation person is switched to the selected target conversation person, and the robot actively starts a conversation with the target conversation person. Actively starting a conversation includes the robot obtaining the face image of the target conversation person, displaying it on the conversation screen, obtaining the name of the person corresponding to the face image, and actively speaking to the target conversation person. For example, if the target conversation person is named Zhang San, then after switching the current conversation person to Zhang San, the robot displays Zhang San's face image on the conversation screen and issues the voice prompt "Hello, Zhang San, how may I help you?", thereby starting the conversation. It can be understood that the face image displayed on the conversation screen may be an image pre-stored in the preset information library, or a face image of the target conversation person collected in real time by the robot's camera.
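As an illustration only, steps S2 to S6 can be sketched in Python as follows. The `Face` record, the threshold values, and the `should_switch` and `select_target` callables are hypothetical names introduced for this sketch and do not appear in the application; returning `None` when no candidate remains is likewise an assumption, since the text does not address that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Face:
    """One face detected in the environment image (all fields are illustrative)."""
    person_id: str
    size: float       # e.g. bounding-box height in pixels
    sharpness: float  # higher means less blurred


def determine_candidates(faces: List[Face],
                         min_size: float = 80.0,
                         min_sharpness: float = 0.5) -> List[Face]:
    """Step S2: discard blurred faces and faces smaller than the threshold."""
    return [f for f in faces
            if f.size >= min_size and f.sharpness >= min_sharpness]


def switch_step(faces: List[Face],
                current: Optional[Face],
                should_switch: Callable[[Optional[Face], List[Face]], bool],
                select_target: Callable[[List[Face]], Face]) -> Optional[Face]:
    """Run steps S2 to S6 on one image and return the current conversation person."""
    candidates = determine_candidates(faces)       # S2
    if not should_switch(current, candidates):     # S3
        return current                             # S5: keep the conversation going
    if not candidates:
        return None  # assumption: nobody suitable is in front of the robot
    return select_target(candidates)               # S4 and S6
```

The switching condition and the target selection are passed in as callables so that the sketch stays independent of how FIG. 2A and FIG. 2C are actually realized.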
FIG. 2A shows a flowchart of determining the switching condition for switching the current conversation person in an embodiment of the present application. As shown in FIG. 2A, determining whether the switching condition for switching the current conversation person is satisfied includes the following steps:
Step S31: Determine whether there is a current conversation person. If not, perform step S32; if so, perform step S33.
The current conversation person is the object, recorded by the robot, with which the robot is currently conversing. The robot stores a state indicating whether a current conversation person exists; for example, the state is recorded as 1 when there is a current conversation person and as 0 when there is none.
Step S32: Determine that the switching condition for switching the current conversation person is satisfied.
Step S33: Determine whether the current conversation person is in a conversation state. If so, perform step S34; if not, perform step S35.
Step S34: Determine that the switching condition for switching the current conversation person is not satisfied.
Step S35: Determine that the switching condition for switching the current conversation person is satisfied.
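The decision in FIG. 2A reduces to a short function. The following is a minimal sketch under stated assumptions: the function name, the use of `None` to represent "no current conversation person" (instead of the 1/0 state flag mentioned above), and the `in_conversation` callable are all hypothetical.

```python
from typing import Callable, Optional


def switching_condition_met(current: Optional[str],
                            in_conversation: Callable[[str], bool]) -> bool:
    """Return True when the current conversation person may be switched."""
    if current is None:
        # S31 -> S32: there is no current conversation person.
        return True
    # S33: switch only if the current person is no longer conversing
    # (S35); otherwise the condition is not satisfied (S34).
    return not in_conversation(current)
```

For example, `switching_condition_met(None, lambda p: True)` evaluates to `True`, because the conversation-state check is never reached when nobody is recorded as the current conversation person.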
FIG. 2B shows a flowchart of determining whether the current conversation person is in a conversation state in an embodiment of the present application. As shown in FIG. 2B, determining whether the current conversation person is in a conversation state includes the following steps:
Step S331: Determine whether the current conversation person is included in the candidate conversation persons. If so, perform step S332; if not, perform step S333.
In this step, the face image of the current conversation person is compared with the face images in the environment image; if the comparison succeeds, the current conversation person is considered to be included in the candidate conversation persons.
Step S332: Determine that the current conversation person is in a conversation state.
In this step, when the face image of the current conversation person is successfully matched against a face image in the environment image, the current conversation person is considered to be conversing with the robot.
Step S333: Determine whether there is an end command, returned by the current conversation person, for ending the conversation. If so, perform step S334; if not, perform step S335.
In this step, when the current conversation person ends the conversation with the robot, a conversation end command is returned to the robot. The conversation end command is a voice command issued by the current conversation person, such as "Goodbye" or "See you next time".
In some embodiments, the robot is provided with an end-conversation button. When the current conversation person has finished talking with the robot and wants to end the conversation, clicking the end-conversation button ends the current conversation.
Step S334: Determine that the current conversation person is not in a conversation state.
步骤S335:判断所述当前会话人是否均不包含在最近连续采集到的环境图像对应的候选会话人中,若是,执行步骤S336,若否,执行步骤S337。Step S335: determine whether none of the current conversational persons are included in the candidate conversational persons corresponding to the environmental images that have been continuously collected recently. If yes, perform step S336; if not, perform step S337.
在本步骤中,当所述当前会话人没有包含在所述候选会话人中,且机器人没有收到会话结束命令时,所述当前会话人有可能处于会话状态,例如,当前会话人低头或回头,造成没有采集到当前会话人人脸,为了减少机器人判断误差,判断当前采集到的环境图像的上N帧图像对应的候选会话人中是否包含所述当前会话人,所述N为预设的大于0的常数,如,N设置为5,则判断所述当前会话人是否均不包含在最近连续采集到的前5帧环境图像对应的候选会话人中。当连续N帧图像都没有采集到当前会话人的人脸,则可以认为发前会话人已经离开了,当前会话人没有处于会话状态。In this step, when the current conversation person is not included in the candidate conversation person, and the robot does not receive the session end command, the current conversation person may be in a conversation state, for example, the current conversation person looks down or looks back , Causing no face of the current conversation person to be collected, in order to reduce the robot's judgment error, determine whether the current conversation person is included in the candidate conversation person corresponding to the N frames of the currently collected environmental image, where N is a preset A constant greater than 0, for example, if N is set to 5, it is determined whether none of the current conversational persons is included in the candidate conversational persons corresponding to the last five consecutively collected environmental images. When the continuous N frames of images have not collected the face of the current conversation person, it can be considered that the conversation person has left before sending, and the current conversation person is not in the conversation state.
Step S336: Determine that the current conversation person is not in a conversation state.
Step S337: Determine that the current conversation person is in a conversation state.
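As a non-limiting illustration, the conversation-state determination ending in steps S334 to S337 can be sketched as follows. The function name `is_in_conversation`, the use of string identifiers for recognized faces, and the `deque` holding the candidates of the previous N frames are assumptions made for this sketch and are not part of the application.

```python
from collections import deque
from typing import Deque, Set


def is_in_conversation(current_id: str,
                       current_candidates: Set[str],
                       end_command_received: bool,
                       recent_candidates: Deque[Set[str]]) -> bool:
    """Return True if the current conversation person is in a conversation state.

    recent_candidates holds the candidate sets of the previous N frames
    (hypothetical representation; N is the preset constant, e.g. 5).
    """
    # The current conversation person is visible in the current frame.
    if current_id in current_candidates:
        return True
    # An end command was received (step S334): not in a conversation state.
    if end_command_received:
        return False
    # Step S335: absent from every one of the previous N frames -> S336 (False);
    # present in at least one of them -> S337 (True).
    # With an empty history this returns False (an assumption of this sketch).
    return any(current_id in frame for frame in recent_candidates)
```

A `deque(maxlen=N)` is a convenient way to keep only the previous N frames, since appending a new frame discards the oldest one automatically.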
In some embodiments, when the most recently collected environment image contains the current conversation person, the face of the current conversation person in that image replaces the face stored in the face information database, to facilitate the next face comparison.
It should be noted that the scene in front of the camera changes as people move: a person may turn their head or change expression. Given the frequency at which the camera collects environment images, and the fact that a movement or change of expression usually lasts for some time, the person's pose or expression in the next frame is very likely to remain close to that in the current frame. The face of the current conversation person in the most recently collected environment image is therefore the most similar to the face that will appear in the next frame. For this reason, the most recently collected face image of the current conversation person replaces the face image in the face information database, so that the robot can compare face images more quickly and conveniently.
FIG. 2C shows a flowchart of selecting a target conversation person from the candidate conversation persons in an embodiment of the present application. As shown in FIG. 2C, selecting the target conversation person from the candidate conversation persons includes the following steps:
Step S41: Extract the conversation parameters of each candidate conversation person from the environment image.
In this step, the conversation parameters include the lip language, face size, and position parameters extracted from the environment image. The lip language parameter indicates whether each candidate conversation person is speaking. In some embodiments, the lip language parameter of a candidate conversation person who is speaking is set to 1, and that of a candidate conversation person who is not speaking is set to 0.
The face size indicates the distance between each candidate conversation person and the robot. In some embodiments, the face size parameter is calculated by dividing the pixel area of the face in the environment image by the pixel area of the whole environment image; the resulting proportion of the image occupied by the face is used as the face size parameter.
The position parameter indicates the distance between each candidate conversation person and the robot center line. In some embodiments, to calculate the position parameter, it is first determined whether the candidate conversation person is to the left or to the right of the robot center line. If the candidate conversation person is to the left of the center line, the left edge of the environment image is taken as the starting point; if to the right, the right edge of the environment image is taken as the starting point. The distance from the starting point to the robot center line is the denominator, and the distance from the candidate conversation person to the starting point is the numerator; their ratio is the position parameter of the candidate conversation person in the environment image.
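The two geometric parameters described above can be sketched as follows, purely by way of example. The sketch assumes that faces are given as axis-aligned bounding boxes in pixels, that the candidate's position is the horizontal center of the face, and that the robot center line defaults to the vertical midline of the image; none of these choices is specified in the application, and the function names are hypothetical.

```python
from typing import Optional


def face_size_parameter(face_w: float, face_h: float,
                        img_w: float, img_h: float) -> float:
    """Proportion of the environment image occupied by the face."""
    return (face_w * face_h) / (img_w * img_h)


def position_parameter(face_center_x: float, img_w: float,
                       center_line_x: Optional[float] = None) -> float:
    """Ratio of (candidate to starting edge) over (starting edge to center line).

    The result is 1.0 on the center line and 0.0 at the image edge.
    """
    if center_line_x is None:
        center_line_x = img_w / 2.0  # assumption: center line = image midline
    if not 0.0 < center_line_x < img_w:
        raise ValueError("center line must lie strictly inside the image")
    if face_center_x <= center_line_x:
        # Candidate on the left: the left image edge (x = 0) is the starting point.
        return face_center_x / center_line_x
    # Candidate on the right: the right image edge (x = img_w) is the starting point.
    return (img_w - face_center_x) / (img_w - center_line_x)
```

Under this definition a larger position parameter means the candidate is closer to the robot center line, which is consistent with the parameter contributing positively to the conversation score.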
Step S42: Calculate the conversation score of each candidate conversation person according to that candidate's conversation parameters.
The conversation score is calculated from the conversation parameters as: conversation score = lip language weight*lip language parameter + face size weight*face size parameter + position weight*position parameter.
Different conversation parameters differ in how accurately they reflect whether a candidate conversation person is in a conversation state. Therefore, when calculating the conversation score, different weights may be preset for the conversation parameters, and the parameters are weighted accordingly to obtain the conversation score of each candidate conversation person. Among the conversation parameters, lip language best reflects whether a candidate conversation person is in a conversation state, so it is given the highest weight. For example, with the lip language weight set to 0.7 and the face size and position weights set to 0.2 and 0.1 respectively, a candidate conversation person who is speaking, with a face size parameter of 20% and a position parameter of 2/3, scores 0.7*1+0.2*20%+0.1*2/3≈0.8.
Step S43: Take the candidate conversation person with the highest conversation score as the target conversation person.
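Steps S42 and S43 can be illustrated with the following minimal sketch, which uses the example weights 0.7, 0.2, and 0.1 given above. The `Candidate` data class and the function names are hypothetical, and the behavior for an empty candidate list (returning `None`) is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    speaking: bool      # lip language: True -> parameter 1, False -> parameter 0
    face_size: float    # proportion of the image occupied by the face
    position: float     # position parameter relative to the center line


# Example weights from the description: lip language, face size, position.
WEIGHTS: Tuple[float, float, float] = (0.7, 0.2, 0.1)


def conversation_score(c: Candidate,
                       weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """Weighted sum of the three conversation parameters (step S42)."""
    w_lip, w_size, w_pos = weights
    lip = 1.0 if c.speaking else 0.0
    return w_lip * lip + w_size * c.face_size + w_pos * c.position


def select_target(candidates: List[Candidate]) -> Optional[Candidate]:
    """Candidate with the highest conversation score (step S43)."""
    if not candidates:
        return None  # assumption: no target when there are no candidates
    return max(candidates, key=conversation_score)
```

For the worked example above, `conversation_score(Candidate("A", True, 0.20, 2/3))` evaluates to 0.7 + 0.04 + 0.0667, or roughly 0.807, which rounds to the 0.8 stated in the description.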
In this embodiment of the present application, whether to switch the current conversation person is decided by determining whether the switching condition is satisfied, and a target conversation person is selected from the candidate conversation persons using the conversation parameters, so that when the switching condition is satisfied the current conversation person is switched to the target conversation person. The robot thus actively switches the current conversation person.
FIG. 3 shows a flowchart of another embodiment of a robot conversation switching method of the present application. Compared with the previous embodiment, this embodiment further includes the following steps:
Step S7: Extract the face image of the current conversation person.
Step S8: Identify whether a user matching the face image exists in a preset information library. If so, perform step S9; if not, perform step S11.
In this step, the face image of the current conversation person is matched against the face images in the preset information library. The preset information library stores in advance the faces of a large number of users of the robot together with their background information, with each user face corresponding one-to-one to its background information.
Step S9: Extract the background information corresponding to the user from the preset information library.
Background information refers to the user's personal information, such as name, occupation, and position.
Step S10: Push the face image and the background information to the human agent assistance terminal.
The human agent assistance terminal is a terminal device through which staff assist the robot. After receiving the face image and background information sent by the robot, the terminal can display them so that staff can learn about the current conversation person; when the robot cannot answer the current conversation person's question, staff can then accurately assist the robot in answering.
Step S11: Push the face image to the human agent assistance terminal.
In this step, when the robot cannot answer the current conversation person's question, staff can assist the robot in answering.
In this embodiment of the present application, human-assisted conversation is realized through the human agent assistance terminal: when the robot cannot resolve the current conversation person's question, staff assist in resolving it, which improves the robot's working efficiency.
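The flow of steps S7 to S11 can be sketched as follows, as an illustration only. Face matching and the transport to the terminal are passed in as callables (`match_user`, `push`) because the application does not specify them; these names, the dictionary-based information library, and the payload layout are all assumptions of the sketch.

```python
from typing import Callable, Dict, Optional


def push_to_agent_terminal(face_image: bytes,
                           match_user: Callable[[bytes], Optional[str]],
                           background_db: Dict[str, dict],
                           push: Callable[[dict], None]) -> dict:
    """Build and push the payload for the human agent assistance terminal."""
    # Step S8: look for a user in the preset information library whose face matches.
    user_id = match_user(face_image)
    payload = {"face_image": face_image}
    if user_id is not None:
        # Step S9: fetch the matched user's background information.
        payload["background"] = background_db[user_id]
    # Step S10 (face image and background) or step S11 (face image only).
    push(payload)
    return payload
```

Keeping the matcher and the push transport as parameters lets the same flow be exercised with stubs before a real face-recognition component or network channel is available.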
FIG. 4 shows a functional block diagram of a robot conversation switching apparatus of the present application. As shown in FIG. 4, the apparatus includes a collection module 401, a first determination module 402, a judgment module 403, a selection module 404, and a second determination module 405. The collection module 401 is configured to collect an environment image of the area in front of the robot; the first determination module 402 is configured to determine candidate conversation persons from the environment image; the judgment module 403 is configured to judge whether a switching condition for switching the current conversation person is satisfied; the selection module 404 is configured to select a target conversation person from the candidate conversation persons when the switching condition is satisfied; and the second determination module 405 is configured to determine the selected target conversation person as the current conversation person.
The judgment module 403 includes a first judgment unit 4031, a first determination unit 4032, a second judgment unit 4033, a second determination unit 4034, and a third determination unit 4035. The first judgment unit 4031 is configured to judge whether a current conversation person exists; the first determination unit 4032 is configured to determine, when no current conversation person exists, that the switching condition is satisfied; the second judgment unit 4033 is configured to judge, when a current conversation person exists, whether the current conversation person is in a conversation state; the second determination unit 4034 is configured to determine, when the current conversation person is in a conversation state, that the switching condition is not satisfied; and the third determination unit 4035 is configured to determine, when the current conversation person is not in a conversation state, that the switching condition is satisfied.
The second judgment unit 4033 being configured to judge, when a current conversation person exists, whether the current conversation person is in a conversation state includes: judging whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, judging whether an end command for ending the conversation has been received from the current conversation person; if such a command exists, determining that the current conversation person is not in a conversation state; if no such command exists, judging whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the most recent consecutively collected environment images, where the most recent consecutively collected environment images are a preset number of previously collected images that are consecutive with the environment image; if the current conversation person is absent from the candidate conversation persons of all of those images, determining that the current conversation person is not in a conversation state; and if the current conversation person is included in the candidate conversation persons of any one of those images, determining that the current conversation person is in a conversation state.
The selection module 404 includes an extraction unit 4041, a calculation unit 4042, and a selection unit 4043. The extraction unit 4041 is configured to extract the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include the lip language, face size, and position parameters extracted from the environment image; the calculation unit 4042 is configured to calculate the conversation score of each candidate conversation person according to that candidate's conversation parameters; and the selection unit 4043 is configured to take the candidate conversation person with the highest conversation score as the target conversation person.
In this embodiment of the present application, the apparatus further includes a first extraction module 406, an identification module 407, a second extraction module 408, and a push module 409. The first extraction module 406 is configured to extract the face image of the current conversation person; the identification module 407 is configured to identify whether a user matching the face image exists in the preset information library; the second extraction module 408 is configured to extract, when such a user exists in the preset information library, the background information corresponding to the user from the preset information library; and the push module 409 is configured to push the face image and the background information to the human agent assistance terminal.
In this embodiment of the present application, the judgment module judges whether the switching condition for switching the current conversation person is satisfied, and the selection module selects a target conversation person from the candidate conversation persons, so that when the switching condition is satisfied the current conversation person is switched to the target conversation person. In addition, the push module pushes the face image of the current conversation person and the corresponding background information from the preset information library to the human agent assistance terminal, realizing human-assisted conversation. With this embodiment, the robot can actively switch the current conversation person and resolve conversations with human assistance, which improves the robot's working efficiency.
An embodiment of the present application provides a non-volatile computer-readable storage medium storing at least one executable instruction, which can perform the robot conversation switching method of any of the foregoing method embodiments.
FIG. 5 is a schematic structural diagram of an embodiment of a computing device of the present application. The specific embodiments of the present application do not limit the specific implementation of the computing device.
As shown in FIG. 5, the computing device may include a processor 502, a communications interface 504, a memory 506, and a communication bus 508.
Wherein:
The processor 502, the communications interface 504, and the memory 506 communicate with one another through the communication bus 508.
The communications interface 504 is used to communicate with other devices.
The processor 502 is used to execute a program 510, and may specifically perform the relevant steps of the foregoing embodiments of the robot conversation switching method.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the program 510. The memory 506 may include high-speed RAM, and may also include non-volatile memory, for example at least one magnetic disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations: collecting an environment image of the area in front of the robot; determining candidate conversation persons from the environment image; judging whether a switching condition for switching the current conversation person is satisfied; if so, selecting a target conversation person from the candidate conversation persons; and determining the selected target conversation person as the current conversation person.
In an optional manner, the program 510 may further be used to cause the processor 502 to perform the following operations: judging whether a current conversation person exists; if not, determining that the switching condition for switching the current conversation person is satisfied; if so, judging whether the current conversation person is in a conversation state; if the current conversation person is in a conversation state, determining that the switching condition is not satisfied; and if the current conversation person is not in a conversation state, determining that the switching condition is satisfied.
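One per-frame pass of the operations described for the program 510, including the switching condition, might look like the following sketch. The conversation-state check and the target selection are passed in as callables so that the example stays self-contained; the function name `update_current` and the choice to leave no current conversation person when no candidate is available are assumptions, since the application does not address that case.

```python
from typing import Callable, Optional, Sequence, TypeVar

P = TypeVar("P")


def update_current(current: Optional[P],
                   candidates: Sequence[P],
                   in_conversation: Callable[[P], bool],
                   select_target: Callable[[Sequence[P]], Optional[P]]) -> Optional[P]:
    """Return the current conversation person after processing one frame."""
    # Switching condition: no current conversation person exists, or the
    # current conversation person is not in a conversation state.
    if current is not None and in_conversation(current):
        return current  # condition not satisfied: keep the current person
    # Condition satisfied: select the target from the candidates and make it
    # the current conversation person (None if nothing was selected).
    return select_target(candidates)
```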
In an optional manner, the program 510 may further be used to cause the processor 502 to perform the following operations: judging whether the current conversation person is included in the candidate conversation persons; if so, determining that the current conversation person is in a conversation state; if not, judging whether an end command for ending the conversation has been received from the current conversation person; if such a command exists, determining that the current conversation person is not in a conversation state; if no such command exists, judging whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the most recent consecutively collected environment images, where the most recent consecutively collected environment images are a preset number of previously collected images that are consecutive with the environment image; if the current conversation person is absent from the candidate conversation persons of all of those images, determining that the current conversation person is not in a conversation state; and if the current conversation person is included in the candidate conversation persons of any one of those images, determining that the current conversation person is in a conversation state.
In an optional manner, the program 510 may further be used to cause the processor 502 to perform the following operations: extracting the conversation parameters of each candidate conversation person from the environment image, where the conversation parameters include the lip language, face size, and position parameters extracted from the environment image; calculating the conversation score of each candidate conversation person according to that candidate's conversation parameters; and taking the candidate conversation person with the highest conversation score as the target conversation person.
In an optional manner, the program 510 may further be used to cause the processor 502 to perform the following operations: extracting the face image of the current conversation person; identifying whether a user matching the face image exists in the preset information library; if so, extracting the background information corresponding to the user from the preset information library; and pushing the face image and the background information to the human agent assistance terminal.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Furthermore, the present application is not directed to any particular programming language. It should be understood that the content of the present application described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the present application.
Numerous specific details are set forth in the description provided herein. It is understood, however, that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present application the features of the present application are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present application.
Those skilled in the art will understand that the modules of the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present application and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of a robot conversation switching apparatus according to an embodiment of the present application. The present application may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present application, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present application may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (12)

  1. A robot conversation switching method, characterized by comprising:
    collecting an environment image of the area in front of the robot;
    determining candidate conversation persons from the environment image;
    judging whether a switching condition for switching the current conversation person is satisfied;
    if so, selecting a target conversation person from the candidate conversation persons; and
    determining the selected target conversation person as the current conversation person.
  2. The method according to claim 1, characterized in that
    judging whether the switching condition for switching the current conversation person is satisfied comprises:
    judging whether a current conversation person exists;
    if not, determining that the switching condition for switching the current conversation person is satisfied;
    if so, judging whether the current conversation person is in a conversation state;
    if the current conversation person is in the conversation state, determining that the switching condition for switching the current conversation person is not satisfied;
    if the current conversation person is not in the conversation state, determining that the switching condition for switching the current conversation person is satisfied.
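The decision in claim 2 can be sketched as a small Python function. This is only an illustrative reading of the claim, not the applicant's implementation: the function name, the use of a string identifier for a person, and the `in_conversation` callback are assumptions introduced here.

```python
from typing import Callable, Optional


def switching_condition_met(
    current: Optional[str],
    in_conversation: Callable[[str], bool],
) -> bool:
    """Return True when the robot may switch to a new conversation person.

    `current` is a hypothetical identifier of the current conversation
    person, or None when there is none. `in_conversation` is a hypothetical
    callback that reports whether that person is still in a conversation
    state.
    """
    if current is None:
        # No current conversation person exists: the condition is satisfied.
        return True
    # A current person exists: switch only if that person has left the
    # conversation state.
    return not in_conversation(current)
```

Passing the conversation-state check as a callback keeps this decision separate from how the state is actually determined.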
  3. The method according to claim 2, characterized in that judging whether the current conversation person is in a conversation state comprises:
    judging whether the current conversation person is included in the candidate conversation persons;
    if included, determining that the current conversation person is in the conversation state;
    if not included, judging whether the current conversation person has returned an end command for ending the conversation;
    if such an end command exists, determining that the current conversation person is not in the conversation state;
    if no such end command exists, judging whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the most recent consecutively collected environment images, wherein the most recent consecutive environment images are a preset number of previously collected images that are consecutive with the environment image;
    if the current conversation person is absent from the candidate conversation persons corresponding to all of the most recent consecutively collected environment images, determining that the current conversation person is not in the conversation state;
    if the current conversation person is included in the candidate conversation persons corresponding to any one of the most recent consecutively collected environment images, determining that the current conversation person is in the conversation state.
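The conversation-state check in claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions: persons are represented by string identifiers, each frame's candidates are a set of such identifiers, and `recent_candidates` holds the candidate sets of the preset number of preceding consecutive frames. None of these names come from the application.

```python
from typing import Sequence, Set


def is_in_conversation(
    current: str,
    candidates: Set[str],
    end_command_received: bool,
    recent_candidates: Sequence[Set[str]],
) -> bool:
    """Decide whether the current conversation person is still in conversation.

    All parameter names are hypothetical. `recent_candidates` is the list of
    candidate sets for the preset number of most recent consecutive frames.
    """
    if current in candidates:
        # Seen in the newest frame: still in the conversation state.
        return True
    if end_command_received:
        # Not seen, and the person explicitly ended the conversation.
        return False
    # Not seen and no end command: the person stays in the conversation
    # state only if they appear in at least one of the recent frames.
    return any(current in frame for frame in recent_candidates)
```

Checking a window of recent frames means that a single missed detection does not immediately end the conversation; only absence from every frame in the window does.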
  4. The method according to claim 1, characterized in that
    selecting the target conversation person from the candidate conversation persons comprises:
    extracting conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters comprise lip-language, face-size, and position parameters extracted from the environment image;
    calculating a conversation score for each candidate conversation person according to that candidate's conversation parameters;
    taking the candidate conversation person with the highest conversation score as the target conversation person.
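The score-based selection in claim 4 can be sketched as below. The claim does not specify how the score is computed, so the weighted sum, the weights, the field names, and the reduction of the lip-language parameter to a boolean are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConversationParams:
    """Hypothetical per-candidate parameters extracted from one image."""
    person_id: str
    lips_moving: bool      # simplified stand-in for the lip-language parameter
    face_area: float       # fraction of the frame covered by the face, 0..1
    center_offset: float   # normalized distance from the image center, 0..1


def conversation_score(p: ConversationParams,
                       w_lip: float = 0.5,
                       w_size: float = 0.3,
                       w_pos: float = 0.2) -> float:
    """Assumed scoring rule: a weighted sum favoring candidates who are
    speaking, close to the robot (large face), and near the image center."""
    lip_term = 1.0 if p.lips_moving else 0.0
    return w_lip * lip_term + w_size * p.face_area + w_pos * (1.0 - p.center_offset)


def select_target(candidates: Sequence[ConversationParams]) -> Optional[ConversationParams]:
    """Return the candidate with the highest score, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=conversation_score)
```

Any monotone combination of the three parameters would fit the wording of the claim; the weighted sum is chosen here only because it is easy to read and tune.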
  5. The method according to any one of claims 1-4, characterized in that the method further comprises:
    extracting a face image of the current conversation person;
    identifying whether a user matching the face image exists in a preset information library;
    if so, extracting background information corresponding to the user from the preset information library;
    pushing the face image and the background information to a human agent assistance terminal.
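The lookup-and-push flow of claim 5 can be sketched as follows. The face matcher, the dictionary used as the information library, and the push callback are hypothetical placeholders; the application does not describe their interfaces in this section.

```python
from typing import Callable, Dict, Optional


def push_background(
    face_image: bytes,
    match_user: Callable[[bytes], Optional[str]],
    info_library: Dict[str, dict],
    push: Callable[[bytes, dict], None],
) -> bool:
    """Push the face image and background information if a user matches.

    `match_user` is a hypothetical face matcher returning a user id or None.
    `info_library` is a hypothetical in-memory stand-in for the preset
    information library. `push` is a hypothetical sender to the human agent
    assistance terminal. Returns True if something was pushed.
    """
    user_id = match_user(face_image)
    if user_id is None or user_id not in info_library:
        # No matching user in the library: nothing is pushed.
        return False
    push(face_image, info_library[user_id])
    return True
```

For example, calling it with `match_user=lambda img: "u1"` and a library containing `"u1"` invokes `push` once with the image and that user's record.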
  6. A robot conversation switching apparatus, characterized by comprising:
    a collection module, configured to collect an environment image of the area in front of the robot;
    a first determination module, configured to determine candidate conversation persons from the environment image;
    a judgment module, configured to judge whether a switching condition for switching the current conversation person is satisfied;
    a selection module, configured to select a target conversation person from the candidate conversation persons when the switching condition for switching the current conversation person is satisfied;
    a second determination module, configured to determine the selected target conversation person as the current conversation person.
  7. The apparatus according to claim 6, characterized in that the judgment module comprises:
    a first judgment unit, configured to judge whether a current conversation person exists;
    a first determination unit, configured to determine, when no current conversation person exists, that the switching condition for switching the current conversation person is satisfied;
    a second judgment unit, configured to judge, when a current conversation person exists, whether the current conversation person is in a conversation state;
    a second determination unit, configured to determine, when the current conversation person is in the conversation state, that the switching condition for switching the current conversation person is not satisfied;
    a third determination unit, configured to determine, when the current conversation person is not in the conversation state, that the switching condition for switching the current conversation person is satisfied.
  8. The apparatus according to claim 7, characterized in that the second judgment unit being configured to judge, when a current conversation person exists, whether the current conversation person is in a conversation state comprises:
    judging whether the current conversation person is included in the candidate conversation persons;
    if included, determining that the current conversation person is in the conversation state;
    if not included, judging whether the current conversation person has returned an end command for ending the conversation;
    if such an end command exists, determining that the current conversation person is not in the conversation state;
    if no such end command exists, judging whether the current conversation person is absent from the candidate conversation persons corresponding to every one of the most recent consecutively collected environment images, wherein the most recent consecutive environment images are a preset number of previously collected images that are consecutive with the environment image;
    if the current conversation person is absent from the candidate conversation persons corresponding to all of the most recent consecutively collected environment images, determining that the current conversation person is not in the conversation state;
    if the current conversation person is included in the candidate conversation persons corresponding to any one of the most recent consecutively collected environment images, determining that the current conversation person is in the conversation state.
  9. The apparatus according to claim 6, characterized in that the selection module comprises:
    an extraction unit, configured to extract conversation parameters of each candidate conversation person from the environment image, wherein the conversation parameters comprise lip-language, face-size, and position parameters extracted from the environment image;
    a calculation unit, configured to calculate a conversation score for each candidate conversation person according to that candidate's conversation parameters;
    a selection unit, configured to take the candidate conversation person with the highest conversation score as the target conversation person.
  10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a first extraction module, configured to extract a face image of the current conversation person;
    a recognition module, configured to identify whether a user matching the face image exists in a preset information library;
    a second extraction module, configured to extract, when a user matching the face image exists in the preset information library, background information corresponding to the user from the preset information library;
    a push module, configured to push the face image and the background information to a human agent assistance terminal.
  11. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the robot conversation switching method according to any one of claims 1-5.
  12. A computer-readable storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the robot conversation switching method according to any one of claims 1-5.
PCT/CN2019/116087 2018-12-20 2019-11-06 Robot conversation switching method and apparatus, and computing device WO2020125252A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811562114.X 2018-12-20
CN201811562114.XA CN109648573B (en) 2018-12-20 2018-12-20 Robot session switching method and device and computing equipment

Publications (1)

Publication Number Publication Date
WO2020125252A1 WO2020125252A1 (en) 2020-06-25

Family

ID=66115305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116087 WO2020125252A1 (en) 2018-12-20 2019-11-06 Robot conversation switching method and apparatus, and computing device

Country Status (2)

Country Link
CN (1) CN109648573B (en)
WO (1) WO2020125252A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109648573B (en) * 2018-12-20 2020-11-10 达闼科技(北京)有限公司 Robot session switching method and device and computing equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011221101A (en) * 2010-04-05 2011-11-04 Ai:Kk Communication device
CN107160409A (en) * 2017-06-22 2017-09-15 星际(重庆)智能装备技术研究院有限公司 A kind of Intelligent greeting robot based on recognition of face and Voice command
CN108172225A (en) * 2017-12-27 2018-06-15 浪潮金融信息技术有限公司 Voice interactive method and robot, computer readable storage medium, terminal
CN108818531A (en) * 2018-06-25 2018-11-16 珠海格力智能装备有限公司 The control method and device of robot
CN109648573A (en) * 2018-12-20 2019-04-19 达闼科技(北京)有限公司 A kind of robot conversation switching method, device and calculate equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60151715A (en) * 1984-01-18 1985-08-09 Seiko Epson Corp Coordinate conversion method of robot
CN101187990A (en) * 2007-12-14 2008-05-28 华南理工大学 A session robotic system
CN107679504A (en) * 2017-10-13 2018-02-09 北京奇虎科技有限公司 Face identification method, device, equipment and storage medium based on camera scene

Also Published As

Publication number Publication date
CN109648573B (en) 2020-11-10
CN109648573A (en) 2019-04-19

Similar Documents

Publication Publication Date Title
US20190279642A1 (en) System and method for speech understanding via integrated audio and visual based speech recognition
Minotto et al. Multimodal multi-channel on-line speaker diarization using sensor fusion through SVM
JP6732703B2 (en) Emotion interaction model learning device, emotion recognition device, emotion interaction model learning method, emotion recognition method, and program
US11900959B2 (en) Speech emotion recognition method and apparatus
JP5196199B2 (en) Keyword display system, keyword display method, and program
US20210174702A1 (en) Communication skill evaluation system, device, method, and program
KR20170098675A (en) Robot control system
US11443554B2 (en) Determining and presenting user emotion
KR102222911B1 (en) System for Providing User-Robot Interaction and Computer Program Therefore
JP2019217558A (en) Interactive system and control method for the same
Kanvinde et al. Bidirectional sign language translation
CN112528004A (en) Voice interaction method, voice interaction device, electronic equipment, medium and computer program product
CN112632349A (en) Exhibition area indicating method and device, electronic equipment and storage medium
CN112286364A (en) Man-machine interaction method and device
WO2020125252A1 (en) Robot conversation switching method and apparatus, and computing device
CN113822187A (en) Sign language translation, customer service, communication method, device and readable medium
JP7370050B2 (en) Lip reading device and method
JP2010191530A (en) Nationality decision device and method, and program
CN112711331A (en) Robot interaction method and device, storage equipment and electronic equipment
Miyake et al. A spoken dialogue system using virtual conversational agent with augmented reality
US20210166685A1 (en) Speech processing apparatus and speech processing method
WO2021114682A1 (en) Session task generation method and apparatus, computer device, and storage medium
JP2017182261A (en) Information processing apparatus, information processing method, and program
JP2020067562A (en) Device, program and method for determining action taking timing based on video of user's face
CN113379879A (en) Interaction method, device, equipment, storage medium and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19900195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19900195

Country of ref document: EP

Kind code of ref document: A1