CN109648573A

CN109648573A - A kind of robot conversation switching method, device and calculate equipment

Info

Publication number: CN109648573A
Application number: CN201811562114.XA
Authority: CN
Inventors: 徐文浩; 马世奎; 孙文豹
Original assignee: As Science And Technology (beijing) Co Ltd
Current assignee: As Science And Technology (beijing) Co Ltd; Cloudminds Beijing Technologies Co Ltd
Priority date: 2018-12-20
Filing date: 2018-12-20
Publication date: 2019-04-19
Anticipated expiration: 2038-12-20
Also published as: CN109648573B; WO2020125252A1

Abstract

The present invention relates to intelligent robot technology fields, in particular disclose a kind of method, apparatus of machine conference, calculate equipment and computer storage medium, wherein method includes: the ambient image that acquisition is located in front of the robot；Candidate session people is determined from the ambient image；Judge whether the switching condition for meeting switching current sessions people；If satisfied, the then selection target session people from the candidate session people；Selected target session people is determined as current sessions people.It can be seen that robot, which may be implemented, using the present invention program actively switches session people.

Description

A kind of robot conversation switching method, device and calculate equipment

Technical field

The present embodiments relate to intelligent robot technology field, more particularly to a kind of robot conversation switching method, Device and calculating equipment.

Background technique

With the development of internet and depth learning technology, robot technology is had made great progress, by original independence Machine individual human, cloud robot till now.Cloud robot refers to that robot body is only responsible for the acquisition of data, number Data preprocess and data transmission, complicated calculating and judge work by data be transmitted to cloud processor execution.

Machine conference refers to the audio-frequency information of acquisition user, and analyzes the meaning of one's words of the audio-frequency information of user, and root Information of answering is returned to user according to the meaning of one's words, realizes that user and robot talk with.

In the implementation of the present invention, discovery: current robot conversational mode cannot actively be cut the present inventor Session people is changed, meeting words are actively opened.

Summary of the invention

In view of the above problems, it proposes on the present invention overcomes the above problem or at least be partially solved in order to provide one kind It states a kind of method, apparatus of machine conference switching of problem and calculates equipment.

In order to solve the above technical problems, a technical solution used in the embodiment of the present invention is: providing a kind of robot meeting The method for talking about switching, comprising: acquisition is located at the ambient image in front of the robot；Candidate's meeting is determined from the ambient image Talk about people；Judge whether the switching condition for meeting switching current sessions people；If satisfied, the then selection target from the candidate session people Session people；Selected target session people is determined as current sessions people.

Optionally, described to judge whether that the switching condition for meeting switching current sessions people includes: to judge whether there is currently Session people；If not, it is determined that meet the switching condition of switching current sessions people；If so, whether judging the current sessions people In session status；If being in session status, it is determined that be unsatisfactory for the switching condition of switching current sessions people；If being not at session State, it is determined that meet the switching condition of switching current sessions people.

It is optionally, described that judge whether the current sessions people is in session status include: to judge the current sessions people Whether it is included in the candidate session people；If comprising, it is determined that the current sessions people is in session status；If not including, Then judge whether there is the end order for the end session that the current sessions people returns；If it exists, it is determined that the current meeting Words people is not in speech phase；If it does not exist, judge whether the current sessions people is not included in what nearest continuous acquisition arrived In the corresponding candidate session people of ambient image, wherein the ambient image continuous recently be it is previously collected and with it is described There are the images of the preset quantity of serial relation for ambient image；If the current sessions are not included in nearest continuous acquisition per capita and arrive The corresponding candidate session people of ambient image in, it is determined that the current sessions people is not in speech phase；If the current meeting Talk about people be included in any one nearest continuous acquisition to the corresponding candidate session people of ambient image in, determine the current sessions People is in session status.

Optionally, the selection target session people from the candidate session people includes: to extract from the ambient image The session parameter of each candidate session people, wherein the session parameter includes the lip reading extracted from the ambient image, face Size and location parameter；According to the session parameter of each candidate session people, the session of each candidate session people is calculated separately Score；Using the candidate session people of the session highest scoring as target session people.

Optionally, the method also includes: extract the facial image of the current sessions people；Identification is in presupposed information library The user to match with the presence or absence of the facial image；If it exists, then it is corresponding that the user is extracted from the presupposed information library Background information；The facial image and background information are pushed into manual position auxiliary terminal.

It is used in the embodiment of the present invention another solution is that providing a kind of machine conference switching device, comprising: acquisition Module: for acquiring the ambient image being located in front of the robot；First determining module: for true from the ambient image Fixed candidate session people；Judgment module: the switching condition for judging whether to meet switching current sessions people；Selecting module: it is used for When meeting the switching condition of switching current sessions people, the selection target session people from the candidate session people；Second determining module: For selected target session people to be determined as current sessions people.

Optionally, the judgment module includes: the first judging unit: for judging whether there is current sessions people；First Determination unit: for determining the switching condition for meeting switching current sessions people when current sessions people is not present；Second judgement is single Member: for when there are current sessions people, judging whether the current sessions people is in session status；Second determination unit: it uses In when the current sessions people is in session status, the switching condition for being unsatisfactory for switching current sessions people is determined；Third determines Unit: for determining the switching condition for meeting switching current sessions people when the current sessions people is not in speech phase.

Optionally, the second judgment unit is used for when there are current sessions people, whether judges the current sessions people In session status, comprising: judge whether the current sessions people is included in the candidate session people；If comprising, it is determined that The current sessions people is in session status；If not including, the end meeting that the current sessions people returns is judged whether there is The end order of words；If it exists, it is determined that the current sessions people is not in speech phase.If it does not exist, judge described current Session people whether be not included in nearest continuous acquisition to the corresponding candidate session people of ambient image in, wherein it is described recently Continuous ambient image is previously collected and there are the images of the preset quantity of serial relation with the ambient image；If institute State current sessions be not included in per capita nearest continuous acquisition to the corresponding candidate session people of ambient image in, it is determined that it is described to work as Preceding session people is not in speech phase；If the current sessions people is included in the ambient image that any one nearest continuous acquisition arrives In corresponding candidate session people, determine that the current sessions people is in session status.

Optionally, the selecting module includes: extraction unit: for extracting each candidate session people from the ambient image Session parameter, wherein the session parameter include the lip reading extracted from the ambient image, facial size and position ginseng Number；Computing unit: for the session parameter according to each candidate session people, the session of each candidate session people is calculated separately Score；Selecting unit: for using the candidate session people of the session highest scoring as target session people.

Optionally, described device further include: the first extraction module: for extracting the facial image of the current sessions people； Identification module: the user to match in presupposed information library with the presence or absence of the facial image for identification；Second extraction module: Described in being extracted from the presupposed information library as the user to match in presupposed information library there are the facial image The corresponding background information of user；Pushing module: for the facial image and background information to be pushed to manual position auxiliary eventually End.

Yet another aspect used in the embodiment of the present invention is: providing a kind of calculating equipment, comprising: processor, memory, Communication interface and communication bus, the processor, the memory and the communication interface complete phase by the communication bus Communication between mutually；The memory executes the processor for storing an at least executable instruction, the executable instruction A kind of method corresponding operation of the machine conference switching.

Another technical solution used in the embodiment of the present invention is: providing a kind of computer storage medium, the storage medium In be stored with an at least executable instruction, the executable instruction makes processor execute a kind of machine conference switching The corresponding operation of method.

The beneficial effect of the embodiment of the present invention is: being in contrast to the prior art, the embodiment of the present invention passes through harvester Ambient image in front of device people, it is determined whether switching current sessions people, and can from candidate session people selection target session people； It can be seen that robot may be implemented and actively switch session people using the embodiment of the present invention.

The above description is only an overview of the technical scheme of the present invention, in order to better understand the technical means of the present invention, And it can be implemented in accordance with the contents of the specification, and in order to allow above and other objects of the present invention, feature and advantage can It is clearer and more comprehensible, it is special below to lift specific embodiments of the present invention.

Detailed description of the invention

By reading hereafter detailed description of preferred embodiment, various other advantages and benefits skill common for this field Art personnel will become clear.Attached drawing is only used for showing the purpose of preferred embodiment, and is not considered as to limit of the invention System.And throughout the drawings, the same reference numbers will be used to refer to the same parts.In the accompanying drawings:

Fig. 1 is a kind of method flow diagram of machine conference switching of the embodiment of the present invention；

Fig. 2A is the switching item of switching current sessions people in a kind of method of machine conference switching of the embodiment of the present invention Part decision flow chart；

Fig. 2 B is to judge whether current sessions people is in the flow chart of session status in the embodiment of the present invention；

Fig. 2 C be in the embodiment of the present invention from the candidate session people selection target session people's flow chart；

Fig. 3 is a kind of flow chart of another embodiment of method of machine conference switching of the present invention；

Fig. 4 is a kind of functional block diagram of machine conference switching device of the present invention；

Fig. 5 is a kind of schematic diagram for calculating equipment of the present invention.

Specific embodiment

Exemplary embodiments of the present disclosure are described in more detail below with reference to accompanying drawings.Although showing the disclosure in attached drawing Exemplary embodiment, it being understood, however, that may be realized in various forms the disclosure without should be by embodiments set forth here It is limited.On the contrary, these embodiments are provided to facilitate a more thoroughly understanding of the present invention, and can be by the scope of the present disclosure It is fully disclosed to those skilled in the art.

Fig. 1 shows a kind of flow chart of the embodiment of the method for machine conference switching of the present invention.As shown in Figure 1, the party Method the following steps are included:

Step S1: acquisition is located at the ambient image in front of the robot.

In this step, when user and robot engage in the dialogue, it will usually it stands in the front of robot, and the robot The ambient image in front refers to the image in front of robot, therefore, when having user and robot talks with, then can collect this The facial image of user.

Step S2: candidate session people is determined from the ambient image.

In this step, when there are multiple faces in the ambient image in front of the robot, apart from the robot The farther away face in position can obscure or facial size is smaller, and position user farther out is not usually to carry out pair with robot The user of words, may be by passerby, alternatively, the nigh masses etc. that stand, therefore, in the present embodiment, from Before determining candidate session people in the ambient image, ambiguous face and facial size can also be less than to the people of threshold value Face reject, it is described candidate session people be rejected in the ambient image smudgy or facial size less than threshold value face it Remaining face afterwards.

Step S3: judging whether the switching condition for meeting switching current sessions people, if satisfied, step S4 is executed, if discontented Foot executes step S5.

Step S4: the selection target session people from the candidate session people.

Step S5: continue the session with the current sessions people.

In this step, when being unsatisfactory for the switching condition of switching current sessions people, then there are the current sessions for explanation People with machine conference, continues the session with the current sessions people.

Step S6: selected target session people is determined as current sessions people.

In this step, when meeting the switching condition of switching current sessions people, the current sessions people is switched to institute The target session people of selection, and robot is actively and the target session people opens session, it is described actively to open session and include Robot obtains the facial image of the target session people, is shown on session screen, and it is corresponding to obtain the facial image Characters name actively goes out voice to the target session human hair, and e.g., target session people's name is Zhang San, and robot is will be current After session people is switched to Zhang San, the facial image of Zhang San is shown on session screen, and issues that " Zhang San, you are good, what may I ask Can help you " voice prompting, to open session.It is understood that the facial image shown on the session screen It can be the image being stored in advance in presupposed information library, be also possible to the camera of robot collected target session in real time The facial image of people.

Fig. 2A shows the flow chart for switching the switching condition judgement of current sessions people in the embodiment of the present invention, such as Fig. 2A institute Show, it is described judge whether meet switching current sessions people switching condition the following steps are included:

Step S31: judging whether there is current sessions people, if it is not, step S32 is executed, if so, executing step S33.

Current sessions people is the object of robot records currently talked with, and current sessions people is stored in robot Existing state, e.g., when there are current sessions people, the state recording that current sessions people will be present is 1, when there is no current meetings When talking about people, the state recording that current sessions people will be present is 0.

Step S32: the switching condition for meeting switching current sessions people is determined.

Step S33: judging whether the current sessions people is in session status, if so, step S34 is executed, if it is not, executing Step S35.

Step S34: the switching condition for being unsatisfactory for switching current sessions people is determined.

Step S35: the switching condition for meeting switching current sessions people is determined.

Fig. 2 B shows the flow chart for judging whether current sessions people is in session status in the embodiment of the present invention, such as Fig. 2 B It is shown, it is described judge the current sessions people whether be in session status the following steps are included:

Step S331: judging whether the current sessions people is included in the candidate session people, if so, executing step S332, if it is not, executing step S333.

In this step, the facial image in the facial image of the current sessions people and the ambient image is done pair Than if comparing successfully, then it is assumed that the current sessions people is included in the candidate session people.

Step S332: determine that the current sessions people is in session status.

In this step, when the facial image comparison in the facial image of the current sessions people and the ambient image at When function, it is believed that the current sessions people with machine conference.

Step S333: the end order for the end session that the current sessions people returns is judged whether there is, if so, executing Step S334, if it is not, executing step S335.

In this step, when the current sessions people terminates the session with robot, Xiang Suoshu robot returns to session Terminate order, the CSE command session end is the voice command initiated by the current sessions people, e.g., " goodbye ", " next time sees ".

In some embodiments, robot is provided with end session button, when current sessions people completion and robot Session, it is desirable to which when terminating session, clicking terminates session button, and end current sessions can be realized.

Step S334: determine that the current sessions people is not in speech phase.

Step S335: it is corresponding to judge whether the current sessions people is not included in the ambient image that nearest continuous acquisition arrives Candidate session people in, if so, execute step S336, if it is not, execute step S337.

In this step, when the current sessions people is not comprised in the candidate session people, and robot is not received When to CSE command session end, the current sessions people is likely to be in session status, for example, current sessions people bows or later, It causes not collect current sessions everybody faces and judges current collected ambient image to reduce robot error in judgement The corresponding candidate session people of upper N frame image in whether include the current sessions people, the N be it is preset be greater than 0 constant, Such as, N is set as 5, then judges whether the current sessions people is not included in the preceding 5 frame ambient image pair that nearest continuous acquisition arrives In the candidate session people answered.When continuous N frame image does not all collect the face of current sessions people, it may be considered that session before sending out People has been moved off, and current sessions people is not on session status.

Step S336: determine that the current sessions people is not in speech phase.

Step S337: determine that the current sessions people is in session status.

In some embodiments, when including the current sessions people in the collected ambient image recently, by institute The face in the face substitution face information library of corresponding current sessions people in nearest collected environment is stated, facilitates conduct oneself next time Face compares when comparing.

It should be noted that the environment before camera can change with the movement of people, rotary head or expression may occur for people Variation, it is contemplated that the frequency and human hair of camera acquisition ambient image may continue for some time when vividly making to change, when When acquiring next frame ambient image, the ambient image pair of the movement of people or expression there is a strong possibility property also rests on present frame acquisition The movement or expression answered, so, the face and next frame of corresponding current sessions people will in nearest collected ambient image Therefore the face similitude highest of corresponding current sessions people in collected ambient image collected recently described will be worked as The facial image of preceding session people substitutes the facial image in the face information library, so that the progress that robot is more convenient The comparison of facial image.

Fig. 2 C shows in the embodiment of the present invention selection target session people's flow chart from the candidate session people, such as Fig. 2 C It is shown, the selection target session people from the candidate session people the following steps are included:

Step S41: the session parameter of each candidate session people is extracted from the ambient image.

In this step, the session parameter includes the lip reading extracted from the ambient image, facial size and position Set parameter, wherein the lip reading is used to indicate whether each candidate session people is speaking, when calculating lip reading parameter, one In a little embodiments, the numerical value for the corresponding lip reading parameter of candidate session people spoken can be denoted as 1, the candidate session that do not speak The numerical value of the corresponding lip reading parameter of people is denoted as 0.

The facial size be used to indicate each candidate session people at a distance from robot, in some embodiments, When calculating facial size parameter, by the corresponding pixel region of face in the ambient image divided by the pixel of the ambient image Region obtains ratio of the face in entire ambient image in the ambient image, joins the ratio as facial size Number.

The location parameter is used to indicate each candidate session people at a distance from robot center, in some embodiments In, in calculating position when parameter, first determine whether that the candidate session people is located at the robot center left side or the machine On the right of device people center line, if the candidate session people is located at the robot center left side, by the ambient image left side Edge is as starting point, if the candidate session people is located on the right of the robot center, by the ambient image right hand edge As starting point, using the distance of the starting point to the robot center as denominator, by candidate session people distance starting The distance of point obtains the candidate session people location parameter in the ambient image as molecule.

Step S42: according to the session parameter of each candidate session people, the session of each candidate session people is calculated separately Score.

For calculating the calculation formula of session score are as follows: lip reading weight * lip reading parameter+facial size according to session parameter Weight * facial size parameter+position weight * location parameter.

And different session parameters react the accuracy whether candidate session people are in session status be it is different, therefore, When calculating the session score of candidate session people, different power can also be preset with to the session parameter of the candidate session people Weight obtains the session score of different candidate session people according to the weight by the session parameter weighted calculation.Such as: into When guild is talked about, in the session parameter of the candidate session people, lip reading is more able to reflect whether the candidate session people is in session State, therefore, when carrying out weight design, lip reading weight highest shared in session parameter, such as:, the setting of lip reading weight It is 0.7, the weight of facial size and location parameter is respectively set to 0.2 and 0.1, and one of candidate session people is speaking, Its facial size parameter is 20%, location parameter 2/3, then candidate's session people's score are as follows: 0.7*1+0.2*20%+0.1*2/ 3≈0.8。

Step S43: using the candidate session people of the session highest scoring as target session people.

In embodiments of the present invention, by judging whether current sessions people meets the switching condition of session people, decide whether Switch current sessions people, and by setting session parameter selection target session people from candidate session people, to meet session When the switching condition of people, current sessions people is switched to target session people, realizes that robot actively switches current sessions people.

Fig. 3 shows a kind of flow chart of another embodiment of method of machine conference switching of the present invention.Implement with upper one Example is compared, and the embodiment of the invention also includes following steps:

Step S7: the facial image of the current sessions people is extracted.

Step S8: the user that identification matches in presupposed information library with the presence or absence of the facial image, and if it exists, execute Step S9 thens follow the steps S11 if it does not exist.

In this step, by the facial image progress in the facial image of the current sessions people and presupposed information library Match, the presupposed information library has been pre-stored a large amount of user's faces using robot and its corresponding background information, the user Face is one-to-one with the corresponding background information.

Step S9: the corresponding background information of the user is extracted from the presupposed information library.

Background information refers to the personal information of user, such as: name, occupation, position etc..

Step S10: the facial image and background information are pushed into manual position auxiliary terminal.

Manual position auxiliary terminal is the terminal device of staff's auxiliary robot.Manual position auxiliary terminal is receiving It, can be with displays background information and facial image, to facilitate work after the facial image and background information that are sent to the robot Make personnel and understand current sessions people, and when the robot being unable to complete the current sessions people the problem of, staff It can accurately auxiliary robot answer.

Step S11: the facial image image is pushed into manual position auxiliary terminal.

In this step, when robot was unable to complete with the problem of current sessions people, staff can be with auxiliary machinery People answers.

In embodiments of the present invention, by manual position assist terminal, realize human assistance session, solve robot without When method solves the problems, such as current sessions people, human assistance is solved, and improves the efficiency of robot work.

Fig. 4 shows a kind of functional block diagram of machine conference switching device of the present invention, as shown in figure 4, described device packet It includes: acquisition module 401, the first determining module 402, judgment module 403, selecting module 404 and the second determining module 405, wherein Acquisition module 401, for acquiring the ambient image being located in front of the robot；First determining module 402 is used for from the ring Candidate session people is determined in the image of border；Judgment module 403, the switching condition for judging whether to meet switching current sessions people； Selecting module 404, when for meeting the switching condition of switching current sessions people, the selection target session from the candidate session people People；Second determining module 405, for selected target session people to be determined as current sessions people.

Wherein, judgment module 403 includes: the first judging unit 4031, the first determination unit 4032, second judgment unit 4033, the second determination unit 4034 and third determination unit 4035, wherein the first judging unit 4031 is deposited for judging whether In current sessions people；First determination unit 4032, for determining and meeting switching current sessions people when current sessions people is not present Switching condition；Second judgment unit 4033, for when there are current sessions people, judging whether the current sessions people is in Session status；Second determination unit 4034, for when the current sessions people is in session status, determining that being unsatisfactory for switching works as The switching condition of preceding session people；Third determination unit 4035, for determining when the current sessions people is not in speech phase Meet the switching condition of switching current sessions people.

Wherein, the second judgment unit 4033 is used for when there are current sessions people, judges that the current sessions people is It is no to be in session status, comprising: to judge whether the current sessions people is included in the candidate session people；If comprising really The fixed current sessions people is in session status；If not including, the end that the current sessions people returns is judged whether there is The end order of session；If it exists, it is determined that the current sessions people is not in speech phase；If it does not exist, work as described in judgement Preceding session people whether be not included in nearest continuous acquisition to the corresponding candidate session people of ambient image in, wherein it is described most Nearly continuous ambient image is previously collected and there are the images of the preset quantity of serial relation with the ambient image；If The current sessions be not included in per capita nearest continuous acquisition to the corresponding candidate session people of ambient image in, it is determined that it is described Current sessions people is not in speech phase；If the current sessions people is included in the environment map that any one nearest continuous acquisition arrives As determining that the current sessions people is in session status in corresponding candidate session people.

Wherein, the selecting module 404 includes: extraction unit 4041, computing unit 4042 and selecting unit 4043, In, extraction unit 4041, for extracting the session parameter of each candidate session people from the ambient image, wherein the session Parameter includes the lip reading extracted from the ambient image, facial size and location parameter；Computing unit 4042 is used for basis The session parameter of each candidate session people calculates separately the session score of each candidate session people；Selecting unit 4043 is used In using the candidate session people of the session highest scoring as target session people.

In embodiments of the present invention, described device further include: the first extraction module 406, identification module 407, second extract Module 408 and pushing module 409, wherein the first extraction module 406, for extracting the facial image of the current sessions people； Identification module 407, the user to match in presupposed information library with the presence or absence of the facial image for identification；Second extracts mould Block 408, for being mentioned from the presupposed information library as the user to match in presupposed information library there are the facial image Take the corresponding background information of the user；Pushing module 409, for the facial image and background information to be pushed to artificial seat Seat assists terminal.

The embodiment of the present invention judges whether current sessions people meets the switching condition of session people by judgment module, and passes through Selecting module selection target session people from candidate session people, so as to when meeting the switching condition of session people, by current sessions People is switched to target session people；In addition, current sessions people is corresponded to the facial image in presupposed information library by pushing module It is pushed to manual position auxiliary terminal with background information, realizes human assistance session；Through the embodiment of the present invention, machine may be implemented Device people actively switches current sessions people, and solves machine conference by human assistance, improves the efficiency of robot work.

The embodiment of the present application provides a kind of nonvolatile computer storage media, and the computer storage medium is stored with One of above-mentioned any means embodiment machine conference can be performed in an at least executable instruction, the computer executable instructions The method of switching.

Fig. 5 is the structural schematic diagram that the present invention calculates apparatus embodiments, and the specific embodiment of the invention is not to calculating equipment Specific implementation limit.

As shown in figure 5, the calculating equipment may include: processor (processor) 502, communication interface (Communications Interface) 504, memory (memory) 506 and communication bus 508.

Wherein:

Processor 502, communication interface 504 and memory 506 complete mutual communication by communication bus 508.

Communication interface 504, for being communicated with the network element of other equipment such as client or other servers etc..

Processor 502, for executing program 510, the method that can specifically execute a kind of above-mentioned machine conference switching is real Apply the correlation step in example.

Specifically, program 510 may include program code, which includes computer operation instruction.

Processor 502 may be central processor CPU or specific integrated circuit ASIC (Application Specific Integrated Circuit), or be arranged to implement the integrated electricity of one or more of the embodiment of the present invention Road.The one or more processors that equipment includes are calculated, can be same type of processor, such as one or more CPU；It can also To be different types of processor, such as one or more CPU and one or more ASIC.

Memory 506, for storing program 510.Memory 506 may include high speed RAM memory, it is also possible to further include Nonvolatile memory (non-volatile memory), for example, at least a magnetic disk storage.

Program 510 specifically can be used for so that processor 502 executes following operation: acquisition is located in front of the robot Ambient image；Candidate session people is determined from the ambient image；Judge whether the switching condition for meeting switching current sessions people； If satisfied, the then selection target session people from the candidate session people；Selected target session people is determined as current sessions People.

In a kind of optional mode, program 510 can specifically be further used for so that processor 502 executes following behaviour Make: judging whether there is current sessions people；If not, it is determined that meet the switching condition of switching current sessions people；If so, judgement Whether the current sessions people is in session status；If being in session status, it is determined that be unsatisfactory for cutting for switching current sessions people Change condition；If being not in speech phase, it is determined that meet the switching condition of switching current sessions people.

In a kind of optional mode, program 510 can specifically be further used for so that processor 502 executes following behaviour Make: judging whether the current sessions people is included in the candidate session people；If comprising, it is determined that at the current sessions people In session status；If not including, the end order for the end session that the current sessions people returns is judged whether there is；If depositing , it is determined that the current sessions people is not in speech phase.If it does not exist, judge whether the current sessions people does not include Nearest continuous acquisition to the corresponding candidate session people of ambient image in, wherein the ambient image continuous recently is previous It is collected and there are the images of the preset quantity of serial relation with the ambient image；If the current sessions do not wrap per capita Be contained in nearest continuous acquisition to the corresponding candidate session people of ambient image in, it is determined that the current sessions people is not at session State；If the current sessions people is included in the corresponding candidate session people of ambient image that any one nearest continuous acquisition arrives In, determine that the current sessions people is in session status.

In a kind of optional mode, program 510 can specifically be further used for so that processor 502 executes following behaviour Make: extracting the session parameter of each candidate session people from the ambient image, wherein the session parameter includes from the environment Lip reading, facial size and the location parameter extracted in image；According to the session parameter of each candidate session people, calculate separately The session score of each candidate session people；Using the candidate session people of the session highest scoring as target session people.

In a kind of optional mode, program 510 can specifically be further used for so that processor 502 executes following behaviour Make: extracting the facial image of the current sessions people；Identification matches in presupposed information library with the presence or absence of the facial image User；If it exists, then the corresponding background information of the user is extracted from the presupposed information library；By the facial image and Background information pushes to manual position auxiliary terminal.

Algorithm and display are not inherently related to any particular computer, virtual system, or other device provided herein. Various general-purpose systems can also be used together with teachings based herein.As described above, it constructs required by this kind of system Structure be obvious.In addition, the present invention is also not directed to any particular programming language.It should be understood that can use various Programming language realizes summary of the invention described herein, and the description done above to language-specific is to disclose this hair Bright most preferred embodiment.

In the instructions provided here, numerous specific details are set forth.It is to be appreciated, however, that implementation of the invention Example can be practiced without these specific details.In some instances, well known method, structure is not been shown in detail And technology, so as not to obscure the understanding of this specification.

Similarly, it should be understood that in order to simplify the disclosure and help to understand one or more of the various inventive aspects, Above in the description of exemplary embodiment of the present invention, each feature of the invention is grouped together into single implementation sometimes In example, figure or descriptions thereof.However, the disclosed method should not be interpreted as reflecting the following intention: i.e. required to protect Shield the present invention claims features more more than feature expressly recited in each claim.More precisely, such as right As claim reflects, inventive aspect is all features less than single embodiment disclosed above.Therefore, it then follows tool Thus claims of body embodiment are expressly incorporated in the specific embodiment, wherein each claim itself is used as this hair Bright separate embodiments.

Those skilled in the art will understand that can be carried out adaptively to the module in the equipment in embodiment Change and they are arranged in one or more devices different from this embodiment.It can be the module or list in embodiment Member or component are combined into a module or unit or component, and furthermore they can be divided into multiple submodule or subelement or Sub-component.Other than such feature and/or at least some of process or unit exclude each other, it can use any Combination is to all features disclosed in this specification (including adjoint claim, abstract and attached drawing) and so disclosed All process or units of what method or apparatus are combined.Unless expressly stated otherwise, this specification is (including adjoint power Benefit require, abstract and attached drawing) disclosed in each feature can carry out generation with an alternative feature that provides the same, equivalent, or similar purpose It replaces.

In addition, it will be appreciated by those of skill in the art that although some embodiments described herein include other embodiments In included certain features rather than other feature, but the combination of the feature of different embodiments mean it is of the invention Within the scope of and form different embodiments.For example, in the following claims, embodiment claimed is appointed Meaning one of can in any combination mode come using.

Various component embodiments of the invention can be implemented in hardware, or to run on one or more processors Software module realize, or be implemented in a combination thereof.It will be understood by those of skill in the art that can be used in practice Microprocessor or digital signal processor (DSP) realize a kind of machine conference switching device according to an embodiment of the present invention In some or all components some or all functions.The present invention is also implemented as described herein for executing Some or all device or device programs (for example, computer program and computer program product) of method.In this way Realization program of the invention can store on a computer-readable medium, or can have the shape of one or more signal Formula.Such signal can be downloaded from an internet website to obtain, and perhaps be provided on the carrier signal or with any other shape Formula provides.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and ability Field technique personnel can be designed alternative embodiment without departing from the scope of the appended claims.In the claims, Any reference symbol between parentheses should not be configured to limitations on claims.Word "comprising" does not exclude the presence of not Element or step listed in the claims.Word "a" or "an" located in front of the element does not exclude the presence of multiple such Element.The present invention can be by means of including the hardware of several different elements and being come by means of properly programmed computer real It is existing.In the unit claims listing several devices, several in these devices can be through the same hardware branch To embody.The use of word first, second, and third does not indicate any sequence.These words can be explained and be run after fame Claim.

Claims

1. a kind of method of machine conference switching characterized by comprising

Acquisition is located at the ambient image in front of the robot；

Candidate session people is determined from the ambient image；

Judge whether the switching condition for meeting switching current sessions people；

If satisfied, the then selection target session people from the candidate session people；

Selected target session people is determined as current sessions people.

2. the method according to claim 1, wherein

It is described to judge whether that the switching condition for meeting switching current sessions people includes:

Judge whether there is current sessions people；

If not, it is determined that meet the switching condition of switching current sessions people；

If so, judging whether the current sessions people is in session status；

If being in session status, it is determined that be unsatisfactory for the switching condition of switching current sessions people；

If being not in speech phase, it is determined that meet the switching condition of switching current sessions people.

3. according to the method described in claim 2, it is characterized in that, described judge whether the current sessions people is in session shape State includes:

Judge whether the current sessions people is included in the candidate session people；

If comprising, it is determined that the current sessions people is in session status；

If not including, the end order for the end session that the current sessions people returns is judged whether there is；

If it exists, it is determined that the current sessions people is not in speech phase；

If it does not exist, judge whether the current sessions people is not included in the corresponding time of ambient image that nearest continuous acquisition arrives It selects in session people, wherein the ambient image continuous recently is previously collected and exists with the ambient image continuous The image of the preset quantity of relationship；

If the current sessions be not included in per capita nearest continuous acquisition to the corresponding candidate session people of ambient image in, really The fixed current sessions people is not in speech phase；

If the current sessions people be included in any one nearest continuous acquisition to the corresponding candidate session people of ambient image in, Determine that the current sessions people is in session status.

4. the method according to claim 1, wherein

The selection target session people from the candidate session people includes:

The session parameter of each candidate session people is extracted from the ambient image, wherein the session parameter includes from the ring Lip reading, facial size and the location parameter extracted in the image of border；

According to the session parameter of each candidate session people, the session score of each candidate session people is calculated separately；

Using the candidate session people of the session highest scoring as target session people.

5. method described in any one of -4 according to claim 1, which is characterized in that the method also includes:

Extract the facial image of the current sessions people；

Identify the user to match in presupposed information library with the presence or absence of the facial image；

If it exists, then the corresponding background information of the user is extracted from the presupposed information library；

The facial image and background information are pushed into manual position auxiliary terminal.

6. a kind of machine conference switching device characterized by comprising

Acquisition module: for acquiring the ambient image being located in front of the robot；

First determining module: for determining candidate session people from the ambient image；

Judgment module: the switching condition for judging whether to meet switching current sessions people；

Selecting module: when for meeting the switching condition of switching current sessions people, the selection target meeting from the candidate session people Talk about people；

Second determining module: for selected target session people to be determined as current sessions people.

7. device according to claim 6, which is characterized in that the judgment module includes:

First judging unit: for judging whether there is current sessions people；

First determination unit: for determining the switching condition for meeting switching current sessions people when current sessions people is not present；

Second judgment unit: for when there are current sessions people, judging whether the current sessions people is in session status；

Second determination unit: for determining and being unsatisfactory for switching current sessions people when the current sessions people is in session status Switching condition；

Third determination unit: for determining and meeting switching current sessions people when the current sessions people is not in speech phase Switching condition.

8. device according to claim 7, which is characterized in that the second judgment unit is used for when there are current sessions people When, judge whether the current sessions people is in session status, comprising:

9. device according to claim 6, which is characterized in that the selecting module includes:

Extraction unit: for extracting the session parameter of each candidate session people from the ambient image, wherein the session parameter Including lip reading, facial size and the location parameter extracted from the ambient image；

Computing unit: for the session parameter according to each candidate session people, the meeting of each candidate session people is calculated separately Talk about score；

Selecting unit: for using the candidate session people of the session highest scoring as target session people.

10. device according to claim 6, which is characterized in that described device further include:

First extraction module: for extracting the facial image of the current sessions people；

Identification module: the user to match in presupposed information library with the presence or absence of the facial image for identification；

Second extraction module: for as the user to match in presupposed information library there are the facial image, from described pre- If extracting the corresponding background information of the user in information bank；

Pushing module: for the facial image and background information to be pushed to manual position auxiliary terminal.

11. a kind of calculating equipment, comprising: processor, memory, communication interface and communication bus, the processor, the storage Device and the communication interface complete mutual communication by the communication bus；The memory can be held for storing at least one Row instruction, the executable instruction make the processor execute a kind of robot meeting according to any one of claims 1 to 5 Talk about the corresponding operation of method of switching.

12. a kind of computer storage medium, an at least executable instruction, the executable instruction are stored in the storage medium Processor is set to execute a kind of corresponding operation of method of machine conference switching according to any one of claims 1 to 5.