CN112511757B - Video conference implementation method and system based on mobile robot - Google Patents


Info

Publication number
CN112511757B
CN112511757B (application CN202110157473.2A)
Authority
CN
China
Prior art keywords
speaker
conference
module
camera
point cloud
Prior art date
Legal status
Active
Application number
CN202110157473.2A
Other languages
Chinese (zh)
Other versions
CN112511757A (en)
Inventor
焦显伟
孟夏冰
Current Assignee
Beijing Telecom Easiness Information Technology Co Ltd
Original Assignee
Beijing Telecom Easiness Information Technology Co Ltd
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2021-05-04
Application filed by Beijing Telecom Easiness Information Technology Co Ltd filed Critical Beijing Telecom Easiness Information Technology Co Ltd
Priority to CN202110157473.2A priority Critical patent/CN112511757B/en
Publication of CN112511757A publication Critical patent/CN112511757A/en
Application granted granted Critical
Publication of CN112511757B publication Critical patent/CN112511757B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/695: Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • H04N 23/67: Focus control based on electronic image sensor signals
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/15: Conference systems

Abstract

The invention provides a video conference implementation method and system based on a mobile robot. The method comprises the following steps: calibrating the positions of the microphones and cameras; calibrating the positions and postures of the annular lens and the laser radar; building a map with laser SLAM to obtain a point cloud distribution map of the conference room and identifying the main areas of the conference room; determining the position of the speaker and generating the speaker's central axis; selecting different processing logic according to the position of the speaker; planning a path trajectory in the point cloud map and moving to the target position; adjusting the pan-tilt head according to the mobile robot's position and posture and the speaker's position, and shooting; and transmitting the data to the intelligent conference management system. According to the invention, the position of the mobile robot is adjusted automatically according to the position of the speaker among the participants; when no fixed camera can acquire the speaker's complete front view on its own, the camera on the robot automatically moves to a suitable position to collect front-view information of the speaker, improving the efficiency of the video conference.

Description

Video conference implementation method and system based on mobile robot
Technical Field
The invention relates to the technical field of mobile robots, and in particular to a video conference implementation method and system based on a mobile robot.
Background
In existing multi-person video conference scenarios, switching to the speaker's front view and close-up shot is a required function of a multi-person video conference, as it gives the other participants a realistic sense of being in the meeting room.
In current schemes, multiple cameras are usually adopted to capture the speaker's close-up shot. The cameras generally need to be controlled manually, which wastes manpower and is inefficient; moreover, when the speaker is outside the ideal shooting range of the cameras, the close-up shot cannot be captured at all.
Disclosure of Invention
In view of this, the present invention provides a video conference implementation method based on a mobile robot, which can automatically adjust the position of the robot according to the position of the speaker among the participants; the robot carries a camera and can automatically move to a suitable position to collect front-view information of the speaker when no fixed camera can acquire the speaker's front view on its own.
The invention provides a video conference implementation method based on a mobile robot, which comprises the following steps:
S1, installing all components of the intelligent conference management system under the same coordinate system, calibrating the positions of the microphones and cameras of the intelligent conference management system, calibrating the positions and postures of the annular lens and the laser radar, and performing laser SLAM mapping with a laser point cloud segmentation algorithm to obtain a point cloud distribution map of the conference room;
preferably, the point cloud distribution map of the conference room is built from the A3_1 point cloud data;
the laser point cloud segmentation algorithm comprises the following steps:
A1: placing the robot, with the annular lens and the laser radar installed, in an open room;
A2: the laser radar starts working and acquires all point cloud data, which are clustered by their range characteristics into three classes: farthest, common, and nearest;
A3: owing to the hardware structure, a laser beam can take one of three paths, each yielding a different processing result:
A3_1: the laser beam strikes a transparent lens and passes straight through it; the laser point falls on a wall surface, and the beam's angle information is marked on the point cloud data with the farthest ranging distance;
A3_2: the laser beam strikes a reflector and is reflected down to the ground; the laser point falls on the ground, and the beam's angle information is marked on the point cloud data with the common ranging distance;
A3_3: the laser beam undergoes neither A3_1 nor A3_2 and directly measures the distance between the laser radar and the annular lens; this point cloud is the nearest class, its value is fixed, and it is discarded directly;
S2, identifying the main areas of the conference room in the point cloud map;
preferably, the main areas of the conference room are, for example, the desk and the projector screen;
S3, determining the position of the speaker from the speaker's speech and generating the speaker's central axis;
the central axis is a directed line segment starting at the speaker's position and ending at the central area of the desk;
the method for resolving the speaker's position is as follows:
$$
\begin{cases}
\sqrt{(x-X_b)^2+(y-Y_b)^2+(z-Z_b)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ba}\\
\sqrt{(x-X_c)^2+(y-Y_c)^2+(z-Z_c)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ca}\\
\sqrt{(x-X_d)^2+(y-Y_d)^2+(z-Z_d)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{da}
\end{cases}
$$
where (Xa, Ya, Za), (Xb, Yb, Zb), (Xc, Yc, Zc) and (Xd, Yd, Zd) are the coordinates of microphones a, b, c and d; Tba, Tca and Tda are the time differences of the sound reaching microphones b, c and d relative to microphone a; V is the propagation speed of sound; and the x and y of the solution are the final resolving result, i.e. the position of the person;
S4, selecting different processing logic according to the position of the speaker;
S5, planning a path trajectory in the point cloud map according to the mobile robot's own position and the best shooting position, and moving to the target position;
during this process, the obstacle avoidance and collision avoidance sub-module ensures the safety of equipment and personnel, using the A3_2 point cloud data;
S6, after the target position is reached, adjusting the pan-tilt head according to the mobile robot's position and posture and the speaker's position, and starting shooting;
preferably, the pan-tilt head can already be adjusted for shooting during the movement, according to the robot's position and posture and the speaker's position;
and S7, transmitting the data to the intelligent conference management system, whereupon the process ends.
Further, step S4 comprises:
4a: if a camera of the intelligent conference management system can directly acquire the front view of the speaker's position, i.e. the close-up shot, the image data of that camera is selected directly and the process ends;
4b: if the intelligent conference management system has no camera suitable for collecting the front view of the speaker's position, the camera on the mobile robot is selected; the intelligent conference management system establishes a connection with the mobile robot through the second communication sub-module and sends an instruction containing the speaker's position and the best shooting position;
preferably, the best shooting position is the position opposite the speaker, mirrored about the desk as the axis of symmetry.
Further, in step S3 the speaker's position may be equivalently replaced by a conference focus: participant information is collected by the cameras and processed by the image acquisition processing sub-module, the face orientation of each current participant is analyzed and judged, and a focus is fitted by extending the face orientations of several participants; this focus serves as the conference focus.
Further, when the conference focus is outside the conference room, the mobile robot shoots the PPT position; when the conference focus is among the participants, the mobile robot shoots that position;
when the conference focus is outside the conference room, the participants are presumably watching the PPT (or behaving similarly), so the mobile robot preferably shoots the PPT position; when the conference focus is among the participants, they are presumably watching an object the speaker is discussing or displaying, so the mobile robot preferably shoots that position.
The invention also provides a system implementing the above video conference implementation method, comprising:
a calibration module: for installing all components of the intelligent conference management system under the same coordinate system and calibrating the positions of the microphones and cameras; calibrating the positions and postures of the annular lens and the laser radar; performing laser SLAM mapping with the laser point cloud segmentation algorithm to obtain a point cloud distribution map of the conference room; and identifying the main areas of the conference room in the point cloud map;
a path planning and moving module: for planning a path trajectory in the point cloud map according to the robot's own position and the best shooting position, and moving to the target position;
a robot module: for adjusting the pan-tilt head according to the robot's position and posture and the speaker's position after the target position is reached, and starting shooting;
a voiceprint positioning module: for determining the position of the speaker from the speaker's speech and generating the speaker's central axis;
a conference management module: for selecting different processing logic according to the position of the speaker;
a data transmission module: for transmitting data to the intelligent conference management system.
Further, the calibration module comprises a sensor and a positioning calculation unit;
wherein the sensor comprises a laser radar, which can also serve the obstacle avoidance and collision avoidance sub-module;
and the positioning calculation unit selects a different algorithm according to the selected sensor to calculate the final position.
Furthermore, the voiceprint positioning module comprises a synchronizer, several microphones and an operation unit, and obtains the speaker's position from the arrival times of the speaker's voice;
the synchronizer generates a synchronization signal that unifies the time references of the microphones;
the microphones collect the sound signals and record the arrival times of the signals;
and the operation unit calculates the speaker's position from the arrival times of the sound signals collected by the microphones.
Furthermore, the robot module further comprises an obstacle avoidance and collision avoidance sub-module, a video acquisition and tracking sub-module and a first communication sub-module;
the video acquisition and tracking sub-module comprises a camera and a pan-tilt head; the camera collects image information, and the pan-tilt head rotates the camera and tracks the conference focus;
the first communication sub-module is a wireless communication device used to communicate with the second communication sub-module and transmit data and commands;
the obstacle avoidance and collision avoidance sub-module comprises the laser radar, the annular lens and a gas collision detector;
the annular lens reflects laser lines of the laser radar to other angles, so as to measure regions outside the laser radar's scanning plane;
the gas collision detector comprises a gas bag and an air pressure sensor, wherein the gas bag is distributed over the surface of the robot and filled with gas; the air pressure sensor is located inside the gas bag and measures the air pressure. When the robot collides with something, the gas bag is compressed; the pressure sensor detects the pressure change and triggers the collision protection.
Further, the intelligent conference management system comprises an image acquisition processing sub-module and the second communication sub-module;
the image acquisition processing sub-module comprises a conference focus judgment unit and a conference focus extraction unit, which together process the image information and extract the focus of the conference;
the focus of the conference is extracted by collecting participant information through the cameras, analyzing the face orientation of each current participant, and fitting a focus by extending the face orientations of several participants; this focus serves as the conference focus;
and the second communication sub-module is a wireless communication device used to communicate with the first communication sub-module and transmit data and commands.
Further, the path planning and moving module comprises a path planning algorithm, a motor control algorithm and the robot vehicle body.
Compared with the prior art, the invention has the following beneficial effects:
the invention automatically adjusts the position of the robot according to the position of the speaker among the participants; since the mobile robot carries a camera, it can automatically move to a suitable position to collect front-view information of the speaker when no fixed camera can acquire the speaker's complete front view on its own, thereby improving the efficiency of the video conference.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
In the drawings:
FIG. 1 is a block diagram of a system module according to an embodiment of the present invention;
FIG. 2 is a block diagram of a positioning module according to an embodiment of the present invention;
FIG. 3 is a block diagram of an obstacle avoidance and collision avoidance module according to an embodiment of the present invention;
FIG. 4 is a schematic front view and top view of the annular lens structure according to an embodiment of the present invention;
fig. 5 is a flowchart of a video conference implementation method based on a mobile robot according to the present invention.
Reference numerals in fig. 4 denote:
121, laser radar; 122, annular lens; 122_1, reflective mirror; 122_2, transparent lens; 13, video acquisition and tracking subsystem; 14, path planning and moving subsystem.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, and third may be used in this disclosure to describe various information, this information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon" or "when" or "in response to a determination", depending on the context.
The embodiment of the invention provides a video conference implementation method based on a mobile robot, as shown in fig. 5, comprising the following steps:
S1, installing all components of the intelligent conference management system under the same coordinate system, calibrating the positions of the microphones and cameras of the intelligent conference management system, calibrating the positions and postures of the annular lens and the laser radar, and performing laser SLAM mapping with a laser point cloud segmentation algorithm to obtain a point cloud distribution map of the conference room;
preferably, the point cloud distribution map of the conference room is built from the A3_1 point cloud data;
the laser point cloud segmentation algorithm comprises the following steps:
A1: placing the robot, with the annular lens and the laser radar installed, in an open room;
A2: the laser radar starts working and acquires all point cloud data, which are clustered by their range characteristics into three classes: farthest, common, and nearest;
A3: owing to the hardware structure, a laser beam can take one of three paths, each yielding a different processing result:
A3_1: the laser beam strikes a transparent lens and passes straight through it; the laser point falls on a wall surface, and the beam's angle information is marked on the point cloud data with the farthest ranging distance;
A3_2: the laser beam strikes a reflector and is reflected down to the ground; the laser point falls on the ground, and the beam's angle information is marked on the point cloud data with the common ranging distance;
A3_3: the laser beam undergoes neither A3_1 nor A3_2 and directly measures the distance between the laser radar and the annular lens; this point cloud is the nearest class, its value is fixed, and it is discarded directly;
S2, identifying the main areas of the conference room in the point cloud map;
preferably, the main areas of the conference room are, for example, the desk and the projector screen;
S3, determining the position of the speaker from the speaker's speech and generating the speaker's central axis;
the central axis is a directed line segment starting at the speaker's position and ending at the central area of the desk;
the method for resolving the speaker's position is as follows:
$$
\begin{cases}
\sqrt{(x-X_b)^2+(y-Y_b)^2+(z-Z_b)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ba}\\
\sqrt{(x-X_c)^2+(y-Y_c)^2+(z-Z_c)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ca}\\
\sqrt{(x-X_d)^2+(y-Y_d)^2+(z-Z_d)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{da}
\end{cases}
$$
where (Xa, Ya, Za), (Xb, Yb, Zb), (Xc, Yc, Zc) and (Xd, Yd, Zd) are the coordinates of microphones a, b, c and d; Tba, Tca and Tda are the time differences of the sound reaching microphones b, c and d relative to microphone a; V is the propagation speed of sound; and the x and y of the solution are the final resolving result, i.e. the position of the person;
S4, selecting different processing logic according to the position of the speaker;
S5, planning a path trajectory in the point cloud map according to the mobile robot's own position and the best shooting position, and moving to the target position;
during this process, the obstacle avoidance and collision avoidance sub-module ensures the safety of equipment and personnel, using the A3_2 point cloud data;
S6, after the target position is reached, adjusting the pan-tilt head according to the mobile robot's position and posture and the speaker's position, and starting shooting;
preferably, the pan-tilt head can already be adjusted for shooting during the movement, according to the robot's own position and posture and the speaker's position;
and S7, transmitting the data to the intelligent conference management system, whereupon the process ends.
Step S4 comprises:
4a: if a camera of the intelligent conference management system can directly acquire the front view of the speaker's position, i.e. the close-up shot, the image data of that camera is selected directly and the process ends;
4b: if the intelligent conference management system has no camera suitable for collecting the front view of the speaker's position, the camera on the mobile robot is selected; the intelligent conference management system establishes a connection with the mobile robot through the second communication sub-module and sends an instruction containing the speaker's position and the best shooting position;
preferably, the best shooting position is the position opposite the speaker, mirrored about the desk as the axis of symmetry, as in the sketch below.
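For illustration only, a minimal sketch of this mirroring, assuming the desk's axis of symmetry is given by two calibrated map points (the function and parameter names are hypothetical, not from the patent):

```python
import numpy as np

def best_shooting_position(speaker, desk_a, desk_b):
    """Mirror the speaker's position across the desk's axis of symmetry.

    speaker        : (x, y) from the voiceprint positioning module
    desk_a, desk_b : two map points defining the desk's long axis
    """
    s, a, b = np.asarray(speaker), np.asarray(desk_a), np.asarray(desk_b)
    axis = (b - a) / np.linalg.norm(b - a)   # unit vector along the desk
    rel = s - a
    foot = a + axis * (rel @ axis)           # projection of the speaker onto the axis
    return 2 * foot - s                      # reflected point opposite the speaker
```

The reflected point then serves as the target position sent to the path planning and moving module.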
In step S3, the speaker's position may be equivalently replaced by a conference focus: participant information is collected by the cameras and processed by the image acquisition processing sub-module, the face orientation of each current participant is analyzed and judged, and a focus is fitted by extending the face orientations of several participants; this focus serves as the conference focus.
When the conference focus is outside the conference room, the mobile robot shoots the PPT position; when the conference focus is among the participants, the mobile robot shoots that position;
when the conference focus is outside the conference room, the participants are presumably watching the PPT (or behaving similarly), so the mobile robot preferably shoots the PPT position; when the conference focus is among the participants, they are presumably watching an object the speaker is discussing or displaying, so the mobile robot preferably shoots that position.
The invention also provides a system implementing the above video conference implementation method, comprising:
a calibration module: for installing all components of the intelligent conference management system under the same coordinate system and calibrating the positions of the microphones and cameras; calibrating the positions and postures of the annular lens and the laser radar; performing laser SLAM mapping with the laser point cloud segmentation algorithm to obtain a point cloud distribution map of the conference room; and identifying the main areas of the conference room in the point cloud map;
a path planning and moving module: for planning a path trajectory in the point cloud map according to the robot's own position and the best shooting position, and moving to the target position;
a robot module: for adjusting the pan-tilt head according to the robot's position and posture and the speaker's position after the target position is reached, and starting shooting;
a voiceprint positioning module: for determining the position of the speaker from the speaker's speech and generating the speaker's central axis;
a conference management module: for selecting different processing logic according to the position of the speaker;
a data transmission module: for transmitting data to the intelligent conference management system.
The calibration module comprises a sensor and a positioning calculation unit;
wherein the sensor comprises a laser radar, which can also serve the obstacle avoidance and collision avoidance sub-module;
and the positioning calculation unit selects a different algorithm according to the selected sensor to calculate the final position.
The voiceprint positioning module comprises a synchronizer, several microphones and an operation unit, and obtains the speaker's position from the arrival times of the speaker's voice;
the synchronizer generates a synchronization signal that unifies the time references of the microphones;
the microphones collect the sound signals and record the arrival times of the signals;
and the operation unit calculates the speaker's position from the arrival times of the sound signals collected by the microphones.
The robot module further comprises an obstacle avoidance and collision avoidance sub-module, a video acquisition and tracking sub-module and a first communication sub-module;
the video acquisition and tracking sub-module comprises a camera and a pan-tilt head; the camera collects image information, and the pan-tilt head rotates the camera and tracks the conference focus;
the first communication sub-module is a wireless communication device used to communicate with the second communication sub-module and transmit data and commands;
the obstacle avoidance and collision avoidance sub-module comprises the laser radar, the annular lens and a gas collision detector;
the annular lens reflects laser lines of the laser radar to other angles, so as to measure regions outside the laser radar's scanning plane;
the gas collision detector comprises a gas bag and an air pressure sensor, wherein the gas bag is distributed over the surface of the robot and filled with gas; the air pressure sensor is located inside the gas bag and measures the air pressure. When the robot collides with something, the gas bag is compressed; the pressure sensor detects the pressure change and triggers the collision protection.
The intelligent conference management system comprises an image acquisition processing sub-module and the second communication sub-module;
the image acquisition processing sub-module comprises a conference focus judgment unit and a conference focus extraction unit, which together process the image information and extract the focus of the conference;
the focus of the conference is extracted by collecting participant information through the cameras, analyzing the face orientation of each current participant, and fitting a focus by extending the face orientations of several participants; this focus serves as the conference focus;
and the second communication sub-module is a wireless communication device used to communicate with the first communication sub-module and transmit data and commands.
The path planning and moving module comprises a path planning algorithm, a motor control algorithm and the robot vehicle body, and is used for planning the robot's moving path and controlling the robot to move to the target position.
An embodiment of the invention, as shown in figures 1, 2 and 3, comprises a mobile robot and an intelligent conference management system.
The mobile robot comprises a calibration module, an obstacle avoidance and collision avoidance sub-module, a video acquisition and tracking sub-module, a path planning and moving module and a first communication sub-module.
The calibration module comprises a sensor and a positioning calculation unit.
The sensor is generally a laser radar, which requires no additional sensors to be installed in the environment; alternatively, ultra-wideband or RFID positioning sensors can be selected, which must cooperate with matching sensors installed in the external environment. The laser radar can also be used for obstacle avoidance and collision avoidance.
The positioning calculation unit selects a different algorithm according to the selected sensor to calculate the final position: generally, when a laser radar or a camera is used, positioning is performed with algorithms such as SLAM; when ultra-wideband or RFID positioning sensors are used, positioning is performed with triangulation methods based on electromagnetic wave arrival time/angle.
The obstacle avoidance and collision avoidance sub-module comprises the laser radar, the annular lens and the gas collision detector.
The annular lens reflects laser lines of the laser radar to other angles, so as to measure regions outside the laser radar's scanning plane.
The gas collision detector comprises a gas bag and an air pressure sensor. The gas bag covers the surface of the robot like an outer skin and is filled with gas; the air pressure sensor, located inside the gas bag, measures the air pressure. When the robot collides with something, the gas bag is compressed; the pressure sensor detects the pressure change and triggers the collision protection, as sketched below.
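A minimal sketch of this trigger logic, assuming a calibrated resting pressure and a hypothetical trip threshold (neither value is specified in the patent):

```python
def collision_triggered(pressure_pa, baseline_pa, threshold_pa=500.0):
    """Collision test for the gas collision detector.

    pressure_pa  : current reading of the air pressure sensor inside the gas bag
    baseline_pa  : calibrated resting pressure of the gas bag
    threshold_pa : assumed trip level; compressing the bag raises the pressure
    """
    return pressure_pa - baseline_pa > threshold_pa
```

When the test fires, the motor control algorithm would command an immediate stop.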
As shown in the schematic front and top views of the annular lens structure in fig. 4, the annular lens 122 is a ring-shaped structure composed of reflective mirrors 122_1 and transparent lenses 122_2 distributed alternately, installed parallel to the scanning plane of the laser radar 121. When a laser beam strikes a reflective mirror 122_1, it is reflected straight down to the ground (beam 121_1), which detects whether the vertical angles at the robot's sides are blocked; when a laser beam strikes a transparent lens 122_2, it passes straight through (beam 121_2), which detects obstacles at the robot's horizontal angles.
Preferably, the distribution of the reflective mirrors 122_1 and the transparent lenses 122_2 should be calibrated before use.
Preferably, in order to match the laser radar 121 with the annular lens 122, a laser point cloud segmentation algorithm is further included (a sketch follows the steps):
A1: placing the robot, with the annular lens and the laser radar installed, in an open room.
A2: the laser radar starts working and acquires all point cloud data, which are clustered by their range characteristics into three classes: farthest, common, and nearest.
A3: owing to the hardware structure, a laser beam can take one of three paths, each yielding a different processing result:
A3_1: the laser beam strikes a transparent lens and passes straight through it; the laser point falls on a wall surface, and the beam's angle information is marked on the point cloud data with the farthest ranging distance.
A3_2: the laser beam strikes a reflector and is reflected down to the ground; the laser point falls on the ground, and the beam's angle information is marked on the point cloud data with the common ranging distance.
A3_3: the laser beam undergoes neither A3_1 nor A3_2 and directly measures the distance between the laser radar and the annular lens; this point cloud is the nearest class, its value is fixed, and it is discarded directly.
The video acquisition and tracking sub-module comprises a camera and a pan-tilt head. The camera collects image information, and the pan-tilt head rotates the camera and tracks the conference focus, as sketched below.
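For illustration, a minimal sketch of pointing the pan-tilt head at the conference focus from the robot's pose, assuming a planar pose (x, y, yaw) and a known camera height (all names are hypothetical):

```python
import math

def aim_pan_tilt(robot_x, robot_y, robot_yaw, cam_height, target):
    """Pan/tilt angles that point the camera at the conference focus.

    (robot_x, robot_y, robot_yaw) : robot pose in the point cloud map
    cam_height                    : camera height above the floor (assumed known)
    target                        : (x, y, z) of the speaker / conference focus
    """
    dx, dy = target[0] - robot_x, target[1] - robot_y
    pan = math.atan2(dy, dx) - robot_yaw                # rotate into the robot frame
    pan = math.atan2(math.sin(pan), math.cos(pan))      # wrap to [-pi, pi]
    tilt = math.atan2(target[2] - cam_height, math.hypot(dx, dy))
    return pan, tilt
```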
The path planning and moving module comprises a path planning algorithm, a motor control algorithm and the robot vehicle body, and is used for planning the robot's moving path and controlling the robot to move to the target position; a sketch of one possible planner follows.
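The patent does not name a specific planner; as one plausible choice, a minimal A* sketch over an occupancy grid rasterized from the point cloud map:

```python
import heapq

def plan_path(grid, start, goal):
    """A* on an occupancy grid derived from the point cloud map.

    grid        : 2-D list of bool, True where a cell is blocked (desk, walls)
    start, goal : (row, col) cells, e.g. robot pose and best shooting position
    """
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]
    came_from, cost = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:
            continue                                  # already expanded
        came_from[cur] = parent
        if cur == goal:                               # walk the parents back
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and not grid[nxt[0]][nxt[1]] and g + 1 < cost.get(nxt, 1e9)):
                cost[nxt] = g + 1
                heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt, cur))
    return None                                       # no path found
```

The resulting cell sequence would then be handed to the motor control algorithm.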
The first communication sub-module is a wireless communication device used to communicate data and commands with the second communication sub-module.
The intelligent conference management system comprises a voiceprint positioning module, an image acquisition processing sub-module and a second communication sub-module.
The voiceprint positioning module comprises a synchronizer, several microphones and an operation unit, and obtains the speaker's position from the arrival times of the speaker's voice.
The synchronizer generates a synchronization signal that makes all the microphones work under the same clock system.
There are four or more microphones, which collect the sound during the conference and record the times at which it reaches them.
The microphones must be calibrated before use; the calibration content is each microphone's installation position, measured with equipment such as a total station.
When collecting the sound, each microphone also marks its arrival time. According to the microphone labels (a, b, c, d, etc.), these times are denoted ta, tb, tc, td, etc., where ta is the time the signal reaches microphone a, tb the time it reaches microphone b, and so on.
The operation unit analyzes the collected information to resolve and determine the position of the current speaker.
Further, the position is resolved as follows:

$$
\begin{cases}
\sqrt{(x-X_b)^2+(y-Y_b)^2+(z-Z_b)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ba}\\
\sqrt{(x-X_c)^2+(y-Y_c)^2+(z-Z_c)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ca}\\
\sqrt{(x-X_d)^2+(y-Y_d)^2+(z-Z_d)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{da}
\end{cases}
$$

where (Xa, Ya, Za), (Xb, Yb, Zb), (Xc, Yc, Zc) and (Xd, Yd, Zd) are the coordinates of microphones a, b, c and d; Tba, Tca and Tda are the time differences of the sound reaching microphones b, c and d relative to microphone a; V is the propagation speed of sound; and the x and y of the solution are the final result, i.e. the position of the person. A numerical sketch of this solve follows.
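A numerical sketch of solving the three range-difference equations above by least squares (scipy is an assumed dependency; the starting guess and return convention are illustrative, not from the patent):

```python
import numpy as np
from scipy.optimize import least_squares

V = 343.0  # propagation speed of sound in air, m/s

def locate_speaker(mics, t):
    """Solve the TDOA system above for the speaker position.

    mics : (4, 3) array of calibrated microphone coordinates a, b, c, d
    t    : arrival times (ta, tb, tc, td) from the synchronized microphones
    """
    tdoa = np.array([t[1] - t[0], t[2] - t[0], t[3] - t[0]])  # Tba, Tca, Tda

    def residuals(p):
        d = np.linalg.norm(mics - p, axis=1)   # distances to each microphone
        return (d[1:] - d[0]) - V * tdoa       # range differences vs. V*T

    # Start from the centroid of the array; any interior point works.
    sol = least_squares(residuals, x0=mics.mean(axis=0))
    x, y, z = sol.x
    return x, y                                # the patent keeps x, y as the result
```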
The image acquisition processing sub-module comprises several cameras and a conference focus judgment unit, and is used for collecting image information and processing it to obtain the focus of the conference.
The conference focus is extracted by collecting participant information through the cameras, analyzing the face orientation of each current participant, and fitting a focus by extending the face orientations of several participants; this focus serves as the conference focus. A sketch of this ray fitting follows.
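As an illustration, a minimal sketch of fitting the conference focus as the least-squares intersection of the participants' gaze rays, treating each face orientation as a 2-D line in the map frame (the names and the planar simplification are assumptions):

```python
import numpy as np

def fit_conference_focus(heads, gazes):
    """Least-squares point nearest to all face-orientation rays.

    heads : (N, 2) head positions of the participants in the map frame
    gazes : (N, 2) unit vectors of their face orientations
    Each ray is p_i + t * d_i; the focus minimizes the summed squared
    distance to all rays (closed form; assumes the gazes are not all parallel).
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in zip(heads, gazes):
        P = np.eye(2) - np.outer(d, d)   # projector onto the ray's normal space
        A += P
        b += P @ p
    return np.linalg.solve(A, b)         # the fitted conference focus
```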
And the second communication sub-module is a wireless communication device and is used for communicating with the first communication sub-module and transmitting data and commands.
Compared with the prior art, the invention has the following beneficial effects:
the invention automatically adjusts the position of the robot according to the position of the speaker among the participants; since the mobile robot carries a camera, it can automatically move to a suitable position to collect front-view information of the speaker when no fixed camera can acquire the speaker's complete front view on its own, thereby improving the efficiency of the video conference.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, a person skilled in the art can make the same changes or substitutions on the related technical features, and the technical solutions after the changes or substitutions will fall within the protection scope of the invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, substitution and improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A video conference implementation method based on a mobile robot, characterized by comprising the following steps:
S1, installing all components of the intelligent conference management system under the same coordinate system, calibrating the positions of the microphones and cameras of the intelligent conference management system, calibrating the positions and postures of the annular lens and the laser radar, and performing laser SLAM mapping with a laser point cloud segmentation algorithm to obtain a point cloud distribution map of the conference room;
the laser point cloud segmentation algorithm comprises the following steps:
A1: placing the robot, with the annular lens and the laser radar installed, in an open room;
A2: the laser radar starts working and acquires all point cloud data, which are clustered by their range characteristics into three classes: farthest, common, and nearest;
A3: owing to the hardware structure, a laser beam can take one of three paths, each yielding a different processing result:
A3_1: the laser beam strikes a transparent lens and passes straight through it; the laser point falls on a wall surface, and the beam's angle information is marked on the point cloud data with the farthest ranging distance;
A3_2: the laser beam strikes a reflector and is reflected down to the ground; the laser point falls on the ground, and the beam's angle information is marked on the point cloud data with the common ranging distance;
A3_3: the laser beam undergoes neither A3_1 nor A3_2 and directly measures the distance between the laser radar and the annular lens; this point cloud is the nearest class, its value is fixed, and it is discarded directly;
S2, identifying the main areas of the conference room in the point cloud map;
S3, determining the position of the speaker from the speaker's speech and generating the speaker's central axis;
the central axis is a directed line segment starting at the speaker's position and ending at the central area of the desk;
the method for resolving the speaker's position is as follows:
$$
\begin{cases}
\sqrt{(x-X_b)^2+(y-Y_b)^2+(z-Z_b)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ba}\\
\sqrt{(x-X_c)^2+(y-Y_c)^2+(z-Z_c)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{ca}\\
\sqrt{(x-X_d)^2+(y-Y_d)^2+(z-Z_d)^2}-\sqrt{(x-X_a)^2+(y-Y_a)^2+(z-Z_a)^2}=V\,T_{da}
\end{cases}
$$
where (Xa, Ya, Za), (Xb, Yb, Zb), (Xc, Yc, Zc) and (Xd, Yd, Zd) are the coordinates of microphones a, b, c and d; Tba, Tca and Tda are the time differences of the sound reaching microphones b, c and d relative to microphone a; V is the propagation speed of sound; and the x and y of the solution are the final resolving result, i.e. the position of the person;
S4, selecting different processing logic according to the position of the speaker, comprising:
4a: if a camera of the intelligent conference management system can directly acquire the front view of the speaker's position, i.e. the close-up shot, the image data of that camera is selected directly and the process ends;
4b: if the intelligent conference management system has no camera suitable for collecting the front view of the speaker's position, the camera on the mobile robot is selected; the intelligent conference management system establishes a connection with the mobile robot through the second communication sub-module and sends an instruction containing the speaker's position and the best shooting position;
S5, planning a path trajectory in the point cloud map according to the mobile robot's own position and the best shooting position, and moving to the target position;
S6, after the target position is reached, adjusting the pan-tilt head according to the mobile robot's position and posture and the speaker's position, and starting shooting;
and S7, transmitting the data to the intelligent conference management system, whereupon the process ends.
2. The video conference implementation method of claim 1, wherein in step S3 the speaker's position may be equivalently replaced by a conference focus: participant information is collected by the cameras and processed by image acquisition processing, the face orientation of each current participant is analyzed and judged, and a focus is fitted by extending the face orientations of several participants; this focus serves as the conference focus.
3. The video conference implementation method of claim 2, wherein when the conference focus is outside the conference room, the mobile robot shoots the PPT position; when the conference focus is among the participants, the mobile robot shoots that position.
4. A system implementing the video conference implementation method of any one of claims 1 to 3, characterized by comprising:
a calibration module: for installing all components of the intelligent conference management system under the same coordinate system and calibrating the positions of the microphones and cameras; calibrating the positions and postures of the annular lens and the laser radar; performing laser SLAM mapping with the laser point cloud segmentation algorithm to obtain a point cloud distribution map of the conference room; and identifying the main areas of the conference room in the point cloud map;
a path planning and moving module: for planning a path trajectory in the point cloud map according to the robot's own position and the best shooting position, and moving to the target position;
a robot module: for adjusting the pan-tilt head according to the robot's position and posture and the speaker's position after the target position is reached, and starting shooting;
a voiceprint positioning module: for determining the position of the speaker from the speaker's speech and generating the speaker's central axis;
a conference management module: for selecting different processing logic according to the position of the speaker, comprising:
if a camera of the intelligent conference management system can directly acquire the front view of the speaker's position, i.e. the close-up shot, the image data of that camera is selected directly and the process ends;
if the intelligent conference management system has no camera suitable for collecting the front view of the speaker's position, the camera on the mobile robot is selected; the intelligent conference management system establishes a connection with the mobile robot through the second communication sub-module and sends an instruction containing the speaker's position and the best shooting position;
a data transmission module: for transmitting data to the intelligent conference management system.
5. The video conference implementation system of claim 4, wherein the calibration module comprises a sensor and a positioning calculation unit;
wherein the sensor comprises a laser radar, which can also serve the obstacle avoidance and collision avoidance sub-module;
and the positioning calculation unit selects a different algorithm according to the selected sensor to calculate the final position.
6. The video conference implementation system of claim 4, wherein the voiceprint positioning module comprises a synchronizer, several microphones and an operation unit, and obtains the speaker's position from the arrival times of the speaker's voice;
the synchronizer generates a synchronization signal that unifies the time references of the microphones;
the microphones collect the sound signals and record the arrival times of the signals;
and the operation unit calculates the speaker's position from the arrival times of the sound signals collected by the microphones.
7. The video conference implementation system of claim 4, wherein the robot module further comprises an obstacle avoidance and collision avoidance sub-module, a video acquisition and tracking sub-module and a first communication sub-module;
the video acquisition and tracking sub-module comprises a camera and a pan-tilt head; the camera collects image information, and the pan-tilt head rotates the camera and tracks the conference focus;
the first communication sub-module is a wireless communication device used to communicate with the second communication sub-module and transmit data and commands;
the obstacle avoidance and collision avoidance sub-module comprises the laser radar, the annular lens and a gas collision detector;
the annular lens reflects laser lines of the laser radar to other angles, so as to measure regions outside the laser radar's scanning plane;
the gas collision detector comprises a gas bag and an air pressure sensor, wherein the gas bag is distributed over the surface of the robot and filled with gas; the air pressure sensor is located inside the gas bag and measures the air pressure.
8. The video conference implementation system of claim 4, wherein the intelligent conference management system comprises an image acquisition processing sub-module and the second communication sub-module;
the image acquisition processing sub-module comprises a conference focus judgment unit and a conference focus extraction unit, which together process the image information and extract the focus of the conference;
the focus of the conference is extracted by collecting participant information through the cameras, analyzing the face orientation of each current participant, and fitting a focus by extending the face orientations of several participants; this focus serves as the conference focus;
and the second communication sub-module is a wireless communication device used to communicate with the first communication sub-module and transmit data and commands.
9. The video conference implementation system of claim 4, wherein the path planning and moving module comprises a path planning algorithm, a motor control algorithm and the robot vehicle body.
CN202110157473.2A 2021-02-05 2021-02-05 Video conference implementation method and system based on mobile robot Active CN112511757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110157473.2A CN112511757B (en) 2021-02-05 2021-02-05 Video conference implementation method and system based on mobile robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110157473.2A CN112511757B (en) 2021-02-05 2021-02-05 Video conference implementation method and system based on mobile robot

Publications (2)

Publication Number Publication Date
CN112511757A CN112511757A (en) 2021-03-16
CN112511757B (en) 2021-05-04

Family

ID=74952681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110157473.2A Active CN112511757B (en) 2021-02-05 2021-02-05 Video conference implementation method and system based on mobile robot

Country Status (1)

Country Link
CN (1) CN112511757B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695708A (en) * 2022-10-27 2023-02-03 深圳奥尼电子股份有限公司 Audio and video control method with wireless microphone intelligent tracking function
CN116996801B (en) * 2023-09-25 2023-12-12 福州天地众和信息技术有限公司 Intelligent conference debugging speaking system with wired and wireless access AI

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7593030B2 (en) * 2002-07-25 2009-09-22 Intouch Technologies, Inc. Tele-robotic videoconferencing in a corporate environment
JP4989596B2 (en) * 2008-09-17 2012-08-01 日本放送協会 Shooting shot control device, cooperative shooting system and program thereof
CN102572373A (en) * 2011-12-30 2012-07-11 南京超然科技有限公司 Image acquisition automatic control system and method for video conference
CN105611167B (en) * 2015-12-30 2020-01-31 联想(北京)有限公司 focusing plane adjusting method and electronic equipment
CN109413359B (en) * 2017-08-16 2020-07-28 华为技术有限公司 Camera tracking method, device and equipment
CN109492521B (en) * 2018-09-13 2022-05-13 北京米文动力科技有限公司 Face positioning method and robot
CN109451207B (en) * 2018-10-22 2020-09-04 中研电子(杭州)有限公司 Camera robot with complex illumination change
KR20190095181A (en) * 2019-07-25 2019-08-14 엘지전자 주식회사 Video conference system using artificial intelligence

Also Published As

Publication number Publication date
CN112511757A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN112511757B (en) Video conference implementation method and system based on mobile robot
US7769203B2 (en) Target object detection apparatus and robot provided with the same
CN109737981B (en) Unmanned vehicle target searching device and method based on multiple sensors
US20090167867A1 (en) Camera control system capable of positioning and tracking object in space and method thereof
JP2016177640A (en) Video monitoring system
CN105718862A (en) Method, device and recording-broadcasting system for automatically tracking teacher via single camera
JP2008087140A (en) Speech recognition robot and control method of speech recognition robot
CN106162144A (en) A kind of visual pattern processing equipment, system and intelligent machine for overnight sight
CN108259827B (en) Method, device, AR equipment and system for realizing security
Aliakbarpour et al. An efficient algorithm for extrinsic calibration between a 3d laser range finder and a stereo camera for surveillance
CN106370160A (en) Robot indoor positioning system and method
CN109857112A (en) Obstacle Avoidance and device
CN109773783A (en) A kind of patrol intelligent robot and its police system based on spatial point cloud identification
CN112863113A (en) Intelligent fire-fighting system and method for automatic detector alarming and fire extinguishing and storage medium
CN106352871A (en) Indoor visual positioning system and method based on artificial ceiling beacon
KR101750390B1 (en) Apparatus for tracing and monitoring target object in real time, method thereof
JP2018173707A (en) Person estimation system and estimation program
KR20200067286A (en) 3D scan and VR inspection system of exposed pipe using drone
KR20170058612A (en) Indoor positioning method based on images and system thereof
JP2009129058A (en) Position specifying apparatus, operation instruction apparatus, and self-propelled robot
CN111596259A (en) Infrared positioning system, positioning method and application thereof
US11832016B2 (en) 3D tour photographing apparatus and method
CN113014658B (en) Device control, device, electronic device, and storage medium
CN112601021B (en) Method and system for processing monitoring video of network camera
WO2018150515A1 (en) Image database creation device, location and inclination estimation device, and image database creation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant