CN115914530A - Video conference control method, management equipment and storage medium - Google Patents

Video conference control method, management equipment and storage medium

Info

Publication number
CN115914530A
Authority
CN
China
Prior art keywords
projection
target object
remote
optical element
conference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110892027.6A
Other languages
Chinese (zh)
Inventor
杨俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jimi Technology Co Ltd
Original Assignee
Chengdu Jimi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jimi Technology Co Ltd filed Critical Chengdu Jimi Technology Co Ltd
Priority to CN202110892027.6A
Publication of CN115914530A
Legal status: Pending

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the present application provides a video conference control method, a management device, and a storage medium. The video conference control method includes: projecting video data of a remote object, sent by a remote device, in the air through a projection device in a conference system and an optical element corresponding to the projection device; and, when a projection trigger event is detected, determining a target object from a plurality of presence objects in the conference scene where the conference system is located based on the projection trigger event, and adjusting a projection angle of at least one of the projection device and the optical element so that the projection picture corresponding to the aerial projection faces the target object.

Description

Video conference control method, management equipment and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a video conference control method, a management device, and a storage medium.
Background
With the rapid development of communication networks and video technology, teleconferencing is used ever more frequently. In the related art, a remote video conference either displays images of the remote participants on a display in the conference scene or projects those images onto a screen through a projection device.
However, both approaches share at least the following problems: the conference picture in the conference scene is not realistic enough, and on-site participants cannot experience the lifelike effect of a face-to-face conversation with the remote participants.
Disclosure of Invention
The embodiments of the present application aim to provide a video conference control method, a management device, and a storage medium, so as to solve at least the problems in the related art that the conference picture in the conference scene is not realistic enough and that on-site participants cannot experience the lifelike effect of a face-to-face conversation with remote participants.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a video conference control method, where the method includes:
projecting, in the air, video data of a remote object sent by a remote device, through a projection device in a conference system and an optical element corresponding to the projection device;
when a projection trigger event is detected, determining a target object from a plurality of presence objects in the conference scene where the conference system is located based on the projection trigger event, and adjusting a projection angle of at least one of the projection device and the optical element so that the projection picture corresponding to the aerial projection faces the target object.
In a second aspect, an embodiment of the present application provides a management device, where the management device includes: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is used for executing the video conference control program stored in the memory so as to realize the video conference control method.
In a third aspect, an embodiment of the present application provides a storage medium, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the video conference control method described above.
The embodiments of the present application provide a video conference control method, a management device, and a storage medium. Video data of a remote object sent by a remote device is projected in the air through a projection device in a conference system and an optical element corresponding to the projection device; when a projection trigger event is detected, a target object is determined from a plurality of presence objects in the conference scene where the conference system is located based on the projection trigger event, and a projection angle of at least one of the projection device and the optical element is adjusted so that the projection picture corresponding to the aerial projection faces the target object. In other words, the video data of the remote object is projected in the air by the combination of the projection device and the optical element, and the projection angle of at least one of them is flexibly adjusted according to the projection trigger event to change the orientation of the projection picture. In this way, without adding hardware, a real conference scene is simulated, participants in the conference scene can feel the lifelike effect of a face-to-face conversation with the remote participants, and the intelligence of the conference system is improved.
Drawings
Fig. 1 is a schematic architecture diagram of a conference system implementing a video conference control method provided in the present application;
fig. 2 is an optional flowchart schematic diagram of a video conference control method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the working principle of an equivalent negative refractive index flat lens provided in an embodiment of the present application;
fig. 4 is a schematic view of a projection picture of video data of a remote object in a conference scene according to an embodiment of the present application;
fig. 5 is a schematic view illustrating an orientation adjustment of a projection screen according to an embodiment of the present disclosure;
fig. 6 is an alternative flowchart of a video conference control method according to an embodiment of the present application;
fig. 7 is an alternative flowchart of a video conference control method according to an embodiment of the present application;
fig. 8 is an alternative flowchart of a video conference control method according to an embodiment of the present application;
fig. 9 is an alternative flowchart of a video conference control method according to an embodiment of the present application;
fig. 10 is a schematic flow chart of an alternative video conference control method according to an embodiment of the present application;
fig. 11 is an alternative structural schematic diagram of a management device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
The video conference control method provided by the embodiments of the present application is applied to a management device, which may be a terminal device, a server, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), a camera, a wearable device, an Access Point (AP) device, a smart television, a laptop computer, or a desktop computer. Here, the present application is described taking a server as the management device.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a conference system implementing the video conference control method provided in the present application. The conference system at least includes a remote device 100, a projection device 200, a network 300, and a server 400; the remote device 100 and the projection device 200 are each connected to the server 400 through the network 300. The conference system may further include a sound pickup device 500 and an image acquisition module 600; in that case, the remote device 100, the projection device 200, the sound pickup device 500, and the image acquisition module 600 are each connected to the server 400 through the network 300. Here, the network 300 may be a wide area network, a local area network, or a combination of the two, using wireless links for data transmission. Illustratively, the remote device 100 includes, but is not limited to, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), a camera, a wearable device, a wireless Access Point (AP) device, a smart TV, a laptop computer, a desktop computer, and the like. The projection device 200 is a device that can project an image or video onto a screen, including but not limited to a projector, a pico projector, a smart camera, a smart projector, and the like. The server 400 may be a single server, or a server cluster or cloud computing center composed of multiple servers. The sound pickup device 500 collects on-site environmental sound and transmits it to back-end equipment capable of processing sound, and the image acquisition module may be a camera or a camera array.
Referring to fig. 2, fig. 2 is a schematic flow chart of an implementation of a video conference control method provided in an embodiment of the present application, where the video conference control method may be applied to a server 400 in the conference system shown in fig. 1, and the video conference control method includes the following steps:
step 201, performing aerial projection on video data of a remote object sent by a remote device through a projection device in the conference system and an optical element corresponding to the projection device.
In the embodiment of the application, the video data of the far-end object comprises a plurality of frames of ordered images and far-end audio data.
In the embodiment of the present application, the conference system includes a remote device used by at least one remote object, a plurality of projection devices, a network, a server, a plurality of sound pickup devices, and a plurality of image acquisition modules. The plurality of projection devices include a first part and a second part: the first part of the projection devices projects the projection pictures corresponding to the video data of remote objects, and the second part projects conference pictures, which illustratively include conference presentation (PPT) pictures and conference environment pictures.
The sound pickup device may be a microphone or a microphone array and is used to collect the audio data of the presence objects. It should be noted that a sound pickup device is disposed at each seat where a presence object is located, and each sound pickup device corresponds one-to-one to a seat position, that is, each sound pickup device corresponds one-to-one to a presence object.
The image acquisition module may be a camera or a camera array. It may be built into the remote device or the projection device, or connected to them through a network, Bluetooth, or the like. The image acquisition module collects video data of the presence objects in the conference scene and uploads it to the server. The server then sends this presence video data to the remote device, which receives it and displays it on its presentation interface for the remote object to watch. In one practicable application scenario, from the perspective of the remote object, during the video conference the remote object watches the conference picture of the presence objects in the conference scene presented on the presentation interface of the remote device. During the conference, the server acquires, through the image acquisition module of the remote device, a rotation of the eyes or of the head of the remote object; for example, if the head of the remote object turns to the left and the eyes also turn to the left, the remote object may be interested in the left area of the currently displayed conference picture, so the left area is determined to be the region of interest. The server then steers the acquisition direction of the on-site image acquisition module toward the region of interest, so that the image acquisition module of the conference system collects video data of the region of interest corresponding to the remote object and transmits it back to the server. The remote device receives the region-of-interest video data sent by the server and displays it on the presentation interface, so that the video picture watched by the remote object is a conference picture that follows the movement of the remote object's head and eyes.
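For illustration only, the following Python sketch maps an estimated head-turn angle of the remote object to a pan command for the on-site image acquisition module. It is a minimal sketch under stated assumptions: the head-pose estimator, the dead-zone value, the step size, and the sign conventions are all hypothetical and are not specified by this application.

```python
def select_region_of_interest(head_yaw_deg: float, dead_zone_deg: float = 10.0) -> str:
    """Map the remote object's head yaw to a region of interest.

    Assumption: positive yaw means the head turned toward the left of the
    currently displayed conference picture; a small dead zone suppresses
    jitter from the pose estimator.
    """
    if head_yaw_deg > dead_zone_deg:
        return "left"
    if head_yaw_deg < -dead_zone_deg:
        return "right"
    return "center"


def pan_offset(region: str, step_deg: float = 15.0) -> float:
    """Translate the region of interest into a pan offset (degrees) for the
    on-site image acquisition module; the step size is illustrative."""
    return {"left": -step_deg, "center": 0.0, "right": step_deg}[region]


if __name__ == "__main__":
    yaw = 22.5  # e.g. the remote object's head turned 22.5 degrees to the left
    region = select_region_of_interest(yaw)
    print(region, pan_offset(region))  # -> left -15.0
```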
It should be noted that the conference scene where the conference system is located includes a plurality of seats, and each empty seat may be provided with a projection device and an optical element corresponding to that projection device.
In the embodiment of the present application, the optical element may be an equivalent negative-refractive-index flat lens (DCT-plate). Such a flat lens is built from a precisely arranged waveguide structure: using a multi-row, multi-column rectangular optical waveguide array with peripheral triangular optical waveguides, it enables a two-dimensional or three-dimensional light source to form an image directly in the air. It achieves naked-eye three-dimensional display while providing a large field of view, a large aperture, high resolution, and freedom from distortion and dispersion. In addition, the microlens array included in the equivalent negative-refractive-index flat lens can negatively refract the light emitted by the two-dimensional or three-dimensional source scattered by the projection device. For example, referring to fig. 3, fig. 3 is a schematic diagram of the working principle of an equivalent negative-refractive-index flat lens: light generated by the projection device is refracted by the DCT-plate, deflected in the opposite direction, and re-converged at the plane-symmetric position to form an equal-size real image, thereby achieving medium-free aerial imaging.
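The plane-symmetric imaging described above can be expressed as a reflection of each source point through the plane of the flat lens. The sketch below is a geometric illustration only (it ignores the waveguide internals and is an assumption layered on the figure's description); it computes where the equal-size real image of a source point forms in the air.

```python
import numpy as np


def aerial_image_point(source: np.ndarray,
                       plate_point: np.ndarray,
                       plate_normal: np.ndarray) -> np.ndarray:
    """Mirror a source point through the flat-lens plane: the equivalent
    negative-refractive-index plate re-converges the deflected rays at the
    plane-symmetric position, forming an equal-size real image."""
    n = plate_normal / np.linalg.norm(plate_normal)
    d = float(np.dot(source - plate_point, n))  # signed distance to the plane
    return source - 2.0 * d * n                 # reflection through the plane


if __name__ == "__main__":
    src = np.array([0.0, 0.0, -0.3])  # light source 0.3 m behind the plate
    img = aerial_image_point(src, np.zeros(3), np.array([0.0, 0.0, 1.0]))
    print(img)  # [0. 0. 0.3] -> real image 0.3 m in front of the plate
```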
In the embodiment of the present application, first, the remote device collects video data of the remote object through an image acquisition module such as a camera and uploads it to the server; the video data includes multiple frames of ordered images and remote audio data. Second, the server transmits the video data to the projection device in the conference system that is associated with the remote device. Then, the projection device and its corresponding optical element project the video data in the air to obtain the projection picture corresponding to the aerial projection.
In other embodiments of the present application, step 201 (projecting the video data of the remote object sent by the remote device in the air through the projection device in the conference system and the optical element corresponding to the projection device) may be implemented in either of the following ways.
the first method is as follows: and after receiving the video data of the remote object sent by the remote equipment, the server performs semantic segmentation on the video data and extracts portrait outline data in the video data. Further, the server issues the portrait contour data to a projection device in a conference system associated with the remote device, and performs aerial projection on the portrait contour data through the projection device and an optical element corresponding to the projection device to obtain a projection picture corresponding to the aerial projection of the remote object. Therefore, the real conference scene is simulated without adding hardware equipment, and the participants in the conference scene can feel the vivid effect of face-to-face conversation with the remote participants.
Second mode: after receiving the video data of the remote object sent by the remote device, the server segments the portrait area and the background area of each frame in the multiple frames of ordered images included in the video data through an image segmentation algorithm, and normalizes the background area. The server then issues the portrait area and the processed background area to the projection device in the conference system associated with the remote device and projects the processed video data in the air through the projection device and its corresponding optical element, obtaining the projection picture corresponding to the aerial projection of the remote object. In this way, a real conference scene is likewise simulated without adding hardware.
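As a rough illustration of the second mode, the sketch below separates a portrait from its background and replaces the background with a uniform color. It uses OpenCV's GrabCut as a stand-in for the (unspecified) image segmentation algorithm; the bounding rectangle, background color, and iteration count are assumptions.

```python
import cv2
import numpy as np


def extract_portrait(frame: np.ndarray, rect: tuple) -> np.ndarray:
    """Split a frame into portrait and background (GrabCut stands in for the
    segmentation algorithm) and normalize the background to uniform gray."""
    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    portrait = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    out = np.full_like(frame, 128)   # standardized gray background
    out[portrait] = frame[portrait]  # keep only the portrait pixels
    return out


if __name__ == "__main__":
    frame = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
    result = extract_portrait(frame, rect=(40, 20, 80, 90))  # assumed person box
    print(result.shape)  # (120, 160, 3)
```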
It should be noted that the empty virtual seats in the conference scene correspond one-to-one to the remote objects, and the projection device of each empty virtual seat projects the video data of its remote object in the air through the corresponding optical element, so that the conference picture of the remote object is projected in the conference scene in real time throughout the conference. Referring to fig. 4, fig. 4 is a schematic diagram of a projection picture of the video data of a remote object in a conference scene.
Step 202, when a projection trigger event is detected, a target object is determined from a plurality of presence objects in a conference scene where the conference system is located based on the projection trigger event, and a projection angle of at least one of the projection device and the optical element is adjusted so as to enable a projection picture corresponding to aerial projection to face the target object.
In the embodiment of the present application, the projection trigger event includes the presence of a speaking presence object in the conference scene where the conference system is located; it may further include a detected operation of an orientation button by a presence object; and it may further include a detected operation of the remote object selecting a presence object.
In an embodiment of the present application, the projection trigger event is used to adjust a projection angle of at least one of the projection device and the optical element to change an orientation of a projection picture corresponding to the aerial projection.
In the embodiment of the present application, the projection picture corresponding to the aerial projection is a real image presented by the projection device and the optical element without the original video data needing to be carried by any medium.
In this embodiment of the present application, the operation in step 202 of adjusting the projection angle of at least one of the projection device and the optical element so that the projection picture corresponding to the aerial projection faces the target object may be implemented by the following steps.
and Step1, acquiring a first straight line associated with a plane area where a projection picture corresponding to the far-end object is located.
The first straight line is a straight line which passes through the center of the plane area and is perpendicular to the plane area. Here, the first straight line is a center normal line of the plane area, and the normal line passes through a center position of the plane area.
Step 2, acquiring the object position of the target object.
Step 3, acquiring a second straight line connecting the center position and the object position.
Step 4, obtaining the angle value of the included angle between the first straight line and the second straight line.
In this embodiment, the angle value may be that of the minimum included angle between the first straight line and the second straight line.
Step 5, taking the center point of the device object to be adjusted as the rotation center and its current projection angle as the initial angle, controlling the device object to rotate by the angle value along the rotation direction to obtain the adjusted device object.
The device object to be adjusted may be the projection device, the optical element, or the matching device formed by the projection device and the optical element together.
In the embodiment of the present application, the server takes the center point of the device object to be adjusted as the rotation center and its current projection angle as the initial angle, and controls the device object to rotate by the angle value along the rotation direction, obtaining the adjusted device object.
Step 6, projecting the video data in the air based on the adjusted device object, so that the projection picture corresponding to the aerial projection faces the target object.
In the embodiment of the present application, the server projects the video data in the air based on the adjusted device object, and the resulting projection picture corresponding to the aerial projection is perpendicular to the second straight line.
In one practicable application scenario, referring to fig. 5: first, the server responds to a projection trigger event and acquires the first straight line L1 of the plane area where the projection picture corresponding to the aerial projection of the remote object is located; second, the server acquires the object position of the target object; third, it acquires the second straight line L2 connecting the center position and the object position; then, the server obtains the angle value θ of the included angle between L1 and L2 in the horizontal plane; finally, the server determines that the device object to be adjusted is the projection device, takes the center point of the projection device as the rotation center and the plane of the current projection picture as the rotation starting plane, rotates by the angle θ in the clockwise direction, and projects the video data in the air with the adjusted projection device and the optical element, so that the resulting projection picture faces the target object. In fig. 5, the first straight line L1 is perpendicular to the plane L3 of the projection picture obtained before adjustment, and the second straight line L2 is perpendicular to the plane L4 of the projection picture obtained after adjustment. Since θ + θ1 = 90° and θ1 + θ2 = 90°, θ = θ2; that is, the rotation angle θ2 of the projection device equals the angle θ between L1 and L2 in the horizontal plane. In this way, once the object position of the target object is determined, the projection angle of the projection device is flexibly adjusted based on the included angle between the first straight line of the plane area of the remote object's projection picture and the second straight line connecting the center of that plane area with the object position, so that the projection picture of the remote object faces the target object.
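Reduced to the horizontal plane, Steps 1 through 5 amount to computing the signed angle between the projection plane's normal (the first straight line) and the line from the plane center to the target object (the second straight line). The following sketch shows that computation; the coordinate frame and the sign convention for the rotation direction are assumptions.

```python
import math


def rotation_to_face(center, normal, target):
    """Signed horizontal rotation (degrees) that brings the projection
    plane's normal (first straight line L1) onto the line from the plane
    center to the target object (second straight line L2).

    center, target: (x, y) floor-plane positions; normal: (x, y) direction.
    Assumption: positive means counter-clockwise.
    """
    to_target = (target[0] - center[0], target[1] - center[1])  # along L2
    a1 = math.atan2(normal[1], normal[0])
    a2 = math.atan2(to_target[1], to_target[0])
    theta = math.degrees(a2 - a1)
    return (theta + 180.0) % 360.0 - 180.0  # wrap into (-180, 180]


if __name__ == "__main__":
    # Projection plane at the origin facing +y; target object seated at (1, 1).
    print(rotation_to_face((0.0, 0.0), (0.0, 1.0), (1.0, 1.0)))  # -45.0 (clockwise)
```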
In the embodiment of the present application, while the server projects the video data of the remote object sent by the remote device in the air through the projection device in the conference system and its corresponding optical element, it detects projection trigger events in real time; when a projection trigger event is detected, it determines a target object from the plurality of presence objects in the conference scene based on that event and adjusts the projection angle of at least one of the projection device and the optical element so that the projection picture corresponding to the aerial projection faces the target object.
The embodiment of the present application provides a video conference control method: video data of a remote object sent by a remote device is projected in the air through a projection device in a conference system and an optical element corresponding to the projection device; when a projection trigger event is detected, a target object is determined from a plurality of presence objects in the conference scene where the conference system is located based on the projection trigger event, and the projection angle of at least one of the projection device and the optical element is adjusted so that the projection picture corresponding to the aerial projection faces the target object. In other words, the video data of the remote object is projected in the air by the combination of the projection device and the optical element, and the projection angle of at least one of them is flexibly adjusted according to the projection trigger event to change the orientation of the projection picture. In this way, without adding hardware, a real conference scene is simulated, participants in the conference scene can feel the lifelike effect of a face-to-face conversation with the remote participants, and the intelligence of the conference system is improved.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating an implementation flow of a video conference control method provided in the embodiment of the present application, where the video conference control method is applied to a server 400 in the conference system shown in fig. 1; here, taking an example that the conference system includes the remote device 100, the projection device 200, the network 300, the server 400, the sound pickup device 500, and the image capturing module 600, the video conference control method includes the following steps:
Step 301, projecting video data of a remote object sent by the remote device in the air through the projection device in the conference system and the optical element corresponding to the projection device.
Step 302, determining the presence object speaking in the plurality of presence objects as the target object.
Here, the projection trigger event refers to the presence of a speaking presence object in the conference scene where the conference system is located.
In the embodiment of the present application, a speaking presence object is one whose sound is collected by its corresponding sound pickup device in the conference system. It should be noted that, within the same time period in the conference scene, one presence object may speak, or multiple presence objects may speak simultaneously.
When the server detects that a speaking presence object exists in the conference scene where the conference system is located, it determines that a projection trigger event has been detected, and then determines the speaking presence object among the plurality of presence objects as the target object based on that event.
In this embodiment of the present application, step 302 (determining the speaking presence object among the plurality of presence objects as the target object) may be implemented by the following step: if at least one speaking presence object exists in the conference scene, determine the speaking presence object whose speaking duration is greater than a duration threshold as the target object.
In the embodiment of the present application, the duration threshold may be preset by a user based on experimental data, or set by the conference system in real time according to actual requirements; this is not specifically limited here. In practical applications, the duration threshold may be set to 5 seconds, for example.
In the embodiment of the present application, during the video conference, the server monitors, through each sound pickup device in the conference scene, whether the presence object corresponding to that device is speaking; if a sound pickup device detects that its corresponding presence object has been speaking for longer than the duration threshold, that speaking presence object is determined as the target object.
Step 303, detecting the sound source position corresponding to the target object.
In the embodiment of the present application, the sound source position corresponding to the target object may be determined from the position of the sound pickup device corresponding to the target object, where the position of each sound pickup device is stored in the server in advance.
Accordingly, the server acquires the position of the sound pickup device that collected the voice data of the target object, and determines the sound source position of the target object from that position.
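Steps 302 and 303 can be illustrated together: each sound pickup device is bound one-to-one to a seat whose position the server already stores, so selecting the speakers above the duration threshold immediately yields their sound source positions. The data layout, identifiers, and the 5-second threshold below are assumptions.

```python
from dataclasses import dataclass

DURATION_THRESHOLD_S = 5.0  # example duration threshold


@dataclass
class PickupReading:
    seat_id: str            # sound pickup device <-> seat, one to one
    speaking_time_s: float  # accumulated speaking duration
    position: tuple         # seat position stored in advance on the server


def select_target_objects(readings):
    """Presence objects whose speaking duration exceeds the threshold become
    target objects; each sound source position is the position of the
    associated sound pickup device."""
    return [(r.seat_id, r.position) for r in readings
            if r.speaking_time_s > DURATION_THRESHOLD_S]


if __name__ == "__main__":
    readings = [PickupReading("seat-1", 7.2, (1.0, 2.0)),
                PickupReading("seat-2", 1.3, (2.0, 2.0))]
    print(select_target_objects(readings))  # [('seat-1', (1.0, 2.0))]
```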
Step 304, adjusting the projection angle of at least one of the projection device and the optical element based on the sound source position corresponding to the target object, so that the projection picture corresponding to the aerial projection faces the target object.
In the embodiment of the present application, referring to fig. 7, step 304 may be implemented by the following steps.
step 3041, if there are at least two target objects in the meeting scene, collecting the audio data of the presence of each target object when asking a question through the sound pickup device corresponding to each target object in the meeting system.
In the embodiment of the application, if at least two target objects exist in a conference scene, the server acquires the presence audio data of the target object corresponding to the sound pickup equipment through each sound pickup equipment in the conference system, performs semantic recognition on the presence audio data, and acquires the presence audio data when asking questions in all the presence audio data corresponding to each target object if the semantic recognition result indicates that each target object in the at least two target objects is asking questions.
Step 3042, performing semantic recognition on the presence audio data corresponding to each target object, and determining the remote object that each target object is asking.
In the embodiment of the present application, each target object may ask one or more remote objects; this is not specifically limited here.
If at least two target objects exist in the conference scene, then after collecting each target object's question audio through the corresponding sound pickup device, the server performs semantic recognition on the presence audio data of each target object based on a semantic recognition algorithm to determine the remote object that each target object is asking.
Step 3043, adjusting the projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the remote object it is asking, so that the projection picture corresponding to the aerial projection of each questioned remote object faces the target object asking it.
Illustratively, assume that two target objects exist in the conference scene, a target object A and a target object B; the remote object asked by target object A is remote object 1, and the remote object asked by target object B is remote object 2. The video data of each remote object is projected in the air by the projection device and optical element configured at the empty virtual seat corresponding to that remote object, yielding a projection picture for each remote object's video data. The server then adjusts the projection angle of at least one of the projection device and the optical element corresponding to remote object 1, based on the acquired sound source position of target object A, so that the projection picture of remote object 1 faces target object A; likewise, it adjusts the projection angle of at least one of the projection device and the optical element corresponding to remote object 2, based on the acquired sound source position of target object B, so that the projection picture of remote object 2 faces target object B.
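A schematic version of this routing follows, with the semantic-recognition service and the angle-adjustment routine reduced to placeholder callables; all identifiers are illustrative.

```python
def route_questions(questions, recognize_remote_object, adjust_projection):
    """For each questioning target object, determine the questioned remote
    object from its presence audio and turn that remote object's projection
    picture toward the asker's sound source position."""
    for target_id, audio, sound_source_pos in questions:
        remote_id = recognize_remote_object(audio)      # e.g. "remote-1"
        adjust_projection(remote_id, sound_source_pos)  # face the asker


if __name__ == "__main__":
    questions = [("A", "audio-A", (1.0, 2.0)),
                 ("B", "audio-B", (3.0, 2.0))]
    route_questions(
        questions,
        lambda audio: {"audio-A": "remote-1", "audio-B": "remote-2"}[audio],
        lambda rid, pos: print(f"turn {rid} toward {pos}"),
    )
```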
As can be seen from the above description, in the embodiment of the present application, the server determines that at least two target objects exist in the conference scene, collects through each sound pickup device the presence audio data of the question asked by the corresponding target object, performs semantic recognition on each piece of presence audio data to determine the remote object each target object is asking, and adjusts the projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the corresponding questioned object, so that the projection picture of each questioned remote object faces the target object asking it. In this way, the projection picture of the questioned remote object is turned toward the asking presence object based on both the asker's sound source position and the identity of the questioned remote object; a real conference scene is simulated, participants in the conference scene can feel the lifelike effect of a face-to-face conversation with the remote participants, and the intelligence of the conference system is improved.
In the embodiment of the present application, referring to fig. 8, step 3043 (adjusting the projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the remote object it is asking, so that the projection picture of each questioned remote object faces that target object) may be implemented by the following steps.
step A1, if at least two target objects ask the same remote object, acquiring remote audio data of the same remote object.
The video data comprises a plurality of frames of ordered images and far-end audio data.
Step A2, performing semantic recognition on the remote audio data, determining that the same remote object has finished replying to the question of one of the at least two target objects, and locating the next target object to be replied to based on the question order of the at least two target objects.
In the embodiment of the present application, if at least two target objects ask the same remote object, the server acquires the remote audio data in that remote object's video data and performs semantic recognition on it through a semantic recognition algorithm. If the semantic recognition result indicates that the remote object has finished replying to the question of one of the at least two target objects, the next target object to be replied to is located based on the question order of the at least two target objects, so that the projection picture of the remote object turns to face that next target object.
In other embodiments of the present application, step A2 may also locate the next target object as follows: if the server determines that the same remote object has finished replying to the question of one of the at least two target objects, it locates the next target object to be replied to based on the position order of the at least two target objects.
In one practicable application scenario, when multiple target objects ask the same remote object, such as a professor, the server records the timestamp of each target object's question and determines the target object with the smallest timestamp as the first object to be replied to. The server then acquires the remote voice data of that remote object in real time and performs semantic recognition on it through a semantic recognition algorithm. If the semantic recognition result indicates that the remote object has finished replying to the first target object's question, the target object with the next-smallest timestamp, that is, the second questioner, is located as the next target object to be replied to, based on the question order of the at least two target objects. This repeats until the same remote object has answered the questions of all target objects.
In another practicable application scenario, when multiple target objects ask the same remote object, such as a professor, the server records the questioning target objects and determines the target object closest to the projection plane of that remote object as the first target object to be replied to. The server then acquires the remote voice data of that remote object in real time and performs semantic recognition on it through a semantic recognition algorithm. If the semantic recognition result indicates that the remote object has finished replying to the first target object's question, the target object closest to the first questioner is located as the next target object to be replied to, based on the position order of the at least two target objects, such as a clockwise or counterclockwise seating order. And so on, until the same remote object has answered the questions of all target objects.
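Both orderings can be captured by a single priority queue keyed either on the question timestamp (first scenario) or on the distance to the remote object's projection plane (second scenario). This is a sketch only; the class and field names are assumptions.

```python
import heapq


class QuestionQueue:
    """Order the target objects waiting on the same remote object, by question
    timestamp or by distance to the remote object's projection plane."""

    def __init__(self, key: str = "timestamp"):
        self._key = key  # "timestamp" or "distance"
        self._heap = []

    def ask(self, target_id: str, timestamp: float, distance: float) -> None:
        prio = timestamp if self._key == "timestamp" else distance
        heapq.heappush(self._heap, (prio, target_id))

    def next_target(self):
        """Called when semantic recognition reports the previous reply is
        finished; returns the next target object to face, or None."""
        return heapq.heappop(self._heap)[1] if self._heap else None


if __name__ == "__main__":
    q = QuestionQueue("timestamp")
    q.ask("B", timestamp=12.0, distance=3.0)
    q.ask("A", timestamp=5.0, distance=1.0)
    print(q.next_target(), q.next_target())  # A B
```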
Step A3, adjusting the projection angle of at least one of the projection device and the optical element based on the sound source position of the target object whose question has just been answered and that of the next target object, so that the projection picture corresponding to the aerial projection of the same remote object faces the next target object.
In the embodiment of the present application, the sound source position corresponding to a target object may be determined from the position of the sound pickup device corresponding to that target object, where the position of each sound pickup device is stored in the server in advance.
As can be seen from the above description, in the embodiment of the present application, the server performs semantic recognition on the remote audio data, determines that the remote object has finished replying to the question of one of the at least two target objects, locates the next target object to be replied to based on the question order, and then adjusts the projection angle of at least one of the projection device and the optical element based on the sound source positions of the previous and next target objects, so that the projection picture corresponding to the aerial projection of the same remote object turns from facing the previous target object to facing the next one. In this way, when multiple target objects ask the same remote object, once the reply to the previous target object is determined to be finished, the projection picture of the questioned remote object faces each questioner in turn according to the question order; a real conference scene is simulated, participants in the conference scene can feel the lifelike effect of a face-to-face conversation with the remote participants, and the intelligence of the conference system is improved.
As can be seen from the above description, in the embodiment of the present application, the video data of the remote object is projected in the air by the combination of the projection device and the optical element, and the projection angle of at least one of them is flexibly adjusted according to the projection trigger event to change the orientation of the projection picture corresponding to the aerial projection of the remote object. This helps simulate a real conference scene, so that participants present in the conference scene can feel the lifelike effect of a face-to-face conversation with the remote participants, and improves the intelligence of the conference system.
It should be noted that, for the descriptions of the same steps and the same contents in this embodiment as those in other embodiments, reference may be made to the descriptions in other embodiments, which are not described herein again.
Referring to fig. 9, fig. 9 is a schematic implementation flowchart of a video conference control method provided in this embodiment; the method may be applied to the server 400 in the conference system shown in fig. 1. Here, taking as an example a conference system that includes the remote device 100, the projection device 200, the network 300, the server 400, the sound pickup device 500, the image acquisition module 600, and orientation buttons, the video conference control method includes the following steps:
step 401, performing aerial projection on video data of a remote object sent by a remote device through a projection device in the conference system and an optical element corresponding to the projection device.
Step 402, determining a presence object operating the orientation button as a target object from a plurality of presence objects in a conference scene where the conference system is located.
Here, orientation buttons are installed at least at the position of the target object among the plurality of presence objects, and the projection trigger event refers to a detected operation of a presence object on an orientation button.
In this embodiment of the present application, after the server projects the video data of the remote object sent by the remote device in the air through the projection device and its corresponding optical element, detecting an operation on an orientation button means a projection trigger event has been detected; based on that event, the presence object operating the orientation button is determined, from the plurality of presence objects in the conference scene, as the target object. The operation performed on the orientation button includes, but is not limited to, pressing, touching, voice control, and the like; this is not specifically limited here.
In the embodiment of the present application, a plurality of orientation buttons are installed at least at the position of one presence object in the conference scene, and each orientation button corresponds to one remote object. Illustratively, a plurality of orientation buttons may be installed at the position of every presence object, each button corresponding to one remote object; alternatively, only one presence object, such as the one at a master control position, is provided with a plurality of orientation buttons, each corresponding to one remote object. It should be noted that an orientation button at the master control position may correspond to a fixed direction or to a fixed seat. If it corresponds to a fixed direction, that direction's range may cover one or several presence objects, and the server determines at least one presence object within the range as the target object of the remote object's projection picture; if it corresponds to a fixed seat, the presence object on that seat is determined as the target object of the remote object's projection picture.
Step 403, acquiring the button identifier of the orientation button operated by the target object and the position of that orientation button.
Here, the orientation button is used to control the orientation of the projection picture of the remote object corresponding to the button identifier.
In the embodiment of the present application, the button identifier of an orientation button is used to determine the remote object corresponding to that identifier; the button identifier and position of each orientation button are stored in the server in advance.
The server determines the presence object operating the orientation button as the target object from the plurality of presence objects in the conference scene, and acquires the button identifier and position of the operated orientation button, so that it can steer the projection picture of the remote object corresponding to that identifier toward the target object at the button's position.
Step 404, based on the button identifier and the position of the orientation button, adjusting the projection angle of at least one of the projection device and the optical element so that the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier faces the target object at the position of the orientation button.
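Steps 403 and 404 reduce to two table lookups followed by the angle adjustment. A minimal sketch, with assumed server-side tables and illustrative identifiers:

```python
# Assumed server-side tables, stored in advance; identifiers are illustrative.
BUTTON_TO_REMOTE = {"btn-17": "remote-1", "btn-18": "remote-2"}
BUTTON_POSITION = {"btn-17": (2.0, 1.0), "btn-18": (2.0, 3.0)}


def on_orientation_button(button_id, adjust_projection):
    """Look up which remote object the operated button controls and where the
    operating presence object sits, then turn that remote object's projection
    picture toward the button position."""
    remote_id = BUTTON_TO_REMOTE[button_id]  # button identifier -> remote object
    target_pos = BUTTON_POSITION[button_id]  # button position, stored in advance
    adjust_projection(remote_id, target_pos)


if __name__ == "__main__":
    on_orientation_button("btn-17",
                          lambda rid, pos: print(f"turn {rid} toward {pos}"))
```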
As can be seen from the above, in the embodiment of the present application, the server determines the presence object operating the orientation button as the target object, then adjusts the projection angle of at least one of the projection device and the optical element based on the button identifier and the button position, so that the projection picture of the remote object corresponding to that identifier faces the target object at the button's position. In this way, a presence object can selectively request, through an orientation button, that the projection picture of a remote object face its own position; a real conference scene is simulated, participants in the conference scene can feel the lifelike effect of a face-to-face conversation with the remote participants, and the intelligence of the conference system is improved.
In other embodiments of the present application, when only the projection device is adjusted, the server may further record the projection angle before and after the adjustment, deriving a rotation angle and a rotation direction from the pair; when the same before/after pair occurs again, the server directly reads the stored rotation angle and direction and adjusts the projection device's current projection angle accordingly, achieving a faster and more accurate rotation.
Likewise, when only the optical element is adjusted, the server may record the optical element's projection angle before and after the adjustment, derive the rotation angle and direction, and replay them when the same before/after pair recurs.
Similarly, when the matching device formed by the projection device and the optical element is adjusted, the server may record the matching device's projection angle before and after the adjustment, derive the rotation angle and direction, and replay them when the same before/after pair recurs.
It should be noted that a virtual coordinate system is maintained inside the projection device, which records the coordinate value of each movement; when the same pre-movement coordinate value and the same target coordinate value occur again, the corresponding rotation angle value is read immediately to steer in real time, achieving a faster and more accurate rotation.
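The recorded before/after angle pairs behave like a memoization cache. A sketch of that idea follows, with the rounding granularity and direction labels as assumptions:

```python
class RotationCache:
    """Remember the (rotation angle, direction) derived for each pair of
    projection angles before and after adjustment, so a repeated adjustment
    can replay the stored rotation directly."""

    def __init__(self):
        self._cache = {}

    def rotation_for(self, before_deg: float, after_deg: float):
        key = (round(before_deg, 1), round(after_deg, 1))
        if key not in self._cache:  # first occurrence: compute and store
            delta = (after_deg - before_deg + 180.0) % 360.0 - 180.0
            self._cache[key] = (abs(delta), "ccw" if delta >= 0 else "cw")
        return self._cache[key]


if __name__ == "__main__":
    cache = RotationCache()
    print(cache.rotation_for(90.0, 45.0))  # computed: (45.0, 'cw')
    print(cache.rotation_for(90.0, 45.0))  # replayed from the cache
```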
It should be noted that, for the description of the same steps and the same contents in this embodiment as those in other embodiments, reference may be made to the description in the other embodiments, which is not repeated herein.
Referring to fig. 10, fig. 10 is a schematic flowchart of an implementation flow of a video conference control method provided in an embodiment of the present application, where the video conference control method may be applied to a server 400 in the conference system shown in fig. 1, and the video conference control method includes the following steps:
Step 501, projecting video data of a remote object sent by the remote device in the air through a projection device in the conference system and the optical element corresponding to the projection device.
Step 502, if the remote device presents a plurality of presence objects in the conference scene where the conference system is located to the remote object, acquiring the presence object selected by the remote object from the plurality of presence objects as the target object.
Here, the projection trigger event refers to a detected operation of the remote object selecting a presence object.
In the embodiment of the present application, after the server projects the video data of the remote object sent by the remote device in the air through the projection device in the conference system and the optical element corresponding to the projection device, the presentation interface of the remote device presents the plurality of presence objects in the conference scene to the remote object; when the server detects an operation of the remote object selecting a presence object, it determines that a projection trigger event has been detected, and then determines the selected presence object, among the plurality of presence objects, as the target object. In one implementation scenario, each remote object may select a target object through a conference application installed on its remote device: during the remote conference, the display interface of the conference application displays the plurality of presence objects in the conference scene, and the remote object may select one of them, so that the projection picture corresponding to the remote object's video data faces the selected presence object.
Step 503, acquiring the position of the target object in the conference scene.
Step 504, based on the position of the target object, adjusting a projection angle of at least one of the projection device and the optical element so that the projection picture of the remote object faces the target object.
In the embodiment of the application, the server acquires the position of the target object in the conference scene and, based on that position, adjusts the projection angle of at least one of the projection device and the optical element, so that the projection picture of the remote object faces the target object selected by the remote object.
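As a rough, non-authoritative sketch of steps 501 to 504, assuming hypothetical helpers (Position, compute_projection_angle, rotate_to) standing in for the position lookup and angle adjustment described above:

```python
# Illustrative sketch of the flow of steps 501-504; every name here is a
# hypothetical stand-in, not an API defined by this application.
import math
from dataclasses import dataclass
from typing import Dict

@dataclass
class Position:
    x: float
    y: float

class ConferenceServer:
    def __init__(self, seat_positions: Dict[str, Position]) -> None:
        # presence-object id -> seat position in the conference scene
        self.seat_positions = seat_positions

    def on_remote_selection(self, remote_id: str, presence_id: str) -> None:
        """Steps 502-504: a remote object selected a presence object."""
        target = self.seat_positions[presence_id]       # step 503
        angle = self.compute_projection_angle(target)   # step 504
        self.rotate_to(remote_id, angle)

    def compute_projection_angle(self, pos: Position) -> float:
        # Placeholder; a real system would use the included-angle geometry
        # described later in this application.
        return math.degrees(math.atan2(pos.y, pos.x))

    def rotate_to(self, remote_id: str, angle: float) -> None:
        print(f"rotate projection picture of {remote_id} to {angle:.1f} deg")
```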
In other embodiments of the present application, the server may detect a first projection trigger event, where the first projection trigger event refers to the remote object selecting a first presence object from the plurality of presence objects; at the same time, the server may also detect a second projection trigger event, where the second projection trigger event refers to a second presence object among the plurality of presence objects operating an orientation button; the first presence object and the second presence object may be the same or different. In this case, the server acquires a first priority of the first projection trigger event and a second priority of the second projection trigger event, and processes the projection trigger event with the higher priority. In one implementation scenario, the second projection trigger event has a higher priority than the first projection trigger event.
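The priority handling just described might look like the following sketch; the numeric priority values are assumptions, since the application only fixes the relative order of the two events:

```python
# Hedged sketch of projection-trigger-event arbitration; the priority
# numbers are assumed, only their relative order comes from the text above.
from dataclasses import dataclass
from typing import List

PRIORITY = {"orientation_button": 2, "remote_selection": 1}  # higher wins

@dataclass
class TriggerEvent:
    kind: str          # "orientation_button" or "remote_selection"
    presence_id: str   # the presence object the event points at

def pick_event(events: List[TriggerEvent]) -> TriggerEvent:
    """Process only the highest-priority concurrent trigger event."""
    return max(events, key=lambda e: PRIORITY[e.kind])

# Example: an orientation-button press outranks a remote selection.
winner = pick_event([TriggerEvent("remote_selection", "A"),
                     TriggerEvent("orientation_button", "B")])
assert winner.presence_id == "B"
```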
As can be seen from the above, by letting the remote object select a target object from the plurality of presence objects, the server not only directs the projection picture of the remote object toward the selected target object, but also simulates a real conference scene, so that the participants in the conference scene can experience the vivid effect of the aerial projection of the remote object, improving the intelligence of the conference system.
It should be noted that, for the descriptions of the same steps and the same contents in this embodiment as those in other embodiments, reference may be made to the descriptions in other embodiments, which are not described herein again.
An embodiment of the present application provides a management device. The management device may be used to implement the video conference control method provided in the embodiments corresponding to fig. 2 and fig. 6 to fig. 10. As shown in fig. 11, the management device 11 includes: a processor 1101, a memory 1102, and a communication bus 1103, wherein:
the communication bus 1103 is used for implementing communication connections between the processor 1101 and the memory 1102;
the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to perform the following steps:
the video data of the remote object sent by the remote device is projected in the air through the projection device in the conference system and the optical element corresponding to the projection device;
when the projection trigger event is detected, a target object is determined from a plurality of presence objects in a conference scene where the conference system is located based on the projection trigger event, and a projection angle of at least one of the projection device and the optical element is adjusted so that a projection picture corresponding to the aerial projection faces the target object.
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
determining a presence object that is speaking among the plurality of presence objects as the target object, wherein the projection trigger event refers to detecting a presence object speaking in the conference scene where the conference system is located; detecting a sound source position corresponding to the target object; and adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position corresponding to the target object, so that the projection picture corresponding to the aerial projection faces the target object.
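A crude stand-in for the sound-source detection mentioned above is sketched below: it simply returns the seat of the loudest microphone. A real system would use array techniques such as time-difference-of-arrival estimation; all names here are illustrative:

```python
# Illustrative stand-in for sound-source localization: pick the position
# of the microphone with the highest short-term energy.
from typing import Dict, Sequence, Tuple

def locate_sound_source(mic_frames: Dict[str, Sequence[float]],
                        mic_positions: Dict[str, Tuple[float, float]]
                        ) -> Tuple[float, float]:
    """mic_frames: mic id -> latest audio samples; returns the position of
    the loudest microphone as the estimated sound source position."""
    def energy(samples: Sequence[float]) -> float:
        return sum(s * s for s in samples)
    loudest = max(mic_frames, key=lambda m: energy(mic_frames[m]))
    return mic_positions[loudest]
```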
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
determining, from the plurality of presence objects, a presence object operating an orientation button as the target object, wherein an orientation button is installed at least at the position of the target object among the plurality of presence objects, and the projection trigger event refers to detecting an operation of a presence object on the orientation button; acquiring a button identifier of the orientation button operated by the target object and the position of the orientation button, wherein the orientation button is used to control the orientation of the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier; and adjusting a projection angle of at least one of the projection device and the optical element based on the button identifier and the position of the orientation button, so that the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier faces the target object at the position of the orientation button.
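Illustratively, the button-identifier lookup could be a simple table keyed by button id, as in the following sketch with hypothetical names:

```python
# Sketch: each orientation button id maps to the remote object whose aerial
# projection it controls, plus the button's position in the scene.
# All names and values are illustrative.
from typing import Dict, Tuple

button_table: Dict[str, Tuple[str, Tuple[float, float]]] = {
    # button_id: (remote_object_id, (x, y) position of the button)
    "btn-01": ("remote-alice", (1.0, 2.5)),
    "btn-02": ("remote-bob", (3.0, 0.5)),
}

def on_button_pressed(button_id: str) -> None:
    remote_id, button_pos = button_table[button_id]
    # Adjust the projection device/optical element serving remote_id so its
    # projection picture faces the presence object at button_pos.
    print(f"turn projection of {remote_id} toward {button_pos}")
```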
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
if the remote device displays to the remote object a plurality of presence objects in the conference scene where the conference system is located, acquiring the presence object selected by the remote object from the plurality of presence objects as the target object, wherein the projection trigger event refers to detecting the operation of the remote object selecting a presence object; acquiring the position of the target object in the conference scene; and adjusting a projection angle of at least one of the projection device and the optical element based on the position of the target object, so that the projection picture corresponding to the aerial projection faces the target object.
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
if at least one speaking presence object exists in the conference scene, determining a speaking presence object whose speaking duration is greater than a duration threshold as the target object.
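A minimal sketch of the duration-threshold rule; the 3-second threshold is an assumed value:

```python
# Sketch: pick as target objects the speakers whose ongoing speaking
# duration exceeds a threshold. The threshold value is an assumption.
DURATION_THRESHOLD_S = 3.0

def select_targets(speaking_durations: dict) -> list:
    """speaking_durations: presence-object id -> seconds of ongoing speech."""
    return [pid for pid, dur in speaking_durations.items()
            if dur > DURATION_THRESHOLD_S]

print(select_targets({"A": 5.2, "B": 0.8}))  # -> ['A']
```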
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
if at least two target objects exist in the conference scene, acquiring, through the sound pickup device corresponding to each target object in the conference system, the presence audio data of each target object when asking a question; performing semantic recognition on the presence audio data corresponding to each target object, and determining the remote object questioned by each target object; and adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the remote object questioned by each target object, so that the projection picture corresponding to the aerial projection of the remote object questioned by each target object faces that target object.
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
if the at least two target objects question the same remote object, acquiring remote audio data of the same remote object, wherein the video data comprises multiple frames of ordered images and the remote audio data; performing semantic recognition on the remote audio data, determining that the same remote object has finished replying to the question of one of the at least two target objects, and locating, based on the questioning order of the at least two target objects, the next target object to which the same remote object is to reply; and adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of each target object, the remote object questioned by each target object, and the next target object, so that the projection picture corresponding to the aerial projection of the remote object faces the next target object.
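The turn-taking just described amounts to a first-in-first-out queue of questioners per remote object; the sketch below assumes a hypothetical reply-finished signal produced by the semantic recognition step:

```python
# Sketch of locating the next target object when several presence objects
# question the same remote object. on_reply_finished() is a hypothetical
# hook fed by semantic recognition of the remote audio data.
from collections import deque
from typing import Deque, Optional

class QuestionQueue:
    def __init__(self) -> None:
        self._queue: Deque[str] = deque()  # target objects in questioning order

    def on_question(self, target_id: str) -> None:
        self._queue.append(target_id)

    def on_reply_finished(self) -> Optional[str]:
        """Called when semantic recognition decides the remote object has
        finished one reply; returns the next target object to face, if any."""
        if self._queue:
            self._queue.popleft()          # the head's question is answered
        return self._queue[0] if self._queue else None

# Example: A asks, then B asks; after the reply to A ends, face B.
q = QuestionQueue()
q.on_question("A"); q.on_question("B")
print(q.on_reply_finished())  # -> B
```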
In other embodiments of the present application, the processor 1101 is configured to execute the videoconference control program stored in the memory 1102 to implement the following steps:
acquiring a first straight line associated with the plane area where the projection picture corresponding to the remote object is located, wherein the first straight line passes through the center of the plane area and is perpendicular to the plane area; acquiring the object position of the target object; acquiring a second straight line connecting the center position of the plane area and the object position; acquiring the angle value of the included angle between the first straight line and the second straight line; taking the center point of the device object to be adjusted as the rotation center point and the current projection angle of the device object to be adjusted as the initial angle, controlling the device object to be adjusted to rotate clockwise by the angle value to obtain the adjusted device object, wherein the device object to be adjusted comprises the projection device, the optical element, or the supporting device composed of the projection device and the optical element; and performing aerial projection of the video data based on the adjusted device object, so that the projection picture corresponding to the aerial projection faces the target object.
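The included-angle computation described above can be sketched as follows, assuming the plane-area center, its normal direction (the first straight line), and the target position are known as 3D coordinates; all names are illustrative:

```python
# A minimal sketch of the included-angle computation: the angle between the
# plane normal through the center (first line) and the line from the center
# to the target object (second line). Names here are illustrative.
import math
from typing import Sequence

def rotation_angle_deg(center: Sequence[float],
                       normal: Sequence[float],
                       target: Sequence[float]) -> float:
    """Return the included angle between the two lines, in degrees."""
    to_target = [t - c for t, c in zip(target, center)]       # second line
    dot = sum(n * v for n, v in zip(normal, to_target))
    norm_n = math.sqrt(sum(n * n for n in normal))
    norm_v = math.sqrt(sum(v * v for v in to_target))
    if norm_n == 0 or norm_v == 0:
        raise ValueError("degenerate geometry")
    cos_theta = max(-1.0, min(1.0, dot / (norm_n * norm_v)))  # clamp rounding
    return math.degrees(math.acos(cos_theta))
```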
Embodiments of the application provide a computer storage medium storing one or more programs executable by one or more processors to perform the steps of:
projecting video data of a remote object sent by a remote device in the air through a projection device in the conference system and an optical element corresponding to the projection device;
when the projection trigger event is detected, a target object is determined from a plurality of presence objects in a conference scene where the conference system is located based on the projection trigger event, and the projection angle of at least one of the projection device and the optical element is adjusted so that the projection picture corresponding to the aerial projection is directed to the target object.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
determining a presence object that is speaking among the plurality of presence objects as the target object, wherein the projection trigger event refers to detecting a presence object speaking in the conference scene where the conference system is located; detecting a sound source position corresponding to the target object; and adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position corresponding to the target object, so that the projection picture corresponding to the aerial projection faces the target object.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
determining, from the plurality of presence objects, a presence object operating an orientation button as the target object, wherein an orientation button is installed at least at the position of the target object among the plurality of presence objects, and the projection trigger event refers to detecting an operation of a presence object on the orientation button; acquiring a button identifier of the orientation button operated by the target object and the position of the orientation button, wherein the orientation button is used to control the orientation of the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier; and adjusting a projection angle of at least one of the projection device and the optical element based on the button identifier and the position of the orientation button, so that the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier faces the target object at the position of the orientation button.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
if the remote device displays to the remote object a plurality of presence objects in the conference scene where the conference system is located, acquiring the presence object selected by the remote object from the plurality of presence objects as the target object, wherein the projection trigger event refers to detecting the operation of the remote object selecting a presence object; acquiring the position of the target object in the conference scene; and adjusting a projection angle of at least one of the projection device and the optical element based on the position of the target object, so that the projection picture corresponding to the aerial projection faces the target object.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
if at least one speaking presence object exists in the conference scene, determining a speaking presence object whose speaking duration is greater than a duration threshold as the target object.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
if at least two target objects exist in the conference scene, acquiring, through the sound pickup device corresponding to each target object in the conference system, the presence audio data of each target object when asking a question; performing semantic recognition on the presence audio data corresponding to each target object, and determining the remote object questioned by each target object; and adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the remote object questioned by each target object, so that the projection picture corresponding to the aerial projection of the remote object questioned by each target object faces that target object.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
if the at least two target objects question the same remote object, acquiring remote audio data of the same remote object, wherein the video data comprises multiple frames of ordered images and the remote audio data; performing semantic recognition on the remote audio data, determining that the same remote object has finished replying to the question of one of the at least two target objects, and locating, based on the questioning order of the at least two target objects, the next target object to which the same remote object is to reply; and adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of each target object, the remote object questioned by each target object, and the next target object, so that the projection picture corresponding to the aerial projection of the remote object faces the next target object.
In other embodiments of the present application, the one or more programs are executable by the one or more processors and further implement the steps of:
acquiring a first straight line associated with the plane area where the projection picture corresponding to the remote object is located, wherein the first straight line passes through the center of the plane area and is perpendicular to the plane area; acquiring the object position of the target object; acquiring a second straight line connecting the center position of the plane area and the object position; acquiring the angle value of the included angle between the first straight line and the second straight line; taking the center point of the device object to be adjusted as the rotation center point and the current projection angle of the device object to be adjusted as the initial angle, controlling the device object to be adjusted to rotate clockwise by the angle value to obtain the adjusted device object, wherein the device object to be adjusted comprises the projection device, the optical element, or the supporting device composed of the projection device and the optical element; and performing aerial projection of the video data based on the adjusted device object, so that the projection picture corresponding to the aerial projection faces the target object.
Here, it should be noted that the above description of the storage medium and device embodiments is similar to that of the method embodiments, and has similar advantageous effects. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the description of the method embodiments of the present application.
The computer storage medium/memory may be a memory such as a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be one of various terminals including one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
It should be appreciated that reference throughout this specification to "one embodiment", "an embodiment", "an embodiment of the present application", "the foregoing embodiment", "some embodiments", or "some implementations" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of these phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an execution order; the execution order of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided herein may be combined in any combination to arrive at a new method or apparatus embodiment without conflict.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by program instructions and related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; and the aforementioned storage medium includes various media that can store program code, such as a removable memory device, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application or portions thereof that contribute to the related art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
It should be noted that the drawings in the embodiments of the present application are only for illustrating schematic positions of the respective devices on the terminal equipment, and do not represent actual positions in the terminal equipment, actual positions of the respective devices or the respective areas may be changed or shifted according to actual situations (for example, structures of the terminal equipment), and the proportions of different parts in the terminal equipment in the drawings do not represent actual proportions.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A video conference control method, the method comprising:
projecting video data of a remote object sent by a remote device in the air through a projection device in a conference system and an optical element corresponding to the projection device;
when a projection trigger event is detected, determining, based on the projection trigger event, a target object from a plurality of presence objects in a conference scene where the conference system is located, and adjusting a projection angle of at least one of the projection device and the optical element so that a projection picture corresponding to the aerial projection faces the target object.
2. The method of claim 1, wherein when a projection trigger event is detected, determining a target object from a plurality of presence objects in a conference scene where the conference system is located based on the projection trigger event, and adjusting a projection angle of at least one of the projection device and the optical element so that a projection picture corresponding to the aerial projection faces the target object comprises:
determining a presence object that is speaking among the plurality of presence objects as the target object, wherein the projection trigger event refers to detecting a presence object speaking in the conference scene where the conference system is located;
detecting a sound source position corresponding to the target object; and
adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position corresponding to the target object, so that the projection picture corresponding to the aerial projection faces the target object.
3. The method of claim 1, wherein when a projection trigger event is detected, determining a target object from a plurality of presence objects in a conference scene where the conference system is located based on the projection trigger event, and adjusting a projection angle of at least one of the projection device and the optical element so that a projection picture corresponding to the aerial projection faces the target object comprises:
determining, from the plurality of presence objects, a presence object operating an orientation button as the target object, wherein an orientation button is installed at least at the position of the target object among the plurality of presence objects, and the projection trigger event refers to detecting an operation of a presence object on the orientation button;
acquiring a button identifier of the orientation button operated by the target object and a position of the orientation button, wherein the orientation button is used to control the orientation of the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier; and
adjusting a projection angle of at least one of the projection device and the optical element based on the button identifier and the position of the orientation button, so that the projection picture corresponding to the aerial projection of the remote object corresponding to the button identifier faces the target object at the position of the orientation button.
4. The method of claim 1, wherein when a projection trigger event is detected, determining a target object from a plurality of presence objects in a conference scene where the conference system is located based on the projection trigger event, and adjusting a projection angle of at least one of the projection device and the optical element so that a projection picture corresponding to the aerial projection faces the target object comprises:
if the remote device displays to the remote object a plurality of presence objects in the conference scene where the conference system is located, acquiring the presence object selected by the remote object from the plurality of presence objects as the target object, wherein the projection trigger event refers to detecting the operation of the remote object selecting a presence object;
acquiring the position of the target object in the conference scene; and
adjusting a projection angle of at least one of the projection device and the optical element based on the position of the target object, so that the projection picture corresponding to the aerial projection faces the target object.
5. The method of claim 2, wherein determining a presence object that is speaking among the plurality of presence objects as the target object comprises:
if at least one speaking presence object exists in the conference scene, determining a speaking presence object whose speaking duration is greater than a duration threshold as the target object.
6. The method of claim 2, wherein adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position corresponding to the target object, so that the projection picture corresponding to the aerial projection faces the target object comprises:
if at least two target objects exist in the conference scene, acquiring, through the sound pickup device corresponding to each target object in the conference system, the presence audio data of each target object when asking a question;
performing semantic recognition on the presence audio data corresponding to each target object, and determining the remote object questioned by each target object; and
adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the remote object questioned by each target object, so that the projection picture corresponding to the aerial projection of the remote object questioned by each target object faces that target object.
7. The method of claim 6, wherein adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of each target object and the remote object questioned by each target object, so that the projection picture corresponding to the aerial projection of the remote object questioned by each target object faces that target object comprises:
if the at least two target objects question the same remote object, acquiring remote audio data of the same remote object, wherein the video data comprises multiple frames of ordered images and the remote audio data;
performing semantic recognition on the remote audio data, determining that the same remote object has finished replying to the question of one of the at least two target objects, and locating, based on the questioning order of the at least two target objects, the next target object to which the same remote object is to reply; and
adjusting a projection angle of at least one of the projection device and the optical element based on the sound source position of the target object whose question reply has ended, the same remote object, and the next target object, so that the projection picture corresponding to the aerial projection of the same remote object faces the next target object.
8. The method of any one of claims 1 to 7, wherein adjusting a projection angle of at least one of the projection device and the optical element so that the projection picture corresponding to the aerial projection faces the target object comprises:
acquiring a first straight line associated with a plane area where the projection picture corresponding to the remote object is located, wherein the first straight line passes through the center of the plane area and is perpendicular to the plane area;
acquiring an object position of the target object;
acquiring a second straight line connecting the center position of the plane area and the object position;
acquiring an angle value of an included angle between the first straight line and the second straight line;
taking the center point of a device object to be adjusted as a rotation center point and the current projection angle of the device object to be adjusted as an initial angle, controlling the device object to be adjusted to rotate clockwise by the angle value to obtain an adjusted device object, wherein the device object to be adjusted comprises the projection device, the optical element, or a supporting device composed of the projection device and the optical element; and
performing aerial projection of the video data based on the adjusted device object, so that the projection picture corresponding to the aerial projection faces the target object.
9. A management device, characterized in that the management device comprises: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a video conference control program stored in the memory to implement the video conference control method according to any one of claims 1 to 8.
10. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the video conference control method of any one of claims 1 to 8.
CN202110892027.6A 2021-08-04 2021-08-04 Video conference control method, management equipment and storage medium Pending CN115914530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110892027.6A CN115914530A (en) 2021-08-04 2021-08-04 Video conference control method, management equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115914530A 2023-04-04



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination