CN112073613A - Conference portrait shooting method, interactive tablet, computer equipment and storage medium - Google Patents

Conference portrait shooting method, interactive tablet, computer equipment and storage medium

Info

Publication number
CN112073613A
CN112073613A (application CN202010948507.5A)
Authority
CN
China
Prior art keywords: target, image, wide-angle, target person, face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010948507.5A
Other languages
Chinese (zh)
Other versions
CN112073613B (en)
Inventor
吴文宪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shirui Electronics Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202010948507.5A
Publication of CN112073613A
Application granted
Publication of CN112073613B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a conference portrait shooting method, comprising the following steps: collecting a wide-angle image in a video conference room; determining a target position of a target person in the wide-angle image; and controlling a zoom lens to shoot a target face image of the target person according to the target position. The application also discloses an interactive tablet, a computer device and a computer-readable storage medium. By combining a wide-angle lens with a zoom lens, a clear close-up image of the target person in a video conference can be obtained.

Description

Conference portrait shooting method, interactive tablet, computer equipment and storage medium
Technical Field
The present application relates to the field of image capturing, and in particular, to a method for capturing a conference portrait, an interactive tablet, a computer device, and a computer-readable storage medium.
Background
In a common video conference scenario, a plurality of people in a conference room participate in a video call, and in order to achieve a better video effect, the speaker is often tracked and shot in close-up. However, since a wide-angle shot is generally used by the video camera to record images of all participants, when a close-up image of a target person (e.g., the speaker) needs to be displayed, the image of the target person is cropped from the wide-angle shot, and the resulting close-up image of the target person is often unclear.
The above is only for the purpose of assisting understanding of the technical solutions of the present application, and does not represent an admission that the above is prior art.
Disclosure of Invention
The application mainly aims to provide a conference portrait shooting method, an interactive tablet, a computer device and a computer readable storage medium, and aims to solve the problem that a clear close-up image of a target person in a video conference is difficult to obtain.
In order to achieve the above object, the present application provides a method for capturing a conference portrait, comprising the following steps:
collecting wide-angle images in a video conference room;
determining a target position of a target person in the wide-angle image;
and controlling a zoom lens to shoot a target face image of the target person according to the target position.
Further, the step of determining the target position of the target person in the wide-angle image comprises:
determining a first position of the target person in the wide-angle image by using a sound positioning algorithm, identifying a face image of the target person in the wide-angle image, and determining a second position of the face image in the wide-angle image;
and obtaining the target position of the target person in the wide-angle image according to the first position and the second position.
Further, after the step of identifying the face image of the target person in the wide-angle image, the method further includes:
determining a first distance between the target person and a shooting position according to the number of pixel points corresponding to the face image;
and determining the focal length of the zoom lens according to the first distance, wherein the focal length is applied to shooting the target face image.
Further, after the step of determining the target position of the target person in the wide-angle image, the method further includes:
when a plurality of target persons are determined, determining a second distance between target positions corresponding to the plurality of target persons;
judging whether the second distance is smaller than or equal to a preset threshold value;
and if so, executing the step of controlling a zoom lens to shoot the target face image of the target person according to the target position.
Further, after the step of determining whether the second distance is smaller than a preset threshold, the method further includes:
if not, determining a first target person and a second target person according to the speaking time corresponding to each target person;
and controlling the zoom lens to shoot a target face image of the first target person according to the target position corresponding to the first target person, and taking a face image of the second target person in the wide-angle image as a target face image of the second target person.
Further, after the step of controlling the zoom lens to capture the target face image of the target person according to the target position, the method further includes:
generating a picture-in-picture image according to the target face image and the wide-angle image, and outputting the picture-in-picture image;
or outputting the target face image.
Further, the conference portrait shooting method further comprises the following steps:
and after the picture-in-picture image or the target face image is output, if the voice information is not detected within a preset time, outputting the wide-angle image.
To achieve the above object, the present application further provides an interactive tablet, including:
the acquisition module is used for acquiring wide-angle images in a video conference room;
a determination module for determining a target position of a target person in the wide-angle image;
and the shooting module is used for controlling the zoom lens to shoot the target face image of the target person according to the target position.
To achieve the above object, the present application also provides a computer device, comprising:
the computer device comprises a memory, a processor and a conference portrait shooting program which is stored on the memory and can run on the processor, wherein the conference portrait shooting program realizes the steps of the conference portrait shooting method when being executed by the processor.
To achieve the above object, the present application further provides a computer-readable storage medium, on which a program for capturing a conference portrait is stored, and when the program for capturing a conference portrait is executed by a processor, the steps of the method for capturing a conference portrait are implemented.
The conference portrait shooting method, the interactive tablet, the computer device and the computer-readable storage medium collect a wide-angle image in a video conference room, determine a target position of a target person in the wide-angle image, and control a zoom lens to shoot a target face image of the target person according to the target position. Therefore, by combining a wide-angle lens with a zoom lens, a wide-angle image of the video conference is obtained and, at the same time, a clear target face image of the target person can be obtained.
Drawings
Fig. 1 is a schematic diagram illustrating a step of a method for capturing a portrait of a conference in an embodiment of the present application;
fig. 2 is a schematic diagram illustrating another step of a method for capturing a portrait of a conference in an embodiment of the present application;
fig. 3 is a schematic diagram illustrating another step of a method for capturing a portrait of a conference in an embodiment of the present application;
fig. 4 is a schematic diagram of a further step of a method for capturing a portrait of a conference in an embodiment of the present application;
FIG. 5 is a block diagram illustrating a schematic structure of an interactive tablet in an embodiment of the present application;
FIG. 6 is a block diagram illustrating a computer device according to an embodiment of the present application;
fig. 7 is a block diagram schematically illustrating a configuration of a terminal system according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, in an embodiment, the method for capturing a meeting portrait includes:
and step S10, acquiring a wide-angle image in the video conference room.
And step S20, determining the target position of the target person in the wide-angle image.
And step S30, controlling a zoom lens to shoot a target face image of the target person according to the target position.
In this embodiment, the execution terminal may be an interactive tablet (also referred to as an interactive smart tablet), a conference machine, a computer device, or the like, or may be a dedicated conference portrait shooting device (such as an image processor).
As set forth in step S10: optionally, the terminal system may be configured as shown in fig. 7, and includes an image processor, a first image sensor, a second image sensor, a wide-angle lens, a zoom lens, a holder (pan-tilt head), a microphone array and a sound processing device. The wide-angle lens is used for shooting and recording the scene inside the video conference room, and the first image sensor transmits the wide-angle image acquired through the wide-angle lens to the image processor. The microphone array is used for collecting voice information in the conference room and transmitting it to the sound processing device; while outputting audio, the sound processing device can analyze the voice information, perform sound source localization on a speaker in the conference room, and transmit the sound source localization data to the image processor. The zoom lens is mounted on the holder, and the image processor can adjust the shooting angle of the zoom lens by controlling the holder to rotate; the second image sensor transmits the image collected through the zoom lens to the image processor. The image processor is further used for integrating the images transmitted by the first image sensor and the second image sensor into a corresponding video and outputting the video.
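For illustration only, the data flow described above can be summarised in the following sketch; all component names and method signatures are hypothetical placeholders rather than part of this disclosure.
```python
# Illustrative sketch of the terminal-system data flow described above.
# All objects (wide_camera, mic_array, etc.) are hypothetical placeholders.

def conference_frame_pipeline(wide_camera, mic_array, sound_processor,
                              pan_tilt_zoom, image_processor):
    """One iteration of the capture loop: wide image in, composed video frame out."""
    wide_image = wide_camera.capture()                   # first image sensor -> image processor
    audio = mic_array.read()                             # microphone array -> sound processor
    source_direction = sound_processor.localize(audio)   # sound source localization data

    # The image processor fuses the sound direction with face detection to find
    # the target position, then steers the pan-tilt so the zoom lens faces it.
    target_xy = image_processor.locate_target(wide_image, source_direction)
    pan_tilt_zoom.aim_at(target_xy)
    face_image = pan_tilt_zoom.capture()                 # second image sensor -> image processor

    # Integrate both streams into the output video frame.
    return image_processor.compose(wide_image, face_image)
```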
It should be noted that the wide-angle lens may be a fisheye lens; the holder is a supporting platform (pan-tilt head) for the camera; and the image sensor uses the photoelectric conversion function of a photoelectric device to convert the light image on its light-sensing surface into an electrical signal proportional to that image.
It should be understood that the zoom lens, the holder and the second image sensor may be integrally formed, for example as a pan-tilt-zoom (PTZ) camera; the wide-angle lens and the first image sensor may also be integrally formed, for example as a wide-angle camera.
Optionally, when detecting that the video conference is started, the terminal may acquire a wide-angle image in the video conference room by using the wide-angle lens.
As set forth in step S20: optionally, when the terminal collects the wide-angle image, the microphone array is further used for collecting the voice information in the conference room, and a speaker corresponding to the voice information is defined as the target person.
Further, the terminal determines a first position of a target person in the wide-angle image by utilizing a sound source positioning technology according to the voice information acquired by the microphone array.
It should be noted that, in sound source localization, the position of a sound source can be obtained from the difference between the arrival times (or sound intensities) of the same sound at two or more microphones of the microphone array, combined with plane geometry based on the known spacing between the microphones.
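As a concrete illustration of this principle, the following sketch estimates the horizontal bearing of a sound source from the time difference of arrival at two microphones; the microphone spacing, speed of sound and far-field approximation are assumptions for the example, not values from this disclosure.
```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room temperature

def sound_source_angle(tdoa_seconds, mic_spacing_m=0.1):
    """Estimate the bearing of a sound source (far-field approximation).

    tdoa_seconds: arrival-time difference between the two microphones
                  (positive when the sound reaches the reference microphone first).
    Returns the angle in degrees measured from the array's broadside direction.
    """
    # Path-length difference implied by the arrival-time difference.
    path_difference = SPEED_OF_SOUND * tdoa_seconds
    # Far-field plane-wave geometry: sin(theta) = path_difference / mic_spacing.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.degrees(math.asin(ratio))

# Example: a 0.15 ms lead at one microphone with 10 cm spacing.
print(round(sound_source_angle(0.00015), 1))  # ~31.0 degrees off broadside
```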
Optionally, a rectangular plane coordinate system is constructed on the plane of the wide-angle image: when the plane of the wide-angle image is perpendicular to the horizontal plane, the direction along the line where the two planes intersect is taken as the horizontal direction (X-axis), and the direction perpendicular to the horizontal plane is taken as the vertical direction (Y-axis). In addition, the direction perpendicular to the plane of the wide-angle image is defined as the Z-axis direction.
Optionally, the X-axis coordinate of the target person in the wide-angle image is analyzed by using a sound localization algorithm and recorded as the first position.
After the first position of the target person is obtained, the portrait corresponding to the first position in the wide-angle image (namely the portrait of the target person) is located, and the face image of the target person is then recognized using a face recognition technique. The Y-axis coordinate of the face image in the vertical direction is then obtained from the distance between the region occupied by the face image in the wide-angle image and the upper and/or lower boundary of the wide-angle image, and is used as the second position of the face image in the wide-angle image.
Optionally, when determining the second position corresponding to the face image, the central point of the image area of the face image may be determined, and the Y-axis coordinate is then obtained as the second position from the distance between this central point and the upper and/or lower boundary of the wide-angle image.
It should be understood that, when recognizing a face image, the recognized region may include the entire head of the person.
Combining the first position (X-axis coordinate) and the second position (Y-axis coordinate), the target position (X, Y) of the target person in the wide-angle image can be obtained. Thus, the target position of the target person can be quickly located.
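A minimal sketch of how the two coordinates could be combined, assuming the sound-derived X coordinate and the face bounding box are already available; all names and example values are illustrative.
```python
def face_center_y(face_box):
    """Y coordinate (second position): distance of the face-box center
    from the upper boundary of the wide-angle image, in pixels."""
    x, y, w, h = face_box
    return y + h / 2.0

def target_position(sound_x, face_box):
    """Combine the sound-derived X coordinate (first position) with the
    face-derived Y coordinate (second position) into the target position."""
    return (sound_x, face_center_y(face_box))

# Example: sound localization puts the speaker at x = 812 px and the
# recognized face box in the wide-angle image is (790, 210, 60, 72).
print(target_position(812, (790, 210, 60, 72)))  # (812, 246.0)
```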
Optionally, after obtaining the wide-angle image, the terminal may also first recognize the face images in the wide-angle image and obtain the second position corresponding to each face image, then determine, in real time or periodically, the first position of the target person in the wide-angle image with the sound localization algorithm, locate the face image in the wide-angle image that lies on the same vertical line as the first position to obtain its corresponding second position, and combine the first position and the second position to obtain the target position of the target person in the wide-angle image.
Optionally, the terminal may also store a reference face image of the target person in advance. After the terminal acquires the wide-angle image, the face image in the wide-angle image that is the same as or similar to the reference face image is identified directly, by means of face recognition, as the face image of the target person, and the X-axis and Y-axis coordinates of that face image in the wide-angle image are then determined to obtain the target position of the target person in the wide-angle image.
As set forth in step S30: when the target position corresponding to the face image of the target person in the wide-angle image is obtained, the holder is controlled to rotate so that the shooting angle of the zoom lens faces the direction of the target position. The zoom lens is then aimed at the head of the target person, its focal length is adjusted so that the proportion of the head in the frame falls within a set range and the head is in sharp focus, and a close-up shot of the target person is taken to obtain the target face image of the target person.
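One possible way to turn the target position into pan/tilt commands is to map the pixel offset from the image center to angular offsets using the wide-angle lens's field of view; the sketch below assumes a linear pixel-to-angle mapping and illustrative field-of-view values, which are not taken from this disclosure.
```python
def pan_tilt_angles(target_xy, image_size, h_fov_deg=120.0, v_fov_deg=70.0):
    """Approximate pan/tilt angles (degrees) that point the zoom lens at the
    target position, assuming a linear pixel-to-angle mapping for simplicity.

    target_xy:  (x, y) target position in the wide-angle image, pixels.
    image_size: (width, height) of the wide-angle image, pixels.
    """
    x, y = target_xy
    width, height = image_size
    pan = (x / width - 0.5) * h_fov_deg    # negative = left of center
    tilt = (0.5 - y / height) * v_fov_deg  # negative = below center
    return pan, tilt

# Example: target at (812, 246) in a 1920x1080 wide-angle frame.
print(pan_tilt_angles((812, 246), (1920, 1080)))  # approx (-9.25, 19.06)
```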
Optionally, the focal length of the zoom lens used for shooting the target face image may be preset according to actual needs (e.g., a preset factory value); or determining a first distance between the target person and the shooting position according to the number of pixel points corresponding to the face image, and then determining the focal length of the zoom lens according to the first distance.
Therefore, by using the mode of combining the wide-angle lens and the zoom lens, a wide-angle image of the video conference is obtained, and meanwhile, a clear target face image of a target person can be obtained.
In an embodiment, as shown in fig. 2, on the basis of the embodiment shown in fig. 1, the method for capturing a portrait of a conference further includes:
step S40, determining a first distance between the target person and the shooting position according to the number of pixel points corresponding to the face image of the target person in the wide-angle image;
and step S41, determining the focal length of the zoom lens according to the first distance, wherein the focal length is applied to shooting the target face image.
In this embodiment, while identifying the face image of the target person in the wide-angle image, the terminal may also obtain the region occupied by the face image, count the number of pixels in that region, determine the ratio of the number of pixels of the face image to the total number of pixels of the wide-angle image, and, according to this ratio, determine the first distance along the Z-axis between the target person (or the face image of the target person) and the shooting position of the wide-angle lens (or the zoom lens).
It should be understood that the smaller this ratio, the greater the resulting first distance.
The terminal only needs to be trained on data in advance so that the relationship between different ratios and the corresponding first distances is stored; once the ratio of the number of pixels of the face image to the total number of pixels of the wide-angle image is obtained, the corresponding first distance can be looked up.
Optionally, the terminal determines in advance the correspondence between different first distances and focal lengths of the zoom lens, setting a corresponding focal length for each first distance, so that after the zoom lens is adjusted to the focal length corresponding to the current first distance, the proportion of the target person's head in the image captured by the zoom lens falls within the set range and the target person is in sharp focus.
It should be understood that the setting range can be set according to the actual situation, and the application is not limited.
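A minimal sketch of the two lookups described above, using assumed calibration tables; the ratio, distance and focal-length values are placeholders, not data from this disclosure.
```python
import bisect

# Assumed calibration data: ratio of face pixels to total pixels -> distance (m),
# and distance (m) -> zoom-lens focal length (mm). Values are illustrative only.
RATIO_TO_DISTANCE = [(0.0005, 8.0), (0.001, 6.0), (0.002, 4.0), (0.005, 2.5), (0.01, 1.5)]
DISTANCE_TO_FOCAL = [(1.5, 12.0), (2.5, 24.0), (4.0, 35.0), (6.0, 50.0), (8.0, 70.0)]

def lookup(table, key):
    """Pick the table entry whose key is closest to the query key."""
    keys = [k for k, _ in table]
    i = bisect.bisect_left(keys, key)
    candidates = table[max(0, i - 1):i + 1]
    return min(candidates, key=lambda kv: abs(kv[0] - key))[1]

def focal_length_for_face(face_pixels, total_pixels):
    """Ratio of face pixels -> first distance -> zoom-lens focal length."""
    ratio = face_pixels / float(total_pixels)
    first_distance = lookup(RATIO_TO_DISTANCE, ratio)  # the smaller the ratio, the farther
    return first_distance, lookup(DISTANCE_TO_FOCAL, first_distance)

# Example: a 60x72 px face in a 1920x1080 wide-angle image.
print(focal_length_for_face(60 * 72, 1920 * 1080))  # ratio ~0.0021 -> (4.0, 35.0)
```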
Therefore, the corresponding focal length is determined according to the first distance between the target person and the shooting position, and the zoom lens is controlled with the obtained focal length to take a close-up shot of the target person, so that a clear target face image of the target person can be obtained more accurately.
In an embodiment, as shown in fig. 3, on the basis of the above embodiments of fig. 1 to 2, after the step of determining the target position of the target person in the wide-angle image, the method further includes:
in step S50, when the plurality of target persons are determined, a second distance between the target positions corresponding to the plurality of target persons is determined.
Step S51, judging whether the second distance is smaller than or equal to a preset threshold value;
and step S60, if yes, executing the step of controlling the zoom lens to shoot the target face image of the target person according to the target position.
In this embodiment, the terminal may perform sound localization using the voice information acquired by the microphone array within a first preset time period. When several people speak within the first preset time period, the terminal can determine a plurality of target persons and the first position of each target person in the wide-angle image.
It should be noted that the first preset time period may be set according to actual requirements, such as 30 seconds, one minute, and the like.
Further, after obtaining the first position of each target person, the terminal respectively determines the face image of each target person in the wide-angle image, respectively determines the second position of the face image corresponding to each target person in the wide-angle image, and obtains the target position of the target person by combining the first position and the second position.
Optionally, if the terminal controls a plurality of zoom lenses and the number of zoom lenses is greater than or equal to the number of target persons, the terminal assigns a corresponding zoom lens to each target person according to the target position of that target person and captures the target face image of each target person with the assigned zoom lens.
Optionally, if the terminal controls only one zoom lens, or the number of zoom lenses is smaller than the number of target persons, the terminal determines the second distance between the target positions corresponding to the face images of the plurality of target persons; if the number of target persons is greater than or equal to three, the second distance is determined between the leftmost target person and the rightmost target person in the wide-angle image.
After the second distance between the target positions corresponding to the target persons is obtained, it is detected whether the second distance is smaller than or equal to the preset threshold. The preset threshold represents the maximum lateral span over which a target face image shot by the zoom lens still meets the sharpness requirement.
Optionally, when the terminal detects that the second distance is smaller than or equal to the preset threshold, the step of controlling the zoom lens to shoot the target face image of the target person according to the target position is executed (step S30), so that the captured target face image includes the heads of all the target persons. The terminal can determine the center position of the plurality of target persons according to the target position corresponding to each target person, aim the focus of the zoom lens at that center position, and then shoot the target face image.
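As an illustration of this decision logic for the single-zoom-lens case, the following sketch checks the lateral span of the target positions against a threshold and computes the aiming center; the threshold value and coordinates are assumed placeholders, not values from this disclosure.
```python
def plan_zoom_shot(target_positions, threshold_px=600):
    """Decide whether one zoom-lens shot can cover all target persons.

    target_positions: list of (x, y) target positions in the wide-angle image.
    Returns (can_cover_all, aim_point): if the lateral span between the leftmost
    and rightmost targets (the second distance) is within the threshold, aim the
    zoom lens at the center of all targets; otherwise a fallback is needed.
    """
    xs = [x for x, _ in target_positions]
    ys = [y for _, y in target_positions]
    second_distance = max(xs) - min(xs)
    center = (sum(xs) / len(xs), sum(ys) / len(ys))
    return second_distance <= threshold_px, center

# Example: three speakers close enough for a single close-up.
# Span is 260 px <= 600 px, so one shot aimed near (786.7, 305.0) covers them all.
print(plan_zoom_shot([(640, 300), (820, 310), (900, 305)]))
```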
Therefore, when a plurality of target persons exist, the shooting of the target persons by the zoom lenses can be distributed or adjusted according to actual needs, and clear target face images of the target persons can be obtained to the maximum extent.
In an embodiment, as shown in fig. 4, on the basis of the embodiments of fig. 1 to fig. 3, after the step of determining whether the second distance is smaller than a preset threshold, the method further includes:
Step S70, if not, determining a first target person and a second target person according to the speaking time corresponding to each target person;
step S71, controlling the zoom lens to capture a target face image of the first target person according to the target position corresponding to the first target person, and taking a face image of the second target person in the wide-angle image as a target face image of the second target person.
In this embodiment, after obtaining the second distances between the target positions corresponding to the plurality of target persons, it is detected whether the second distances are smaller than or equal to a preset threshold.
Optionally, when the terminal detects that the second distance is greater than the preset threshold, the terminal determines the first target person and the second target person according to the speaking time corresponding to each target person.
Optionally, the speaking time may be the speaking duration, in which case the target person with the longest speaking duration within the first preset time period is taken as the first target person and the remaining target persons are taken as second target persons.
Optionally, the speaking time may be a speaking time point, in which case the target person whose speaking time point is closest to the current time is taken as the first target person and the remaining target persons are taken as second target persons.
Optionally, after the terminal distinguishes the first target person from the second target persons among the plurality of target persons, the zoom lens is controlled to capture the target face image of the first target person according to the target position corresponding to the first target person (that is, step S30 is executed only for the first target person, with the zoom lens used to capture that person's target face image); for each second target person, the face image of that person is cut directly from the wide-angle image and enlarged, and the resulting image is used as the target face image of the second target person.
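For illustration, the following sketch distinguishes the first target person from the second target persons by speaking time and crops a second target person from the wide-angle image; the data structure, timestamps, face boxes and the simple nearest-neighbour enlargement are assumptions made for the example.
```python
import numpy as np

def split_targets_by_speaking_time(targets):
    """targets: list of dicts with 'name', 'last_spoke_at' (seconds) and 'face_box'.
    The person who spoke most recently becomes the first target person (shot with
    the zoom lens); everyone else is a second target person (cropped from the
    wide-angle image)."""
    ordered = sorted(targets, key=lambda t: t["last_spoke_at"], reverse=True)
    return ordered[0], ordered[1:]

def crop_face(wide_image, face_box, scale=2):
    """Cut the face region out of the wide-angle image and enlarge it.
    wide_image is an HxWx3 numpy array; face_box is (x, y, w, h) in pixels."""
    x, y, w, h = face_box
    crop = wide_image[y:y + h, x:x + w]
    # Nearest-neighbour enlargement; a real system would use proper interpolation.
    return np.repeat(np.repeat(crop, scale, axis=0), scale, axis=1)

first, second = split_targets_by_speaking_time([
    {"name": "A", "last_spoke_at": 120.0, "face_box": (790, 210, 60, 72)},
    {"name": "B", "last_spoke_at": 95.0, "face_box": (300, 220, 58, 70)},
])
print(first["name"], [t["name"] for t in second])  # A ['B']
```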
Therefore, when a plurality of target persons exist, the shooting of the target persons by the zoom lenses can be distributed or adjusted according to actual needs, and clear target face images of the target persons can be obtained to the maximum extent.
In one embodiment, after the step of controlling the zoom lens to capture the target face image of the target person according to the target position is performed, the method further includes:
and step S80, generating a picture-in-picture image according to the target face image and the wide-angle image, and outputting the picture-in-picture image.
Or, in step S81, outputting the target face image.
In this embodiment, when a video conference is started, particularly in a remote conference scenario, the video captured by the local terminal in the conference room is generally transmitted over a network and output to the remote terminal for playback, so that participants at the remote terminal can know what is happening in the local conference room.
Optionally, after obtaining the target face image of the target person, the terminal may generate a picture-in-picture image from the target face image and the wide-angle image, for example by using the wide-angle image as the bottom layer, superimposing the target face image on the wide-angle image, and displaying the target face image outside the person area of the wide-angle image. After obtaining the picture-in-picture image, the terminal continuously outputs it and integrates the output picture-in-picture images into a video that is transmitted to the remote terminal.
Or, after obtaining the target face image of the target person, the terminal may output only the target face image and integrate the continuously output target face images into a video that is transmitted to the remote terminal.
Therefore, the attention of the participants to the target person can be improved by emphasizing the target face image of the target person.
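A minimal sketch of the picture-in-picture composition described above, using NumPy arrays as frames; the inset size, corner placement and subsampling resize are illustrative assumptions rather than the method prescribed by this disclosure.
```python
import numpy as np

def compose_picture_in_picture(wide_image, face_image, inset_scale=4, margin=16):
    """Overlay the target face image onto the wide-angle image.

    The wide-angle image is used as the bottom layer; the face image is shrunk
    to roughly 1/inset_scale of the frame width and placed in the top-right
    corner (assumed here to lie outside the person area)."""
    frame = wide_image.copy()
    h, w = wide_image.shape[:2]
    inset_w = w // inset_scale
    inset_h = int(face_image.shape[0] * inset_w / face_image.shape[1])
    # Crude subsampling resize to keep the sketch dependency-free.
    ys = np.linspace(0, face_image.shape[0] - 1, inset_h).astype(int)
    xs = np.linspace(0, face_image.shape[1] - 1, inset_w).astype(int)
    inset = face_image[ys][:, xs]
    frame[margin:margin + inset_h, w - margin - inset_w:w - margin] = inset
    return frame

# Example with synthetic frames: 1080p wide image, 720p face close-up.
wide = np.zeros((1080, 1920, 3), dtype=np.uint8)
face = np.full((720, 1280, 3), 200, dtype=np.uint8)
print(compose_picture_in_picture(wide, face).shape)  # (1080, 1920, 3)
```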
Optionally, after the terminal outputs the picture-in-picture image or the target face image, if the microphone array does not detect voice information within a preset time period (denoted the second preset time period), for example because it is the turn of the participants at the remote terminal to speak and the participants at the local terminal remain quiet, the terminal switches from outputting the picture-in-picture image or the target face image to outputting the wide-angle image, and integrates the continuously output wide-angle images into a video that is transmitted to the remote terminal.
It should be noted that the second preset time period may be set according to actual requirements, such as one minute, two minutes, and the like.
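A sketch of the output switch described above, assuming the terminal tracks the time of the last detected voice; the timeout value and function names are illustrative.
```python
def select_output(last_voice_time, now, close_up_frame, wide_frame,
                  silence_timeout_s=60.0):
    """Return the frame to output: keep the picture-in-picture / target face
    frame while people are speaking, and fall back to the wide-angle frame
    once no voice has been detected for the second preset duration."""
    if now - last_voice_time >= silence_timeout_s:
        return wide_frame          # remote side sees the whole room again
    return close_up_frame

# Example: last speech 75 s ago with a 60 s timeout -> wide-angle output.
print(select_output(100.0, 175.0, "close_up", "wide"))  # wide
```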
Optionally, when the terminal determines that there are a plurality of target persons and detects that the second distance between the target positions corresponding to their face images is greater than the preset threshold, it is determined that the target persons are too far apart for the zoom lens to shoot a single target face image; the terminal can then directly output the wide-angle image of the video conference room and integrate the continuously output wide-angle images into a video that is transmitted to the remote terminal.
Therefore, by outputting the wide-angle image, remote participants can know the global situation of the local terminal conference conveniently.
Referring to fig. 5, an interactive tablet 10 is further provided in the embodiment of the present application, including:
the acquisition module 11 is used for acquiring wide-angle images in a video conference room;
a determining module 12 for determining a target position of a target person in the wide-angle image;
and the shooting module 13 is used for controlling the zoom lens to shoot a target face image of the target person according to the target position.
Referring to fig. 6, a computer device is also provided in the embodiments of the present application; the computer device may be a server, and its internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data for the conference portrait shooting program. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the conference portrait shooting method.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the present teachings and is not intended to limit the scope of the present teachings as applied to computer devices.
Furthermore, the present application also proposes a computer-readable storage medium, which includes a program for capturing a conference portrait, and when the program for capturing a conference portrait is executed by a processor, the steps of the method for capturing a conference portrait as described in the above embodiments are implemented. It is to be understood that the computer-readable storage medium in the present embodiment may be a volatile-readable storage medium or a non-volatile-readable storage medium.
In summary, the conference portrait shooting method, interactive tablet, computer device and storage medium provided in the embodiments of the present application collect a wide-angle image in a video conference room, determine a target position of a target person in the wide-angle image, and control a zoom lens to shoot a target face image of the target person according to the target position. Therefore, by combining a wide-angle lens with a zoom lens, a wide-angle image of the video conference is obtained and, at the same time, a clear target face image of the target person can be obtained.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only for the preferred embodiment of the present application and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (10)

1. A conference portrait shooting method, characterized by comprising the following steps:
collecting wide-angle images in a video conference room;
determining a target position of a target person in the wide-angle image;
and controlling a zoom lens to shoot a target face image of the target person according to the target position.
2. The conference portrait shooting method according to claim 1, wherein the step of determining the target position of the target person in the wide-angle image comprises:
determining a first position of the target person in the wide-angle image by using a sound positioning algorithm, identifying a face image of the target person in the wide-angle image, and determining a second position of the face image in the wide-angle image;
and obtaining the target position of the target person in the wide-angle image according to the first position and the second position.
3. The conference portrait shooting method according to claim 2, wherein after the step of recognizing the face image of the target person in the wide-angle image, the method further comprises:
determining a first distance between the target person and a shooting position according to the number of pixel points corresponding to the face image;
and determining the focal length of the zoom lens according to the first distance, wherein the focal length is applied to shooting the target face image.
4. The conference portrait shooting method according to any one of claims 1 to 3, wherein after the step of determining the target position of the target person in the wide-angle image, the method further comprises:
when a plurality of target persons are determined, determining a second distance between target positions corresponding to the plurality of target persons;
judging whether the second distance is smaller than or equal to a preset threshold value;
and if so, executing the step of controlling a zoom lens to shoot the target face image of the target person according to the target position.
5. The conference portrait shooting method according to claim 4, wherein after the step of determining whether the second distance is smaller than a preset threshold value, the method further comprises:
if not, determining a first target person and a second target person according to the speaking time corresponding to each target person;
and controlling the zoom lens to shoot a target face image of the first target person according to the target position corresponding to the first target person, and taking a face image of the second target person in the wide-angle image as a target face image of the second target person.
6. The conference portrait shooting method according to claim 1, wherein after the step of controlling the zoom lens to shoot the target face image of the target person according to the target position, the method further comprises:
generating a picture-in-picture image according to the target face image and the wide-angle image, and outputting the picture-in-picture image;
or outputting the target face image.
7. The conference portrait shooting method according to claim 6, further comprising:
and after the picture-in-picture image or the target face image is output, if the voice information is not detected within a preset time, outputting the wide-angle image.
8. An interactive tablet, comprising:
the acquisition module is used for acquiring wide-angle images in a video conference room;
a determination module for determining a target position of a target person in the wide-angle image;
and the shooting module is used for controlling the zoom lens to shoot the target face image of the target person according to the target position.
9. A computer device, characterized in that the computer device comprises a memory, a processor and a conference portrait shooting program stored in the memory and executable on the processor, wherein the conference portrait shooting program, when executed by the processor, implements the steps of the conference portrait shooting method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a conference portrait shooting program is stored thereon, and the conference portrait shooting program, when executed by a processor, implements the steps of the conference portrait shooting method according to any one of claims 1 to 7.
CN202010948507.5A 2020-09-10 2020-09-10 Conference portrait shooting method, interactive tablet, computer equipment and storage medium Active CN112073613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010948507.5A CN112073613B (en) 2020-09-10 2020-09-10 Conference portrait shooting method, interactive tablet, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010948507.5A CN112073613B (en) 2020-09-10 2020-09-10 Conference portrait shooting method, interactive tablet, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112073613A true CN112073613A (en) 2020-12-11
CN112073613B CN112073613B (en) 2021-11-23

Family

ID=73663593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010948507.5A Active CN112073613B (en) 2020-09-10 2020-09-10 Conference portrait shooting method, interactive tablet, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112073613B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120169895A1 (en) * 2010-03-24 2012-07-05 Industrial Technology Research Institute Method and apparatus for capturing facial expressions
CN103327250A (en) * 2013-06-24 2013-09-25 深圳锐取信息技术股份有限公司 Method for controlling camera lens based on pattern recognition
US20180082439A1 (en) * 2016-09-20 2018-03-22 Kabushiki Kaisha Toshiba Image collation system and image collation method
CN106650671A (en) * 2016-12-27 2017-05-10 深圳英飞拓科技股份有限公司 Human face identification method, apparatus and system
KR101899318B1 (en) * 2017-04-24 2018-10-29 주식회사 이엠따블유 Hierarchical face object search method and face recognition method using the same, hierarchical face object search system and face recognition system using the same
CN108933915A (en) * 2017-05-26 2018-12-04 和硕联合科技股份有限公司 Video conference device and video conference management method
CN108682032A (en) * 2018-04-02 2018-10-19 广州视源电子科技股份有限公司 Method and device for controlling video image output, readable storage medium and terminal
CN108377342A (en) * 2018-05-22 2018-08-07 Oppo广东移动通信有限公司 double-camera photographing method, device, storage medium and terminal
CN109257559A (en) * 2018-09-28 2019-01-22 苏州科达科技股份有限公司 A kind of image display method, device and the video conferencing system of panoramic video meeting
CN110113515A (en) * 2019-05-13 2019-08-09 Oppo广东移动通信有限公司 Camera control method and Related product
CN111163281A (en) * 2020-01-09 2020-05-15 北京中电慧声科技有限公司 Panoramic video recording method and device based on voice tracking
CN111263106A (en) * 2020-02-25 2020-06-09 厦门亿联网络技术股份有限公司 Picture tracking method and device for video conference
CN111586341A (en) * 2020-05-20 2020-08-25 深圳随锐云网科技有限公司 Shooting method and picture display method of video conference shooting device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907617A (en) * 2021-01-29 2021-06-04 深圳壹秘科技有限公司 Video processing method and device
WO2022160748A1 (en) * 2021-01-29 2022-08-04 深圳壹秘科技有限公司 Video processing method and apparatus
CN112907617B (en) * 2021-01-29 2024-02-20 深圳壹秘科技有限公司 Video processing method and device
CN113139422A (en) * 2021-03-02 2021-07-20 广州朗国电子科技有限公司 Conference portrait shooting verification method, equipment and storage medium
CN113139422B (en) * 2021-03-02 2023-12-15 广州朗国电子科技股份有限公司 Shooting verification method, equipment and storage medium for conference portrait
CN113473061A (en) * 2021-06-10 2021-10-01 荣耀终端有限公司 Video call method and electronic equipment
CN113473061B (en) * 2021-06-10 2022-08-12 荣耀终端有限公司 Video call method and electronic equipment
CN114040145A (en) * 2021-11-20 2022-02-11 深圳市音络科技有限公司 Video conference portrait display method, system, terminal and storage medium
WO2024087641A1 (en) * 2022-10-27 2024-05-02 深圳奥尼电子股份有限公司 Audio and video control method with intelligent wireless microphone tracking function
CN115529435A (en) * 2022-11-29 2022-12-27 长沙朗源电子科技有限公司 High-definition conference picture wireless transmission method, system, equipment and storage medium
WO2024145878A1 (en) * 2023-01-05 2024-07-11 广州视源电子科技股份有限公司 Video processing method and apparatus, device, and storage medium
CN116567385A (en) * 2023-06-14 2023-08-08 深圳市宗匠科技有限公司 Image acquisition method and image acquisition device

Also Published As

Publication number Publication date
CN112073613B (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN112073613B (en) Conference portrait shooting method, interactive tablet, computer equipment and storage medium
CN109754811B (en) Sound source tracking method, device, equipment and storage medium based on biological characteristics
TWI311286B (en)
US20210110188A1 (en) Stereo imaging device
KR100729280B1 (en) Iris Identification System and Method using Mobile Device with Stereo Camera
JP4639869B2 (en) Imaging apparatus and timer photographing method
CN101785306B (en) Method and system for automatic camera control
KR101831973B1 (en) Iris imaging apparatus and methods for configuring an iris imaging apparatus
US10614293B2 (en) Facial recognition apparatus, recognition method and program therefor, and information device
CN111263106B (en) Picture tracking method and device for video conference
CN104243800B (en) Control device and storage medium
CN109657576B (en) Image acquisition control method, device, storage medium and system
KR101530255B1 (en) Cctv system having auto tracking function of moving target
JP5477777B2 (en) Image acquisition device
JP2021114716A (en) Imaging apparatus
CN116665111A (en) Attention analysis method, system and storage medium based on video conference system
US8295605B2 (en) Method for identifying dimensions of shot subject
CN112839165A (en) Method and device for realizing face tracking camera shooting, computer equipment and storage medium
JP2023057090A (en) Photographing control system
CN116614598A (en) Video conference picture adjusting method, device, electronic equipment and medium
CN111062313A (en) Image identification method, image identification device, monitoring system and storage medium
Fiala et al. A panoramic video and acoustic beamforming sensor for videoconferencing
CN111918127B (en) Video clipping method and device, computer readable storage medium and camera
US20120188437A1 (en) Electronic camera
JP7363971B2 (en) Image processing device and image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant