WO2018209879A1

WO2018209879A1 - Method and device for automatically selecting camera image, and audio and video system

Info

Publication number: WO2018209879A1
Application number: PCT/CN2017/104657
Authority: WO
Inventors: 陈双龙
Original assignee: 广州视源电子科技股份有限公司; 广州视臻信息科技有限公司
Priority date: 2017-05-16
Filing date: 2017-09-29
Publication date: 2018-11-22
Also published as: CN107277427A

Abstract

Disclosed is a method for automatically selecting a camera image, comprising: obtaining image data of each camera; performing face detection on the image data of each camera to obtain face information in the image data of each camera; obtaining a first priority of each camera according to the face information; obtaining the location of a sound source by means of each microphone to obtain the distance between each camera and the sound source; obtaining a second priority of each camera according to the distance between each camera and the sound source; obtaining a final priority according to the first priority and the second priority of each camera; and outputting the image data of the camera having the highest final priority for display. Correspondingly, the present invention further provides a device for automatically selecting a camera image, an audio and video system and a smart tablet. According to the present invention, during audio and video conferences, the images photographed by the cameras are automatically selected for display, and thus the user experience is improved.

Description

Method, device and audio and video system for automatically selecting camera screen

Technical field

The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for automatically selecting a camera screen, an audio and video system, and a smart tablet.

Background art

With the development of technology, large-sized smart tablets (for example, 50 inches or more) are increasingly used in education, conferencing and other fields. In the conferencing field, some brands of smart tablets are configured with multiple cameras. As shown in FIG. 1, one camera 201 is installed on each of the left and right sides of the smart tablet. During a remote conference, image data for the audio and video conference is usually acquired either by a fixed camera (left or right) or by a camera that the user selects manually.

In particular, in a multi-person multi-party remote audio and video conference, the participants in the conference may be in different positions of the conference room, and if only a single camera is selected, the image data of all participants may not be acquired. In general, people manually switch cameras according to different conference situations to select corresponding conference scenes to meet people's needs for different conference display screens. The existing method of switching cameras requires manual intervention and the user experience is poor.

Summary of the invention

An object of the embodiments of the present invention is to provide a method and apparatus for automatically selecting a camera screen, and an audio and video system, with which a camera screen can be selected automatically during an audio and video conference, so that the user can better follow the conference and the user experience is improved.

To achieve the above objective, an embodiment of the present invention provides a method for automatically selecting a camera screen, including:

Obtaining image data of each camera, wherein the acquired image data is the screen image captured by each camera in real time;

Performing face detection on the image data of each camera to obtain face information in the image data of each camera;

Acquiring the first priority of each camera according to the face information obtained from the image data of that camera;

Determining, by each microphone, a position of a sound source corresponding to the microphone, thereby obtaining a distance between each of the cameras and the sound source;

Obtaining a second priority of each of the cameras according to a distance between each camera and the sound source position;

Obtaining a final priority of each camera according to the first priority and the second priority of each camera;

The image data of the camera with the highest final priority is output for display.

Compared with the prior art, the method for automatically selecting a camera screen disclosed in the embodiment of the present invention first acquires image data of each camera; then performs face detection on the image data to obtain face information and obtains a first priority of each camera according to the face information; then determines the position of the sound source by means of each microphone and obtains a second priority of each camera according to the distance between that camera and the sound source position; and finally obtains the final priority of each camera according to its first priority and second priority, and outputs the image data of the camera with the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects the image captured by a camera for display based on face detection and the sound source position, thereby switching the camera screen intelligently, automatically and in real time, so that users can better follow the conference and the user experience is improved.

Preferably, the obtaining the final priority of each camera according to the first priority and the second priority of each camera specifically includes:

For each of the cameras, multiplying the first priority by a preset first weight to obtain a first product value, and multiplying the second priority by a preset second weight to obtain a second product value;

A final priority is obtained based on a sum of the first product value and the second product value.

As a preferred solution of the embodiment of the present invention, the first priority and the second priority are each multiplied by a preset weight and then summed to obtain the final priority, so that the face detection result and the sound source localization result are both taken into account. In addition, by adjusting the first weight and the second weight, the preferred solution can be adapted to the user's requirements for the conference display screen, giving it strong adaptability and a good user experience.
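The weighted combination can be written compactly as follows. The symbols are introduced here only for illustration and do not appear in the original text: $P_i^{(1)}$ and $P_i^{(2)}$ denote the first and second priority of camera $i$, $w_1$ and $w_2$ the preset first and second weights, and $i^{*}$ the camera whose image is output for display.

```latex
P_i = w_1 \, P_i^{(1)} + w_2 \, P_i^{(2)}, \qquad i^{*} = \arg\max_{i} P_i
```

Increasing $w_1$ relative to $w_2$ favors the face detection result, and the reverse favors the sound source localization result.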

Further, acquiring the second priority of each camera according to the distance between each camera and the sound source includes:

Sorting the cameras according to the distance between each camera and the sound source, wherein the smaller a camera's distance from the sound source, the higher the second priority it obtains.

As a further aspect of the embodiment of the present invention, the closer a camera is to the sound source, the higher its second priority, so that during a remote audio and video conference the screen showing the person who is speaking is more likely to be displayed.

Further, the face information includes a number of faces, a face area, and a position of the face in the image.

Further, obtaining the first priority of each camera according to the face information obtained from the image data of that camera includes:

Obtaining, for the image data of each camera, a face number score, a face area score, and a position score of the face in the image according to the number of faces, the face area, and the position of the face in the image, respectively;

Acquiring the first priority of each camera according to the sum of the face number score, the face area score, and the position score of the face in the image for the image data of that camera.

An embodiment of the present invention further provides an apparatus for automatically selecting a camera screen, including:

An image data acquiring unit, configured to acquire image data of each camera; wherein the acquired image data is a real-time captured screen image of each camera;

a face information acquiring unit, configured to perform face detection on the image data of each camera to obtain face information in the image data of each camera;

a first priority acquiring unit, configured to acquire a first priority of each camera according to the face information obtained from the image data of that camera;

a sound source distance obtaining unit, configured to determine, by each microphone, a position of a sound source corresponding to the microphone, so as to obtain a distance between each of the cameras and the sound source;

a second priority acquiring unit, configured to acquire a second priority of each of the cameras according to a distance between each camera and the sound source position;

a final priority acquiring unit, configured to acquire a final priority of each camera according to the first priority and the second priority of each camera;

And an output unit, configured to output image data of the camera with the highest final priority for display.

Compared with the prior art, in the apparatus for automatically selecting a camera screen according to an embodiment of the present invention, the image data acquiring unit first acquires image data of each camera; the face information acquiring unit then performs face detection on the image data to obtain face information, and the first priority acquiring unit acquires the first priority of each camera according to the face information; next, the sound source distance obtaining unit determines the position of the sound source by means of each microphone, and the second priority acquiring unit acquires the second priority of each camera according to the distance between that camera and the sound source position; finally, the final priority acquiring unit acquires the final priority of each camera according to its first priority and second priority, and the output unit outputs the image data of the camera with the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects the image captured by a camera for display based on face detection and the sound source position, thereby switching the camera screen intelligently, automatically and in real time, so that users can better follow the conference and the user experience is improved.

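Further, the final priority acquiring unit is specifically configured to: for each camera, multiply the first priority by a preset first weight to obtain a first product value, multiply the second priority by a preset second weight to obtain a second product value, and obtain the final priority based on the sum of the first product value and the second product value.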

Further, the second priority acquiring unit is specifically configured to sort the cameras according to the distance between each camera and the sound source position and to acquire a second priority of each camera, wherein the smaller the distance from the sound source, the higher the second priority obtained by the corresponding camera.

Correspondingly, the embodiment of the present invention further provides an audio and video system, including an apparatus for automatically selecting a camera screen, which is provided by the embodiment of the present invention, and further includes:

At least two cameras, respectively mounted on the left and right sides or the upper and lower sides of the smart tablet, for capturing screen images in real time;

a microphone for receiving a sound source and determining the location of the received sound source.

Compared with the prior art, in the audio and video system provided by an embodiment of the present invention, on the one hand, the cameras capture screen images in real time, and the microphone receives a sound source and determines its position; on the other hand, the apparatus for automatically selecting a camera screen disclosed in the embodiment of the invention acquires the image data captured by the cameras in real time, performs face detection on the image data to obtain face information and thereby a first priority, obtains a second priority based on the distance between each camera and the sound source position determined by the microphone, obtains a final priority of each camera according to its first priority and second priority, and outputs the image data of the camera with the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects the image captured by a camera for display based on face detection and the sound source position, thereby switching the camera screen intelligently, automatically and in real time, so that users can better follow the conference and the user experience is improved.

Correspondingly, the embodiment of the invention further provides a smart tablet, which comprises an audio and video system provided by an embodiment of the invention.

Compared with the prior art, in the smart tablet provided by an embodiment of the present invention, on the one hand, the cameras capture screen images in real time, and the microphone receives a sound source and determines its position; on the other hand, the apparatus for automatically selecting a camera screen disclosed in the embodiment acquires the image data captured by the cameras in real time, performs face detection on the image data to obtain face information and thereby a first priority, obtains a second priority based on the distance between each camera and the sound source position determined by the microphone, obtains a final priority of each camera according to its first priority and second priority, and outputs the image data of the camera with the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects the image captured by a camera for display based on face detection and the sound source position, thereby switching the camera screen intelligently, automatically and in real time, so that users can better follow the conference and the user experience is improved.

Description of the drawings

FIG. 1 is a schematic structural view of a large-sized smart tablet configured with two cameras;

FIG. 2 is a schematic flow chart of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention;

FIG. 3 is a schematic flowchart of step S3 of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for automatically selecting a camera screen according to Embodiment 2 of the present invention;

FIG. 5 is a schematic structural diagram of an audio and video system according to Embodiment 3 of the present invention;

FIG. 6 is a schematic structural diagram of a smart tablet according to Embodiment 4 of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention, including:

S1. Acquire image data of each camera, wherein the acquired image data is the screen image captured by each camera in real time.

S2. Perform face detection on the image data of each camera to obtain face information in the image data of each camera.

S3. Acquire the first priority of each camera according to the face information obtained from the image data of that camera.

S4. Determine, by using each microphone, a position of a sound source corresponding to the microphone, so as to obtain a distance between each camera and the sound source.

S5. Obtain a second priority of each camera according to the distance between each camera and the sound source position.

S6. Acquire a final priority of each camera according to the first priority and the second priority of each camera.

S7. Output image data of the camera with the highest final priority for display.

In step S1, the image data captured in real time corresponds one-to-one to the cameras.

Further, the specific calculation process of obtaining the final priority of each camera in step S6 is:

For each camera, multiplying the first priority by the preset first weight to obtain the first product value, and multiplying the second priority by the preset second weight to obtain a second product value;

The final priority is obtained based on the sum of the first product value and the second product value.

In step S6 of this embodiment, the first priority obtained from face detection and the second priority obtained from sound source localization are each multiplied by the corresponding weight and then added. By adjusting the preset first weight and second weight, the relative contributions of the first priority and the second priority can be tuned to meet the actual needs of a remote conference. Besides the weighted-sum approach described above, the final priority may also be determined, according to demand, by treating the second priority determined from the sound source position as the primary consideration and the first priority obtained from face detection as the secondary consideration; that is, the display shows the speaker in real time, and when no one is speaking the picture is selected based on the face detection result. Such embodiments are also within the scope of the invention.
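The weighted-sum form of step S6 followed by the selection of step S7 can be sketched in a few lines. This is only an illustration under stated assumptions: the camera names, the example priority values, the weights 0.4 and 0.6, and the function names `final_priorities` and `select_camera` are all hypothetical and not taken from the patent.

```python
from typing import Dict


def final_priorities(first: Dict[str, float], second: Dict[str, float],
                     w1: float = 0.5, w2: float = 0.5) -> Dict[str, float]:
    """Step S6 (sketch): weighted sum of first and second priority per camera."""
    return {cam: w1 * first[cam] + w2 * second[cam] for cam in first}


def select_camera(final: Dict[str, float]) -> str:
    """Step S7 (sketch): pick the camera with the highest final priority."""
    return max(final, key=final.get)


# Hypothetical priorities for a two-camera smart tablet.
first = {"left": 3.0, "right": 1.0}    # from face detection
second = {"left": 1.0, "right": 2.0}   # from sound source distance
final = final_priorities(first, second, w1=0.4, w2=0.6)
selected = select_camera(final)
```

With these illustrative numbers the left camera is selected. If two cameras tie, `max` returns the first one encountered; the source does not say how ties should be resolved, so that behavior is an artifact of this sketch.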

Step S5 specifically includes:

The cameras are sorted according to the distance between each camera and the sound source, and the smaller a camera's distance from the sound source, the higher the second priority it obtains.

Through step S5, a camera closer to the sound source is given a higher second priority, so that its screen is more likely to be selected for display.
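One simple way to realize the ranking in step S5 is to sort the cameras from farthest to nearest and use the position in that ordering as the priority. The source only requires that a smaller distance yield a higher second priority; the integer scale 1..N and the function name `second_priorities` are assumptions made for this sketch.

```python
from typing import Dict


def second_priorities(distances: Dict[str, float]) -> Dict[str, int]:
    """Rank cameras by distance to the sound source.

    The farthest camera gets priority 1 and the nearest gets priority N,
    so a smaller distance always maps to a higher second priority.
    """
    farthest_first = sorted(distances, key=distances.get, reverse=True)
    return {cam: rank + 1 for rank, cam in enumerate(farthest_first)}


# Hypothetical distances in meters.
print(second_priorities({"left": 2.5, "right": 0.8}))  # {'left': 1, 'right': 2}
```

A rank-based priority ignores how much closer one camera is than another; a value that varies continuously with distance would also satisfy the description.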

When face detection is performed on the image of a camera in step S2, each face is represented by a rectangular face frame, and the number of faces, the face area, and the position of the face in the image are determined based on the rectangular frames. The specific face detection method can be obtained by those skilled in the art from the prior art and is therefore not described here.
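Given the rectangular face frames produced by any face detector, the three pieces of face information follow directly. The sketch below assumes each frame is an `(x, y, width, height)` tuple in pixels, which is a common convention but is not specified in the source; `FaceInfo` and `face_info_from_boxes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed frame layout: (x, y, width, height) of the rectangle, in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class FaceInfo:
    count: int                          # number of faces
    areas: List[int]                    # area of each face frame
    centers: List[Tuple[float, float]]  # center of each face frame


def face_info_from_boxes(boxes: List[Box]) -> FaceInfo:
    """Derive face count, areas and positions from rectangular face frames."""
    areas = [w * h for (_, _, w, h) in boxes]
    centers = [(x + w / 2, y + h / 2) for (x, y, w, h) in boxes]
    return FaceInfo(count=len(boxes), areas=areas, centers=centers)


info = face_info_from_boxes([(10, 20, 100, 50), (200, 40, 60, 60)])
```

The detector itself is deliberately left out, in line with the text, which defers to existing face detection methods.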

The face information in the image data of each camera used in step S3 includes the number of faces, the face area, and the position of the face in the image. Correspondingly, referring to FIG. 3, FIG. 3 is a schematic flowchart of step S3 of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention. Step S3 specifically includes the following steps:

S31. Obtain, for the image data of each camera, a face number score, a face area score, and a position score of the face in the image according to the number of faces, the face area, and the position of the face in the image, respectively.

Preferably, the position of the face in the image is expressed as the distance between the center of the face and the center of the current image; the smaller the distance, the closer the face is to the center of the camera image. Generally, so that the display better shows the meeting scene, the scores are set as follows: the more faces there are, the higher the face number score; the larger the face area, the higher the face area score; and the closer the face is to the center of the camera image, the higher the position score of the face in the image.

Further preferably, to accommodate different emphases on the number of faces, the face area, and the position of the face in the image when displaying the conference screen, a corresponding weight may be set for each of the face number score, the face area score, and the position score when these scores are determined. Adjusting the weights changes the contribution of each score and thus affects which camera's picture is finally selected for display.

S32. Acquire the first priority of each camera according to the sum of the face number score, the face area score, and the position score of the face in the image for the image data of that camera.
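Steps S31 and S32 can be illustrated as follows. The source fixes only the direction of each score (more faces, larger area, and a position closer to the image center all score higher) and the fact that the scores are summed; the concrete scoring formulas below, the normalization, and the name `first_priority` are assumptions chosen to keep the sketch short.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x, y, width, height)


def first_priority(boxes: List[Box], img_w: float, img_h: float,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Sum of face number, face area and position scores (illustrative formulas)."""
    if not boxes:
        return 0.0  # assumption: no faces means the lowest first priority
    # Face number score: simply the number of faces.
    count_score = float(len(boxes))
    # Face area score: fraction of the image covered by face frames.
    area_score = sum(w * h for _, _, w, h in boxes) / (img_w * img_h)
    # Position score: 1 at the image center, falling to 0 at a corner.
    cx, cy = img_w / 2, img_h / 2
    half_diag = math.hypot(cx, cy)
    mean_dist = sum(math.hypot(x + w / 2 - cx, y + h / 2 - cy)
                    for x, y, w, h in boxes) / len(boxes)
    position_score = 1.0 - mean_dist / half_diag
    w_n, w_a, w_p = weights
    return w_n * count_score + w_a * area_score + w_p * position_score


centered = first_priority([(40, 40, 20, 20)], 100, 100)
corner = first_priority([(0, 0, 20, 20)], 100, 100)
```

Here a face at the image center scores higher than an equally sized face in a corner, and the optional `weights` tuple plays the role of the per-score weights mentioned in the text.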

In a specific implementation, this embodiment first acquires the image data captured by each camera in real time and then performs face detection on it, detecting the number of faces, the face area, and the position of the face in each image, and from these obtains the face number score, the face area score, and the position score of the face in the image. The sum of these three scores for the image of each camera is used as the first priority of that camera. Then the position of the sound source received by the microphone is acquired, giving the distance between the sound source position and each camera, and a corresponding second priority is obtained according to that distance, wherein the smaller the distance from the sound source position, the higher the second priority of the camera. Finally, for each camera, the first priority and the second priority are each multiplied by the corresponding weight and then summed to obtain the final priority, and the image data of the camera with the highest final priority is transmitted to the display screen.

In this embodiment, obtaining the first priority from the face information first and then obtaining the second priority from the distance between the sound source and the camera is only one example of the order of operations. Embodiments in which the order of these two steps is reversed, or in which the two steps are performed in parallel, are also within the scope of protection of this embodiment.
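Because the two priorities do not depend on each other, they can be computed concurrently. The following sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the two callables passed in are hypothetical placeholders for the face-based and sound-based computations, and `compute_both` is not a name from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Priorities = Dict[str, float]


def compute_both(face_fn: Callable[[], Priorities],
                 sound_fn: Callable[[], Priorities]) -> Tuple[Priorities, Priorities]:
    """Run the first-priority and second-priority computations in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(face_fn)    # face detection branch
        second_future = pool.submit(sound_fn)  # sound source branch
        # result() blocks until each branch has finished.
        return first_future.result(), second_future.result()


first, second = compute_both(lambda: {"left": 2.0, "right": 1.0},
                             lambda: {"left": 1.0, "right": 2.0})
```

In CPython, threads mainly help when the branches wait on I/O or call into native libraries that release the interpreter lock; whether parallel execution pays off in practice depends on the actual detector and microphone processing.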

Compared with the prior art, this embodiment selects which camera's captured image to display based on the results of face detection and sound source localization. It thereby switches the camera screen intelligently and automatically, meets the display requirements of remote audio and video conferences, reduces manual operation, and improves the user experience.

Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an apparatus for automatically selecting a camera screen according to Embodiment 2 of the present invention. The embodiment specifically includes the following structure:

The image data acquiring unit 11 is configured to acquire image data of each camera; wherein the acquired image data is a real-time captured screen image of each camera;

The face information acquiring unit 12 is configured to perform face detection on image data of each camera to obtain face information in image data of each camera;

The first priority acquiring unit 13 is configured to acquire a first priority of each camera according to the face information obtained from the image data of that camera;

The sound source distance obtaining unit 14 is configured to determine, by each microphone, a position of the sound source corresponding to the microphone, so as to obtain a distance between each camera and the sound source;

The second priority acquiring unit 15 is configured to acquire a second priority of each camera according to the distance between each camera and the sound source position;

The final priority obtaining unit 16 is configured to obtain a final priority of each camera according to the first priority and the second priority of each camera;

The output unit 17 is configured to output image data of the camera with the highest final priority for display.
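The unit structure above maps naturally onto a small class whose fields stand in for units 11 to 15 and whose method performs the work of units 16 and 17. This is a structural sketch only: the class name `CameraSelector`, the callable signatures, and the default weights are assumptions, and each callable is a placeholder to be supplied by a real implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class CameraSelector:
    """Hypothetical wiring of units 11 to 17 of Embodiment 2."""
    acquire_images: Callable[[], Dict[str, Any]]                     # unit 11
    detect_faces: Callable[[Any], Any]                               # unit 12
    first_priority: Callable[[Any], float]                           # unit 13
    source_distances: Callable[[], Dict[str, float]]                 # unit 14
    second_priority: Callable[[Dict[str, float]], Dict[str, float]]  # unit 15
    w1: float = 0.5  # preset first weight (assumed default)
    w2: float = 0.5  # preset second weight (assumed default)

    def select(self) -> Tuple[str, Any]:
        images = self.acquire_images()
        first = {cam: self.first_priority(self.detect_faces(img))
                 for cam, img in images.items()}
        second = self.second_priority(self.source_distances())
        # Unit 16: weighted sum of the two priorities.
        final = {cam: self.w1 * first[cam] + self.w2 * second[cam]
                 for cam in images}
        # Unit 17: output the image of the camera with the highest final priority.
        best = max(final, key=final.get)
        return best, images[best]
```

Keeping each unit as an injected callable mirrors the separation of units in the description and lets a detector or localization method be swapped without touching the selection logic.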

Specifically, the final priority obtaining unit 16 is configured to:

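For each camera, multiply the first priority by the preset first weight to obtain a first product value, and multiply the second priority by the preset second weight to obtain a second product value; the final priority is then obtained based on the sum of the first product value and the second product value.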

The final priority acquiring unit 16 of this embodiment multiplies the first priority obtained from face detection and the second priority obtained from sound source localization by the corresponding weights and then adds them. By adjusting the preset first weight and second weight, the relative contributions of the first priority and the second priority can be tuned to meet the actual requirements of a remote conference. Besides the weighted-sum approach described above, the final priority may also be determined, according to demand, by treating the second priority determined from the sound source position as the primary consideration and the first priority obtained from face detection as the secondary consideration; that is, the display shows the speaker in real time, and when no one is speaking the picture is selected based on the face detection result. Such embodiments are also within the scope of the invention.
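The alternative in which the sound source is the primary consideration and face detection the secondary one can be sketched as a lexicographic comparison with a fallback. Representing "no one is speaking" as a missing or empty second-priority mapping is an assumption of this sketch, as is the function name `select_camera_sound_first`.

```python
from typing import Dict, Optional


def select_camera_sound_first(first: Dict[str, float],
                              second: Optional[Dict[str, float]]) -> str:
    """Prefer the sound-based priority; use the face-based one as secondary.

    `second` is None or empty when no sound source was located
    (assumed convention for "no one speaks").
    """
    if second:
        # Compare by second priority first, then by first priority on ties.
        return max(second, key=lambda cam: (second[cam], first.get(cam, 0.0)))
    # Nobody is speaking: fall back to the face detection result alone.
    return max(first, key=first.get)


faces = {"left": 3.0, "right": 1.0}
speaking = select_camera_sound_first(faces, {"left": 1.0, "right": 2.0})
silent = select_camera_sound_first(faces, None)
```

With these illustrative values the right camera is shown while someone near it speaks, and the left camera, which sees more faces, is shown when the room is silent.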

The second priority acquiring unit 15 is specifically configured to sort the cameras according to the distance between each camera and the sound source position and to obtain the second priority of each camera, wherein the smaller the distance from the sound source, the higher the second priority obtained by the camera.

Because the second priority acquiring unit 15 gives a higher second priority to a camera closer to the sound source, the screen of that camera is more likely to be selected for display.

When the face information acquiring unit 12 performs face detection on the image of a camera, each face is represented by a rectangular face frame, and the number of faces, the face area, and the position of the face in the image are determined based on the rectangular frames. The specific face detection method can be obtained by those skilled in the art from the prior art and is therefore not described here.

The face information in the image data of each camera acquired by the face information acquiring unit 12 includes the number of faces, the face area, and the position of the face in the image.

The first priority acquiring unit 13 is specifically configured to:

Obtain, for the image data of each camera, a face number score, a face area score, and a position score of the face in the image according to the number of faces, the face area, and the position of the face in the image, respectively.

Further preferably, to accommodate different emphases on the number of faces, the face area, and the position of the face in the image when displaying the conference screen, a corresponding weight may be set for each of the face number score, the face area score, and the position score when these scores are determined. Adjusting the weights changes the contribution of each score and thus affects which camera's screen is finally selected for display.

Obtain the first priority of each camera according to the sum of the face number score, the face area score, and the position score of the face in the image for the image data of that camera.

In a specific implementation, the image data acquiring unit 11 acquires the image data captured by each camera in real time. The face information acquiring unit 12 then performs face detection on the image data captured by each camera, detecting the number of faces, the face area, and the position of the face in each image, and the first priority acquiring unit 13 obtains the face number score, the face area score, and the position score of the face in the image. The sum of these three scores for the image of each camera is taken as the first priority of that camera. Then the sound source distance obtaining unit 14 acquires the position of the sound source received by the microphone, giving the distance between the sound source position and each camera, and the second priority acquiring unit 15 obtains a corresponding second priority according to that distance, wherein the smaller the distance from the sound source position, the higher the second priority of the camera. Finally, for each camera, the final priority obtaining unit 16 multiplies the first priority and the second priority by the corresponding weights and sums them to obtain the final priority, and the output unit 17 outputs the image data of the camera with the highest final priority for display.

In Embodiment 2, obtaining the first priority from the face information first and then obtaining the second priority from the distance between the sound source and the camera is only one example of the order of operations. Embodiments in which the order of these two steps is reversed, or in which the two steps are performed in parallel, are also within the scope of protection of this embodiment.

Embodiment 3 of the present invention further provides an audio and video system. Referring to FIG. 5, FIG. 5 is a schematic structural diagram of Embodiment 3 of the present invention. Embodiment 3 includes the apparatus 1 for automatically selecting a camera screen provided by Embodiment 2 of the present invention; for details, see the description of the apparatus in Embodiment 2, which is not repeated here. In addition, Embodiment 3 further includes the following structure:

Two cameras 201, respectively mounted on the left and right sides or the upper and lower sides of the smart tablet, for capturing screen images in real time. Preferably, the two cameras 201 are mounted on the left and right sides of the smart tablet, as shown in FIG. 1. The installation position of the two cameras 201 shown in FIG. 1 is only one embodiment; embodiments that, based on the principle of the embodiments of the present invention, merely change the installation position of the cameras 201 on the smart tablet or increase the number of cameras 201 are also within the scope of the present invention.

The microphone 202 is configured to receive a sound source and determine the position of the received sound source.
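Once the microphone has reported a sound source position, the distance from each camera follows from plain geometry. The sketch below assumes that camera and sound source positions are available as 2D coordinates in a shared frame (for example, meters in the plane of the room); the source does not specify a coordinate system, and `camera_distances` is a hypothetical name. How the microphone localizes the source is outside the scope of this sketch.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # assumed 2D coordinates in a shared frame


def camera_distances(cameras: Dict[str, Point], source: Point) -> Dict[str, float]:
    """Euclidean distance from each camera to the located sound source."""
    return {name: math.dist(pos, source) for name, pos in cameras.items()}


# Hypothetical layout: cameras 4 m apart, speaker 3 m in front of the right one.
distances = camera_distances({"left": (0.0, 0.0), "right": (4.0, 0.0)},
                             source=(4.0, 3.0))
```

In this layout the right camera is 3 m from the speaker and the left camera 5 m, so the right camera would receive the higher second priority.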

In a specific implementation, the image data acquiring unit 11 first acquires the image data of each camera 201, wherein the image data is the screen image captured by the camera 201 in real time. The face information acquiring unit 12 then performs face detection on the image data captured by each camera 201, detecting the number of faces, the face area, and the position of the face in each image, and the first priority acquiring unit 13 obtains the face number score, the face area score, and the position score of the face in the image, using the sum of these three scores for the image of each camera 201 as the first priority of that camera 201. Then the sound source distance obtaining unit 14 obtains the position of the sound source, giving the distance between the sound source position and each camera 201, wherein the position of the sound source is determined by the microphone 202, which receives the sound source. The second priority acquiring unit 15 then obtains a corresponding second priority according to the distance between each camera 201 and the sound source position, wherein the smaller the distance from the sound source position, the higher the second priority of the camera 201. Finally, for each camera 201, the final priority obtaining unit 16 multiplies the first priority and the second priority by the corresponding weights and sums them to obtain the final priority, and the output unit 17 transmits the image data of the camera 201 with the highest final priority to the display screen.

Compared with the prior art, the audio and video system of the present embodiment can select which camera's captured image to display based on the result of face detection and the sound source position, and thus switches the camera image intelligently and autonomously. This satisfies the display requirements of a remote audio and video conference, reduces manual operation, makes the system more intelligent and automated, and improves the user experience.

The fourth embodiment of the present invention provides a smart tablet. Referring to FIG. 6, FIG. 6 is a schematic structural diagram of Embodiment 4 of the present invention. The smart tablet of the fourth embodiment includes the audio and video system according to Embodiment 3 of the present invention. The details of the audio and video system according to the third embodiment are not repeated here.

In this embodiment, the image data of each camera 201 is first acquired by the image data acquiring unit 11, the image data being the image captured by that camera 201 in real time. The face information acquiring unit 12 then performs face detection on the image data captured by each camera 201, detecting the number of faces in each image, the face area, and the position of the face in the image. The first priority acquiring unit 13 then obtains a face number score, a face area score, and a position score of the face in the image, and takes the sum of these three scores for the image of each camera 201 as the first priority of that camera 201. Next, the sound source distance acquiring unit 14 obtains the position of the sound source, and from it the distance between the sound source and each camera 201; the position of the sound source is determined by the microphone 202 from the sound it receives. The second priority acquiring unit 15 then obtains a second priority for each camera 201 according to its distance from the sound source position: the smaller the distance, the higher the second priority of the camera 201. Finally, for each camera 201, the final priority acquiring unit 16 multiplies the first priority and the second priority by their corresponding weight values and sums the products to obtain the final priority, and the output unit 17 transmits the image data of the camera 201 having the highest final priority to the display screen.
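The distance-based second priority can likewise be sketched in Python. This is a hedged example rather than the disclosed implementation: it assumes that camera and sound source positions are available as 2D coordinates in a common frame, uses Euclidean distance, and assigns a rank-based priority in which the nearest camera receives the largest value. The function name and the coordinates are hypothetical.

```python
import math


def second_priorities(camera_positions, source_position):
    """Map each camera id to a rank-based second priority.

    camera_positions: dict of camera id -> (x, y) in an assumed common frame.
    source_position: (x, y) of the sound source located via the microphone.
    """
    # Distance between each camera and the sound source.
    distances = {
        cam: math.dist(pos, source_position)
        for cam, pos in camera_positions.items()
    }
    # Sort from farthest to nearest so the nearest camera gets the largest rank.
    ordered = sorted(distances, key=distances.get, reverse=True)
    return {cam: rank for rank, cam in enumerate(ordered, start=1)}


# Example with made-up coordinates (metres): the speaker stands to the right.
cameras = {"left": (-0.6, 0.0), "right": (0.6, 0.0)}
priorities = second_priorities(cameras, source_position=(1.0, 2.0))
```

Here the right camera is closer to the sound source, so it receives the higher second priority. Any monotonically decreasing function of distance would satisfy the same requirement; ranking is used only because the cameras are described as being sorted by distance.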

Compared with the prior art, the smart tablet of this embodiment can select which camera's captured image to display based on the result of face detection and the sound source position, and thus switches the camera image intelligently and autonomously. This meets the display requirements of a remote audio and video conference, reduces manual operation, makes the smart tablet more intelligent and automated, and improves the user experience.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements are also regarded as falling within the scope of protection of the present invention.

Claims

A method for automatically selecting a camera screen, comprising:

Obtaining image data of each camera; wherein the acquired image data is a screen image captured by each camera in real time;

Performing face detection on the image data of each camera to obtain face information in the image data of each camera;

Acquiring a first priority of each camera according to the face information obtained from the image data of that camera;

Determining, by each microphone, a position of a sound source corresponding to the microphone, thereby obtaining a distance between each of the cameras and the sound source;

Obtaining a second priority of each of the cameras according to a distance between each camera and the sound source position;

Obtaining a final priority of each camera according to the first priority and the second priority of each camera;

The image data of the camera with the highest final priority is output for display.
The method of automatically selecting a camera screen according to claim 1, wherein obtaining the final priority of each camera according to the first priority and the second priority of each camera includes:

For each of the cameras, multiplying the first priority by a preset first weight to obtain a first product value, and multiplying the second priority by a preset second weight to obtain a second product value;

A final priority is obtained based on a sum of the first product value and the second product value.
The method of automatically selecting a camera screen according to claim 1, wherein the obtaining the second priority of each camera according to the distance between the camera and the sound source comprises:

Sorting the cameras according to the distance between each camera and the sound source, wherein the smaller the distance from the sound source, the higher the second priority obtained by the corresponding camera.
The method of automatically selecting a camera screen according to claim 1, wherein the face information comprises the number of faces, the face area, and the position of the face in the image.
The method for automatically selecting a camera screen according to claim 4, wherein acquiring the first priority of each camera according to the face information obtained from the image data of each camera specifically includes:

Obtaining, according to the number of faces, the face area, and the position of the face in the image obtained from the image data of each camera, a corresponding face number score, face area score, and position score of the face in the image, respectively;

Acquiring the first priority of each camera according to the sum of the face number score, the face area score, and the position score of the face in the image for the image data of that camera.
An apparatus for automatically selecting a camera screen, comprising:

An image data acquiring unit, configured to acquire image data of each camera; wherein the acquired image data is an image captured by each of the cameras in real time;

a face information acquiring unit, configured to perform face detection on the image data of each camera to obtain face information in the image data of each camera;

a first priority acquiring unit, configured to acquire a first priority of each of the cameras according to the face information obtained from the image data of each camera;

a sound source distance obtaining unit, configured to determine, by each microphone, a position of a sound source corresponding to the microphone, so as to obtain a distance between each of the cameras and the sound source;

a second priority acquiring unit, configured to acquire a second priority of each of the cameras according to a distance between each camera and the sound source position;

a final priority acquiring unit, configured to acquire a final priority of each camera according to the first priority and the second priority of each camera;

And an output unit, configured to output image data of the camera with the highest final priority for display.
The apparatus for automatically selecting a camera screen according to claim 6, wherein the final priority acquisition unit is specifically configured to:

For each of the cameras, multiplying the first priority by a preset first weight to obtain a first product value, and multiplying the second priority by a preset second weight to obtain a second product value;

A final priority is obtained based on a sum of the first product value and the second product value.
The apparatus for automatically selecting a camera screen according to claim 6, wherein the second priority acquiring unit is specifically configured to sort the cameras according to the distance between each camera and the sound source, wherein the smaller the distance from the sound source, the higher the second priority obtained by the corresponding camera.
An audio and video system, comprising: an apparatus for automatically selecting a camera screen according to any one of claims 6 to 8, further comprising:

At least two cameras, respectively mounted on the left and right sides or the upper and lower sides of the smart tablet, for capturing images in real time;

a microphone for receiving a sound source and determining the location of the received sound source.
A smart tablet comprising the audio and video system of claim 9.