WO2018209879A1 - Method and device for automatically selecting camera image, and audio and video system - Google Patents

Method and device for automatically selecting camera image, and audio and video system

Info

Publication number
WO2018209879A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
priority
image data
sound source
face
Prior art date
Application number
PCT/CN2017/104657
Other languages
French (fr)
Chinese (zh)
Inventor
陈双龙
Original Assignee
广州视源电子科技股份有限公司
广州视臻信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司, 广州视臻信息科技有限公司
Publication of WO2018209879A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and apparatus for automatically selecting a camera screen, an audio and video system, and a smart tablet.
  • one camera 201 is installed on each of the left and right sides of the smart tablet.
  • during a remote conference, image data is usually acquired either from a fixed (left or right) camera or from a manually selected camera, and is then used for the remote audio and video conference.
  • the participants in the conference may be in different positions of the conference room, and if only a single camera is selected, the image data of all participants may not be acquired.
  • the existing method of switching cameras requires manual intervention and the user experience is poor.
  • An object of the embodiments of the present invention is to provide a method and apparatus for automatically selecting a camera screen, and an audio and video system.
  • a camera screen can be selected automatically, so that users can better follow the conference and the user experience is improved.
  • an embodiment of the present invention provides a method for automatically selecting a camera screen, including:
  • the image data of the camera with the highest final priority is output for display.
  • in the method for automatically selecting a camera screen provided by this embodiment, image data of each camera is acquired first; face detection is then performed on the image data to obtain face information, and a first priority of each camera is obtained from the face information; next, the position of the sound source is determined by each microphone, and a second priority of each camera is obtained from the distance between that camera and the sound source position; finally, the final priority of each camera is obtained from its first priority and second priority, and the image data of the camera with the highest final priority is output for display.
  • this technical solution meets the needs of remote audio and video conferences: the camera whose captured image is displayed is selected automatically based on face detection and the detected voice position, so that the camera screen is switched intelligently, automatically and in real time, users can better follow the meeting, and the user experience is improved.
  • the obtaining the final priority of each camera according to the first priority and the second priority of each camera specifically includes:
  • a final priority is obtained based on a sum of the first product value and the second product value.
  • the first priority and the second priority are each multiplied by a preset weight and then summed to obtain the final priority, so that the face detection result and the sound source localization result are taken into account at the same time.
  • by adjusting the first weight and the second weight, the method can be adapted to the user's requirements for the conference display, which gives it strong adaptability and a good user experience.
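
The weighted sum described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the camera labels, and the default weights of 0.5 are assumptions, since the source only states that each priority is multiplied by a preset weight and the two products are summed.

```python
from typing import Dict


def final_priority(first: float, second: float,
                   w_first: float = 0.5, w_second: float = 0.5) -> float:
    """Weighted sum of the two priorities (weights are illustrative defaults)."""
    first_product = first * w_first      # first priority x first weight
    second_product = second * w_second   # second priority x second weight
    return first_product + second_product


def select_camera(first: Dict[str, float], second: Dict[str, float],
                  w_first: float = 0.5, w_second: float = 0.5) -> str:
    """Return the label of the camera with the highest final priority."""
    finals = {
        cam: final_priority(first[cam], second[cam], w_first, w_second)
        for cam in first
    }
    return max(finals, key=finals.get)
```

Raising `w_second` relative to `w_first` makes the selection follow the speaker more closely, while raising `w_first` favours the camera with the better view of faces.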
  • acquiring the second priority of each camera includes:
  • the cameras are sorted by their distance from the sound source position, and the smaller a camera's distance from the sound source, the higher the second priority it obtains.
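
One possible way to turn the distance ordering into a second priority is to use each camera's rank as its priority value. The sketch below assumes 2D coordinates for cameras and sound source and a rank-based score; neither is specified in the source, which only requires that a smaller distance yields a higher second priority.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def second_priorities(camera_positions: Dict[str, Point],
                      source_position: Point) -> Dict[str, int]:
    """Rank cameras by distance to the sound source; nearest gets the highest value."""
    distances = {
        cam: math.dist(pos, source_position)
        for cam, pos in camera_positions.items()
    }
    # Sort farthest first, so the nearest camera ends up with the largest rank.
    # Cameras at exactly equal distance still get distinct ranks (input order).
    ordered = sorted(distances, key=distances.get, reverse=True)
    return {cam: rank + 1 for rank, cam in enumerate(ordered)}
```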
  • the face information includes a number of faces, a face area, and a position of the face in the image.
  • obtaining the first priority of each camera according to the face information obtained from the image data of that camera includes:
  • An embodiment of the present invention further provides an apparatus for automatically selecting a camera screen, including:
  • An image data acquiring unit configured to acquire image data of each camera; wherein the acquired image data is a real-time captured screen image of each camera;
  • a face information acquiring unit configured to perform face detection on the image data of each camera to obtain face information in the image data of each camera;
  • a first priority acquiring unit configured to acquire a first priority of each camera according to the face information obtained from the image data of that camera;
  • a sound source distance obtaining unit configured to determine, by each microphone, the position of the sound source received by that microphone, so as to obtain the distance between each camera and the sound source;
  • a second priority acquiring unit configured to acquire a second priority of each camera according to the distance between that camera and the sound source position;
  • a final priority acquiring unit configured to acquire a final priority of each camera according to the first priority and the second priority of that camera;
  • an output unit configured to output image data of the camera with the highest final priority for display.
  • in the apparatus for automatically selecting a camera screen provided by this embodiment, the image data acquiring unit first acquires image data of each camera; the face information acquiring unit then performs face detection on the image data to obtain face information, and the first priority acquiring unit obtains the first priority of each camera from the face information; next, the sound source distance obtaining unit determines the position of the sound source by each microphone, and the second priority acquiring unit obtains the second priority of each camera from the distance between that camera and the sound source position; finally, the final priority acquiring unit obtains the final priority of each camera from its first priority and second priority, and the output unit outputs the image data of the camera with the highest final priority for display.
  • this technical solution meets the needs of remote audio and video conferences: the camera whose captured image is displayed is selected automatically based on face detection and the detected voice position, so that the camera screen is switched intelligently, automatically and in real time.
  • the final priority acquiring unit is specifically configured to:
  • a final priority is obtained based on a sum of the first product value and the second product value.
  • the second priority acquiring unit is specifically configured to sort the cameras by their distance from the sound source position and to acquire the second priority of each camera accordingly, where the smaller a camera's distance from the sound source, the higher the second priority it obtains.
  • the embodiment of the present invention further provides an audio and video system, including the apparatus for automatically selecting a camera screen provided by the embodiment of the present invention, and further including:
  • at least two cameras, respectively mounted on the left and right sides or the upper and lower sides of the smart tablet, for capturing picture images in real time;
  • a microphone for receiving a sound source and determining the location of the received sound source.
  • the audio and video system captures picture images in real time through the cameras, receives the sound source through the microphone, and determines the position of the sound source;
  • the apparatus for automatically selecting a camera screen disclosed in the embodiment of the invention acquires the image data captured by the cameras in real time, performs face detection on the image data to obtain face information and from it a first priority, obtains a second priority from the distance between each camera and the sound source position determined by the microphone, obtains the final priority of each camera from its first priority and second priority, and outputs the image data of the camera with the highest final priority for display.
  • the above technical solution meets the needs of remote audio and video conferences: the camera whose captured image is displayed is selected automatically based on face detection and the detected voice position, so that the camera screen is switched intelligently, automatically and in real time, users can better follow the meeting, and the user experience is improved.
  • the embodiment of the invention further provides a smart tablet, which comprises an audio and video system provided by an embodiment of the invention.
  • the smart tablet provided by an embodiment of the present invention, on the one hand, captures picture images in real time through the cameras, receives the sound source through the microphone, and determines the position of the sound source;
  • on the other hand, the apparatus for automatically selecting a camera screen disclosed in the embodiment acquires the image data captured by the cameras in real time, performs face detection on the image data to obtain face information and from it a first priority, obtains a second priority from the distance between each camera and the sound source position determined by the microphone, obtains the final priority of each camera from its first priority and second priority, and outputs the image data of the camera with the highest final priority for display.
  • the above technical solution meets the needs of remote audio and video conferences: the camera whose captured image is displayed is selected automatically based on face detection and the detected voice position, so that the camera screen is switched intelligently, automatically and in real time, users can better follow the meeting, and the user experience is improved.
  • FIG. 1 is a schematic structural view of a large-sized smart tablet configured with two cameras
  • FIG. 2 is a schematic flow chart of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention
  • FIG. 3 is a schematic flowchart of step S3 of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention
  • FIG. 4 is a schematic structural diagram of an apparatus for automatically selecting a camera screen according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of an audio and video system according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic structural diagram of a smart tablet according to Embodiment 4 of the present invention.
  • FIG. 2 is a schematic flowchart of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention, including:
  • S1 acquiring image data of each camera; wherein the acquired image data is a real-time captured screen image of each camera;
  • the real-time image data acquired in step S1 corresponds one-to-one to the cameras.
  • the specific calculation process for obtaining the final priority of each camera in step S6 is:
  • the final priority is obtained based on the sum of the first product value and the second product value.
  • in step S6 of this embodiment, the first priority obtained from face detection and the second priority obtained from sound source localization are each multiplied by the corresponding weight and then added; the preset first weight and second weight can be adjusted to change the relative contribution of the first priority and the second priority, so as to meet actual needs in a remote conference.
  • for determining the final priority, in addition to the preferred weighted-sum embodiment described above, in practice the second priority determined by the sound source position alone may be used as the final priority, according to demand.
  • alternatively, the second priority may be the primary consideration and the first priority obtained from face detection the secondary consideration; that is, the display shows the picture of the speaker in real time, and when no one is speaking, the picture is selected based on the result of face detection.
  • the above embodiments are also within the scope of the invention.
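
The speaker-first alternative can be sketched as below. How "no one is speaking" is detected is not described in the source; here it is represented, as an assumption, by passing `None` (or an empty mapping) for the second priorities. The function name and camera labels are hypothetical.

```python
from typing import Dict, Optional


def select_speaker_first(first: Dict[str, float],
                         second: Optional[Dict[str, float]]) -> str:
    """Prefer the sound-based priority; fall back to the face-based one."""
    if second:
        # A sound source is active: the second priority decides, and the
        # first priority only breaks ties between equally ranked cameras.
        return max(second, key=lambda cam: (second[cam], first.get(cam, 0.0)))
    # No one is speaking: select purely on the face detection result.
    return max(first, key=first.get)
```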
  • Step S5 specifically includes:
  • each camera is sorted, and the second priority obtained by the camera corresponding to the smaller distance from the sound source is higher.
  • the face is represented by a rectangular frame, and the number of faces, the face area, and the position of the face in the image are determined from that rectangular frame.
  • the specific face detection method involved herein can be obtained by those skilled in the art from the prior art, and therefore will not be described herein.
  • step S3 includes acquiring the number of faces, the face area, and the position of the face in the image in the image data of each camera.
  • FIG. 3 is a schematic flowchart of step S3 of a method for automatically selecting a camera screen according to Embodiment 1 of the present invention. Step S3 specifically includes the following steps:
  • the position of the face in the image is set as the distance between the center position of the face and the position of the center of the current image, and the closer the distance is, the closer the face is to the image center of the camera.
  • correspondingly: the more faces there are, the higher the face number score; the larger the face area, the higher the face area score; and the closer the face is to the center of the camera image, the higher the position score of the face in the image.
  • when the face number score, the face area score, and the position score of the face in the image are determined from the number of faces, the face area, and the position of the face in the image, a corresponding weight value may be set for each of them,
  • so as to adjust the relative contribution of the face number score, the face area score, and the position score, which in turn affects which camera's picture is finally selected for display.
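
A rough sketch of how the three face scores might be combined into a first priority is given below. The source fixes only the direction of each score (more faces, larger area, and closer to the image center all score higher) and the optional weights; the normalisations, the `max_faces` cap, and the way several faces are aggregated (total area, mean center distance) are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # face rectangle as (x, y, width, height)


def first_priority(boxes: List[Box], image_size: Tuple[int, int],
                   w_count: float = 1.0, w_area: float = 1.0,
                   w_position: float = 1.0, max_faces: int = 10) -> float:
    """Weighted sum of face number, face area and face position scores."""
    if not boxes:
        return 0.0  # no face detected in this camera's image
    img_w, img_h = image_size

    # More faces -> higher score, capped at max_faces.
    count_score = min(len(boxes), max_faces) / max_faces

    # Larger total face area (relative to the image) -> higher score.
    total_area = sum(w * h for _, _, w, h in boxes)
    area_score = min(total_area / (img_w * img_h), 1.0)

    # Smaller mean distance between face centers and image center -> higher score.
    half_diagonal = math.hypot(img_w / 2, img_h / 2)
    mean_distance = sum(
        math.hypot(x + w / 2 - img_w / 2, y + h / 2 - img_h / 2)
        for x, y, w, h in boxes
    ) / len(boxes)
    position_score = 1.0 - min(mean_distance / half_diagonal, 1.0)

    return (w_count * count_score
            + w_area * area_score
            + w_position * position_score)
```

Setting one of the weights to zero removes that criterion from the first priority, which is one way to realise the weight adjustment mentioned above.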
  • in operation, this embodiment first acquires the image data captured by each camera in real time, and then performs face detection on the image data captured by each camera, including detecting the number of faces, the face area, and the position of the face in each image;
  • the face number score, the face area score, and the position score of the face in the image are then obtained, and the sum of these three scores for each camera's image is used as the first priority of that camera; next, the position of the sound source received by the microphone is acquired, the distance between the sound source position and each camera is obtained, and a corresponding second priority is obtained from that distance,
  • where the smaller a camera's distance from the sound source position, the higher its second priority; finally, for each camera the first priority and the second priority are each multiplied by the corresponding weight value and then summed to obtain the final priority,
  • and the image data of the camera with the highest final priority is transmitted to the display screen.
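
The overall flow just described can be sketched end to end as follows. The face detector and the face scoring function are passed in as callables because the source leaves the detection method to the prior art; `choose_camera`, its parameters, the 2D coordinates, and the rank-based second priority are all assumptions for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # face rectangle as (x, y, width, height)
Point = Tuple[float, float]


def choose_camera(frames: Dict[str, object],
                  camera_positions: Dict[str, Point],
                  source_position: Point,
                  detect_faces: Callable[[object], List[Box]],
                  score_faces: Callable[[List[Box]], float],
                  w_first: float = 0.5, w_second: float = 0.5) -> str:
    """Return the label of the camera whose image should be displayed."""
    # Face detection on each camera's frame, scored into a first priority.
    first = {cam: score_faces(detect_faces(frame))
             for cam, frame in frames.items()}

    # Distance between each camera and the located sound source.
    distances = {cam: math.dist(camera_positions[cam], source_position)
                 for cam in frames}

    # Second priority: the nearer the camera, the higher the value.
    ordered = sorted(distances, key=distances.get, reverse=True)
    second = {cam: float(rank + 1) for rank, cam in enumerate(ordered)}

    # Final priority: weighted sum of the two priorities.
    finals = {cam: w_first * first[cam] + w_second * second[cam]
              for cam in frames}

    # The camera with the highest final priority is the one to display.
    return max(finals, key=finals.get)
```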
  • obtaining the first priority from the face information first, and then obtaining the second priority from the distance between the sound source and the camera, is only one example of an implementation order;
  • embodiments in which the order of these two operations is swapped, or in which the two steps are performed in parallel,
  • are all within the scope of protection of the present embodiment.
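
Because the two priorities do not depend on each other, they can be computed concurrently. The sketch below uses Python's standard `concurrent.futures` module; the function name and the two zero-argument callables are hypothetical placeholders for the face-based and sound-based computations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Priorities = Dict[str, float]


def compute_priorities(compute_first: Callable[[], Priorities],
                       compute_second: Callable[[], Priorities]
                       ) -> Tuple[Priorities, Priorities]:
    """Run the face-based and sound-based priority computations in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(compute_first)
        second_future = pool.submit(compute_second)
        # result() blocks until each computation has finished.
        return first_future.result(), second_future.result()
```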
  • the embodiment selects which camera's captured image is displayed based on the results of face detection and sound source localization, which realizes intelligent automatic switching of the camera screen, meets the display requirements of remote audio and video conferences, reduces manual operation, is more intelligent and automated, and improves the user experience.
  • FIG. 4 is a schematic structural diagram of an apparatus for automatically selecting a camera screen according to Embodiment 2 of the present invention.
  • the embodiment specifically includes the following structure:
  • the image data acquiring unit 11 is configured to acquire image data of each camera; wherein the acquired image data is a real-time captured screen image of each camera;
  • the face information acquiring unit 12 is configured to perform face detection on image data of each camera to obtain face information in image data of each camera;
  • the first priority acquiring unit 13 is configured to acquire a first priority of each camera according to the face information obtained from the image data of that camera;
  • the sound source distance obtaining unit 14 is configured to determine, by each microphone, the position of the sound source received by that microphone, so as to obtain the distance between each camera and the sound source;
  • the second priority acquiring unit 15 is configured to acquire a second priority of each camera according to the distance between that camera and the sound source position;
  • the final priority obtaining unit 16 is configured to obtain a final priority of each camera according to the first priority and the second priority of each camera;
  • the output unit 17 is configured to output image data of the camera with the highest priority for display.
  • the final priority obtaining unit 16 is configured to:
  • the final priority is obtained based on the sum of the first product value and the second product value.
  • the final priority acquiring unit 16 of the present embodiment multiplies the first priority obtained from face detection and the second priority obtained from sound source localization by the corresponding weights and then adds them; the preset first weight and second weight can be adjusted to change the relative contribution of the first priority and the second priority, so as to meet actual requirements in a remote conference.
  • alternatively, the second priority determined by the sound source position alone may be used as the final priority, according to demand.
  • alternatively, the second priority may be the primary consideration and the first priority obtained from face detection the secondary consideration; that is, the display shows the picture of the speaker in real time, and when no one is speaking, the picture is selected based on the result of face detection.
  • the above embodiments are also within the scope of the invention.
  • the second priority acquiring unit 15 is configured to sort the cameras by their distance from the sound source position and to obtain the second priority of each camera accordingly, where the smaller a camera's distance from the sound source, the higher the second priority it obtains.
  • the second priority acquisition unit 15 makes the second priority level corresponding to the camera closer to the sound source higher, so that the possibility that the screen of the camera closer to the sound source is selected as the display screen is higher.
  • when the face information acquiring unit 12 performs face detection on the image of a camera, each face is represented by a rectangular frame, and the number of faces, the face area, and the position of the face in the image are determined from that rectangular frame.
  • the specific face detection method involved herein can be obtained by those skilled in the art from the prior art, and therefore will not be described herein.
  • the face information in the image data of each camera acquired by the face information acquiring unit 12 includes the number of faces, the face area, and the position of the face in the image.
  • the first priority acquiring unit 13 is specifically configured to:
  • obtain a corresponding face number score, face area score, and position score of the face in the image from the obtained number of faces, face area, and position of the face in the image, respectively;
  • the position of the face in the image is set as the distance between the center position of the face and the position of the center of the current image, and the closer the distance is, the closer the face is to the image center of the camera.
  • correspondingly: the more faces there are, the higher the face number score; the larger the face area, the higher the face area score; and the closer the face is to the center of the camera image, the higher the position score of the face in the image.
  • when the face number score, the face area score, and the position score of the face in the image are determined from the number of faces, the face area, and the position of the face in the image, a corresponding weight value may be set for each of them, so as to adjust the relative contribution of the three scores, which in turn affects which camera's picture is finally selected for display.
  • obtain the first priority of each camera as the sum of the face number score, the face area score, and the position score of the face in the image for the image data of that camera.
  • in operation, the image data acquiring unit 11 first acquires the image data captured by each camera in real time, and the face information acquiring unit 12 then performs face detection on the image data captured by each camera, including detecting the number of faces, the face area, and the position of the face in each image;
  • the first priority acquiring unit 13 then obtains the face number score, the face area score, and the position score of the face in the image, and the sum of these three scores for each camera's image is taken as the first priority of that camera;
  • next, the sound source distance acquiring unit 14 acquires the position of the sound source received by the microphone and thereby the distance between the sound source position and each camera, and the second priority acquiring unit 15 obtains the second priority of each camera from that distance, where the smaller a camera's distance from the sound source position, the higher its second priority; finally, the final priority acquiring unit 16 multiplies the first priority and the second priority of each camera by the corresponding weight values and then sums them to obtain the final priority.
  • the output unit 17 outputs the image data of the camera with the highest final priority for display.
  • obtaining the first priority from the face information first, and then obtaining the second priority from the distance between the sound source and the camera, is only one example of an implementation order; embodiments in which the order of these two operations is swapped, or in which the two steps are performed in parallel, are all within the scope of protection of the present embodiment.
  • the embodiment selects which camera's captured image is displayed based on the results of face detection and sound source localization, which realizes intelligent automatic switching of the camera screen, meets the display requirements of remote audio and video conferences, reduces manual operation, is more intelligent and automated, and improves the user experience.
  • Embodiment 3 of the present invention further provides an audio and video system.
  • FIG. 5 is a schematic structural diagram of Embodiment 3 of the present invention, where Embodiment 3 includes the apparatus for automatically selecting a camera screen provided by Embodiment 2 of the present invention.
  • the third embodiment further includes the following structure:
  • two cameras 201, respectively mounted on the left and right sides or the upper and lower sides of the smart tablet, for capturing picture images in real time; preferably, as shown in FIG. 1, the two cameras 201 are mounted on the left and right sides of the smart tablet.
  • the installation position of the two cameras 201 in the smart tablet shown in FIG. 1 is only one embodiment. Embodiments that, based on the principle of the embodiment of the present invention, merely change the installation position of the cameras 201 on the smart tablet or increase the number of cameras 201 are also within the scope of the present invention.
  • the microphone 202 is configured to receive a sound source and determine the position of the received sound source.
  • in operation, the image data of each camera 201 is first acquired by the image data acquiring unit 11, where the image data is the picture image captured by the camera 201 in real time; the face information acquiring unit 12 then performs face detection on the image data captured by each camera 201, including detecting the number of faces, the face area, and the position of the face in each image, and the first priority acquiring unit 13 then obtains the first priority of each camera 201.
  • the sound source distance acquiring unit 14 obtains the position of the sound source, and thereby the distance between the sound source position and each camera 201, where the position of the sound source is determined by the microphone 202 that receives the sound source.
  • the second priority acquiring unit 15 obtains the second priority of each camera 201 from that distance, and the final priority acquiring unit 16 multiplies the first priority and the second priority of each camera 201 by the corresponding weight values and then sums them to obtain the final priority.
  • the image data of the camera 201 having the highest final priority is transmitted from the output unit 17 to the display screen.
  • the audio and video system of the present embodiment selects which camera's captured image is displayed based on the results of face detection and sound source localization, which realizes intelligent automatic switching of the camera image, meets the display requirements of remote audio and video conferences, reduces manual operation, is more intelligent and automated, and improves the user experience.
  • FIG. 6 is a schematic structural diagram of Embodiment 4 of the present invention.
  • the fourth embodiment includes the audio and video system according to Embodiment 3 of the present invention.
  • the audio and video system of Embodiment 3 has been described above and is not described again here.
  • in operation, the image data of each camera 201 is first acquired by the image data acquiring unit 11, where the image data is the picture image captured by the camera 201 in real time; the face information acquiring unit 12 then performs face detection on the image data captured by each camera 201, including detecting the number of faces, the face area, and the position of the face in each image; the first priority acquiring unit 13 then obtains the face number score, the face area score, and the position score of the face in the image, and the sum of these three scores for the image of each camera 201 is taken as the first priority of that camera 201.
  • the sound source distance acquiring unit 14 obtains the position of the sound source, and thereby the distance between the sound source position and each camera 201, where the position of the sound source is determined by the microphone 202 that receives the sound source.
  • the image data of the camera 201 having the highest final priority is transmitted to the display screen.
  • the smart tablet of the embodiment selects which camera's captured image is displayed based on the results of face detection and sound source localization, which realizes intelligent automatic switching of the camera image and meets the display requirements of remote audio and video conferences.

Abstract

Disclosed is a method for automatically selecting a camera image, comprising: obtaining image data of each camera; performing face detection on the image data of each camera to obtain face information in the image data of each camera; obtaining a first priority of each camera according to the face information; obtaining the location of a sound source by means of each microphone to obtain the distance between each camera and the sound source; obtaining a second priority of each camera according to the distance between each camera and the sound source; obtaining a final priority according to the first priority and the second priority of each camera; and outputting the image data of the camera having the highest final priority for display. Correspondingly, the present invention further provides a device for automatically selecting a camera image, an audio and video system and a smart tablet. According to the present invention, during audio and video conferences, the images photographed by the cameras are automatically selected for display, and thus the user experience is improved.

Description

自动选择摄像头画面的方法、装置及音视频系统Method, device and audio and video system for automatically selecting camera screen 技术领域Technical field
本发明涉及通信技术领域,尤其涉及一种自动选择摄像头画面的方法和装置、音视频系统及智能平板。The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for automatically selecting a camera screen, an audio and video system, and a smart tablet.
背景技术Background technique
With the development of technology, large-sized smart tablets (for example, 50 inches or more) are increasingly used in fields such as education and conferencing. In the conferencing field, some brands of smart tablet are configured with multiple cameras; as shown in FIG. 1, one camera 201 is mounted on each of the left and right sides of the smart tablet. During a remote conference, image data is usually acquired either from a fixed camera (the left or the right one) or from a camera that is selected manually, and the remote audio and video conference is conducted with that image data.
In particular, in a multi-person, multi-party remote audio and video conference, the participants may be located at different positions in the conference room, and a single camera cannot capture image data of all of them. The usual practice is to switch cameras manually according to the conference situation, so as to select the corresponding scene and meet the demand for different conference views. This existing way of switching cameras requires manual intervention and gives a poor user experience.
Summary of the Invention
An object of the embodiments of the present invention is to provide a method and a device for automatically selecting a camera image, and an audio and video system, which can automatically select a camera image during an audio and video conference, so that users can better follow the conference and the user experience is improved.
To achieve the above object, an embodiment of the present invention provides a method for automatically selecting a camera image, comprising:
acquiring image data of each camera, wherein the acquired image data is the image captured by that camera in real time;
performing face detection on the image data of each camera to obtain face information in the image data of each camera;
obtaining a first priority of each camera according to the face information obtained from the image data of that camera;
determining, by means of each microphone, the position of the sound source received by the microphone, and thereby obtaining the distance between each camera and the sound source;
obtaining a second priority of each camera according to the distance between that camera and the position of the sound source;
obtaining a final priority of each camera according to the first priority and the second priority of that camera; and
outputting the image data of the camera having the highest final priority for display.
Compared with the prior art, the method for automatically selecting a camera image provided by the embodiment of the present invention first acquires image data of each camera; then performs face detection on the image data to obtain face information and obtains a first priority of each camera according to the face information; next determines the position of the sound source by means of each microphone and obtains a second priority of each camera according to the distance between that camera and the position of the sound source; and finally obtains a final priority of each camera according to its first priority and second priority and outputs the image data of the camera having the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects which camera's image to display based on face detection and on the position of the voice, so that the camera image is switched intelligently and automatically in real time, users can better follow the conference, and the user experience is improved.
Preferably, obtaining the final priority of each camera according to the first priority and the second priority of that camera specifically comprises:
for each camera, multiplying the first priority by a preset first weight to obtain a first product value, and multiplying the second priority by a preset second weight to obtain a second product value; and
obtaining the final priority from the sum of the first product value and the second product value.
In this preferred solution, the final priority is obtained by multiplying the first priority and the second priority by their respective preset weights and then summing the results, so that the face detection result and the sound source localization result are both taken into account. Moreover, the first weight and the second weight can be adjusted to suit the user's requirements for the conference display, which makes the solution highly adaptable and improves the user experience.
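The weighted sum described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `CameraPriorities`, the function names, and the default weights of 0.5 are assumptions chosen for the example, since the text only states that the weights are preset and adjustable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CameraPriorities:
    """Priorities of one camera (hypothetical container for this sketch)."""
    camera_id: str
    first_priority: float   # derived from face detection
    second_priority: float  # derived from distance to the sound source


def final_priority(p: CameraPriorities, w_face: float, w_sound: float) -> float:
    # First product value plus second product value.
    return p.first_priority * w_face + p.second_priority * w_sound


def select_camera(cameras: List[CameraPriorities],
                  w_face: float = 0.5, w_sound: float = 0.5) -> str:
    # The camera with the highest final priority is the one displayed.
    best = max(cameras, key=lambda p: final_priority(p, w_face, w_sound))
    return best.camera_id
```

Raising `w_sound` relative to `w_face` makes the selection follow the speaker more closely, while the opposite choice favors the camera with the better view of faces.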
Further, obtaining the second priority of each camera according to the distance between that camera and the position of the sound source comprises:
sorting the cameras by their distance from the position of the sound source, wherein the smaller a camera's distance from the sound source, the higher the second priority assigned to it.
In this further solution, a camera closer to the sound source is given a higher second priority, so that during a remote audio and video conference the image of the person speaking is displayed more often.
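One way to turn the distance ordering into a numeric second priority is to use the rank itself. The sketch below assumes two-dimensional room coordinates and assigns rank values 1 to N with the nearest camera receiving N; both choices are illustrative assumptions, because the text only requires that a smaller distance yields a higher priority.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def second_priorities(camera_positions: Dict[str, Point],
                      source_position: Point) -> Dict[str, int]:
    """Rank cameras by distance to the sound source (nearest ranks highest)."""
    distances = {cid: math.dist(pos, source_position)
                 for cid, pos in camera_positions.items()}
    # Farthest first, so the nearest camera ends up with the largest rank.
    ordered = sorted(distances, key=distances.get, reverse=True)
    return {cid: rank + 1 for rank, cid in enumerate(ordered)}
```

Using ranks rather than raw distances keeps the second priority on a small, predictable scale, which makes it easier to combine with the first priority through weights.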
Further, the face information includes the number of faces, the face area, and the position of the face in the image.
Further, obtaining the first priority of each camera according to the face information obtained from the image data of that camera specifically comprises:
obtaining a face-count score, a face-area score, and a face-position score according to, respectively, the number of faces, the face area, and the position of the face in the image obtained from the image data of each camera; and
obtaining the first priority of each camera according to the sum of the face-count score, the face-area score, and the face-position score of the image data of that camera.
An embodiment of the present invention further provides a device for automatically selecting a camera image, comprising:
an image data acquiring unit, configured to acquire image data of each camera, wherein the acquired image data is the image captured by that camera in real time;
a face information acquiring unit, configured to perform face detection on the image data of each camera to obtain face information in the image data of each camera;
a first priority acquiring unit, configured to obtain a first priority of each camera according to the face information obtained from the image data of that camera;
a sound source distance acquiring unit, configured to determine, by means of each microphone, the position of the sound source received by the microphone, and thereby obtain the distance between each camera and the sound source;
a second priority acquiring unit, configured to obtain a second priority of each camera according to the distance between that camera and the position of the sound source;
a final priority acquiring unit, configured to obtain a final priority of each camera according to the first priority and the second priority of that camera; and
an output unit, configured to output the image data of the camera having the highest final priority for display.
Compared with the prior art, the device for automatically selecting a camera image provided by the embodiment of the present invention first acquires image data of each camera through the image data acquiring unit; then performs face detection on the image data through the face information acquiring unit to obtain face information, and obtains a first priority of each camera from the face information through the first priority acquiring unit; next determines the position of the sound source by means of each microphone through the sound source distance acquiring unit, and obtains a second priority of each camera from the distance between that camera and the position of the sound source through the second priority acquiring unit; and finally obtains a final priority of each camera from its first priority and second priority through the final priority acquiring unit, the output unit outputting the image data of the camera having the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects which camera's image to display based on face detection and on the position of the voice, so that the camera image is switched intelligently and automatically in real time, users can better follow the conference, and the user experience is improved.
Further, the final priority acquiring unit is specifically configured to:
for each camera, multiply the first priority by a preset first weight to obtain a first product value, and multiply the second priority by a preset second weight to obtain a second product value; and
obtain the final priority from the sum of the first product value and the second product value.
Further, when obtaining the second priority of each camera according to the distance between that camera and the position of the sound source, the second priority acquiring unit sorts the cameras by their distance from the position of the sound source, wherein the smaller a camera's distance from the sound source, the higher the second priority assigned to it.
Correspondingly, an embodiment of the present invention further provides an audio and video system, which includes the device for automatically selecting a camera image provided by the embodiment of the present invention, and further includes:
at least two cameras, mounted respectively on the left and right sides or on the upper and lower sides of a smart tablet, and configured to capture images in real time; and
a microphone, configured to receive the sound source and determine the position of the received sound source.
Compared with the prior art, the audio and video system provided by the embodiment of the present invention, on the one hand, captures images in real time through the cameras, receives the sound source through the microphone, and determines the position of the sound source. On the other hand, through the device for automatically selecting a camera image disclosed in the embodiment of the present invention, it acquires the image data captured by the cameras in real time, performs face detection on the image data to obtain face information and thereby the first priority, obtains the second priority from the distance between each camera and the sound source position determined by the microphone, obtains the final priority of each camera from its first priority and second priority, and outputs the image data of the camera having the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects which camera's image to display based on face detection and on the position of the voice, so that the camera image is switched intelligently and automatically in real time, users can better follow the conference, and the user experience is improved.
Correspondingly, an embodiment of the present invention further provides a smart tablet, which includes the audio and video system provided by the embodiment of the present invention.
Compared with the prior art, the smart tablet provided by the embodiment of the present invention, on the one hand, captures images in real time through the cameras, receives the sound source through the microphone, and determines the position of the sound source. On the other hand, through the device for automatically selecting a camera image disclosed in the embodiment of the present invention, it acquires the image data captured by the cameras in real time, performs face detection on the image data to obtain face information and thereby the first priority, obtains the second priority from the distance between each camera and the sound source position determined by the microphone, obtains the final priority of each camera from its first priority and second priority, and outputs the image data of the camera having the highest final priority for display. In a remote audio and video conference, this technical solution automatically selects which camera's image to display based on face detection and on the position of the voice, so that the camera image is switched intelligently and automatically in real time, users can better follow the conference, and the user experience is improved.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a large-sized smart tablet configured with two cameras;
FIG. 2 is a schematic flowchart of a method for automatically selecting a camera image according to Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of step S3 of the method for automatically selecting a camera image according to Embodiment 1 of the present invention;
FIG. 4 is a schematic structural diagram of a device for automatically selecting a camera image according to Embodiment 2 of the present invention;
FIG. 5 is a schematic structural diagram of an audio and video system according to Embodiment 3 of the present invention;
FIG. 6 is a schematic structural diagram of a smart tablet according to Embodiment 4 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for automatically selecting a camera image according to Embodiment 1 of the present invention, which includes:
S1. Acquire image data of each camera, wherein the acquired image data is the image captured by that camera in real time;
S2. Perform face detection on the image data of each camera to obtain face information in the image data of each camera;
S3. Obtain a first priority of each camera according to the face information obtained from the image data of that camera;
S4. Determine, by means of each microphone, the position of the sound source received by the microphone, and thereby obtain the distance between each camera and the sound source;
S5. Obtain a second priority of each camera according to the distance between that camera and the position of the sound source;
S6. Obtain a final priority of each camera according to the first priority and the second priority of that camera;
S7. Output the image data of the camera having the highest final priority for display.
The image data acquired in step S1, captured by the cameras in real time, corresponds one-to-one with the cameras.
Further, the final priority of each camera is calculated in step S6 as follows:
for each camera, multiply the first priority by a preset first weight to obtain a first product value, and multiply the second priority by a preset second weight to obtain a second product value;
obtain the final priority from the sum of the first product value and the second product value.
In step S6 of this embodiment, the first priority obtained from face detection and the second priority obtained from sound source localization are each multiplied by the corresponding weight and then added. By adjusting the preset first weight and second weight, the relative contribution of the first priority and the second priority can be tuned to meet the actual needs of a remote conference. In addition to this preferred implementation, which fuses the two priorities by weighted addition, the final priority may in practice be determined, as required, by treating the second priority determined from the sound source position as the primary factor and the first priority obtained from face detection as the secondary factor. In that case the display shows the speaker in real time, and when no one is speaking the displayed image is selected based on the face detection result. Such an implementation also falls within the protection scope of the present invention.
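The alternative just described, in which the sound source decides first and face detection is the fallback, can be sketched as a lexicographic comparison. The function name and the convention of passing `None` (or an empty mapping) when no one is speaking are assumptions made for this example; the text does not say how silence is detected or represented.

```python
from typing import Dict, Optional


def select_camera_sound_first(first_priority: Dict[str, float],
                              second_priority: Optional[Dict[str, float]]) -> str:
    """Pick a camera with the sound-based priority as the primary factor."""
    if second_priority:
        # Someone is speaking: compare by second priority, and use the
        # face-based first priority only to break ties.
        return max(first_priority,
                   key=lambda cid: (second_priority[cid], first_priority[cid]))
    # No active sound source: fall back to the face detection result alone.
    return max(first_priority, key=first_priority.get)
```

Compared with the weighted sum, this variant never lets a strong face score override the camera nearest the speaker, which may suit meetings where following the speaker matters most.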
Step S5 specifically includes:
sorting the cameras by their distance from the position of the sound source, wherein the smaller a camera's distance from the sound source, the higher the second priority assigned to it.
Through step S5, a camera closer to the sound source obtains a higher second priority, so that its image is more likely to be selected for display.
When face detection is performed on a camera's image in step S2, each face is represented by a rectangular bounding box around it, and the number of faces, the face area, and the position of the face in the image are determined from these bounding boxes. The specific face detection method can be obtained from the prior art by a person skilled in the art and is therefore not described here.
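A minimal sketch of deriving the three pieces of face information from bounding boxes is given below. The detector itself is out of scope, so the boxes are taken as input. The names `FaceBox`, `FaceInfo`, and `summarize_faces` are hypothetical, and the choice to aggregate several faces by total area and by mean distance from the image center is an assumption; the description only says that the position is preferably measured as the distance between the face center and the image center.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FaceBox:
    """Rectangular bounding box of one detected face, in pixels."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


@dataclass(frozen=True)
class FaceInfo:
    count: int
    total_area: float
    mean_center_offset: Optional[float]  # None when no face was detected


def summarize_faces(boxes: List[FaceBox],
                    image_width: float, image_height: float) -> FaceInfo:
    if not boxes:
        return FaceInfo(0, 0.0, None)
    cx, cy = image_width / 2, image_height / 2
    # Distance from each face center to the image center.
    offsets = [math.hypot(b.x + b.width / 2 - cx, b.y + b.height / 2 - cy)
               for b in boxes]
    total_area = sum(b.width * b.height for b in boxes)
    return FaceInfo(len(boxes), total_area, sum(offsets) / len(offsets))
```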
The face information used in step S3 for the image data of each camera includes the number of faces, the face area, and the position of the face in the image. Correspondingly, referring to FIG. 3, FIG. 3 is a schematic flowchart of step S3 of the method for automatically selecting a camera image according to Embodiment 1 of the present invention. Step S3 specifically includes the following steps:
S31. Obtain a face-count score, a face-area score, and a face-position score according to, respectively, the number of faces, the face area, and the position of the face in the image obtained from the image data of each camera;
Preferably, the position of the face in the image is defined as the distance between the center of the face and the center of the current image; the smaller this distance, the closer the face is to the center of the camera image. In general, so that the displayed image shows the conference scene well, the scores are set as follows: the more faces there are, the higher the face-count score; the larger the face area, the higher the face-area score; and the closer the face is to the center of the camera image, the higher the face-position score.
Further preferably, to reflect how much importance the actual conference display places on the number of faces, the face area, and the position of the face in the image, a corresponding weight may be set for each of the three scores when they are determined. These weights scale the face-count score, the face-area score, and the face-position score respectively, and thereby influence which camera's image is finally selected for display.
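The scoring rules above fix only the direction of each score, not its formula. The sketch below picks one simple set of monotonic scoring functions to make the idea concrete; the specific formulas, the normalization by image size, and the default weights of 1.0 are all assumptions for illustration and are not taken from the description.

```python
import math


def first_priority(count: int, total_area: float, mean_center_offset: float,
                   image_width: float, image_height: float,
                   w_count: float = 1.0, w_area: float = 1.0,
                   w_position: float = 1.0) -> float:
    """Weighted sum of face-count, face-area, and face-position scores."""
    if count == 0:
        return 0.0  # no face in view, so this camera gets the lowest score
    # More faces -> higher score.
    count_score = float(count)
    # Larger face area -> higher score (expressed as a fraction of the frame).
    area_score = total_area / (image_width * image_height)
    # Closer to the image center -> higher score, in the range 0 to 1.
    half_diagonal = math.hypot(image_width, image_height) / 2
    position_score = 1.0 - min(mean_center_offset / half_diagonal, 1.0)
    return (w_count * count_score
            + w_area * area_score
            + w_position * position_score)
```

Because the count score here is unbounded while the other two stay between 0 and 1, the weights also serve to bring the three scores onto comparable scales; a deployment would likely tune them against real meeting footage.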
S32. Obtain the first priority of each camera according to the sum of the face-count score, the face-area score, and the face-position score of the image data of that camera.
In a specific implementation, this embodiment first acquires the image data captured by the cameras in real time and then performs face detection on the image data of each camera, detecting the number of faces, the face area, and the position of the face in each image, from which the face-count score, the face-area score, and the face-position score are obtained. The sum of these three scores for a camera's image is taken as the first priority of that camera. Next, the position of the sound source received by the microphone is obtained, and from it the distance between the sound source and each camera; the second priority of each camera is obtained from this distance, with a smaller distance giving a higher second priority. Finally, for each camera, the first priority and the second priority are multiplied by their respective weights and summed to obtain the final priority, and the image data of the camera having the highest final priority is transmitted to the display.
Obtaining the first priority from the face information first and then obtaining the second priority from the distance between the sound source and the cameras is only one example of the order of execution. Embodiments in which the order is reversed, or in which the two steps are performed in parallel, also fall within the protection scope of this embodiment.
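Since the two priorities do not depend on each other, they can be computed concurrently before being combined. The sketch below uses a thread pool for this purpose. The two helper functions are simplified stand-ins (the first priority is reduced to a face count and the second to a distance rank), and all names and default weights are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def compute_first_priorities(faces: Dict[str, List[str]]) -> Dict[str, float]:
    # Stand-in for face scoring: one point per detected face.
    return {cid: float(len(found)) for cid, found in faces.items()}


def compute_second_priorities(distances: Dict[str, float]) -> Dict[str, float]:
    # Rank by distance, farthest first, so the nearest camera ranks highest.
    ordered = sorted(distances, key=distances.get, reverse=True)
    return {cid: float(rank + 1) for rank, cid in enumerate(ordered)}


def final_priorities(faces: Dict[str, List[str]], distances: Dict[str, float],
                     w_face: float = 0.5, w_sound: float = 0.5) -> Dict[str, float]:
    # The two branches are independent, so they are submitted together.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(compute_first_priorities, faces)
        second_future = pool.submit(compute_second_priorities, distances)
        first = first_future.result()
        second = second_future.result()
    return {cid: w_face * first[cid] + w_sound * second[cid] for cid in first}
```

In a real system the face branch would typically dominate the running time, so overlapping it with sound source localization mainly helps keep the switching latency low.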
Compared with the prior art, this embodiment selects which camera's image to display based on the results of face detection and sound source localization. The camera image is thus switched intelligently and autonomously, which meets the display requirements of a remote audio and video conference, reduces manual operation, makes the system more intelligent and automated, and improves the user experience.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a device for automatically selecting a camera image according to Embodiment 2 of the present invention. This embodiment specifically includes the following structure:
an image data acquiring unit 11, configured to acquire image data of each camera, wherein the acquired image data is the image captured by that camera in real time;
a face information acquiring unit 12, configured to perform face detection on the image data of each camera to obtain face information in the image data of each camera;
a first priority acquiring unit 13, configured to obtain a first priority of each camera according to the face information obtained from the image data of that camera;
a sound source distance acquiring unit 14, configured to determine, by means of each microphone, the position of the sound source received by the microphone, and thereby obtain the distance between each camera and the sound source;
a second priority acquiring unit 15, configured to obtain a second priority of each camera according to the distance between that camera and the position of the sound source;
a final priority acquiring unit 16, configured to obtain a final priority of each camera according to the first priority and the second priority of that camera;
an output unit 17, configured to output the image data of the camera having the highest final priority for display.
Specifically, the final priority acquiring unit 16 is configured to:
for each camera, multiply the first priority by a preset first weight to obtain a first product value, and multiply the second priority by a preset second weight to obtain a second product value;
obtain the final priority from the sum of the first product value and the second product value.
The final priority acquiring unit 16 of this embodiment multiplies the first priority obtained from face detection and the second priority obtained from sound source localization by their corresponding weights and then adds them. By adjusting the preset first weight and second weight, the relative contribution of the first priority and the second priority can be tuned to meet the actual needs of a remote conference. In addition to this preferred implementation, which fuses the two priorities by weighted addition, the final priority may in practice be determined, as required, by treating the second priority determined from the sound source position as the primary factor and the first priority obtained from face detection as the secondary factor. In that case the display shows the speaker in real time, and when no one is speaking the displayed image is selected based on the face detection result. Such an implementation also falls within the protection scope of the present invention.
When obtaining the second priority of each camera according to the distance between that camera and the position of the sound source, the second priority acquiring unit 15 sorts the cameras by their distance from the position of the sound source, wherein the smaller a camera's distance from the sound source, the higher the second priority assigned to it.
Through the second priority acquiring unit 15, a camera closer to the sound source obtains a higher second priority, so that its image is more likely to be selected for display.
When the face information acquiring unit 12 performs face detection on a camera's image, each face is represented by a rectangular bounding box around it, and the number of faces, the face area, and the position of the face in the image are determined from these bounding boxes. The specific face detection method can be obtained from the prior art by a person skilled in the art and is therefore not described here.
The face information obtained by the face information acquiring unit 12 from the image data of each camera includes the number of faces, the face area, and the position of the face in the image.
The first priority acquiring unit 13 is specifically configured to:
obtain, from the number of faces, the face area, and the position of the face in the image acquired for the image data of each camera, a corresponding face count score, face area score, and face position score;
Preferably, the position of the face in the image is defined as the distance between the center of the face and the center of the current image; the smaller this distance, the closer the face is to the center of the camera image. In general, so that the displayed picture presents the meeting scene well, the scores are set as follows: the more faces, the higher the face count score; the larger the face area, the higher the face area score; and the closer the face is to the center of the camera image, the higher the face position score.
Further preferably, to accommodate different emphasis on the number of faces, the face area, and the face position when displaying the conference picture, corresponding weight values may be set when the three scores are determined. These weights adjust the magnitude of the face count score, the face area score, and the face position score, and thereby influence which camera image is finally selected for display.
obtain the first priority of each camera according to the sum of the face count score, the face area score, and the face position score of that camera's image data.
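A possible way to turn face rectangles into the three scores is sketched below. The patent fixes only the direction of each score (more faces, larger area, and closer to the center all score higher); the concrete formulas, the normalizations, the `(x, y, w, h)` rectangle format, and the name `first_priority` are assumptions for illustration.

```python
import math


def first_priority(faces, img_w, img_h, w_count=1.0, w_area=1.0, w_pos=1.0):
    """Sum of weighted face count, face area, and face position scores.

    `faces` is a list of (x, y, w, h) bounding rectangles in pixels.
    """
    if not faces:
        return 0.0

    # More faces -> higher count score.
    count_score = len(faces)

    # Larger total face area (as a fraction of the image) -> higher area score.
    area_score = sum(w * h for _, _, w, h in faces) / (img_w * img_h)

    # Smaller mean distance from face center to image center -> higher score.
    cx, cy = img_w / 2, img_h / 2
    half_diag = math.hypot(img_w, img_h) / 2
    mean_dist = sum(
        math.hypot(x + w / 2 - cx, y + h / 2 - cy) for x, y, w, h in faces
    ) / len(faces)
    position_score = 1.0 - mean_dist / half_diag

    return w_count * count_score + w_area * area_score + w_pos * position_score


# A centered face scores higher than the same face in a corner.
centered = first_priority([(40, 40, 20, 20)], 100, 100)
corner = first_priority([(0, 0, 20, 20)], 100, 100)
```

The three weight parameters correspond to the optional per-score weights mentioned in the text; leaving them at 1.0 gives the plain sum.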
In a specific implementation, this embodiment first acquires, through the image data acquiring unit 11, the image data captured by the cameras in real time. The face information acquiring unit 12 then performs face detection on the image data captured by each camera, detecting the number of faces, the face area, and the position of the face in each image. The first priority acquiring unit 13 obtains the face count score, the face area score, and the face position score, and takes the sum of these three scores for each camera's image as that camera's first priority. Next, the sound source distance acquiring unit 14 obtains the position of the sound source received by the microphone and, from it, the distance between the sound source position and each camera. The second priority acquiring unit 15 obtains the corresponding second priority from each camera's distance to the sound source position; the smaller the distance, the higher the camera's second priority. Finally, for each camera, the final priority acquiring unit 16 multiplies the first priority and the second priority by their corresponding weight values and sums the products to obtain the final priority, and the output unit 17 outputs the image data of the camera with the highest final priority for display.
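The selection flow just described can be sketched end to end as follows, assuming the first priorities have already been computed from face detection. The 2D coordinates, the rank-based second priority, the equal default weights, and the name `select_camera` are assumptions made for this example only.

```python
import math


def select_camera(first, camera_pos, source_pos, w1=0.5, w2=0.5):
    """Return (best camera id, final priorities) for one selection step.

    `first` maps camera id -> first priority (from face detection).
    `camera_pos` maps camera id -> (x, y) position of the camera.
    `source_pos` is the (x, y) position of the localized sound source.
    """
    # Distance from each camera to the sound source.
    distances = {cam: math.dist(pos, source_pos) for cam, pos in camera_pos.items()}

    # Rank cameras so that the nearest one gets the highest second priority.
    ordered = sorted(distances, key=distances.get, reverse=True)
    second = {cam: rank for rank, cam in enumerate(ordered, start=1)}

    # Weighted sum of the two priorities gives the final priority.
    final = {cam: w1 * first[cam] + w2 * second[cam] for cam in camera_pos}
    return max(final, key=final.get), final


best, final = select_camera(
    first={"left": 2.0, "right": 2.5},
    camera_pos={"left": (-1.0, 0.0), "right": (1.0, 0.0)},
    source_pos=(-0.5, 2.0),
)
```

In this sample the right camera has the slightly better face score, but the sound source is nearer the left camera, so with equal weights the left camera is selected.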
In Embodiment 2, obtaining the first priority from the face information first and then obtaining the second priority from the distance between the sound source position and the cameras is only one example ordering. Embodiments that swap the order of these two steps, or perform them in parallel, also fall within the protection scope of this embodiment.
Compared with the prior art, this embodiment selects the camera image to display based on the results of face detection and sound source localization. It switches camera images automatically, meets the display needs of remote audio and video conferences, reduces manual operation, is more intelligent and automated, and improves the user experience.
Embodiment 3 of the present invention further provides an audio and video system. Referring to FIG. 5, which is a schematic structural diagram of Embodiment 3, this embodiment includes the apparatus 1 for automatically selecting a camera image provided by Embodiment 2; for details, see the description of the apparatus in Embodiment 2, which is not repeated here. In addition, Embodiment 3 includes the following structure:
Two cameras 201, mounted on the left and right sides or on the upper and lower sides of the smart tablet, for capturing images in real time. Preferably, the two cameras 201 are mounted on the left and right sides of the smart tablet as shown in FIG. 1. The mounting positions shown in FIG. 1 are only one implementation; embodiments that, based on the principle of the embodiments of the present invention, merely adjust the mounting positions of the cameras 201 on the smart tablet or increase the number of cameras 201 also fall within the protection scope of the present invention.
A microphone 202, configured to receive sound from a sound source and to determine the position of the received sound source.
In a specific implementation, this embodiment first acquires the image data of the cameras 201 through the image data acquiring unit 11, where the image of each camera is the picture captured by that camera 201 in real time. The face information acquiring unit 12 then performs face detection on the image data captured by each camera 201, detecting the number of faces, the face area, and the position of the face in each image. The first priority acquiring unit 13 obtains the face count score, the face area score, and the face position score, and takes the sum of these three scores for the image of each camera 201 as the first priority of that camera 201. Next, the sound source distance acquiring unit 14 obtains the position of the sound source, and from it the distance between the sound source position and each camera 201; the position of the sound source is obtained by the microphone 202 receiving the sound and determining where it came from. The second priority acquiring unit 15 then obtains the corresponding second priority from the distance between each camera 201 and the sound source position; the smaller the distance, the higher the second priority of the camera 201. Finally, for each camera 201, the final priority acquiring unit 16 multiplies the first priority and the second priority by their corresponding weight values and sums the products to obtain the final priority, and the output unit 17 transmits the image data of the camera 201 with the highest final priority to the display.
Compared with the prior art, the audio and video system of this embodiment selects the camera image to display based on the results of face detection and sound source localization. It switches camera images automatically, meets the display needs of remote audio and video conferences, reduces manual operation, is more intelligent and automated, and improves the user experience.
Embodiment 4 of the present invention provides a smart tablet. Referring to FIG. 6, which is a schematic structural diagram of Embodiment 4, this embodiment includes the audio and video system provided by Embodiment 3; for details, see the description of the audio and video system in Embodiment 3, which is not repeated here.
In a specific implementation, this embodiment first acquires the image data of the cameras 201 through the image data acquiring unit 11, where the image of each camera is the picture captured by that camera 201 in real time. The face information acquiring unit 12 then performs face detection on the image data captured by each camera 201, detecting the number of faces, the face area, and the position of the face in each image. The first priority acquiring unit 13 obtains the face count score, the face area score, and the face position score, and takes the sum of these three scores for the image of each camera 201 as the first priority of that camera 201. Next, the sound source distance acquiring unit 14 obtains the position of the sound source, and from it the distance between the sound source position and each camera 201; the position of the sound source is obtained by the microphone 202 receiving the sound and determining where it came from. The second priority acquiring unit 15 then obtains the corresponding second priority from the distance between each camera 201 and the sound source position; the smaller the distance, the higher the second priority of the camera 201. Finally, for each camera 201, the final priority acquiring unit 16 multiplies the first priority and the second priority by their corresponding weight values and sums the products to obtain the final priority, and the output unit 17 transmits the image data of the camera 201 with the highest final priority to the display.
Compared with the prior art, the smart tablet of this embodiment selects the camera image to display based on the results of face detection and sound source localization. It switches camera images automatically, meets the display needs of remote audio and video conferences, reduces manual operation, is more intelligent and automated, and improves the user experience.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principle of the present invention, and such improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A method for automatically selecting a camera image, characterized by comprising:
    acquiring image data of each camera, wherein the acquired image data is the picture captured by each camera in real time;
    performing face detection on the image data of each camera to obtain face information in the image data of each camera;
    obtaining a first priority of each camera according to the face information obtained from the image data of that camera;
    determining, by each microphone, the position of the sound source received by that microphone, and thereby obtaining the distance between each camera and the sound source;
    obtaining a second priority of each camera according to the distance between each camera and the sound source position;
    obtaining a final priority of each camera according to the first priority and the second priority of that camera;
    outputting, for display, the image data of the camera with the highest final priority.
  2. The method for automatically selecting a camera image according to claim 1, characterized in that obtaining the final priority of each camera according to the first priority and the second priority of that camera specifically comprises:
    for each camera, multiplying the first priority by a preset first weight to obtain a first product value, and multiplying the second priority by a preset second weight to obtain a second product value;
    obtaining the final priority from the sum of the first product value and the second product value.
  3. The method for automatically selecting a camera image according to claim 1, characterized in that obtaining the second priority of each camera according to the distance between each camera and the sound source position comprises:
    sorting the cameras by the magnitude of the distance between each camera and the sound source position, wherein the smaller a camera's distance to the sound source, the higher the second priority obtained for that camera.
  4. The method for automatically selecting a camera image according to claim 1, characterized in that the face information comprises the number of faces, the face area, and the position of the face in the image.
  5. The method for automatically selecting a camera image according to claim 4, characterized in that obtaining the first priority of each camera's image data according to the face information obtained from the image data of that camera specifically comprises:
    obtaining, from the number of faces, the face area, and the position of the face in the image acquired for the image data of each camera, a corresponding face count score, face area score, and face position score;
    obtaining the first priority of each camera according to the sum of the face count score, the face area score, and the face position score of that camera's image data.
  6. An apparatus for automatically selecting a camera image, characterized by comprising:
    an image data acquiring unit, configured to acquire image data of each camera, wherein the acquired image data is the picture captured by each camera in real time;
    a face information acquiring unit, configured to perform face detection on the image data of each camera to obtain face information in the image data of each camera;
    a first priority acquiring unit, configured to obtain a first priority of each camera according to the face information obtained from the image data of that camera;
    a sound source distance acquiring unit, configured to determine, by each microphone, the position of the sound source received by that microphone, and thereby obtain the distance between each camera and the sound source;
    a second priority acquiring unit, configured to obtain a second priority of each camera according to the distance between each camera and the sound source position;
    a final priority acquiring unit, configured to obtain a final priority of each camera according to the first priority and the second priority of that camera;
    an output unit, configured to output, for display, the image data of the camera with the highest final priority.
  7. The apparatus for automatically selecting a camera image according to claim 6, characterized in that the final priority acquiring unit is specifically configured to:
    for each camera, multiply the first priority by a preset first weight to obtain a first product value, and multiply the second priority by a preset second weight to obtain a second product value;
    obtain the final priority from the sum of the first product value and the second product value.
  8. The apparatus for automatically selecting a camera image according to claim 6, characterized in that, when obtaining the second priority of each camera according to the distance between each camera and the sound source position, the second priority acquiring unit sorts the cameras by the magnitude of that distance, wherein the smaller a camera's distance to the sound source, the higher the second priority obtained for that camera.
  9. An audio and video system, characterized by comprising the apparatus for automatically selecting a camera image according to any one of claims 6 to 8, and further comprising:
    at least two cameras, mounted respectively on the left and right sides or on the upper and lower sides of a smart tablet, for capturing images in real time;
    a microphone, configured to receive sound from a sound source and to determine the position of the received sound source.
  10. A smart tablet, characterized by comprising the audio and video system according to claim 9.
PCT/CN2017/104657 2017-05-16 2017-09-29 Method and device for automatically selecting camera image, and audio and video system WO2018209879A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710344454.4 2017-05-16
CN201710344454.4A CN107277427A (en) 2017-05-16 2017-05-16 Automatically select method, device and the audio-visual system of camera picture

Publications (1)

Publication Number Publication Date
WO2018209879A1 true WO2018209879A1 (en) 2018-11-22

Family

ID=60064007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104657 WO2018209879A1 (en) 2017-05-16 2017-09-29 Method and device for automatically selecting camera image, and audio and video system

Country Status (2)

Country Link
CN (1) CN107277427A (en)
WO (1) WO2018209879A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113395479A (en) * 2021-06-16 2021-09-14 随锐科技集团股份有限公司 Video conference picture processing method and system

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844774A (en) * 2017-11-10 2018-03-27 广州视源电子科技股份有限公司 System of selection, device, intelligent terminal and the storage medium that image is shown
CN108197554B (en) * 2017-12-28 2023-06-02 努比亚技术有限公司 Camera starting method, mobile terminal and computer readable storage medium
CN110536097A (en) * 2018-05-25 2019-12-03 中兴通讯股份有限公司 A kind of video control method, video conference terminal and multipoint control unit MCU
CN108900787B (en) * 2018-06-20 2021-06-04 广州视源电子科技股份有限公司 Image display method, device, system and equipment, readable storage medium
CN110858887A (en) * 2018-08-22 2020-03-03 视联动力信息技术股份有限公司 Method and device for playing monitoring data
CN110602385A (en) * 2019-08-28 2019-12-20 深圳怡化电脑股份有限公司 Camera and method of using the same
CN110658967A (en) * 2019-09-23 2020-01-07 联想(北京)有限公司 Control method and device and electronic equipment
CN110600036A (en) * 2019-09-24 2019-12-20 随锐科技集团股份有限公司 Conference picture switching device and method based on voice recognition
CA3133530A1 (en) * 2020-08-21 2022-02-21 Shenzhen Carku Technology Co., Limited Shared charging cabinet and ejecting control method thereof
CN112860198B (en) * 2021-01-05 2024-02-09 中科创达软件股份有限公司 Video conference picture switching method and device, computer equipment and storage medium
CN113473011B (en) * 2021-06-29 2023-04-25 广东湾区智能终端工业设计研究院有限公司 Shooting method, shooting system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060136962A1 (en) * 2004-12-21 2006-06-22 Funai Electric Co., Ltd. Broadcasting signal receiving system
CN1917623A (en) * 2005-08-17 2007-02-21 索尼株式会社 Camera controller and teleconferencing system
CN101685153A (en) * 2008-09-28 2010-03-31 深圳华为通信技术有限公司 Microphone space measuring method and device
US20120050601A1 (en) * 2010-08-26 2012-03-01 Samsung Electronics Co., Ltd. Method of controlling digital photographing apparatus and digital photographing apparatus
CN103237178A (en) * 2013-03-26 2013-08-07 北京小米科技有限责任公司 Video frame switching method, video frame switching device and video frame switching equipment
CN104038725A (en) * 2010-09-09 2014-09-10 华为终端有限公司 Method and device for adjusting conventioneer image display in multi-screen video conference

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104219374A (en) * 2013-06-04 2014-12-17 李旭阳 Human-computer interaction system on basis of next-generation intelligent mobile phone

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113395479A (en) * 2021-06-16 2021-09-14 随锐科技集团股份有限公司 Video conference picture processing method and system
CN113395479B (en) * 2021-06-16 2022-06-24 随锐科技集团股份有限公司 Video conference picture processing method and system

Also Published As

Publication number Publication date
CN107277427A (en) 2017-10-20

Similar Documents

Publication Publication Date Title
WO2018209879A1 (en) Method and device for automatically selecting camera image, and audio and video system
US9641585B2 (en) Automated video editing based on activity in video conference
US10440322B2 (en) Automated configuration of behavior of a telepresence system based on spatial detection of telepresence components
US10917612B2 (en) Multiple simultaneous framing alternatives using speaker tracking
US11128793B2 (en) Speaker tracking in auditoriums
US10182208B2 (en) Panoramic image placement to minimize full image interference
US8208002B2 (en) Distance learning via instructor immersion into remote classroom
US9633270B1 (en) Using speaker clustering to switch between different camera views in a video conference system
US9088694B2 (en) Adjusting video layout
CN106961568B (en) Picture switching method, device and system
US11076127B1 (en) System and method for automatically framing conversations in a meeting or a video conference
WO2012072008A1 (en) Method and device for superposing auxiliary information of video signal
EP2352290B1 (en) Method and apparatus for matching audio and video signals during a videoconference
US20140063176A1 (en) Adjusting video layout
NO327899B1 (en) Procedure and system for automatic camera control
CN111083397A (en) Recorded broadcast picture switching method, system, readable storage medium and equipment
JPH08163522A (en) Video conference system and terminal equipment
US10645339B1 (en) Asymmetric video conferencing system and method
JP2005033570A (en) Method and system for providing mobile body image
CN113676693B (en) Picture presentation method, video conference system, and readable storage medium
EP4106326A1 (en) Multi-camera automatic framing
WO2016110047A1 (en) Teleconference system and teleconferencing method
JPH02202275A (en) Video conference system
JP2010028299A (en) Conference photographed image processing method, conference device, and the like
CN117640874A (en) Image processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17909692

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/03/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17909692

Country of ref document: EP

Kind code of ref document: A1