WO2022160748A1 - Video processing method and apparatus - Google Patents

Video processing method and apparatus Download PDF

Info

Publication number
WO2022160748A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
field
view
view frame
target
Prior art date
Application number
PCT/CN2021/120411
Other languages
French (fr)
Chinese (zh)
Inventor
陈文明
邓高锋
张世明
吕周谨
倪世坤
Original Assignee
深圳壹秘科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹秘科技有限公司
Publication of WO2022160748A1 publication Critical patent/WO2022160748A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Definitions

  • the invention relates to the technical field of video processing, and in particular, to the technical field of video processing for portrait tracking.
  • the video images of one conference site are acquired through cameras, transmitted to the other conference site, and displayed on the display device of the other conference site.
  • the camera device of the venue needs to automatically track and focus on the participants.
  • the empty space occupies the screen and makes the image of the participants smaller, which is not conducive to exchanges between the participants on both sides.
  • the present application provides a video processing method and device that can automatically track participants in a conference venue.
  • a video processing method, comprising: acquiring a sensor frame captured by a video sensor, where the sensor frame is an image frame of the entire frame captured by the video sensor; detecting a target frame in the sensor frame, where the target frame is a human body image frame and/or an image frame containing a human body in the sensor frame; determining a field of view frame according to the target frame, wherein the field of view frame is an image frame including all the target frames; determining all the target frames that can determine the boundary of the field of view frame, and determining whether all the target frames that can determine the boundary of the field of view frame are stationary; and, when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, outputting the field of view frame.
  • a video processing device, comprising: a video acquisition unit for acquiring a sensor frame captured by a video sensor, where the sensor frame is an image frame of the entire frame captured by the video sensor; a humanoid capture unit for detecting a target frame in the sensor frame, where the target frame is a human body image frame and/or an image frame containing a human body in the sensor frame; a video detection unit for determining a field of view frame according to the target frame, determining all the target frames that can determine the boundary of the field of view frame, and determining whether all the target frames that can determine the boundary of the field of view frame are stationary, wherein the field of view frame is an image frame including all the target frames; and
  • an image processing unit that outputs the field of view frame when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary.
  • the beneficial effect of the present application is that a complete image is acquired through the sensor, and the human body in the sensor frame is detected to determine the image range that needs to be displayed to the user, that is, the field of view frame.
  • the visual field frame is output and displayed. Because real-time monitoring is required for each sensor frame, the position changes of the participants at the venue can be captured in real time.
  • the new field of view will be recalculated and output, thus enabling automatic, real-time tracking of participants in the venue.
  • FIG. 1 is a system architecture diagram of an application of an embodiment of the present application.
  • FIG. 2 is a flowchart of a video processing method according to Embodiment 1 of the present application.
  • FIG. 3 is a flowchart of specific steps of determining a field of view frame according to a target frame in Embodiment 1 of the present application.
  • FIG. 4 is a schematic diagram of extending up and down all target frames in Embodiment 1 of the present application.
  • FIG. 5 is a schematic diagram of a field of view frame in Embodiment 1 of the present application.
  • FIG. 6 is a flowchart of determining a target frame that can determine the boundary of the field of view frame in Embodiment 1 of the present application.
  • FIG. 7 is a schematic diagram of cropping a sensor frame to obtain a field of view frame in Embodiment 1 of the present application.
  • FIG. 8 is a schematic diagram of smoothing a video image in Embodiment 1 of the present application.
  • FIG. 9 is a schematic block diagram of a video processing apparatus according to Embodiment 2 of the present application.
  • FIG. 10 is a schematic structural diagram of a video processing apparatus according to Embodiment 3 of the present application.
  • the embodiments of the present application may be applied to various camera devices or systems, for example, a camera device, a network camera device, and a conference terminal of an audio-video conference, and the specific device or system is not limited by the embodiments of the present application.
  • FIG. 1 shows a system architecture diagram 100 applied by an embodiment of the present application.
  • the system architecture 100 includes: a camera device 110 , a main processing device 120 and a display device 130 .
  • the camera device 110, the main processing device 120, and the display device 130 may be communicatively connected through an electrical connection, a network connection, a communication connection, or the like.
  • the camera device 110 includes a video sensor for acquiring sensor frames. After the main processing device 120 processes the sensor frames, the field of view frame is sent to the display device 130 for display.
  • the camera device 110, the main processing device 120, and the display device 130 may be three mutually independent hardware entities. Alternatively, the camera device 110 and the main processing device 120 may be set in the same hardware entity; for example, a camera device may include, in addition to a video sensor, a device for processing video images. Alternatively, the main processing device 120 and the display device 130 may be set in the same hardware entity; for example, the display device 130 may include, in addition to the display, a device for processing video images.
  • the camera device 110 sends the acquired field of view frame to the display device 130, and the display device 130 processes the field of view frame before displaying it on the display.
  • the camera device 110 may be a camera; the display device 130 may be a display, a projector, a computer screen, etc.
  • the main processing device 120 may be a processing device built into the camera device 110 or the display device 130, or an independent processing device, such as a computer or other electronic device (e.g., a mobile intelligent electronic device), which can communicate with the camera device 110 and the display device 130, respectively.
  • in the conference scene, the meeting place is fixed. In small and medium-sized conference venues, the camera can use a high-definition wide-angle lens to obtain images of the entire venue, so that the camera can capture every participant in real time.
  • the image frame of the entire frame captured by the video sensor is referred to as the sensor frame; the human body image frame and/or the image frame containing a human body in the sensor frame is referred to as the target frame; and the image frame that includes all the target frames is called the field of view frame.
  • FIG. 2 shows a video processing method provided in Embodiment 1 of the present application.
  • the method can be applied to the camera device 110 with video processing capability, can be applied to the display device 130 with video processing capability, and can also be applied to the independent main processing device 120 .
  • the video processing method includes:
  • S210 Acquire a sensor frame captured by a video sensor, where the sensor frame is an image frame of the entire frame captured by the video sensor. Optionally, the sensor frame is captured by a high-definition wide-angle camera: for example, the lens part of the camera adopts a 4K lens of 5 million pixels or more together with a wide-angle lens, so that when there are many participants in a multi-person conference scene, all participants are included in the view of the lens while the clarity of the video is ensured. The sensor in the camera mainly converts the optical signal received by the lens into an electrical signal, and the electrical signal (i.e., the video signal) of the real-time image frame is then transmitted to the main processing device 120;
  • S220 Detect a target frame in the sensor frame, where the target frame is a human body image frame and/or an image frame containing a human body in the sensor frame. Optionally, methods for detecting a human body include, but are not limited to, face detection, upper-body detection, lower-body detection, and human body pose estimation (SPPE, DensePose). It should be noted that the human body referred to in this application may include the entire body of a person, or may refer to a part of the body, such as the face or the upper body;
  • S240 determine all the target frames that can determine the boundary of the field of view frame, and determine whether all the target frames that can determine the boundary of the field of view frame are static;
  • the field of view frame may be displayed directly on the device running the method, or output to another display device over a wired or wireless connection for display.
  • S230 determining a visual field frame according to the target frame, including:
  • as shown in FIG. 4, extend all target frames upward and downward by a height of a certain proportion, such as e*H, where e is a scale coefficient and H is the height of the corresponding target frame;
  • the range that needs to be displayed to the user is determined.
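  • The view-frame computation just described (extend each target frame vertically by e*H, then take the smallest rectangle containing all expanded frames) can be sketched as below; the (x, y, w, h) box representation and the value of e are illustrative assumptions, not taken from the application:

```python
# Sketch of S230: compute a field-of-view frame from detected target frames.
# Boxes are (x, y, w, h) with origin at the top-left of the sensor frame;
# e is the vertical expansion coefficient (value here is hypothetical).

def expand_vertically(box, e=0.1):
    x, y, w, h = box
    pad = e * h
    return (x, y - pad, w, h + 2 * pad)

def view_frame(targets, e=0.1):
    """Smallest rectangle containing all vertically expanded target frames."""
    boxes = [expand_vertically(b, e) for b in targets]
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)
```

  The result is the raw field of view frame, which the adjustment modes below may still resize.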
  • the field of view frame at this time may not conform to the required display size or aspect ratio and can be further adjusted. Therefore, in S230, determining the field of view frame according to the target frame may further include the following adjustment mode 1 and/or adjustment mode 2.
  • step S230 further includes:
  • the preset maximum of the field of view frame is View_max, and the minimum width and height are W_min and H_min respectively.
  • View_max is generally predefined as the size of the sensor original image.
  • W_min and H_min are set according to the local area of the sensor original image that needs to be enlarged; the smaller W_min and H_min are set, the smaller the local area that can be enlarged.
  • the coordinates of the field of view frame cannot exceed View_max, and the width/height values cannot be less than W_min/H_min.
  • the coordinates of the minimum frame View_O that exceed the boundary or fall short of the minimum size are corrected; the field of view frame obtained after coordinate correction is marked as View_F.
  • the coordinates of the four points of the View_O frame must be within the range of View_max; coordinates beyond the maximum boundary are replaced by the maximum boundary coordinates.
  • the width/height values of View_O must be greater than or equal to W_min/H_min; if the width/height of View_O is less than W_min/H_min, the width/height of View_O is supplemented to W_min/H_min.
  • step S234 specifically includes: supplementing one half of the difference between the minimum height value of the field of view frame and the height value of the field of view frame to each of the upper and lower boundaries of the field of view frame; if the supplemented upper or lower boundary exceeds the maximum boundary of the field of view frame, the coordinates beyond the maximum boundary are replaced by the maximum boundary coordinates, and the amount beyond the maximum boundary is supplemented to the opposite boundary.
  • step S235 specifically includes: supplementing one half of the difference between the minimum width value of the field of view frame and the width value of the field of view frame to each of the left and right boundaries of the field of view frame; if the supplemented left or right boundary exceeds the maximum boundary of the field of view frame, the coordinates beyond the maximum boundary are replaced by the maximum boundary coordinates, and the amount beyond the maximum boundary is supplemented to the opposite boundary.
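  • Adjustment mode 1 (steps S232 to S235) can be sketched as follows. The corner-coordinate box representation and the helper name `_pad_axis` are illustrative assumptions; the clamping and the "half the difference per side, overflow pushed to the opposite boundary" logic follow the steps above:

```python
# Sketch of adjustment mode 1: clamp View_O to View_max, then pad each axis
# up to the minimum width/height W_min / H_min. Frames are corner coordinates
# (x0, y0, x1, y1); all names here are illustrative.

def correct_view(view, view_max, w_min, h_min):
    X0, Y0, X1, Y1 = view_max
    x0, y0, x1, y1 = view
    # S233: coordinates may not exceed the maximum boundary.
    x0, x1 = max(x0, X0), min(x1, X1)
    y0, y1 = max(y0, Y0), min(y1, Y1)
    # S234/S235: pad each axis to the minimum size, half per side.
    x0, x1 = _pad_axis(x0, x1, w_min, X0, X1)
    y0, y1 = _pad_axis(y0, y1, h_min, Y0, Y1)
    return (x0, y0, x1, y1)

def _pad_axis(lo, hi, min_len, bound_lo, bound_hi):
    short = min_len - (hi - lo)
    if short > 0:
        lo -= short / 2
        hi += short / 2
        if lo < bound_lo:        # overflow on the low side...
            hi += bound_lo - lo  # ...is supplemented to the opposite boundary
            lo = bound_lo
        if hi > bound_hi:        # and symmetrically on the high side
            lo -= hi - bound_hi
            hi = bound_hi
    return lo, hi
```

  For example, a 10x10 frame with W_min = H_min = 40 near the origin is padded to 40x40 without leaving the sensor image.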
  • adjustment mode 2 of step S230 adjusts the aspect ratio of the field of view frame; that is, step S230 further includes:
  • step S236 Adjust the width value and/or the height value of the field of view frame according to the aspect ratio of the current video resolution.
  • the field of view frame obtained after adjustment in step S236 is marked as View.
  • the field of view frame is the field of view frame that is output and displayed to the user.
  • the above adjustment mode 1 and adjustment mode 2 of the field of view frame may be used individually or in combination; when both are used, the size is first adjusted with adjustment mode 1, and the aspect ratio is then adjusted with adjustment mode 2.
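  • Step S236 is not spelled out in detail above; one plausible sketch, assuming the frame is grown around its centre until it reaches the target aspect ratio, is:

```python
# Hypothetical sketch of S236: widen (or heighten) the view frame around its
# centre so that width/height matches the current video resolution's ratio.
# Frames are corner coordinates (x0, y0, x1, y1); growing rather than cropping
# is an assumption -- the application only says the values are "adjusted".

def adjust_aspect(view, aspect):
    x0, y0, x1, y1 = view
    w, h = x1 - x0, y1 - y0
    if w / h < aspect:                 # too narrow: widen around the centre
        new_w = h * aspect
        cx = (x0 + x1) / 2
        x0, x1 = cx - new_w / 2, cx + new_w / 2
    else:                              # too wide: heighten around the centre
        new_h = w / aspect
        cy = (y0 + y1) / 2
        y0, y1 = cy - new_h / 2, cy + new_h / 2
    return (x0, y0, x1, y1)
```

  Growing the frame means no detected person is cropped out; the result may then be clamped again by adjustment mode 1.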
  • Rect_ti is the set of target frames detected at time ti.
  • determining the target frame that can determine the boundary of the field of view frame specifically includes:
  • a frame rect_j is removed from the set of target frames Rect_ti detected at time ti to obtain a new set Rect_ti';
  • a field of view frame View' is calculated from Rect_ti'. If View' is equal to the field of view frame View calculated from the full set, the target frame rect_j does not affect the calculation result of the field of view frame; otherwise, if View' is not equal to View, the target frame rect_j determines the boundary coordinates of the field of view frame.
  • DecisionRect_ti is the set of target frames that can determine the boundary of the field of view frame View at time ti.
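  • The leave-one-out test for DecisionRect_ti can be sketched as follows; the corner-coordinate box representation and function names are illustrative:

```python
# Sketch of S240: a target frame rect_j determines the boundary of the view
# frame iff recomputing the view frame without it yields a different result.
# Boxes are corner coordinates (x0, y0, x1, y1).

def bounding_box(boxes):
    """Smallest rectangle containing all boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def decisive_boxes(rects):
    """Target frames whose removal changes the view frame (DecisionRect_ti)."""
    view = bounding_box(rects)
    return [r for i, r in enumerate(rects)
            if len(rects) == 1
            or bounding_box(rects[:i] + rects[i + 1:]) != view]
```

  A box strictly inside the union of the others is not decisive; only boxes that touch the view-frame boundary survive the test.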
  • it is determined that all the target frames that can determine the boundary of the field of view frame are static, specifically including:
  • the motion factor Factor_12 of a target frame is determined as follows:
  • after the detection unit receives a sensor frame transmitted by the sensor, it performs real-time detection on the sensor frame: the human body is detected, and the target frame containing the human body is framed, here called target frame 1. Taking the upper left corner of the sensor frame as the coordinate origin (0,0), the coordinates of the center point C1 of target frame 1 are calculated as (x1, y1), together with the width W1 and the height H1, and the result is saved.
  • after the detection unit receives the next sensor frame transmitted by the sensor, it likewise performs real-time detection on it, frames target frame 2 containing the human body in the same way, and saves the coordinates of the center point C2 of target frame 2 (x2, y2), the width W2, and the height H2.
  • the above computes the motion factor Factor_12 between two sensor frames (which may be the current frame and the previous frame, or the current frame and the next frame).
  • when all motion factors within a certain period of time T1 are smaller than a preset threshold, the target frame is determined to be static; when a motion factor within T1 exceeds (e.g., is greater than) the threshold, the target frame is determined to be moving.
  • the threshold of the motion factor can be taken as 0.5; this is an empirical value that will differ under different conditions.
  • the value of T1 ranges from 0 to 10 seconds; if the person who is currently moving needs to be focused on, T1 only needs to be small enough.
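  • The application defines the inputs to Factor_12 (center points, widths, heights) but not the exact formula; the sketch below assumes one plausible definition, center displacement normalised by the frame diagonal plus relative size change, purely for illustration:

```python
import math

# Hypothetical motion factor between the same target frame in two sensor
# frames. Each box is (centre x, centre y, width, height) as saved above;
# the combination of shift and resize terms is an assumption.

def motion_factor(box1, box2):
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box1, box2
    shift = math.hypot(x2 - x1, y2 - y1) / math.hypot(w1, h1)  # normalised move
    resize = abs(w2 - w1) / w1 + abs(h2 - h1) / h1             # relative resize
    return shift + resize

def is_static(factors, threshold=0.5):
    """Static iff every motion factor within the window T1 stays below the
    threshold (0.5 being the empirical value mentioned above)."""
    return all(f < threshold for f in factors)
```

  An unchanged box yields a factor of 0; a box that jumps by its own diagonal yields a factor of 1, well above the 0.5 threshold.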
  • S250 may specifically include:
  • the sensor frame is cropped and scaled by invoking an ISP (Image Signal Processor) chip.
  • optionally, S250 further includes:
  • S254 Update the field of view frame frame by frame according to the number of moving steps until the target field of view frame is reached.
  • the field of view frame of each frame of image moves according to a fixed step size to avoid moving too fast.
  • step_max: the maximum step size of a coordinate value of the field of view frame per frame of image;
  • View_dist = (x_0, y_0, x_1, y_1): the difference coordinates between the target field of view frame and the current field of view frame;
  • MoveNum = max{x_0, y_0, x_1, y_1}/step_max: the number of moving steps from the current field of view frame to the target field of view frame;
  • View_step = View_dist/MoveNum: the per-frame movement of the field of view frame.
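  • The frame-by-frame movement of S252 to S254 can be sketched as follows; the generator form and the `step_max` default are illustrative choices:

```python
import math

# Sketch of S252-S254: interpolate from the current view frame to the target
# view frame, moving each coordinate by at most step_max per video frame so
# the picture does not jump. Frames are (x0, y0, x1, y1).

def smooth_move(current, target, step_max=8):
    dist = [t - c for c, t in zip(current, target)]              # View_dist
    n = max(1, math.ceil(max(abs(d) for d in dist) / step_max))  # MoveNum
    step = [d / n for d in dist]                                 # View_step
    view = list(current)
    for _ in range(n):                                           # S254
        view = [v + s for v, s in zip(view, step)]
        yield tuple(view)
```

  Each yielded tuple is the view frame to apply to one frame of video; after MoveNum frames the target field of view frame is reached exactly.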
  • in practical applications, the cropping and/or scaling processing and the video image smoothing processing in the above S250 may be used together; for example, the cropping and/or scaling processing is performed first, and then the video image smoothing processing is performed.
  • in the above solution, the image of the entire conference site is acquired by the sensor, and the human body in the sensor frame is detected to determine the image range that needs to be displayed to the user. By comparing the position change of the same target frame across sensor frames, it is determined whether the target frame is in a static state. When it is determined that all the people in the venue who influence the output target frames are in a still state, the field of view frame including all the human bodies is output and displayed. Since each sensor frame is monitored in real time, even if the positions of the participants change for some reason after they are seated, the described video processing method can capture this change in real time, and after the participants are seated again, a new field of view frame is recalculated, output, and displayed for the user to watch. Because the above method does not need to control the rotation of the camera or to refocus, but simply recalculates the sensor frame captured by the sensor to obtain and output a new field of view frame, it performs automatic, real-time tracking of the participants in the venue. An apparatus using the method can therefore also be a plug-and-play device.
  • FIG. 9 shows a video processing apparatus 300 provided in Embodiment 2 of the present application; the video processing apparatus includes:
  • the video acquisition unit 310 is configured to acquire the sensor frame captured by the video sensor, where the sensor frame is the image frame of the entire frame captured by the video sensor; optionally, the video acquisition unit 310 acquires the sensor frame captured by the high-definition wide-angle camera ;
  • a humanoid capture unit 320 configured to detect a target frame in the sensor frame, where the target frame is a human body image frame and/or an image frame containing a human body in the sensor frame;
  • the video detection unit 330 is configured to determine a visual field frame according to the target frame; determine all the target frames that can determine the boundary of the visual field frame, and determine whether all the target frames that can determine the boundary of the visual field frame are all Still; wherein, the field of view frame is an image frame including all the target frames;
  • the image processing unit 340 outputs the view frame when it is determined that all the target frames that can determine the boundary of the view frame are stationary.
  • the video detection unit 330 is specifically configured to, when it is determined that the target frames are static, expand the heights of all the target frames up and down by a certain proportion; the smallest frame containing all the expanded target frames is the field of view frame.
  • the video detection unit 330 is further configured to replace the four vertex coordinates of the view frame with the maximum boundary coordinates if the coordinates of the four vertices of the view frame exceed the maximum boundary coordinates of the view frame. and/or, if the height value of the field of view frame is less than the minimum height value of the field of view frame, then adjust the height value of the field of view frame to the minimum height value of the field of view frame; and/or, if the field of view If the width value of the frame is smaller than the minimum width value of the view frame, then adjust the width value of the view frame to the minimum width value of the view frame.
  • the video detection unit 330 is specifically used for:
  • if the height value of the field of view frame is smaller than the minimum height value of the field of view frame, add half of the difference between the minimum height value and the height value to each of the upper and lower boundaries of the field of view frame; if the upper or lower boundary after supplementation exceeds the maximum boundary of the field of view frame, the coordinates beyond the maximum boundary are replaced by the maximum boundary coordinates, and the amount beyond the maximum boundary is supplemented to the opposite boundary; and/or,
  • if the width value of the field of view frame is smaller than the minimum width value of the field of view frame, add half of the difference between the minimum width value and the width value to each of the left and right boundaries of the field of view frame; if the left or right boundary after supplementation exceeds the maximum boundary of the field of view frame, the coordinates beyond the maximum boundary are replaced by the maximum boundary coordinates, and the amount beyond the maximum boundary is supplemented to the opposite boundary.
  • the video detection unit 330 is further configured to adjust the width value and/or the height value of the field of view frame according to the aspect ratio of the current video resolution.
  • the video detection unit 330 configured to determine the target frame that can determine the boundary of the field of view frame, includes:
  • the video detection unit 330 is specifically configured to: calculate a first field of view frame according to all target frames; delete one of the target frames; calculate a second field of view frame according to the remaining target frames; and, when the first field of view frame is not equal to the second field of view frame, determine that the deleted target frame is a target frame that can determine the boundary of the field of view frame.
  • for how the video detection unit 330 determines by calculation whether a certain target frame can determine the boundary of the field of view frame, please refer to the description in Embodiment 1, which is not repeated here.
  • the video detection unit 330 is configured to determine that all the target frames that can determine the boundary of the field of view frame are static, including:
  • the video detection unit 330 is specifically configured to determine that all the target frames that can determine the boundary of the field of view frame are in a stationary state when their motion factors within the preset time interval are all smaller than a preset threshold. For how the video detection unit 330 determines by calculation whether a certain target frame is in a static state, please refer to the specific description in Embodiment 1, which is not repeated here.
  • the image processing unit 340 is specifically configured to, when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, crop and/or scale the sensor frame according to the field of view frame and output the field of view frame. Specifically,
  • the image processing unit 340 is specifically configured to, when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, crop and/or scale the sensor frame according to the field of view frame; calculate the difference coordinates between the target field of view frame and the current field of view frame; according to the preset maximum movement step size of the field of view frame per frame of image, calculate the number of moving steps from the current field of view frame to the target field of view frame; and update the field of view frame frame by frame according to the number of moving steps until the target field of view frame is reached.
  • for how the image processing unit 340 gradually updates the current field of view frame until the target field of view frame is reached, please refer to the examples in S252 to S254 in Embodiment 1, which are not repeated here.
  • the video processing device 300 may be a camera device with a built-in video processing function, such as the combination of the camera device 110 and the main processing device 120 in FIG. 1; it may also be a display device with a built-in video processing function (such as a computer or an intelligent electronic device), such as the combination of the main processing device 120 and the display device 130 in FIG. 1; or it may be an independent electronic device. This is not limited in this application.
  • FIG. 10 is a schematic structural diagram of a video processing apparatus 400 according to Embodiment 3 of the present application.
  • the video processing apparatus 400 includes: a processor 410 , a memory 420 and a communication interface 430 .
  • the processor 410, the memory 420 and the communication interface 430 are connected to each other through a bus system.
  • the processor 410 may be an independent component or a collective term for multiple processing components; for example, it may be a CPU, an ASIC, or one or more integrated circuits configured to implement the above method, such as at least one microprocessor (DSP) or at least one field-programmable gate array (FPGA).
  • the memory 420 is a computer-readable storage medium on which programs executable on the processor 410 are stored.
  • the processor 410 invokes the program in the memory 420 to execute a video processing method provided in the first embodiment, and transmits the result obtained by the processor 410 to other devices through the communication interface 430 in a wireless or wired manner.
  • the video processing apparatus 400 may further include a camera 440 .
  • the camera 440 acquires the sensor frame and sends it to the processor 410; the processor 410 calls the program in the memory 420, executes the video processing method provided in Embodiment 1 above to process the sensor frame, and transmits the result to other devices through the communication interface 430 in a wireless or wired manner.
  • the functions described in the specific embodiments of the present application may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • when implemented in software, the functions may be implemented by a processor executing software instructions.
  • the software instructions may consist of corresponding software modules.
  • the software modules may be stored in a computer-readable storage medium, which may be any available medium accessible by a computer, or a data storage device, such as a server or data center, integrating one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., Digital Video Disc (DVD)), or semiconductor media (e.g., Solid State Disk (SSD)), etc.
  • the computer-readable storage medium includes but is not limited to random access memory (Random Access Memory, RAM), flash memory, read only memory (Read Only Memory, ROM), Erasable Programmable Read Only Memory (Erasable Programmable ROM, EPROM) ), Electrically Erasable Programmable Read-Only Memory (Electrically EPROM, EEPROM), registers, hard disks, removable hard disks, compact disks (CD-ROMs), or any other form of storage medium known in the art.
  • An exemplary computer-readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the computer-readable storage medium.
  • the computer-readable storage medium can also be an integral part of the processor.
  • the processor and computer-readable storage medium may reside in an ASIC. Additionally, the ASIC may reside in access network equipment, target network equipment or core network equipment.
  • the processor and the computer-readable storage medium may also exist as discrete components in the access network device, the target network device, or the core network device. When implemented in software, it can also be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

A video processing method and apparatus. The method comprises: acquiring a sensor frame captured by a video sensor, the sensor frame being an image box of an entire frame captured by the video sensor; detecting target boxes in the sensor frame, wherein the target boxes are human body image boxes and/or image boxes comprising a human body in the sensor frame; determining a field of view box according to the target boxes, wherein the field of view box is an image box comprising all the target boxes; determining all target boxes that can determine boundaries of the field of view box, and determining whether all the target boxes that can determine the boundaries of the field of view box are stationary; and when it is determined that all the target boxes that can determine the boundaries of the field of view box are stationary, outputting the field of view box. According to the solution, automatic and real-time tracking of participants in a conference hall can be implemented.

Description

Video Processing Method and Apparatus

Technical Field

The present invention relates to the field of video processing, and in particular to video processing for portrait tracking.

Background

Today, with technology developing rapidly, conferences in which people communicate remotely over a network through audio and video terminals have become very common. Typically, a camera captures the video image of one conference site, the image is transmitted to the other site, and it is shown on the display device there.

However, if the participants occupy only part of the venue, the camera at that venue needs to track and focus on them automatically; otherwise, in the picture shown at the other site, the participants are off-center and the empty space shrinks the portion of the picture they occupy, which hinders communication between the two sides.

Among existing audio and video calling products, some perform autofocus by controlling a motor, but such products sometimes err, for example by focusing on the foreground or background instead of the subject, or by locking onto something else; dim lighting also strongly affects autofocus. Moreover, autofocus takes time, so its latency is relatively high and its real-time performance relatively weak. Other products steer the lens with hardware mechanisms, for example adding connected sensors, alarms, pan-tilt heads, and lens controllers to implement search and target locking. In a conference scenario, however, if the lens is steered by a lens controller, the participants keep watching the camera's direction in order to appear best in the captured video, which is not conducive to the progress of the meeting.
Summary of the Invention

The present application provides a video processing method, and an apparatus, that can automatically track participants in a conference venue.

The present application provides the following technical solutions.

In one aspect, a video processing method is provided, comprising: acquiring a sensor frame captured by a video sensor, the sensor frame being the image box of the entire frame captured by the video sensor; detecting target frames in the sensor frame, the target frames being human-body image boxes and/or image boxes containing a human body; determining a view frame from the target frames, the view frame being an image box containing all the target frames; identifying all target frames that decide the boundary of the view frame and determining whether every such target frame is stationary; and, when every target frame that decides the boundary of the view frame is stationary, outputting the view frame.

In another aspect, a video processing apparatus is provided, comprising: a video acquisition unit, configured to acquire a sensor frame captured by a video sensor, the sensor frame being the image box of the entire frame captured by the video sensor; a human-figure capture unit, configured to detect target frames in the sensor frame, the target frames being human-body image boxes and/or image boxes containing a human body; a video detection unit, configured to determine a view frame from the target frames, the view frame being an image box containing all the target frames, to identify all target frames that decide the boundary of the view frame, and to determine whether every such target frame is stationary; and an image processing unit, configured to output the view frame when every target frame that decides the boundary of the view frame is stationary.

The beneficial effect of the present application is that a complete image is acquired through the sensor, and the human bodies in the sensor frame are detected to determine the image range to be shown to the user, namely the view frame. When it is determined that everyone in the venue who affects the output view frame is stationary, the view frame is output and displayed. Because every sensor frame is monitored in real time, position changes of the participants are captured in real time: when the movement of a target frame affects the boundary of the view frame, the solution of the present application recomputes and outputs a new view frame, thereby tracking the participants in the venue automatically and in real time.
Brief Description of the Drawings

FIG. 1 is an architecture diagram of a system to which embodiments of the present application apply.

FIG. 2 is a flowchart of a video processing method provided in Embodiment 1 of the present application.

FIG. 3 is a flowchart of the specific steps of determining the view frame from the target frames in Embodiment 1 of the present application.

FIG. 4 is a schematic diagram of expanding all target frames upward and downward in Embodiment 1 of the present application.

FIG. 5 is a schematic diagram of the view frame in Embodiment 1 of the present application.

FIG. 6 is a flowchart of identifying the target frames that decide the boundary of the view frame in Embodiment 1 of the present application.

FIG. 7 is a schematic diagram of cropping the sensor frame to obtain the view frame in Embodiment 1 of the present application.

FIG. 8 is a schematic diagram of smoothing the video image in Embodiment 1 of the present application.

FIG. 9 is a schematic block diagram of a video processing apparatus provided in Embodiment 2 of the present application.

FIG. 10 is a schematic structural diagram of a video processing apparatus provided in Embodiment 3 of the present application.
Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein only explain the present application and do not limit it. The application may be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the present disclosure is understood thoroughly and completely.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs. The terms used in the specification are for describing specific embodiments only and are not intended to limit the application.

It should be understood that the terms "system" and "network" are often used interchangeably herein. The term "and/or" merely describes an association between objects and indicates three possible relationships; for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the character "/" generally indicates an "or" relationship between the objects before and after it.

The embodiments of the present application may be applied to various camera apparatuses or systems, for example a camera apparatus, a network camera apparatus, or a conference terminal for audio-video conferences; the embodiments of the present application do not limit the specific apparatus or system.

Please refer to FIG. 1, which shows the architecture 100 of a system to which embodiments of the present application apply. The system architecture 100 includes a camera apparatus 110, a main processing apparatus 120, and a display apparatus 130, which may be communicatively connected to one another by an electrical connection, a network connection, a communication link, or the like. The camera apparatus 110 includes a video sensor for acquiring sensor frames; after the main processing apparatus 120 processes a sensor frame, it sends the resulting view frame to the display apparatus 130 for display.

The camera apparatus 110, the main processing apparatus 120, and the display apparatus 130 may be three mutually independent hardware entities. Alternatively, the camera apparatus 110 and the main processing apparatus 120 may reside in the same hardware entity; for example, a camera device may contain, besides the video sensor, a unit that processes the video images. Or the main processing apparatus 120 and the display apparatus 130 may reside in the same hardware entity; for example, the display apparatus 130 may contain, besides a display, a unit that processes the video images, in which case the camera apparatus 110 sends the acquired view frame to the display apparatus 130, which processes it and then shows it on the display. Specifically, the camera apparatus 110 may be a camera; the display apparatus 130 may be a monitor, a projector, a computer screen, or the like; and the main processing apparatus 120 may be a processing unit built into the camera apparatus 110 or the display apparatus 130, or an independent processing device, such as a computer or another electronic device (for example, a mobile smart device) that can communicate with the camera apparatus 110 and the display apparatus 130.

In a conference scenario, the meeting place is fixed, and in a small or medium-sized venue a single high-definition wide-angle lens lets the camera capture the whole venue, so the camera can capture every participant in real time. Hereinafter, the image box of the entire frame captured by the video sensor is called the sensor frame, a human-body image box and/or an image box containing a human body in the sensor frame is called a target frame, and the image box containing all the target frames is called the view frame. The technical solutions of the present application are described below through specific embodiments.
Embodiment 1

Please refer to FIG. 2, which shows a video processing method provided in Embodiment 1 of the present application. The method may be applied to a camera apparatus 110 with video processing capability, to a display apparatus 130 with video processing capability, or to an independent main processing apparatus 120. The video processing method includes:

S210: acquire a sensor frame captured by a video sensor, the sensor frame being the image box of the entire frame captured by the video sensor. Optionally, the sensor frame is captured by a high-definition wide-angle camera: for example, the lens is a 4K wide-angle lens (5 megapixels or more), so that even when a multi-person conference accommodates many participants, all of them fall within the lens's visible range while video clarity is maintained. The sensor in the camera mainly converts the optical signal received by the lens into an electrical signal (the video signal) and passes it, as real-time image frames, to the main processing apparatus 120.

S220: detect the target frames in the sensor frame, the target frames being human-body image boxes and/or image boxes containing a human body. Optionally, methods of detecting a human body include, but are not limited to, face detection, upper-body detection, lower-body detection, and human pose estimation (SPPE, DensePose). Note that the human body referred to in this application may be a person's entire figure or only part of it, such as the face or the upper body.

S230: determine the view frame from the target frames, the view frame being an image box containing all the target frames.

S240: identify all target frames that decide the boundary of the view frame, and determine whether every such target frame is stationary.

S250: when every target frame that decides the boundary of the view frame is stationary, output the view frame. Optionally, after output, the view frame may be displayed directly on the device running the method, or transmitted, by wire or wirelessly, to another display device for display.
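The flow of S230 to S250 for a single sensor frame can be sketched as follows (a minimal Python sketch; the box representation, the bounding-box `union_box` stand-in for the view-frame calculation, and the in-line stationarity test are illustrative assumptions, not the patent's implementation):

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), origin at top-left

def process_sensor_frame(targets: List[Box],
                         calc_view: Callable[[List[Box]], Box],
                         is_static: Callable[[Box], bool],
                         prev_view: Optional[Box]) -> Optional[Box]:
    """One pass of S230-S250 over the target frames detected in a sensor frame."""
    if not targets:
        return prev_view
    view = calc_view(targets)                       # S230: view frame over all targets
    # S240: a target "decides" the view boundary if removing it changes the view
    deciders = [t for i, t in enumerate(targets)
                if calc_view(targets[:i] + targets[i + 1:]) != view]
    if all(is_static(t) for t in deciders):         # S250: output only once settled
        return view
    return prev_view                                # otherwise keep the last view

def union_box(boxes: List[Box]) -> Box:
    """Smallest box containing all boxes (a stand-in for the full S231-S236)."""
    if not boxes:
        return (0, 0, 0, 0)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

In a real pipeline, `calc_view` would be the full S231 to S236 procedure and `is_static` the motion-factor test over a time window.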
Please refer to FIG. 3. Optionally, S230, determining the view frame from the target frames, includes:

S231: expand every target frame upward and downward by a height of a given proportion.

Please refer to FIG. 4: each target frame is padded at the top and at the bottom by a height of a given proportion, such as e*H, where e is a proportion coefficient and H is the height of the corresponding target frame.

S232: determine the smallest box that contains all the expanded target frames; this is the view frame.

Please refer to FIG. 5: draw the smallest box View_O that contains all the expanded target frames.
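Steps S231 and S232 can be sketched as follows (Python; the value 0.1 for the proportion coefficient e is an arbitrary example, not specified by the patent):

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def expand_and_union(targets: List[Box], e: float = 0.1) -> Box:
    """S231: pad each target box by e*H at its top and bottom;
    S232: return the smallest box View_O containing all padded boxes."""
    expanded = []
    for x0, y0, x1, y1 in targets:
        h = y1 - y0
        expanded.append((x0, y0 - e * h, x1, y1 + e * h))
    return (min(b[0] for b in expanded), min(b[1] for b in expanded),
            max(b[2] for b in expanded), max(b[3] for b in expanded))
```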
Optionally, steps S231 and S232 determine the range to be shown to the user; at this point, however, the view frame may not yet match the required display size or display aspect ratio, so it can be adjusted further. Therefore, S230, determining the view frame from the target frames, may further include adjustment mode 1 and/or adjustment mode 2 below.

Please continue with FIG. 3. Adjustment mode 1 adjusts the size of the view frame. That is, step S230 further includes:

S233: if the coordinates of the four vertices of the view frame exceed the maximum boundary coordinates of the view frame, replace them with the maximum boundary coordinates; and/or,

S234: if the height of the view frame is less than the minimum height of the view frame, adjust its height to the minimum height; and/or,

S235: if the width of the view frame is less than the minimum width of the view frame, adjust its width to the minimum width.

For example, let the preset maximum of the view frame be View_max and the minimum width and height be W_min and H_min, respectively. View_max is generally predefined as the size of the original sensor image, while W_min and H_min are set according to the local region of the original sensor image to be magnified: the smaller W_min and H_min, the smaller the local region that can be magnified. The coordinates of the view frame thus cannot go beyond View_max, its width/height cannot be less than W_min/H_min, and the out-of-bounds or insufficient coordinates of the smallest box View_O are corrected. The view frame obtained after coordinate correction is denoted View_F.

The specific correction rules are as follows:

All four vertex coordinates of View_O must lie within the View_max coordinate range; any coordinate beyond the maximum boundary is replaced with the maximum boundary coordinate.

The width/height of View_O must be greater than or equal to W_min/H_min; if the width/height of View_O falls short of W_min/H_min, it is padded up to W_min/H_min.

Optionally, step S234 specifically includes: add half of the difference between the minimum height and the current height of the view frame to each of its upper and lower boundaries; if the padded upper or lower boundary then exceeds the maximum boundary of the view frame, replace the out-of-bounds coordinate with the maximum boundary coordinate and add the excess to the opposite boundary.

Optionally, step S235 specifically includes: add half of the difference between the minimum width and the current width of the view frame to each of its left and right boundaries; if the padded left or right boundary then exceeds the maximum boundary of the view frame, replace the out-of-bounds coordinate with the maximum boundary coordinate and add the excess to the opposite boundary.
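The correction rules of S233 to S235 can be sketched as follows (Python; View_max is represented here as a box (X0, Y0, X1, Y1), which is an assumption of this sketch):

```python
def correct_view(view, view_max, w_min, h_min):
    """S233-S235: clip View_O into View_max, then pad its width/height up to
    W_min/H_min, spilling any overflow to the opposite boundary."""
    X0, Y0, X1, Y1 = view_max
    x0, y0, x1, y1 = view
    # S233: replace out-of-bounds vertex coordinates with the boundary
    x0, y0 = max(x0, X0), max(y0, Y0)
    x1, y1 = min(x1, X1), min(y1, Y1)
    # S234: pad height symmetrically up to h_min
    if y1 - y0 < h_min:
        pad = (h_min - (y1 - y0)) / 2
        y0, y1 = y0 - pad, y1 + pad
        if y0 < Y0:
            y1 += Y0 - y0   # spill the excess to the lower boundary
            y0 = Y0
        if y1 > Y1:
            y0 -= y1 - Y1   # spill the excess to the upper boundary
            y1 = Y1
    # S235: pad width symmetrically up to w_min
    if x1 - x0 < w_min:
        pad = (w_min - (x1 - x0)) / 2
        x0, x1 = x0 - pad, x1 + pad
        if x0 < X0:
            x1 += X0 - x0
            x0 = X0
        if x1 > X1:
            x0 -= x1 - X1
            x1 = X1
    return (x0, y0, x1, y1)  # View_F
```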
Please continue with FIG. 3. Adjustment mode 2 adjusts the aspect ratio of the view frame. That is, step S230 further includes:

S236: adjust the width and/or the height of the view frame according to the aspect ratio of the current video resolution. The view frame obtained after the adjustment of S236 is denoted View; in a preferred embodiment, this is the view frame that is output and displayed to the user.

In specific embodiments of the present application, either adjustment mode 1 or adjustment mode 2 may be used alone, or both may be used: first adjust the size with mode 1, then adjust the aspect ratio with mode 2.
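Adjustment mode 2 (S236) can be sketched as a centered grow-to-ratio operation (Python; growing rather than shrinking the box, so that no target is cut off, is an assumption of this sketch):

```python
def fit_aspect(view, out_w, out_h):
    """S236: enlarge View_F so its aspect ratio matches the output
    resolution out_w x out_h, keeping the box centered."""
    x0, y0, x1, y1 = view
    w, h = x1 - x0, y1 - y0
    target = out_w / out_h
    if w / h < target:            # too narrow: grow the width
        grow = target * h - w
        x0, x1 = x0 - grow / 2, x1 + grow / 2
    else:                         # too flat: grow the height
        grow = w / target - h
        y0, y1 = y0 - grow / 2, y1 + grow / 2
    return (x0, y0, x1, y1)
```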
The above steps S231 to S236 can be summarized as a view-frame calculation function:

View_ti = CalcView(Rect_ti)

where Rect_ti is the set of target frames detected at time ti.
Please refer to FIG. 6. Optionally, in S240, identifying the target frames that decide the boundary of the view frame specifically includes:

S2411: compute a first view frame from all the target frames;

S2412: delete one target frame;

S2413: compute a second view frame from the remaining target frames;

S2414: when the first view frame and the second view frame are not equal, determine that the deleted target frame is one that decides the boundary of the view frame. The first and second view frames are "equal" when their boundary coordinates are identical or close, and "not equal" when at least one boundary coordinate differs.
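S2411 to S2414 amount to a leave-one-out test, which can be sketched as follows (Python; `calc_view` stands for the view-frame calculation of S231 to S236, and exact equality of boundary coordinates is used here as the "equal" test):

```python
def decision_targets(targets, calc_view):
    """Return the target frames whose removal changes the computed view
    frame, i.e. the frames that decide the view-frame boundary (S2414)."""
    first_view = calc_view(targets)                # S2411
    deciders = []
    for j in range(len(targets)):
        rest = targets[:j] + targets[j + 1:]       # S2412: drop one frame
        second_view = calc_view(rest)              # S2413
        if second_view != first_view:              # S2414: boundary changed
            deciders.append(targets[j])
    return deciders
```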
Specifically, using the view-frame calculation function, S2411 to S2414 determine whether a target frame decides the boundary of the view frame as follows.

Remove one frame rect_j (j ∈ {1, 2, ..., n_ti}) from the set of target frames Rect_ti detected at time ti, obtaining a new set Rect_ti^(-j) = Rect_ti \ {rect_j}. Taking this set as input, compute a view frame View_ti^(-j) = CalcView(Rect_ti^(-j)). If View_ti^(-j) = View_ti, the target frame rect_j does not affect the calculated view frame; conversely, if View_ti^(-j) ≠ View_ti, the target frame rect_j decides the boundary coordinates of the view frame. Collecting all target frames in Rect_ti that decide the view frame yields DecisionRect_ti, the set of target frames that decide the boundary of the view frame View at time ti.

Optionally, in S240, determining that every target frame that decides the boundary of the view frame is stationary specifically includes:
S242: if the motion factor of every target frame that decides the boundary of the view frame stays below a preset threshold throughout a preset time interval, determine that all such target frames are stationary.

In specific embodiments of the present application, the motion factor Factor_12 of a target frame is determined as follows.

After the detection unit receives a sensor frame from the sensor, it detects the frame in real time. It first detects a human body and draws the target frame containing that body, here called target frame 1. Taking the top-left corner of the sensor frame as the coordinate origin (0, 0), it computes the coordinates (x1, y1) of the center point C1 of target frame 1, its width W1, and its height H1, and saves the result.

Next, when the detection unit receives the following sensor frame from the sensor, it likewise detects it in real time, frames target frame 2 containing the human body in the same way, and saves the coordinates (x2, y2) of its center point C2, its width W2, and its height H2.

The motion factor is then calculated in steps (1) to (5) below:
(1) Compute the squared Euclidean distance between the center points: L_c = (x_2 - x_1)^2 + (y_2 - y_1)^2

(2) Compute the area of target frame 1: S_1 = W_1 * H_1

(3) Compute the area of target frame 2: S_2 = W_2 * H_2

(4) Since target frame 1 and target frame 2 may differ in size, compute the absolute value of the product of the width difference and the height difference: M = |(W_1 - W_2) * (H_1 - H_2)|

(5) Compute the motion factor of target frames 1 and 2: Factor_12 = (L_c + M) / (S_1 + S_2).
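The five steps above, in code (Python; the (cx, cy, w, h) box representation follows the center-point, width, and height values saved by the detection unit):

```python
def motion_factor(box1, box2):
    """Steps (1)-(5): motion factor between two detections of a target.
    Each box is (cx, cy, w, h): center point, width, height."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box1, box2
    lc = (x2 - x1) ** 2 + (y2 - y1) ** 2    # (1) squared center distance
    s1 = w1 * h1                            # (2) area of target frame 1
    s2 = w2 * h2                            # (3) area of target frame 2
    m = abs((w1 - w2) * (h1 - h2))          # (4) size-change term
    return (lc + m) / (s1 + s2)             # (5) Factor_12
```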
Note that in the specific embodiments of the present application it is only necessary to detect a human body; there is no need to identify from the image which specific person it is. Whether two detections are the same person can, however, be determined from the distance the body's target frame moves within a limited time.

The above computes the motion factor Factor_12 between two sensor frames (the current frame and the previous frame, or the current frame and the next frame). When the motion factor over a period T1 stays within a preset threshold range (for example, less than or equal to the threshold), the target frame is determined to be stationary; when it exceeds the threshold within T1, the target frame is determined to be in motion. The threshold of the motion factor may be taken as 0.5, an empirical value that varies with conditions. T1 ranges from 0 to 10 seconds; to keep focusing on a person who is currently moving, T1 only needs to be small enough.
Optionally, the image may also be cropped and/or scaled according to the view frame. Hence, referring to FIG. 7, S250 may specifically include:

S251: when every target frame that decides the boundary of the view frame is stationary, crop and/or scale the sensor frame according to the view frame View, and output the cropped and/or scaled view frame View_out. Optionally, the sensor frame is cropped and scaled by invoking an ISP (Image Signal Processor) chip.

As shown in FIG. 7, the sensor frame is cropped to the coordinates of the view frame View, the cropped view frame is scaled to the current video output resolution (e.g., 1080P or 720P), and the final output is the image View_out seen by the user. Handling the crop-and-scale process with the ISP chip saves about 50% of the CPU compared with a software implementation and greatly improves chip performance.
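The patent offloads cropping and scaling to an ISP chip; purely for illustration, the same operation can be sketched in software with nearest-neighbor scaling (Python; representing the frame as a 2D pixel list is an assumption of this sketch):

```python
def crop_and_scale(frame, view, out_w, out_h):
    """S251 in software: crop the sensor frame to the view box, then
    nearest-neighbor scale to the output resolution.
    `frame` is a 2D list of pixels indexed as frame[y][x]."""
    x0, y0, x1, y1 = view
    crop = [row[x0:x1] for row in frame[y0:y1]]     # crop to View coordinates
    ch, cw = len(crop), len(crop[0])
    return [[crop[y * ch // out_h][x * cw // out_w] # scale to out_w x out_h
             for x in range(out_w)] for y in range(out_h)]
```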
可选的,由于当前的视野框,与经过S230和S240步骤计算之后的视频的视野框,在坐标上会存在一定的差异,因此,还可对输出的视频图像进行平滑处理,故,S250具体还可包括:Optionally, since the current visual field frame and the visual field frame of the video calculated in the steps of S230 and S240, there will be a certain difference in the coordinates, therefore, the output video image can also be smoothed. Therefore, S250 specifically Also includes:
S252: when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, calculating the coordinate difference between the target field of view frame and the current field of view frame;
S253: according to a preset maximum movement step size of the field of view frame per image frame, calculating the number of steps needed to move from the current field of view frame to the target field of view frame;
S254: updating the field of view frame frame by frame according to the number of steps until the target field of view frame is reached.
Referring to FIG. 8, the above smoothing process is illustrated with an example. Assume the current field of view frame is View_cur and that the calculations in S231 to S236 yield the target field of view frame View_dst. The distance to be moved between the current and target field of view frames is View_dist = View_dst - View_cur.
To present the user with a smooth image, the field of view frame is moved by a fixed step in each image frame so as not to move too fast. Assuming the maximum per-frame step of the field of view frame coordinates is step_max, and the coordinate difference between the current and target field of view frames is View_dist = (x_0, y_0, x_1, y_1), the number of moving steps is:
MoveNum = max{x_0, y_0, x_1, y_1} / step_max.
View_cur is updated frame by frame according to the following steps until it reaches the target field of view frame View_dst:
while View_cur ≠ View_dst:
    View_step = View_dist / MoveNum
    View_cur ← View_cur + View_step
That is, as long as the coordinates of View_cur and View_dst do not coincide, View_cur is moved by View_step at each update, until the current field of view frame View_cur reaches the target field of view frame View_dst.
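The update loop above can be sketched in runnable form as follows. Two details are assumptions of the sketch rather than the patent's text: MoveNum is computed with ceiling division so the loop terminates after a whole number of steps, and coordinates are rounded to integers at each frame so the final box lands exactly on the target.

```python
def smooth_moves(view_cur, view_dst, step_max):
    """Return the per-frame intermediate view boxes (x0, y0, x1, y1) that move
    view_cur toward view_dst by at most about step_max per coordinate per frame."""
    dist = [d - c for c, d in zip(view_cur, view_dst)]       # View_dist = View_dst - View_cur
    # MoveNum = max coordinate distance / step_max, rounded up (at least 1)
    move_num = max(1, -(-max(abs(d) for d in dist) // step_max))
    step = [d / move_num for d in dist]                      # View_step = View_dist / MoveNum
    cur = list(map(float, view_cur))
    frames = []
    for _ in range(move_num):                                # View_cur <- View_cur + View_step
        cur = [c + s for c, s in zip(cur, step)]
        frames.append(tuple(round(c) for c in cur))
    return frames
```

Each returned tuple is the view box to render for one output frame, so the visible window glides to its new position instead of jumping.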
In practical applications, the cropping and/or scaling in S251 and the smoothing of the video image may be used together, for example by first performing the cropping and/or scaling and then smoothing the video image.
In Embodiment 1 of the present application, a picture of the entire conference site is acquired by the sensor, and the human bodies in the sensor frame are detected to determine the image range to be displayed to the user. By comparing the position change of the same target frame across sensor frames, it is determined whether that target frame is stationary. When it is determined that everyone in the venue who affects the output target frames is stationary, the field of view frame containing all the human bodies is output and displayed. Since every sensor frame is monitored in real time, even after the participants have been seated, if their positions change for some reason (for example, the participants were originally seated compactly and later spread out, or all the participants move from the middle of the venue to one side, i.e., the space they occupy in the venue changes), the video processing method described in Embodiment 1 can capture this change in real time; after the participants are seated again, a new field of view frame is recalculated, output, and displayed to the user. Because this method does not need to control camera rotation or refocusing, and merely recalculates a new field of view frame from the sensor frames captured by the sensor and outputs it for display, it achieves automatic, real-time tracking of the participants in the venue. Moreover, an apparatus using this method can therefore be a plug-and-play device.
Embodiment 2
Referring to FIG. 9, a video processing apparatus 300 provided in Embodiment 2 of the present application includes:
a video acquisition unit 310, configured to acquire a sensor frame captured by a video sensor, the sensor frame being the image frame of the entire frame captured by the video sensor; optionally, the video acquisition unit 310 acquires sensor frames captured by a high-definition wide-angle camera;
a humanoid capture unit 320, configured to detect a target frame in the sensor frame, the target frame being a human body image frame and/or an image frame containing a human body in the sensor frame;
a video detection unit 330, configured to determine a field of view frame according to the target frames, determine all the target frames that can determine the boundary of the field of view frame, and determine whether all the target frames that can determine the boundary of the field of view frame are stationary, the field of view frame being an image frame including all the target frames; and
an image processing unit 340, configured to output the field of view frame when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary.
Optionally, the video detection unit 330 is specifically configured to: when it is determined that the target frames are stationary, expand each target frame upward and downward by a certain proportion of its height; and determine the smallest frame that can contain all the expanded target frames as the field of view frame. For the specific manner of expanding the target frames, see S231 in Embodiment 1; details are not repeated here.
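This expand-then-bound construction can be sketched as follows; the expansion ratio of 0.1 and the function names are assumptions for illustration, since the actual proportion is given in S231 of Embodiment 1.

```python
def expand_box(box, ratio=0.1):
    """Expand a target box (x0, y0, x1, y1) up and down by `ratio` of its height
    (the ratio here is assumed; the patent's proportion is defined in S231)."""
    x0, y0, x1, y1 = box
    pad = (y1 - y0) * ratio
    return (x0, y0 - pad, x1, y1 + pad)

def view_frame(boxes, ratio=0.1):
    """Smallest frame containing all expanded target boxes."""
    expanded = [expand_box(b, ratio) for b in boxes]
    return (
        min(b[0] for b in expanded),   # leftmost edge
        min(b[1] for b in expanded),   # topmost edge (after upward expansion)
        max(b[2] for b in expanded),   # rightmost edge
        max(b[3] for b in expanded),   # bottommost edge (after downward expansion)
    )
```

Boundary clamping and minimum-size adjustment, described next, would then be applied to the box this returns.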
Optionally, the video detection unit 330 is further configured to: if the four vertex coordinates of the field of view frame exceed the maximum boundary coordinates of the field of view frame, replace the four vertex coordinates of the field of view frame with the maximum boundary coordinates; and/or, if the height of the field of view frame is less than the minimum height of the field of view frame, adjust the height of the field of view frame to the minimum height; and/or, if the width of the field of view frame is less than the minimum width of the field of view frame, adjust the width of the field of view frame to the minimum width.
Optionally, the video detection unit 330 is specifically configured to:
if the height of the field of view frame is less than the minimum height of the field of view frame, add half of the difference between the minimum width of the field of view frame and the width of the field of view frame to each of the left and right boundaries of the field of view frame; if the supplemented left or right boundary exceeds the maximum boundary of the field of view frame, replace the coordinate that exceeds the maximum boundary with the coordinate of the maximum boundary and add the amount exceeding the maximum boundary to the opposite boundary; and/or,
if the width of the field of view frame is less than the minimum width of the field of view frame, add half of the difference between the minimum height of the field of view frame and the height of the field of view frame to each of the upper and lower boundaries of the field of view frame; if the supplemented upper or lower boundary exceeds the maximum boundary of the field of view frame, replace the coordinate that exceeds the maximum boundary with the maximum boundary coordinate and add the amount exceeding the maximum boundary to the opposite boundary.
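A sketch of this minimum-size supplement for the horizontal direction is given below (the vertical case is symmetric). It assumes coordinates normalized so the sensor's left edge is 0 and its right edge is x_max; the function name and this coordinate convention are illustrative, not from the patent.

```python
def pad_to_min_width(view, min_w, x_max):
    """Widen the view box (x0, y0, x1, y1) to at least min_w by adding half the
    deficit to each side; overflow past [0, x_max] is shifted to the opposite side."""
    x0, y0, x1, y1 = view
    deficit = min_w - (x1 - x0)
    if deficit <= 0:
        return view
    x0 -= deficit / 2
    x1 += deficit / 2
    if x0 < 0:                       # left edge past the sensor border:
        x1 = min(x_max, x1 - x0)     # push the excess onto the right boundary
        x0 = 0
    if x1 > x_max:                   # right edge past the sensor border:
        x0 = max(0, x0 - (x1 - x_max))
        x1 = x_max
    return (x0, y0, x1, y1)
```

When both sides overflow, the box simply spans the whole sensor width, matching the replace-with-maximum-boundary rule.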
Optionally, the video detection unit 330 is further configured to adjust the width and/or the height of the field of view frame according to the aspect ratio of the current video resolution.
For specific examples of adjusting the field of view frame in Embodiment 2, see the detailed descriptions of S231 to S236 in Embodiment 1; they are not repeated here.
Optionally, the video detection unit 330 being configured to determine the target frames that can determine the boundary of the field of view frame includes:
the video detection unit 330 being specifically configured to calculate a first field of view frame from all the target frames; delete one of the target frames; calculate a second field of view frame from the remaining target frames; and, when the first field of view frame and the second field of view frame are not equal, determine the deleted target frame to be a target frame that can determine the boundary of the field of view frame. For how the video detection unit 330 determines by calculation whether a given target frame can determine the boundary of the field of view frame, see the description in Embodiment 1; details are not repeated here.
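This leave-one-out test can be sketched as follows; for brevity the field of view frame is taken here as the plain minimum bounding box of the target boxes, omitting the height expansion and boundary adjustments described above, and the function names are illustrative.

```python
def bounding_view(boxes):
    """Smallest frame containing all target boxes (x0, y0, x1, y1)."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )

def boundary_boxes(boxes):
    """Leave-one-out test: a target box determines the view boundary if
    removing it changes the computed field of view frame."""
    first_view = bounding_view(boxes)          # first field of view frame
    return [
        b for i, b in enumerate(boxes)
        if len(boxes) > 1
        and bounding_view(boxes[:i] + boxes[i + 1:]) != first_view
    ]
```

Only the boxes this returns need to be checked for stillness before the field of view frame is output; motion of the interior boxes cannot change the view.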
Optionally, the video detection unit 330 being configured to determine that all the target frames that can determine the boundary of the field of view frame are stationary includes:
the video detection unit 330 being specifically configured to determine that all the target frames that can determine the boundary of the field of view frame are stationary when the motion factor of each such target frame within a preset time interval is less than a preset threshold. For how the video detection unit 330 determines by calculation whether a given target frame is stationary, see the specific description in Embodiment 1; details are not repeated here.
Optionally, the image processing unit 340 is specifically configured to: when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, crop and/or scale the sensor frame according to the field of view frame, and output the field of view frame.
Optionally, the image processing unit 340 is specifically configured to: when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, crop and/or scale the sensor frame according to the field of view frame; calculate the coordinate difference between the target field of view frame and the current field of view frame; calculate, according to the preset maximum movement step size of the field of view frame per image frame, the number of steps needed to move from the current field of view frame to the target field of view frame; and update the field of view frame frame by frame according to the number of steps until the target field of view frame is reached. For how the image processing unit 340 gradually updates the current field of view frame until it reaches the target field of view frame, see the examples in S252 to S254 of Embodiment 1; details are not repeated here.
The video processing apparatus 300 may be a camera apparatus with a built-in video processing function, such as the combination of the camera apparatus 110 and the main processing apparatus 120 in FIG. 1; it may also be a display apparatus with a built-in video processing function (such as a computer or a smart electronic device), such as the combination of the main processing apparatus 120 and the display apparatus 130 in FIG. 1; or it may be an electronic apparatus with independent hardware. This is not limited in the present application.
For parts of Embodiment 2 not described in detail, see the same or corresponding parts of Embodiment 1 above; they are not repeated here.
Embodiment 3
Referring to FIG. 10, a schematic structural diagram of a video processing apparatus 400 according to Embodiment 3 of the present application is shown. The video processing apparatus 400 includes a processor 410, a memory 420, and a communication interface 430, which are communicatively connected to one another through a bus system.
The processor 410 may be an independent component or a collective term for multiple processing elements. For example, it may be a CPU, an ASIC, or one or more integrated circuits configured to implement the above method, such as at least one microprocessor (DSP) or at least one programmable gate array (FPGA). The memory 420 is a computer-readable storage medium storing a program executable on the processor 410.
The processor 410 invokes the program in the memory 420 to execute the video processing method provided in Embodiment 1, and transmits the result obtained by the processor 410 to other apparatuses through the communication interface 430 in a wireless or wired manner.
Optionally, the video processing apparatus 400 may further include a camera 440. The camera 440 acquires sensor frames and sends them to the processor 410; the processor 410 invokes the program in the memory 420, executes the video processing method provided in Embodiment 1 to process the sensor frames, and transmits the result to other apparatuses through the communication interface 430 in a wireless or wired manner.
For details not described in Embodiment 3, see the same or corresponding parts of Embodiment 1 above; they are not repeated here.
Those skilled in the art should realize that, in one or more of the above examples, the functions described in the specific embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the functions may be implemented by a processor executing software instructions. The software instructions may consist of corresponding software modules. The software modules may be stored in a computer-readable storage medium, which may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., Digital Video Disc (DVD)), or semiconductor media (e.g., Solid State Disk (SSD)). The computer-readable storage medium includes, but is not limited to, random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, removable hard disks, compact discs (CD-ROM), or any other form of storage medium known in the art. An exemplary computer-readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the computer-readable storage medium. Of course, the computer-readable storage medium may also be an integral part of the processor. The processor and the computer-readable storage medium may reside in an ASIC, and the ASIC may reside in an access network device, a target network device, or a core network device. The processor and the computer-readable storage medium may also exist as discrete components in the access network device, the target network device, or the core network device. When implemented in software, the functions may also be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer or a chip (which may include a processor), all or part of the processes or functions described in the specific embodiments of the present application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer program instructions may be stored in the above computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave).
The above embodiments illustrate but do not limit the present invention, and those skilled in the art can design multiple alternative examples within the scope of the claims. Those skilled in the art should appreciate that the present application is not limited to the precise structures described above and shown in the accompanying drawings, and that appropriate adjustments, modifications, equivalent substitutions, improvements, and the like may be made to the specific implementations without departing from the scope of the invention as defined by the appended claims. Therefore, any modifications and changes made in accordance with the concepts and principles of the present invention fall within the scope of the present invention as defined by the appended claims.

Claims (18)

  1. A video processing method, characterized in that the method comprises:
    acquiring a sensor frame captured by a video sensor, the sensor frame being an image frame of the entire frame captured by the video sensor;
    detecting target frames in the sensor frame, each target frame being a human body image frame and/or an image frame containing a human body in the sensor frame;
    determining a field of view frame according to the target frames, the field of view frame being an image frame including all the target frames;
    determining all the target frames that can determine the boundary of the field of view frame, and determining whether all the target frames that can determine the boundary of the field of view frame are stationary; and
    when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, outputting the field of view frame.
  2. The method according to claim 1, wherein the determining a field of view frame according to the target frames comprises:
    expanding each of the target frames upward and downward by a certain proportion of its height; and
    determining the smallest frame that can contain all the expanded target frames as the field of view frame.
  3. The method according to claim 2, wherein the determining a field of view frame according to the target frames further comprises:
    if the four vertex coordinates of the field of view frame exceed the maximum boundary coordinates of the field of view frame, replacing the four vertex coordinates of the field of view frame with the maximum boundary coordinates; and/or,
    if the height of the field of view frame is less than the minimum height of the field of view frame, adjusting the height of the field of view frame to the minimum height of the field of view frame; and/or,
    if the width of the field of view frame is less than the minimum width of the field of view frame, adjusting the width of the field of view frame to the minimum width of the field of view frame.
  4. The method according to claim 3, wherein:
    the adjusting the width of the field of view frame to the minimum width of the field of view frame comprises: adding half of the difference between the minimum width of the field of view frame and the width of the field of view frame to each of the left and right boundaries of the field of view frame; and, if the supplemented left or right boundary of the field of view frame exceeds the maximum boundary of the field of view frame, replacing the coordinate that exceeds the maximum boundary with the coordinate of the maximum boundary and adding the amount exceeding the maximum boundary to the opposite boundary; and/or,
    the adjusting the height of the field of view frame to the minimum height of the field of view frame comprises: adding half of the difference between the minimum height of the field of view frame and the height of the field of view frame to each of the upper and lower boundaries of the field of view frame; and, if the supplemented upper or lower boundary of the field of view frame exceeds the maximum boundary of the field of view frame, replacing the coordinate that exceeds the maximum boundary with the maximum boundary coordinate and adding the amount exceeding the maximum boundary to the opposite boundary.
  5. The method according to any one of claims 2 to 4, wherein the determining a field of view frame according to the target frames further comprises:
    adjusting the width and/or the height of the field of view frame according to the aspect ratio of the current video resolution.
  6. The method according to claim 1, wherein the determining the target frames that can determine the boundary of the field of view frame comprises:
    calculating a first field of view frame from all the target frames;
    deleting one of the target frames;
    calculating a second field of view frame from the remaining target frames; and
    when the first field of view frame and the second field of view frame are not equal, determining the deleted target frame to be a target frame that can determine the boundary of the field of view frame.
  7. The method according to any one of claims 1 to 4 and 6, wherein the determining that all the target frames that can determine the boundary of the field of view frame are stationary comprises:
    if the motion factor of each target frame that can determine the boundary of the field of view frame within a preset time interval is less than a preset threshold, determining that all the target frames that can determine the boundary of the field of view frame are stationary.
  8. The method according to any one of claims 1 to 4 and 6, wherein the outputting the field of view frame when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary comprises:
    when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, cropping and/or scaling the sensor frame according to the field of view frame, and outputting the cropped and/or scaled field of view frame.
  9. The method according to any one of claims 1 to 4 and 6, wherein the outputting the field of view frame when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary comprises:
    when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary, calculating the coordinate difference between the target field of view frame and the current field of view frame;
    calculating, according to a preset maximum movement step size of the field of view frame per image frame, the number of steps needed to move from the current field of view frame to the target field of view frame; and
    updating the field of view frame frame by frame according to the number of steps until the target field of view frame is reached.
  10. A video processing apparatus, wherein the apparatus comprises:
    a video acquisition unit, configured to acquire a sensor frame captured by a video sensor, the sensor frame being an image frame of the entire frame captured by the video sensor;
    a humanoid capture unit, configured to detect target frames in the sensor frame, each target frame being a human body image frame and/or an image frame containing a human body in the sensor frame;
    a video detection unit, configured to determine a field of view frame according to the target frames, determine all the target frames that can determine the boundary of the field of view frame, and determine whether all the target frames that can determine the boundary of the field of view frame are stationary, the field of view frame being an image frame including all the target frames; and
    an image processing unit, configured to output the field of view frame when it is determined that all the target frames that can determine the boundary of the field of view frame are stationary.
  11. The apparatus according to claim 10, wherein the video detection unit is specifically configured to: when it is determined that the target frames are stationary, expand each target frame upward and downward by a certain proportion of its height; and determine the smallest frame that can contain all the expanded target frames as the field of view frame.
  12. The apparatus according to claim 11, wherein the video detection unit is further configured to: if the four vertex coordinates of the field of view frame exceed the maximum boundary coordinates of the field of view frame, replace the four vertex coordinates of the field of view frame with the maximum boundary coordinates; and/or, if the height of the field of view frame is less than the minimum height of the field of view frame, adjust the height of the field of view frame to the minimum height; and/or, if the width of the field of view frame is less than the minimum width of the field of view frame, adjust the width of the field of view frame to the minimum width.
  13. The apparatus of claim 12, wherein the video detection unit is specifically configured to:
    if the height value of the field-of-view frame is less than the minimum height value of the field-of-view frame, add one half of the difference between the minimum width value of the field-of-view frame and the width value of the field-of-view frame to each of the left and right boundaries of the field-of-view frame; and if, after this supplementation, the left or right boundary of the field-of-view frame exceeds the maximum boundary of the field-of-view frame, replace the coordinate exceeding the maximum boundary with the coordinate of the maximum boundary, and add the amount by which the maximum boundary was exceeded to the opposite boundary; and/or,
    if the width value of the field-of-view frame is less than the minimum width value of the field-of-view frame, add one half of the difference between the minimum height value of the field-of-view frame and the height value of the field-of-view frame to each of the upper and lower boundaries of the field-of-view frame; and if, after this supplementation, the upper or lower boundary of the field-of-view frame exceeds the maximum boundary of the field-of-view frame, replace the coordinate exceeding the maximum boundary with the maximum boundary coordinate, and add the amount by which the maximum boundary was exceeded to the opposite boundary.
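The pad-and-redistribute mechanics of claim 13 can be sketched for the horizontal direction only; the vertical case is symmetric. All names are assumptions, and only the width-padding step itself is modeled, not the triggering condition.

```python
def pad_to_min_width(frame, min_w, max_bound):
    """Grow an undersized frame to min_w by adding half the deficit to
    each of the left and right boundaries; any amount that would cross
    the maximum boundary is clipped there and pushed to the other side."""
    x1, y1, x2, y2 = frame
    bx1, _, bx2, _ = max_bound
    deficit = min_w - (x2 - x1)
    if deficit <= 0:
        return frame            # already wide enough
    half = deficit / 2
    x1 -= half
    x2 += half
    if x1 < bx1:                # overflowed the left boundary
        x2 += bx1 - x1          # push the excess to the right side
        x1 = bx1
    if x2 > bx2:                # overflowed the right boundary
        x1 -= x2 - bx2          # push the excess to the left side
        x2 = bx2
    return (max(x1, bx1), y1, min(x2, bx2), y2)
```

The two overflow branches implement "replace the coordinate with the maximum boundary and add the excess to the opposite boundary".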
  14. The apparatus of any one of claims 11 to 13, wherein the video detection unit is further configured to adjust the width value and/or the height value of the field-of-view frame according to the aspect ratio of the current video resolution.
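One plausible reading of claim 14, grown about the frame centre and never shrinking either dimension, is sketched below; the centring choice and names are assumptions, as the claim only requires matching the output aspect ratio.

```python
def match_aspect_ratio(frame, out_w, out_h):
    """Widen or heighten the frame about its centre so that its aspect
    ratio equals out_w / out_h (the current video resolution)."""
    x1, y1, x2, y2 = frame
    w, h = x2 - x1, y2 - y1
    target = out_w / out_h
    if w / h < target:          # too narrow: grow the width
        new_w = h * target
        cx = (x1 + x2) / 2
        x1, x2 = cx - new_w / 2, cx + new_w / 2
    else:                       # too wide (or exact): grow the height
        new_h = w / target
        cy = (y1 + y2) / 2
        y1, y2 = cy - new_h / 2, cy + new_h / 2
    return (x1, y1, x2, y2)
```

In practice the result would then be re-clamped against the sensor boundary as in claim 12.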
  15. The apparatus of claim 10, wherein the video detection unit being configured to determine the target frames that can determine the boundary of the field-of-view frame comprises:
    the video detection unit being specifically configured to: calculate a first field-of-view frame from all the target frames; delete one of the target frames; calculate a second field-of-view frame from the remaining target frames; and when the first field-of-view frame and the second field-of-view frame are not equal, determine that the deleted target frame is a target frame that can determine the boundary of the field-of-view frame.
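The leave-one-out test of claim 15 can be sketched as follows; the helper names are assumptions, and the field-of-view computation is simplified to a plain bounding box.

```python
def bounding_frame(frames):
    """Smallest (x1, y1, x2, y2) frame containing every input frame."""
    return (min(f[0] for f in frames), min(f[1] for f in frames),
            max(f[2] for f in frames), max(f[3] for f in frames))

def boundary_determining_frames(target_frames):
    """A target frame determines the view-frame boundary iff deleting it
    changes the bounding frame computed from the remaining targets."""
    first = bounding_frame(target_frames)          # first field-of-view frame
    result = []
    for i, f in enumerate(target_frames):
        rest = target_frames[:i] + target_frames[i + 1:]
        # Second field-of-view frame, computed without target i.
        if not rest or bounding_frame(rest) != first:
            result.append(f)
    return result
```

Targets strictly inside the bounding box drop out, so the later stillness check only has to watch the frames that actually pin the boundary.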
  16. The apparatus of any one of claims 10 to 13 and 15, wherein the video detection unit being configured to determine that all the target frames that can determine the boundary of the field-of-view frame are stationary comprises:
    the video detection unit being specifically configured to determine that all the target frames that can determine the boundary of the field-of-view frame are stationary when the motion factor of each such target frame within a preset time interval is less than a preset threshold.
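Claim 16's stillness test reduces to a threshold check. The excerpt does not define how the motion factor is computed (it might be, for example, the displacement of a box centre between detections), so the sketch assumes the factors are already available per boundary-determining target.

```python
def all_boundary_frames_still(motion_history, threshold):
    """True when every motion factor sampled within the preset time
    interval, for every boundary-determining target frame, is below
    the threshold.

    motion_history: {frame_id: [motion factors observed in the interval]}
    """
    return all(m < threshold
               for factors in motion_history.values()
               for m in factors)
```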
  17. The apparatus of any one of claims 10 to 13 and 15, wherein the image processing unit is specifically configured to: when it is determined that all the target frames that can determine the boundary of the field-of-view frame are stationary, crop and/or scale the sensor frame according to the field-of-view frame, and output the field-of-view frame.
  18. The apparatus of claim 17, wherein the image processing unit is specifically configured to: when it is determined that all the target frames that can determine the boundary of the field-of-view frame are stationary, crop and/or scale the sensor frame according to the field-of-view frame; calculate difference coordinates between a target field-of-view frame and a current field-of-view frame; calculate, according to a preset maximum movement step of the field-of-view frame per image frame, a number of movement steps for moving from the current field-of-view frame to the target field-of-view frame; and update the field-of-view frame frame by frame according to the number of movement steps until the target field-of-view frame is reached.
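The smooth-pan logic of claim 18 can be sketched as below: compute the per-coordinate difference, derive a step count from the preset maximum per-frame step, then interpolate frame by frame. The linear interpolation and all names are assumptions of this sketch.

```python
import math

def plan_view_frame_steps(current, target, max_step):
    """Return the per-frame sequence of view frames that moves from
    current to target with no coordinate changing by more than
    max_step per output frame."""
    deltas = [t - c for c, t in zip(current, target)]  # difference coordinates
    # Number of movement steps needed for the largest coordinate change.
    steps = max(1, math.ceil(max(abs(d) for d in deltas) / max_step))
    frames = []
    for n in range(1, steps + 1):
        frames.append(tuple(c + d * n / steps
                            for c, d in zip(current, deltas)))
    return frames
```

The last element of the returned sequence is always the target frame, so the loop that applies one entry per video frame ends exactly on it.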
PCT/CN2021/120411 2021-01-29 2021-09-24 Video processing method and apparatus WO2022160748A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110129029.XA CN112907617B (en) 2021-01-29 2021-01-29 Video processing method and device
CN202110129029.X 2021-01-29

Publications (1)

Publication Number Publication Date
WO2022160748A1 (en)

Family ID: 76121324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/120411 WO2022160748A1 (en) 2021-01-29 2021-09-24 Video processing method and apparatus

Country Status (2)

CN: CN112907617B (en)
WO: WO2022160748A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907617B (en) * 2021-01-29 2024-02-20 深圳壹秘科技有限公司 Video processing method and device
CN115633255B (en) * 2021-08-31 2024-03-22 荣耀终端有限公司 Video processing method and electronic equipment
CN114222065B (en) * 2021-12-20 2024-03-08 北京奕斯伟计算技术股份有限公司 Image processing method, image processing apparatus, electronic device, storage medium, and program product

Citations (7)

Publication number Priority date Publication date Assignee Title
US20140002742A1 (en) * 2012-06-29 2014-01-02 Thomson Licensing Method for reframing images of a video sequence, and apparatus for reframing images of a video sequence
CN104125390A (en) * 2013-04-28 2014-10-29 浙江大华技术股份有限公司 Method and device for locating spherical camera
US20180063482A1 (en) * 2016-08-25 2018-03-01 Dolby Laboratories Licensing Corporation Automatic Video Framing of Conference Participants
CN111756996A (en) * 2020-06-18 2020-10-09 影石创新科技股份有限公司 Video processing method, video processing apparatus, electronic device, and computer-readable storage medium
WO2020220289A1 (en) * 2019-04-30 2020-11-05 深圳市大疆创新科技有限公司 Method, apparatus and system for adjusting field of view of observation, and storage medium and mobile apparatus
CN112073613A (en) * 2020-09-10 2020-12-11 广州视源电子科技股份有限公司 Conference portrait shooting method, interactive tablet, computer equipment and storage medium
CN112907617A (en) * 2021-01-29 2021-06-04 深圳壹秘科技有限公司 Video processing method and device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
GB2564668B (en) * 2017-07-18 2022-04-13 Vision Semantics Ltd Target re-identification
CN109766919B (en) * 2018-12-18 2020-11-10 通号通信信息集团有限公司 Gradual change type classification loss calculation method and system in cascade target detection system
WO2020133170A1 (en) * 2018-12-28 2020-07-02 深圳市大疆创新科技有限公司 Image processing method and apparatus
CN111401383B (en) * 2020-03-06 2023-02-10 中国科学院重庆绿色智能技术研究院 Target frame estimation method, system, device and medium based on image detection


Also Published As

Publication number Publication date
CN112907617A (en) 2021-06-04
CN112907617B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2022160748A1 (en) Video processing method and apparatus
WO2021208371A1 (en) Multi-camera zoom control method and apparatus, and electronic system and storage medium
JP5592006B2 (en) 3D image processing
US11012614B2 (en) Image processing device, image processing method, and program
US8988529B2 (en) Target tracking apparatus, image tracking apparatus, methods of controlling operation of same, and digital camera
WO2020259271A1 (en) Image distortion correction method and apparatus
TWI808987B (en) Apparatus and method of five dimensional (5d) video stabilization with camera and gyroscope fusion
WO2019114617A1 (en) Method, device, and system for fast capturing of still frame
US11825183B2 (en) Photographing method and photographing apparatus for adjusting a field of view of a terminal
WO2020007320A1 (en) Method for fusing multi-visual angle images, apparatus, computer device, and storage medium
US20150103184A1 (en) Method and system for visual tracking of a subject for automatic metering using a mobile device
WO2017045326A1 (en) Photographing processing method for unmanned aerial vehicle
WO2019237745A1 (en) Facial image processing method and apparatus, electronic device and computer readable storage medium
US20200099854A1 (en) Image capturing apparatus and image recording method
JP2013172446A (en) Information processor, terminal, imaging apparatus, information processing method, and information provision method in imaging apparatus
WO2021139764A1 (en) Method and device for image processing, electronic device, and storage medium
WO2021147650A1 (en) Photographing method and apparatus, storage medium, and electronic device
WO2021136035A1 (en) Photographing method and apparatus, storage medium, and electronic device
JP7424076B2 (en) Image processing device, image processing system, imaging device, image processing method and program
WO2023165535A1 (en) Image processing method and apparatus, and device
WO2022042669A1 (en) Image processing method, apparatus, device, and storage medium
WO2021147648A1 (en) Suggestion method and device, storage medium, and electronic apparatus
CN110570441B (en) Ultra-high definition low-delay video control method and system
WO2023072030A1 (en) Automatic focusing method and apparatus for lens, and electronic device and computer-readable storage medium
US20230368343A1 (en) Global motion detection-based image parameter control

Legal Events

121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number 21922340, country of ref document EP, kind code A1)

NENP: non-entry into the national phase (ref country code DE)

122 EP: PCT application non-entry in the European phase (ref document number 21922340, country of ref document EP, kind code A1)