WO2021121374A1 - Video processing method and electronic device - Google Patents

Video processing method and electronic device

Info

Publication number
WO2021121374A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video frame
cropping
target object
preset
Prior art date
Application number
PCT/CN2020/137550
Other languages
English (en)
French (fr)
Inventor
陈艳花
刘宏马
张超
张雅琪
李宏禹
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP20901983.5A (published as EP4060979A4)
Publication of WO2021121374A1
Priority to US17/843,242 (published as US20220321788A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/62Control of parameters via user interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/695Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/633Control of cameras or camera modules by using electronic viewfinders for displaying additional information relating to control or operation of the camera
    • H04N23/635Region indicators; Field of view indicators
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/667Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • This application relates to the field of computer vision, and in particular to a video processing method for an electronic device and to an electronic device.
  • With the popularization of electronic devices, especially smart mobile electronic devices such as mobile phones, more and more applications are integrated on mobile phones, involving all aspects of people's daily lives.
  • Electronic devices include a camera, through which images can be collected and videos can be shot.
  • The present application provides a video capture method and an electronic device to solve the prior-art technical problem of the high cost of realizing follow shooting with an electronic device.
  • An embodiment of the present application provides a video processing method, applied to an electronic device, which includes: acquiring a first video frame and determining that the first video frame contains at least one target object; determining a cropping frame centered on the determined target object; and using the content in the cropping frame as a second video frame.
  • The electronic device may display the second video frame without displaying the first video frame, or, after displaying the first video frame, display the second video frame in response to the user activating a preset function. In this way, no hardware improvement is needed.
  • The target object can be kept in focus. In a scene where the target object is moving, the target object can be automatically tracked so that it is always at the center of vision in the video frame. The automatic tracking function can thus be realized at low cost, which reduces the hardware requirements and hence the difficulty of implementing the function.
  • An embodiment of the present application also provides an electronic device that executes the methods in the method embodiments of the present invention.
  • The electronic device includes one or more processors, one or more memories, multiple application programs, and one or more computer programs, where the one or more computer programs are stored in the one or more memories.
  • The computer programs include instructions.
  • When the instructions are executed, the electronic device performs the method in the method embodiments, for example: acquiring a first video frame and determining that the first video frame contains at least one target object; determining a cropping frame centered on the at least one target object; and obtaining the content of the cropping frame and displaying it as a second video frame.
  • The electronic device provided by the present invention can provide an automatic target tracking function for video, achieving the effect of the picture following the person in scenarios such as video calls, video shooting, and live streaming.
  • Its realization requires no hardware facilities such as a pan-tilt-zoom (PTZ) gimbal, no manual operation by a photographer, and no installation of a special image processing application; that is, image processing is automatic. Target objects are identified intelligently and images are processed automatically without affecting the smoothness of the video, improving the picture quality of video communication and increasing the intelligence of human-computer interaction.
  • FIG. 1A is a structural diagram of an electronic device according to an embodiment of the present invention.
  • Fig. 1B is a software framework diagram of an embodiment of the present invention.
  • Figure 2 is a flowchart of a video capture method according to the first aspect of an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an interface of a system call application according to an embodiment of the present invention.
  • Figure 4 is a flowchart of a video control method according to an embodiment of the present invention.
  • FIG. 5A is a schematic diagram of a first video frame according to an embodiment of the present invention.
  • FIG. 5B is a schematic diagram of a second video frame according to an embodiment of the present invention.
  • FIG. 5C is a schematic diagram of the coordinate frame of a single person according to an embodiment of the present invention.
  • FIG. 5D is a schematic diagram of the coordinate frame of two persons according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of the first way of determining a cropping frame in an embodiment of the present invention.
  • FIG. 7 shows the correspondence between the cropping width and ΔW/Width in an embodiment of the present invention.
  • FIGS. 8A-8C are schematic diagrams of the first way of determining a cropping frame in an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of the second way of determining a cropping frame in an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of human joint points detected in an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of the correspondence between joint points and cropping positions in an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a third video frame output after adjusting the target object in an embodiment of the present invention.
  • FIG. 13A is a schematic diagram of an original video frame containing two characters in an embodiment of the present invention.
  • FIG. 13B is a schematic diagram of a case in which one of the two persons in FIG. 13A is no longer in the original video frame in an embodiment of the present invention.
  • FIG. 13C is a schematic diagram of the target object that left in FIG. 13B having returned to the video frame in an embodiment of the present invention.
  • FIG. 13D is a schematic diagram of a video frame output based on the original video frame shown in FIG. 13C in an embodiment of the present invention.
  • FIG. 13E is a schematic diagram of a video frame output within a preset time period based on the original video frame shown in FIG. 13B in an embodiment of the present invention.
  • FIG. 13F is a schematic diagram of a video frame output after a preset time period based on the original video frame shown in FIG. 13B in an embodiment of the present invention.
  • FIG. 15A is a schematic diagram of another original video frame in an embodiment of the present invention.
  • FIG. 15B is a schematic diagram of a video frame output based on the original video frame of FIG. 15A in an embodiment of the present invention.
  • FIG. 15C is a schematic diagram after the picture shown in FIG. 15B is switched to the left in an embodiment of the present invention.
  • FIG. 15D is a schematic diagram after the picture shown in FIG. 15B is switched to the right in an embodiment of the present invention.
  • FIG. 16 is a schematic diagram of zooming in on a video frame in an embodiment of the present invention.
  • FIG. 17 is a flowchart of a video capture method provided by the third aspect of the embodiment of the present invention.
  • FIG. 18A is a schematic diagram of a collected video frame containing multiple people in an embodiment of the present invention.
  • FIG. 18B is a schematic diagram of adding a spotlight effect to a character in FIG. 18A in an embodiment of the present invention.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.
  • FIG. 1A shows a schematic structural diagram of an electronic device 100.
  • The electronic device 100 shown in FIG. 1A is only an example; the electronic device 100 may have more or fewer components than shown in FIG. 1A, may combine two or more components, or may have a different configuration of components.
  • the various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than those shown in the figure, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • FIG. 1B is a software structure block diagram of an electronic device 100 according to an embodiment of the present application.
  • The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with one another through software interfaces.
  • The Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
  • The application layer can include a series of application packages. For a detailed introduction to the software functions, refer to the earlier patent application CN201910430270.9.
  • an embodiment of the present invention provides a video capture control method applied to an electronic device.
  • The electronic device is one that does not include a pan/tilt head, so the camera of the electronic device cannot rotate. Please refer to FIG. 3.
  • The method includes the following steps:
  • S200: Display a first video frame.
  • The first video frame contains a first target object, and the first target object is located in a non-designated area of the first video frame.
  • When the first video frame is captured, the first target object is located at a first geographic location.
  • S210: In response to detecting an operation of entering the target tracking mode, display a second video frame. The second video frame contains the first target object, the first target object is located in the designated area of the second video frame, and when the second video frame is captured, the first target object is located at the first geographic location.
  • S220: Display a third video frame. The third video frame contains the first target object, the first target object is located in the designated area of the third video frame, and when the third video frame is captured, the first target object is located at a second geographic location; the distance between the second geographic location and the first geographic location is greater than a preset distance.
  • Before the first video frame is displayed in step S200, an original video frame needs to be collected first, and the output video frame is obtained based on the collected original video frame, for example by directly outputting the original video frame, or by outputting the video frame obtained after the original video frame undergoes various processing.
  • The user can collect the original video frames through a video capture operation on the camera APP (application). For example, the user first opens the camera APP and then clicks the video capture button (this click on the video capture button is the video capture operation). After detecting the operation on the video capture button, the electronic device controls the camera to perform video capture and obtain the original video frames.
  • The electronic device can also acquire the original video frames during a video chat. For example, if the user wants to communicate with a peer user through an instant messaging app, he can open the instant messaging software (such as WeChat, QQ, etc.), enter the contact's chat interface, and click the "video communication" button. After detecting the click on the "video communication" button, the electronic device turns on the video communication function with the contact and then turns on the camera to capture the original video frames.
  • The user can also have a video chat with the peer user through the system's default video chat function.
  • The communication function of the electronic device includes a smooth communication function (the smooth communication function refers to the video communication function).
  • The main interface of the system call application includes a phone control 31, a contacts control 32, a personal favorites control 33, and a smooth communication control 30.
  • The phone control 31 is used to trigger the electronic device to display recent call records, including all calls and missed calls; the contacts control 32 is used to trigger the electronic device to display all contact information of the call application; the personal favorites control 33 is used to trigger the electronic device to display the contacts added as personal favorites. Adding a contact as a personal favorite allows quick communication with that contact, such as quickly sending text messages or quickly making calls. The smooth communication control 30 is used to trigger the electronic device to turn on the video communication function.
  • The original video frames may be video frames collected by a front camera or a rear camera of the electronic device. In another optional implementation, the original video frames may also be video frames collected by another video capture device that communicates with the electronic device. For example, if there is a data transmission channel between the electronic device and a security camera at home, the video frames collected by the security camera can be obtained; if the electronic device and a drone have a data transmission channel, the video frames collected by the drone can be obtained. The video stream collected by another video capture device can be registered as a virtual camera of the system; during a video call, the instant messaging software calls the video frames collected by the virtual camera for the video communication process.
  • In step S200, after the original video frame is acquired, the original video frame can be output directly, in which case the original video frame is the first video frame; or image processing (such as beautification, cropping, etc.) can be performed on the original video frame first, and the processed video frame is then output (the processed video frame being the first video frame).
  • The first geographic location can be any location and can be characterized by parameters such as longitude, latitude, and altitude.
  • The first target object may include one target object or at least two target objects.
  • The non-designated area refers to the areas of the first video frame other than the designated area 43. The designated area 43 is, for example, the central area, the golden-section area, or any other area designated by the user of the electronic device.
  • The designated area 43 in FIG. 2 refers to the central area. The target object being located in the designated area 43 means, for example, that the center point of the target object is located in the designated area 43; the target object being located in a non-designated area means, for example, that the center point of the target object is located in a non-designated area.
  • In the target tracking mode, the output video frames can be obtained by cropping the collected original video frames; for example, the second video frame, the third video frame, etc. can be obtained by cropping original video frames.
  • The geographic location of the first target object in the second video frame has not changed; it is still the first geographic location, but the first target object is now located in the central area (that is, the designated area 43) of the second video frame.
  • the original image frame may be cropped with the first target object as a reference, so as to obtain the second video frame. How to crop the original image frame with the first target object as a reference will be introduced later.
  • the display size of the first target object in the second video frame is larger than the display size of the first target object in the first video frame.
  • the display size of the first target object in the first video frame is 0.5 times, 0.6 times, etc., the display size of the first target object in the second video frame.
  • In different cases, the ratio of the display size of the first target object in the first video frame to its display size in the second video frame may also be different.
  • In step S210, after the target tracking mode is entered, the first target object needs to be determined first, so that it can be tracked in subsequent video frames.
  • The first target object can be determined in a variety of ways, two of which are introduced below. Of course, the specific implementation is not limited to these two cases.
  • The target object in the video frame can be a target object automatically determined by the electronic device, or a target object determined based on user operations. The two cases are introduced separately below.
  • In the first case, the target object is automatically determined by the electronic device.
  • After entering the subject tracking mode, the electronic device automatically determines the target object in the video frame based on preset conditions for the target object.
  • The preset conditions are, for example: every person in the video frame, the animals in the video frame, other moving objects in the video frame, and so on. Taking the preset condition of persons in a video frame as an example, all persons contained in the video frame can be identified based on human body detection technology, and all of them are then determined to be the target object.
  • Alternatively, the preset condition is: a person (or animal, or other moving object) in the video frame that meets a tracking condition.
  • The tracking condition is, for example, that the distance to the left edge of the original video frame (that is, the video frame that has been collected by the camera and has not undergone size processing) is greater than a first preset distance, and the distance to the right edge of the original video frame is greater than a second preset distance.
  • The first preset distance and the second preset distance may be the same or different, for example 150 pixels, 200 pixels, and so on; or the first preset distance and the second preset distance are, for example, 0.1 times, 0.2 times, etc., of the image width.
  • In this way, the target object being tracked is a person near the center of the camera's field of view. The tracking condition may also be, for example, that the area occupied in the original video frame is greater than a preset area, the preset area being, for example, 10,000 pixels, 20,000 pixels, and so on. In this case, people (or animals, or other moving objects) that are not fully captured in the original video frame, or that are inconspicuous at the edge of the original video frame, are not taken as target objects to be tracked, which makes the video tracking more targeted.
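As a minimal sketch, assuming axis-aligned person detections in pixel coordinates, such a tracking-condition check could look as follows; the margins and the minimum area are example values taken from the text above, not fixed parameters:

```python
def meets_tracking_condition(xmin, ymin, xmax, ymax, frame_width,
                             left_margin=150,    # first preset distance (example value)
                             right_margin=150,   # second preset distance (example value)
                             min_area=10_000):   # preset area (example value)
    """Return True if a detected person (or animal/moving object) qualifies
    as a target object to track in the original video frame."""
    if xmin <= left_margin:                      # too close to the left edge
        return False
    if frame_width - xmax <= right_margin:       # too close to the right edge
        return False
    area = (xmax - xmin) * (ymax - ymin)         # area occupied in the frame
    return area > min_area
```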
  • In the second case, the target object is determined based on user operations.
  • In one case, the target object is determined based on a first preset operation by the user of the electronic device. For example: a video frame (either the original video frame or a processed video frame) is displayed on the display unit of the electronic device, the user clicks on a person in the video frame with a finger, and when the electronic device detects the click operation, that person becomes the target object. As another example: the user of the electronic device issues the voice command "follow the person wearing yellow clothes in the picture", and the electronic device recognizes the person wearing yellow clothes in the video frame and determines that person as the target object, and so on.
  • Alternatively, the target object is determined based on a second preset operation by a person in the original video frame.
  • the second preset operation is, for example, tracking gestures, voice commands, and so on.
  • After the electronic device collects the original video frame, it uses human body detection technology to identify each person in the original video frame, and then uses key point recognition technology to identify each person's joint points (for example: head, neck, shoulders, palms, wrists, elbow joints, etc.). Based on the positional relationship of the joint points, it determines whether a tracking gesture exists, such as raising a hand or making a heart sign. For example, whether there is a hand-raising gesture can be determined by whether the palm, wrist, and elbow joint are roughly on a straight line, and by whether the vertical coordinate of the palm is higher than that of the wrist and the vertical coordinate of the wrist is higher than that of the elbow joint.
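A sketch of the hand-raising test just described, assuming the key-point detector returns (x, y) pixel coordinates with the origin at the top-left of the frame (so a smaller y means higher in the picture); the angular tolerance is an assumed parameter:

```python
import math

def is_hand_raised(palm, wrist, elbow, angle_tol_deg=20.0):
    """palm, wrist, elbow: (x, y) joint coordinates of one arm."""
    # Vertical ordering: palm above wrist, wrist above elbow joint.
    if not (palm[1] < wrist[1] < elbow[1]):
        return False
    # Roughly a straight line: the palm->wrist and wrist->elbow segments
    # should point in nearly the same direction.
    a1 = math.atan2(wrist[1] - palm[1], wrist[0] - palm[0])
    a2 = math.atan2(elbow[1] - wrist[1], elbow[0] - wrist[0])
    diff = abs(a1 - a2) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)
    return math.degrees(diff) <= angle_tol_deg
```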
  • As another example, the electronic device detects that a user in the video frame issues the voice command "follow me", and then determines that user as the target object.
  • the user operation can also be an operation that triggers entering the subject tracking mode.
  • In another case, after the electronic device enters the video capture mode, it does not enter the subject tracking mode by default, and no user operation for confirming the target object has yet been detected. After detecting a user operation for confirming the target object, the electronic device responds to the user operation by entering the subject tracking mode and determining the target object.
  • In another case, when the electronic device detects a user operation for determining the target object, it first judges whether the video capture process has entered the subject tracking mode. If it has, it determines the target object based on the user operation and then performs subject tracking centered on that target object; if the video capture process has not entered the subject tracking mode, it does not respond to the user operation.
  • the target object determined in the embodiment of the present invention may be one or at least two.
  • In step S220, since the electronic device has not received an operation to exit the target tracking mode, the electronic device remains in the target tracking mode.
  • The video frame displayed by the electronic device moves along with the movement of the first target object, keeping the first target object in the designated area 43 of the video frame.
  • It can be seen from FIG. 2 that, in the third video frame, the first target object has moved away from the tower and toward the tree. The picture output in the third video frame therefore differs considerably from that of the second video frame, but the first target object is still in the designated area 43 of the video frame.
  • An embodiment of the present invention provides a video capture method, which includes the following steps:
  • S300: The user can collect video frames through a video capture operation on the camera APP (application). For example, the user first opens the camera APP and then clicks the video capture button (this click on the video capture button is the video capture operation). After detecting the operation on the video capture button, the electronic device controls the camera to perform video capture and obtain video frames.
  • The electronic device can also collect video frames during a video chat. For example, if the user wants to communicate with a peer user through an instant messaging app, he can open the instant messaging software (such as WeChat, QQ, etc.), enter the contact's chat interface, and click the "video communication" button. After detecting the click on the "video communication" button, the electronic device turns on the video communication function with the contact and then turns on the camera to collect video frames.
  • The user can also use the system's default video chat function to have a video chat with the peer user.
  • The communication function of the electronic device includes a smooth communication function (the smooth communication function refers to the video communication function).
  • The main interface of the system call application includes a phone control 31, a contacts control 32, a personal favorites control 33, and a smooth communication control 30.
  • The phone control 31 is used to trigger the electronic device to display recent call records, including all calls and missed calls; the contacts control 32 is used to trigger the electronic device to display all contact information of the call application; the personal favorites control 33 is used to trigger the electronic device to display the contacts added as personal favorites. Adding a contact as a personal favorite allows quick communication with that contact, such as quickly sending text messages or quickly making calls. The smooth communication control 30 is used to trigger the electronic device to turn on the video communication function.
  • The video frames may be video frames collected by a front camera or a rear camera of the electronic device; they may also be video frames collected by another video capture device that communicates with the electronic device. For example, if there is a data transmission channel between the electronic device and a security camera at home, the video frames collected by the security camera can be obtained; if the electronic device and a drone have a data transmission channel, the video frames collected by the drone can be obtained.
  • The video stream collected by another video capture device can be registered as a virtual camera of the system.
  • During a video call, the instant messaging software calls the video frames collected by the virtual camera for the video communication process.
  • S310: Output the first video frame on the display unit of the electronic device.
  • FIG. 5A is a schematic diagram of the first video frame output by the electronic device, where 40 is the outer border of the output video frame.
  • Outputting the first video frame means, for example, outputting the first video frame on the video preview interface of the display unit for the user to preview; if the solution is applied to a video communication process, outputting the first video frame means, for example, transmitting the first video frame to the peer electronic device and displaying the first video frame on the video communication interface, and so on.
  • A spotlight mode can also be set during the video capture process.
  • In the spotlight mode, a spotlight effect can be set for a specific object, that is, the specific object is highlighted. For example: a highlight is set for the specific object, the specific object is displayed in color while other content outside the specific object is displayed in black and white, other content outside the specific object is blurred, special effects are added to the specific object, and so on.
  • The specific object can be determined in a variety of ways; several of them are listed below for introduction. Of course, the specific implementation process is not limited to the following situations.
  • the first type is to determine the specific object by selecting the specific object in the video frame.
  • the selection operation is, for example, a click operation, a sliding operation, and so on.
  • The user can select one specific object or multiple specific objects. For example, the user can select multiple specific objects through multiple selection operations, or select multiple specific objects through one operation, for example by selecting with two fingers, each finger corresponding to a target, so that two specific objects are selected at the same time, and so on.
  • the second type is to locate the sound source through the microphone of the electronic device, and determine that the person in the area where the sound source is located is a specific object.
  • The solution can be applied to scenes where many people hold a discussion or many people sing. Take a video frame containing person A 40, person B 41, person C 42, person D 43, and a fifth person as an example; these five people are discussing a problem.
  • At a first moment, person B 41 speaks, and person B 41 is determined to be the specific object; a spotlight effect is added for person B 41. At a second moment, person D 43 speaks, and person D 43 is determined to be the specific object; a spotlight effect is added to person D 43, and the spotlight effect on person B 41 is canceled, and so on.
  • In this way, through sound-source localization, the person who is currently speaking can be determined.
  • The third type is to determine all the persons contained in the video frame through human body recognition technology and determine the person in the middle position as the specific object. Taking a video frame containing person A, person B, person C, person D, and person E as an example, after the electronic device recognizes the locations of these five persons, it determines that person C is in the middle position, so person C is determined to be the specific object.
  • The fourth type is to receive a voice instruction from the user and determine the specific object through the voice instruction.
  • For example, if the user of the electronic device says "set the spotlight effect for the person in the middle", the specific object is determined to be the person in the middle (for example, person C); if the user of the electronic device says "set the spotlight effect for the tallest person", the specific object is determined to be the tallest person in the video frame, and so on.
  • The fifth type is to perform gesture recognition on the persons captured in the video frame; a person who makes a preset gesture is determined to be the specific object, the preset gesture being, for example, raising a hand, waving a hand, and so on.
  • The operation of entering the spotlight mode can be detected first and the specific object determined afterwards, or the specific object can be determined first and the operation of entering the spotlight mode detected afterwards, thereby producing the spotlight effect on the specific object.
  • Other variants are not listed in detail in the embodiments of the invention and are not limited.
  • S320: In the video capture mode, enter the subject tracking mode.
  • The picture of the video frames output in the subject tracking mode moves with the movement of the target object, so that the target object is located at the center of the video frame, at the golden-section position of the video frame, at a location specified by the user, or the like.
  • the target object can be people, animals, other moving objects (for example: kites, cars, sweeping robots, etc.) and so on.
  • After the electronic device enters the video capture state, it can enter the subject tracking mode in response to a preset operation, such as an operation of clicking a preset button displayed on the display unit, an operation of selecting a specific person in the video frame, or a preset gesture made by a person in the video frame, and so on.
  • the electronic device after the electronic device enters the video capture state, it may also enter the subject tracking mode by default.
  • S330: In the subject tracking mode, output a second video frame.
  • Both the first video frame and the second video frame contain the target object, but the display ratio of the target object in the second video frame differs from its display ratio in the first video frame, and the relative position of the target object in the second video frame differs from its relative position in the first video frame.
  • The display ratio of the target object in the second video frame is greater than that in the first video frame. For example: the width of the target object in the second video frame accounts for more than 50% of the total picture width, while its width in the first video frame accounts for 20% of the total picture width; the height of the target object in the second video frame accounts for more than 50% of the total picture height, while its height in the first video frame accounts for 30% of the total picture height, and so on.
  • the above ratio of width to height is just an example, not a limitation.
  • That the relative position of the target object in the second video frame differs from its relative position in the first video frame means, for example, that a second ratio corresponding to the second video frame differs from a first ratio corresponding to the first video frame.
  • For example, the second ratio is the ratio of the distance between the left border 50a of the target object in the second video frame and the left border of the second video frame to the width of the second video frame, and the first ratio is the ratio of the distance between the left border 50a of the target object in the first video frame and the left border of the first video frame to the width of the first video frame.
  • Alternatively, the second ratio is the ratio of the distance between the right border 50b of the target object in the second video frame and the right border of the second video frame to the width of the second video frame, and the first ratio is the ratio of the distance between the right border 50b of the target object in the first video frame and the right border of the first video frame to the width of the first video frame, and so on.
  • As an optional embodiment, the spotlight mode can be maintained when entering S330, with the spotlight effect still produced on the specific object determined in S300. As another optional embodiment, smoothing processing can be performed between the first video frame and the second video frame; for example, there are multiple transitional video frames between the first video frame and the second video frame, the transitional video frames being, for example, 10 frames, 20 frames, and so on.
  • the target object may be people, animals, other moving objects (such as drones, toy cars, balloons, etc.) in the video frame, and so on.
  • The target objects contained in the first video frame are, for example, a person 41 and a person 42, and the output second video frame is, for example, as shown in FIG. 5B.
  • In the second video frame, the display area of the target objects (person 41 and person 42) is enlarged, and the relative position of the target objects in the video frame changes: in the first video frame, the target objects are located in the left part of the picture, while in the second video frame they are located in the middle part of the picture.
  • the target object in the video frame may be a target object automatically determined by the electronic device, or a target object determined based on user operations. Since it has been introduced above, it will not be repeated here.
  • the original video frame can be cropped to obtain the output second video frame.
  • The coordinate frame 50 of the human body can be determined through a human body detection model.
  • The coordinate frame 50 can be represented by the coordinates of each corner point of the coordinate frame 50, by the coordinates of the upper-left corner plus the coordinates of the lower-right corner, or by the coordinates of the lower-left corner plus the coordinates of the upper-right corner, and so on.
  • FIG. 5C is a schematic diagram of the determined coordinate frame 50 when there is one target object, and FIG. 5D is a schematic diagram of the determined coordinate frame 50 when there are two target objects. The coordinate frame 50 of each person in the original video frame can be determined based on human body detection technology, and the coordinate frames 50 of the individual persons can then be merged to determine the coordinate frame 50 of the target object.
  • In FIG. 5D, the coordinate frame 50 is represented by the coordinates (Xmin, Ymin) of the upper-left corner and the coordinates (Xmax, Ymax) of the lower-right corner.
  • Xmin represents the minimum value on the X axis, Ymin the minimum value on the Y axis, Xmax the maximum value on the X axis, and Ymax the maximum value on the Y axis, with the upper-left corner of the video frame as the origin.
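For the multi-person case of FIG. 5D, merging the per-person coordinate frames into the single coordinate frame 50 reduces to taking the extremes of the corners. A sketch, assuming each detection is an (xmin, ymin, xmax, ymax) tuple in pixel coordinates:

```python
def merge_coordinate_frames(person_boxes):
    """person_boxes: list of (xmin, ymin, xmax, ymax) tuples, one per person,
    with the origin at the upper-left corner of the video frame.
    Returns the coordinate frame 50 (Xmin, Ymin, Xmax, Ymax) covering them all."""
    xmins, ymins, xmaxs, ymaxs = zip(*person_boxes)
    return min(xmins), min(ymins), max(xmaxs), max(ymaxs)
```

For a single person this simply returns that person's own coordinate frame.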
  • the cropping frame 81 can be determined in a variety of ways, and several of them are listed below for introduction. Of course, in the specific implementation process, it is not limited to the following situations.
  • In the first way, the cropping frame 81 for cropping the original video frame can be determined in the following manner:
  • S610: Obtain the width ΔW of the target object by subtracting Xmin from Xmax.
  • S620: Determine a cropping width for cropping the original image frame based on the width ΔW of the target object and the width Width of the original image frame.
  • The cropping width can be determined from the ratio of the width ΔW of the target object to the width Width of the image.
  • When ΔW/Width is less than a first preset ratio, the cropping width is the first preset ratio multiplied by the width Width of the original image frame; when ΔW/Width is between the first preset ratio and a second preset ratio, the cropping width is the width ΔW of the target object; when ΔW/Width is greater than the second preset ratio, the cropping width is Width.
  • The first preset ratio is, for example, 0.3, 0.5, etc., and the second preset ratio is, for example, 0.6, 0.8, etc.
  • The first preset ratio and the second preset ratio may also take other values, which are not listed in detail in the embodiments of the present invention and are not limited.
  • S630: Determine the left cropping edge 81a and the right cropping edge 81b according to the cropping width.
  • Taking the first preset ratio as 0.5: when ΔW/Width is less than 0.5, the cropping width is 0.5 times the width of the original image frame; when ΔW/Width is between 0.5 and the second preset ratio, the cropping width is ΔW; when ΔW/Width is greater than the second preset ratio, the cropping width is the width of the original image frame.
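The three-case width rule of S620 can be written compactly; a sketch, with the preset ratios set to example values mentioned above (they are tunable, not fixed by the application):

```python
def cropping_width(delta_w, width, first_ratio=0.5, second_ratio=0.8):
    """delta_w: width ΔW of the target object; width: width Width of the
    original image frame. Returns the cropping width."""
    r = delta_w / width
    if r < first_ratio:          # narrow target: crop to a fixed fraction of the frame
        return first_ratio * width
    if r <= second_ratio:        # medium target: crop exactly to the target width
        return delta_w
    return width                 # very wide target: keep the full frame width
```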
  • For example, suppose the original video frame is as shown in FIG. 8A, where 80 represents the outer frame of the original video frame and 50 represents the coordinate frame of the target object; the coordinate frame 50 includes a left border 50a, a right border 50b, an upper border 50c, and a lower border 50d. Here ΔW/Width is less than 0.5, so the cropping width is determined to be 0.5 times the width Width of the original video frame.
  • The center point 82 of the target object can be determined through the coordinate frame 50, and the coordinates of the center point 82 can be calculated by the following formulas: X center point = (Xmin + Xmax) / 2; Y center point = (Ymin + Ymax) / 2.
  • The X center point refers to the coordinate of the center point in the X-axis direction, and the Y center point refers to the coordinate of the center point in the Y-axis direction. It is possible to determine only the X center point, or to determine both the X center point and the Y center point.
  • Taking the center point as a reference, a straight line perpendicular to the X-axis direction at a first preset width W1 to the left of the center point is used as the left cropping edge 81a, and a straight line perpendicular to the X-axis direction at a second preset width W2 to the right of the center point is used as the right cropping edge 81b.
  • The sum of the first preset width W1 and the second preset width W2 is the cropping width; for example, the sum of the first preset width W1 and the second preset width W2 is 0.5 times Width.
  • The first preset width W1 and the second preset width W2 may be equal, with both being 1/2 of the cropping width, for example 1/4 * Width; the first preset width W1 and the second preset width W2 may also be different, which is not limited in the embodiments of the present invention.
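A sketch of placing the left and right cropping edges around the center point for the equal-split case W1 = W2 = cropping width / 2; the clamping to the frame borders is an added assumption for when the center point lies near an edge, which the text does not spell out:

```python
def left_right_edges(x_center, crop_width, frame_width):
    """Returns the x positions of the left cropping edge 81a and the right
    cropping edge 81b, as vertical lines perpendicular to the X axis."""
    w1 = w2 = crop_width / 2          # first and second preset widths (equal case)
    left, right = x_center - w1, x_center + w2
    # Assumed clamping: keep the cropping edges inside the original frame.
    if left < 0:
        left, right = 0, crop_width
    elif right > frame_width:
        left, right = frame_width - crop_width, frame_width
    return left, right
```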
  • When the cropping width is equal to ΔW, the left border 50a of the coordinate frame 50 can be used as the left cropping edge 81a, and the right border 50b of the coordinate frame 50 as the right cropping edge 81b.
  • S640: Determine the upper cropping edge 81c and the lower cropping edge 81d based on the longitudinal coordinates of the target object in the coordinate frame 50.
  • The upper border 50c can be moved upward by a first preset height H1 to serve as the upper cropping edge 81c (in the specific implementation process, the upper border 50c can also be used directly as the upper cropping edge 81c); the upper cropping edge 81c is then extended downward by a second preset height H2 to obtain the lower cropping edge 81d.
  • The first preset height H1 is, for example, 0.05 times or 0.01 times the height of the original image frame (of course, other values are also possible), and the second preset height H2 is, for example, 0.5 times or 0.6 times the height of the original image frame (of course, other values are also possible). (See FIGS. 8A-8C.)
  • The lower border 50d can also be used directly as the lower cropping edge 81d, or the lower border 50d can be extended downward by a certain distance to serve as the lower cropping edge 81d.
  • The upper cropping edge 81c can also be determined by extending upward from the center point 82 by a preset height, and the lower cropping edge 81d by extending downward from the center point 82 by a preset height (the determination method is similar to that of the left cropping edge 81a and the right cropping edge 81b and is not repeated here).
  • The cropping height can also be determined based on the cropping width and the aspect ratio of the original video frame, with the upper cropping edge 81c and the lower cropping edge 81d determined based on the cropping height. The determination method is similar to the way the left cropping edge 81a and the right cropping edge 81b are determined based on the cropping width and is not repeated here. This solution ensures that the original video frame is cropped in equal proportions, so that the cropping frame conforms to the aspect ratio of the display area of the video frame and the cropping frame 81 does not need to be adjusted to match that aspect ratio.
  • S650: Determine the cropping frame 81 based on the upper cropping edge 81c, the lower cropping edge 81d, the left cropping edge 81a, and the right cropping edge 81b. It can be seen from FIGS. 8A-8C that, depending on the width ΔW of the target object, the size of the finally determined cropping frame 81 differs, so the size of the output video frame relative to the original video frame may also differ.
  • The ratio of the first video frame to the original video frame is usually fixed; for example, the first video frame is 100%, 90%, etc., of the original image frame. Therefore, depending on the width ΔW of the target object, the proportion of the picture of the second video frame within the picture of the first video frame also differs.
  • If the target object is a single person, the distance between the target object and the camera may differ, which leads to different values of ΔW and hence a different proportion of the second video frame within the first video frame; if the target object is multiple people, ΔW may differ based on the distance between the target object and the camera and the distance between the people, again resulting in a different proportion of the second video frame within the first video frame.
  • As an optional embodiment, the center point 82 of the target object can be determined using a preceding preset number of frames and a following preset number of frames of the current frame. For example: determine the center point of each frame among the current frame, the preceding preset frames (for example: 10 frames, 15 frames, etc.), and the following preset frames (for example: 15 frames, 20 frames, etc.), and then average the center points of these frames to obtain the center point 82 of the current frame.
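A sketch of that temporal averaging, assuming the per-frame center points are available as a list of (x, y) tuples; the window sizes follow the example values above (10 preceding and 15 following frames):

```python
def smoothed_center(centers, i, n_before=10, n_after=15):
    """centers: per-frame center points (x, y); i: index of the current frame.
    Averages the window around frame i to stabilize the center point 82."""
    window = centers[max(0, i - n_before): min(len(centers), i + n_after + 1)]
    n = len(window)
    return (sum(x for x, _ in window) / n,
            sum(y for _, y in window) / n)
```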
  • In the second way, referring to FIG. 9, after the coordinate frame 50 of the target object is determined, the left border 50a is moved to the left by a third preset width W3 to obtain the left cropping edge 81a (in the specific implementation process, the left border 50a can also be used directly as the left cropping edge 81a); the right border 50b is moved to the right by a fourth preset width W4 to obtain the right cropping edge 81b (in the specific implementation process, the right border 50b can also be used directly as the right cropping edge 81b); the upper border 50c is moved upward by a third preset height H3 to serve as the upper cropping edge 81c (in the specific implementation process, the upper border 50c can also be used directly as the upper cropping edge 81c); and the lower border 50d is moved downward by a fourth preset height H4 to serve as the lower cropping edge 81d (in the specific implementation process, the lower border 50d can also be used directly as the lower cropping edge 81d).
  • The aforementioned third preset width W3, fourth preset width W4, third preset height H3, and fourth preset height H4 may all be the same, partly the same, or all different; they are, for example, 100 pixels, 200 pixels, 300 pixels, and so on; or, for example, 0.1 times, 0.2 times, etc., the width of the original video frame; or, for example, 0.05 times, 0.15 times, etc., the height of the original video frame.
  • As an optional embodiment, the lower cropping edge 81d can be determined in the following manner: determine the preset joint point closest to the lower border 50d, and determine the cropping position corresponding to that preset joint point as the lower cropping edge 81d.
  • The preset joint points are, for example, ankle joints, knee joints, hip joints, etc., where the joint points of a character can be determined by a key point detection algorithm.
  • The key point recognition technology is, for example, the Pictorial Structure algorithm, a top-down human key point detection algorithm, a bottom-up human key point detection algorithm, etc.
  • The determined joint points are shown in FIG. 10.
  • The cropping position is usually the preset joint point moved up by a preset distance.
  • The preset distance is, for example: ① a fixed value, such as 30 pixels, 40 pixels, etc.; ② a specific proportion of the total height of the human body, such as 1/20, 1/30, etc., of the total height of the human body; ③ a specific proportion of the total height of the video frame, for example 1/40, 1/50, etc.
  • The cropping position may also be the position of the previous specific joint point: if the current joint point is an ankle joint, the previous specific joint point is, for example, the knee joint; if the current joint point is a knee joint, the previous specific joint point is, for example, the hip joint; if the current joint point is a hip joint, the previous specific joint point is, for example, the elbow joint, and so on.
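A sketch of choosing the lower cropping edge from the preset joint points, assuming the key-point detector returns a dict of joint name to (x, y) pixel coordinates (the joint names are hypothetical) and using option ② above, a fraction of the body height, as the preset distance:

```python
def lower_cropping_edge(joints, lower_border_y, body_height,
                        preset_names=("ankle", "knee", "hip"),
                        height_fraction=1 / 20):   # option ②: e.g. 1/20 of body height
    """joints: {name: (x, y)}; lower_border_y: y of the lower border 50d.
    Returns the y position for the lower cropping edge 81d. Assumes at least
    one preset joint point was detected."""
    # Preset joint point closest to the lower border of the coordinate frame.
    _, joint_y = min(((name, xy[1]) for name, xy in joints.items()
                      if name in preset_names),
                     key=lambda item: abs(item[1] - lower_border_y))
    # Move up by the preset distance (smaller y is higher in image coordinates).
    return joint_y - body_height * height_fraction
```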
  • As an optional embodiment, the lower cropping edge 81d can also be determined in the following manner: the cropping frame 81 is determined based on the user's historical operation records on videos (and/or images). For example, according to the user's historical collection records or historical cropping records for videos (and/or images), the framing the user likes most is determined, for example in terms of the distance between the target object and each border of the cropping frame.
  • the cropping frame 81 can be adjusted.
  • a variety of adjustment methods can be used. The following lists several of them for introduction. Of course, in the specific implementation process , Not limited to the following situations.
  • The first method is to determine the preset joint point closest to the lower cropping edge 81d, determine the cropping position corresponding to that preset joint point, and move the lower cropping edge 81d to that cropping position.
  • The preset joint points are, for example, ankle joints, knee joints, hip joints, etc., and the joint points of a person can be determined by a key point detection algorithm.
  • The key point recognition technology is, for example: the Pictorial Structure algorithm, a top-down key point detection algorithm, a bottom-up human key point detection algorithm, etc.
  • The determined joint points are shown in Figure 9. (This mainly applies to cropping frames 81 determined by the first and second ways above.)
  • The cropping position is usually the preset joint point moved up by a preset distance.
  • The preset distance is, for example: (1) a fixed value, such as 30 pixels, 40 pixels, etc.; (2) a specific proportion of the total height of the human body, for example 1/20, 1/30, etc. of the total height of the human body; (3) a specific proportion of the total height of the video frame, for example 1/40, 1/50, etc.; or (4) a specific proportion of the distance between the current joint point and the previous specific joint point, for example 1/3, 1/4, etc.
  • If the current joint point is an ankle joint, the previous specific joint point is, for example, the knee joint; if the current joint point is a knee joint, the previous specific joint point is, for example, the hip joint; if the current joint point is a hip joint, the previous specific joint point is, for example, the elbow joint, and so on.
  • The second method is to determine an image cut-off position based on the user's historical operation data for videos (and/or images), and adjust the lower cropping edge 81d based on the image cut-off position.
  • The historical operation data may include historical video (image) collection data, historical video (image) operation data, and so on.
  • For example, the image can be divided into multiple human-body cut-off areas in advance based on the joint points of the human body; the duration for which each image cut-off area appears in the video frames historically collected by the user is determined; and the user's favorite image cut-off area (the image cut-off area that appears for the longest time) is determined based on those durations. The lower cropping edge 81d is then adjusted based on that image cut-off area.
  • The captured video can also be split into multiple frames of images, which are then combined with other images collected on the electronic device; the image cut-off area of each image is determined, the most frequently occurring image cut-off area is determined, and the lower cropping edge 81d is adjusted based on that image cut-off area.
  • The video cropped by the user can also be split into multiple frames of images, which are then combined with other images cropped by the user on the electronic device; the image cut-off area of each image is determined, the most frequently occurring image cut-off area is determined, and the lower cropping edge 81d is adjusted based on that image cut-off area.
  • (1) The correspondence between image cut-off areas and the lower cropping edge 81d can be preset, for example as shown in Table 1; after the image cut-off area is determined, the corresponding lower cropping edge 81d is determined from this correspondence, and the lower cropping edge 81d of the already-determined cropping frame 81 is adjusted based on it.
  • (2) After the image cut-off area is determined, it can be judged whether the lower cropping edge 81d is located within the image cut-off area. If it is, the lower cropping edge does not need to be adjusted; if it is not, the lower cropping edge can be moved into the image cut-off area.
  • In the specific implementation process, the number of target objects in the video frame can be determined first, and the lower cropping edge 81d is adjusted through the image cut-off area only when the number of target objects is not greater than a preset threshold (for example, 1 or 2); if the number of target objects is greater than the preset threshold, the lower cropping edge 81d does not need to be adjusted through the image cut-off area. Based on this solution, it can be prevented that, when there are too many targets, determining the lower cropping edge 81d in this way crops away too much content from some of the target objects.
  • Before adjusting the lower cropping edge 81d based on the image cut-off area, it is also possible to first determine the amount of motion of the current video frame relative to the previous frame; when the amount of motion is less than a preset amount of motion, the lower cropping edge 81d is adjusted through the image cut-off area; if the amount of motion is not less than the preset amount of motion, the lower cropping edge 81d is not adjusted through the image cut-off area.
  • The preset amount of motion is, for example: the abscissa motion amount is less than a preset proportion of the video frame width (for example: 0.02, 0.025, etc.) and the ordinate motion amount is less than a preset proportion of the video frame height (for example: 0.025, 0.03, etc.); see the sketch below.
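  • A minimal Python sketch of these two preconditions (target count and motion amount) follows; treating the motion amount as the displacement of the target's center point between frames is an assumption of the sketch.

    def may_adjust_by_cutoff(num_targets, motion, frame_w, frame_h,
                             max_targets=2, rx=0.02, ry=0.025):
        # Both preconditions from the text: few enough targets, and the
        # inter-frame motion below the preset amount in both axes.
        dx, dy = motion
        few_targets = num_targets <= max_targets
        small_motion = abs(dx) < rx * frame_w and abs(dy) < ry * frame_h
        return few_targets and small_motion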
  • The third method: after the cropping frame 81 is determined, it can also be checked whether the aspect ratio of the cropping frame 81 meets a preset ratio (for example: 16:9, 4:3, etc.). If it does not meet the preset ratio, the cropping frame 81 can be adjusted so that it does: for example, if the aspect ratio is less than the preset ratio, the width can be increased to make the aspect ratio meet the preset ratio, and if the aspect ratio is greater than the preset ratio, the height can be increased to make the aspect ratio meet the preset ratio. Of course, the cropping frame 81 can also be adjusted in other ways so that the aspect ratio meets the preset ratio, which this embodiment of the present invention does not list in detail or restrict. If the solution is applied to a video communication process, the electronic device can obtain the aspect ratio of the display screen (or video display area) of the peer electronic device and determine the preset ratio based on that aspect ratio.
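  • For illustration, a minimal Python sketch of the aspect-ratio adjustment follows; growing the cropping frame 81 symmetrically around its center and omitting the clamping to the frame bounds are simplifications of the sketch.

    def fit_aspect(crop, preset_ratio=16 / 9):
        # Make the aspect ratio of cropping frame 81 meet the preset
        # ratio: widen if too narrow, heighten if too wide, growing
        # symmetrically around the center.
        left, top, right, bottom = crop
        w, h = right - left, bottom - top
        if w / h < preset_ratio:                 # increase the width
            extra = (preset_ratio * h - w) / 2
            left, right = left - extra, right + extra
        elif w / h > preset_ratio:               # increase the height
            extra = (w / preset_ratio - h) / 2
            top, bottom = top - extra, bottom + extra
        return left, top, right, bottom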
  • The fourth method: when the video frame contains multiple target objects, if the lower cropping edge 81d was determined by extending downward from the upper cropping edge 81c by the second preset height H2, the method further includes: determining the center line of each target object, and then judging whether the lower cropping edge 81d is located below the center lines of all target objects; if it is not, the lower cropping edge 81d is moved down until it is located below the center lines of all target objects.
  • The center line is, for example: a line parallel to the X axis through the midpoint of each target object's vertical extent, a line parallel to the X axis through each target object's hip joint, etc. Please continue to refer to Figure 5D.
  • This figure contains two target objects, namely the target object 60 and the target object 61.
  • The center line of the target object 60 is 60a, and the center line of the target object 61 is 61a; the determined lower cropping edge 81d should therefore be located below the center line 61a of the target object 61.
  • In the specific implementation process, when the lower cropping edge is adjusted based on the center line of each target object, the method further includes: determining the relative distance between the first preset key point of the first target object and the first preset key point of the second target object; judging whether the relative distance is greater than a preset threshold; and, if the relative distance is greater than the preset threshold, determining the first center line and the second center line.
  • The first preset key point is, for example, the head, neck, etc. of the target object; the first preset key point of the first target object is then the head of the first target object, and the first preset key point of the second target object is the neck of the second target object.
  • The preset threshold is, for example: (1) a fixed value, such as 30 pixels, 40 pixels, 50 pixels, etc.; (2) a preset proportion of the total height of the first target object or the second target object, for example 1/4, 1/5, etc.; (3) a preset proportion of the pre-cropping frame, for example 1/4, 1/6, etc.
  • The fifth method is to judge whether the center line of the first pre-cropping frame is located below the center line of the first target object; if it is not, the upper cropping edge of the first pre-cropping frame is moved upward by a second preset distance and the lower cropping edge of the first pre-cropping frame is moved downward by a third preset distance to obtain the first cropping frame.
  • The final output video frame is the content in the first cropping frame.
  • The second preset distance and the third preset distance may be the same or different, and each is, for example: (1) a fixed value, such as 30 pixels, 40 pixels, etc.; (2) a specific proportion of the total height of the human body, such as 1/20, 1/30, etc. of the total height of the human body; (3) a specific proportion of the total height of the video frame, for example 1/40, 1/50, etc.; (4) a specific proportion of the first pre-cropping frame, for example 1/3, 1/4, etc.
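  • A minimal Python sketch of this fifth adjustment follows; the fixed-pixel second and third preset distances are only one of the options listed above, and image coordinates are assumed to grow downward.

    def expand_if_centerline_high(pre_crop, target_centerline_y,
                                  up=30, down=30):
        # If the pre-cropping frame's horizontal center line is not
        # below the target's center line, move the upper edge up by the
        # second preset distance and the lower edge down by the third.
        left, top, right, bottom = pre_crop
        if (top + bottom) / 2 <= target_centerline_y:
            top, bottom = top - up, bottom + down
        return left, top, right, bottom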
  • In the process of tracking the target object based on the above subject tracking mode, the tracked target object can also be re-selected or switched. Several switching methods are introduced below.
  • (1) The touch display unit of the electronic device receives a click operation on a first object, and the electronic device then controls the first object to be the target object and cancels the other target objects.
  • Taking the original video frame shown in Figure 8A as an example, the previous target objects are the person 41 and the person 42; after the click operation on the person 41 is detected, the person 41 remains the target object and the person 42 is cancelled as a target object.
  • In this case, the electronic device outputs a third video frame.
  • With the target object in the second video frame not having been displaced, the position of the first object in the third video frame differs from the position of the second object in the second video frame.
  • As shown in Figure 12, between the second video frame and the third video frame the target object 41 has not been displaced, but its relative position in the picture of the video frame has changed.
  • When the electronic device detects the operation of clicking the area where the person 41 is located on the touch display unit, it crops the original video frame with the person 41 as the target object. In this case, even if the original video frame contains the person 42, the person 42 is not considered during cropping; the third video frame is then output, and the position of the first object in the third video frame differs from the position of the second object in the second video frame, even though the original video frames underlying the third and second video frames have the same content.
  • (2) A first object in the video frame produces a tracking operation; it is then judged whether the first object is a target object determined based on a user operation. If the first object is a target object determined based on a user operation, the default subject tracking mode is restored; if the first object is not a target object determined based on a user operation, the first object is taken as the target object and the other target objects are cancelled.
  • Restoring the default subject tracking mode means, for example, restoring the subject tracking mode in which the target subject automatically determined by the electronic device is tracked.
  • The user operation is, for example, a click operation or voice instruction from the user of the electronic device, or a preset gesture, voice instruction, etc. produced by a user in the video frame.
  • Based on such a user operation, the first object can be determined as the target object and the other target objects cancelled (as in case (1) above).
  • Continuing case (1): the person 41 is currently determined as the target object based on the user's click operation, and the person 42 has been cancelled as a target object. In this case, if a tracking operation produced by the person 41 (for example: raising a hand, making a heart gesture, etc.) is detected again, the default subject tracking mode is restored and both the person 41 and the person 42 are determined as target objects.
  • In addition, after the first object is detected producing a tracking operation, if the interval before the first object is again detected producing a tracking operation is greater than a preset duration (for example: 1 second, 2 seconds, etc.), that tracking operation is determined to be a valid operation.
  • In that case the electronic device can respond to the tracking operation; otherwise, the tracking operation is deemed an invalid operation and the electronic device does not respond to it.
  • For example: in the initial stage, the person 41 is not a target object determined by a user operation. The electronic device detects a tracking operation of the person 41 in the video frame (for example: raising a hand, making a heart gesture, etc.), and the person 41 is determined as the target object.
  • The current and subsequent original video frames are cropped with the person 41 as the reference (for example: center, golden-section point, etc.), so that the output video frames are output with the person 41 as the reference. When the tracking operation of the person 41 is detected again, it is found that this operation occurred only 0.3 seconds after the previous operation.
  • This operation is therefore regarded as an invalid operation, and the original video frame is still cropped with the person 41 as the center. When the tracking operation of the person 41 is then detected yet again, it is found that 7 seconds have elapsed since the previous operation; the operation is deemed a valid operation, the persons 41 and 42 are jointly taken as target objects, and the default subject tracking mode is restored.
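  • The interval-based validity check can be sketched in Python as follows; the 1-second threshold is one of the example values above, and updating the reference time on every detected gesture (including invalid ones) matches the 0.3 s / 7 s example.

    import time

    class TrackingGestureFilter:
        # A repeated tracking gesture (raising a hand, heart sign) is
        # valid only if it arrives more than `min_interval` seconds
        # after the previous one.
        def __init__(self, min_interval=1.0):
            self.min_interval = min_interval
            self.last_time = None

        def is_valid(self, now=None):
            now = time.monotonic() if now is None else now
            valid = (self.last_time is None
                     or now - self.last_time > self.min_interval)
            self.last_time = now
            return valid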
  • (3) When the first object is the target object determined by a user operation, if a click operation on the first object is detected on the touch display unit of the electronic device, the first object is kept as the target object (the other target objects remain cancelled).
  • (4) When the first object is the target object determined by a user operation, if the first object in the original video frame is detected producing a tracking operation, the first object is cancelled as the target object.
  • Optionally, if the operation that triggered the first object to be the target object was itself a tracking operation, then when a tracking operation is detected again, it is first judged whether that tracking operation is a valid operation; if it is a valid operation, the default subject tracking mode is restored; otherwise, the first object is kept as the target object.
  • Taking the person 42 as an example, the electronic device detects the operation of clicking the person 42 on the touch display unit and controls the person 42 to be the target object; subsequently, the electronic device detects that the person 42 in the original video frame produces a tracking operation, in which case the default subject tracking mode is restored and the persons 41 and 42 are taken as target objects.
  • If instead, in the initial stage, the electronic device detects the tracking operation of the person 42 in the original video frame, it controls the person 42 to be the target object and cancels the person 41 as a target object; subsequently, when the electronic device again detects the tracking operation of the person 42 in the original video frame, it first judges whether that tracking operation is a valid operation; if it is, the default subject tracking mode is restored; if it is not, the person 42 is kept as the target object.
  • (5) When the first object is the target object determined based on a user operation, if an operation controlling a second object to be the target object is detected, the target object is switched from the first object to the second object.
  • The operation controlling the second object to be the target object is, for example, the operation of clicking the area where the second object is located on the touch display unit of the electronic device, a tracking operation produced by the second object in the video frame, and so on.
  • For example, when the person 42 (the first object) is the target object and the person 41 is not, the electronic device detects the operation of clicking the area where the person 41 (the second object) is located on the touch display unit; the person 41 is then taken as the target object, and the person 42 is cancelled as the target object.
  • When the original video frames are subsequently cropped, the person 41 is used as the reference (for example: the center, the golden-section point, etc.) without considering the position of the person 42, so the output video frames take the second object (the person 41) as the center or golden-section point.
  • For another example, when the person 42 is the target object determined by a user operation and the electronic device detects that the person 41 in the original video frame produces a tracking operation (for example, raising a hand), the person 42 is cancelled as the target object and the person 41 is taken as the target object.
  • (6) After an operation on a first area of the touch display unit is detected, if the first area is determined to be a blank area, the default subject tracking mode is restored. The blank area refers to an area without a target object; or, the blank area refers to an area with neither a target object nor any other moving object.
  • The switching processes above address re-selecting or switching a single target object; multiple target objects can also be selected or switched. Several such methods are introduced below; of course, the specific implementation process is not limited to the following situations.
  • (1) Receive an area selection operation on the touch display unit, respond to it to determine the selected area, and take the objects located within the selected area as target objects. The area selection operation is, for example, drawing a closed area (such as a circle, a frame, etc.), with the objects located in the closed area taken as target objects; for another example, a line-drawing operation, with the objects that the drawn path passes through taken as target objects, and so on.
  • (2) Detect the tracking operations produced by each person in the original video frame; if the time interval between the tracking operations of any two successive persons is less than a preset time interval (for example: 2 seconds, 3 seconds, etc.), all of these persons are determined as target objects.
  • For example: the original video frame contains five people, namely Person A, Person B, Person C, Person D, and Person E. The raising of Person A's hand in the original video frame is detected first, and Person A is taken as the target object; 1 second later, the hand-raising operation of Person C in the original video frame is detected, so Person C and Person A are jointly taken as target objects; another second later, the hand-raising operation of Person D in the original video frame is detected, so Person A, Person C, and Person D are jointly taken as target objects.
  • (3) Respond to the user's voice instruction, taking multiple persons as target objects based on the voice instruction.
  • For example, the original video frame contains five people, from left to right Person A, Person B, Person C, Person D, and Person E. The user of the electronic device produces the voice instruction "track the first, third, and fourth persons from left to right"; the electronic device responds to the voice instruction by first recognizing the five persons contained in the original video frame and then determining that, from left to right, the first person is Person A, the third is Person C, and the fourth is Person D, so Person A, Person C, and Person D are set as target objects.
  • Please continue to refer to Figure 2; the method further includes the following steps:
  • S340: Continue to capture original video frames, and judge whether there is a first target object that exists in the previous original video frame but does not exist in the current original video frame.
  • For each frame, the persons (target objects) contained in it can be identified through human body recognition technology; the persons contained in the current original video frame are then compared with those contained in the previous original video frame to determine a person who was captured in the previous original video frame but is not captured in the current original video frame, who is taken as the first target object.
  • For example, suppose that the previous original image frame is as shown in FIG. 13A, which contains two target objects, namely the target object 41 and the target object 42, where the target object 42 is located at the edge of the original image frame.
  • The current original image frame is as shown in FIG. 13B, and the person 42 is no longer in the original image frame, so it can be determined that the person 42 is the first target object.
  • S350: If the first target object exists, continuously detect, within a first preset duration, whether the first target object reappears in the original video frames. The first preset duration may be expressed in time, such as 2 seconds, 3 seconds, etc., or in the number of video frames, such as 20 frames, 30 frames, 40 frames, and so on.
  • The electronic device can start timing once it detects the existence of the aforementioned first target object.
  • S360: If the first target object reappears in the original video frames, the original video frames are cropped with the first target object and the other target objects as the center, and video frames referenced on the first target object and the other target objects are output (for example, as shown in FIG. 13D).
  • S370: If the first target object does not reappear in the original video frames, then within the first preset duration the original video frames are cropped with the original position of the first target object and the other target objects as the reference. The original position of the first target object is, for example, the position 42a where the first target object 42 appeared in the original video frame for the last time.
  • In this case the original video frame is cropped with the person 41 and the original position 42a of the first target object 42 as the reference, and the output video frame is, for example, as shown in FIG. 13E.
  • S380: After the first preset duration, the original position of the first target object is no longer considered when cropping the original video frame; instead, the remaining target objects in the original video frame are considered, and the output video frame is a video frame determined based on the remaining target objects in the original video frame, for example as shown in FIG. 13F.
  • Based on the above solution, when output video frames are obtained based on the target object during video capture, the smoothness of the video frame picture can be ensured in the situation where the target object briefly leaves the camera's field of view and then reappears in it; a sketch of this handling follows.
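  • As an illustration of steps S340-S380, the following Python sketch keeps the last-seen position (the original position 42a) of a disappeared target as a cropping reference for a grace period and then drops it; representing positions as (x, y) center points and the 2-second duration are assumptions.

    import time

    class LostTargetGrace:
        # Keeps the last-seen position of the first target object as a
        # cropping reference for `grace` seconds after it disappears.
        def __init__(self, grace=2.0):
            self.grace = grace        # first preset duration, e.g. 2 s
            self.last_pos = None      # original position, e.g. 42a
            self.lost_at = None

        def reference(self, pos):
            # `pos` is the target's (x, y) center, or None if the target
            # is absent from the current original video frame (S340).
            now = time.monotonic()
            if pos is not None:                    # reappeared (S360)
                self.last_pos, self.lost_at = pos, None
                return pos
            if self.last_pos is None:
                return None
            if self.lost_at is None:
                self.lost_at = now                 # start timing (S350)
            if now - self.lost_at <= self.grace:   # within duration (S370)
                return self.last_pos
            self.last_pos = None                   # duration elapsed (S380)
            return None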
  • An embodiment of the present invention further provides a video capture method. Please refer to FIG. 14; it includes the following steps:
  • S1400: Capture video frames; this capture is similar to step S300 and is not described again here.
  • S1410: Output the first video frame on the display unit of the electronic device; this step is similar to step S310 and is not repeated here.
  • S1420: In the video capture mode, enter the subject tracking mode. This step is similar to S320 and is not repeated here.
  • S1430: In the subject tracking mode, output the second video frame. The first video frame and the second video frame both contain the target object; the display proportion of the target object in the second video frame differs from the display proportion of the target object in the first video frame, and the relative position of the target object in the second video frame differs from the relative position of the target object in the first video frame. This step is similar to S330 and is not repeated here.
  • S1440: A screen switching operation is detected and a third video frame is output, where, with the target object not having been displaced, the picture displayed in the third video frame has moved relative to the picture displayed in the second video frame.
  • The screen switching operation can be any of several different operations, and the way the picture moves differs accordingly. Four of them are introduced below; of course, the specific implementation process is not limited to the following four situations.
  • The first: please refer to step S1440A. In response to a first screen switching operation, the picture of the video frame is switched to the left, where the picture switching can be realized by adjusting the cropping frame 81.
  • Two such adjustment methods are listed below. Of course, the specific implementation process is not limited to the following two situations.
  • (1) A horizontal line drawn 1/4 of the original video frame's height above the center point of the target object of the previous original video frame serves as the upper cropping edge 81c, and a horizontal line drawn 1/4 of the original video frame's height below that center point serves as the lower cropping edge 81d.
  • The video frame finally output after the picture is switched to the left based on this scheme is, for example, as shown in FIG. 15B; the final output video frame is thus the left part of the original video frame.
  • (2) Move the cropping frame 81 to the left as a whole by a second preset distance, for example 20%, 30%, etc. of the width of the original image frame.
  • During the movement, the cropping frame 81 can float up and down by a first preset distance.
  • The output video frame is thus a picture determined by moving the picture of the second video frame to the left by the second preset distance.
  • The first screen switching operation is, for example, an operation in which a person in the video frame (any person, or only the target object) points an arm to the left, an operation of dragging the video frame to the right on the touch display unit, a voice instruction, and so on.
  • The key point recognition technology can identify the joint points of the persons in the video frame and then determine the coordinates of the elbow joint and wrist joint to judge whether an arm-pointing-left operation exists. For example: if the person faces the camera, the ordinate values of the elbow joint and the wrist joint differ little, and the abscissa of the elbow joint is greater than the abscissa of the wrist joint, it can be determined that an arm-pointing-left operation exists, as in the sketch below.
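  • A minimal Python sketch of this arm-direction test follows; the tolerance for "differ little" in the ordinate values is an assumed parameter, and the test presumes the person faces the camera with the image x axis growing to the viewer's right.

    def arm_points_left(elbow, wrist, frame_h, y_tol=0.05):
        # For a person facing the camera: ordinates of elbow and wrist
        # differ little, and the elbow's abscissa is greater than the
        # wrist's abscissa.
        level = abs(elbow[1] - wrist[1]) < y_tol * frame_h
        return level and elbow[0] > wrist[0]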
  • The second: please refer to step S1440B. In response to a second screen switching operation, the picture of the video frame is switched to the right.
  • The picture switching can likewise be realized by adjusting the cropping frame 81, in ways analogous to those for switching to the left.
  • For example, the cropping frame 81 can be moved to the right as a whole by a third preset distance, for example 20%, 30%, etc. of the width of the original image frame.
  • During the movement, the cropping frame 81 can float up and down by a first preset distance. The finally output video frame is therefore a picture in which the picture of the second video frame has moved to the right by the third preset distance.
  • The third: please refer to step S1440C. In response to a third screen switching operation, the picture of the video frame is switched upward.
  • The third screen switching operation is, for example, an operation of dragging on the display unit from top to bottom, an operation of swinging an arm from top to bottom, a voice instruction, and so on.
  • The implementation of switching the picture upward is similar to that of switching it to the left or right and is not repeated here.
  • The fourth: please refer to step S1440D. In response to a fourth screen switching operation, the picture of the video frame is switched downward.
  • The fourth screen switching operation is, for example, an operation of dragging on the display unit from bottom to top, an operation of swinging an arm from bottom to top, a voice instruction, and so on.
  • The implementation of switching the picture downward is similar to that of switching it to the left or right and is not repeated here.
  • The preset time is, for example, 2 seconds, 3 seconds, etc., which is not limited in this embodiment of the present invention. How to crop the video frame based on the target object has already been introduced and is not repeated here.
  • In the specific implementation process, the above video capture method may further include: in response to a focus concentration operation, concentrating the focus on the person, for example: enlarging the person's proportion in the video frame, blurring the background area, adding special effects to the person, and so on.
  • The operation (a fifth preset operation) is, for example, the operation of double-clicking the area where the person is located in the video frame, the operation of producing a specific gesture, and so on.
  • The method further includes: in response to a user operation, exiting the subject tracking mode and outputting a fourth video frame.
  • The display proportion of the target object in the fourth video frame differs from the display proportion of the target object in the second video frame.
  • Moreover, the relative position of the target object in the fourth video frame differs from the relative position of the target object in the second video frame.
  • For example, the display proportion of the target object in the fourth video frame is smaller than the display proportion of the target object in the second video frame.
  • The fourth video frame is otherwise similar to the second video frame and is not described again here.
  • The method further includes: in response to a zoom-in operation, outputting a fifth video frame, where the display size of the target object in the fifth video frame is larger than the display size of the target object in the second video frame.
  • The zoom-in operation is, for example, a preset gesture (for example: pushing the palm outward, spreading five fingers, etc.), a voice instruction, and so on.
  • Please refer to Figure 16, which is a schematic comparison of the second video frame and the fifth video frame. The display size of the target object can be enlarged gradually based on the zoom-in operation, thereby outputting multiple video frames in which the display size of the target object becomes progressively larger.
  • For example, a fifth video frame (1) is output first,
  • and then a fifth video frame (2) is output, so as to achieve a smooth transition; see the sketch below.
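  • For illustration, the gradual enlargement can be sketched as a sequence of progressively smaller cropping frames around a fixed center; the final scale factor and the number of intermediate steps below are assumptions.

    def zoom_steps(crop, scale=0.7, steps=10):
        # Yields one cropping frame per output frame, shrinking linearly
        # around the fixed center so the target appears progressively
        # larger; `scale` is the final crop size relative to the start.
        left, top, right, bottom = crop
        cx, cy = (left + right) / 2, (top + bottom) / 2
        w, h = right - left, bottom - top
        for i in range(1, steps + 1):
            s = 1 - (1 - scale) * i / steps   # interpolates 1.0 -> scale
            yield (cx - w * s / 2, cy - h * s / 2,
                   cx + w * s / 2, cy + h * s / 2)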
  • In a third aspect, an embodiment of the present invention provides a video capture method. Please refer to FIG. 17, which includes:
  • S1700: Capture video frames; this step is similar to S300 and is not repeated here;
  • S1710: Determine a specific object. The specific object may be a single specific object or at least two specific objects, and the specific object may be a person, an animal, another moving object, and so on.
  • The specific object can be determined in a variety of ways, some of which are introduced below. Of course, the specific implementation process is not limited to the following situations.
  • The first way is to determine the specific object through a selection operation on the specific object in the video frame.
  • FIG. 18A is a schematic diagram of a video frame in a video communication process.
  • The video frame contains five persons, namely Person A 18a, Person B 18b, Person C 18c, Person D 18d, and Person E 18e.
  • The user produces a selection operation (for example: a click, a slide, etc.) on Person B 18b, and the electronic device responds to the selection operation, thereby determining Person B 18b as the specific object.
  • The user can select one specific object or multiple specific objects.
  • For example, the user can select multiple specific objects through multiple selection operations, or select multiple specific objects through one operation.
  • For example, the user can make a two-finger selection in which each finger corresponds to one target, so that two specific objects are selected at the same time, and so on.
  • The second way is to locate the sound source through the microphone of the electronic device and determine the person in the area where the sound source is located as the specific object.
  • This solution can be applied to scenes where many people hold a discussion and scenes where many people sing.
  • Again taking the video frame containing Person A 18a, Person B 18b, Person C 18c, Person D 18d, and Person E 18e as an example, these five people are discussing a problem.
  • At a first moment, Person B 18b speaks, and Person B 18b is determined to be the specific object; when the person 18d speaks at a second moment, the person 18d is determined to be the specific object, and so on. By locating the specific object, the person who is currently speaking can be determined.
  • The third way is to determine all the persons contained in the video frame through person recognition technology and determine the person in the middle position as the specific object. Still taking the video frame shown in FIG. 18A containing Person A 18a, Person B 18b, Person C 18c, Person D 18d, and Person E 18e as an example, the electronic device identifies the locations of the five persons, determines that Person C 18c is located in the middle position, and therefore determines Person C as the specific object.
  • The fourth way is to receive the user's voice instruction and determine the specific object through the voice instruction.
  • For example, the user of the electronic device says "Set the spotlight effect for the person in the middle", and the specific object is determined to be the person in the middle (for example: the person 18c); or the user of the electronic device says "Set the spotlight effect for the tallest person", and the specific object is determined to be the tallest person in the video frame, and so on.
  • The fifth way is to perform gesture recognition on the persons captured in the video frame; a person who uses a preset gesture is determined as the specific object, where the preset gesture is, for example, raising a hand, waving a hand, and so on.
  • In the specific implementation process, after the specific object is determined, the specific object can be directly controlled to enter the spotlight mode; alternatively, after the specific object is determined, a preset operation can be received, and the electronic device responds to the preset operation by controlling the specific object to enter the spotlight mode, in which the specific object is highlighted.
  • The user of the electronic device produces a preset operation, which is used to control the video communication to enter the spotlight mode.
  • The preset operation is, for example, the operation of clicking a preset button representing the spotlight mode, producing a voice instruction for entering the spotlight mode, and so on.
  • The electronic device determines a specific object based on user operations (step S1710).
  • The specific determination methods have been described above and are not repeated here.
  • After the specific object is determined, the user of the electronic device produces the preset operation, and the electronic device responds to the preset operation by controlling the specific object to enter the spotlight mode (step S1720).
  • The spotlight mode refers to the mode for highlighting specific objects.
  • For example: the specific object is displayed in color, content other than the specific object is displayed in black and white, content other than the specific object is blurred, and so on; a sketch of one such effect follows.
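  • As an illustration of one of the highlighting effects above (specific object in color, everything else in black and white), a minimal Python sketch using OpenCV follows; it assumes a segmentation mask of the specific object is available from some person-segmentation model, which this embodiment does not specify.

    import cv2
    import numpy as np

    def spotlight(frame, mask):
        # Keep the specific object in color and render everything else
        # in black and white. `frame` is a BGR image; `mask` is a uint8
        # array with nonzero values inside the specific object.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # back to 3 channels
        keep = (mask > 0)[..., None]                   # broadcastable mask
        return np.where(keep, frame, gray)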
  • The above-mentioned electronic device and the like include hardware structures and/or software modules corresponding to the respective functions.
  • Those skilled in the art should readily realize that the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled professionals may use different methods for each specific application to implement the described functions, but such implementations should not be considered as going beyond the scope of the embodiments of the present invention.
  • Each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • The above-mentioned integrated module can be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present invention is illustrative and is only a logical function division; there may be other division methods in actual implementation. The following is described taking the division of each functional module corresponding to each function as an example:
  • The methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, an electronic device, or another programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).
  • In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method can be implemented in other ways.
  • The device embodiments described above are merely illustrative; for example, the division of units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connection displayed or discussed may be indirect coupling or a communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of image processing, and discloses a video processing method, an electronic device, a computer-readable storage medium, a computer program product, and a chip, so as to solve the technical problem in the prior art that tracking a user during a video call requires a rotatable camera. The method includes: acquiring a first video frame and determining that the first video frame contains at least one target object; determining a cropping frame centered on the at least one target object; and acquiring the content in the cropping frame for display as a second video frame. The method can be used in artificial intelligence devices and is related to technologies such as deep learning.

Description

A video processing method and electronic device
This application claims priority to the Chinese patent application No. 201911315344.0, filed with the China National Intellectual Property Administration on December 19, 2019 and entitled "A Video Processing Method and Electronic Device", and to the Chinese patent application No. 202010753515.4, filed with the China National Intellectual Property Administration on July 30, 2020 and entitled "A Video Processing Method and Electronic Device", both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of computer vision, and in particular to a video processing method for an electronic device and an electronic device.
Background
With the popularization of electronic devices, especially smart mobile electronic devices such as mobile phones, more and more applications are integrated on mobile phones, covering all aspects of people's daily lives. An electronic device usually includes a camera, through which images can be captured and videos can be shot.
In the prior art, if a target object needs to be followed during video capture, the camera needs to be equipped with a gimbal, and the shooting angle of the camera is rotated by the gimbal to realize follow-shooting. This solution has the technical problem of relatively high cost.
Summary
This application provides a video capture method and an electronic device, so as to solve the technical problem in the prior art that follow-shooting with an electronic device is costly.
An embodiment of this application provides a video processing method applied to an electronic device, including: acquiring a first video frame and determining that the first video frame contains at least one target object; determining a cropping frame centered on the determined target object; and taking the content in the cropping frame as a second video frame. The electronic device may display the second video frame without displaying the first video frame, or may display the first video frame and then, in response to the user enabling a preset function, display the second video frame. In this way, without any hardware improvement, the target object can be brought into focus by this video processing method; when the target object moves, it is automatically tracked so that it always stays at the visual center of the video frame. The automatic tracking function can therefore be implemented at low cost, the hardware requirements are reduced, and the difficulty of implementing the function is reduced.
An embodiment of this application further provides an electronic device that performs the methods in the method embodiments of the present invention. Specifically, the electronic device includes one or more processors; one or more memories; multiple application programs; and one or more computer programs, where the one or more computer programs are stored in the one or more memories and include instructions that, when executed by the one or more processors, cause the electronic device to perform the methods in the method embodiments, for example: acquiring a first video frame and determining that the first video frame contains at least one target object; determining a cropping frame centered on the at least one target object; and acquiring the content in the cropping frame for display as a second video frame. The electronic device provided by the present invention can provide an automatic target tracking function for video; in scenarios such as video calls, video shooting, and self-broadcasting, the picture follows the person, and the scene changes as the person moves. Its implementation requires neither hardware facilities such as a gimbal, nor manual operation by a photographer, nor the installation of a dedicated image processing application. Automatic image processing can thus be realized without affecting video smoothness, achieving intelligent recognition of the target object and automatic image processing, improving the picture quality of video communication, and increasing the intelligence of human-computer interaction.
Brief Description of the Drawings
FIG. 1A is a structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 1B is a software framework diagram according to an embodiment of the present invention;
FIG. 2 is a flowchart of the video capture method of the first aspect of an embodiment of the present invention;
FIG. 3 is a schematic diagram of an interface of the system call application according to an embodiment of the present invention;
FIG. 4 is a flowchart of a video control method according to an embodiment of the present invention;
FIG. 5A is a schematic diagram of a first video frame according to an embodiment of the present invention;
FIG. 5B is a schematic diagram of a second video frame according to an embodiment of the present invention;
FIG. 5C is a schematic diagram of the coordinate frame of a single person according to an embodiment of the present invention;
FIG. 5D is a schematic diagram of the coordinate frames of two persons according to an embodiment of the present invention;
FIG. 6 is a flowchart of a first way of determining the cropping frame in an embodiment of the present invention;
FIG. 7 shows the correspondence between the cropping width and δW/Width in an embodiment of the present invention;
FIGS. 8A-8C are schematic diagrams of the first way of determining the cropping frame in an embodiment of the present invention;
FIG. 9 is a schematic diagram of a second way of determining the cropping frame in an embodiment of the present invention;
FIG. 10 is a schematic diagram of detected human joint points in an embodiment of the present invention;
FIG. 11 is a schematic diagram of the correspondence between joint points and cropping positions in an embodiment of the present invention;
FIG. 12 is a schematic diagram of the third video frame output after the target object is adjusted in an embodiment of the present invention;
FIG. 13A is a schematic diagram of an original video frame containing two persons in an embodiment of the present invention;
FIG. 13B is a schematic diagram in which one of the two persons in FIG. 13A is no longer in the original video frame, in an embodiment of the present invention;
FIG. 13C is a schematic diagram in which the target object that left in FIG. 13B returns to the video frame, in an embodiment of the present invention;
FIG. 13D is a schematic diagram of the video frame output based on the original video frame shown in FIG. 13C, in an embodiment of the present invention;
FIG. 13E is a schematic diagram of the video frame output within a preset time period based on the original video frame shown in FIG. 13B, in an embodiment of the present invention;
FIG. 13F is a schematic diagram of the video frame output after the preset time period based on the original video frame shown in FIG. 13B, in an embodiment of the present invention;
FIG. 14 is a flowchart of the video capture method described in the second aspect of an embodiment of the present invention;
FIG. 15A is a schematic diagram of another original video frame described in an embodiment of the present invention;
FIG. 15B is a schematic diagram of the video frame output based on the original video frame of FIG. 15A, in an embodiment of the present invention;
FIG. 15C is a schematic diagram after the picture shown in FIG. 15B is switched to the left, in an embodiment of the present invention;
FIG. 15D is a schematic diagram after the picture shown in FIG. 15B is switched to the left, in an embodiment of the present invention;
FIG. 16 is a schematic diagram of a zoom-in operation on a video frame in an embodiment of the present invention;
FIG. 17 is a flowchart of the video capture method provided by the third aspect of an embodiment of the present invention;
FIG. 18A is a schematic diagram of a captured video frame containing multiple persons in an embodiment of the present invention;
FIG. 18B is a schematic diagram of adding a spotlight effect to a person in FIG. 18A, in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of this application will be described below with reference to the drawings in the embodiments of this application. In the description of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate the three cases of A alone, both A and B, and B alone.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more.
First, the exemplary electronic device 100 provided in the following embodiments of this application is introduced.
FIG. 1A shows a schematic structural diagram of the electronic device 100.
The embodiments are described in detail below taking the electronic device 100 as an example. It should be understood that the electronic device 100 shown in FIG. 1A is only an example; the electronic device 100 may have more or fewer components than shown in FIG. 1A, may combine two or more components, or may have a different component configuration. The various components shown in the figure may be implemented in hardware including one or more signal processing and/or application-specific integrated circuits, in software, or in a combination of hardware and software.
The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. For a detailed description of the structure of the electronic device 100, please refer to the earlier patent application CN201910430270.9.
FIG. 1B is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application. The layered architecture divides the software into several layers, each with a clear role and division of labor; the layers communicate with one another through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer. The application layer may include a series of application packages. For a detailed introduction of the software functions, please refer to the earlier patent application CN201910430270.9.
In a first aspect, an embodiment of the present invention provides a video capture control method applied to an electronic device that does not include a gimbal, so that the camera of the electronic device cannot rotate. Please refer to FIG. 3; the method includes the following steps:
S200: Display a first video frame, where the first video frame contains a first target object and the first target object is located in a non-designated area of the first video frame; when the first video frame is captured, the first target object is located at a first geographic position.
S210: In response to detecting an operation of entering the target tracking mode, display a second video frame, where the second video frame contains the first target object and the first target object is located in a designated area of the second video frame; when the second video frame is captured, the first target object is located at the first geographic position;
S220: Display a third video frame, where the third video frame contains the first target object and the first target object is located in the designated area of the third video frame; when the third video frame is captured, the first target object is located at a second geographic position, and the distance between the second geographic position and the first geographic position is greater than a preset distance.
In the specific implementation process, before the first video frame is displayed based on step S200, an original video frame needs to be captured first, and the output video frame is obtained based on the captured original video frame, for example: the original video frame is output directly, or the output video frame is obtained after various kinds of processing are performed on the original video frame, and so on.
In the specific implementation process, the user can capture original video frames through a video capture operation on the camera APP (application). For example: the user first opens the camera APP and then clicks the video capture button (the operation of clicking the video capture button is the video capture operation); after detecting the operation on the video capture button, the electronic device controls the camera to perform video capture and obtain original video frames.
The electronic device can also capture original video frames during the user's video chat. For example: a user who wants to conduct video communication with a peer user through an instant messaging APP can open the instant messaging software (for example: WeChat, QQ, etc.), enter the contact's chat interface, and click the "video communication" button; after detecting the operation of clicking the "video communication" button, the electronic device enables the video communication function with that contact, turns on the camera, and captures original video frames.
For another example, the user can also conduct a video chat with the peer user through the system's default video chat function. For example, please refer to FIG. 3: the communication functions of the electronic device include a MeeTime call function (the MeeTime call function refers to a video communication function). After the user of the electronic device opens the system call application, the main interface of the system call application is displayed, which contains a phone control 31, a contacts control 32, a favorites control 33, and a MeeTime call control 30. The phone control 31 is used to trigger the electronic device to display recent call records, including all calls and missed calls; the contacts control 32 is used to trigger the electronic device to display all contact information of the call application; the favorites control 33 is used to trigger the electronic device to add some contacts to favorites, and by adding a contact to favorites, quick communication with that contact can be performed, such as quickly sending a text message or quickly making a call; the MeeTime call control 30 is used to trigger the electronic device to enable the video communication function.
In an optional implementation, the original video frames may be video frames captured by the front camera or rear camera of the electronic device; in another optional implementation, the original video frames may also be video frames captured by another video capture device in communication with the electronic device. For example: if the electronic device has a data transmission channel with a home security camera, it can obtain the video frames captured by the security camera; if the electronic device has a data transmission channel with a drone, it can obtain the video frames captured by the drone. The video stream captured by the other video capture device can be registered as a virtual camera of the system, and during a video call the instant messaging software invokes the video frames captured by this virtual camera for the video communication process.
In step S200, after the original video frame is captured, the original video frame can be output directly, in which case the original video frame is the first video frame; alternatively, image processing (for example: beautification, cropping, etc.) can be performed on the original video frame first, and the processed video frame is then output (the processed video frame is the first video frame), and so on.
In step S200, the first physical position can be any position, which can be characterized by parameters such as longitude, latitude, and altitude. The first target object may contain one target object or at least two target objects. The non-designated area refers to the area of the first video frame other than the designated area 43; the designated area 43 is, for example: the central area, the golden-section-point area, or any other area designated by the user of the electronic device. As shown in FIG. 2, the designated area 43 in FIG. 2 is the central area. The target object being located in the designated area 43 means, for example, that the center point of the target object is located in the designated area 43, and the target object being located in the non-designated area means, for example, that the center point of the target object is located in the non-designated area.
In step S210, in the target tracking mode, the output video frame can be obtained by cropping the captured original video frame, for example: the second video frame, the third video frame, etc. are obtained by cropping the original video frame.
As can be seen from FIG. 2, the geographic position of the first target object in the second video frame has not changed and is still the first geographic position, but the first target object is located in the central area of the second video frame (that is, the designated area 43). The original image frame can be cropped with the first target object as the reference, thereby obtaining the second video frame. How exactly the original image frame is cropped with the first target object as the reference will be introduced later.
In the specific implementation process, the display size of the first target object in the second video frame is larger than the display size of the first target object in the first video frame. For example: the display size of the first target object in the first video frame is 0.5 times, 0.6 times, etc. the display size of the first target object in the second video frame. Depending on the width of the first target object in the first video frame, the ratio between the display size of the first target object in the first video frame and its display size in the second video frame also differs.
In step S210, after the target tracking mode is entered, the first target object needs to be determined first, so that the first target object can be tracked in subsequent video frames. The first target object can be determined in a variety of ways; two of them are introduced below, and of course the specific implementation process is not limited to the following two situations.
In the specific implementation process, the target object in the video frame may be a target object determined automatically by the electronic device, or a target object determined based on a user operation. These two situations are introduced separately below; of course, the specific implementation process is not limited to the following two situations.
First, the target object is determined automatically by the electronic device.
For example, after entering the subject tracking mode, the electronic device automatically determines the target object in the video frame based on a preset condition for target objects, where the preset condition is, for example: all persons in the video frame, animals in the video frame, other moving objects in the video frame, and so on. Taking the preset condition being persons in the video frame as an example, all persons contained in the video frame can be identified based on human body detection technology, and then all of them are determined as target objects.
Alternatively, the preset condition is: a person (or animal, or other moving object) in the video frame that satisfies the tracking condition. Satisfying the tracking condition means, for example, being a person whose distance from the left edge of the original video frame (that is, the video frame captured by the camera that has not yet undergone size processing) is greater than a first preset distance and whose distance from the right edge of the original video frame is greater than a second preset distance; the first preset distance and the second preset distance may be the same or different, and are, for example: 150 pixels, 200 pixels, etc., or, for example, 0.1 times, 0.2 times, etc. the image width. Based on this solution, it can be ensured that the tracked target object is a relatively centered person in the camera's field of view. Satisfying the tracking condition is also, for example: having an area in the original video frame larger than a preset area, where the preset area is, for example: 10,000 pixels, 20,000 pixels, and so on. In this case, persons (or animals, or other moving objects) not fully captured in the original video frame, or inconspicuous persons (or animals, or other moving objects) located at the edge of the original video frame, are not tracked target objects, making video tracking more targeted.
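As an illustration only, the tracking condition above can be sketched in Python as follows; the edge margin expressed as a fraction of the frame width and the pixel-area threshold are example values from the text, and the box format is an assumption.

    def is_trackable(box, frame_w, margin=0.1, min_area=10_000):
        # Keep a detected person whose coordinate frame is farther than
        # `margin` * frame width from both side edges and whose area
        # exceeds the preset area.
        x_min, y_min, x_max, y_max = box
        away = x_min > margin * frame_w and frame_w - x_max > margin * frame_w
        big = (x_max - x_min) * (y_max - y_min) > min_area
        return away and big

    # Example usage: targets = [b for b in detections if is_trackable(b, 1920)]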
Second, the target object is determined based on a user operation.
(1) The target object is determined based on a first preset operation by the user of the electronic device. For example: a video frame (which may be an original video frame or a processed video frame) is displayed on the display unit of the electronic device, and the user clicks a person in the video frame with a finger; after the electronic device detects the click operation, that person is the target object. For another example: the user of the electronic device produces the voice instruction "follow the person in yellow in the picture"; the electronic device then recognizes the person wearing yellow in the video frame and determines that person as the target object, and so on.
(2) The target object is determined based on a second preset operation by a person in the original video frame, where the second preset operation is, for example: a tracking gesture, a voice instruction, and so on. For example: after the electronic device captures the original video frame, it identifies each person in the original video frame through human body detection technology, then identifies each person's joint points (for example: head, neck, shoulders, palm, wrist, elbow joint, etc.) through key point recognition technology, and judges, based on the positional relationships of the joint points, whether a tracking gesture exists; the tracking gesture is, for example: raising a hand, making a heart gesture, etc. For instance, whether a hand-raising gesture exists can be determined by whether the palm, wrist, and elbow joint lie roughly on one straight line, whether the ordinate of the palm is higher than the ordinate of the wrist, and whether the ordinate of the wrist is higher than the ordinate of the elbow joint. For another example, if the electronic device detects that a user in the video frame produces the voice instruction "follow me", that user is determined as the target object, and so on.
When the target object is determined based on a user operation, that user operation can at the same time be the operation that triggers entry into the subject tracking mode. For example: after the electronic device enters the video capture mode, it has not entered the subject tracking mode by default and has detected no other operation for entering the subject tracking mode; then, after detecting the user operation for confirming the target object, the electronic device, by responding to that user operation, both enters the subject tracking mode and determines the target object.
Alternatively, when the electronic device detects a user operation for determining the target object, it first judges whether the video capture process has already entered the subject tracking mode. If the subject tracking mode has been entered, the target object is determined based on the user operation, and subject tracking is then performed centered on that target object; if the video capture process has not entered the subject tracking mode, the user operation is not responded to.
The number of target objects determined in the embodiments of the present invention may be one or at least two.
In step S220, since the electronic device has not received an operation of exiting the target tracking mode, the electronic device is still in the target tracking mode. In the target tracking mode, the video frames displayed by the electronic device move with the movement of the first target object, keeping the first target object at the designated position 43 of the video frame. Please continue to refer to FIG. 2; it can be seen from FIG. 2 that in the third video frame the first target object has moved away from the tower and toward the tree, so the picture of the output third video frame differs considerably from that of the second video frame, but the first target object is still located in the designated area 43 of the video frame.
In a second aspect, please refer to FIG. 3; an embodiment of the present invention provides a video capture method, including the following steps:
S300: Capture video frames;
In the specific implementation process, the user can capture video frames through a video capture operation on the camera APP (application). For example: the user first opens the camera APP and then clicks the video capture button (the operation of clicking the video capture button is the video capture operation); after detecting the operation on the video capture button, the electronic device controls the camera to perform video capture and obtain video frames.
The electronic device can also capture video frames during the user's video chat. For example: a user who wants to conduct video communication with a peer user through an instant messaging APP can open the instant messaging software (for example: WeChat, QQ, etc.), enter the contact's chat interface, and click the "video communication" button; after detecting the operation of clicking the "video communication" button, the electronic device enables the video communication function with that contact, turns on the camera, and captures video frames.
For another example, the user can also conduct a video chat with the peer user through the system's default video chat function. For example, please refer to FIG. 2: the communication functions of the electronic device include a MeeTime call function (the MeeTime call function refers to a video communication function). After the user of the electronic device opens the system call application, the main interface of the system call application is displayed, which contains a phone control 31, a contacts control 32, a favorites control 33, and a MeeTime call control 30. The phone control 31 is used to trigger the electronic device to display recent call records, including all calls and missed calls; the contacts control 32 is used to trigger the electronic device to display all contact information of the call application; the favorites control 33 is used to trigger the electronic device to add some contacts to favorites, and by adding a contact to favorites, quick communication with that contact can be performed, such as quickly sending a text message or quickly making a call; the MeeTime call control 30 is used to trigger the electronic device to enable the video communication function.
In this embodiment of the present invention, the video frames may be video frames captured by the front camera or rear camera of the electronic device; they may also be video frames captured by another video capture device in communication with the electronic device. For example: if the electronic device has a data transmission channel with a home security camera, it can obtain the video frames captured by the security camera; if the electronic device has a data transmission channel with a drone, it can obtain the video frames captured by the drone. The video stream captured by the other video capture device can be registered as a virtual camera of the system, and during a video call the instant messaging software invokes the video frames captured by this virtual camera for the video communication process.
S310: Output a first video frame on the display unit of the electronic device;
Please refer to FIG. 5A, which is a schematic diagram of the first video frame output by the electronic device, where 40 is the outer border of the output video frame. If the solution is used in a video capture process, outputting the first video frame means, for example, outputting it on the video preview interface of the display unit for the user to preview; if the solution is applied to a video communication process, outputting the first video frame means, for example, transmitting the first video frame to the peer electronic device, displaying the first video frame on the video communication interface, and so on.
In an optional embodiment, a spotlight mode can be set during the video capture process. In the spotlight mode, a spotlight effect can be set for a specific object, that is, the specific object is highlighted, for example: setting a spotlight effect (highlight) on the specific object, controlling the specific object to be displayed in color, displaying content other than the specific object in black and white, blurring content other than the specific object, adding special effects to the specific object, and so on.
In the specific implementation process, the specific object can be determined in a variety of ways; several of them are introduced below, and of course the specific implementation process is not limited to the following situations.
First, the specific object is determined through a selection operation on the specific object in the video frame. For example, the selection operation is: a click operation, a slide operation, and so on. The user can select one specific object or multiple specific objects, for example: the user can select multiple specific objects through multiple selection operations, or select multiple specific objects through one operation, for example: the user makes a two-finger selection, with each finger corresponding to one target, thereby selecting two specific objects at the same time, and so on.
Second, the sound source is located through the microphone of the electronic device, and the person in the area where the sound source is located is determined as the specific object. For example, this solution can be applied to scenes where many people hold a discussion or many people sing. Taking a video frame containing Person A 40, Person B 41, Person C 42, and Person D 43 as an example, these people are discussing a problem. At a first moment Person B 41 speaks, so Person B 41 is determined as the specific object and a spotlight effect is added to Person B 41; at a second moment Person D 43 speaks, so Person D 43 is determined as the specific object, a spotlight effect is added to Person D 43, and Person B's spotlight effect is cancelled, and so on. By locating the specific object, the person currently speaking can be determined.
Third, all persons contained in the video frame are determined through human body recognition technology, and the person in the middle position is determined as the specific object. Taking a video frame containing Person A, Person B, Person C, Person D, and Person E as an example, after identifying the positions of these five persons, the electronic device determines that Person C is in the middle position, and therefore determines Person C as the specific object.
Fourth, a voice instruction from the user is received, and the specific object is determined through the voice instruction. For example: the user of the electronic device says "Set the spotlight effect for the person in the middle", and the specific object is determined to be the person in the middle (for example: Person C); for another example, the user of the electronic device says "Set the spotlight effect for the tallest person", and the specific object is determined to be the tallest person in the video frame, and so on.
Fifth, gesture recognition is performed on the persons captured in the video frame, and a person using a preset gesture is determined as the specific object, where the preset gesture is, for example: raising a hand, waving a hand, and so on.
In the specific implementation process, the operation of entering the spotlight mode may be detected first and the specific object determined afterwards, or the specific object may be determined first and the operation of entering the spotlight mode detected afterwards, whereupon the spotlight effect is produced for the specific object. This embodiment of the present invention does not list these cases in detail or restrict them.
S320: In the video capture mode, enter the subject tracking mode.
In the specific implementation process, in the subject tracking mode, the picture of the output video frame moves with the movement of the target object, so that the target object is located at the center of the picture of the video frame, or at the golden-section point of the video frame, or at a position designated by the user, and so on. The target object may be a person, an animal, another moving object (for example: a kite, a car, a sweeping robot, etc.), and so on.
After entering the video capture state, the electronic device can enter the subject tracking mode in response to a preset operation, where the preset operation is, for example: the operation of clicking a preset button displayed on the display unit, the operation of selecting a specific person in the video frame, an operation in which a person in the video frame produces a preset gesture, and so on.
In the specific implementation process, the electronic device can also enter the subject tracking mode by default after entering the video capture state.
S330: In the subject tracking mode, output a second video frame. The first video frame and the second video frame both contain the target object; the display proportion of the target object in the second video frame differs from the display proportion of the target object in the first video frame, and the relative position of the target object in the second video frame differs from the relative position of the target object in the first video frame.
In an optional embodiment, the display proportion of the target object in the second video frame is larger than that in the first video frame, for example: the width of the target object in the second video frame accounts for more than 50% of the total picture width while in the first video frame it accounts for 20%; the height of the target object in the second video frame accounts for more than 50% of the total picture height while in the first video frame it accounts for 30%; and so on. Of course, the above width and height proportions are merely examples and are not limiting.
The relative position of the target object in the second video frame differing from that in the first video frame means, for example: a second ratio corresponding to the second video frame differs from a first ratio corresponding to the first video frame, where the second ratio is the distance between the left border 50a of the target object in the second video frame and the left border of the second video frame divided by the width of the second video frame, and the first ratio is the distance between the left border 50a of the target object in the first video frame and the left border of the first video frame divided by the width of the first video frame. For another example, the second ratio is the distance between the right border 50b of the target object in the second video frame and the right border of the second video frame divided by the width of the second video frame, and the first ratio is the distance between the right border 50b of the target object in the first video frame and the right border of the first video frame divided by the width of the first video frame, and so on. As an optional implementation, if the spotlight mode was entered in step S300, the spotlight mode can be maintained when entering S330, still producing the spotlight effect for the specific object determined in S300; as another optional embodiment, after entering the subject tracking mode, the spotlight mode can also be maintained but the specific object having the spotlight effect adjusted, for example: the specific object is adjusted to the target object of the subject tracking mode; as another optional embodiment, the spotlight mode and the subject tracking mode are parallel modes, and upon detecting entry into the subject tracking mode, the spotlight mode is exited.
As an optional embodiment, if the spotlight mode was not entered before S330, the spotlight mode can still be entered after S330 without exiting the subject tracking mode. This embodiment of the present invention does not list these cases in detail or restrict them.
In an optional embodiment, to ensure that no abrupt picture change occurs when switching between the first video frame and the second video frame, smoothing can be performed between the first video frame and the second video frame, for example: multiple transitional video frames exist between the first video frame and the second video frame, for example 10 frames, 20 frames, and so on. In the specific implementation process, the target object may be a person, an animal, or another moving object (for example: a drone, a toy car, a balloon, etc.) in the video frame. Taking FIG. 5A as an example, the target objects contained in the first video frame are, for example, the person 41 and the person 42, and the output second video frame is, for example, as shown in FIG. 5B. It can be seen from FIG. 5A and FIG. 5B that after the subject tracking mode is entered, the display area of the target objects (the person 41 and the person 42) in the video frame is enlarged, and the relative position of the target objects in the video frame changes: in the first video frame the target objects are in the left part of the picture, while in the second video frame they are in the middle part of the picture.
In the specific implementation process, the target object in the video frame may be determined automatically by the electronic device or determined based on a user operation; since this has been introduced above, it is not repeated here.
In the specific implementation process, the output second video frame can be obtained by cropping the original video frame. Taking the target object being a person as an example, the coordinate frame 50 of the human body can first be determined through a human body detection model, and the cropping frame 81 for cropping the video frame is then determined through the coordinate frame 50. The coordinate frame 50 can be represented by the coordinates of every point in the coordinate frame 50, by the coordinates of the upper-left corner plus the lower-right corner, by the coordinates of the lower-left corner plus the upper-right corner, and so on. FIG. 5C is a schematic diagram of the coordinate frame 50 determined when there is one target object; FIG. 5D is a schematic diagram of the coordinate frame 50 determined when the target objects are two persons (multiple persons are similar): the coordinate frame 50 of each person in the original video frame can first be determined based on human body detection technology, and the coordinate frames 50 of the individual persons are then merged to determine the coordinate frame 50 of the target objects. In FIG. 5D, the coordinate frame 50 is characterized by the coordinates of the upper-left corner (Xmin, Ymin) and the lower-right corner (Xmax, Ymax), where Xmin is the minimum value on the X axis, Ymin is the minimum value on the Y axis, Xmax is the maximum value on the X axis, and Ymax is the maximum value on the Y axis, with the upper-left corner of the video frame as the origin.
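For illustration, merging the per-person coordinate frames 50 into the joint coordinate frame 50 of FIG. 5D can be sketched in Python as follows (the box format follows the corner convention described above):

    def merge_boxes(boxes):
        # Union of per-person coordinate frames 50: the merged frame is
        # characterized by (Xmin, Ymin) and (Xmax, Ymax), with the
        # upper-left corner of the video frame as the origin.
        xs_min, ys_min, xs_max, ys_max = zip(*boxes)
        return min(xs_min), min(ys_min), max(xs_max), max(ys_max)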
When determining the coordinate frame 50 of the target object, only the target object itself may be considered, without considering other things attached to it, for example: an umbrella being held, a hat being worn, a bicycle being ridden, etc.; to take the completeness of the picture into account, the other things attached to the target object can also be considered when determining the target object.
In the specific implementation process, the cropping frame 81 can be determined in a variety of ways; several of them are introduced below, and of course the specific implementation process is not limited to the following situations.
First, please refer to FIG. 6; the cropping frame 81 for cropping the original video frame can be determined as follows:
S600: Determine the width δW of the target object through the coordinate frame 50;
For example, please refer to FIG. 5C and FIG. 5D; the width δW of the target object can be obtained by subtracting Xmin from Xmax.
S610: Obtain the width Width of the original image frame;
S620: Determine the cropping width for cropping the original image frame based on the width δW of the target object and the width Width of the original image frame;
In the specific implementation process, the cropping width can be determined through the ratio of the width δW of the target object to the image width Width. For example, please refer to FIG. 7: when δW/Width is less than or equal to a first preset proportion, the cropping width is the first preset proportion multiplied by the width of the original image frame; when δW/Width is greater than the first preset proportion and less than or equal to a second preset proportion, the cropping width is the width δW of the target object; when δW/Width is greater than the second preset proportion, the cropping width is Width. The first preset proportion is, for example, 0.3, 0.5, etc., and the second preset proportion is, for example, 0.6, 0.8, etc. Of course, the first preset proportion can also take other values, which this embodiment of the present invention does not list in detail or restrict.
S630: Determine the left cropping edge 81a and the right cropping edge 81b through the cropping width.
For example, assuming the first preset proportion is 0.5 and the second preset proportion is 0.8: when δW/Width is less than or equal to 0.5, the cropping width is 0.5 times the width of the original image frame; when δW/Width is greater than 0.5 and less than or equal to 0.8, the cropping width is δW; when δW/Width is greater than 0.8, the cropping width is the width of the original image frame.
For example, suppose the original image frame is as shown in FIG. 8A, where 80 denotes the outer frame of the original video frame and 50 denotes the coordinate frame of the target object; the coordinate frame 50 includes a left border 50a, a right border 50b, an upper border 50c, and a lower border 50d. Here δW/Width is less than 0.5, so the cropping width is determined to be 0.5 times the width Width of the original video frame. The center point 82 of the target object can be determined through the coordinate frame 50, and the coordinates of the center point 82 can be calculated by the following formulas:
X_center = (Xmax + Xmin) / 2
Y_center = (Ymax + Ymin) / 2
where X_center is the coordinate of the center point in the X-axis direction and Y_center is the coordinate of the center point in the Y-axis direction; only X_center may be determined, or both X_center and Y_center may be determined.
After the center point is determined, a straight line perpendicular to the X-axis direction drawn at a first preset width W1 to the left of the center point determines the left cropping edge 81a, and a straight line perpendicular to the X-axis direction drawn at a second preset width W2 to the right of the center point determines the right cropping edge 81b. The sum of the first preset width W1 and the second preset width W2 is the cropping width, for example: W1 plus W2 equals 0.5 times Width. The first preset width W1 and the second preset width W2 may be equal, in which case each is 1/2 of the cropping width, for example 1/4 times Width; W1 and W2 may also be unequal, which this embodiment of the present invention does not restrict.
Please refer to FIG. 8B; suppose that in FIG. 8B δW/Width = 0.6, so the cropping width equals δW. In this case, the left border 50a of the coordinate frame 50 can be used as the left cropping edge 81a, and the right border 50b of the coordinate frame 50 as the right cropping edge 81b.
Please refer to FIG. 8C; suppose that in FIG. 8C δW/Width = 0.85, so the cropping width is Width. In this case, the left border of the original video frame is the left cropping edge 81a, and the right border of the original video frame serves as the right cropping edge 81b.
S640: Determine the upper cropping edge 81c and the lower cropping edge 81d based on the vertical coordinates of the target object in the coordinate frame 50.
For example, the upper border 50c can be moved upward by a first preset height H1 to serve as the upper cropping edge 81c (in the specific implementation process, the upper border 50c can also be used directly as the upper cropping edge 81c); the lower cropping edge 81d is obtained by extending downward from the upper cropping edge 81c by a second preset height H2. The first preset height H1 is, for example, 0.05 times or 0.01 times (or of course another value) the height of the original image frame, and the second preset height H2 is, for example, 0.5 times or 0.6 times (or of course another value) the height of the original image frame (see FIGS. 8A-8C).
In the specific implementation process, the lower border 50d can also be used directly as the lower cropping edge 81d, or the lower border 50d can be extended downward by a certain distance to serve as the lower cropping edge 81d.
In the specific implementation process, the upper cropping edge 81c can also be determined by extending a preset height upward from the center point 82, and the lower cropping edge 81d by extending a preset height downward from the center point 82 (determined similarly to the left cropping edge 81a and the right cropping edge 81b, not repeated here).
In the specific implementation process, the cropping height can also be determined based on the proportion of the cropping width to the original video frame, and the upper cropping edge 81c and lower cropping edge 81d determined based on the cropping height, in a way similar to determining the left cropping edge 81a and right cropping edge 81b based on the cropping width, which is not repeated here. This solution ensures that the original video frame is cropped proportionally, so that when the original video frame matches the aspect ratio of the video frame display area, the cropping frame 81 does not need to be further adjusted to match the aspect ratio of the display area.
S650: Determine the cropping frame 81 based on the upper cropping edge 81c, the lower cropping edge 81d, the left cropping edge 81a, and the right cropping edge 81b. As can be seen from FIGS. 8A-8C, the size of the finally determined cropping frame 81 differs with the width δW of the target object, so the proportion of the output video frame's picture to the original video frame's picture may also differ; before the subject tracking mode is entered, the proportion of the first video frame to the original video frame is usually fixed, for example: the first video frame is 100%, 90%, etc. of the original image frame. Hence, depending on the width δW of the target object, the proportion of the second video frame's picture to the first video frame's picture also differs.
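As an illustration of steps S600-S650, a minimal Python sketch follows. The first and second preset proportions and the first and second preset heights use example values from the text; the clamping to the frame bounds is an added assumption.

    def crop_box(box, frame_w, frame_h, r1=0.5, r2=0.8, h1=0.05, h2=0.5):
        x_min, y_min, x_max, y_max = box
        dw = x_max - x_min                         # target width δW (S600)
        if dw / frame_w <= r1:                     # choose cropping width (S620)
            crop_w = r1 * frame_w
        elif dw / frame_w <= r2:
            crop_w = dw
        else:
            crop_w = frame_w
        cx = (x_min + x_max) / 2                   # center point 82
        left = max(0.0, cx - crop_w / 2)           # left cropping edge 81a (S630)
        right = min(frame_w, cx + crop_w / 2)      # right cropping edge 81b
        top = max(0.0, y_min - h1 * frame_h)       # upper cropping edge 81c (S640)
        bottom = min(frame_h, top + h2 * frame_h)  # lower cropping edge 81d
        return left, top, right, bottom            # cropping frame 81 (S650)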
If the target object is one person, δW may differ depending on the target object's distance from the camera (electronic device), so the proportion of the second video frame's picture within the first video frame also differs; if the target objects are multiple persons, δW may differ depending on the targets' distances from the camera and the distance between the persons, so the proportion of the second video frame's picture within the first video frame also differs.
Optionally, to ensure a smooth transition between image frames, the center point 82 of the target object can be determined jointly from preset numbers of frames before and after the current frame. For example: determine the center point of each frame among the current frame, the preceding preset frames (for example: 10 frames, 15 frames, etc.), and the following preset frames (for example: 15 frames, 20 frames, etc.), and then average the center points of all frames to obtain the center point 82 of the current frame.
Second, please refer to FIG. 9. After the coordinate frame 50 of the target object is determined, the left border 50a is moved to the left by a third preset width W3 to obtain the left cropping edge 81a (in the specific implementation process, the left border 50a can also be used directly as the left cropping edge 81a); the right border 50b is moved to the right by a fourth preset width W4 to obtain the right cropping edge 81b (the right border 50b can also be used directly as the right cropping edge 81b); the upper border 50c is moved upward by a third preset height H3 to serve as the upper cropping edge 81c (the upper border 50c can also be used as the upper cropping edge 81c); and the lower border 50d serves as the lower cropping edge 81d (the lower border 50d can also be moved downward by a fourth preset height H4 to obtain the lower cropping edge 81d). The cropping frame 81 for cropping the original video frame is determined through the upper cropping edge 81c, the lower cropping edge 81d, the left cropping edge 81a, and the right cropping edge 81b.
The aforementioned third preset width W3, fourth preset width W4, third preset height H3, and fourth preset height H4 may all be the same, partly the same, or all different; each is, for example: 100 pixels, 200 pixels, 300 pixels, etc., or, for example, 0.1 times, 0.2 times, etc. the width of the original video frame, or, for example, 0.05 times, 0.15 times, etc. the height of the original video frame.
Through the above solution, it can be ensured that during video capture the target object is tracked with the target object as the center.
Third, the lower cropping edge 81d can be determined as follows: determine the preset joint point closest to the lower border 50d, and use the cropping position corresponding to that preset joint point as the lower cropping edge 81d. The preset joint points are, for example: the ankle joint, knee joint, hip joint, etc. The joint points of a person can be determined by a key point detection algorithm; the key point recognition technology is, for example: the Pictorial Structure algorithm, a top-down key point detection algorithm, a bottom-up human key point detection algorithm, and so on. The determined joint points are shown in FIG. 10.
In the specific implementation process, the cropping position is usually the preset joint point moved upward by a preset distance, where the preset distance is, for example: (1) a fixed value, for example: 30 pixels, 40 pixels, etc.; (2) a specific proportion of the total height of the human body, for example: 1/20, 1/30, etc. of the total height of the human body; (3) a specific proportion of the total height of the video frame, for example: 1/40, 1/50, etc.; (4) a specific proportion of the distance between the current joint point and the previous specific joint point, for example: 1/3, 1/4, etc. As shown in FIG. 10, if the current joint point is the ankle joint, the previous specific joint point is, for example, the knee joint; if the current joint point is the knee joint, the previous specific joint point is, for example, the hip joint; if the current joint point is the hip joint, the previous specific joint point is, for example, the elbow joint, and so on.
The other cropping edges can be determined with reference to the first and second ways, which is not repeated here.
Fourth, the lower cropping edge 81d can be determined as follows: the cropping frame 81 is determined based on the user's historical operation records for videos (and/or images). For example: based on the user's historical capture records or historical cropping records for videos (and/or images), the cropping frame the user most prefers is determined, for example: the distances between the target object and each border of the video frame within the cropping frame.
Optionally, after the cropping frame 81 is determined, the cropping frame 81 can also be adjusted. In the specific implementation process, a variety of adjustment methods can be used; several of them are introduced below, and of course the specific implementation process is not limited to the following situations.
First, determine the preset joint point closest to the lower cropping edge 81d and determine the cropping position corresponding to that preset joint point; move the lower cropping edge 81d to that cropping position. The preset joint points are, for example: the ankle joint, knee joint, hip joint, etc. The joint points of a person can be determined by a key point detection algorithm, for example: the Pictorial Structure algorithm, a top-down key point detection algorithm, a bottom-up human key point detection algorithm, and so on. The determined joint points are shown in FIG. 9. (This mainly applies to cropping frames 81 determined by the first and second ways above.)
In one implementation, the cropping position is usually the preset joint point moved upward by a preset distance, where the preset distance is, for example: (1) a fixed value, for example: 30 pixels, 40 pixels, etc.; (2) a specific proportion of the total height of the human body, for example: 1/20, 1/30, etc.; (3) a specific proportion of the total height of the video frame, for example: 1/40, 1/50, etc.; (4) a specific proportion of the distance between the current joint point and the previous specific joint point, for example: 1/3, 1/4, etc. As shown in FIG. 11, if the current joint point is the ankle joint, the previous specific joint point is, for example, the knee joint; if the current joint point is the knee joint, the previous specific joint point is, for example, the hip joint; if the current joint point is the hip joint, the previous specific joint point is, for example, the elbow joint, and so on.
Second, an image cut-off position is determined based on the user's historical operation data for videos (and/or images), and the lower cropping edge 81d is adjusted based on the image cut-off position. The historical operation data may include historical video (image) capture data, historical video (image) operation data, and so on.
For example, the image can be divided in advance into multiple human-body cut-off areas based on the joint points of the human body; the duration for which each image cut-off area appears in the video frames historically captured by the user is determined, and the user's favorite image cut-off area (the one appearing for the longest duration) is determined based on those durations; the lower cropping edge 81d is adjusted based on that image cut-off area.
In one embodiment, the captured video can also be split into multiple frames of images, which are combined with other images captured on the electronic device; the image cut-off area of each image is determined, the most frequently occurring image cut-off area is determined, and the lower cropping edge 81d is adjusted based on that image cut-off area.
In one embodiment, videos cropped by the user can also be split into multiple frames of images, which are combined with other images cropped by the user on the electronic device; the image cut-off area of each image is determined, the most frequently occurring image cut-off area is determined, and the lower cropping edge 81d is adjusted based on that image cut-off area.
When adjusting the lower cropping edge 81d based on the image cut-off area, a variety of methods can be used; two of them are introduced below, and of course the specific implementation process is not limited to the following two situations.
(1) The correspondence between image cut-off areas and the lower cropping edge 81d can be preset, for example as shown in Table 1. (Table 1 is provided as an image in the original document: Figure PCTCN2020137550-appb-000001.)
After the image cut-off area is determined, the corresponding lower cropping edge 81d is determined through this correspondence, and the lower cropping edge 81d of the already-determined cropping frame 81 is adjusted based on the determined lower cropping edge 81d.
(2) After the image cut-off area is determined, it can be judged whether the lower cropping edge 81d is located within the image cut-off area. If it is, the lower cropping edge does not need to be adjusted; if it is not, the lower cropping edge can be adjusted into the image cut-off area.
In the specific implementation process, before the lower cropping edge 81d is adjusted based on the image cut-off area, the number of target objects in the video frame can be determined first. The lower cropping edge 81d is adjusted through the image cut-off area only when the number of target objects is not greater than a preset threshold (for example, 1 or 2); if the number of target objects is greater than the preset threshold, the lower cropping edge 81d does not need to be adjusted through the image cut-off area. Based on this solution, it can be prevented that, when there are too many targets, determining the lower cropping edge 81d in this way crops away too much content from some of the target objects.
Optionally, before adjusting the lower cropping edge 81d based on the image cut-off area, the amount of motion of the current video frame relative to the previous frame can also be judged first. When the amount of motion is less than a preset amount of motion, the lower cropping edge 81d is adjusted through the image cut-off area; if the amount of motion is not less than the preset amount of motion, the lower cropping edge 81d is not adjusted through the image cut-off area. The preset amount of motion is, for example: the abscissa motion amount is less than a preset proportion of the video frame width (for example: 0.02, 0.025, etc.) and the ordinate motion amount is less than a preset proportion of the video frame height (for example: 0.025, 0.03, etc.). Based on this solution, it can be prevented that the target object in the video frame moves sharply and adjusting the lower cropping edge 81d based on the image cut-off area makes the video transition unsmooth.
Third, after the cropping frame 81 is determined, it can also be checked whether the aspect ratio of the cropping frame 81 meets a preset ratio (for example: 16:9, 4:3, etc.). If it does not meet the preset ratio, the cropping frame 81 can be adjusted to meet the preset ratio, for example: if the aspect ratio is less than the preset ratio, the width can be increased so that the aspect ratio meets the preset ratio; if the aspect ratio is greater than the preset ratio, the height can be increased so that the aspect ratio meets the preset ratio. Of course, the cropping frame 81 can also be adjusted in other ways so that the aspect ratio meets the preset ratio, which this embodiment of the present invention does not list in detail or restrict. If the solution is applied to a video communication process, the electronic device can obtain the aspect ratio of the display screen (or video display area) of the peer electronic device and determine the preset ratio based on that aspect ratio.
Fourth, when the video frame contains multiple target objects, if the lower cropping edge 81d is determined by extending downward from the upper cropping edge 81c by the second preset height H2, the method further includes: determining the center line of each target object, and then judging whether the lower cropping edge 81d is located below the center lines of all target objects; if it is not, moving the lower cropping edge 81d downward until it is located below the center lines of all target objects. The center line is, for example: a line parallel to the X axis through the midpoint of each target object's vertical-axis coordinates, a line parallel to the X axis through each target object's hip joint, and so on. Please continue to refer to FIG. 5D, which contains two target objects, namely the target object 60 and the target object 61; the center line of the target object 60 is 60a and the center line of the target object 61 is 61a, so the determined lower cropping edge 81d should be located below the center line 61a of the target object 61.
In the specific implementation process, when the lower cropping edge is adjusted based on the center line of each target object, the method further includes: determining the relative distance between the first preset key point of the first target object and the first preset key point of the second target object; judging whether the relative distance is greater than a preset threshold; and, if the relative distance is greater than the preset threshold, determining the first center line and the second center line.
For example, the first preset key point is, for example: the head, neck, etc. of the target object; the first preset key point of the first target object is then the head of the first target object, and the first preset key point of the second target object is the neck of the second target object. The preset threshold is, for example: (1) a fixed value, for example: 30 pixels, 40 pixels, 50 pixels, etc.; (2) a preset proportion of the total height of the first target object or the second target object, for example: 1/4, 1/5, etc.; (3) a preset proportion of the pre-cropping frame, for example: 1/4, 1/6, etc.
Fifth, judge whether the center line of the first pre-cropping frame is located below the center line of the first target object. If the center line of the first pre-cropping frame is not located below the center line of the first target object, the upper cropping edge of the first pre-cropping frame is moved upward by a second preset distance and the lower cropping edge of the first pre-cropping frame is moved downward by a third preset distance to obtain the first cropping frame. The final output video frame is the content in the first cropping frame.
In the specific implementation process, the second preset distance and the third preset distance may be the same or different, and each is, for example: (1) a fixed value, for example: 30 pixels, 40 pixels, etc.; (2) a specific proportion of the total height of the human body, for example: 1/20, 1/30, etc. of the total height of the human body; (3) a specific proportion of the total height of the video frame, for example: 1/40, 1/50, etc.; (4) a specific proportion of the first pre-cropping frame, for example: 1/3, 1/4, etc. In the process of tracking the target object based on the above subject tracking mode, the tracked target object can also be re-selected or switched; several switching methods are introduced below.
(1)电子设备的触控显示单元接收到针对第一对象的点击操作,则控制第一对象作为目标对象,取消其他目标对象。以原始视频帧为图8A所示为例,之前的目标对象为人物41、人物42,在检测到针对人物41的点击操作之后,将人物41依然作为目标对象,将人物42取消作为目标对象。在这种情况下,电子设备输出第三视频帧,在第二视频帧中的目标对象未发生位移的情况下,第三视频帧中第一对象的位置与第二视频帧中第二对象的位置不同。如图12所示,第二视频帧与第三视频帧中,目标对象41并未发生位移,但是其在视频帧的画面中的相对位置发生变化。
电子设备检测到在触控显示单元上点击人物41所在区域的操作时,则以人物41作为目标对象对原始视频帧进行裁剪,在这种情况下,即使原始视频帧中包含人物42,在裁剪时也不考虑42,在这种情况下,则输出第三视频帧,第三视频帧中第一对象的位置与第二视频帧中第二对象的位置不同,虽然第三视频帧与第二视频帧的原始视频帧的内容相同。
(2)在视频帧中第一对象产生跟踪操作;则判断第一对象是否为基于用户操作所确定的目标对象;如果第一对象为基于用户操作所确定的目标对象,则恢复默认主体跟踪模式;如果第一对象并非基于用户操作所确定的目标对象,则将第一对象作为目标对象,且取消其他目标对象。恢复默认主体跟踪模式例如为:恢复以电子设备自动确定的目标主体进行跟踪的主体跟踪模式。
示例来说,用户操作例如为:电子设备的用户的点击操作、语音指令等等,又或者,视频帧中的用户的预设手势、语音指令等等。基于该用户操作,能够将第一对象确定为目标对象,且取消其他目标对象。(如以上第(1)种情况所示)
接着前述第(1)种情况,目前人物41基于用户的点击操作确定为目标对象,而人物42被取消目标对象,在这种情况下,如果再次检测到人物41产生的跟踪 操作(例如:举手、比心)等等,则恢复默认主体跟踪模式,将人物41、人物42都确定为目标对象。
另外,在检测到第一对象产生跟踪操作之后,如果再次检测到第一对象产生跟踪操作的间隔时长大于预设时长(例如1秒、2秒等等),则认定该跟踪操作为有效操作,电子设备可以响应该跟踪操作,否则,认定该跟踪操作为无效操作,电子设备不响应给跟踪操作。
例如:在初始阶段,人物41并非由用户操作确定的目标对象,电子设备检测到视频帧中人物41的跟踪操作(例如:举手、比心等等),则将人物41确定为目标对象,此时及随后的原始视频帧以人物41为参照(例如:中心、黄金分割点等)进行裁剪,从而输出的视频帧为以人物41为参照输出的视频帧;在再次检测到人物41的跟踪操作之后,发现本次操作距离上次操作的时间只间隔了0.3秒,则认定本次操作为无效操作,依然以人物41为中心对原始视频帧进行裁剪;随后再次检测到人物41的跟踪操作,发现本次操作距离上次操作的时间间隔了7秒,则认定本次操作为有效操作,将人物41、人物42共同作为目标对象,恢复默认的主体跟踪模式。
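The validity test on repeated tracking operations is a debounce. A sketch of our own (the class name is hypothetical, and we assume the interval is measured from the last operation that was responded to, which matches the example above):

```python
import time
from typing import Optional

class TrackingGestureDebouncer:
    """Accept a repeated tracking gesture only when the preset duration
    (for example, 1 or 2 seconds) has elapsed since the last accepted one."""

    def __init__(self, min_interval_s: float = 1.0):
        self.min_interval_s = min_interval_s
        self._last_accepted: Optional[float] = None

    def accept(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if (self._last_accepted is not None
                and now - self._last_accepted <= self.min_interval_s):
            return False          # too soon: deem the gesture invalid
        self._last_accepted = now
        return True
```

In the example above, the gesture arriving 0.3 seconds after the previous one returns False, while the one arriving 7 seconds later returns True and restores the default subject tracking mode.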
(3) When the first object is the target object determined by a user operation, if a tap operation on the first object is detected on the touch display unit of the electronic device, the first object is kept as the target object (the other target objects remain canceled).
(4) When the first object is the target object determined by a user operation, if a tracking operation by the first object is detected in the original video frame, the first object is canceled as the target object.
Optionally, if the operation that made the first object the target object was itself a tracking operation, then when a tracking operation is detected again, it is first judged whether that tracking operation is valid. If it is valid, the default subject tracking mode is restored; otherwise, the first object is kept as the target object.
Taking person 42 as an example: the electronic device detects an operation of tapping person 42 on the touch display unit and sets person 42 as the target object; later, the electronic device detects a tracking operation by person 42 in the original video frame, in which case the default subject tracking mode is restored and person 41 and person 42 both become target objects. If instead, at the initial stage, the electronic device detects a tracking operation by person 42 in the original video frame, it sets person 42 as the target object and cancels person 41 as a target object; when the electronic device subsequently detects another tracking operation by person 42 in the original video frame, it first judges whether that tracking operation is valid: if valid, the default subject tracking mode is restored; if not, person 42 is kept as the target object.
(5) When the first object is the target object determined based on a user operation, if an operation for making a second object the target object is detected, the target object is switched from the first object to the second object. The operation for making the second object the target object is, for example, tapping the region where the second object is located on the touch display unit, or a tracking operation performed by the second object in the video frame.
For example, when person 42 (the first object) is the target object and person 41 is not, if the electronic device detects an operation of tapping the region where person 41 (the second object) is located on the touch display unit, it sets person 41 as the target object and cancels person 42 as a target object. When the original video frames are subsequently cropped, the cropping is referenced to person 41 (for example, at the center or at a golden-ratio point) without regard to where person 42 is, so the output video frames are framed with the second object (person 41) at the center or at a golden-ratio point. As another example, when person 42 is the target object determined by a user operation, if the electronic device detects a tracking operation by person 41 in the original video frame (for example, a hand-raising motion), it cancels person 42 as the target object and sets person 41 as the target object.
(6) After an operation on a first region of the touch display unit is detected, if the first region is determined to be a blank region, the default subject tracking mode is restored. A blank region is a region containing no target object, or a region containing neither a target object nor any other moving object.
Switching manners (1)-(6) above address scenarios of reselecting or switching a single target object. Multiple target objects may also be selected or switched; several manners are introduced below, and of course, specific implementation is not limited to these.
(1) Receive a region selection operation on the touch display unit; in response, determine the selected region and set the objects located in the selected region as target objects. The region selection operation is, for example, drawing a closed region (such as a circle or a box), with the objects inside the closed region becoming target objects; or a line-drawing operation, with the objects along the drawn path becoming target objects.
(2) Detect the tracking operations performed by the people in the original video frame; if, among multiple people, the interval between the tracking operations of any two consecutive people is smaller than a preset interval (for example, 2 or 3 seconds), all of these people are determined as target objects.
For example: the original video frame contains five people, person A, person B, person C, person D, and person E. A hand-raising operation by person A is detected first, so person A becomes a target object; 1 second later, a hand-raising operation by person C is detected, so person A and person C jointly become target objects; another second later, a hand-raising operation by person D is detected, so person A, person C, and person D jointly become target objects.
(3) Respond to a voice instruction from the user and set multiple people as target objects based on the voice instruction.
For example, the original video frame contains five people, from left to right person A, person B, person C, person D, and person E. The user of the electronic device issues the voice instruction "track the first, third, and fourth persons from left to right". In response, the electronic device first recognizes the five people in the original video frame, then determines that the first person from the left is person A, the third is person C, and the fourth is person D, and accordingly sets person A, person C, and person D as target objects.
Referring again to FIG. 2, the method further includes the following steps:
S340: Continue capturing original video frames, and judge whether there is a first target object that is present in the previous original video frame but absent from the current original video frame.
For example, the people (target objects) contained in each frame can be recognized through human-body recognition; then, by comparing the people contained in the current original video frame and the previous original video frame, a person captured in the previous original video frame but not captured in the current one is determined as the first target object.
For example: suppose the previous original image frame is as shown in FIG. 13A, which contains two target objects, target object 41 and target object 42, with target object 42 located at the edge of the frame. The current original image frame is as shown in FIG. 13B, where person 42 is no longer in the frame, so person 42 can be determined as the first target object.
S350: If the first target object exists, continuously detect, within a first preset duration, whether the first target object reappears in the original video frames.
The first preset duration may be expressed in time, for example 2 or 3 seconds, or in a number of video frames, for example 20, 30, or 40 frames. After detecting the first target object, the electronic device can start timing.
S360: If the first target object reappears in the original video frames, crop the original video frames centered on the first target object and the other target objects, and output video frames referenced to the first target object and the other target objects.
Referring to FIG. 13C, suppose that after 1 second the first target object 42 is again detected in the video frame; cropping then continues centered on the first target object 42 and the other target objects (for example, target object 41), and the output video frame is as shown in FIG. 13D.
S370: If the first target object does not reappear in the original video frames, crop the original video frames, within the first preset duration, with reference to the original position of the first target object and the other target objects, thereby outputting video frames determined based on the original position of the first target object and the positions of the other target objects.
For example, referring to FIG. 13B, the original position of the first target object is, for example, position 42a, where the first target object 42 last appeared in an original video frame. In this case, the original video frames are cropped with reference to person 41 and the original position 42a of the first target object 42, and the output video frame is, for example, as shown in FIG. 13E.
S380: After the first preset duration, the original position of the first target object is no longer considered when cropping the original video frames; only the target objects remaining in the original video frames are considered, and the output video frames are determined based on those remaining target objects, for example as shown in FIG. 13F.
With this solution, when output video frames are derived from target objects during video capture, the smoothness of the output footage is preserved even when a target object briefly leaves the camera's field of view and then reappears.
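Steps S340-S380 amount to a small state machine that keeps a vanished target's last position alive for a grace period. Below is a sketch under assumed data structures (target ids mapped to center positions; all names are ours, not the application's):

```python
class DisappearedTargetTracker:
    """Keep cropping against a vanished target's last known position for a
    grace period (counted here in frames), then drop it (cf. S340-S380)."""

    def __init__(self, grace_frames: int = 30):
        self.grace_frames = grace_frames
        self.missing = {}   # target id -> (frame index when lost, last position)

    def mark_missing(self, tid, frame_idx, last_pos):
        """Record a target that was in the previous frame but not this one."""
        self.missing.setdefault(tid, (frame_idx, last_pos))

    def anchors(self, frame_idx, detected):
        """detected maps target id -> center position in the current frame.
        Returns every position the cropper should still account for."""
        out = dict(detected)
        for tid in list(self.missing):
            if tid in detected:               # reappeared: track it normally
                del self.missing[tid]
                continue
            since, last_pos = self.missing[tid]
            if frame_idx - since <= self.grace_frames:
                out[tid] = last_pos           # within grace period: keep anchor
            else:
                del self.missing[tid]         # grace period over: forget it
        return out
```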
In a second aspect, an embodiment of the present invention provides a video capture method; referring to FIG. 14, it includes the following steps:
S1400: Capture video frames; this capture is similar to step S300 and is not repeated here.
S1410: Output a first video frame on the display unit of the electronic device; this step is similar to step S310 and is not repeated here.
S1420: In the video capture mode, enter the subject tracking mode. This step is similar to S320 and is not repeated here.
S1430: In the subject tracking mode, output a second video frame. The first video frame and the second video frame both contain the target object; the display scale of the target object in the second video frame differs from its display scale in the first video frame, and the relative position of the target object in the second video frame differs from its relative position in the first video frame. This step is similar to S330 and is not repeated here.
S1440: Upon detecting a view switching operation, output a third video frame, in which, when the target object has not moved, the displayed view is shifted relative to the view displayed in the second video frame.
The view switching operation may take many different forms, and the view accordingly moves in different ways. Four forms are introduced below; of course, specific implementation is not limited to these four.
First: referring to step S1440A, in response to a first view switching operation, switch the view of the video frame to the left. The switch can be achieved by adjusting the cropping frame 81; two adjustment manners are listed below, and of course, specific implementation is not limited to these two.
① Use the left border 50a of the video frame as the left cropping edge 81a; take a preset proportion of Xmax (for example, 0.1, 0.3, or 0.5) as the right cropping point, and draw a line perpendicular to the X axis through that point to obtain the right cropping edge 81b; use the upper cropping edge 81c and lower cropping edge of the second video frame as the upper cropping edge 81c and lower cropping edge 81d of the current video frame (or float them up or down by a first preset distance, for example 20 or 50 pixels). Alternatively, draw a line parallel to the X axis one quarter of the original frame height above the center point of the target object in the previous original video frame as the upper cropping edge 81c, and a parallel line one quarter of the frame height below the center point of the previous original video frame as the lower cropping edge 81d. Suppose the original video frame is as shown in FIG. 15A; after the view is switched to the left with this solution, the finally output video frame is, for example, as shown in FIG. 15B. The finally output video frame is thus the left part of the original video frame.
② Shift the entire cropping frame 81 left by a second preset distance, for example 20% or 30% of the original image frame width; optionally, the cropping frame 81 may also be floated up or down by the first preset distance. The output video frame is then the view of the second video frame moved left by the second preset distance.
The first view switching operation is, for example: a person in the video frame (any person, or only a target object) pointing an arm to the left, dragging the video frame to the right on the touch display unit, issuing a voice instruction, and so on. The joint points of a person in the video frame can be recognized through keypoint recognition, and the coordinates of the elbow and wrist are then used to judge whether an arm-pointing-left operation exists: for example, if the person faces the camera, the vertical coordinates of the elbow and the wrist differ little, and the horizontal coordinate of the elbow is greater than that of the wrist, an arm-pointing-left operation can be deemed to exist.
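The elbow/wrist test just described can be sketched directly (an illustration of ours; the tolerance value is an assumption, and image x grows to the right):

```python
def arm_points_left(elbow: tuple, wrist: tuple, max_dy: float = 30.0) -> bool:
    """Detect a 'point left' gesture for a person facing the camera:
    elbow and wrist are roughly level, and the wrist lies further left
    (smaller x) than the elbow."""
    roughly_level = abs(elbow[1] - wrist[1]) <= max_dy
    return roughly_level and elbow[0] > wrist[0]
```

A mirrored comparison (wrist x greater than elbow x) would detect the arm pointing right for the second switching operation below.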
Second: referring to step S1440B, in response to a second view switching operation, switch the view of the video frame to the right.
The switch can be achieved by adjusting the cropping frame 81; two adjustment manners are listed below, and of course, specific implementation is not limited to these two.
① Use the right border 50b of the video frame as the right cropping edge 81b; take Xmax minus a preset proportion of Xmax (for example, 0.1, 0.3, or 0.5) as the left cropping point, and draw a line perpendicular to the X axis through that point to obtain the left cropping edge 81a; use the upper cropping edge 81c and lower cropping edge of the second video frame as the upper cropping edge 81c and lower cropping edge 81d of the current video frame (or float them up or down by the first preset distance, for example 20 or 50 pixels). Alternatively, draw a line parallel to the X axis one quarter of the original frame height above the center point of the target object in the previous original video frame as the upper cropping edge 81c, and a parallel line one quarter of the frame height below the center point of the previous original video frame as the lower cropping edge 81d. The finally output video frame is thus the right part of the original image frame.
Suppose the original video frame is as shown in FIG. 15A; after the view is switched to the right with this solution, the finally output video frame is, for example, as shown in FIG. 15C.
② Shift the entire cropping frame 81 right by a third preset distance, for example 20% or 30% of the original image frame width; optionally, the cropping frame 81 may also be floated up or down by the first preset distance. The finally output video frame is then the view of the second video frame moved right by the third preset distance.
Third: referring to step S1440C, in response to a third view switching operation, switch the view of the video frame upward. The third view switching operation is, for example, a top-to-bottom drag on the display unit, a top-to-bottom arm swing, a voice instruction, and so on. Upward switching is implemented similarly to leftward and rightward switching and is not repeated here.
Fourth: referring to step S1440D, in response to a fourth view switching operation, switch the view of the video frame downward. The fourth view switching operation is, for example, a bottom-to-top drag on the display unit, a bottom-to-top arm swing, a voice instruction, and so on. Downward switching is implemented similarly to leftward and rightward switching and is not repeated here.
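The second adjustment manner in each direction is the same shift-and-clamp operation on the crop box. A sketch of our own, using (left, top, right, bottom) pixel coordinates:

```python
def pan_crop_box(box: tuple, direction: str,
                 frame_w: int, frame_h: int,
                 step_ratio: float = 0.2) -> tuple:
    """Shift the crop box by a fraction of the frame size in the requested
    direction, clamped so it never leaves the original frame."""
    l, t, r, b = box
    dx, dy = int(frame_w * step_ratio), int(frame_h * step_ratio)
    if direction == "left":
        shift = max(-l, -dx)            # don't cross the left border
        l, r = l + shift, r + shift
    elif direction == "right":
        shift = min(frame_w - r, dx)    # don't cross the right border
        l, r = l + shift, r + shift
    elif direction == "up":
        shift = max(-t, -dy)
        t, b = t + shift, b + shift
    elif direction == "down":
        shift = min(frame_h - b, dy)
        t, b = t + shift, b + shift
    else:
        raise ValueError(f"unknown direction: {direction}")
    return l, t, r, b
```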
S1450: After a preset time, restore the subject tracking mode and output a fourth video frame. When the displacement of the target object is smaller than a first preset displacement, the offset between the relative position of the target object in the fourth video frame and its relative position in the second video frame is smaller than a second preset displacement. That is, if the target object has not moved, or has moved only slightly, between the output of the second video frame and the fourth video frame, the difference between the second and fourth video frames is also small.
The preset time is, for example, 2 or 3 seconds, which this embodiment of the present invention does not limit. How video frames are cropped based on the target object has been introduced above and is not repeated here.
In specific implementation, the above video capture method may further include the following step: in response to a focus-concentration operation, concentrate the focus on the person concerned, for example by enlarging the person's proportion in the video frame, blurring the background region, or adding special effects to the person. The focus-concentration operation is, for example, double-tapping the region of the video frame where the person is located, making a specific gesture, and so on.
After the second video frame is output based on S1430, the method further includes: in response to a user operation, exiting the subject tracking mode and outputting a fourth video frame. The display scale of the target object in the fourth video frame differs from its display scale in the second video frame, and the relative position of the target object in the fourth video frame differs from its relative position in the second video frame; for example, the display scale of the target object in the fourth video frame is smaller than in the second video frame. When the target object has not moved, the fourth video frame is otherwise similar to the second video frame and is not described further here.
After the second video frame is output based on S1430, the method further includes: in response to a zoom-in operation, outputting a fifth video frame in which the display size of the target object is larger than its display size in the second video frame.
For example, the zoom-in operation is a preset gesture (for example, pushing a palm outward or spreading five fingers), a voice instruction, and so on. FIG. 16 is a schematic comparison of the second video frame and the fifth video frame. Based on the zoom-in operation, the display size of the target object can be enlarged gradually, so that multiple video frames with an increasingly large target object are output: as shown in FIG. 16, after the second video frame is output, a fifth video frame (1) is output and then a fifth video frame (2), achieving a smooth transition.
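The gradual enlargement in FIG. 16 can be realized by shrinking the crop box around its center over several output frames (a smaller crop shows a larger subject). A sketch with assumed names:

```python
def zoom_steps(box: tuple, scale: float = 1.5, n_steps: int = 10) -> list:
    """Interpolate the crop box from its current size down to 1/scale of
    it, centered, yielding one box per output frame for a smooth zoom."""
    l, t, r, b = box
    cx, cy = (l + r) / 2, (t + b) / 2
    boxes = []
    for i in range(1, n_steps + 1):
        s = 1.0 + (scale - 1.0) * i / n_steps   # zoom factor 1.0 -> scale
        half_w, half_h = (r - l) / (2 * s), (b - t) / (2 * s)
        boxes.append((cx - half_w, cy - half_h, cx + half_w, cy + half_h))
    return boxes
```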
A third embodiment of the present invention provides a video capture method; referring to FIG. 17, it includes:
S1700: Capture video frames; this step is similar to S300 and is not repeated here.
S1710: Determine a specific object in the video frame.
In specific implementation, there may be a single specific object or at least two specific objects, and a specific object may be a person, an animal, another moving object, and so on. The specific object can be determined in multiple ways; several are introduced below, and of course, specific implementation is not limited to these.
First: determine the specific object through a selection operation on that object in the video frame.
For example, FIG. 18A is a schematic diagram of a video frame in a video communication process. The video frame contains five people: person A 18a, person B 18b, person C 18c, person D 18d, and person E 18e. The user of the electronic device wants to set person B 18b as the specific object, so the user performs a selection operation (for example, a tap or a swipe) on person B 18b in the video frame; the electronic device responds to the selection operation and determines person B 18b as the specific object.
The user may select one specific object or multiple specific objects. For example, the user may select multiple specific objects through multiple selection operations, or through a single operation: for instance, a two-finger selection, in which each finger corresponds to one target, selects two specific objects at once.
Second: locate the sound source through the microphone of the electronic device, and determine the person in the region of the sound source as the specific object.
For example, this solution can be applied to scenarios of multi-person discussion or multi-person singing. Again taking a video frame containing person A 18a, person B 18b, person C 18c, person D 18d, and person E 18e as an example, with the five discussing a question: at a first moment person B 18b speaks, so person B 18b is determined as the specific object; at a second moment person D 18d speaks, so person D 18d is determined as the specific object; and so on. By locating the specific object, the person currently speaking can be determined.
Third: determine all the people contained in the video frame through person recognition, and determine the person in the middle position as the specific object. Again taking the video frame of FIG. 18A containing person A 18a, person B 18b, person C 18c, person D 18d, and person E 18e as an example, after recognizing the positions of the five people, the electronic device determines that person C 18c is in the middle position and therefore determines person C as the specific object.
Fourth: receive a voice instruction from the user and determine the specific object through the voice instruction. For example, if the user of the electronic device says "set the spotlight effect for the person in the middle", the specific object is determined to be the person in the middle (for example, person 18c); as another example, if the user says "set the spotlight effect for the tallest person", the specific object is determined to be the tallest person in the video frame, and so on.
Fifth: perform gesture recognition on the people captured in the video frame and determine a person using a preset gesture as the specific object; the preset gesture is, for example, raising a hand or waving.
S1720: Control the specific object to enter the spotlight mode.
In specific implementation, the specific object may be controlled to enter the spotlight mode directly after it is determined; alternatively, after the specific object is determined, a preset operation is received, and the electronic device responds to the preset operation by controlling the specific object to enter the spotlight mode, in which the specific object is highlighted.
Steps S1710 and S1720 above can be implemented in multiple ways; two are introduced below, and of course, specific implementation is not limited to these two.
First: the user of the electronic device performs a preset operation for controlling the video communication to enter the spotlight mode, for example tapping a preset button representing the spotlight mode, issuing a voice instruction to enter the spotlight mode, or making a preset gesture. After detecting the preset operation, the electronic device executes step S1710, prompting the user to select a specific object or determining a specific object automatically; after the specific object is determined, the electronic device automatically executes step S1720, that is, the electronic device controls the specific object to enter the spotlight mode.
Second: the electronic device determines the specific object based on a user operation (step S1710), in the specific manners introduced above and not repeated here. After the specific object is determined, the user of the electronic device performs a preset operation, and the electronic device responds to the preset operation by controlling the specific object to enter the spotlight mode (step S1720).
In specific implementation, the spotlight mode is a mode that highlights the specific object, and the highlighting can be done in multiple ways, for example, referring to FIG. 18B: setting a spotlight effect (highlight) on the specific object, displaying the specific object in color, displaying the content other than the specific object in black and white, blurring the content other than the specific object, and so on.
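As an illustration of these highlighting styles, here is a sketch of our own that dims and desaturates everything outside a subject mask (obtaining the subject mask, for instance through segmentation, is assumed; the function name is hypothetical):

```python
import numpy as np

def spotlight(frame: np.ndarray, subject_mask: np.ndarray,
              dim: float = 0.35, desaturate: bool = True) -> np.ndarray:
    """Keep the subject pixels (mask nonzero) untouched; convert the
    background to grayscale (optional) and darken it."""
    out = frame.astype(np.float32)
    bg = subject_mask == 0
    if desaturate:
        gray = out.mean(axis=2, keepdims=True)     # cheap luma approximation
        out[bg] = np.broadcast_to(gray, out.shape)[bg]
    out[bg] *= dim                                 # darken the background
    return out.astype(np.uint8)
```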
For other content, refer to the relevant descriptions above; details are not repeated.
It can be understood that, to implement the above functions, the above electronic device and the like include corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should easily realize that, with reference to the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of the present invention.
In the embodiments of this application, the above electronic device and the like may be divided into functional modules based on the above method examples. For example, each functional module may be obtained through division corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present invention is schematic and is merely a logical functional division; in actual implementation, other division manners are possible. The following description takes division of functional modules corresponding to each function as an example:
The methods provided in the embodiments of this application may be implemented wholly or partly through software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, an electronic device, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, via coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, via infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to the computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative; for example, the division of units is merely a logical functional division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The foregoing is merely specific implementations of this application, but the protection scope of the embodiments of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the embodiments of this application shall fall within the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. A video processing method, applied to an electronic device, comprising:
    obtaining a first video frame;
    determining that the first video frame contains a first target object;
    determining a first cropping frame centered on the first target object, wherein a first preset distance exists between a lower cropping edge of the first cropping frame and a preset joint point of the first target object;
    obtaining the content in the first cropping frame and outputting it as a second video frame.
  2. The method according to claim 1, wherein the determining a first cropping frame further comprises:
    determining a first pre-cropping frame according to a preset rule;
    judging whether the first preset distance exists between the lower cropping edge of the first pre-cropping frame and the preset joint point;
    if the first preset distance does not exist between the lower cropping edge of the first pre-cropping frame and the preset joint point, moving the lower cropping edge of the first pre-cropping frame up to the cropping position corresponding to the preset joint point, to obtain the first cropping frame.
  3. The method according to claim 1, wherein the determining a first cropping frame centered on the first target object further comprises:
    determining a first pre-cropping frame according to a preset rule;
    judging whether a center line of the first pre-cropping frame lies below a center line of the first target object;
    if the center line of the first pre-cropping frame does not lie below the center line of the first target object, moving the upper cropping edge of the first pre-cropping frame upward by a second preset distance and moving the lower cropping edge of the first pre-cropping frame downward by a third preset distance, to obtain the first cropping frame.
  4. The method according to claim 1, wherein after the obtaining the content in the first cropping frame and outputting it as a second video frame, the method further comprises:
    capturing a third video frame;
    when content matching a preset gesture is detected in the third video frame, determining a direction corresponding to the preset gesture;
    determining a second cropping frame based on the direction of the preset gesture, wherein the second cropping frame is moved, relative to the first cropping frame, toward the direction corresponding to the preset gesture;
    obtaining the content in the second cropping frame and outputting it as a fourth video frame.
  5. The method according to claim 4, wherein the determining a second cropping frame based on the direction of the preset gesture comprises:
    if the direction of the preset gesture is a direction in which an arm of a person in the third video frame points left, the second cropping frame is a cropping frame moved left relative to the first cropping frame; or,
    if the direction of the preset gesture is a direction in which an arm of a person in the third video frame points right, the second cropping frame is a cropping frame moved right relative to the first cropping frame; or,
    if the direction of the preset gesture is a direction in which an arm of a person in the third video frame swings from top to bottom, the second cropping frame is a cropping frame moved up relative to the first cropping frame; or,
    if the direction of the preset gesture is a direction in which an arm of a person in the third video frame swings from bottom to top, the second cropping frame is a cropping frame moved down relative to the first cropping frame.
  6. The method according to claim 1, wherein after the obtaining the content in the first cropping frame and displaying it as the second video frame, the method further comprises:
    capturing a fifth video frame, the fifth video frame containing at least two target objects including the first target object and a second target object;
    determining a third cropping frame centered on the at least two target objects, wherein a lower cropping edge of the third cropping frame lies below a horizontal center line of the first target object and below a horizontal center line of the second target object;
    obtaining the content in the third cropping frame and outputting it as a sixth video frame.
  7. The method according to claim 6, wherein the determining a third cropping frame comprises:
    determining a third pre-cropping frame according to a preset rule;
    determining a first center line of the first target object in the horizontal direction and a second center line of the second target object in the horizontal direction;
    judging whether the lower cropping edge of the third pre-cropping frame lies below the first center line and below the second center line;
    if the lower cropping edge of the third pre-cropping frame does not lie below the first center line, or does not lie below the second center line, moving the lower cropping edge of the third pre-cropping frame downward until it lies below the first center line and the second center line, to obtain the third cropping frame.
  8. The method according to claim 7, wherein the determining a first center line of the first target object in the horizontal direction and a second center line of the second target object in the horizontal direction comprises:
    determining a relative distance between a first preset keypoint of the first target object and the first preset keypoint of the second target object;
    judging whether the relative distance is greater than a preset threshold;
    if the relative distance is greater than the preset threshold, determining the first center line and the second center line.
  9. The method according to claim 1, wherein after the obtaining the content in the first cropping frame and displaying it as the second video frame, the method further comprises:
    capturing a seventh video frame, the seventh video frame containing at least two target objects including the first target object and a fourth target object;
    judging whether there is a focusing operation focusing on the fourth target object;
    if the focusing operation exists, determining a fourth cropping frame centered on the fourth target object;
    obtaining the content in the fourth cropping frame and outputting it as an eighth video frame.
  10. The method according to any one of claims 1-9, wherein before the determining a first cropping frame centered on the first target object, the method further comprises:
    capturing a ninth video frame, and outputting the ninth video frame uncropped;
    detecting a first operation, and entering a target tracking mode in response to the first operation;
    wherein the determining a first cropping frame centered on the first target object comprises: after entering the target tracking mode, determining the first cropping frame centered on the first target object.
  11. The method according to claim 10, wherein after the detecting a first operation and entering a target tracking mode in response to the first operation, the method further comprises:
    detecting a second operation in the target tracking mode;
    in response to the second operation, applying a spotlight effect to the target object in the video frames output by the electronic device.
  12. The method according to any one of claims 1-9, wherein the method is applied during a video call, and the obtaining the content in the first cropping frame and outputting it as a second video frame comprises:
    transmitting the second video frame to a peer electronic device; and/or displaying the second video frame on a video communication interface.
  13. An electronic device, comprising:
    one or more processors;
    one or more memories;
    a plurality of application programs;
    and one or more computer programs, wherein the one or more computer programs are stored in the one or more memories and comprise instructions which, when executed by the one or more processors of the electronic device, cause the electronic device to perform the method according to any one of claims 1-12.
  14. A computer-readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1-12.
  15. A computer program product, comprising software code for performing the method according to any one of claims 1-12.
  16. A chip comprising instructions which, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1-12.
PCT/CN2020/137550 2019-12-19 2020-12-18 Video processing method and electronic device WO2021121374A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20901983.5A EP4060979A4 (en) 2019-12-19 2020-12-18 VIDEO PROCESSING METHOD AND ELECTRONIC DEVICE
US17/843,242 US20220321788A1 (en) 2019-12-19 2022-06-17 Video Processing Method and Electronic Device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911315344 2019-12-19
CN201911315344.0 2019-12-19
CN202010753515.4 2020-07-30
CN202010753515.4A CN113014793A (zh) 2019-12-19 2020-07-30 Video processing method and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/843,242 Continuation US20220321788A1 (en) 2019-12-19 2022-06-17 Video Processing Method and Electronic Device

Publications (1)

Publication Number Publication Date
WO2021121374A1 true WO2021121374A1 (zh) 2021-06-24

Family

ID=76383442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137550 WO2021121374A1 (zh) 2019-12-19 2020-12-18 Video processing method and electronic device

Country Status (4)

Country Link
US (1) US20220321788A1 (zh)
EP (1) EP4060979A4 (zh)
CN (1) CN113014793A (zh)
WO (1) WO2021121374A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10244175B2 (en) * 2015-03-09 2019-03-26 Apple Inc. Automatic cropping of video content
CN115633255B (zh) * 2021-08-31 2024-03-22 荣耀终端有限公司 视频处理方法和电子设备
CN114339031A (zh) * 2021-12-06 2022-04-12 深圳市金九天视实业有限公司 画面调节方法、装置、设备以及存储介质
WO2023225910A1 (zh) * 2022-05-25 2023-11-30 北京小米移动软件有限公司 视频显示方法及装置、终端设备及计算机存储介质
CN116033260A (zh) * 2022-08-25 2023-04-28 维沃移动通信有限公司 拍摄方法、装置、电子设备及存储介质
CN117714833A (zh) * 2023-05-19 2024-03-15 荣耀终端有限公司 图像处理方法、装置、芯片、电子设备及介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8085302B2 (en) * 2005-11-21 2011-12-27 Microsoft Corporation Combined digital and mechanical tracking of a person or object using a single video camera
JP4947018B2 (ja) * 2008-09-12 2012-06-06 大日本印刷株式会社 顔画像の自動トリミング装置
CN108334099B (zh) * 2018-01-26 2021-11-19 上海深视信息科技有限公司 一种高效的无人机人体跟踪方法
CN109905593B (zh) * 2018-11-06 2021-10-15 华为技术有限公司 一种图像处理方法和装置
CN110189378B (zh) * 2019-05-23 2022-03-04 北京奇艺世纪科技有限公司 一种视频处理方法、装置及电子设备
CN110427833A (zh) * 2019-07-10 2019-11-08 广州市讯码通讯科技有限公司 一种手势跟踪方法、系统和存储介质
CN110533700B (zh) * 2019-08-30 2023-08-29 腾讯科技(深圳)有限公司 对象跟踪方法和装置、存储介质及电子装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110091108A1 (en) * 2009-10-19 2011-04-21 Canon Kabushiki Kaisha Object recognition apparatus and object recognition method
CN103576848A (zh) * 2012-08-09 2014-02-12 腾讯科技(深圳)有限公司 Gesture operation method and gesture operation apparatus
CN106034216A (zh) * 2015-03-10 2016-10-19 北京同步科技有限公司 Camera image positioning system based on image recognition and method thereof
CN105979383A (zh) * 2016-06-03 2016-09-28 北京小米移动软件有限公司 Image acquisition method and apparatus
CN108366303A (zh) * 2018-01-25 2018-08-03 努比亚技术有限公司 Video playback method, mobile terminal, and computer-readable storage medium
CN110347877A (zh) * 2019-06-27 2019-10-18 北京奇艺世纪科技有限公司 Video processing method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4060979A4

Also Published As

Publication number Publication date
EP4060979A4 (en) 2022-12-28
CN113014793A (zh) 2021-06-22
EP4060979A1 (en) 2022-09-21
US20220321788A1 (en) 2022-10-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20901983

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020901983

Country of ref document: EP

Effective date: 20220614

NENP Non-entry into the national phase

Ref country code: DE