WO2021036318A1 - Video image processing method and apparatus - Google Patents

Video image processing method and apparatus

Info

Publication number
WO2021036318A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image
frame
cropping
electronic device
video
Prior art date
Application number
PCT/CN2020/087634
Other languages
English (en)
French (fr)
Inventor
武勇
赵厚强
宋巍
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021036318A1
Priority to US17/680,889 (US20220270343A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/24 Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • This application relates to the field of image processing, and in particular to a method and device for processing video images.
  • Typical scenarios include video display during a video call and video display in a surveillance scene.
  • the conventional video capture and display process is to capture the video image by the capture device, perform corresponding cropping and zooming on the captured video image according to the display specifications, and then encode and send it to the display device for display.
  • the acquisition and display are usually realized based on a fixed hardware platform, and the video image of a fixed field of view is collected by the acquisition camera.
  • The picture on the display side always keeps a fixed field of view and does not achieve the effect of "the picture following the person", so the user experience is poor.
  • the industry applies human perception technology to the image acquisition and display process.
  • The specific solution is as follows: the camera captures high-resolution video with a fixed field of view, and human perception technology detects and tracks persons in the captured video images to locate each person's position in real time.
  • The high-resolution video image can then be cropped and zoomed according to the person's position located in real time (the position after the person moves) to obtain a low-resolution image that fits the display specification and in which the person is located in a specific area of the image. The displayed picture is thus adjusted in real time according to the person's position, achieving the effect of "the picture following the person's movement".
  • The above method may produce false detections and missed detections, so the person's position is located inaccurately in some frames. The person then cannot be displayed, or cannot be displayed completely, in the low-resolution image obtained after cropping and zooming, and the picture of the main person is not continuous.
  • The present application provides a video image processing method and device that keep the displayed picture continuous while it follows the movement of persons in a video call.
  • A video image processing method may include: acquiring the identity information and position information of each person in the i-th frame of video image, where i is greater than 1; determining M main persons from the i-th frame of video image according to the identity information of the persons in the N video image frames before the i-th frame of video image, where M and N are greater than or equal to 1; cropping the i-th frame of video image according to the position information of the main persons, so that the cropped i-th frame of video image includes the M main persons; and reducing or enlarging the cropped i-th frame of video image so that the display screen can display it according to a preset display specification.
  • With this video image processing method, the main persons of a video image are determined by combining the person identity information of the current frame with the person identity information of the N video image frames before the current frame. This greatly improves the accuracy of person perception and correspondingly improves the accuracy of the determined main-person positions. The main persons can therefore be displayed completely in the low-resolution image obtained by cropping and scaling, which keeps the displayed picture of the main persons continuous and lets the picture follow the persons' movement continuously, implemented in software.
  • The identity information of a person uniquely indicates the same person in different frames. The identity information may be identifying information of the person obtained by the detection and tracking algorithm; that is, each person has distinct feature information.
  • the i-th video image is any video image in the video stream, and i is less than or equal to the total number of frames in the video stream.
  • The video image processing method provided in this application is executed for each frame of image in the video stream to ensure that each frame can be displayed completely after cropping; this is not repeated for each frame.
  • the N video image frames before the i-th video image may be the first N video image frames that are continuous with the i-th video image in the video stream, or may also be the i-th video image in the video stream.
  • the identity information of the characters in the N video image frames includes the identity information of the M main characters, that is, the M main characters have appeared in the first N video image frames. Specifically, whether a person appears in the video image is identified by the person's identity information.
  • Determining the M main persons from the i-th frame of video image may include: determining the M main persons from the i-th frame of video image according to the identity information of each person in the i-th frame of video image and the identity information of the persons in the N video image frames before the i-th frame of video image.
  • a person who has appeared in the first N video image frames and appears in the i-th video image frame and meets a preset condition may be determined as the main person.
  • the preset conditions can be configured according to actual conditions, which are not limited in this application.
  • Determining the M main persons can be specifically implemented as: determining, as the M main persons, the persons who appear in the i-th frame of video image and whose number of appearance frames in the N video image frames is greater than or equal to a first preset threshold.
  • the main character is determined by the number of accumulated frames, which avoids the interference of people who are not participating in the video call entering and exiting the screen on the character recognition, and improves the accuracy of the character recognition.
  • The process of determining whether a person in the i-th frame of video image is a main person may include: counting the cumulative number of frames in which the person appears in the N video image frames; if the cumulative number of appearance frames is greater than or equal to the first preset threshold, the person is determined to be a main person.
  • Whether the person appears in a video image frame can be specifically implemented as: whether the video image frame contains a person with the same identity information as the person.
  • the cumulative number of appearance frames of a person is the number of consecutive video image frames in which the person appears in the N video image frames before the i-th video image; the consecutive video image frames may include S frames of video in which the person does not appear Image frame; S is greater than or equal to 0 and less than or equal to the preset number of frames.
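The frame-count rule described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the representation of each previous frame as a set of identity IDs, and the reading of S as the total number of missing frames tolerated inside the run are all assumptions made for the example.

```python
def cumulative_appearance(history, person_id, max_gap):
    """Length of the most recent run of frames in which person_id appears.

    history: list of sets of identity IDs, one set per previous frame, oldest first.
    max_gap: assumed reading of S, the total number of missing frames tolerated in the run.
    """
    count = 0    # frames credited to the run so far (tolerated gaps included)
    missed = 0   # missing frames seen while walking back in time
    pending = 0  # missing frames not yet bracketed by an older appearance
    for ids in reversed(history):
        if person_id in ids:
            count += pending + 1
            pending = 0
        else:
            missed += 1
            if missed > max_gap:
                break
            pending += 1
    return count


def select_main_persons(current_ids, history, threshold, max_gap):
    """Persons in the current frame whose cumulative appearance reaches the threshold."""
    return sorted(
        pid for pid in current_ids
        if cumulative_appearance(history, pid, max_gap) >= threshold
    )


# Hypothetical history of N = 5 previous frames, oldest first.
history = [{1, 2}, {1}, {1, 2}, {1}, {1, 2}]
print(select_main_persons({1, 2, 3}, history, threshold=3, max_gap=1))  # [1, 2]
```

Person 3 has no history and is excluded, which mirrors the stated goal of ignoring people who only briefly enter the picture.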
  • The video image processing method provided by this application may further include: dividing the i-th frame of video image into Y regions, and configuring a preset threshold for each region. The preset threshold corresponding to the k-th region is the k-th preset threshold, where the k-th region is any one of the Y regions, Y is greater than or equal to 2, and k is greater than or equal to 1 and less than or equal to Y.
  • the preset thresholds corresponding to different regions may be different.
  • In this case, determining the M main persons from the i-th frame of video image is specifically implemented as follows: the persons who appear in the i-th frame of video image and whose number of appearance frames is greater than or equal to the preset threshold corresponding to their region are determined to be the M main persons.
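A per-region threshold check might look like the sketch below. The text does not say how the Y regions are laid out, so the example assumes equal-width vertical strips and assigns a person to a region by the horizontal center of their position; the function names and the sample thresholds are hypothetical.

```python
def region_index(center_x, frame_width, num_regions):
    """Index (0-based) of the vertical strip containing center_x.

    Assumption: the frame is split into num_regions equal-width vertical strips.
    """
    idx = int(center_x * num_regions / frame_width)
    return min(max(idx, 0), num_regions - 1)  # clamp points on the frame border


def select_by_region(persons, frame_width, thresholds):
    """persons: dict mapping person ID -> (center_x, cumulative_appearance_frames).

    thresholds[k] is the preset threshold of region k; Y = len(thresholds).
    """
    main = []
    for pid, (center_x, frames) in persons.items():
        k = region_index(center_x, frame_width, len(thresholds))
        if frames >= thresholds[k]:
            main.append(pid)
    return sorted(main)


# Hypothetical setup: Y = 3 strips, with stricter thresholds at the edges.
persons = {1: (960, 35), 2: (100, 35), 3: (1900, 60)}
print(select_by_region(persons, frame_width=1920, thresholds=[50, 30, 50]))  # [1, 3]
```

Here persons 1 and 2 have the same frame count, but only person 1 qualifies because the center strip uses a lower threshold.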
  • The above method further includes: obtaining person information of each person in the i-th frame of video image. The person information may include one or more of the following: information on whether the person is speaking, and priority information.
  • Determining the M main persons from the i-th frame of video image can be specifically implemented as follows: the persons who appear in the i-th frame of video image and whose number of speaking frames in the N video image frames is greater than or equal to a second preset threshold are determined to be the M main persons.
  • Alternatively, the persons who appear in the i-th frame of video image and whose priority information in the N video image frames is greater than a third preset threshold are determined to be the M main persons.
  • Alternatively, from the persons who appear in the i-th frame of video image and whose number of speaking frames in the N video image frames is greater than or equal to the second preset threshold, the M main persons are selected according to the priority information.
  • the speaking information is used to indicate whether the person in the video image is speaking or not speaking.
  • the audio processing technology can be combined with the mouth shape of the character in the video image to obtain the information whether the character is speaking or not, or the information whether the character speaks or not can be obtained directly through the mouth shape of the character in the video image.
  • the priority information is used to indicate the importance of the person in the video image, and the priority information of different people using the device can be pre-configured to correspond to the person's identity information. Then, when processing each frame of video image, when the person's identity information is obtained, the pre-configured priority information is searched to obtain the person's priority information. Or, the priority information input by the user for different characters in the video image can be received.
  • The video image processing method provided in this application may further include: receiving priority information input by a user. This lets the user configure person priorities in real time and improves the accuracy of person recognition.
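The variant that combines speaking frames with priority can be sketched as below. The text only says the main persons are "selected according to the priority information", so keeping the `max_main` highest-priority speakers is an assumption, as are the `PersonStats` type and its field names.

```python
from dataclasses import dataclass


@dataclass
class PersonStats:
    pid: int              # identity information
    speaking_frames: int  # frames among the previous N in which the person spoke
    priority: int         # larger value = more important (assumed convention)


def select_by_speaking_and_priority(current, speaking_threshold, max_main):
    """Filter by the speaking-frame threshold, then keep the highest priorities.

    current: PersonStats for every person present in the i-th frame.
    """
    candidates = [p for p in current if p.speaking_frames >= speaking_threshold]
    # Sort by descending priority; ties are broken by ID to keep the result stable.
    candidates.sort(key=lambda p: (-p.priority, p.pid))
    return [p.pid for p in candidates[:max_main]]


current = [
    PersonStats(pid=1, speaking_frames=40, priority=1),
    PersonStats(pid=2, speaking_frames=5, priority=9),
    PersonStats(pid=3, speaking_frames=25, priority=5),
]
print(select_by_speaking_and_priority(current, speaking_threshold=20, max_main=1))  # [3]
```

Person 2 has the highest priority but is filtered out first because they rarely speak.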
  • Cropping the i-th frame of video image can be specifically implemented as follows: determining a cropping frame that contains the smallest circumscribed rectangular frame of the M main persons, and cropping the i-th frame of video image with the determined cropping frame.
  • the cropping frame can be the smallest circumscribed rectangular frame of M main characters plus a cropping margin, and the cropping margin can be greater than or equal to zero.
  • That the cropping frame contains the smallest circumscribed rectangular frame of the M main persons can be understood as: the determined cropping frame contains that smallest circumscribed rectangular frame as completely as possible.
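As a rough illustration of the cropping frame described above, the sketch below takes the union of the main persons' bounding boxes, adds a cropping margin, and clamps the result to the image so that the frame holds the bounding rectangle "as completely as possible". The `(left, top, right, bottom)` pixel-box format and the function name are assumptions for the example.

```python
def crop_frame(person_boxes, margin, frame_w, frame_h):
    """Smallest rectangle enclosing all boxes, grown by margin, clamped to the image.

    person_boxes: list of (left, top, right, bottom) pixel boxes of the M main persons.
    margin: cropping margin in pixels, greater than or equal to zero.
    """
    left = min(b[0] for b in person_boxes) - margin
    top = min(b[1] for b in person_boxes) - margin
    right = max(b[2] for b in person_boxes) + margin
    bottom = max(b[3] for b in person_boxes) + margin
    # Clamp to the image so the cropping frame never leaves the captured picture.
    return (max(left, 0), max(top, 0), min(right, frame_w), min(bottom, frame_h))


boxes = [(100, 200, 300, 600), (500, 150, 700, 620)]
print(crop_frame(boxes, margin=20, frame_w=1920, frame_h=1080))  # (80, 130, 720, 640)
```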
  • Determining the cropping frame can be specifically implemented as follows: obtaining the distance between the center point of a to-be-selected cropping frame and the center point of the cropping frame of the previous frame of video image, where the to-be-selected cropping frame contains the smallest circumscribed rectangular frame of the M main persons; if the distance is greater than or equal to a distance threshold, expanding the to-be-selected cropping frame until the distance between its center point and the center point of the cropping frame of the previous frame of video image is less than the distance threshold, and using the expanded to-be-selected cropping frame as the determined cropping frame.
  • the cropping frame to be selected can be the smallest circumscribed rectangular frame of the M main characters plus a cropping margin, and the cropping margin can be greater than or equal to zero.
  • Determining the cropping frame may be specifically implemented as: obtaining the distance between the center point of a first to-be-selected cropping frame and the center point of the cropping frame of the previous frame of video image, where the first to-be-selected cropping frame contains the smallest circumscribed rectangular frame of the M main persons; if the distance is greater than or equal to the distance threshold, determining a second cropping frame whose center point is the center point of the cropping frame of the previous frame of video image plus an offset, and whose size is the same as the size of the cropping frame of the previous frame of video image.
  • If the second cropping frame contains the smallest circumscribed rectangular frame of the M main persons, a third cropping frame is used as the cropping frame, where the third cropping frame is the second cropping frame, or is the cropping frame obtained by reducing the second cropping frame to the smallest circumscribed rectangular frame. If the second cropping frame does not completely contain the smallest circumscribed rectangular frame, the second cropping frame is expanded to contain it, and the expanded second cropping frame is used as the cropping frame.
  • the offset may be a preset value, or it may be the distance between the center point of the first to-be-selected cropping frame and the center point of the cropping frame of the previous frame of video image multiplied by a weighted value, or others.
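The second-cropping-frame procedure can be sketched as follows, using the option in which the offset is the center-to-center displacement multiplied by a weight. Several points are assumptions for illustration only: boxes are `(left, top, right, bottom)` tuples, the to-be-selected frame is used unchanged when the distance is below the threshold (the passage does not state this case), the second frame itself is taken as the third frame, and "expanding" is done by taking the union with the smallest circumscribed rectangle.

```python
import math


def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def smooth_crop(candidate, min_rect, prev_crop, dist_threshold, weight):
    """candidate: first to-be-selected cropping frame; min_rect: smallest
    circumscribed rectangle of the main persons; prev_crop: previous frame's crop."""
    (cx, cy), (px, py) = center(candidate), center(prev_crop)
    dx, dy = cx - px, cy - py
    if math.hypot(dx, dy) < dist_threshold:
        return candidate  # assumption: small moves use the candidate directly
    # Second cropping frame: same size as the previous crop, center moved by the offset.
    ox, oy = dx * weight, dy * weight
    second = (prev_crop[0] + ox, prev_crop[1] + oy,
              prev_crop[2] + ox, prev_crop[3] + oy)
    if contains(second, min_rect):
        return second  # third cropping frame taken to be the second frame itself
    return union(second, min_rect)  # expand until the persons are fully inside


prev_crop = (0, 0, 400, 300)
candidate = (200, 0, 600, 300)
print(smooth_crop(candidate, (250, 50, 550, 250), prev_crop,
                  dist_threshold=50, weight=0.5))  # (100.0, 0.0, 550, 300.0)
```

With a weight below 1 the crop window moves only part of the way toward the new position on each frame, which limits sudden jumps of the displayed picture.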
  • The to-be-selected cropping frame or the first to-be-selected cropping frame may be a circumscribed rectangular frame that is centered on the person with the highest priority among the M main persons and contains the M main persons, plus a cropping margin.
  • Alternatively, the to-be-selected cropping frame or the first to-be-selected cropping frame may be a circumscribed rectangular frame that is centered on the speaking person among the M main persons and contains the M main persons, plus a cropping margin.
  • The video image processing method provided in this application may further include: displaying the cropped i-th frame of video image according to a preset display specification.
  • the preset display specification may be a specification adapted to the display screen, or may also be a preset display screen ratio.
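The reduce-or-enlarge step can be illustrated with a small helper that computes the scale factor and output size for a cropped image. Preserving the aspect ratio and fitting the crop inside the display specification is an assumption of this sketch; the text only requires that the result can be displayed according to the preset specification.

```python
def fit_to_display(crop_w, crop_h, display_w, display_h):
    """Uniform scale factor and output size that fit the crop inside the display spec.

    A factor above 1 enlarges the cropped image; a factor below 1 reduces it.
    """
    scale = min(display_w / crop_w, display_h / crop_h)
    return scale, (round(crop_w * scale), round(crop_h * scale))


print(fit_to_display(640, 360, 1280, 720))  # (2.0, (1280, 720))
```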
  • The video image processing method provided by this application may further include: saving at least one of the following items of information for each person in the i-th frame of video image: identity information, position information, and person information.
  • The video image processing method provided by this application may further include: obtaining a j-th frame of video image, where j is less than or equal to X and X is greater than 1; obtaining and saving the identity information and position information of each person in the j-th frame of video image; and directly reducing the j-th frame of video image to an image of the preset display specification.
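The handling of the first X frames can be sketched as a small planner that records per-frame identity information and decides whether a frame is only scaled or is cropped and scaled. The class name, the action strings, and the bounded history length are hypothetical choices made for the example.

```python
from collections import deque


class FramePlanner:
    """Decides per frame between direct scaling and crop-then-scale.

    warmup_frames plays the role of X; history_len bounds the saved history (N).
    """

    def __init__(self, warmup_frames, history_len):
        self.warmup_frames = warmup_frames
        self.history = deque(maxlen=history_len)  # identity sets, oldest first
        self.index = 0                            # 1-based index of the last frame seen

    def plan(self, person_ids):
        self.index += 1
        if self.index <= self.warmup_frames:
            action = "scale_only"       # j <= X: reduce the whole frame directly
        else:
            action = "crop_and_scale"   # enough history exists to pick main persons
        self.history.append(set(person_ids))  # save identity info for later frames
        return action


planner = FramePlanner(warmup_frames=2, history_len=5)
print([planner.plan({1, 2}) for _ in range(3)])
# ['scale_only', 'scale_only', 'crop_and_scale']
```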
  • When the video image processing method provided in this application is applied to the sending-end device in a video call, the method may further include: sending the reduced or enlarged i-th frame of video image to the receiving-end device.
  • the present application provides a video image processing device, which may be an electronic device, a device or a chip system in an electronic device, or a device that can be matched and used with an electronic device.
  • the video image processing device can realize the functions performed in the above-mentioned aspects or various possible designs, and the functions can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the video image processing device may include: an acquisition unit, a determination unit, a cropping unit, and a scaling unit.
  • The acquisition unit is configured to obtain the identity information and position information of each person in the i-th frame of video image, where i is greater than 1. The determination unit is configured to determine M main persons from the i-th frame of video image according to the identity information of the persons in the N video image frames before the i-th frame of video image, where M and N are greater than or equal to 1 and the identity information of the persons in the N video image frames includes the identity information of the M main persons. The cropping unit is configured to crop the i-th frame of video image according to the position information of the main persons, so that the cropped i-th frame of video image includes the M main persons. The scaling unit is configured to reduce or enlarge the cropped i-th frame of video image so that the display screen can display it according to the preset display specification.
  • the video image processing device provided in the second aspect is used to execute the video image processing method provided in the first aspect, and the specific implementation may refer to the specific implementation of the first aspect.
  • an embodiment of the present application provides an electronic device.
  • the electronic device may include a processor and a memory; the processor and the memory are coupled.
  • the memory may be used to store computer program code.
  • The computer program code includes computer instructions. When the computer instructions are executed by the electronic device, the electronic device is caused to execute the video image processing method described in the first aspect or any one of its possible implementation manners.
  • an embodiment of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium may include computer software instructions; when the computer software instructions run in an electronic device, the electronic device executes the video image processing method described in the first aspect or any one of the possible implementation manners of the first aspect.
  • The embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to execute the video image processing method described in the first aspect or any one of its possible implementation manners.
  • The embodiments of the present application provide a chip system, which is applied to an electronic device. The chip system includes an interface circuit and a processor, which are interconnected by wires. The interface circuit is used to receive a signal from the memory of the electronic device and send the signal to the processor, where the signal includes computer instructions stored in the memory. When the processor executes the computer instructions, the chip system executes the video image processing method described in the first aspect or any one of its possible implementation manners.
  • A graphical user interface (GUI) is stored in an electronic device that includes a display, a memory, and one or more processors, where the one or more processors are used to execute one or more computer programs stored in the memory. The graphical user interface includes a GUI displayed on the display; the GUI includes a video picture, and the video picture includes the i-th frame of video image processed according to the first aspect or any one of its possible implementation manners.
  • the video image is transmitted to the electronic device by another electronic device (for example, called a second electronic device), and the second electronic device includes a display screen and a camera.
  • FIG. 1 is a schematic diagram of a video scene provided by an embodiment of the application.
  • FIG. 2 is a schematic diagram of a system architecture of a video call scenario provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of a video image provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of video image processing provided by an embodiment of this application.
  • FIG. 5 is a schematic diagram of a video image processing result provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of another video image processing result provided by an embodiment of this application.
  • FIG. 7 is a schematic diagram of a system architecture of a video surveillance scenario provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 9 is a schematic flowchart of a video image processing method provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of a video call interface provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of another video call interface provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of another video call interface provided by an embodiment of this application.
  • FIG. 13 is another schematic diagram of video image processing provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of still another video image processing provided by an embodiment of this application.
  • FIG. 15 is another schematic diagram of video image processing provided by an embodiment of this application.
  • FIG. 16 is a schematic flowchart of another video image processing method provided by an embodiment of the application.
  • FIG. 17A is a schematic diagram of another video call interface provided by an embodiment of this application.
  • FIG. 17B is a schematic diagram of another video call interface provided by an embodiment of this application.
  • FIG. 18A is a schematic diagram of another video call interface provided by an embodiment of this application.
  • FIG. 18B is a schematic diagram of another video call interface provided by an embodiment of this application.
  • FIG. 18C is a schematic diagram of another video call interface provided by an embodiment of this application.
  • FIG. 19 is a schematic diagram of another video image processing provided by an embodiment of this application.
  • FIG. 19A is another schematic diagram of video image processing provided by an embodiment of this application.
  • FIG. 19B is a display diagram of another video call interface provided by an embodiment of this application.
  • FIG. 20 is another schematic diagram of video image processing provided by an embodiment of this application.
  • FIG. 20A is a display diagram of another video call interface provided by an embodiment of the application.
  • FIG. 21 is a schematic diagram of still another video image processing provided by an embodiment of this application.
  • FIG. 21A is another schematic diagram of video image processing provided by an embodiment of this application.
  • FIG. 21B is a display diagram of another video call interface provided by an embodiment of this application.
  • FIG. 22 is another schematic diagram of video image processing provided by an embodiment of this application.
  • FIG. 22A is a display diagram of another video call interface provided by an embodiment of this application.
  • FIG. 23 is a schematic diagram of video image processing of a surveillance scene provided by an embodiment of the application.
  • FIG. 24 is a schematic diagram of still another surveillance scene video image processing provided by an embodiment of this application.
  • FIG. 25 is a schematic diagram of still another surveillance scene video image processing provided by an embodiment of the application.
  • FIG. 26 is a schematic diagram of still another surveillance scene video image processing provided by an embodiment of the application.
  • FIG. 27 is a schematic structural diagram of a video image processing device provided by an embodiment of the application.
  • FIG. 28 is a schematic structural diagram of another video image processing apparatus provided by an embodiment of the application.
  • this application proposes a new service transmission method, which is used to adjust the power of the carrier when the multiple carriers configured by the UE support different numerology.
  • the basic principle is: when the UE uses multiple carriers to transmit services, one carrier
  • the power adjustment position of the base station uses the transmit power configured or instructed to transmit the signal, and the power adjustment position of the remaining carriers uses the transmit power less than or equal to the base station configuration or instruction to transmit the signal.
  • the power adjustment position time resources of different carriers overlap. This application only adjusts the power adjustment position.
  • the power adjustment position can be flexibly configured according to actual needs, and the power of scheduled services can also be reduced to ensure that the transmission power of temporary burst services can also be guaranteed.
  • Words such as “exemplary” or “for example” are used to indicate examples, illustrations, or explanations. Any embodiment or design solution described as “exemplary” or “for example” in the embodiments of the present application should not be construed as being more preferable or advantageous than other embodiments or design solutions. Rather, words such as “exemplary” or “for example” are used to present related concepts in a specific manner.
  • Video stream can refer to data transmitted in a video service, that is, a dynamic continuous image sequence in a video call, video conference, or surveillance scene.
  • Video images can refer to static images, and each frame of image in the video stream is called a video image.
  • it is not only applicable to moving or standing people in the video image, but also applicable to other subject objects in the video image, such as moving or static animals or other things.
  • the following will take the person in the video image as an example for description, which should not limit the application scenario.
  • Identity information can refer to the feature identifier of each person identified by the human body detection and tracking algorithm in the video image, and is used to uniquely identify the same person in different frames to distinguish different individuals.
  • the identity information may include, but is not limited to, appearance information, labeling information, or other recognized characteristic information.
  • the expression form of identity information may include text, serial number, character number, or other information related to individual characteristics.
  • The position information can be used to indicate the relative position or area of the person within the video image.
  • the form of the position information may be the pixel position of one or more points of the person in the video image, or the pixel position of the outline of the person, or the pixel position of the area where the person is located, or the like.
  • the pixel location can be indicated by pixel coordinates or other indications.
  • the position information is used to indicate the relative position of the person in the video image, and is not limited to a specific location.
  • the person information may refer to the additional information of each person in the video image obtained through a recognition algorithm or a marking algorithm, so as to better identify the person and determine the main person.
  • the character information may include, but is not limited to, one or more of the following information: whether the character is speaking or not, character priority information, and so on.
  • additional character positioning equipment such as locating the speaker's position through voice
  • the other is a software algorithm implementation scheme.
  • the camera performs large-resolution acquisition according to a fixed field of view.
  • the person detection and tracking algorithm locates the position of the person in real time, and the large-resolution image is then cropped and zoomed (reduced or enlarged) according to the located position to obtain a small-resolution image of the specified size.
  • the software solution may suffer from false detections and missed detections. If the image is cropped directly after positioning, the accuracy of person perception is low, and the continuity of the final displayed picture is difficult to guarantee.
  • the embodiment of the present application provides a video image processing method that uses software to keep the displayed picture of the main person continuous as the person moves.
  • the method can be applied to electronic devices.
  • the main person is determined by combining the person identity information of the current frame and the historical frame, and the current frame of video image collected is cropped and zoomed according to the main person.
  • the accuracy of the person perception process is greatly improved, and the accuracy of determining the position of the main person improves correspondingly. This ensures that the main person is fully displayed in the small-resolution image obtained by cropping and zooming according to the main person, and that the displayed picture is continuous, so that during image collection and display the picture follows the movement of the person smoothly through software.
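  • as an illustration of how identity information from the current frame and historical frames might be combined, the following is a minimal sketch and not the method of this application. It assumes a hypothetical rule: a person identifier counts as stable if it was detected in at least `min_hits` of the last `history` frames. The class name, parameters, and rule are all assumptions introduced for illustration.

```python
from collections import deque
from typing import Deque, Dict, Set


class MainPersonFilter:
    """Hypothetical temporal filter over per-frame person identifiers."""

    def __init__(self, history: int = 5, min_hits: int = 3) -> None:
        # Sliding window holding the identifier sets of the most recent frames.
        self.frames: Deque[Set[int]] = deque(maxlen=history)
        self.min_hits = min_hits

    def update(self, detected_ids: Set[int]) -> Set[int]:
        """Add the current frame's identifiers and return the stable ones."""
        self.frames.append(set(detected_ids))
        counts: Dict[int, int] = {}
        for ids in self.frames:
            for pid in ids:
                counts[pid] = counts.get(pid, 0) + 1
        # Assumed rule: keep identifiers seen in at least min_hits recent frames.
        return {pid for pid, n in counts.items() if n >= self.min_hits}


# Person 1 is missed in the 4th frame; identifier 7 is a one-frame false detection.
person_filter = MainPersonFilter(history=5, min_hits=3)
results = [person_filter.update(ids) for ids in [{1}, {1}, {1, 7}, set(), {1}]]
```

  • under this assumed rule, a single missed detection does not drop person 1 from the result, and a single false detection does not add identifier 7, which is the kind of robustness the combination of current and historical frames is meant to provide.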
  • the video image processing method provided by the embodiments of the present application can be applied to the video image collection and display process of an electronic device.
  • the image collection and display process can be in a video call (video conference) scene or a video surveillance scene or other.
  • FIG. 2 is a schematic diagram of a system architecture in which the above-mentioned video image processing method provided by an embodiment of the application is applied in a video call scenario.
  • the system architecture may include a sending end device 201 and a receiving end device 202.
  • the sending end device 201 may be used as one end of a video call to communicate with the receiving end device 202.
  • one or more users 1 can talk with one or more users 2 of the receiving device 202 through the sending device 201.
  • the call in this embodiment may refer to a video call or a video conference. Therefore, the sending end device 201 includes at least a camera and a display screen, and the receiving end device 202 also includes at least a camera and a display screen.
  • the transmitting end device 201 and the receiving end device 202 may also include a receiver (or speaker), a microphone, and the like.
  • the camera can be used to capture video images during a call.
  • the display screen can be used to display images during a call.
  • the earpiece (or speaker) is used to play the voice during a call.
  • the microphone is used to collect the voice during the call.
  • the sending end device 201 includes a video collector 2011, a video preprocessor 2012, a video encoder 2013, and a transmitter 2014.
  • the receiving end device 202 includes a video display 2021, a video post-processor 2022, a video decoder 2023, and a receiver 2024.
  • the workflow of the system architecture shown in FIG. 2 is as follows: the video collector 2011 in the sending end device 201 collects the video images of the video call frame by frame and transmits them to the video preprocessor 2012 for corresponding preprocessing (including but not limited to person recognition, cropping, and zooming). The preprocessed video images are encoded by the video encoder 2013 and passed to the transmitter 2014, which sends the encoded video images through a wired or wireless medium to the receiver 2024 of the receiving end device 202. The receiver 2024 passes the received video images to the video decoder 2023 for decoding, and the decoded video images are processed by the video post-processor 2022 and then passed to the video display 2021 for display.
  • the electronic devices described in the embodiments of the present application may be televisions, mobile phones, tablet computers, desktop computers, laptops, handheld computers, notebook computers (such as Huawei laptops), ultra-mobile personal computers (UMPC), netbooks, cellular phones, personal digital assistants (PDAs), augmented reality (AR) or virtual reality (VR) devices, and the like.
  • the sending end device 201 and the receiving end device 202 may be the same type of electronic devices, for example, the sending end device 201 and the receiving end device 202 are both TV sets. In some other embodiments, the sending end device 201 and the receiving end device 202 may be different types of electronic devices, for example, the sending end device 201 is a television, and the receiving end device 202 is a laptop computer.
  • the video image transmission process in a video call or video conference is illustrated.
  • the electronic device 1 is the transmitting end device and the electronic device 2 is the receiving end device.
  • the video image of the fixed field of view collected by the camera at a certain moment can be as shown in Figure 3.
  • the electronic device 1 uses a person detection and tracking algorithm on the video image shown in FIG. 3 to identify the person's identity information and location information.
  • the position information may be the coordinates shown in FIG. 4.
  • the coordinates here are examples of the specific coordinates of each key point in the character.
  • the key points may include, but are not limited to: head, shoulders, arms, hands, legs, feet, eyes, nose, mouth, clothes, etc.
  • the minimum circumscribed rectangular frame of the recognized person, as determined by the electronic device 1, is shown in FIG. 4.
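  • a minimum circumscribed rectangular frame can be derived from key point coordinates by taking the extremes along each axis. The sketch below assumes an axis-aligned rectangle and integer pixel coordinates; the function name and the sample key points are hypothetical and are not taken from the figures.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]            # (x, y) pixel coordinates
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom)


def min_bounding_rect(points: Iterable[Point]) -> Rect:
    """Return the smallest axis-aligned rectangle containing all points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical key points of one person (head, shoulders, hands, feet).
key_points = [(620, 180), (560, 300), (690, 305), (540, 520),
              (700, 515), (600, 900), (660, 905)]
rect = min_bounding_rect(key_points)
```

  • when several persons are to be shown together, the same function can be applied to the union of their key points.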
  • assuming that the display specification of the electronic device 2 is an image with width w and height h, the electronic device 1 crops the video image shown in FIG. 3, centered on the minimum circumscribed rectangular frame and according to the aspect ratio of the display specification of the electronic device 2, to obtain the cropping result shown in FIG. 5.
  • the electronic device 1 scales the cropping result shown in FIG. 5 to a resolution image with width w and height h as shown in FIG. 6.
  • the specific zooming process is as follows: if the resolution of the cropping result is smaller than width w by height h, the cropping result is enlarged; if the resolution of the cropping result is larger than width w by height h, it is reduced.
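  • the cropping and zooming steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the window is the smallest one with aspect ratio w:h that is centered on the rectangle, it is shifted to stay inside the frame, and it is shrunk only if it would exceed the frame. The function names and sample numbers are hypothetical, and the actual resampling of pixels is left to an image library.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def crop_window(rect: Rect, frame_w: int, frame_h: int, w: int, h: int) -> Rect:
    """Crop window with aspect ratio w:h centered on rect, kept inside the frame."""
    left, top, right, bottom = rect
    box_w, box_h = right - left, bottom - top
    if box_w <= 0 or box_h <= 0:
        raise ValueError("rect must have positive width and height")
    # Extend the shorter side so the window matches the target aspect ratio.
    if box_w * h >= box_h * w:
        win_w, win_h = float(box_w), box_w * h / w
    else:
        win_w, win_h = box_h * w / h, float(box_h)
    # Shrink the window only if it would be larger than the source frame.
    scale = min(1.0, frame_w / win_w, frame_h / win_h)
    win_w, win_h = win_w * scale, win_h * scale
    # Center on the rectangle, then shift so the window stays inside the frame.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - win_w / 2, 0.0), frame_w - win_w)
    y0 = min(max(cy - win_h / 2, 0.0), frame_h - win_h)
    return round(x0), round(y0), round(x0 + win_w), round(y0 + win_h)


def zoom_factor(window: Rect, w: int) -> float:
    """Scale factor to reach width w: above 1 enlarges, below 1 reduces."""
    return w / (window[2] - window[0])


# Hypothetical numbers: 3840x2160 capture, 1280x720 display specification.
window = crop_window((1000, 300, 1320, 1200), 3840, 2160, 1280, 720)
factor = zoom_factor(window, 1280)
```

  • in this example the cropping result is 1600 by 900 pixels, which is larger than the 1280 by 720 display specification, so the factor is below 1 and the cropping result is reduced.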
  • FIG. 7 is a schematic diagram of a system architecture in which the foregoing video image processing method provided by an embodiment of the application is applied to a video surveillance scene.
  • the system architecture may include a collection device 701, a processing device 702, a storage device 703, and a display device 704.
  • the equipment included in the system architecture illustrated in FIG. 7 may be deployed in a centralized manner or in a distributed manner.
  • the device included in the system architecture illustrated in FIG. 7 may be deployed in at least one electronic device.
  • the workflow of the system architecture shown in FIG. 7 is as follows: the collection device 701 collects video images frame by frame and transmits the collected video images to the processing device 702 for corresponding preprocessing (including but not limited to person recognition, cropping, and zooming), after which they are stored in the storage device 703.
  • the display device 704 obtains the video image from the storage device 703 and displays it.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • the structures of the above-mentioned transmitting-end device 201, receiving-end device 202, and electronic devices included in the system architecture illustrated in FIG. 7 may be as shown in FIG. 8.
  • the electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, or combine certain components, or divide certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include one or more of an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, a neural-network processing unit (NPU), and the like.
  • the controller can be the nerve center and command center of the electronic device.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and the like.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in an electronic device can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions applied on the electronic device, including second generation (2G), third generation (3G), fourth generation (4G), and fifth generation (5G) mobile communication technologies.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 can be set in the processor 110
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 may also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves to radiate through the antenna 2.
  • the antenna 1 of the electronic device is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • the electronic device may conduct a video call or video conference with other electronic devices through the antenna 1 and the mobile communication module 150.
  • the wireless communication technology may include one or more of global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, IR technology, and the like.
  • the GNSS may include one or more of the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), satellite-based augmentation systems (SBAS), and the like.
  • the electronic device realizes the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device may include one or N display screens 194, and N is a positive integer greater than one.
  • the display screen 194 may display a video answering interface, a video reminder interface, a video call interface, or a video monitoring interface (for example, including video images sent by the peer device and video images collected by this device).
  • Electronic equipment can achieve shooting functions through ISP, camera 193, video codec, GPU, display 194, and application processor.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the camera 193 may be used to collect video images during a video call or a video conference.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device may include 1 or N cameras 193, and N is a positive integer greater than 1.
  • the camera 193 may be installed in the electronic device in a hidden manner, or may not be installed in a hidden manner, and this embodiment does not specifically limit it here.
  • the digital signal processor is used to process digital signals. For example, it applies a human body detection and tracking algorithm to digital video images, determines the main person in the video image, and crops and zooms the video image accordingly to obtain an image that meets the display specification of the receiving end device.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device can support one or more video codecs.
  • the electronic device can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • NPU can realize the intelligent cognition of electronic equipment and other applications, such as: image recognition, face recognition, voice recognition, text understanding, etc.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device by running instructions stored in the internal memory 121. For example, in the embodiment of the present application, the processor 110 may, by executing the instructions stored in the internal memory 121, process the video image to locate the persons, combine the person information of the current frame with that of the historical frames to determine the main person, and crop and zoom the currently collected frame of video image according to the main person. This keeps the picture displayed by the receiving end device continuous, so that the displayed picture in the video call follows the movement of the person.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function).
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the internal memory 121 can also be used to store the original large-resolution video images collected by the camera 193, the small-resolution video images obtained after person recognition, person screening, cropping, and zooming by the processor 110, the person information of each frame of video image, and so on.
  • the electronic device can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, call, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a “loudspeaker”, is used to convert audio electrical signals into sound signals.
  • the user can listen to music or a hands-free call through the speaker 170A of the electronic device.
  • the receiver 170B, also called an “earpiece”, is used to convert audio electrical signals into sound signals.
  • when the electronic device answers a call or voice message, the voice can be heard by bringing the receiver 170B close to the ear.
  • the microphone 170C, also called a “mic”, is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input a sound signal into it.
  • the electronic device may be provided with at least one microphone 170C.
  • the electronic device may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals.
  • the electronic device may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device determines the strength of the pressure based on the change in capacitance. When a touch operation acts on the display screen 194, the electronic device detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example, when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
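  • the pressure-dependent behavior in the short message example can be sketched as a simple threshold comparison. The threshold value, its normalized unit, and the returned instruction names below are hypothetical; only the comparison logic comes from the description.

```python
# Hypothetical normalized value for the first pressure threshold.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(intensity: float,
                         threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the touch intensity on the short message icon to an instruction."""
    if intensity < threshold:
        # Intensity below the first pressure threshold: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"


light_press = sms_icon_instruction(0.2)
firm_press = sms_icon_instruction(0.5)
```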
  • the gyro sensor 180B can be used to determine the movement posture of the electronic device.
  • the angular velocity of the electronic device around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyroscope sensor 180B detects the angle of the shake of the electronic device, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device through a reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
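  • the description does not state how altitude is derived from air pressure. One commonly used approximation is the international barometric formula, sketched below as an assumption rather than as the method used by the electronic device. The function name is hypothetical, and the default sea-level pressure of 1013.25 hPa is the standard-atmosphere value.

```python
def altitude_from_pressure(pressure_hpa: float,
                           sea_level_hpa: float = 1013.25) -> float:
    """Approximate altitude in meters using the international barometric formula.

    Assumes a standard atmosphere; real devices would typically calibrate
    sea_level_hpa against local weather data.
    """
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


at_sea_level = altitude_from_pressure(1013.25)
at_900_hpa = altitude_from_pressure(900.0)
```

  • a measured pressure of 900 hPa corresponds to roughly one kilometer of altitude under this approximation.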
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • when the electronic device is a flip phone, it can detect the opening and closing of the flip cover by means of the magnetic sensor 180D.
  • features such as automatic unlocking when the flip cover is opened can then be set according to the detected open or closed state.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device in various directions (generally three axes). When the electronic device is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and apply to applications such as horizontal and vertical screen switching, pedometers and so on.
  • the distance sensor 180F is used to measure distance.
  • Electronic equipment can measure distance through infrared or laser.
  • the electronic device may use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device emits infrared light to the outside through the light-emitting diode.
  • Electronic devices use photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device. When insufficient reflected light is detected, the electronic device can determine that there is no object near the electronic device.
  • the electronic device can use the proximity light sensor 180G to detect that the user holds the electronic device close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, and fingerprint call answering.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device heats the battery 142 to avoid abnormal shutdown of the electronic device due to low temperature.
  • the electronic device boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 180K is also called a “touch panel”.
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form what is also called a “touch screen”.
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device at a position different from that of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can obtain the vibration signal of the bone that vibrates when a person speaks.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in an earphone to form a bone conduction earphone.
  • the audio module 170 can parse the voice signal from the vibration signal of the vocal vibrating bone obtained by the bone conduction sensor 180M, and realize the voice function.
  • the application processor can parse heart rate information from the blood pressure pulse signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the electronic device can receive key input, and generate key signal input related to user settings and function control of the electronic device.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations that act on different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
  • different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device.
  • the electronic device can support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 may also be compatible with external memory cards.
  • the electronic device interacts with the network through the SIM card to realize functions such as call and data communication.
  • the electronic device adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device and cannot be separated from the electronic device.
  • FIG. 9 is a schematic flowchart of a video image processing method provided by an embodiment of the application.
  • the electronic device processes the video stream of a video call or of video surveillance frame by frame. Each acquired frame of video image is processed according to the image processing method provided in this application, and the electronic device processes every frame in the same way.
  • the following embodiments only describe the detailed process of the electronic device processing the i-th frame of video image, and the rest will not be repeated one by one.
  • the i-th frame of video image is any frame of video image in the video stream.
  • the method may include:
  • S901: the electronic device obtains the identity information and location information of each person in the i-th frame of video image.
  • i is greater than 1, i is less than or equal to the total number of frames of the video stream.
  • i may be greater than or equal to X, where X is the pre-configured frame-number threshold in the video stream at which execution of the video image processing method provided in this embodiment of the application begins.
  • the electronic device may use a human body detection and tracking algorithm to identify the persons in the i-th frame of video image. One or more persons may be identified, and the identity information and location information of each person can be obtained while the person is identified.
  • the human body detection and tracking algorithm is an image processing technology used to identify people in images, and the embodiment of the present application does not limit the specific implementation of the human body detection and tracking algorithm.
  • the human body detection and tracking algorithm may be the YOLO algorithm or the SSD algorithm or others.
  • the identity information of a person can be used to uniquely indicate the same person in different frames, and the identity information can be the identification information of the person obtained by a detection and tracking algorithm, that is, each person has its own different characteristic information.
  • the identity information may also be a character number corresponding to the characteristic information.
  • the position information of the character may be the unique coordinate value of one or more key points of the character in the video image.
  • the video processing method provided in the embodiment of the present application may further include S901a.
  • S901a: the electronic device obtains character information of each character in the i-th frame of video image.
  • the character information may include one or more of the following: speaking information and priority information.
  • the content included in the character information can be configured according to actual needs without being limited by the content of this article.
  • the speaking information is used to indicate whether the person in the video image is speaking or not speaking.
  • the speaking information can be obtained by combining audio processing technology with the mouth shape of the character in the video image, or directly from the mouth shape of the character in the video image.
  • the priority information is used to indicate the importance of the person in the video image. The priority information of different persons who use the device can be pre-configured in correspondence with each person's identity information; then, when each frame of video image is processed and a person's identity information is obtained, the pre-configured priority information is looked up to obtain that person's priority information. Or, the priority information input by the user for different characters in the video image can be received. Alternatively, the priority information can be derived from the speaking information. For example, the priority of a person who speaks is higher than that of a person who does not speak, and the priority of a person who speaks for a long time is higher than that of a person who speaks for a short time.
  • the electronic device stores the photo information of different people and the corresponding priority information.
  • when the similarity between a person identified in the video image and a stored photo is greater than the similarity threshold, the priority information corresponding to that stored photo is used as the priority information of the recognized person.
  • the photo information of different characters and the corresponding priority information stored in the electronic device can be entered manually by the user through the function configuration interface of the electronic device and stored by the electronic device; or it can be recorded by the electronic device during video collection and display; or the user can enter the photos and priority information of different characters manually, and the electronic device dynamically updates the photos of different characters and the corresponding priority information each time it collects and displays video.
  • the video image processing method provided in this application may further include: receiving priority information input by the user.
  • the electronic device displays the configuration menu illustrated in FIG. 11 to the user, and the user can select "Configure Person Priority Information" in the configuration menu illustrated in FIG. 11 for priority configuration.
  • the electronic device displays the interactive interface shown in Figure 12. The user enters the character’s priority information on this interface while the electronic device captures the character’s photo, and the electronic device records and stores the importance level entered by the user in the interface of Figure 12.
  • S902: the electronic device determines M main persons from the i-th frame of video image according to the identity information of the persons in the N video image frames before the i-th frame of video image.
  • the identity information of all the persons in the N video image frames includes the identity information of the M main persons; that is, the M main persons have appeared in the N preceding video image frames.
  • the identity information of the persons in the N video image frames before the i-th frame of video image is obtained and stored by the electronic device by performing S901 on each of those video images after acquiring it.
  • the specific process is the same as that of S901, and will not be repeated.
  • N is greater than or equal to 1.
  • N can be less than or equal to i-1.
  • the specific value of N can be configured according to actual needs.
  • the N video image frames before the i-th video image may be the N preceding video image frames adjacent to the i-th video image in the video stream, or N preceding video image frames in the video stream that are not adjacent to the i-th video image, or the video image frames within a preset time period in the video stream.
  • the embodiment of the present application does not limit the specific positions of the N video image frames before the i-th video image in the video stream.
  • in the process of processing a video stream, the value of N can also be dynamic.
  • when i is less than a configured threshold, N is equal to i-1; when i is greater than the configured threshold, N is a fixed value smaller than i-1.
  • when i is equal to the configured threshold, N can be equal to i-1 or a fixed value smaller than i-1, which is not specifically limited in this application.
  • when N takes a fixed value smaller than i-1, the specific value can be configured based on experience, and this application does not specifically limit it.
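  • as a non-limiting illustration of the dynamic choice of N described above, the following minimal Python sketch may help. The function name `window_size` and the parameters `threshold` and `fixed_n` are hypothetical; the sketch also assumes that the fixed value is used when i equals the configured threshold, which is only one of the two options the text allows.

```python
def window_size(i: int, threshold: int, fixed_n: int) -> int:
    """Return N, the number of earlier frames consulted when processing frame i.

    Hypothetical helper: `threshold` is the configured frame-number threshold
    and `fixed_n` is the fixed window length used once i reaches it.
    """
    if i <= 1:
        raise ValueError("i must be greater than 1")
    if i < threshold:
        # Early in the stream: use every frame seen so far.
        return i - 1
    # At or beyond the threshold (assumption for the equality case):
    # use the fixed value, capped so that N never exceeds i - 1.
    return min(fixed_n, i - 1)


print(window_size(5, threshold=30, fixed_n=20))    # early frame
print(window_size(100, threshold=30, fixed_n=20))  # steady state
```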
  • M can be one or more.
  • the embodiment of the present application does not specifically limit the value of M.
  • M may be the total number of subjects determined in each video image frame.
  • M can be a pre-configured fixed value.
  • S902 can be implemented as follows: the electronic device determines the M main persons from the i-th frame of video image according to the identity information of each person in the i-th frame of video image and the identity information of the persons in the N video image frames before the i-th frame of video image. For example, the electronic device can compare the identity information of each person in the i-th frame of video image with the identity information of the persons in the N preceding video image frames, take the persons whose identity information matches as candidates, and then determine the main persons from the candidates.
  • the electronic device may determine a person who has appeared in the first N video image frames (identified according to the identity information) and appears in the i-th video image frame and meets a preset condition as the main person.
  • the preset condition can be configured according to actual conditions, which is not limited in this application.
  • the preset condition may be that the number of frames in which the person has appeared in the first N video image frames is greater than or equal to the threshold.
  • S902 can be implemented but not limited to the following possible implementations.
  • in implementation 1, the electronic device determines, as the M main persons, the persons who appear in the i-th frame of video image and whose number of frames of appearance in the N video image frames is greater than or equal to a first preset threshold.
  • the process of determining whether a person in the i-th frame of video image is a main person may include: counting the cumulative number of appearance frames of the person in the N video image frames, and if that number is greater than or equal to the first preset threshold, determining the person to be a main person.
  • Whether the person appears in a video image frame can be specifically implemented as: whether the video image frame contains a person with the same identity information as the person.
  • the cumulative number of appearance frames of a person is the number of consecutive video image frames, among the N video image frames before the i-th video image, in which the person appears; the consecutive video image frames may include S video image frames in which the person does not appear, where S is greater than or equal to 0 and less than or equal to a preset number of frames.
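  • the counting described above can be sketched as follows. This is only an illustrative reading, not a definitive implementation: the names `cumulative_appearance` and `select_main_persons` are hypothetical, each frame is represented simply as a set of identity numbers, and the sketch assumes that the tolerance S applies to each individual gap and that tolerated gap frames count toward the run length.

```python
from typing import List, Set


def cumulative_appearance(history: List[Set[int]], person_id: int, max_gap: int) -> int:
    """Length of the most recent run of frames in which `person_id` appears.

    `history` holds the identity sets of the N earlier frames, oldest first.
    Assumption: a gap of at most `max_gap` consecutive missing frames does not
    break the run, and the frames of a tolerated gap are counted in the run.
    """
    run = 0  # frames accepted into the run so far
    gap = 0  # missing frames seen since the last frame that showed the person
    for ids in reversed(history):
        if person_id in ids:
            run += gap + 1  # absorb the tolerated gap, then this frame
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                break  # gap too long: the run ends here
    return run


def select_main_persons(current_ids: Set[int], history: List[Set[int]],
                        first_threshold: int, max_gap: int) -> Set[int]:
    """Persons in the current frame whose cumulative count reaches the threshold."""
    return {pid for pid in current_ids
            if cumulative_appearance(history, pid, max_gap) >= first_threshold}


history = [{1, 2}, {1}, {2}, {1, 2}, {1}]
print(select_main_persons({1, 2, 3}, history, first_threshold=3, max_gap=1))
```

  • in this example, person 3 is visible in the current frame but never appeared in the earlier frames, so person 3 is not selected.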
  • in implementation 2, the electronic device divides the i-th frame of video image into Y regions and configures a preset threshold for each region, where the preset threshold corresponding to the k-th region is the k-th preset threshold, the k-th region is any one of the Y regions, Y is greater than or equal to 2, and k is greater than or equal to 1 and less than or equal to Y.
  • the persons who appear in the i-th frame of video image and whose number of frames of appearance in the N video image frames is greater than or equal to the preset threshold corresponding to the region in which they are located are determined as the M main persons.
  • the preset thresholds corresponding to different regions may be different.
  • the video image is divided into the 3 preset areas on the left, middle, and right shown in Figure 13, which are recorded as area 1, area 2, and area 3. The preset thresholds configured for the areas are recorded as threshold 1, threshold 2, and threshold 3, respectively, and threshold 1, threshold 2, and threshold 3 are different. Then, if it is recognized that person A is located in area 2 in the i-th frame of video image, and the cumulative number of appearance frames of person A is greater than threshold 2, person A is determined to be a main person. If it is recognized that person B is located in area 3 in the i-th frame of video image, and the cumulative number of appearance frames of person B is less than threshold 3, person B is not a main person.
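  • the region-dependent threshold can be illustrated with the short sketch below. It assumes, purely for illustration, that the Y regions are vertical strips of equal width and that a person is assigned to a region by the horizontal coordinate of a single key point; the names `region_index` and `is_main_person` and the threshold values are hypothetical.

```python
from typing import Sequence


def region_index(x_center: float, image_width: int, num_regions: int) -> int:
    """Zero-based index of the equal-width vertical strip containing x_center."""
    if not 0 <= x_center < image_width:
        raise ValueError("x_center must lie inside the image")
    return int(x_center * num_regions // image_width)


def is_main_person(x_center: float, appearance_count: int,
                   thresholds: Sequence[int], image_width: int) -> bool:
    """Compare the appearance count with the threshold of the person's region."""
    k = region_index(x_center, image_width, len(thresholds))
    return appearance_count >= thresholds[k]


# Hypothetical thresholds for area 1 (left), area 2 (middle), area 3 (right).
thresholds = [30, 10, 30]
print(is_main_person(960, 12, thresholds, image_width=1920))   # person A, area 2
print(is_main_person(1800, 12, thresholds, image_width=1920))  # person B, area 3
```

  • giving the middle area a lower threshold than the side areas is one possible configuration under which a person near the center becomes a main person sooner; the source only states that the thresholds may differ.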
  • Y may also be 1.
  • the specific implementation of implementation 2 is the same as that of implementation 1, and will not be repeated.
  • in another possible implementation, the persons who appear in the i-th frame of video image and whose number of speaking frames in the N video image frames is greater than or equal to a second preset threshold are determined as the M main persons. Or, the persons who appear in the i-th frame of video image and whose priority information in the N video image frames is greater than a third preset threshold are determined as the M main persons; or, from the persons who appear in the i-th frame of video image and whose number of speaking frames in the N video image frames is greater than or equal to the second preset threshold, the M most important persons are selected according to the priority information and determined as the M main persons.
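  • the last variant above (filter by speaking frames, then keep the M most important persons) might be sketched as follows. The function name, the representation of each earlier frame as the set of persons speaking in it, and the convention that a larger priority number means a more important person are all assumptions made for this illustration.

```python
from typing import Dict, List, Set


def select_by_speech_and_priority(current_ids: Set[int],
                                  speaking_history: List[Set[int]],
                                  priority: Dict[int, int],
                                  second_threshold: int,
                                  m: int) -> List[int]:
    """Pick up to m persons who spoke in enough earlier frames, by priority."""
    # Count, for each person in the current frame, the earlier frames in which
    # that person was speaking.
    counts = {pid: sum(pid in speakers for speakers in speaking_history)
              for pid in current_ids}
    eligible = [pid for pid in current_ids if counts[pid] >= second_threshold]
    # Highest priority first; ties are broken by identity number so that the
    # result is deterministic. Persons without a configured priority get 0.
    eligible.sort(key=lambda pid: (-priority.get(pid, 0), pid))
    return eligible[:m]


speaking_history = [{1}, {1, 2}, {2}, {1, 2, 3}]
priority = {1: 5, 2: 9, 3: 7}
print(select_by_speech_and_priority({1, 2, 3}, speaking_history, priority,
                                    second_threshold=2, m=1))
```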
  • each of the foregoing preset thresholds can be configured according to actual needs, which is not specifically limited in the embodiment of the present application.
  • the cumulative number of frames can also be converted into cumulative duration, and the content of the corresponding preset threshold can be a time threshold.
  • S903: the electronic device crops the i-th frame of video image according to the position information of the main characters.
  • the i-th frame of video image after cropping includes the M main characters. It should be understood that the i-th frame of video image after cropping can completely display the M main characters.
  • the electronic device crops the i-th frame of video image according to the position information of the main characters, which can be specifically implemented as follows: determine a cropping frame that contains the minimum circumscribed rectangular frame of the M main characters, and crop the i-th frame of video image with the cropping frame.
  • the aspect ratio of the cropping frame should be adapted to the preset display specifications.
  • that the cropping frame contains the minimum circumscribed rectangular frame of the M main characters can be understood as: the determined cropping frame contains the minimum circumscribed rectangular frame of the M main characters as completely as possible.
  • the specific implementation of determining the crop frame may include, but is not limited to, the following implementation solutions.
  • Implementation scheme 1: the electronic device uses the to-be-selected cropping frame as the cropping frame.
  • the cropping frame to be selected can be the smallest circumscribed rectangular frame of the M main characters plus a cropping margin, and the cropping margin can be greater than or equal to zero.
  • the to-be-selected cropping frame may be a circumscribed rectangular frame centered on the person with the highest priority among the M main characters and containing the M main characters, plus a cropping margin.
  • FIG. 14 illustrates that the determined cropping frame is a circumscribed rectangular frame centered on the person with the highest priority among the M main characters and containing the M main characters, and the i-th video image is cropped to completely display the scene of the main character.
  • the to-be-selected cropping frame may be a circumscribed rectangular frame centered on the talking character among the M main characters and containing the M main characters plus a cutting margin.
  • FIG. 15 illustrates that the determined cropping frame is a circumscribed rectangular frame centered on the speaking person among the M main characters and containing the M main characters, and the i-th video image is cropped to completely display the scene of the main character.
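  • a minimal sketch of Implementation scheme 1 is given below, assuming each main character is described by an axis-aligned bounding box (left, top, right, bottom) in pixels. The helper names are hypothetical. The sketch covers only the simplest variant (minimum circumscribed rectangle plus margin, centered on that rectangle) and widens or heightens the result to the display aspect ratio; centering on the highest-priority or speaking character is not shown.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def min_circumscribed_rect(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned rectangle that contains every box."""
    boxes = list(boxes)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def candidate_crop(boxes: Iterable[Box], margin: float, aspect: float,
                   img_w: float, img_h: float) -> Box:
    """Circumscribed rectangle plus margin, grown to the given width/height ratio."""
    left, top, right, bottom = min_circumscribed_rect(boxes)
    left, top, right, bottom = left - margin, top - margin, right + margin, bottom + margin
    w, h = right - left, bottom - top
    cx, cy = (left + right) / 2, (top + bottom) / 2
    # Grow the shorter dimension so that w / h equals the display aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    # Never exceed the image; if this clips, the aspect ratio is no longer exact.
    w, h = min(w, img_w), min(h, img_h)
    # Shift the frame back inside the image instead of cutting it off.
    x0 = min(max(cx - w / 2, 0.0), img_w - w)
    y0 = min(max(cy - h / 2, 0.0), img_h - h)
    return (x0, y0, x0 + w, y0 + h)


people = [(100, 200, 300, 600), (500, 250, 700, 650)]
print(candidate_crop(people, margin=50, aspect=16 / 9, img_w=1920, img_h=1080))
```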
  • Implementation scheme 2: the electronic device determines the cropping frame of the i-th frame of video image according to the first to-be-selected cropping frame and the cropping frame of the previous frame of video image.
  • the first to-be-selected cropping frame in Implementation Solution 2 is the same as the to-be-selected cropping frame in Implementation Solution 1.
  • the electronic device first obtains the distance between the center point of the first to-be-selected cropping frame and the center point of the cropping frame of the previous frame of video image, where the first to-be-selected cropping frame includes the minimum circumscribed rectangular frame of the M main characters. If the distance is greater than or equal to the distance threshold, a second cropping frame is determined: its center point is the center point of the cropping frame of the previous frame of video image plus an offset, and its size is the same as the size of the cropping frame of the previous frame of video image. If the second cropping frame contains the minimum circumscribed rectangular frame of the M main characters, a third cropping frame is used as the cropping frame, where the third cropping frame is the second cropping frame, or is the second cropping frame shrunk so that it still contains the minimum circumscribed rectangular frame. If the second cropping frame does not completely contain the minimum circumscribed rectangular frame, the second cropping frame is expanded until it contains the minimum circumscribed rectangular frame, and the expanded second cropping frame is used as the cropping frame.
  • the offset can be a preset value, or the distance between the center point of the first to-be-selected cropping frame and the center point of the cropping frame of the previous frame of video image multiplied by a weighting value, or a value obtained according to a preset algorithm; the embodiments of this application do not specifically limit this.
  • expanding or contracting the crop box to be selected may be implemented as: expanding one or more sides of the crop box to be selected outward or shrinking inward.
  • the electronic device may directly use the crop frame to be selected as the determined crop frame.
  • the distance between the center point of the crop frame to be selected and the center point of the crop frame of the previous frame of video image may be a linear distance or others, which is not specifically limited in the embodiment of the present application.
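  • the flow of Implementation scheme 2 can be sketched as follows, under stated assumptions: boxes are (left, top, right, bottom) tuples, the distance is the straight-line distance between center points, the offset is the center displacement multiplied by a weighting value, and the to-be-selected cropping frame is used directly when the distance is below the threshold. The function names are hypothetical, the optional shrinking step is omitted, and the aspect ratio is not re-adapted after expansion.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def smooth_crop(prev: Box, candidate: Box, min_rect: Box,
                dist_threshold: float, weight: float) -> Box:
    """Move the previous cropping frame part of the way toward the candidate."""
    (pcx, pcy), (ccx, ccy) = center(prev), center(candidate)
    if math.hypot(ccx - pcx, ccy - pcy) < dist_threshold:
        # Assumption: below the threshold the candidate is used directly.
        return candidate
    # Second cropping frame: previous size, center shifted by a weighted offset.
    ncx = pcx + weight * (ccx - pcx)
    ncy = pcy + weight * (ccy - pcy)
    w, h = prev[2] - prev[0], prev[3] - prev[1]
    second = (ncx - w / 2, ncy - h / 2, ncx + w / 2, ncy + h / 2)
    if contains(second, min_rect):
        return second  # third cropping frame equals the second one
    # Otherwise push the sides outward until the main characters fit.
    return (min(second[0], min_rect[0]), min(second[1], min_rect[1]),
            max(second[2], min_rect[2]), max(second[3], min_rect[3]))


prev = (0, 0, 800, 450)
candidate = (400, 0, 1200, 450)
min_rect = (500, 50, 1100, 400)
print(smooth_crop(prev, candidate, min_rect, dist_threshold=100, weight=0.25))
```

  • limiting the per-frame movement of the cropping frame in this way is one possible way to keep the displayed picture from jumping when the main characters move.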
  • S904: the electronic device reduces or enlarges the cropped i-th frame of video image.
  • the electronic device executes S904, so that the display screen displays the cropped i-th frame of video image according to the preset display specification.
  • the electronic device reduces or enlarges the i-th frame of video image cropped in S903 according to the preset display specification.
  • the preset display specification may be a specification adapted to the display screen, or a fixed screen-to-body ratio.
  • if the resolution of the i-th frame of video image cropped in S903 is less than the preset display specification, the electronic device in S904 enlarges the cropped i-th frame of video image into an image of the preset display specification; if the resolution of the i-th frame of video image cropped in S903 is greater than the preset display specification, the electronic device in S904 reduces the cropped i-th frame of video image to an image of the preset display specification; if the resolution of the i-th frame of video image cropped in S903 is equal to the preset display specification, the electronic device in S904 uses the cropped i-th frame of video image as the image of the preset display specification.
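  • the three cases of S904 can be summarized in a small sketch. It assumes the preset display specification is given as a width and height in pixels and that the cropping frame already has the matching aspect ratio; the function name `scale_plan` and the returned labels are hypothetical, and the actual resampling of pixels is not shown.

```python
from typing import Tuple


def scale_plan(crop_w: int, crop_h: int,
               display_w: int, display_h: int) -> Tuple[str, float]:
    """Decide whether the cropped frame is enlarged, reduced, or kept as is."""
    # One uniform factor keeps the picture undistorted; when the aspect ratios
    # match, both quotients are equal.
    factor = min(display_w / crop_w, display_h / crop_h)
    if factor > 1:
        return "enlarge", factor
    if factor < 1:
        return "reduce", factor
    return "keep", factor


print(scale_plan(960, 540, 1920, 1080))
print(scale_plan(3840, 2160, 1920, 1080))
print(scale_plan(1920, 1080, 1920, 1080))
```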
  • the electronic device can continue to perform S901 to S904 on subsequent frames of video image; that is, i is incremented to traverse each frame of video image in the video stream, and each frame is processed as soon as it is obtained, until the end of the video stream.
  • in the video image processing method provided by this application, when the main character of the video image is determined, the person identity information of the current frame is combined with the person identity information of the N video image frames before the current frame, so that the accuracy of person perception is greatly improved and the accuracy of determining the position of the main character is correspondingly improved. This ensures that the main character is displayed completely in the small-resolution image obtained after cropping and scaling, and that the displayed picture of the main character is continuous, so that the picture follows the movement of the person smoothly by means of software.
  • the video image processing method provided by the present application may further include: the electronic device acquires the j-th frame of video image, where j is less than or equal to X and X is greater than 1; acquires and saves the identity information and/or position information of each person in the j-th frame of video image; and directly reduces the j-th frame of video image to an image of the preset display specification. The identity information and/or position information of the j-th frame of video image can be used as reference information for subsequent frames of video image.
  • the electronic device may also obtain and save the character information of each character in the j-th frame of video image.
  • the image processing method provided by the embodiment of the present application may further include S905.
  • S905: the electronic device displays the cropped i-th frame of video image according to the preset display specification.
  • the electronic device that executes the video image processing method shown in FIG. 9 or FIG. 16 may be the sender device in a video call.
  • the video image processing method provided in this application may also include: the electronic device encodes the image of the preset display specification obtained by reduction or enlargement and sends it to the receiving end device, and the receiving end device displays the cropped i-th frame of video image according to the preset display specification. Refer to the workflow of the system architecture shown in Figure 2 for the specific process.
  • the electronic device that executes the video image processing method shown in FIG. 9 or FIG. 16 may be the sender device in a video call.
  • the video image processing method provided in this application may also include: the electronic device displays the cropped i-th frame of video image according to the preset display specification and, at the same time, displays the cropped video image of the opposite end according to the preset display specification.
  • the electronic device that executes the video image processing method shown in FIG. 9 or FIG. 16 may be the receiving end device in a video call, and the video image processing method provided in this application may also include: the electronic device displays the image of the preset display specification, obtained by reduction or enlargement, on the display device. Refer to the workflow of the system architecture shown in Figure 2 for the specific process.
  • the following takes a specific video call scenario as an example to describe in detail the video image processing method provided in the embodiment of the present application.
  • a video call application is installed in the electronic device 1701 and the electronic device 1702.
  • the video call application is a client that can provide users with a video call service.
  • the video call application installed in the electronic device 1701 and the electronic device 1702 can access a video call server through the Internet for data interaction, complete a video call, and provide video call services for users using the electronic device 1701 and the electronic device 1702.
  • the main interface (ie, desktop) of the electronic device 1701 includes an application icon 17011 of a video call application.
  • the desktop of the electronic device 1702 includes application icons 17021 of the video call application.
  • the electronic device 1701 invokes the video call application to make a video call with the electronic device 1702, and during the video call, performs the video image processing described in the embodiments of the present application on the video image.
  • the electronic device 1701 may receive a user's click operation (such as a touch click operation or an operation through a remote control device) on the application icon 17011 shown in FIG. 17A, and display the video call application interface 1801 shown in FIG. 18A.
  • the video call application interface 1801 includes a "new friend" option 1802 and at least one contact option.
  • the at least one contact option includes a contact option 1803 of Bob and a contact option 1804 of the user 311.
  • the "new friend" option 1802 is used to add a new contact.
  • the electronic device 1701 responds to a user's click operation (such as a single click operation or an operation through a remote control device) on the contact option 1804 of the user 311 by sending a video call request to the electronic device 1702 on which the account of the user 311 is logged in, so as to make a video call with the electronic device 1702.
  • the electronic device 1701 may activate its own camera to collect images with a fixed field of view as scene images, and the display screen of the electronic device 1701 displays the video call interface 1805 shown in FIG. 18B, which includes the scene images collected by the camera.
  • the video call interface 1805 includes a prompt message "Waiting for the other party's response! 1806 and a "Cancel” button 1807.
  • the "cancel" button 1807 is used to trigger the electronic device 1701 to cancel the video call with the electronic device 1702.
  • the electronic device 1702 receives the video call request sent by the electronic device 1701 from the video call server, and the display screen of the electronic device 1702 displays a video call interface 1808 as shown in FIG. 18C.
  • the video call interface 1808 includes a "receive" button 1809 and a "reject" button 1810. The "receive" button 1809 is used to trigger the electronic device 1702 to establish a video call connection with the electronic device 1701. The "reject" button 1810 is used to trigger the electronic device 1702 to reject the video call request of the electronic device 1701.
  • the electronic device 1702 can receive a user's click operation on the "receive" button 1809 (such as a touch click operation or an operation through a remote control device), and establish a video call connection with the electronic device 1701. After the connection is established, the electronic device 1701 and the electronic device 1702 serve as the two parties of the video call.
  • the electronic device 1701 and the electronic device 1702 can each use their own camera to collect images with a fixed field of view as scene images. Each scene image is cropped, scaled, and encoded frame by frame and then sent to the opposite end, which displays it.
  • the electronic device 1701 and the electronic device 1702 can display the cropped video image of the local end while displaying the video image cropped by the opposite end.
  • when the electronic device 1701 sends a video image to the electronic device 1702, the electronic device 1701 is the sending-end device and the electronic device 1702 is the receiving-end device; when the electronic device 1702 sends a video image to the electronic device 1701, the electronic device 1702 is the sending-end device and the electronic device 1701 is the receiving-end device.
  • the specific process of video image transmission between electronic devices can refer to the workflow of the system architecture shown in FIG. 2.
  • for the first X frames of video images (for example, X equals 120), the electronic device 1701 and the electronic device 1702 may directly scale the original image down to an image with the display specification of the opposite end, encode it, and send it to the opposite end.
  • the electronic device 1701 and the electronic device 1702 may process the video image of the i-th frame (i is greater than 120) according to the video image processing method provided in the embodiment of the present application.
  • the video image of the fixed field of view collected by the camera of the electronic device 1701 is shown in (a) of FIG. 19. The electronic device 1701 determines the main character according to the video image processing method provided in the embodiment of the present application, and the image cropped and scaled to the display specification of the electronic device 1702 is shown in (b) of FIG. 19.
  • the electronic device 1701 encodes the image shown in (b) of FIG. 19 and transmits it to the electronic device 1702.
  • the video image of the fixed field of view collected by the camera of the electronic device 1702 is shown in (a) of FIG. 19A.
  • the electronic device 1702 determines the main character according to the video image processing method provided by the embodiment of the application, and the image cropped and scaled to the display specification of the electronic device 1701 is shown in (b) of FIG. 19A. The electronic device 1702 encodes the image shown in (b) of FIG. 19A and transmits it to the electronic device 1701.
  • the display interface of the electronic device 1701 and the electronic device 1702 is shown in FIG. 19B.
  • the large image on the main interface of each of the electronic device 1701 and the electronic device 1702 shows the image collected, cropped, and scaled by the opposite end.
  • the small image shows the locally collected image after the main character has been determined according to the video image processing method provided in this embodiment of the application and the image has been cropped and scaled to the device's own display specification. It should be noted that when an electronic device displays the image collected by the local end, it can display either the original image collected by the local end or the image in which the main character has been determined according to the video image processing method provided in the embodiment of this application and which has been cropped and scaled to the device's own display specification.
  • the image that the electronic device 1701 obtains by determining the main character according to the video image processing method provided in this embodiment of the application, and then cropping and scaling to the display specification of the electronic device 1702, is shown in (b) of FIG. 20.
  • the electronic device 1701 encodes the image shown in (b) of FIG. 20 and transmits it to the electronic device 1702. Meanwhile, at this moment, it is assumed that the position of the character in the collection scene of the electronic device 1702 is the same as that illustrated in FIG.
  • FIG. 20A the display interface of the electronic device 1701 and the electronic device 1702 is shown in FIG. 20A.
  • the large image on the main interface of each of the electronic device 1701 and the electronic device 1702 shows the image collected, cropped, and scaled by the opposite end, and the small images are processed according to the video image processing method provided by this embodiment of the application to determine the main character to be cropped.
  • the characters increase.
  • the video image of the fixed field of view collected by the camera of the electronic device 1701 is shown in (a) of FIG. 21. The electronic device 1701 determines the main character according to the video image processing method provided in the embodiment of the present application, and the image cropped and scaled to the display specification of the electronic device 1702 is shown in (b) of FIG. 21.
  • the electronic device 1701 encodes the image shown in (b) of FIG. 21 and transmits it to the electronic device 1702.
  • relative to FIG. 19A, the position of the character in the collection scene of the electronic device 1702 changes.
  • the video image of the fixed field of view collected by the camera of the electronic device 1702 is shown in (a) of FIG. 21A. The electronic device 1702 determines the main character according to the video image processing method provided by the embodiments of the present application, and the image cropped and scaled to the display specification of the electronic device 1701 is shown in (b) of FIG. 21A. The electronic device 1702 encodes the image shown in (b) of FIG. 21A and sends it to the electronic device 1701.
  • the display interface of the electronic device 1701 and the electronic device 1702 is shown in FIG. 21B.
  • the large image on the main interface of each of the electronic device 1701 and the electronic device 1702 shows the image collected, cropped, and scaled by the opposite end.
  • the small image shows the locally collected image after the main character has been determined according to the video image processing method provided in this embodiment of the application and the image has been cropped and scaled to the device's own display specification.
  • the image that the electronic device 1701 obtains by determining the main character according to the video image processing method provided by the embodiment of the present application, and then cropping and scaling to the display specification of the electronic device 1702, is shown in (b) of FIG. 22.
  • the electronic device 1701 encodes the image shown in (b) of FIG. 22 and transmits it to the electronic device 1702.
  • the display interface of the electronic device 1701 and the electronic device 1702 is shown in FIG. 22A.
  • the large image on the main interface of each of the electronic device 1701 and the electronic device 1702 shows the image collected, cropped, and scaled by the opposite end, and the small images are processed according to the video image processing method provided in this embodiment of the application to determine the main character to be cropped.
  • the following takes a monitoring scenario as an example to describe in detail the video image processing method provided in the embodiment of the present application.
  • the monitoring system includes a camera 1, a server 2, and a display device 3.
  • the camera 1 is used to collect video images with a fixed field of view, and the server 2 is used to process the video images collected by the camera 1 through the video image processing method provided in the embodiments of this application.
  • the processed video images can be displayed in real time on the display device 3.
  • the processed video images may also be stored in a storage device in the server 2; when the server 2 receives a reading instruction, it reads the processed video image from the storage device and displays it on the display device 3.
  • the video image of a fixed field of view collected by the camera 1 is shown in (a) of FIG. 23, and the camera 1 sends the collected image to the server 2.
  • the server 2 determines the main character according to the video image processing method provided by the embodiment of the present application, and the image cropped and scaled to the display specification of the display device 3 is shown in (b) of FIG. 23.
  • the server 2 displays the image shown in (b) in FIG. 23 through the display device 3 in real time.
  • the server 2 stores the image shown in (b) in FIG. 23 in the storage device in the server 2.
  • when the server 2 receives the instruction to read the video image, it reads the video image from the storage device and displays it on the display device 3.
  • the position of the person in the collection scene changes.
  • the fixed field of view video image collected by the camera 1 is shown in (a) of FIG. 24, and the camera 1 sends the collected image to the server 2.
  • the server 2 determines the main character according to the video image processing method provided by the embodiment of the present application, and the image cropped and scaled to the display specification of the display device 3 is shown in (b) of FIG. 24.
  • the server 2 displays the image shown in (b) in FIG. 24 through the display device 3 in real time.
  • the server 2 stores the image shown in (b) in FIG. 24 in the storage device in the server 2.
  • when the server 2 receives the instruction to read the video image, it reads the video image from the storage device and displays it on the display device 3.
  • the server 2 determines the main character according to the video image processing method provided by the embodiment of the present application, and the image cropped and scaled to the display specification of the display device 3 is shown in (b) of FIG. 25.
  • the server 2 displays the image shown in (b) in FIG. 25 through the display device 3 in real time.
  • the server 2 stores the image shown in (b) in FIG. 25 in the storage device in the server 2.
  • when the server 2 receives the instruction to read the video image, it reads the video image from the storage device and displays it on the display device 3.
  • the characters in the collection scene increase and their positions change.
  • the fixed field of view video image captured by the camera 1 is shown in (a) of FIG. 26, and the camera 1 sends the captured image to the server 2.
  • the server 2 determines the main character according to the video image processing method provided by the embodiment of the present application, and the image cropped and scaled to the display specification of the display device 3 is shown in (b) of FIG. 26.
  • the server 2 displays the image shown in (b) in FIG. 26 through the display device 3 in real time.
  • the server 2 stores the image shown in (b) in FIG. 26 in the storage device in the server 2.
  • when the server 2 receives the instruction to read the video image, it reads the video image from the storage device and displays it on the display device 3.
  • an electronic device includes hardware structures and/or software modules corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the electronic device into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • a video image processing apparatus 270 provided in this embodiment of the present application is used to implement the function of the electronic device in the foregoing method.
  • the video image processing device 270 may be an electronic device, a device in an electronic device, or a device that can be matched and used with an electronic device.
  • the video image processing device 270 may be a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the video image processing device 270 may include: an obtaining unit 2701, a determining unit 2702, a cropping unit 2703, and a scaling unit 2704.
  • the obtaining unit 2701 is used to perform S901 and S901a in FIG. 9 or FIG. 16, the determining unit 2702 is used to perform S902 in FIG. 9 or FIG. 16, the cropping unit 2703 is used to perform S903 in FIG. 9 or FIG. 16, and the scaling unit 2704 is used to perform S904 in FIG. 9 or FIG. 16.
  • the video image processing device 270 may further include a display unit 2705 for performing S905 in FIG. 16.
  • the video image processing device 280 is used to implement the function of the electronic device in the above method.
  • the video image processing device 280 may be an electronic device, a device in an electronic device, or a device that can be matched and used with an electronic device. Wherein, the video image processing device 280 may be a chip system.
  • the video image processing device 280 includes at least one processing module 2801, which is configured to implement the function of the electronic device in the method provided in the embodiment of the present application. Exemplarily, the processing module 2801 may be used to execute the processes S901, S901a, S902, S903, and S904 in FIG. 9 or FIG. 16. For details, please refer to the detailed description in the method example, which will not be repeated here.
  • the video image processing device 280 may also include at least one storage module 2802 for storing program instructions and/or data.
  • the storage module 2802 and the processing module 2801 are coupled.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processing module 2801 may cooperate with the storage module 2802.
  • the processing module 2801 may execute program instructions stored in the storage module 2802. At least one of the at least one storage module may be included in the processing module.
  • the video image processing device 280 may further include a communication module 2803 for communicating with other devices through a transmission medium, so that the video image processing device 280 can communicate with other devices.
  • the video image processing device 280 may further include a display module 2804, which may be used to perform the process S905 in FIG. 16.
  • when the processing module 2801 is a processor, the storage module 2802 is a memory, and the display module 2804 is a display screen, the video image processing apparatus 280 involved in FIG. 28 in the embodiment of the present application may be the electronic device shown in FIG. 8.
  • the video image processing device 270 or the video image processing device 280 provided in the embodiments of the present application can be used to implement the functions of the electronic equipment in the methods implemented by the various embodiments of the present application.
  • the computer-readable storage medium may include computer software instructions.
  • when the computer software instructions run on an electronic device, the electronic device executes the steps performed by the electronic device in the embodiment shown in FIG. 9 or FIG. 16.
  • other embodiments of the present application also provide a computer program product; when the computer program product runs on a computer, it causes the computer to execute each step performed by the electronic device in the embodiment shown in FIG. 9 or FIG. 16.
  • the electronic device includes a display screen and a camera.
  • the chip system includes an interface circuit and a processor; the interface circuit and the processor are interconnected by wires; the interface circuit is used to receive signals from the memory of the electronic device and send signals to the processor.
  • the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the chip system executes each step performed by the electronic device in the embodiment shown in FIG. 9 or FIG. 16.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate.
  • the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Abstract

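Embodiments of this application disclose a video image processing method and apparatus, relating to the field of image processing, that keep the displayed picture continuously following the person during video capture and display. The specific solution is: obtain the identity information and position information of each person in the i-th frame of video image; determine M main characters from the i-th frame according to the identity information of the persons in the N video image frames preceding the i-th frame, where the identity information of the persons in the N video image frames includes the identity information of the M main characters; crop the i-th frame according to the position information of the main characters, so that the cropped i-th frame includes the M main characters; and shrink or enlarge the cropped i-th frame so that the display screen displays it according to a preset display specification.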

Description

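A video image processing method and apparatus
This application claims priority to Chinese patent application No. 201910819774.X, entitled "A video image processing method and apparatus", filed with the China National Intellectual Property Administration on August 31, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of image processing, and in particular to a video image processing method and apparatus.
Background
With the rapid development of imaging technology, users expect more from the display of video pictures, for example during video calls and in surveillance scenarios. In a conventional video capture and display process, a capture device captures video images, crops and scales them according to the display specification, and then encodes and sends them to a display device for display.
Capture and display are usually implemented on a fixed hardware platform, with a capture camera that captures video images with a fixed field of view. When a person at the capture end changes position, the camera is unaware of the person, so the picture at the display end keeps showing the fixed field of view. The picture does not follow the person, and the user experience is poor.
For this reason, the industry has applied person-sensing technology to the image capture and display process. The specific solution is as follows: the camera captures at high resolution with a fixed field of view; person detection and tracking is performed on the captured video images to locate the person in real time; when the person moves, the high-resolution video image is cropped and scaled according to the person's position after the move, producing a low-resolution image that fits the display specification and in which the person is located in a specific region of the image. The displayed picture is thus adjusted in real time according to the person's position, so that the picture follows the person.
However, when the environment of the capture-end device is complex (for example, the background is cluttered or other people frequently enter and leave the picture), this method may produce false detections or missed detections, so that the person position located in some frames is inaccurate. The cropped and scaled low-resolution image then fails to show the person, or shows the person incompletely, and the presented picture of the main character is discontinuous.
Summary
This application provides a video image processing method and apparatus that keep the displayed picture continuously following the person during a video call.
To achieve the foregoing objective, this application adopts the following technical solutions: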
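In a first aspect, a video image processing method is provided. The method may include: obtaining identity information and position information of each person in an i-th frame of video image, where i is greater than 1; determining M main characters from the i-th frame according to the identity information of the persons in the N video image frames preceding the i-th frame, where M and N are greater than or equal to 1; cropping the i-th frame according to the position information of the main characters, where the cropped i-th frame includes the M main characters; and shrinking or enlarging the cropped i-th frame so that a display screen displays the cropped i-th frame according to a preset display specification.
With the video image processing method provided in this application, the main characters of a video image are determined by combining the identity information of the persons in the current frame with that of the persons in the N preceding video image frames. This greatly improves the accuracy of person sensing and, correspondingly, the accuracy of the determined main-character positions. The low-resolution image cropped and scaled around the main characters can therefore show the main characters completely, the presented picture of the main characters stays continuous, and the picture continuously follows the person during image capture and display by means of software.
The identity information of a person uniquely indicates the same person across different frames. It may be identification information of the person obtained through a detection and tracking algorithm; that is, each person has distinct feature information.
The i-th frame of video image is any frame in the video stream, and i is less than or equal to the total number of frames in the video stream. When the method provided in this application is performed, it is applied to every frame in the video stream, which ensures that each cropped frame shows the main characters completely. This is not repeated below.
Optionally, the N video image frames preceding the i-th frame may be the N frames that immediately precede the i-th frame in the video stream, N preceding frames that are not contiguous with the i-th frame, or the video image frames within a preset time period in the video stream.
The identity information of the persons in the N video image frames includes the identity information of the M main characters; that is, the M main characters have appeared in the preceding N video image frames. Specifically, whether a person appears in a video image is identified by the person's identity information.
With reference to the first aspect, in a possible implementation, determining M main characters from the i-th frame according to the identity information of the persons in the N preceding video image frames may include: determining the M main characters from the i-th frame according to the identity information of each person in the i-th frame and the identity information of the persons in the N video image frames preceding the i-th frame.
In a possible implementation, a person who has appeared in the preceding N video image frames, appears in the i-th frame, and meets a preset condition may be determined as a main character. The preset condition may be configured according to the actual situation, which is not limited in this application.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, determining M main characters from the i-th frame according to the identity information of the persons in the N preceding video image frames may be specifically implemented as: determining, as the M main characters, the persons who appear in the i-th frame and whose number of appearance frames in the N video image frames is greater than or equal to a first preset threshold. Determining main characters by accumulated frame count prevents people who are not taking part in the video call from interfering with person recognition as they enter and leave the picture, which improves recognition accuracy.
Specifically, determining whether a person in the i-th frame is a main character may include: counting the person's cumulative number of appearance frames in the N video image frames, and determining the person as a main character if that number is greater than or equal to the first preset threshold. Whether the person appears in a video image frame may be determined by whether that frame contains a person with the same identity information.
The cumulative number of appearance frames of a person is the number of consecutive video image frames, among the N video image frames preceding the i-th frame, in which the person appears. The consecutive video image frames may include S frames in which the person does not appear, where S is greater than or equal to 0 and less than or equal to a preset number of frames.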
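The counting rule above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the names `cumulative_appearances`, `select_main_characters`, and `max_gap` are hypothetical, identity information is assumed to be an integer ID per person, and the sketch assumes that only frames in which the person appears are counted, while a run of at most `max_gap` missing frames (the S frames in the text) does not break the sequence.

```python
from typing import List, Set


def cumulative_appearances(history: List[Set[int]], person_id: int, max_gap: int) -> int:
    """Count frames containing person_id, walking back from the newest history frame.

    history holds the set of identity IDs seen in each of the N preceding frames,
    oldest first. Counting stops once more than max_gap consecutive frames lack
    the person (assumed reading of the S-frame tolerance).
    """
    count = 0
    gap = 0
    for ids in reversed(history):
        if person_id in ids:
            count += 1
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                break
    return count


def select_main_characters(history: List[Set[int]], current_ids: Set[int],
                           threshold: int, max_gap: int = 0) -> Set[int]:
    """Keep persons present in the current frame whose count reaches the threshold."""
    return {pid for pid in current_ids
            if cumulative_appearances(history, pid, max_gap) >= threshold}
```

For example, with four history frames `[{1, 2}, {1}, {1, 2}, {1, 3}]`, a threshold of 3, and `max_gap=1`, only person 1 qualifies; a passer-by such as person 3, seen in a single frame, is filtered out.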
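With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, the video image processing method provided in this application may further include: dividing the i-th frame of video image into Y regions, and configuring a preset threshold for each region, where the preset threshold corresponding to the k-th region is the k-th preset threshold, the k-th region is any one of the Y regions, Y is greater than or equal to 2, and k is greater than or equal to 1 and less than or equal to Y. The preset thresholds corresponding to different regions may differ. Correspondingly, determining M main characters from the i-th frame according to the identity information of the persons in the N preceding video image frames is specifically implemented as: determining, as the M main characters, the persons who appear in the i-th frame and whose number of appearance frames in the N video image frames is greater than or equal to the preset threshold corresponding to the region in which they are located. Configuring different preset thresholds for different regions improves the accuracy of determining the main characters and thus the accuracy of person recognition.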
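A per-region threshold lookup might look like the sketch below. The text does not say how the Y regions are laid out or how a person is assigned to one, so this example assumes equal-width vertical strips and assigns a person by the horizontal center of their bounding box; `region_index` and `is_main_character` are hypothetical names.

```python
from typing import Sequence


def region_index(center_x: float, frame_width: int, num_regions: int) -> int:
    """Map a horizontal position to one of num_regions equal-width strips (0-based)."""
    idx = int(center_x * num_regions / frame_width)
    # Clamp so that a position on the right edge still falls in the last strip.
    return min(max(idx, 0), num_regions - 1)


def is_main_character(appearances: int, center_x: float, frame_width: int,
                      region_thresholds: Sequence[int]) -> bool:
    """Compare the appearance count with the threshold of the person's region."""
    k = region_index(center_x, frame_width, len(region_thresholds))
    return appearances >= region_thresholds[k]
```

With thresholds `[60, 30, 60]` on a 1920-pixel-wide frame, a person in the central strip qualifies after 30 frames, while a person near either edge needs 60, which is one way to be stricter about people walking in from the sides.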
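With reference to the first aspect, in a possible implementation, the method further includes: obtaining person information of each person in the i-th frame of video image, where the person information may include one or more of the following: speaking information and priority information. Correspondingly, determining M main characters from the i-th frame according to the identity information of the persons in the N preceding video image frames may be specifically implemented as: determining, as the M main characters, the persons who appear in the i-th frame and whose number of speaking frames in the N video image frames is greater than or equal to a second preset threshold; or determining, as the M main characters, the persons who appear in the i-th frame and whose priority information in the N video image frames is greater than a third preset threshold; or, from the persons who appear in the i-th frame and whose number of speaking frames in the N video image frames is greater than or equal to the second preset threshold, selecting the M most important persons according to the priority information as the M main characters.
The speaking information indicates whether a person in the video image is speaking. It may be obtained by combining audio processing with the mouth shape of the person in the video image, or directly from the mouth shape of the person in the video image.
The priority information indicates the importance of a person in the video image. Priority information for the different people who use the device may be configured in advance in association with their identity information; then, when each frame of video image is processed and a person's identity information is obtained, the preconfigured priority information is looked up to obtain that person's priority information. Alternatively, priority information entered by the user for different persons in the video image may be received.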
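The third option above (filter by speaking frames, then keep the M most important persons) can be sketched as follows. The function name and the dictionary-based inputs are illustrative assumptions, as is the convention that a larger priority value means a more important person.

```python
from typing import Dict, List, Set


def select_by_speech_and_priority(speaking_counts: Dict[int, int],
                                  priorities: Dict[int, int],
                                  current_ids: Set[int],
                                  speak_threshold: int,
                                  m: int) -> List[int]:
    """Return up to m person IDs, most important first.

    speaking_counts: number of the N preceding frames in which each person spoke.
    priorities: preconfigured priority per identity (assumed: larger is more important).
    """
    candidates = [pid for pid in current_ids
                  if speaking_counts.get(pid, 0) >= speak_threshold]
    # Sort by descending priority; break ties by ID so the result is deterministic.
    candidates.sort(key=lambda pid: (-priorities.get(pid, 0), pid))
    return candidates[:m]
```

Persons who spoke often but are absent from the current frame are never selected, which matches the requirement that a main character must appear in the i-th frame.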
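With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, the video image processing method provided in this application may further include: receiving priority information entered by a user. This allows the user to configure person priority levels in real time and improves person recognition accuracy.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, cropping the i-th frame of video image according to the position information of the main characters may be specifically implemented as: determining a crop box that contains the minimum bounding rectangle of the M main characters, and cropping the i-th frame with the determined crop box.
The crop box may be the minimum bounding rectangle of the M main characters plus a crop margin, and the crop margin may be greater than or equal to 0.
It should be noted that "the crop box contains the minimum bounding rectangle of the M main characters" can be understood as: the determined crop box contains the minimum bounding rectangle of the M main characters as completely as possible.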
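As a concrete illustration of "minimum bounding rectangle plus crop margin", the sketch below computes a crop box from per-person bounding boxes. The `(left, top, right, bottom)` pixel convention, the function names, and the clamping to the frame borders are assumptions made for the example; clamping is one way to read "as completely as possible" when the margin would extend past the image.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def min_bounding_box(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned rectangle that encloses every input box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one main character is required")
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def crop_box(boxes: Iterable[Box], margin: int, frame_w: int, frame_h: int) -> Box:
    """Minimum bounding rectangle grown by margin on each side, clamped to the frame."""
    left, top, right, bottom = min_bounding_box(boxes)
    return (max(left - margin, 0), max(top - margin, 0),
            min(right + margin, frame_w), min(bottom + margin, frame_h))
```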
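With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, determining the crop box may be specifically implemented as: obtaining the distance between the center point of a candidate crop box and the center point of the crop box of the previous frame of video image, where the candidate crop box includes the minimum bounding rectangle of the M main characters; and, if the distance is greater than or equal to a distance threshold, enlarging the candidate crop box until the distance between its center point and the center point of the previous frame's crop box is less than a preset threshold, and using the enlarged candidate crop box as the determined crop box.
The candidate crop box may be the minimum bounding rectangle of the M main characters plus a crop margin, and the crop margin may be greater than or equal to 0.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, determining the crop box may be specifically implemented as: obtaining the distance between the center point of a first candidate crop box and the center point of the crop box of the previous frame of video image, where the first candidate crop box includes the minimum bounding rectangle of the M main characters; if the distance is greater than or equal to a distance threshold, determining a second crop box, where the center point of the second crop box is the center point of the previous frame's crop box plus an offset, and the second crop box has the same size as the previous frame's crop box; if the second crop box contains the minimum bounding rectangle of the M main characters, using a third crop box as the crop box, where the third crop box is the second crop box, or the third crop box is the second crop box shrunk to a crop box that still contains the minimum bounding rectangle; and, if the second crop box does not completely contain the minimum bounding rectangle, enlarging the second crop box until it contains the minimum bounding rectangle and using the enlarged second crop box as the crop box.
The offset may be a preset value, or the distance between the center point of the first candidate crop box and the center point of the previous frame's crop box multiplied by a weighting value, or another value.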
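The second-crop-box procedure can be sketched as below. Several choices here are assumptions rather than details from the text: the first candidate crop box is taken to be the minimum bounding rectangle itself (margin 0), the offset is the weighted center displacement (`weight` is a hypothetical parameter), the candidate is used unchanged when the distance is below the threshold, the third crop box is simply the second crop box, and "enlarging" is implemented as the union of the two rectangles.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def smooth_crop_box(prev: Box, candidate: Box,
                    dist_threshold: float, weight: float = 0.5) -> Box:
    """Limit how far the crop box jumps between consecutive frames."""
    pcx, pcy = center(prev)
    ccx, ccy = center(candidate)
    dx, dy = ccx - pcx, ccy - pcy
    if math.hypot(dx, dy) < dist_threshold:
        return candidate  # small movement: assumed to use the candidate directly
    # Second crop box: previous size, center moved part of the way to the candidate.
    half_w = (prev[2] - prev[0]) / 2
    half_h = (prev[3] - prev[1]) / 2
    ncx, ncy = pcx + weight * dx, pcy + weight * dy
    second = (ncx - half_w, ncy - half_h, ncx + half_w, ncy + half_h)
    if contains(second, candidate):
        return second
    # Otherwise enlarge the second box so that it covers the bounding rectangle.
    return (min(second[0], candidate[0]), min(second[1], candidate[1]),
            max(second[2], candidate[2]), max(second[3], candidate[3]))
```

Moving the center only part of the way each frame spreads a sudden jump in the detected position over several frames, which is the continuity effect the method aims for.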
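With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, when the person information includes priority information, the candidate crop box or the first candidate crop box may be a bounding rectangle that is centered on the highest-priority person among the M main characters and contains the M main characters, plus a crop margin.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, when the person information includes speaking information, the candidate crop box or the first candidate crop box may be a bounding rectangle that is centered on the speaking person among the M main characters and contains the M main characters, plus a crop margin.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, the video image processing method provided in this application may further include: displaying the cropped i-th frame of video image according to the preset display specification. The preset display specification may be a specification that fits the display screen, or a preset proportion of the display screen.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, the video image processing method provided in this application may further include: saving at least one of the following for each person in the i-th frame of video image: identity information, position information, and person information.
With reference to the first aspect and any of the foregoing possible implementations, in another possible implementation, the video image processing method provided in this application may further include: obtaining a j-th frame of video image, where j is less than or equal to X and X is greater than 1; obtaining and saving the identity information and position information of each person in the j-th frame; and directly scaling the j-th frame down to an image of the preset display specification.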
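The handling of the first X frames can be pictured as a small router that always records detections but only enables cropping once enough history exists. `FrameRouter`, its method names, and the string labels are hypothetical; frame indices are assumed to be 1-based.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


class FrameRouter:
    """Decide, per frame, between direct scaling and crop-then-scale."""

    def __init__(self, warmup_frames: int):
        self.warmup_frames = warmup_frames        # X in the text
        self.history: List[Dict[int, Box]] = []   # identity ID -> position, per frame

    def route(self, frame_index: int, detections: Dict[int, Box]) -> str:
        # Identity and position information is saved for every frame so that
        # later frames can consult it as their preceding N frames.
        self.history.append(dict(detections))
        if frame_index <= self.warmup_frames:
            return "scale_only"       # j <= X: scale directly to the display specification
        return "crop_then_scale"      # i > X: determine main characters, crop, then scale
```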
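With reference to the first aspect or any of the foregoing possible implementations, in another possible implementation, the video image processing method provided in this application is applied to a sending-end device in a video call, and the method may further include: sending the shrunk or enlarged i-th frame of video image to a receiving-end device.
In a second aspect, this application provides a video image processing apparatus. The apparatus may be an electronic device, an apparatus or chip system in an electronic device, or an apparatus that can be used together with an electronic device. The video image processing apparatus can implement the functions performed in the foregoing aspects or possible designs, and the functions may be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions. For example, the video image processing apparatus may include an obtaining unit, a determining unit, a cropping unit, and a scaling unit.
The obtaining unit is configured to obtain identity information and position information of each person in an i-th frame of video image, where i is greater than 1. The determining unit is configured to determine M main characters from the i-th frame according to the identity information of the persons in the N video image frames preceding the i-th frame, where M and N are greater than or equal to 1, and the identity information of the persons in the N video image frames includes the identity information of the M main characters. The cropping unit is configured to crop the i-th frame according to the position information of the main characters, where the cropped i-th frame includes the M main characters. The scaling unit is configured to shrink or enlarge the cropped i-th frame so that a display screen displays the cropped i-th frame according to a preset display specification.
It should be noted that the video image processing apparatus provided in the second aspect is configured to perform the video image processing method provided in the first aspect; for its specific implementation, refer to the specific implementation of the first aspect.
In a third aspect, an embodiment of this application provides an electronic device. The electronic device may include a processor and a memory that are coupled to each other. The memory may be used to store computer program code, and the computer program code includes computer instructions. When the computer instructions are executed by the electronic device, the electronic device performs the video image processing method according to the first aspect or any of its possible implementations.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium may include computer software instructions; when the computer software instructions run on an electronic device, the electronic device performs the video image processing method according to the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the computer performs the video image processing method according to the first aspect or any one of its possible implementations.
In a sixth aspect, an embodiment of this application provides a chip system applied to an electronic device. The chip system includes an interface circuit and a processor that are interconnected by lines. The interface circuit is configured to receive signals from a memory of the electronic device and send the signals to the processor, where the signals include computer instructions stored in the memory. When the processor executes the computer instructions, the chip system performs the video image processing method according to the first aspect or any one of its possible implementations.
In a seventh aspect, an embodiment of this application provides a graphical user interface (GUI). The graphical user interface is stored in an electronic device that includes a display, a memory, and one or more processors, where the one or more processors are configured to execute one or more computer programs stored in the memory. The graphical user interface includes a GUI displayed on the display. The GUI includes a video picture, and the video picture includes the i-th frame of video image processed according to the first aspect or any of its possible implementations. The video picture is transmitted to the electronic device by another electronic device (referred to, for example, as a second electronic device), and the second electronic device includes a display screen and a camera.
It should be understood that descriptions of technical features, technical solutions, beneficial effects, or similar language in this application do not imply that all the features and advantages can be achieved in any single embodiment. Rather, a description of a feature or beneficial effect means that at least one embodiment includes the specific technical feature, technical solution, or beneficial effect. Therefore, descriptions of technical features, technical solutions, or beneficial effects in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions, and beneficial effects described in the embodiments may be combined in any appropriate manner. Those skilled in the art will understand that an embodiment can be implemented without one or more of the specific technical features, technical solutions, or beneficial effects of a particular embodiment. In other embodiments, additional technical features and beneficial effects may be identified in specific embodiments that do not reflect all the embodiments.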
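Brief Description of the Drawings
FIG. 1 is a schematic diagram of a video scenario according to an embodiment of this application;
FIG. 2 is a schematic diagram of a system architecture of a video call scenario according to an embodiment of this application;
FIG. 3 is a schematic diagram of a video image according to an embodiment of this application;
FIG. 4 is a schematic diagram of video image processing according to an embodiment of this application;
FIG. 5 is a schematic diagram of a video image processing result according to an embodiment of this application;
FIG. 6 is a schematic diagram of another video image processing result according to an embodiment of this application;
FIG. 7 is a schematic diagram of a system architecture of a video surveillance scenario according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 9 is a schematic flowchart of a video image processing method according to an embodiment of this application;
FIG. 10 is a schematic diagram of a video call interface according to an embodiment of this application;
FIG. 11 is a schematic diagram of another video call interface according to an embodiment of this application;
FIG. 12 is a schematic diagram of still another video call interface according to an embodiment of this application;
FIG. 13 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 14 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 15 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 16 is a schematic flowchart of another video image processing method according to an embodiment of this application;
FIG. 17A is a schematic diagram of still another video call interface according to an embodiment of this application;
FIG. 17B is a schematic diagram of still another video call interface according to an embodiment of this application;
FIG. 18A is a schematic diagram of still another video call interface according to an embodiment of this application;
FIG. 18B is a schematic diagram of still another video call interface according to an embodiment of this application;
FIG. 18C is a schematic diagram of still another video call interface according to an embodiment of this application;
FIG. 19 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 19A is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 19B is a display diagram of still another video call interface according to an embodiment of this application;
FIG. 20 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 20A is a display diagram of still another video call interface according to an embodiment of this application;
FIG. 21 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 21A is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 21B is a display diagram of still another video call interface according to an embodiment of this application;
FIG. 22 is a schematic diagram of still another video image processing according to an embodiment of this application;
FIG. 22A is a display diagram of still another video call interface according to an embodiment of this application;
FIG. 23 is a schematic diagram of video image processing in a surveillance scenario according to an embodiment of this application;
FIG. 24 is a schematic diagram of another video image processing in a surveillance scenario according to an embodiment of this application;
FIG. 25 is a schematic diagram of still another video image processing in a surveillance scenario according to an embodiment of this application;
FIG. 26 is a schematic diagram of still another video image processing in a surveillance scenario according to an embodiment of this application;
FIG. 27 is a schematic structural diagram of a video image processing apparatus according to an embodiment of this application;
FIG. 28 is a schematic structural diagram of another video image processing apparatus according to an embodiment of this application.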
Detailed Description
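The terms "first", "second", "third", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between different objects, not to define a particular order.
In the embodiments of this application, words such as "exemplary" or "for example" are used to give an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present a related concept in a concrete manner.
For ease of understanding, the terms used in this application are explained first.
A video stream may refer to the data transmitted in a video service, that is, a dynamic, continuous sequence of images in a video call, a video conference, or a surveillance scenario.
A video image may refer to a static picture; each frame of image in a video stream is called a video image.
A person may refer to a moving or stationary human in a video image. The application scenarios of this application are applicable not only to moving or stationary humans in a video image but also to other subjects in a video image, such as moving or stationary animals or other objects. The following description uses persons in a video image as an example, which should not be construed as limiting the application scenarios.
Identity information may refer to the feature identifier of each person recognized in a video image by a human detection and tracking algorithm. It uniquely identifies the same person across different frames so as to distinguish individual persons. Identity information may include, but is not limited to, appearance information, annotation information, or other recognized feature information. It may be expressed as text, a sequence number, a person number, or other information related to individual features.
Position information may be used to indicate the relative position or region of a person within a video image. It may take the form of the pixel positions of one or more points of the person, the pixel positions of the person's outline, the pixel positions of the region in which the person is located, and so on. A pixel position may be indicated by pixel coordinates or in another way. Position information indicates the relative position of a person in the video image and is not tied to a specific physical location.
Person information may refer to additional information about each person in a video image, obtained through a recognition algorithm or a labeling algorithm, that helps with person recognition and with determining the main characters. The person information may include, but is not limited to, one or more of the following: whether the person is speaking, the person's priority, and so on.
At present, the industry has two solutions for making the picture follow the person during video capture and display.
One is a hardware solution. It uses a camera mounted on a pan-tilt head, assisted by an additional person-locating device (for example, one that locates the speaker by voice), to locate the person and then controls the pan-tilt head to point the camera toward the speaker for capture. The pan-tilt camera hardware is bulky and costly, which hinders large-scale adoption.
The other is a software solution. The camera captures at high resolution with a fixed field of view, a person detection and tracking algorithm locates the person in real time, and the high-resolution image is then cropped and shrunk or enlarged (scaled) according to the located position to obtain a low-resolution image of the specified size. However, the software solution may suffer from false detections and missed detections. If the image is cropped directly after locating, person sensing is not accurate enough, and the continuity of the final displayed picture is hard to guarantee.
For this reason, an embodiment of this application provides a video image processing method that uses software to make the presented picture of the main characters continuously follow the person. The method can be applied to an electronic device. In the method provided in this embodiment, after the video image is processed to locate the persons, the main characters are determined by combining the person identity information of the current frame and of historical frames, and the captured current frame is cropped and scaled around the main characters. This greatly improves the accuracy of person sensing and, correspondingly, the accuracy of the determined main-character positions. The low-resolution image cropped and scaled around the main characters can therefore show the main characters completely, the presented picture of the main characters stays continuous, and the picture continuously follows the person during image capture and display by means of software.
The implementations of the embodiments of this application are described in detail below with reference to the accompanying drawings.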
本申请实施例提供的视频图像处理方法可以应用于电子设备的视频图像采集显示过程。该图像采集显示过程可以是视频通话(视频会议)场景或视频监控场景中或者其他。示例性的,视频图像采集显示过程为视频通话场景时,如图1所示,用户A使用电子设备1,用户B使用电子设备2,用户A与用户B进行视频通话。
图2为本申请实施例提供的一种上述视频图像处理方法应用于视频通话场景中的系统架构示意图。如图2所示,该系统架构可以包括发送端设备201及接收端设备202。
具体的,发送端设备201可以作为视频通话的一端,与接收端设备202进行通话。例如,一个或多个用户1可通过发送端设备201与接收端设备202的一个或多个用户2进行通话。
其中,本实施例中的通话可以是指视频通话,或视频会议。因此,发送端设备201至少包括摄像头和显示屏,接收端设备202也至少包括摄像头和显示屏。另外,发送端设备201、接收端设备202还可以包括听筒(或喇叭),话筒等。摄像头可用于采集通话过程中的视频图像。显示屏可用于显示通话过程中的图像。听筒(或喇叭)用于播放通话过程中的语音。话筒用于采集通话过程中的语音。
具体的,如图2所示,发送端设备201包括视频采集器2011、视频前处理器2012、视频编码器2013、发送器2014。接收端设备202包括视频显示器2021、视频后处理器2022、视频解码器2023、接收器2024。
其中,图2示意的系统架构的工作流程为:发送端设备201中的视频采集器2011对视频通话中的视频图像逐帧进行视频图像采集,将采集到的视频图像传给视频前处理器2012进行相应地预处理(包括但不限于:人物识别、裁剪、缩放等),然后经视频编码器2013进行编码后传给发送器2014,发送器2014将编码后的视频图像通过有线或无线介质发送给接收端设备202的接收器2024,接收器2024将接收到的视频图像传给视频解码器2023进行解码,解码后的视频图像经视频后处理器2022的处理后传给视频显示器2021进行显示。
示例性的,本申请实施例中所述的电子设备可以是电视机、手机、平板电脑、桌面型、膝上型、手持计算机、笔记本电脑(如华为笔记本电脑)、台式电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)\虚拟现实(virtual reality,VR)设备等包括或连接有显示屏和摄像头的设备,本申请实施例对该设备的具体形态不作特殊限制。
另外,在一些实施例中,上述发送端设备201、接收端设备202可以为相同类型 的电子设备,如发送端设备201、接收端设备202均为电视机。在其他一些实施例中,上述发送端设备201、接收端设备202可以为不同类型的电子设备,如发送端设备201为电视机,接收端设备202为笔记本电脑。此处结合具体示例,对视频通话或视频会议中的视频图像传输过程进行示例说明。
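For example, in the scenario shown in Figure 1, assume that electronic device 1 is the sending-end device and electronic device 2 is the receiving-end device. At a given moment, the fixed-field-of-view video image captured by the camera of electronic device 1 may be as shown in Figure 3. Electronic device 1 applies a person detection and tracking algorithm to the video image in Figure 3 to identify each person's identity information and position information. For example, the position information may be the coordinates shown in Figure 4. Here the coordinates are, by way of example, the specific coordinates of each keypoint of a person; the keypoints may include, but are not limited to, the head, shoulders, arms, hands, legs, feet, eyes, nose, mouth, and clothing. Figure 4 draws the coordinates as separate points, each with a definite coordinate value in the video image. Electronic device 1 determines the minimum bounding rectangle of the identified persons, as shown in Figure 4. Assume that the display specification of electronic device 2 is an image with a resolution of width w and height h. Electronic device 1 crops the video image in Figure 3 centered on the minimum bounding rectangle and at the aspect ratio of electronic device 2's display specification, obtaining the cropping result shown in Figure 5. Electronic device 1 then scales the cropping result in Figure 5 to an image of width w and height h, as shown in Figure 6. Specifically, if the resolution of the cropping result is smaller than w by h, it is enlarged; if it is larger than w by h, it is reduced.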
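The cropping step in this example can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: rectangles are `(left, top, right, bottom)` tuples in pixels, and the function names `min_bounding_rect` and `fit_aspect` are hypothetical. The sketch grows the minimum bounding rectangle to the w:h aspect ratio around its center and shifts it so that it stays inside the captured image.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def min_bounding_rect(points: Iterable[Tuple[float, float]]) -> Rect:
    """Smallest axis-aligned rectangle containing all keypoints."""
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def fit_aspect(rect: Rect, w: int, h: int, img_w: int, img_h: int) -> Rect:
    """Crop box of aspect ratio w:h centered on rect, kept inside the image."""
    left, top, right, bottom = rect
    bw, bh = right - left, bottom - top
    if bw * h < bh * w:            # rectangle is too narrow: widen it
        cw, ch = bh * w / h, bh
    else:                          # rectangle is too flat: make it taller
        cw, ch = bw, bw * h / w
    # If the box would exceed the image, shrink it (it may then cut into rect).
    scale = min(1.0, img_w / cw, img_h / ch)
    cw, ch = cw * scale, ch * scale
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - cw / 2, 0), img_w - cw)   # shift back inside the image
    y0 = min(max(cy - ch / 2, 0), img_h - ch)
    return (x0, y0, x0 + cw, y0 + ch)
```

When the box has to be shifted at an image border, the persons are no longer exactly centered; that trade-off is a choice of this sketch, not something the text prescribes.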
图7为本申请实施例提供的一种上述视频图像处理方法应用于视频监控场景的系统架构示意图。如图7所示,该系统架构可以包括采集设备701、处理设备702、存储设备703、显示设备704。
需要说明的是,图7中示意的系统架构中包括的设备可以集中部署,也可以分布式部署。图7中示意的系统架构中包括的设备可以部署在至少一个电子设备中。
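The workflow of the system architecture shown in Figure 7 is as follows: the capture device 701 captures video images frame by frame and passes them to the processing device 702, which preprocesses them (including, but not limited to, person recognition, cropping, and scaling) and stores the result in the storage device 703. The display device 704 retrieves the video images from the storage device 703 and displays them.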
图8为本申请实施例提供的一种电子设备的结构示意图。上述发送端设备201、接收端设备202、图7中示意的系统架构中包括的设备所在的电子设备的结构可以如图8所示。
如图8所示,电子设备可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中,传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
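It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device. In other embodiments, the electronic device may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.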
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,神经网络处理器(neural-network processing unit,NPU)等中的一个或多个。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以是电子设备的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,SIM接口,USB接口等中的一个或多个。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142,充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
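The mobile communication module 150 can provide wireless communication solutions applied on the electronic device, including second-generation (2G), third-generation (3G), fourth-generation (4G), and fifth-generation (5G) mobile communication technology. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. It can receive electromagnetic waves through antenna 1, filter and amplify the received waves, and pass them to the modem processor for demodulation. It can also amplify the signal modulated by the modem processor and radiate it as electromagnetic waves through antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the same device as at least some modules of the processor 110.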
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备可以通过无线通信技术与网络以及其他设备通信。例如,电子设备可以通过天线1和移动通信模块150与其他电子设备进行视频通话或视频会议。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,IR技术等中的一个或多个。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS),星基增强系统(satellite based augmentation systems,SBAS)等中的一个或多个。
电子设备通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备可以包括1个或N个显示屏194,N为大于1的正整数。例如,在本申请实施例中,在用户利用电子设备与其他电子设备的用户进行视频通话或视频会议的过程中,显示屏194可以显示视频接听界面,或视频提醒界面,或视频通话界面,或视频监控界面(如包括对端设备发送的视频图像,本设备采集到的视频图像)。
电子设备可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。例如,在本申请实施例中,摄像头193可用于采集视频通话或视频会议过程中的视频图像。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备可以包括1个或N个摄像头193,N为大于1的正整数。在本实施例中,该摄像头193可以采用隐藏式方式设置在电子设备中,也可以不采用隐藏式方式设置,本实施例在此不做具体限制。
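The digital signal processor is used to process digital signals. For example, it applies a human detection and tracking algorithm to a digital video image, determines the subject persons in the image, and then crops and scales the image accordingly to obtain an image that fits the display specification of the receiving-end device.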
视频编解码器用于对数字视频压缩或解压缩。电子设备可以支持一种或多种视频编解码器。这样,电子设备可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
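The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the way signals pass between neurons in the human brain, it processes input information quickly and can also learn continuously. The NPU enables intelligent-cognition applications on the electronic device, such as image recognition, face recognition, speech recognition, and text understanding.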
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备的各种功能应用以及数据处理。例如,在本申请实施例中,处理器110可以通过执行存储在内部存储器121中的指令,对视频图像进行处理定位出人物后,结合当前帧人物信息以及历史帧人物信息确定主体人物,按照主体人物裁剪缩放采集的当前帧视频图像,保证接收端设备显示画面连续,以实现在视频通话中显示画面连续的画随人动。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。在本实施例中,内部存储器121还可用于存储摄像头193采集到的原大分辨率视频图像、经过处理器110人物识别、人物筛选、裁剪缩放的小分辨率视频图像,以及每一帧视频图像的人物信息等。
电子设备可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如通话,音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息或需要通过语音助手触发电子设备执行某些功能时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备可以设置至少一个麦克风170C。在另一些实施例中,电子设备可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备根据压力传感器180A检测所述触摸操作强度。电子设备也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备是翻盖机时,电子设备可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备在各个方向上(一般为三轴)加速度的大小。当电子设备静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备通过发光二极管向外发射红外光。电子设备使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备附近有物体。当检测到不充分的反射光时,电子设备可以确定电子设备附近没有物体。电子设备可以利用接近光传感器180G检测用户手持电子设备贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备对电池142加热,以避免低温导致电子设备异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备可以接收按键输入,产生与电子设备的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备的接触和分离。电子设备可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备中,不能和电子设备分离。
以下实施例中的方法均可以在具有上述硬件结构的电子设备中实现。
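Figure 9 is a schematic flowchart of a video image processing method provided by an embodiment of this application. In this application, the electronic device processes the video stream of a video call or video surveillance session frame by frame: each time it obtains a frame, it processes that frame according to the image processing method provided by this application. Every frame is handled the same way, so the following embodiments describe in detail only how the electronic device processes the i-th video frame. The i-th video frame is any frame in the video stream. As shown in Figure 9, the method may include:
S901. The electronic device obtains the identity information and position information of each person in the i-th video frame.
Here, i is greater than 1 and less than or equal to the total number of frames in the video stream.
For example, i may be greater than or equal to X, where X is a preconfigured frame-number threshold at which the video image processing method provided by the embodiments of this application starts to be performed on the video stream.
Specifically, in S901 the electronic device may use a human detection and tracking algorithm to identify the persons in the i-th video frame. One or more persons may be identified, and the identity information and position information of each person can be obtained as part of the identification.
Note that a human detection and tracking algorithm is an image processing technique for identifying persons in an image; the embodiments of this application do not limit how it is implemented. For example, it may be the YOLO algorithm, the SSD algorithm, or another algorithm.
Specifically, a person's identity information can be used to uniquely indicate the same person across different frames. The identity information may be the person's signature information obtained by the detection and tracking algorithm, that is, the feature information that distinguishes each person from the others. Alternatively, the identity information may be a person number corresponding to the feature information.
A person's position information may be the unique coordinate values, in the video image, of one or more keypoints of the person.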
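Further, as shown in Figure 16, the video processing method provided by the embodiments of this application may also include S901a.
S901a. The electronic device obtains the person information of each person in the i-th video frame.
The person information may include one or more of the following: speaking information and priority information. In practice, the content of the person information is not limited to what is described here and can be configured as required.
The speaking information indicates whether a person in the video image is speaking. It can be obtained by combining audio processing with the mouth shape of the person in the video image, or directly from the mouth shape alone.
The priority information indicates how important a person in the video image is. Priority information for the different people who use the device can be preconfigured and associated with their identity information. Then, when a frame is processed and a person's identity information is obtained, the preconfigured priority information is looked up to obtain that person's priority. Alternatively, priority information entered by the user for different persons in the video image can be received. Alternatively, priority information can be derived from the speaking information; for example, a person who is speaking has a higher priority than one who is not, and a person who has spoken for longer has a higher priority than one who has spoken for a shorter time.
For example, the electronic device stores photo information of different persons together with the corresponding priority information. During video image processing, if the similarity between a person identified in the video image and a stored photo exceeds a similarity threshold, the priority information of that stored photo is used as the priority information of the identified person.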
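The photo-based lookup just described could look roughly like this. It is a minimal sketch with hypothetical names: the similarity scores are assumed to have been computed already by some face or person matcher, which the text does not specify, and taking the best-matching photo when several exceed the threshold is an assumption of this sketch.

```python
from typing import Dict, Hashable


def lookup_priority(similarities: Dict[Hashable, float],
                    priorities: Dict[Hashable, int],
                    threshold: float,
                    default: int = 0) -> int:
    """Return the priority stored for the best-matching photo, if any.

    similarities maps a stored photo id to its similarity with the detected
    person; priorities maps the same ids to the configured priority.
    """
    best = max(similarities, key=similarities.get, default=None)
    if best is not None and similarities[best] > threshold:
        return priorities[best]
    return default  # no stored photo is similar enough
```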
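The photo information and corresponding priority information stored in the electronic device may be entered manually by the user through the device's function configuration interface and then stored by the device; or recorded by the device from earlier video capture and display sessions; or entered manually by the user and then dynamically updated by the device each time video is captured and displayed.
Optionally, when the priority information is entered by the user of the electronic device, the video image processing method provided by this application may further include: receiving the priority information entered by the user.
The process by which the user enters priority information is illustrated below with an example.
For example, to configure priority information for a person identified in the video image, the user can long-press the screen of the electronic device to bring up a configuration menu. As shown in Figure 10, assume the video image captured by the electronic device is the picture in Figure 10. The user long-presses the position of a person in the picture (the finger position in Figure 10 indicates where the user long-presses and is only an example, not a limitation), and the electronic device displays the configuration menu shown in Figure 11, in which the user can select "Configure person priority information". When the user selects this item, the electronic device displays the interactive interface shown in Figure 12, where the user enters the person's priority information. At the same time the electronic device captures a photo of the person and stores the photo together with the importance level entered in the Figure 12 interface.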
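S902. Based on the identity information of the persons in the N video frames preceding the i-th video frame, the electronic device determines M subject persons from the i-th video frame.
The identity information of the persons in the N video frames includes the identity information of the M subject persons. In other words, each of the M subject persons has appeared in the preceding N video frames.
The identity information of the persons in the N video frames preceding the i-th video frame was obtained and saved when the electronic device performed S901 on those frames. The procedure is the same as S901 and is not repeated here.
Specifically, N is greater than or equal to 1. Optionally, N may be less than or equal to i-1. In practice, the value of N can be configured as required.
Optionally, the N video frames preceding the i-th video frame may be the N frames immediately adjacent to the i-th frame in the video stream, N earlier frames that are not adjacent to the i-th frame, or the frames within a preset time period of the video stream. The embodiments of this application do not limit where in the video stream these N frames are located.
In one possible implementation, N may also vary while a video stream is being processed: when i is less than a configured threshold, N equals i-1; when i is greater than the threshold, N is a fixed value less than i-1; when i equals the threshold, N may be either i-1 or the fixed value less than i-1. This application does not specifically limit this.
When N is a fixed value less than i-1, the fixed value can be configured based on experience; this application does not specifically limit it.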
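The dynamic choice of N can be written as a small function. This is a sketch with hypothetical names; the text allows either option when i equals the threshold, and this sketch arbitrarily picks i-1 there. It also assumes the fixed value is smaller than the threshold, which guarantees that it is less than i-1 whenever i exceeds the threshold.

```python
def history_length(i: int, threshold: int, fixed_n: int) -> int:
    """Number N of earlier frames to consult when processing frame i (i > 1)."""
    if not 0 < fixed_n < threshold:
        raise ValueError("fixed_n must be positive and below the threshold")
    if i <= threshold:
        return i - 1      # early in the stream: use every earlier frame
    return fixed_n        # later: use a fixed-size window
```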
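M may be one or more. The embodiments of this application do not specifically limit the value of M.
In one possible implementation, M may be the total number of subject persons determined in each video frame.
In another possible implementation, M may be a preconfigured fixed value.
In one possible implementation, S902 may be carried out as follows: the electronic device determines the M subject persons from the i-th video frame based on both the identity information of each person in the i-th video frame and the identity information of the persons in the preceding N video frames. For example, the electronic device may compare the identity information of each person in the i-th video frame with that of the persons in the preceding N video frames, take the persons whose identity information matches as candidates, and then determine the subject persons from among the candidates.
Specifically, in S902 the electronic device may determine as a subject person any person who has appeared in the preceding N video frames (as recognized by identity information), appears in the i-th video frame, and satisfies a preset condition. The preset condition can be configured according to the actual situation and is not limited by this application. For example, the preset condition may be that the number of frames among the preceding N in which the person appeared is greater than or equal to a threshold.
Specifically, S902 may be implemented in, but is not limited to, the following ways.
Implementation 1. The electronic device determines as the M subject persons those persons who appear in the i-th video frame and whose number of appearances in the N video frames is greater than or equal to a first preset threshold.
Specifically, determining whether a person in the i-th video frame is a subject person may include: counting the cumulative number of frames among the N video frames in which the person appears and, if that count is greater than or equal to the first preset threshold, determining that the person is a subject person. Whether the person appears in a given video frame can be decided by whether that frame contains a person with the same identity information.
Here, a person's cumulative appearance count is the number of consecutive video frames, among the N video frames preceding the i-th video frame, in which the person appears. The run of consecutive frames may include S frames in which the person does not appear, where S is greater than or equal to 0 and less than or equal to a preset number of frames.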
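Implementation 1 can be sketched as below. The text leaves some room for interpretation; this sketch assumes the run is counted backwards from the most recent frame, that only frames in which the person appears are counted, and that the run ends once more than `max_missing` frames (the S of the text) lack the person. All names are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def cumulative_appearances(history: Sequence[Set[Hashable]],
                           person_id: Hashable,
                           max_missing: int) -> int:
    """Count appearances in the latest run of frames, tolerating a few gaps.

    history holds the identity sets of the N earlier frames, oldest first.
    """
    count = 0
    missing = 0
    for ids in reversed(history):          # walk from the newest frame back
        if person_id in ids:
            count += 1
        else:
            missing += 1
            if missing > max_missing:      # too many gaps: the run ends here
                break
    return count


def select_subjects(current_ids: Sequence[Hashable],
                    history: Sequence[Set[Hashable]],
                    threshold: int,
                    max_missing: int = 0) -> List[Hashable]:
    """Persons in frame i whose cumulative count reaches the threshold."""
    return [p for p in current_ids
            if cumulative_appearances(history, p, max_missing) >= threshold]
```

Tolerating a few missed frames is what keeps a briefly occluded or missed person from losing subject status, which is the continuity the method is after.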
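Implementation 2. The electronic device divides the i-th video frame into Y regions and configures a preset threshold for each region. The preset threshold for the k-th region is the k-th preset threshold, where the k-th region is any one of the Y regions, Y is greater than or equal to 2, and k is greater than or equal to 1 and less than or equal to Y. The electronic device determines as the M subject persons those persons who appear in the i-th video frame and whose number of appearances in the N video frames is greater than or equal to the preset threshold of the region in which they are located.
In Implementation 2, the preset thresholds of different regions may differ.
For example, with Y equal to 3, the video image is divided into the three preset regions shown in Figure 13 (left, middle, and right), denoted region 1, region 2, and region 3. The preset thresholds configured for them are denoted threshold 1, threshold 2, and threshold 3, and the three differ. If person A is identified in region 2 of the i-th video frame and person A's cumulative appearance count is greater than threshold 2, person A is determined to be a subject person. If person B is identified in region 3 of the i-th video frame and person B's cumulative appearance count is less than threshold 3, person B is not a subject person.
Note that Y may also be 1, in which case Implementation 2 is the same as Implementation 1 and is not described again.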
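A sketch of the region-dependent threshold follows. It assumes, purely for illustration, that the Y regions are equal-width vertical strips ordered left to right (as the left, middle, right example suggests) and that a person is assigned to a region by the horizontal center of their position; the function names and example threshold values are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def region_index(x_center: float, image_width: float, num_regions: int) -> int:
    """Zero-based index of the vertical strip containing x_center."""
    idx = int(x_center * num_regions / image_width)
    return min(max(idx, 0), num_regions - 1)   # clamp points on the border


def select_subjects_by_region(people: Dict[Hashable, float],
                              appearance_counts: Dict[Hashable, int],
                              image_width: float,
                              thresholds: Sequence[int]) -> List[Hashable]:
    """people maps each identity in frame i to its horizontal center."""
    subjects = []
    for pid, x in people.items():
        k = region_index(x, image_width, len(thresholds))
        if appearance_counts.get(pid, 0) >= thresholds[k]:
            subjects.append(pid)
    return subjects
```

With thresholds such as `[60, 30, 60]`, a person in the middle strip becomes a subject after fewer frames than a person near the edges; these values are invented for the example.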
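Implementation 3. When S901a has obtained the person information of each person in the i-th video frame, S902 is implemented as follows:
The electronic device determines as the M subject persons those persons who appear in the i-th video frame and whose number of speaking frames in the N video frames is greater than or equal to a second preset threshold. Alternatively, it determines as the M subject persons those persons who appear in the i-th video frame and whose priority information in the N video frames is greater than a third preset threshold. Alternatively, from the persons who appear in the i-th video frame and whose number of speaking frames in the N video frames is greater than or equal to the second preset threshold, it selects the M most important ones according to the priority information and determines them as the M subject persons.
Note that the values of the preset thresholds above can be configured as required; the embodiments of this application do not specifically limit them. The cumulative appearance count can also be converted into a cumulative appearance duration, in which case the corresponding preset threshold is a time threshold.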
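The three alternatives of Implementation 3 can be sketched as three small filters. The names are hypothetical, and the sketch assumes that a larger priority number means a more important person, which the text does not state.

```python
from typing import Dict, Hashable, List, Sequence


def subjects_by_speaking(current_ids: Sequence[Hashable],
                         speaking_frames: Dict[Hashable, int],
                         min_speaking: int) -> List[Hashable]:
    """Option 1: persons who spoke in at least min_speaking of the N frames."""
    return [p for p in current_ids if speaking_frames.get(p, 0) >= min_speaking]


def subjects_by_priority(current_ids: Sequence[Hashable],
                         priority: Dict[Hashable, int],
                         min_priority: int) -> List[Hashable]:
    """Option 2: persons whose priority exceeds the threshold."""
    return [p for p in current_ids if priority.get(p, 0) > min_priority]


def top_m_speakers(current_ids: Sequence[Hashable],
                   speaking_frames: Dict[Hashable, int],
                   priority: Dict[Hashable, int],
                   min_speaking: int,
                   m: int) -> List[Hashable]:
    """Option 3: among frequent speakers, keep the m most important."""
    candidates = subjects_by_speaking(current_ids, speaking_frames, min_speaking)
    ranked = sorted(candidates, key=lambda p: priority.get(p, 0), reverse=True)
    return ranked[:m]
```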
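S903. The electronic device crops the i-th video frame according to the position information of the subject persons.
The cropped i-th video frame includes the M subject persons; that is, the cropped frame can display the M subject persons in full.
Specifically, cropping the i-th video frame according to the position information of the subject persons may be carried out as follows: determine a crop box that contains the minimum bounding rectangle of the M subject persons, and crop the i-th video frame with that crop box.
The aspect ratio of the crop box should fit the preset display specification.
Note that "the crop box contains the minimum bounding rectangle of the M subject persons" can be understood as: the crop box contains that minimum bounding rectangle as completely as possible.
Optionally, the crop box may be determined in, but is not limited to, the following ways.
Scheme 1. The electronic device uses a candidate crop box as the crop box.
In one possible implementation, the candidate crop box may be the minimum bounding rectangle of the M subject persons plus a cropping margin, where the margin is greater than or equal to 0.
For example, the process by which the electronic device crops the video image using the minimum bounding rectangle as the crop box is illustrated in Figures 4 and 5.
In another possible implementation, when the person information includes priority information, the candidate crop box may be a bounding rectangle that is centered on the highest-priority person among the M subject persons and contains all M subject persons, plus a cropping margin.
For example, Figure 14 shows a scene in which the crop box is the bounding rectangle centered on the highest-priority person among the M subject persons and containing all of them, and the i-th video frame is cropped so that the subject persons are displayed in full.
In another possible implementation, when the person information includes speaking information, the candidate crop box may be a bounding rectangle that is centered on the speaking person among the M subject persons and contains all M subject persons, plus a cropping margin.
For example, Figure 15 shows a scene in which the crop box is the bounding rectangle centered on the speaking person among the M subject persons and containing all of them, and the i-th video frame is cropped so that the subject persons are displayed in full.
Of course, the extent of the candidate crop box can be configured as required; the embodiments of this application do not specifically limit it.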
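A candidate crop box centered on one person (the highest-priority or speaking subject) while still containing everyone can be built by extending the box symmetrically around that person. This is a sketch with a hypothetical function name; rectangles are assumed to be `(left, top, right, bottom)` tuples, and the focus point is assumed to lie inside the minimum bounding rectangle.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def centered_box(mbr: Rect, focus_x: float, focus_y: float,
                 margin: float = 0.0) -> Rect:
    """Smallest box centered on the focus point that contains mbr, plus margin."""
    half_w = max(focus_x - mbr[0], mbr[2] - focus_x) + margin
    half_h = max(focus_y - mbr[1], mbr[3] - focus_y) + margin
    return (focus_x - half_w, focus_y - half_h,
            focus_x + half_w, focus_y + half_h)
```

The result can extend beyond the captured image when the focus person stands near one side of the group, so a real pipeline would still need to clamp it to the image and adjust it to the display aspect ratio.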
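Scheme 2. The electronic device determines the crop box of the i-th video frame from a first candidate crop box and the crop box of the previous video frame.
The first candidate crop box in Scheme 2 is the same as the candidate crop box in Scheme 1.
Specifically, in Scheme 2 the electronic device first obtains the distance between the center of the first candidate crop box and the center of the previous frame's crop box, where the first candidate crop box includes the minimum bounding rectangle of the M subject persons. If the distance is greater than or equal to a distance threshold, the device determines a second crop box whose center is the center of the previous frame's crop box plus an offset and whose size is the same as that of the previous frame's crop box. If the second crop box contains the minimum bounding rectangle of the M subject persons, a third crop box is used as the crop box, where the third crop box is either the second crop box itself or the second crop box shrunk until it just contains the minimum bounding rectangle. If the second crop box does not fully contain the minimum bounding rectangle, the second crop box is enlarged until it does, and the enlarged second crop box is used as the crop box.
The offset may be a preset value, the distance between the center of the first candidate crop box and the center of the previous frame's crop box multiplied by a weight, or a value obtained by a preset algorithm; the embodiments of this application do not specifically limit it.
For example, enlarging or shrinking a candidate crop box may be done by moving one or more of its edges outward or inward.
Further, if the distance is less than the distance threshold, the electronic device may use the candidate crop box directly as the crop box.
The distance between the center of the candidate crop box and the center of the previous frame's crop box may be a straight-line distance or another measure; the embodiments of this application do not specifically limit it.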
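Scheme 2 can be sketched as follows. The sketch makes several assumptions that the text leaves open: the distance is Euclidean, the offset is the center displacement multiplied by a weight, the third crop box is taken to be the second crop box unchanged, and enlarging is done by pushing edges outward to the union with the minimum bounding rectangle. The function names are hypothetical and rectangles are `(left, top, right, bottom)` tuples.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def next_crop_box(candidate: Rect, previous: Rect, mbr: Rect,
                  dist_threshold: float, weight: float = 0.5) -> Rect:
    """Crop box for frame i given the candidate box and the previous crop box."""
    (ccx, ccy), (pcx, pcy) = center(candidate), center(previous)
    dx, dy = ccx - pcx, ccy - pcy
    if math.hypot(dx, dy) < dist_threshold:
        return candidate                      # small move: use the candidate
    # Large move: shift the previous box only part of the way (same size).
    ox, oy = dx * weight, dy * weight
    second = (previous[0] + ox, previous[1] + oy,
              previous[2] + ox, previous[3] + oy)
    if contains(second, mbr):
        return second                         # subjects still fully visible
    # Otherwise push edges outward until the subjects fit.
    return (min(second[0], mbr[0]), min(second[1], mbr[1]),
            max(second[2], mbr[2]), max(second[3], mbr[3]))
```

Enlarging by union does not preserve the aspect ratio, so a box produced by the last branch would still need to be fitted to the preset display specification before scaling.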
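S904. The electronic device reduces or enlarges the cropped i-th video frame.
Specifically, the electronic device performs S904 so that the display screen can show the cropped i-th video frame at the preset display specification. In S904, the electronic device reduces or enlarges the i-th video frame cropped in S903 according to the preset display specification.
The preset display specification may be a specification that fits the display screen, or a fixed proportion of the screen.
For example, if the resolution of the i-th video frame cropped in S903 is smaller than the preset display specification, in S904 the electronic device enlarges the cropped frame to an image of the preset display specification; if it is larger, the electronic device reduces the cropped frame to an image of the preset display specification; if it is equal, the electronic device uses the cropped frame as the image of the preset display specification.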
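The three cases of S904 reduce to comparing the cropped size with the target size. The sketch below assumes the crop already has the target aspect ratio, so that a single scale factor applies; the function name and the returned labels are hypothetical.

```python
from typing import Tuple


def scaling_action(crop_w: float, crop_h: float,
                   target_w: float, target_h: float) -> Tuple[str, float]:
    """Decide whether to enlarge, shrink, or keep the cropped frame."""
    factor = target_w / crop_w   # equals target_h / crop_h if aspect ratios match
    if factor > 1:
        return ("enlarge", factor)
    if factor < 1:
        return ("shrink", factor)
    return ("keep", 1.0)
```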
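Further, after S904 the electronic device may continue to perform S901 to S904 on subsequent frames; that is, it increments i and works through every frame of the video stream, processing each frame as it is obtained, until the video stream ends.
With the video image processing method provided by this application, the subject persons of a video image are determined from the identity information of the persons in the current frame together with that of the persons in the N preceding video frames. This greatly improves the accuracy of person perception and, in turn, the accuracy of the determined subject positions. As a result, the small-resolution image obtained by cropping and scaling around the subject persons can display them in full, the picture of the subject persons remains continuous, and person-following framing with a continuous picture is achieved in software during image capture and display.
Further, the video image processing method provided by this application may also include: the electronic device obtains the j-th video frame, where j is less than or equal to X and X is greater than 1; it obtains and saves the identity information and/or position information of each person in the j-th video frame; and it directly reduces the j-th video frame to an image of the preset display specification. The identity information and/or position information of the j-th video frame can serve as reference information for subsequent frames.
Of course, the electronic device may also obtain and save the person information of each person in the j-th video frame.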
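The split between the first X frames and the later frames can be sketched as a simple per-frame plan. This is illustrative only: `plan_frames` and the mode labels are hypothetical, detection results are assumed to be given as one identity set per frame, and the treatment of frame X itself follows the "j less than or equal to X" wording here.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def plan_frames(detections: Sequence[Set[Hashable]],
                warmup_x: int) -> List[Tuple[int, str, int]]:
    """For each frame, report its number, processing mode, and history size."""
    history: List[Set[Hashable]] = []
    plan = []
    for frame_no, ids in enumerate(detections, start=1):
        # Frames 1..X are only downscaled; later frames are cropped to subjects.
        mode = "downscale_only" if frame_no <= warmup_x else "crop_to_subjects"
        plan.append((frame_no, mode, len(history)))
        history.append(ids)   # every frame's identities are saved for later frames
    return plan
```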
进一步的,如图16所示,本申请实施例提供的图像处理方法还可以包括S905。
S905、电子设备按照预设显示规格显示裁剪后的第i帧视频图像。
一种可能的实现中,执行图9或图16所示的视频图像处理方法的电子设备可以为视频通话中的发送端设备,本申请提供的视频图像处理方法还可以包括:电子设备将缩小或放大得到的预设显示规格的图像进行编码,向接收端设备发送,由接收端设备按照预设显示规格显示裁剪后的第i帧视频图像。具体过程参见图2所示系统架构的 工作流程。
一种可能的实现中,执行图9或图16所示的视频图像处理方法的电子设备可以为视频通话中的发送端设备,本申请提供的视频图像处理方法还可以包括:电子设备按照预设显示规格显示裁剪后的第i帧视频图像,同时按照预设显示规格显示裁剪后的对端的视频图像。
一种可能的实现中,执行图9或图16所示的视频图像处理方法的电子设备可以为视频通话中的接收端设备,本申请提供的视频图像处理方法还可以包括:电子设备将缩小或放大得到的预设规格的图像通过显示装置显示。具体过程参见图2所示系统架构的工作流程。
下面以具体视频通话场景为例,对本申请实施例提供的视频图像处理方法进行详细说明。
电子设备1701和电子设备1702中安装有视频通话应用。该视频通话应用是可以为用户提供视频通话服务的客户端。电子设备1701、电子设备1702中安装的视频通话应用,可以通过互联网访问视频通话服务器进行数据交互,完成视频通话,为使用电子设备1701和电子设备1702的用户提供视频通话服务。
例如,如图17A所示,电子设备1701的主界面(即桌面)上包括视频通话应用的应用图标17011。如图17B所示,电子设备1702的桌面上包括视频通话应用的应用图标17021。电子设备1701调用视频通话应用与电子设备1702进行视频通话,视频通话过程中对视频图像进行本申请实施例所述的视频图像处理。
例如,电子设备1701可以接收用户对图17A所示的应用图标17011的点击操作(如触摸单击操作或通过遥控装置的操作),显示图18A所示的视频通话应用界面1801。视频通话应用界面1801中包括“新朋友”选项1802和至少一个联系人选项。例如,至少一个联系人选项包括鲍勃(Bob)的联系人选项1803和用户311的联系人选项1804。其中,“新朋友”选项1802用于添加新的联系人。电子设备1701响应于用户对用户311的联系人选项1804的点击操作(如单击操作或通过遥控装置的操作),向用户311这一账户登录的电子设备1702发送视频通话请求,与电子设备1702进行视频通话。
示例性的,响应于用户对联系人选项1804的点击操作,电子设备1701可以启动自身的摄像头,采集固定视野的图像作为场景图像,电子设备1701的显示屏显示包括摄像头采集的场景图像的视频通话界面1805如图18B所示。视频通话界面1805中包括提示信息“正在等待对方响应!”1806和“取消”按钮1807。“取消”按钮1807用于触发电子设备1701取消与电子设备1702进行视频通话。
相应的,电子设备1702从视频通话服务器接收到电子设备1701发送的视频通话请求,电子设备1702的显示屏显示视频通话界面1808如图18C所示。视频通话界面1808中包括“接收”按钮1809和“拒绝”按钮1810。其中,“接收”按钮1809用于电子设备1702与电子设备1701建立视频通话连接。“拒绝”按钮1810用于触发电子设备1702拒绝电子设备1701的视频通话请求。
电子设备1702可以接收用户对“接收”按钮1809的点击操作(如触摸单击操作或通过遥控装置的操作),与电子设备1701建立视频通话连接。在建立连接之后,电 子设备1701及电子设备1702作为视频通话的双方,电子设备1701、电子设备1702可以分别采用各自的摄像头采集固定视野的图像作为场景图像,逐帧经裁剪、缩放、编码后向对端发送场景图像,由对端显示,电子设备1701、电子设备1702可以在显示对端裁剪后的视频图像的同时显示本端裁剪后的视频图像。其中,在视频通话过程中,电子设备1701向电子设备1702发送视频图像的过程,电子设备1701是发送端设备电子设备1702是接收端设备,电子设备1702向电子设备1701发送视频图像的过程,电子设备1702是发送端设备电子设备1701是接收端设备。电子设备间视频图像传输具体过程可以参照图2所示的系统架构的工作流程。
其中,电子设备1701、电子设备1702可以对前X(例如X等于120)帧视频图像,直接将原始图像缩小为对端显示规格的图像进行编码发送到对端。电子设备1701、电子设备1702可以对第i帧(i大于120)帧视频图像,按照本申请实施例提供的视频图像处理方法处理。
示例性的,电子设备1701与电子设备1702进行视频通话的过程中的某一时刻,电子设备1701的摄像头采集的固定视野的视频图像如图19中的(a)所示,电子设备1701按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为电子设备1702的显示规格的图像如图19中的(b)所示。电子设备1701将图19中的(b)所示的图像编码后向电子设备1702发送。同时,在该时刻,电子设备1702的摄像头采集的固定视野的视频图像如图19A中的(a)所示,电子设备1702按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为电子设备1701的显示规格的图像如图19A中的(b)所示,电子设备1702将图19A中的(b)所示的图像编码后向电子设备1701发送。此时,电子设备1701、电子设备1702的显示界面如图19B。如图19B所示,电子设备1701、电子设备1702的主界面大图分别是对端是采集裁剪缩放后的图像,小图按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为自身显示规格的图像。需要说明的是,电子设备显示本端采集的图像时,可以显示为本端采集的原始图像或者为按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为自身显示规格的图像。
在电子设备1701与电子设备1702进行视频通话的过程中的另一时刻,电子设备1701的采集场景中,人物位置发生变化,此时电子设备1701的摄像头采集的固定视野的视频图像如图20中的(a)所示,电子设备1701按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为电子设备1702的显示规格的图像如图20中的(b)所示。电子设备1701将图20中的(b)所示的图像编码后向电子设备1702发送。同时,在该时刻,假设电子设备1702的采集场景中人物位置与图19A中示意的相同未发生变化。此时,电子设备1701、电子设备1702的显示界面如图20A。如图20A所示,电子设备1701、电子设备1702的主界面大图分别是对端是采集裁剪缩放后的图像,小图为按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为自身显示规格的图像。
在电子设备1701与电子设备1702进行视频通话的过程中的另一时刻,电子设备1701的采集场景中,人物增加,此时电子设备1701的摄像头采集的固定视野的视频图像如图21中的(a)所示,电子设备1701按照本申请实施例提供的视频图像处理方法 处理确定主体人物进行裁剪、缩放为电子设备1702的显示规格的图像如图21中的(b)所示。电子设备1701将图21中的(b)所示的图像编码后向电子设备1702发送。同时,在该时刻,电子设备1702的采集场景相对于图19A,人物位置发生变化,此时电子设备1702摄像头采集的固定视野的视频图像如图21A中的(a)所示,电子设备1702按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为电子设备1701的显示规格的图像如图21A中的(b)所示,电子设备1702将图21A中的(b)所示的图像编码后向电子设备1701发送。此时,电子设备1701、电子设备1702的显示界面如图21B。如图21B所示,电子设备1701、电子设备1702的主界面大图分别是对端是采集裁剪缩放后的图像,小图按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为自身显示规格的图像。
在电子设备1701与电子设备1702进行视频通话的过程中另一时刻,电子设备1701的采集场景中,人物增加且位置变化,此时电子设备1701的摄像头采集的固定视野的视频图像如图22中的(a)所示,电子设备1701按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为电子设备1702的显示规格的图像如图22中的(b)所示。电子设备1701将图22中的(b)所示的图像编码后向电子设备1702发送。同时,在该时刻,假设电子设备1702的采集场景中人物位置与图21A中示意的相同未发生变化。此时,电子设备1701、电子设备1702的显示界面如图22A。如图22A所示,电子设备1701、电子设备1702的主界面大图分别是对端是采集裁剪缩放后的图像,小图为按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为自身显示规格的图像。
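The video image processing method provided by the embodiments of this application is described in detail below using a concrete surveillance scenario as an example.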
假设监控系统包括摄像头1、服务器2、显示设备3。摄像头1用于采集固定视野的视频图像,服务器2用于对摄像头1采集的视频图像通过本申请实施例提供的视频图像处理方法处理,处理之后的视频图像可以通过显示设备3实时显示,处理之后的视频图像还可以存储于服务器2中的存储装置,服务器2在接收到读取指令时从存储装置读取处理后的视频图像通过显示设备3显示。
示例性的,该监控系统运行过程中的某一时刻,摄像头1采集的固定视野的视频图像如图23中的(a)所示,摄像头1将采集的图像发送至服务器2。服务器2按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为显示设备3的显示规格的图像如图23中的(b)所示。服务器2将图23中的(b)所示的图像通过显示设备3实时显示。同时,服务器2将图23中的(b)所示的图像存储于服务器2中的存储装置。当服务器2接收到读取该视频图像的指令时,从存储装置中读取视频图像通过显示设备3显示。
在该监控系统运行过程中的另一时刻,采集场景中的人物位置发生变化,此时摄像头1采集的固定视野的视频图像如图24中的(a)所示,摄像头1将采集的图像发送至服务器2。服务器2按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为显示设备3的显示规格的图像如图24中的(b)所示。服务器2将图24中的(b)所示的图像通过显示设备3实时显示。同时,服务器2将图24中的(b)所示的图像存储于服务器2中的存储装置。当服务器2接收到读取该视频图像的指令时, 从存储装置中读取视频图像通过显示设备3显示。
在该监控系统运行过程中的另一时刻,采集场景中的人物增加,此时摄像头1采集的固定视野的视频图像如图25中的(a)所示,摄像头1将采集的图像发送至服务器2。服务器2按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为显示设备3的显示规格的图像如图25中的(b)所示。服务器2将图25中的(b)所示的图像通过显示设备3实时显示。同时,服务器2将图25中的(b)所示的图像存储于服务器2中的存储装置。当服务器2接收到读取该视频图像的指令时,从存储装置中读取视频图像通过显示设备3显示。
在该监控系统运行过程中的另一时刻,采集场景中人物增加且位置变化,此时摄像头1采集的固定视野的视频图像如图26中的(a)所示,摄像头1将采集的图像发送至服务器2。服务器2按照本申请实施例提供的视频图像处理方法处理确定主体人物进行裁剪、缩放为显示设备3的显示规格的图像如图26中的(b)所示。服务器2将图26中的(b)所示的图像通过显示设备3实时显示。同时,服务器2将图26中的(b)所示的图像存储于服务器2中的存储装置。当服务器2接收到读取该视频图像的指令时,从存储装置中读取视频图像通过显示设备3显示。
上述主要从电子设备的角度对本申请实施例提供的方案进行了介绍。可以理解的是,电子设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对电子设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,如图27所示为本申请实施例提供的一种视频图像处理装置270,用于实现上述方法中电子设备的功能。该视频图像处理装置270可以是电子设备,也可以是电子设备中的装置,也可以是能够和电子设备匹配使用的装置。其中,该视频图像处理装置270可以为芯片系统。本申请实施例中,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。如图27所示,视频图像处理装置270可以包括:获取单元2701、确定单元2702、裁剪单元2703、缩放单元2704。获取单元2701用于执行图9或图16中的S901、S901a,确定单元2702用于执行图9或图16中的S902,裁剪单元2703用于执行图9或图16中的S903,缩放单元2704用于执行图9或图16中的S904。其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
进一步的,如图27所示,视频图像处理装置270还可以包括显示单元2705,用于执行图16中的S905。
如图28所示,为本申请实施例提供的视频图像处理装置280,用于实现上述方法中电子设备的功能。该视频图像处理装置280可以是电子设备,也可以是电子设备中的装置,也可以是能够和电子设备匹配使用的装置。其中,该视频图像处理装置280可以为芯片系统。视频图像处理装置280包括至少一个处理模块2801,用于实现本申请实施例提供的方法中电子设备的功能。示例性地,处理模块2801可以用于执行图9或图16中的过程S901、S901a、S902、S903、S904。具体参见方法示例中的详细描述,此处不做赘述。
视频图像处理装置280还可以包括至少一个存储模块2802,用于存储程序指令和/或数据。存储模块2802和处理模块2801耦合。本申请实施例中的耦合是装置、单元或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式,用于装置、单元或模块之间的信息交互。处理模块2801可能和存储模块2802协同操作。处理模块2801可能执行存储模块2802中存储的程序指令。所述至少一个存储模块中的至少一个可以包括于处理模块中。
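The video image processing apparatus 280 may further include a communication module 2803 for communicating with other devices through a transmission medium, so that the apparatus within the video image processing apparatus 280 can communicate with other devices.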
视频图像处理装置280还可以包括显示模块2804,可以用于执行图16中的过程S905。
当处理模块2801为处理器,存储模块2802为存储器,显示模块2804为显示屏,本申请实施例图28所涉及的视频图像处理装置280可以为图8所示的电子设备。
如前述,本申请实施例提供的视频图像处理装置270或视频图像处理装置280可以用于实施上述本申请各实施例实现的方法中电子设备的功能,为了便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本申请各实施例。
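Other embodiments of this application further provide a computer-readable storage medium. The computer-readable storage medium may include computer software instructions which, when run on an electronic device, cause the electronic device to perform the steps performed by the electronic device in the embodiments shown in Figure 9 or Figure 16.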
本申请另一些实施例还提供一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得该计算机执行上述图9或图16所示实施例中电子设备执行的各个步骤。
本申请另一些实施例还提供一种芯片系统,该芯片系统可以应用于电子设备。该电子设备包括显示屏和摄像头。芯片系统包括接口电路和处理器;接口电路和处理器通过线路互联;接口电路用于从电子设备的存储器接收信号,并向处理器发送信号,信号包括存储器中存储的计算机指令;当处理器执行该计算机指令时,芯片系统执行如上述图9或图16所示实施例中电子设备执行的各个步骤。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如 多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (17)

  1. 一种视频图像处理方法,其特征在于,所述方法包括:
    获取第i帧视频图像中每个人物的身份信息及位置信息;所述i大于1;
    根据所述第i帧视频图像之前的N个视频图像帧中的人物的身份信息,从所述第i帧视频图像中确定M个主体人物;所述M、N大于或等于1;其中,所述N个视频图像帧中的人物的身份信息包括所述M个主体人物的身份信息;
    根据所述主体人物的位置信息,裁剪所述第i帧视频图像,裁剪后的所述第i帧视频图像包括所述M个主体人物;
    将裁剪后的所述第i帧视频图像缩小或放大,以便显示屏按照预设显示规格显示裁剪后的所述第i帧视频图像。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述第i帧视频图像之前的N个视频图像帧中的人物的身份信息,从所述第i帧视频图像中确定M个主体人物,包括:
    将在所述N个视频图像帧中出现的帧数大于等于第一预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物。
  3. 根据权利要求1所述的方法,其特征在于,
    所述方法还包括:将所述第i帧视频图像划分为Y个区域;配置每个所述区域对应的预设阈值;第k区域对应的预设阈值为第k预设阈值;所述第k区域为所述Y个区域中任一个区域;所述Y大于或等于2;所述k大于或等于1,小于或等于所述Y;
    所述根据所述第i帧视频图像之前的N个视频图像帧中的人物的身份信息,从所述第i帧视频图像中确定M个主体人物,包括:
    将在所述N个视频图像帧中出现的帧数大于等于所在区域对应的预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物。
  4. 根据权利要求1所述的方法,其特征在于,
    所述方法还包括:获取所述第i帧视频图像中每个人物的人物信息,所述人物信息包括下述信息中一项或多项:是否讲话信息、优先级信息;
    所述根据所述第i帧视频图像之前的N个视频图像帧中的人物的身份信息,从所述第i帧视频图像中确定M个主体人物,包括:
    将在所述N个视频图像帧中讲话的帧数大于等于第二预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物;
    或者,
    将在所述N个视频图像帧中优先级信息大于第三预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物;
    或者,
    将在所述N个视频图像帧中讲话的帧数大于等于第二预设阈值并且出现在所述第i帧视频图像中的人物,按照优先级信息选择最重要的M个确定为M个主体人物。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述根据所述主体人物的位置信息,裁剪所述第i帧视频图像,包括:
    确定裁剪框,所述裁剪框包含所述M个主体人物的最小外接矩形框;
    以裁剪框裁剪所述第i帧视频图像。
  6. 根据权利要求5所述的方法,其特征在于,所述确定裁剪框,包括:
    获取第一待选裁剪框的中心点与前一帧视频图像的裁剪框的中心点的距离,所述第一待选裁剪框包括所述M个主体人物的最小外接矩形框;
    若所述距离大于或等于距离阈值,确定第二裁剪框,所述第二裁剪框的中心点为所述前一帧视频图像的裁剪框的中心点加偏移量,所述第二裁剪框的大小与所述前一帧视频图像的裁剪框的大小相同;
    若所述第二裁剪框包含所述最小外接矩形框,将第三裁剪框作为所述裁剪框;其中,所述第三裁剪框为所述第二裁剪框,或者,所述第三裁剪框为所述第二裁剪框缩小至包含所述最小外接矩形框的裁剪框;
    若所述第二裁剪框未完整包含所述最小外接矩形框,将所述第二裁剪框扩大至包含所述最小外接矩形框,将扩大后的所述第二裁剪框作为所述裁剪框。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述方法还包括:
    按照预设显示规格显示裁剪后的所述第i帧视频图像。
  8. 一种视频图像处理装置,其特征在于,所述装置包括:
    获取单元,用于获取第i帧视频图像中每个人物的身份信息及位置信息;所述i大于1;
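    a determining unit, configured to determine M subject persons from the i-th video frame based on the identity information of the persons in the N video frames preceding the i-th video frame, where M and N are greater than or equal to 1;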
    裁剪单元,用于根据所述确定单元确定的所述主体人物的位置信息,裁剪所述第i帧视频图像,裁剪后的所述第i帧视频图像包括所述M个主体人物;其中,所述N个视频图像帧中的人物的身份信息包括所述M个主体人物的身份信息;
    缩放单元,将裁剪后的所述第i帧视频图像缩小或放大,以便显示屏按照预设显示规格显示裁剪后的所述第i帧视频图像。
  9. 根据权利要求8所述的装置,其特征在于,所述确定单元具体用于:
    将在所述N个视频图像帧中出现的帧数大于等于第一预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物。
  10. 根据权利要求8所述的装置,其特征在于,所述确定单元具体用于:
    将所述第i帧视频图像划分为Y个区域;配置每个所述区域对应的预设阈值;第k区域对应的预设阈值为第k预设阈值;所述第k区域为所述Y个区域中任一个区域;所述Y大于或等于2;所述k大于或等于1,小于或等于所述Y;
    将在所述N个视频图像帧中出现的帧数大于等于所在区域对应的预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物。
  11. 根据权利要求8所述的装置,其特征在于,
    所述获取单元还用于:获取所述第i帧视频图像中每个人物的人物信息,所述人物信息包括下述信息中一项或多项:是否讲话信息、优先级信息;
    所述确定单元具体用于:
    将在所述N个视频图像帧中讲话的帧数大于等于第二预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物;
    或者,
    将在所述N个视频图像帧中优先级信息大于第三预设阈值并且出现在所述第i帧视频图像中的人物确定为M个主体人物;
    或者,
    将在所述N个视频图像帧中讲话的帧数大于等于第二预设阈值并且出现在所述第i帧视频图像中的人物,按照优先级信息选择最重要的M个确定为M个主体人物。
  12. 根据权利要求8-11任一项所述的装置,其特征在于,所述裁剪单元具体用于:
    确定裁剪框,所述裁剪框包含所述M个主体人物的最小外接矩形框;
    以裁剪框裁剪所述第i帧视频图像。
  13. 根据权利要求12所述的装置,其特征在于,所述裁剪单元具体用于:
    获取第一待选裁剪框的中心点与前一帧视频图像的裁剪框的中心点的距离,所述第一待选裁剪框包括所述M个主体人物的最小外接矩形框;
    若所述距离大于或等于距离阈值,确定第二裁剪框,所述第二裁剪框的中心点为所述前一帧视频图像的裁剪框的中心点加偏移量,所述第二裁剪框的大小与所述前一帧视频图像的裁剪框的大小相同;
    若所述第二裁剪框包含所述最小外接矩形框,将第三裁剪框作为所述裁剪框;其中,所述第三裁剪框为所述第二裁剪框,或者,所述第三裁剪框为所述第二裁剪框缩小至包含所述最小外接矩形框的裁剪框;
    若所述第二裁剪框未完整包含所述最小外接矩形框,将所述第二裁剪框扩大至包含所述最小外接矩形框,将扩大后的所述第二裁剪框作为所述裁剪框。
  14. 根据权利要求8-13任一项所述的装置,其特征在于,所述装置还包括:
    显示单元,用于按照预设显示规格显示裁剪后的所述第i帧视频图像。
  15. 一种电子设备,其特征在于,所述电子设备包括:处理器,存储器;所述处理器和所述存储器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述计算机指令被所述电子设备执行时,使得所述电子设备执行如权利要求1-7中任一项所述的视频图像处理方法。
  16. 一种计算机可读存储介质,其特征在于,包括:计算机软件指令;
    当所述计算机软件指令在电子设备中运行时,使得所述电子设备执行如权利要求1-7中任一项所述的视频图像处理方法。
  17. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-7中任一项所述的视频图像处理方法。
PCT/CN2020/087634 2019-08-31 2020-04-28 一种视频图像处理方法及装置 WO2021036318A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/680,889 US20220270343A1 (en) 2019-08-31 2022-02-25 Video image processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910819774.X 2019-08-31
CN201910819774.XA CN112446255A (zh) 2019-08-31 2019-08-31 一种视频图像处理方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/680,889 Continuation US20220270343A1 (en) 2019-08-31 2022-02-25 Video image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2021036318A1 true WO2021036318A1 (zh) 2021-03-04

Family

ID=74685536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087634 WO2021036318A1 (zh) 2019-08-31 2020-04-28 一种视频图像处理方法及装置

Country Status (3)

Country Link
US (1) US20220270343A1 (zh)
CN (1) CN112446255A (zh)
WO (1) WO2021036318A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256655A (zh) * 2021-05-27 2021-08-13 瑞芯微电子股份有限公司 一种基于画面特征的视频分割方法及存储介质
CN114040145A (zh) * 2021-11-20 2022-02-11 深圳市音络科技有限公司 一种视频会议人像显示方法、系统、终端及存储介质
CN114598819A (zh) * 2022-03-16 2022-06-07 维沃移动通信有限公司 视频录制方法、装置和电子设备
WO2024145878A1 (zh) * 2023-01-05 2024-07-11 广州视源电子科技股份有限公司 视频处理方法、装置、设备及存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763242A (zh) * 2021-05-17 2021-12-07 腾讯科技(深圳)有限公司 一种图像处理方法、装置及计算机可读存储介质
CN115633255B (zh) * 2021-08-31 2024-03-22 荣耀终端有限公司 视频处理方法和电子设备
CN113840159B (zh) * 2021-09-26 2024-07-16 北京沃东天骏信息技术有限公司 视频处理方法、装置、计算机系统及可读存储介质
CN116342639A (zh) * 2021-12-22 2023-06-27 华为技术有限公司 图像显示方法及其电子设备和介质
CN115766901B (zh) * 2023-01-09 2023-05-26 武汉精测电子集团股份有限公司 一种图像传感器的数据传输设备及方法
CN117714833A (zh) * 2023-05-19 2024-03-15 荣耀终端有限公司 图像处理方法、装置、芯片、电子设备及介质
CN117714903B (zh) * 2024-02-06 2024-05-03 成都唐米科技有限公司 一种基于跟拍的视频合成方法、装置及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102611872A (zh) * 2011-01-19 2012-07-25 株式会社理光 基于感兴趣区域动态检测的场景影像转换系统和方法
CN109360183A (zh) * 2018-08-20 2019-02-19 中国电子进出口有限公司 一种基于卷积神经网络的人脸图像质量评估方法和系统
CN110072055A (zh) * 2019-05-07 2019-07-30 中国联合网络通信集团有限公司 基于人工智能的视频制作方法及系统
CN110148157A (zh) * 2019-05-10 2019-08-20 腾讯科技(深圳)有限公司 画面目标跟踪方法、装置、存储介质及电子设备

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3287947A1 (en) * 2016-08-25 2018-02-28 Dolby Laboratories Licensing Corp. Automatic video framing of conference participants


Also Published As

Publication number Publication date
CN112446255A (zh) 2021-03-05
US20220270343A1 (en) 2022-08-25

Similar Documents

Publication Publication Date Title
WO2021036318A1 (zh) 一种视频图像处理方法及装置
WO2020259038A1 (zh) 一种拍摄方法及设备
CN110035141B (zh) 一种拍摄方法及设备
WO2021052214A1 (zh) 一种手势交互方法、装置及终端设备
WO2022143128A1 (zh) 基于虚拟形象的视频通话方法、装置和终端
CN113810601B (zh) 终端的图像处理方法、装置和终端设备
CN112492193B (zh) 一种回调流的处理方法及设备
WO2022193989A1 (zh) 电子设备的操作方法、装置和电子设备
US11272116B2 (en) Photographing method and electronic device
US20230421900A1 (en) Target User Focus Tracking Photographing Method, Electronic Device, and Storage Medium
WO2022022319A1 (zh) 一种图像处理方法、电子设备、图像处理系统及芯片系统
CN110248037A (zh) 一种身份证件扫描方法及装置
CN113593567A (zh) 视频声音转文本的方法及相关设备
WO2020078267A1 (zh) 在线翻译过程中的语音数据处理方法及装置
WO2022214004A1 (zh) 一种目标用户确定方法、电子设备和计算机可读存储介质
WO2022033344A1 (zh) 视频防抖方法、终端设备和计算机可读存储介质
CN114302063B (zh) 一种拍摄方法及设备
CN114339140A (zh) 一种可交互监控装置、视频传输方法及装置
CN115297269B (zh) 曝光参数的确定方法及电子设备
CN113472996B (zh) 图片传输方法及装置
WO2022105670A1 (zh) 一种显示方法及终端
WO2023005882A1 (zh) 拍摄方法、拍摄参数训练方法、电子设备及存储介质
WO2023030067A1 (zh) 遥控方法、遥控设备和被控制设备
WO2023020420A1 (zh) 音量显示方法、电子设备及存储介质
CN116582743A (zh) 一种拍摄方法、电子设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20859495

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20859495

Country of ref document: EP

Kind code of ref document: A1