WO2023029547A1 - Video processing method, and electronic device - Google Patents

Video processing method, and electronic device

Info

Publication number
WO2023029547A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
image frame
video
electronic device
cropping
Prior art date
Application number
PCT/CN2022/091447
Other languages
French (fr)
Chinese (zh)
Inventor
付庆涛 (Fu Qingtao)
陈斌 (Chen Bin)
Original Assignee
荣耀终端有限公司 (Honor Device Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 荣耀终端有限公司 (Honor Device Co., Ltd.)
Publication of WO2023029547A1 publication Critical patent/WO2023029547A1/en

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/61 Control of cameras or camera modules based on recognised objects
    • H04N 23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H04N 23/62 Control of parameters via user interfaces
    • H04N 23/69 Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
    • H04N 23/95 Computational photography systems, e.g. light-field imaging systems
    • H04N 23/951 Computational photography systems by using two or more images to influence resolution, frame rate or aspect ratio

Definitions

  • the present application relates to the field of image processing, and in particular, to a video processing method and an electronic device.
  • in related solutions, user tracking in the video display is usually realized by detecting the user's human body, so as to realize the "follow the person's movement" function; however, human-body detection involves a large computation amount and high performance requirements, and is also likely to cause misidentification.
  • the present application provides a video processing method and electronic equipment.
  • through the video processing method, the video can be accurately processed while reducing power consumption, and the accuracy of the "follow the person's movement" function can be improved.
  • in a first aspect, a video processing method is provided, and the video processing method is applied to an electronic device and includes:
  • displaying a first image frame, where the first image frame is an image frame of a target object at a first position; when the target object moves to a second position, acquiring a second image frame, where the second position is different from the first position, and the second image frame refers to an image frame collected by the electronic device when the target object moves to the second position;
  • performing face detection according to the second image frame to obtain coordinate information of a first detection frame, where the first detection frame is used to indicate the position information of the face of the target object in the second image frame;
  • obtaining coordinate information of a cropping frame according to the first detection frame; performing cropping processing on the second image frame according to the cropping frame to obtain display content including the target object; and displaying a third image frame according to the display content, where a first area in the first image frame and a second area in the third image frame have an intersection; the first area refers to the area where the target object is located in the first image frame, and the second area refers to the area where the target object is located in the third image frame.
  • the second image frame may refer to an image frame captured by the camera in real time after the target object moves; the target object may refer to part or all of the photographed subjects; when an instruction to enable owner identification is received, the target object may refer to the owner user; when no such instruction is received, the target object may refer to all photographed subjects.
  • the intersection between the first area of the first image frame and the second area of the third image frame may mean that the first area and the second area completely overlap; or, it may also mean that the first area and the second area partially overlap.
  • in the embodiments of the present application, the electronic device displays the first image frame of the target object; after the target object moves, the camera of the electronic device collects the second image frame in real time, and face detection is performed on the second image frame to obtain the coordinate information of the face frame corresponding to the target object; the coordinate information of the cropping frame is obtained according to the coordinate information of the face frame; the second image frame is cropped according to the cropping frame to obtain the display content including the target object; and the third image frame is displayed according to the display content, the third image frame being the image frame of the target object displayed by the electronic device after the target object moves.
  • compared with a solution that directly detects human-body key points of the target object to determine the cropping frame, the video processing method of the present application can reduce the computation amount and the power consumption of the electronic device; in addition, because the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, it can avoid tracking and displaying the target object when the target object is facing away from the electronic device in the second image frame; therefore, the solution of the present application can improve the accuracy of video tracking and display while reducing power consumption.
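  • as a rough illustration of the flow described above, the following Python sketch uses OpenCV's bundled Haar-cascade detector; the function names, margins, and parameter values are our own illustrative assumptions, not taken from the patent.

```python
import cv2

# Sketch only: the Haar cascade stands in for the unspecified face detector.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def third_frame(second_frame, display_w, display_h):
    """Derive the displayed (third) frame from the captured (second) frame."""
    gray = cv2.cvtColor(second_frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None  # no face found (e.g. subject facing away): do not track
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # first detection frame
    ih, iw = second_frame.shape[:2]
    # Derive a cropping frame around the face frame (margins are assumptions).
    cx, cy = x + w // 2, y + h // 2
    cw, ch = min(iw, w * 6), min(ih, h * 8)
    x0 = max(0, min(iw - cw, cx - cw // 2))
    y0 = max(0, min(ih - ch, cy - ch // 2))
    content = second_frame[y0:y0 + ch, x0:x0 + cw]      # display content
    return cv2.resize(content, (display_w, display_h))  # third image frame
```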
  • the first area coincides with the second area.
  • the first area and the second area overlap, and the first area and the second area are located in a middle area of the display interface.
  • when the first image frame and the second image frame are collected, the positions of the electronic device are the same.
  • the electronic device can keep its position unchanged, and after the target object moves, the target object can always be displayed in the middle position or the middle area of the video display picture, so as to realize tracking display of the target object, that is, to realize the "follow the person's movement" function.
  • with reference to the first aspect, in some implementations of the first aspect, the method further includes:
  • detecting an operation indicating that the camera application is running; or detecting an operation indicating that a video call application is running.
  • the video processing method may be applied in the process of shooting video by the camera application program; or, the video processing method may also be applied in the video call application program.
  • the obtaining the coordinate information of the cropping frame according to the first detection frame includes:
  • the first expansion process refers to expanding the boundary of the first detection frame with the first detection frame as the center
  • the second detection frame is used to indicate the area where the body of the target object is located in the second image frame.
  • the second extension process refers to extending the boundary of the second detection frame with the second detection frame as the center.
  • in order to avoid local shaking of the first detection frame across multiple image frames of the video, it should be ensured that the cropping frame can remain unchanged while the target object moves within a small range; therefore, performing the second expansion process on the second detection frame to obtain the cropping frame can ensure the stability of the cropped image frames to a certain extent.
  • performing the first extension process on the first detection frame to obtain the second detection frame includes:
  • the first expansion process is performed on the first detection frame according to a first threshold to obtain the second detection frame, and the first threshold is used to indicate body proportion data.
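  • for instance, the first expansion process might look like the sketch below, where the width and height ratios stand in for the first threshold (body proportion data) and are invented values for illustration only.

```python
def expand_face_to_body(face, img_w, img_h, width_ratio=3.0, height_ratio=5.0):
    """First expansion: grow the first detection frame (the face frame),
    keeping it centered, so that the second detection frame covers the body.
    The ratios are assumed body-proportion data; the result is clipped to
    the image bounds."""
    x, y, w, h = face
    cx, cy = x + w / 2, y + h / 2
    bw, bh = min(w * width_ratio, img_w), min(h * height_ratio, img_h)
    x0 = max(0.0, min(img_w - bw, cx - bw / 2))
    y0 = max(0.0, min(img_h - bh, cy - bh / 2))
    return int(x0), int(y0), int(bw), int(bh)
```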
  • performing cropping processing on the second image frame according to the cropping frame to obtain display content including the target object includes:
  • when the second detection frame and the cropping frame satisfy a preset condition, the second image frame is cropped according to the cropping frame to obtain the display content, where the preset condition means that the second detection frame and the cropping frame satisfy a preset proportional relationship.
  • the preset condition may mean that the second detection frame and the cropping frame satisfy a certain proportional relationship and that the second detection frame is located inside the cropping frame.
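  • a minimal sketch of that stability check, assuming invented fill-ratio bounds for the preset proportional relationship:

```python
def crop_frame_still_valid(body, crop, min_fill=0.4, max_fill=0.9):
    """Return True while the second detection frame (body) stays inside the
    current cropping frame (crop) and occupies a reasonable share of it; the
    cropping frame is only recomputed when this check fails, which keeps the
    displayed picture stable while the subject moves within a small range.
    Boxes are (x, y, w, h); the fill bounds are assumptions."""
    bx, by, bw, bh = body
    cx, cy, cw, ch = crop
    inside = (bx >= cx and by >= cy and
              bx + bw <= cx + cw and by + bh <= cy + ch)
    fill = (bw * bh) / float(cw * ch)
    return inside and min_fill <= fill <= max_fill
```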
  • with reference to the first aspect, in some implementations, the coordinate information of the first detection frame refers to the coordinate information corresponding to the first detection frame when the second image frame is at the second resolution, and the method further includes:
  • the second resolution is determined according to the first resolution, and the second resolution is greater than the first resolution.
  • the resolution can be extended from the first resolution to the second resolution, which mitigates the reduction in definition of the second image frame caused by the subsequent cropping processing; by performing the resolution expansion processing, the definition of the third image frame displayed after cropping can be improved to a certain extent.
  • a request instruction requesting the first resolution is received; the first resolution is expanded to determine the second resolution; the coordinate information of the first detection frame is detected in the second image frame; and the coordinate information of the first detection frame is converted into the corresponding coordinate information for the second image frame at the second resolution.
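  • in code, converting the first detection frame's coordinates to the second resolution reduces to rescaling along each axis; a sketch under the assumption of uniform axis scaling (names are ours):

```python
def to_second_resolution(box, first_res, second_res):
    """Map a box (x, y, w, h) detected at the first resolution (w1, h1) to
    the corresponding coordinates at the second resolution (w2, h2)."""
    (w1, h1), (w2, h2) = first_res, second_res
    sx, sy = w2 / w1, h2 / h1
    x, y, w, h = box
    return int(x * sx), int(y * sy), int(w * sx), int(h * sy)

# e.g. a face frame found at 1280x720, expressed at 1920x1080:
# to_second_resolution((400, 200, 120, 150), (1280, 720), (1920, 1080))
# -> (600, 300, 180, 225)
```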
  • the displaying the third image frame according to the display content includes:
  • with reference to the first aspect, in some implementations, the target object is the owner user, and the method further includes:
  • receiving an owner identification instruction, where the owner identification instruction is used to instruct to identify the owner user;
  • Face recognition is performed according to the first detection frame to determine the owner user, and the owner user is a pre-configured user.
  • the owner may refer to the management user of the electronic device; or, the owner may also be any pre-configured user with a higher priority; owner identification means identifying, through face detection during tracking and display, the owner user among the target objects, and tracking and displaying that owner user.
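  • the patent does not specify a recognition algorithm; as one hedged illustration, owner identification could be implemented as a face-embedding comparison, sketched here with the open-source face_recognition package:

```python
import face_recognition  # illustrative third-party choice, not the patent's

def is_owner(frame_rgb, face_box, owner_encoding, tolerance=0.6):
    """Compare the face inside the first detection frame against the
    pre-configured owner user's face encoding."""
    x, y, w, h = face_box
    # face_recognition expects locations as (top, right, bottom, left)
    encodings = face_recognition.face_encodings(
        frame_rgb, known_face_locations=[(y, x + w, y + h, x)])
    if not encodings:
        return False
    return face_recognition.compare_faces(
        [owner_encoding], encodings[0], tolerance=tolerance)[0]
```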
  • the first detection frame refers to a face frame of the owner user of the device.
  • the target object includes at least one user.
  • the target object includes a first user and a second user;
  • the first detection frame refers to the union frame of the face frame of the first user and the face frame of the second user.
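  • the union frame is simply the smallest box containing both face frames; a sketch:

```python
def union_frame(a, b):
    """Smallest box (x, y, w, h) containing face frames a and b, so that a
    single cropping frame can keep both users in the displayed picture."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return x0, y0, x1 - x0, y1 - y0
```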
  • in a second aspect, an electronic device is provided, including: one or more processors, a memory, and a display screen; the memory is coupled to the one or more processors, and the memory is used to store computer program code, the computer program code comprising computer instructions that are invoked by the one or more processors to cause the electronic device to perform:
  • displaying a first image frame, where the first image frame is an image frame of the target object at a first position; when the target object moves to a second position, acquiring a second image frame, where the second position is different from the first position, and the second image frame refers to an image frame collected by the electronic device when the target object moves to the second position; performing face detection according to the second image frame to obtain coordinate information of a first detection frame, where the first detection frame is used to indicate the position information of the face of the target object in the second image frame; and obtaining coordinate information of a cropping frame according to the first detection frame;
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • detecting an operation indicating that the camera application is running; or detecting an operation indicating that a video call application is running.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the first expansion process refers to expanding the boundary of the first detection frame with the first detection frame as the center
  • the second detection frame is used to indicate the area where the body of the target object is located in the second image frame.
  • the second extension process refers to extending the boundary of the second detection frame with the second detection frame as the center.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the second detection frame is obtained by performing the first expansion process on the first detection frame according to a first threshold, where the first threshold is used to indicate body proportion data.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the preset condition means that the second detection frame and the cropping frame satisfy a preset proportional relationship
  • the second image frame is cropped according to the cropping frame to obtain the display content.
  • the coordinate information of the first detection frame refers to the coordinate information corresponding to the first detection frame when the second image frame is at the second resolution
  • the one or more processors invoke the computer instructions so that the electronic device also performs:
  • the second resolution is determined according to the first resolution, and the second resolution is greater than the first resolution.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the owner identification instruction is used to instruct to identify the owner user
  • Face recognition is performed according to the first detection frame to determine the owner user, and the owner user is a pre-configured user.
  • the first detection frame refers to a face frame of the owner user of the device.
  • the target object includes at least one user.
  • the target object includes a first user and a second user
  • the first detection frame refers to the union frame of the face frame of the first user and the face frame of the second user.
  • the first area coincides with the second area.
  • a video processing apparatus is provided, including units for executing any one of the video processing methods in the first aspect.
  • the processing unit may be a processor, and the input unit may be a communication interface; the electronic device may further include a memory, where the memory is used to store computer program code, and when the processor executes the computer program code stored in the memory, the electronic device is caused to execute any one of the methods in the first aspect.
  • a chip system is provided, the chip system is applied to an electronic device, and the chip system includes one or more processors, and the processor is used to call a computer instruction so that the electronic device executes the first aspect Any of the video processing methods in .
  • a computer-readable storage medium is provided, which stores computer program code; when the computer program code is executed by an electronic device, the electronic device is caused to execute any one of the video processing methods in the first aspect.
  • a computer program product is provided, including computer program code; when the computer program code is run by an electronic device, the electronic device is caused to perform any one of the video processing methods in the first aspect.
  • FIG. 1 is a schematic diagram of a hardware system applicable to an electronic device of the present application
  • Fig. 2 is a schematic diagram of a software system applicable to the electronic device of the present application
  • FIG. 3 is a schematic diagram of an application scenario applicable to this application.
  • Fig. 4 is a schematic diagram of the intersection of the first area and the second area provided by an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of a video processing method provided by the present application.
  • FIG. 6 is a schematic diagram of a video processing display interface provided by the present application.
  • FIG. 7 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 8 is a schematic diagram of a video processing display interface provided by the present application.
  • FIG. 9 is a schematic diagram of a video processing display interface provided by the present application.
  • FIG. 10 is a schematic diagram of a video processing display interface provided by the present application.
  • FIG. 11 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 12 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 13 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 14 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 15 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 16 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 17 is a schematic diagram of a video processing display interface provided by the present application.
  • Fig. 18 is a schematic diagram of a video processing display interface provided by the present application.
  • FIG. 19 is a schematic structural diagram of a video processing device provided by the present application.
  • FIG. 20 is a schematic structural diagram of an electronic device provided by the present application.
  • Fig. 1 shows a hardware system applicable to the electronic equipment of this application.
  • the electronic device 100 may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, a vehicle-mounted electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a projector, or the like.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identity module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure shown in FIG. 1 does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than those shown in FIG. 1 , or the electronic device 100 may include a combination of some of the components shown in FIG. 1 , or , the electronic device 100 may include subcomponents of some of the components shown in FIG. 1 .
  • the components shown in FIG. 1 can be realized in hardware, software, or a combination of software and hardware.
  • Processor 110 may include one or more processing units.
  • the processor 110 may include at least one of the following processing units: an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU).
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the processor 110 may include at least one of the following interfaces: an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and a USB interface.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the I2S interface can be used for audio communication.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 communicates with the camera 193 through the CSI interface to realize the shooting function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to realize the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal interface or as a data signal interface.
  • the GPIO interface can be used to connect the processor 110 with the camera 193 , the display screen 194 , the wireless communication module 160 , the audio module 170 and the sensor module 180 .
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface or MIPI interface.
  • the USB interface 130 is an interface conforming to the USB standard specification, for example, it can be a mini (Mini) USB interface, a micro (Micro) USB interface or a C-type USB (USB Type C) interface.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100 , can also be used to transmit data between the electronic device 100 and peripheral devices, and can also be used to connect an earphone to play audio through the earphone.
  • the USB interface 130 can also be used to connect other electronic devices 100, such as AR devices.
  • connection relationship between the modules shown in FIG. 1 is only a schematic illustration, and does not constitute a limitation on the connection relationship between the modules of the electronic device 100 .
  • each module of the electronic device 100 may also adopt a combination of various connection modes in the foregoing embodiments.
  • the charging management module 140 is used to receive power from the charger. While the charging management module 140 is charging the battery 142 , it can also supply power to the electronic device 100 through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (eg, leakage, impedance).
  • the power management module 141 may be set in the processor 110, or the power management module 141 and the charge management module 140 may be set in the same device.
  • the wireless communication function of the electronic device 100 may be realized by components such as the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, and a baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • the mobile communication module 150 may provide a wireless communication solution applied to the electronic device 100, such as at least one of the following: a second generation (2G) mobile communication solution, a third generation (3G) mobile communication solution, a fourth generation (4G) mobile communication solution, and a fifth generation (5G) mobile communication solution.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs a sound signal through an audio device (for example, a speaker 170A, a receiver 170B), or displays an image or video through a display screen 194 .
  • the modem processor may be a stand-alone device. In some other embodiments, the modem processor may be independent from the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can also provide a wireless communication solution applied to the electronic device 100, such as at least one of the following: wireless local area network (WLAN), Bluetooth (BT), Bluetooth low energy (BLE), ultra wide band (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 of the electronic device 100 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other electronic devices through wireless communication technology.
  • the electronic device 100 can realize the display function through the GPU, the display screen 194 and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • Display 194 may be used to display images or video.
  • the display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini light-emitting diode (Mini LED), a micro light-emitting diode (Micro LED), a micro OLED (Micro OLED), or quantum dot light emitting diodes (QLED).
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 , and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can optimize the algorithm of image noise, brightness and color, and ISP can also optimize parameters such as exposure and color temperature of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts digital image signals into standard red green blue (RGB), YUV, and other image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • in the embodiment of the present application, the camera 193 can acquire a video image frame, where the video image frame may refer to a collected full-size image frame; the camera 193 can transmit the acquired video image frame to the ISP, and the ISP is used to process the video image frame obtained by the camera 193; for example, the ISP can obtain the target resolution and the parameters of the cropping and scaling processing from the processor 110; the ISP can adjust the full-size video image frame to the target resolution according to the target resolution, and crop and scale the video image frame of the target resolution according to the cropping and scaling parameters to obtain a processed video image frame that meets the resolution requested by the application program; the processed video image frame is transmitted to the application program, and the display screen 194 displays the processed video image frame.
  • calculation of video stream target resolution, face detection, cropping and scaling parameter calculation may be performed in the processor 110 .
  • the relevant steps of determining parameters in the video processing method of the present application may be executed in the processor 110; the ISP is used to obtain the relevant parameters for processing the video image frames and to process the video image frames according to those parameters to obtain image frames of a specification suitable for the electronic device, and the display screen 194 of the device displays the output image frames.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3 and MPEG4.
  • the external memory interface 120 can be used to connect an external memory card, such as a secure digital (secure digital, SD) card, so as to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the electronic device 100 can implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • audio functions such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and can also be used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • Speaker 170A, also known as a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music or make a hands-free call through the speaker 170A.
  • Receiver 170B, also known as an earpiece, is used to convert audio electrical signals into sound signals.
  • pressure sensor 180A may be disposed on display screen 194 .
  • pressure sensor 180A may be a resistive pressure sensor, an inductive pressure sensor or a capacitive pressure sensor.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, an instruction of viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction of creating a new short message is executed.
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 around three axes may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used in scenarios such as navigation and somatosensory games.
  • the air pressure sensor 180C is used to measure air pressure.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally x-axis, y-axis and z-axis). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E can also be used to identify the posture of the electronic device 100 as an input parameter for application programs such as horizontal and vertical screen switching and pedometer.
  • the distance sensor 180F is used to measure distance.
  • the electronic device 100 may measure the distance by infrared or laser. In some embodiments, for example, in a shooting scene, the electronic device 100 can use the distance sensor 180F for distance measurement to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode.
  • the LEDs may be infrared LEDs.
  • the electronic device 100 emits infrared light through the LED.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When the reflected light is detected, the electronic device 100 may determine that there is an object nearby. When no reflected light is detected, the electronic device 100 may determine that there is no object nearby.
  • the electronic device 100 can use the proximity light sensor 180G to detect whether the user is holding the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used for automatic unlocking and automatic screen locking in leather case mode or pocket mode.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket, so as to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement functions such as unlocking, accessing the application lock, taking pictures, and answering incoming calls.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to implement a temperature treatment strategy. For example, when the temperature reported by the temperature sensor 180J exceeds the threshold, the electronic device 100 may reduce the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • the electronic device 100 when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the electronic device 100 from being shut down abnormally due to the low temperature.
  • the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 180K is also referred to as a touch device.
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch-controlled screen".
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor 180K may transmit the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 and disposed at a different position from the display screen 194 .
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive the blood pressure beating signal.
  • Keys 190 include a power key and a volume key.
  • the key 190 can be a mechanical key or a touch key.
  • the electronic device 100 may receive a key input signal, and implement a function related to the key input signal.
  • the motor 191 can generate vibrations.
  • the motor 191 can be used for notification of incoming calls, and can also be used for touch feedback.
  • the motor 191 can generate different vibration feedback effects for touch operations on different application programs. For touch operations acting on different areas of the display screen 194, the motor 191 can also generate different vibration feedback effects. Different application scenarios (for example, time reminder, receiving information, alarm clock and games) may correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging status and the change of the battery capacity, and can also be used to indicate messages, missed calls and notifications.
  • the SIM card interface 195 is used for connecting a SIM card.
  • the SIM card can be inserted into the SIM card interface 195 to realize contact with the electronic device 100 , and can also be pulled out from the SIM card interface 195 to realize separation from the electronic device 100 .
  • the hardware system of the electronic device 100 is described in detail above, and the software system of the electronic device 100 is introduced below.
  • the software system may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the embodiment of the present application uses a layered architecture as an example to exemplarily describe the software system of the electronic device 100 .
  • a software system adopting a layered architecture is divided into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the software system can be divided into four layers, which are, from top to bottom: the application layer, the application framework layer, the Android runtime (Android Runtime) and system library, and the kernel layer.
  • the application layer can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the video processing method of the embodiment of the present application can be applied to a camera application or a video application; for example, the "follow the person's movement" function can be enabled in the settings of the electronic device, and after the electronic device detects that a video application requests to open the camera, it can turn on the "follow the person's movement" function; alternatively, the "follow the person's movement" function can be enabled in the camera application. For the "follow the person's movement" function, refer to the description of FIG. 3 below.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions.
  • the application framework layer includes a window manager, content providers, a view system, a telephony manager, a resource manager, and a notification manager.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display, determine whether there is a status bar, lock the screen, and capture the screen.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, and phonebook.
  • the view system includes visual controls, such as those that display text and those that display pictures.
  • the view system can be used to build applications.
  • the display interface may be composed of one or more views, for example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100, such as management of call status (connected or hung up).
  • the resource manager provides various resources to the application, such as localized strings, icons, pictures, layout files, and video files.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the Android runtime includes a core library and a virtual machine, and is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules, such as a surface manager, media libraries (Media Libraries), a three-dimensional graphics processing library (for example, the open graphics library for embedded systems (OpenGL ES)), and a 2D graphics engine (for example, the skia graphics library (SGL)).
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D layers and 3D layers for multiple applications.
  • the media library supports playback and recording of multiple audio formats, playback and recording of multiple video formats, and still image files.
  • the media library can support multiple audio and video encoding formats, such as MPEG4, H.264, moving picture experts group audio layer III (MP3), advanced audio coding (AAC), adaptive multi-rate (AMR), joint photographic experts group (JPG), and portable network graphics (PNG).
  • the 3D graphics processing library can be used to implement 3D graphics drawing, image rendering, compositing and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer may include driver modules such as display driver, camera driver, audio driver and sensor driver.
  • when a touch operation is received, a corresponding hardware interrupt is sent to the kernel layer, and the kernel layer processes the touch operation into an original input event.
  • the original input event includes information such as touch coordinates and a time stamp of the touch operation.
  • the original input event is stored in the kernel layer, and the application framework layer obtains the original input event from the kernel layer, identifies the control corresponding to the original input event, and notifies the corresponding application (application, APP) of the control.
  • for example, if the above-mentioned touch operation is a single-click operation and the APP corresponding to the above-mentioned control is the camera APP, then after the camera APP is woken up by the single-click operation, it can call the camera driver of the kernel layer through the API and control the camera 193 to capture images through the camera driver.
  • FIG. 3 is a schematic diagram of an application scenario applicable to the present application, that is, FIG. 3 may refer to a scenario of the "follow the person's movement" function.
  • the principle of "following people's movements” may mean that the camera of the electronic device performs large-resolution acquisition according to a fixed field of view, detects and tracks the user on the captured video image frames, and locates the user's position in real time;
  • the large-resolution video image frame can be cropped and scaled according to the real-time positioning of the user's position, and a small-resolution image that adapts to the display specifications and the user is located in a specific area of the image can be obtained, thereby realizing real-time monitoring according to the user's position. Adjust the display screen to achieve the effect of "moving with people".
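  • the crop-and-scale step of this principle can be sketched as follows; the zoom factor and the assumption that the subject's center is already known are illustrative, not taken from the patent.

```python
import cv2

def follow_crop(frame, subject_center, out_w, out_h, zoom=2.0):
    """Crop a window with the display's aspect ratio around the subject's
    center, then scale it to the small display resolution; adjusting this
    window per frame keeps the moving subject in a fixed area of the screen."""
    ih, iw = frame.shape[:2]
    cw = min(iw, int(iw / zoom))             # assumed zoom factor
    ch = min(ih, cw * out_h // out_w)        # keep the display aspect ratio
    cx, cy = subject_center
    x0 = max(0, min(iw - cw, int(cx) - cw // 2))
    y0 = max(0, min(ih - ch, int(cy) - ch // 2))
    return cv2.resize(frame[y0:y0 + ch, x0:x0 + cw], (out_w, out_h))
```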
  • taking a tablet device as the electronic device as an example, as shown in (a) in FIG. 3, the display interface includes a viewfinder frame 211 and a control 212 for instructing video recording; before it is detected that the user clicks on the control 212, a preview image may be displayed in the viewfinder frame 211.
  • after detecting that the user clicks on the control 212, the tablet device can perform video shooting; when the first photographed subject is at the first position, the first image frame as shown in (a) in FIG. 3 is displayed; during video shooting, the first photographed subject moves; for example, the first photographed subject moves from the first position to the second position, and the third image frame as shown in (b) in FIG. 3 is displayed.
  • during video shooting, the first photographed subject can always remain in the middle position of the viewfinder frame 211; this shooting function is the "follow the person's movement" function.
  • the position of the tablet device can be kept unchanged, and after the subject moves, the subject can always be displayed in the middle position or the middle area of the video display screen.
  • when the first photographed subject is at the first position, the first photographed subject is located in the first area in the first image frame; when the first photographed subject moves to the second position, the first photographed subject is located in the second area in the third image frame; wherein, there is an intersection between the first area and the second area.
  • intersection between the first area and the second area may mean that the first area and the second area partly overlap, as shown in (a) in FIG. 4 and (b) in FIG. 4 .
  • intersection between the first area and the second area may mean that the first area and the second area completely overlap, as shown in (c) in FIG. 4 .
  • the first area and the second area may be located in a middle area of the display screen, and there is an intersection between the first area and the second area.
  • in related solutions, user tracking in the video display is usually realized by performing human-body detection on the user, so as to realize the "follow the person's movement" function; human-body detection requires detecting key points of the user's body, which may include but are not limited to: the head, shoulders, arms, hands, legs, feet, eyes, nose, mouth, and clothes; detecting these key points involves a large computation amount and high performance requirements.
  • in view of this, the embodiment of the present application provides a video processing method.
  • in the embodiment of the present application, a video image frame of the target object is obtained after the target object moves, and face detection is performed on the video image frame to determine the coordinate information of the face frame of the target object; the coordinate information of the cropping frame is obtained according to the coordinate information of the face frame; and the video image frame is cropped according to the cropping frame to obtain the display content. Because the coordinate information of the cropping frame is obtained from the coordinate information of the face frame, compared with a scheme that directly detects the key points of the target object's human body to determine the cropping frame, the video processing method of the present application can reduce the computation amount and the power consumption of the electronic device.
  • in addition, because the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, it can avoid tracking and displaying the target object when the target object is facing away from the electronic device in the video image frame; therefore, the solution of the present application can also improve the accuracy of video tracking and display while reducing power consumption.
  • the video processing method provided in the embodiment of the present application may be used in a video mode, where the video mode may refer to the electronic device performing video shooting; or, the video mode may also refer to the electronic device performing video calls.
  • the "following people" function can be set in the setting interface of the electronic device; when the function is enabled, the electronic device executes the video processing method of the embodiment of the present application;
  • for example, the camera of the electronic device can be set to enable the "following people" function; according to the settings, the "following people" function can be turned on when recording a video, and the video processing method of the embodiment of the present application can be executed.
  • Fig. 5 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • the video processing method 300 shown in FIG. 5 includes steps S301 to S316, and these steps will be described in detail below.
  • Step S301: request to turn on the camera.
  • an application program in an electronic device sends an instruction requesting to turn on the camera;
  • the application program may include but is not limited to: a WeChat video call application, a video conferencing application, a live video application, a video recording application, a camera application, etc.
  • for example, when the camera application program of the electronic device is recording a video, it may request to turn on the camera;
  • for example, it may be a request to turn on the camera when the user clicks the icon 411 of the camera application to shoot a video;
  • for example, the WeChat video call application in the electronic device may request to turn on the camera;
  • as shown in FIG. 6 , it may refer to a request to turn on the camera when the user clicks the icon 412 of the video application program to make a video call.
  • Step S302: the camera sensor detects the instruction requesting to turn on the camera and acquires a video image frame (an example of a second image frame).
  • the aforementioned camera sensor may refer to the image sensor in the camera module; the video image frame may refer to the image frame acquired by the image sensor in real time when the user's position changes.
  • the resolution of the video image frame acquired by the camera sensor may be full size.
  • Step S303: the application program issues a requested-resolution instruction.
  • for example, the application program may issue an instruction requesting that the video resolution be w1*h1 (an example of the first resolution); the requested video resolution may refer to the resolution of the processed video image frames saved in the electronic device.
  • Step S304: calculating the target resolution (an example of the second resolution) of the video image frame.
  • the resolution requested by the application can be expanded to obtain the target resolution; for example, the requested resolution w1*h1 can be expanded by a certain factor to the resolution w2*h2 (w2>w1, h2>h1); the resolution w2*h2 may be the target resolution.
  • extending the resolution from w1*h1 to w2*h2 can address the degradation in video image frame definition caused by the subsequent cropping processing; by performing the resolution extension processing, the clarity of the cropped video image frame can be improved to a certain extent.
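As a rough illustration of the resolution extension in step S304, the following Python sketch expands a requested resolution by a fixed factor; the patent does not specify the expansion factor, so the value 1.25 here is purely hypothetical (the embodiment only requires w2>w1 and h2>h1).

```python
def expand_resolution(w1: int, h1: int, factor: float = 1.25):
    """Expand the requested resolution w1*h1 to the target resolution w2*h2.

    The embodiment only requires w2 > w1 and h2 > h1; the 1.25 factor is a
    hypothetical choice, not a value from the patent.
    """
    return int(w1 * factor), int(h1 * factor)

# Example: a 1280x720 request yields a 1600x900 target resolution.
# expand_resolution(1280, 720) -> (1600, 900)
```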
  • Step S305: the ISP processes the video image frame according to the target resolution to obtain a video image frame of the target resolution.
  • Step S306: performing face detection on the video image frame to obtain the coordinate information of the face frame (an example of the first detection frame).
  • an existing face detection algorithm may be used to perform face detection on video image frames acquired by a camera sensor to obtain coordinate information of the face frame.
  • the full-size video image frame can be down-sampled; for example, the full-size video image frame is down-sampled to obtain a video image frame with a resolution of w3*h3 (w3&lt;w1, h3&lt;h1); face detection is performed on the w3*h3 video image frame to obtain the coordinate information of the face frame.
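The embodiment does not name a particular face detection algorithm; as a stand-in, the following sketch uses OpenCV's Haar-cascade detector on a frame down-sampled to a hypothetical w3*h3 of 640*360.

```python
import cv2

def detect_face_boxes(frame_full, w3=640, h3=360):
    """Down-sample a full-size frame to w3*h3 and run face detection on it.

    OpenCV's Haar cascade stands in for the unspecified "existing face
    detection algorithm"; w3*h3 = 640*360 is an assumed value.
    """
    small = cv2.resize(frame_full, (w3, h3))
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # Each returned box is (x, y, w, h) in the w3*h3 coordinate system.
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```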
  • Step S307: judging whether the owner identification is enabled; if the owner identification is enabled, execute step S308; if the owner identification is not enabled, execute step S310.
  • when the owner identification is turned on, only the owner user in the video image frame is tracked and displayed; when the owner identification is not turned on, all users in the video image frame can be tracked and displayed; the owner may refer to the administrative user of the tablet device; alternatively, the owner may also be any pre-configured user with a higher priority.
  • Step S308: performing face recognition according to the face frame.
  • the image information in the face frame can be determined according to the coordinate information of the face frame, and face recognition is performed on that image information; when face recognition is performed on the image information in the face frame, matching is performed against the face information database pre-stored in the electronic device, so as to determine the identity of the user corresponding to the image information in the face frame.
  • for example, the face information database includes the face information of the owner user, and the owner user can be determined by matching the image information in the face frame against the face information database.
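The matching step against the pre-stored face information database might look like the following sketch; the embedding extractor, the cosine-similarity metric, and the 0.6 threshold are all illustrative assumptions rather than details given in the patent.

```python
import numpy as np

def match_owner(face_embedding, owner_db, threshold=0.6):
    """Match the face inside the detected face frame against pre-stored
    owner face information.

    `face_embedding` is assumed to come from some face-recognition model
    applied to the image inside the face frame; `owner_db` maps user IDs
    to stored embeddings. Cosine similarity and the 0.6 threshold are
    illustrative, not values from the patent.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_id, best_score = None, threshold
    for user_id, stored in owner_db.items():
        score = cosine(face_embedding, stored)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id  # None when no pre-stored user matches well enough
```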
  • it should be noted that the face detection in step S306 is used to detect the coordinate information of the face frame in the image, that is, to detect the face area in the image; face recognition is used to identify the user identity information corresponding to that face area.
  • Step S309: acquiring the coordinate information of the face frame of the owner user.
  • the owner user can be determined through step S308, so that the coordinate information of the face frame corresponding to the owner user can be determined.
  • for example, the image frame may include the first user and the second user; if the owner identification is enabled, the acquired coordinate information of the face frame may refer to the coordinate information of the face frame of the owner user 711, such as the coordinate information of the rectangular frame 710.
  • Step S310: performing coordinate transformation on the coordinate information of the face frame.
  • for example, the coordinate information of the four vertices of the rectangular frame 710 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, and the position information of the rectangular frame 720 in the w2*h2 resolution image is then determined.
  • if the owner identification is not enabled, step S310 is executed after step S307; in step S310, coordinate transformation is performed on the coordinate information of the face frame detected in step S306.
  • for example, when the video image frame includes a single user, the coordinate transformation of the face frame may refer to transforming the coordinate information of the four vertices of the rectangular frame 430 to obtain the corresponding vertex coordinate information, and then determining the position information of the rectangular frame 440 in the w2*h2 resolution image.
  • for example, when the video image frame includes multiple users, the coordinate transformation of the face frame may refer to transforming the coordinate information of the four vertices of the rectangular frame 510 to obtain the corresponding vertex coordinate information, and then determining the position information of the face frame in the w2*h2 resolution image, that is, obtaining the rectangular frame 520.
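The coordinate transformation of step S310 amounts to uniform scaling of the frame's vertex coordinates between resolutions; a minimal sketch, assuming axis-aligned boxes stored as (x0, y0, x1, y1):

```python
def transform_box(box, src_size, dst_size):
    """Map a face frame from the detection resolution (e.g. w3*h3) to the
    target resolution w2*h2 by scaling its vertex coordinates."""
    x0, y0, x1, y1 = box
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# Example: a box detected in a 640x360 image mapped into a 1600x900 image.
# transform_box((100, 80, 180, 170), (640, 360), (1600, 900))
# -> (250.0, 200.0, 450.0, 425.0)
```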
  • Step S311: calculating the coordinate information of the human body frame (an example of the second detection frame) according to the coordinate information of the face frame after the coordinate transformation.
  • the coordinate information of the face frame after the coordinate conversion can be determined; according to the human body proportion data, boundary extension processing (an example of the first extension processing) is performed on the coordinate-converted face frame to obtain the coordinate information of the human body frame; the human body proportion data may be a preset value.
  • for example, with the rectangular frame of the face as the center and as the reference, the upper boundary can be expanded outward by 0.5 times, the lower boundary can be expanded outward by 1.0 times, and the left boundary and the right boundary can each be expanded outward by 0.75 times.
  • for example, the single-user face frame shown in the rectangular frame 440 can be subjected to boundary extension processing (an example of the first boundary extension) to obtain the coordinate information of the human body frame shown in the rectangular frame 450.
  • for example, the multi-user face frame shown in the rectangular frame 520 can be subjected to boundary extension processing (an example of the first boundary extension) to obtain the coordinate information of the two users' human body frame shown in the rectangular frame 530.
  • Step S312: calculating the coordinate information of the cropping frame according to the coordinate information of the human body frame.
  • boundary extension processing (an example of the second boundary extension) may be performed according to the body frame to obtain coordinate information of the cropping frame.
  • for example, with the human body frame as the center and as the reference, the upper boundary, the lower boundary, the left boundary, and the right boundary can each be expanded outward by 0.025 times to obtain the cropping frame.
  • for example, the single-user human body frame (an example of the second detection frame) shown in the rectangular frame 450 may be subjected to boundary extension processing (an example of the second boundary extension) to obtain the coordinate information of the cropping frame shown in the rectangular frame 460.
  • for example, the multi-user human body frame (an example of the second detection frame) shown in the rectangular frame 530 may be subjected to boundary extension processing (an example of the second boundary extension) to obtain the coordinate information of the cropping frame shown in the rectangular frame 540.
  • it should be noted that when face detection is performed on consecutive video image frames, the output face frame may have local jumps in the time domain; in order to avoid the local jitter of the face frame and ensure that the cropping frame remains unchanged when the user makes small movements, the cropping frame is obtained by expanding the boundary of the human body frame, which can ensure the stability of the cropped image frames to a certain extent.
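Steps S311 and S312 can be summarized as two boundary-extension passes using the proportions quoted above (0.5x up, 1.0x down, 0.75x left/right for the body frame; 0.025x on each side for the cropping frame); the clamping to image bounds in this sketch is an implementation detail the embodiment does not spell out.

```python
def expand_box(box, up, down, left, right, width, height):
    """Expand a box outward by multiples of its own height/width, using the
    box itself as the reference; results are clamped to the image bounds
    (the clamping is an assumed implementation detail)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return (max(0.0, x0 - left * w),
            max(0.0, y0 - up * h),
            min(float(width), x1 + right * w),
            min(float(height), y1 + down * h))

def face_to_crop(face_box, width, height):
    # Step S311: face frame -> human body frame with the preset proportions
    # (upper 0.5x, lower 1.0x, left and right 0.75x of the face frame).
    body = expand_box(face_box, 0.5, 1.0, 0.75, 0.75, width, height)
    # Step S312: human body frame -> cropping frame (0.025x on every side).
    crop = expand_box(body, 0.025, 0.025, 0.025, 0.025, width, height)
    return body, crop
```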
  • Step S313: judging whether the conditions for the cropping processing and the scaling processing are met.
  • the preset condition may mean that the human body frame and the cropping frame satisfy a certain proportional relationship and the human body frame is located inside the cropping frame.
  • if the preset condition is not met, step S306 to step S312 are repeatedly executed to recalculate the coordinate information of the cropping frame.
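A sketch of the step S313 check might look as follows; the embodiment only states that the body frame must lie inside the cropping frame and that the two satisfy a preset proportional relationship, so the concrete ratio bounds used here are assumptions.

```python
def meets_preset_condition(body, crop, min_ratio=0.5, max_ratio=0.95):
    """Step S313 check: the human body frame lies inside the cropping frame
    and the two satisfy a proportional relationship.

    The ratio bounds 0.5-0.95 are assumed for illustration; the embodiment
    does not give concrete values.
    """
    bx0, by0, bx1, by1 = body
    cx0, cy0, cx1, cy1 = crop
    inside = bx0 >= cx0 and by0 >= cy0 and bx1 <= cx1 and by1 <= cy1
    crop_area = (cx1 - cx0) * (cy1 - cy0)
    if crop_area <= 0:
        return False
    ratio = ((bx1 - bx0) * (by1 - by0)) / crop_area
    return inside and min_ratio <= ratio <= max_ratio
```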
  • Step S314: calculating the parameters for the cropping and scaling processing.
  • for example, the parameters for the cropping and scaling processing of the video image frame by the ISP are calculated, and the parameters are delivered to the ISP.
  • the cropping frame can be enlarged to a certain extent; for example, with the cropping frame as the center, it can be enlarged to at most 2 times the size of the cropping frame to crop the video image frame.
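A sketch of this center-anchored enlargement, capped at 2x as stated, assuming boxes as (x0, y0, x1, y1):

```python
def enlarge_crop(crop, scale, width, height):
    """Enlarge the cropping frame about its own center before cropping.

    The embodiment caps the enlargement at 2x the cropping frame's size;
    the clamping to the image bounds is an assumed detail.
    """
    x0, y0, x1, y1 = crop
    scale = min(scale, 2.0)  # at most 2 times the cropping frame's size
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    w, h = (x1 - x0) * scale, (y1 - y0) * scale
    return (max(0.0, cx - w / 2.0), max(0.0, cy - h / 2.0),
            min(float(width), cx + w / 2.0), min(float(height), cy + h / 2.0))
```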
  • Step S315: the ISP receives the parameters of the cropping and scaling processing and performs cropping and scaling processing on the video image frame.
  • for example, the ISP crops the video image frame according to the coordinate information of the cropping frame to obtain the display content, and scales the display content according to the requested resolution, so that the processed video image frame meets the requested resolution.
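The cropping and scaling of step S315 reduce to array slicing and a resize back to the requested resolution w1*h1; the sketch below uses OpenCV in place of the ISP hardware path.

```python
import cv2

def crop_and_scale(frame, crop, w1, h1):
    """Crop the video image frame to the cropping frame, then scale the
    display content back to the requested resolution w1*h1."""
    x0, y0, x1, y1 = (int(round(v)) for v in crop)
    display_content = frame[y0:y1, x0:x1]
    return cv2.resize(display_content, (w1, h1),
                      interpolation=cv2.INTER_LINEAR)
```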
  • Step S316: displaying the video image frame (an example of the third image frame) in the application program.
  • the video image frame that has been cropped and scaled by the ISP is transmitted to the application program, and the video image frame is displayed in the application program.
  • the resolution of the video image frame after the ISP's cropping and scaling processing is the resolution requested in step S303; the video image frame processed by the ISP is transmitted to the application program, and the video image frame is displayed at a size suitable for the display specifications of the electronic device according to the resolution of the display screen.
  • in the embodiment of the present application, the video image frame of the target object is obtained after the target object moves, face detection is performed on the video image frame to determine the coordinate information of the face frame of the target object, and the coordinate information of the cropping frame is obtained according to the coordinate information of the face frame; the video image frame is then cropped according to the cropping frame to obtain the display content; since the coordinate information of the cropping frame is determined from the coordinate information of the face frame, the video processing method of the present application can, compared with a scheme that directly detects the key points of the target object's human body to determine the cropping frame, reduce the amount of computation of the electronic device and reduce its power consumption; in addition, since the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, it can avoid video tracking and display of the target object when the target object faces away from the electronic device in the video image frame; therefore, the solution of the present application can also improve the accuracy of video tracking and display while reducing power consumption.
  • the electronic device is illustrated as a tablet device;
  • FIG. 6 shows a graphical user interface (graphical user interface, GUI) of the tablet device, and the GUI is a desktop 410 of the tablet device; the desktop 410 may include a camera An icon 411 of the application program and an icon 412 of the video application program.
  • a single user may be included in the video preview screen, and the video screen will automatically track this user at this time.
  • FIG. 7 is a display interface of a user using a tablet device to conduct a video call; as shown in FIG. 7 , the display interface may include a video call interface, a control for indicating cancellation, and a control for indicating switching to voice.
  • the camera of the tablet device collects a preview image with a fixed field of view and displays the display interface as shown in Figure 7; after the other party answers the video call, the display interface as shown in Figure 8 can be displayed.
  • the electronic device may enable the function of "moving with the shadow"; the preview image collected by the camera is cropped and scaled through the video processing method provided by the embodiment of the present application and processed into a video image suitable for the display specifications of the tablet device for display.
  • the video processing method provided in the embodiment of the present application will be executed.
  • it should be understood that the processing shown in FIG. 9 is executed by a processor inside the tablet device or a chip configured in the tablet device, and the processing process will not be displayed on the display interface.
  • step S306 shown in FIG. 5 above may obtain the rectangular frame 430 as shown in FIG. 9 ; in step S310, the rectangular frame 430 is converted into the rectangular frame 440, which represents the face frame after coordinate transformation.
  • the coordinate information of the four vertices of the rectangular frame 430 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, and then the position information of the rectangular frame 440 in the w2*h2 resolution image is determined.
  • in step S311, the coordinate information of the rectangular frame 450 can be obtained by performing boundary extension processing according to the rectangular frame 440, and the rectangular frame 450 represents the single user's body frame.
  • in step S312 as shown in FIG. 9 , the coordinate information of the rectangular frame 460 can be obtained by performing boundary extension processing according to the rectangular frame 450, and the rectangular frame 460 represents the single user's cropping frame.
  • if the rectangular frame 450 and the rectangular frame 460 meet the preset conditions, the parameters of the cropping processing and the scaling processing are determined according to the coordinate information of the rectangular frame 460 and the coordinate information of the video image frame; the video image frame is cropped and scaled to obtain an output video image frame suitable for the display specifications of the tablet device.
  • for example, the cropped display content can be obtained according to the cropping frame 460; the display content can be scaled according to the requested resolution to obtain the processed video image frame; the processed video image frame is sent to the video call application program, and the video image frame suitable for the display specifications of the tablet device is obtained according to the resolution of the display screen of the tablet device.
  • multiple users may be included in the video preview screen, and the video screen may be automatically adjusted according to the positions of all users to ensure that all users are displayed in the video screen.
  • Fig. 10 is a display interface of a user using a tablet device for a video call; as shown in Fig. 10 , the display interface may include a video call interface, a control for indicating cancellation, and a control for indicating switching to voice.
  • the tablet device may display a display interface as shown in FIG. 11 .
  • The process of obtaining the video image frame shown in FIG. 11 will be described in detail with reference to FIG. 12 . It should be understood that the processing shown in FIG. 12 is executed by a processor inside the tablet device or a chip configured in the tablet device, and the processing process will not be displayed on the display interface.
  • step S306 shown in FIG. 5 can determine, according to the coordinate information of each user's face frame as shown in FIG. 12 , the coordinate information of the minimum union frame including all users' face frames, that is, the multi-user face frame shown as the rectangular frame 510.
  • Step S310 may transform the rectangular frame 510 into the rectangular frame 520 as shown in FIG. 12 , and the rectangular frame 520 represents the face frame after coordinate transformation. For example, the coordinate information of the four vertices of the rectangular frame 510 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, and the position information of the rectangular frame 520 in the w2*h2 resolution image is then determined.
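The minimum union frame over all detected face frames (e.g., the rectangular frame 510) is simply the smallest rectangle enclosing every face frame:

```python
def union_box(face_boxes):
    """Smallest rectangle enclosing every detected face frame, i.e. the
    minimum union frame used as the multi-user face frame."""
    xs0, ys0, xs1, ys1 = zip(*face_boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))

# union_box([(100, 80, 180, 170), (400, 90, 470, 180)])
# -> (100, 80, 470, 180)
```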
  • in step S311, the coordinate information of the rectangular frame 530 can be obtained by performing boundary extension processing according to the rectangular frame 520, and the rectangular frame 530 represents the body frames of the multiple users.
  • in step S312 as shown in FIG. 12 , boundary extension processing is performed according to the rectangular frame 530 to obtain the coordinate information of the rectangular frame 540, and the rectangular frame 540 represents the cropping frame of the multiple users.
  • if the rectangular frame 530 and the rectangular frame 540 meet the preset conditions, the parameters of the cropping processing and the scaling processing are determined according to the coordinate information of the rectangular frame 540 and the coordinate information of the video image frame; the video image frame is cropped and scaled to obtain an output video image frame suitable for the display specifications of the tablet device.
  • for example, the cropped display content can be obtained according to the cropping frame 540; the display content can be scaled according to the requested resolution to obtain the processed video image frame; the processed video image frame is sent to the video call application program, and the video image frame suitable for the display specifications of the tablet device is obtained according to the resolution of the display screen of the tablet device.
  • compared with the single-user automatic mirror movement, the multi-user automatic mirror movement determines the coordinate information of the multi-user face frame according to the minimum union of the face frames of all users; the rest of the steps are the same as in the single-user automatic mirror movement, and reference may be made to the relevant description of the single-user automatic mirror movement, which will not be repeated here.
  • the shooting scene corresponding to the video call interface 503 may include a first shooting object 504, a second shooting object 505, and a third shooting object 506;
  • the first shooting object 504 and the second shooting object 505 are users whose faces face the camera, and the third shooting object 506 is a user whose back faces the camera; therefore, when performing face detection, the video processing method provided in the embodiment of the present application cannot detect the coordinate information of the face frame of the third shooting object 506;
  • during user tracking, the first shooting object 504 and the second shooting object 505 can be tracked and displayed, while the third shooting object 506 cannot be tracked and displayed; that is, after the first shooting object 504 and the second shooting object 505 move, they can be tracked and displayed so that they always remain in the middle area of the video display screen; for example, the display interface shown in Figure 14.
  • the owner tracking mode can be turned on; when multiple users are included in the video preview screen, face detection and face recognition are performed on the multiple users to determine which target user is the owner user, and the video screen automatically tracks the owner user.
  • FIG. 15 is a settings display interface for video calls; as shown in FIG. 15 , the main character mode can be turned on in the settings display interface 601 , and the main character mode may refer to turning on the owner identification as shown in FIG. 5 .
  • FIG. 16 is a display interface of a user using a tablet device for a video call; the display interface may include a video call interface 602, and the video call interface 602 may include the first shooting object, a control for indicating cancellation, and a control for indicating switching to voice.
  • a display interface as shown in Figure 17 can be displayed.
  • the electronic device turns on the function of "moving with the shadow", and the video processing method provided by the embodiment of the present application performs cropping and scaling processing on the preview image captured by the camera and processes it into a video image suitable for the display specifications of the tablet device.
  • the video processing method provided in the embodiment of the present application will be executed.
  • The process of obtaining the video image frame shown in FIG. 17 will be described in detail with reference to FIG. 18 . It should be understood that the processing shown in FIG. 18 is executed by a processor inside the tablet device or a chip configured in the tablet device, and the processing process will not be displayed on the display interface.
  • step S306 shown in FIG. 5 may acquire the coordinate information of the rectangular frame 710 as shown in FIG. 18 .
  • Step S310 may transform the rectangular frame 710 into the rectangular frame 720 as shown in FIG. 18 , and the rectangular frame 720 represents the face frame of the owner user after coordinate transformation.
  • the coordinate information of the four vertices of the rectangular frame 710 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, and then the position information of the rectangular frame 720 in the w2*h2 resolution image is determined.
  • in step S311 as shown in FIG. 18 , the coordinate information of the rectangular frame 730 can be obtained by performing boundary extension processing according to the rectangular frame 720 , and the rectangular frame 730 represents the body frame of the owner user.
  • in step S312, the coordinate information of the rectangular frame 740 can be obtained by performing boundary extension processing according to the rectangular frame 730 , and the rectangular frame 740 represents the cropping frame of the owner user.
  • if the rectangular frame 730 and the rectangular frame 740 meet the preset conditions, the parameters of the cropping processing and the scaling processing can be determined according to the coordinate information of the rectangular frame 740 and the coordinate information of the video image frame;
  • the video image frame is clipped and scaled to obtain an output video image frame suitable for the display specification of the tablet device.
  • for example, the cropped display content can be obtained according to the cropping frame 740; the display content can be scaled according to the issued requested resolution to obtain the processed video image frame; the processed video image frame is sent to the video call application program, and the video image frame suitable for the display specifications of the tablet device is obtained according to the resolution of the display screen.
  • compared with the single-user automatic mirror movement, the owner-user automatic mirror movement performs face recognition on each user's face frame after determining the coordinate information of the face frames of the multiple users, so as to determine the coordinate information of the face frame of the owner user; the rest of the steps are the same as in the single-user automatic mirror movement, and reference may be made to the relevant description of the single-user automatic mirror movement, which will not be repeated here.
  • in the embodiment of the present application, the coordinate information of the face frame of the target object is determined, and the coordinate information of the cropping frame is obtained according to the coordinate information of the face frame;
  • the image frame is processed accordingly to display the output video image frame; since the coordinate information of the cropping frame is determined by the coordinate information of the face frame, the video processing method of the present application can, compared with a scheme that directly detects the key points of the target object's human body to determine the cropping frame, reduce the computation load of the electronic device and reduce its power consumption;
  • in addition, since the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, it can avoid video tracking and display of the target object when the target object faces away from the electronic device in the second image frame; therefore, the solution of the present application can also improve the accuracy of video tracking and display while reducing power consumption.
  • the video processing method provided by the embodiment of the present application is described in detail above with reference to FIG. 1 to FIG. 18 ; the device embodiment of the present application will be described in detail below in conjunction with FIG. 19 and FIG. 20 . It should be understood that the devices in the embodiments of the present application can execute the various methods in the foregoing embodiments of the present application, that is, the specific working processes of the following various products can refer to the corresponding processes in the foregoing method embodiments.
  • FIG. 19 is a schematic structural diagram of a video processing device provided by an embodiment of the present application.
  • the video processing device 800 includes a display unit 810 and a processing unit 820 .
  • the display unit 810 is configured to display a first image frame, and the first image frame is an image frame of the target object at the first position;
  • the processing unit 820 is configured to acquire a second image frame when the target object moves to the second position; the second position is different from the first position, and the second image frame refers to an image frame collected by the electronic device when the target object moves to the second position;
  • the display unit 810 is further configured to display a third image frame according to the display content, wherein there is an intersection between the first area in the first image frame and the second area in the third image frame; the first area refers to the area where the target object is located in the first image frame, and the second area refers to the area where the target object is located in the third image frame.
  • when the first image frame and the third image frame are displayed, the video processing device is located at the same position.
  • processing unit 820 is further configured to:
  • an operation indicating to run the camera application program is detected; or, an operation indicating to run the video call application program is detected.
  • processing unit 820 is specifically configured to:
  • the first expansion process refers to expanding the boundary of the first detection frame with the first detection frame as the center
  • the second detection frame is used to indicate the position information of the body of the target object in the second image frame.
  • the second extension process refers to extending the boundary of the second detection frame with the second detection frame as the center.
  • processing unit 820 is specifically configured to:
  • the first expansion process is performed on the first detection frame according to a first threshold to obtain the second detection frame, and the first threshold is used to indicate body proportion data.
  • processing unit 820 is specifically configured to:
  • the preset condition means that the second detection frame and the cropping frame satisfy a preset proportional relationship
  • the second image frame is cropped according to the cropping frame to obtain the display content.
  • the coordinate information of the first detection frame refers to the coordinate information corresponding to the first detection frame when the second image frame is of the second resolution
  • processing unit 820 is specifically configured to:
  • the display unit 810 is used for:
  • the target object is the owner user
  • the processing unit 820 is specifically configured to:
  • the owner identification instruction is used to instruct to identify the owner user
  • Face recognition is performed according to the first detection frame to determine the owner user, and the owner user is a pre-configured user.
  • the first detection frame refers to a face frame of the owner user of the device.
  • the target object includes at least one user.
  • the target object includes a first user and a second user
  • the first detection frame refers to the union frame of the face frame of the first user and the face frame of the second user.
  • the first area coincides with the second area.
  • the video processing apparatus 800 is embodied in the form of functional units.
  • the term “unit” here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit or a combination of both to realize the above functions.
  • the hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (such as a shared processor, a dedicated processor, or a group processor) and memory, combinational logic circuits, and/or other suitable components that support the described functions.
  • the units of each example described in the embodiments of the present application can be realized by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 20 shows a schematic structural diagram of an electronic device provided by the present application.
  • the dotted line in FIG. 20 indicates that this unit or this module is optional, and the electronic device 900 can be used to implement the video processing method described in the foregoing method embodiments.
  • the electronic device 900 includes one or more processors 901, and the one or more processors 901 can support the electronic device 900 to implement the method in the method embodiment.
  • the processor 901 may be a general purpose processor or a special purpose processor.
  • the processor 901 may be a central processing unit (central processing unit, CPU), a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices such as discrete gates, transistor logic devices, or discrete hardware components.
  • the processor 901 can be used to control the electronic device 900, execute software programs, and process data of the software programs.
  • the electronic device 900 may further include a communication unit 905, configured to implement input (reception) and output (send) of signals.
  • the electronic device 900 can be a chip, and the communication unit 905 can be an input and/or output circuit of the chip, or the communication unit 905 can be a communication interface of the chip, and the chip can be used as a component of a terminal device or other electronic devices .
  • the electronic device 900 may be a terminal device, and the communication unit 905 may be a transceiver of the terminal device, or the communication unit 905 may be a transceiver circuit of the terminal device.
  • the electronic device 900 may include one or more memories 902, on which a program 904 is stored; the program 904 may be run by the processor 901 to generate an instruction 903, so that the processor 901 executes the video processing method described in the above method embodiments according to the instruction 903.
  • data may also be stored in the memory 902 .
  • the processor 901 may also read data stored in the memory 902, the data may be stored in the same storage address as the program 904, and the data may also be stored in a different storage address from the program 904.
  • the processor 901 and the memory 902 may be set separately, or may be integrated together; for example, integrated on a system-on-chip (system on chip, SOC) of a terminal device.
  • the memory 902 can be used to store the related program 904 of the video processing method provided in the embodiment of the present application;
  • the processor 901 can be used to call the related program 904 of the video processing method stored in the memory 902 during video processing and execute the video processing method of the embodiment of the present application; for example: displaying a first image frame, where the first image frame is an image frame of a target object at a first position; when the target object moves to a second position, acquiring a second image frame, where the second position is different from the first position and the second image frame refers to the image frame collected by the electronic device when the target object moves to the second position; performing face detection according to the second image frame to obtain the coordinate information of the first detection frame, where the first detection frame is used to indicate the position information of the face of the target object in the second image frame; obtaining the coordinate information of the cropping frame according to the first detection frame; cropping the second image frame according to the cropping frame to obtain the display content including the target object; and displaying the third image frame according to the display content.
  • the present application also provides a computer program product, which implements the video processing method described in any method embodiment in the present application when the computer program product is executed by the processor 901 .
  • the computer program product can be stored in the memory 902 , such as a program 904 , and the program 904 is finally converted into an executable object file executable by the processor 901 through processes such as preprocessing, compiling, assembling and linking.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the video processing method described in any method embodiment in the present application is implemented.
  • the computer program may be a high-level language program or an executable object program.
  • the computer-readable storage medium is, for example, the memory 902 .
  • the memory 902 may be a volatile memory or a nonvolatile memory, or, the memory 902 may include both a volatile memory and a nonvolatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • by way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the disclosed systems, devices and methods may be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not implemented.
  • the device embodiments described above are only illustrative, and the division of units is only a logical function division. In actual implementation, there may be other division methods, and multiple units or components may be combined or integrated into another system.
  • the coupling between the various units or the coupling between the various components may be direct coupling or indirect coupling, and the above coupling includes electrical, mechanical or other forms of connection.
  • the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the terms "system" and "network" are often used herein interchangeably.
  • the term "and/or" in this document merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone.
  • the character "/" in this article generally indicates that the contextual objects are an "or” relationship.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Studio Devices (AREA)

Abstract

Provided in the present application are a video processing method, and an electronic device. The video processing method is applied to an electronic device, and comprises: displaying a first image frame, wherein the first image frame is an image frame of a target object at a first position; when the target object moves to a second position, acquiring a second image frame, wherein the second position is a different position from the first position, and the second image frame refers to an image frame which is collected by an electronic device when the target object moves to the second position; performing facial detection according to the second image frame, so as to obtain coordinate information of a first detection box; obtaining coordinate information of a cropping box according to the first detection box; cropping the second image frame according to the cropping box, so as to obtain display content comprising the target object; and displaying a third image frame according to the display content, wherein there is an intersection between a first area in the first image frame and a second area in the third image frame. On the basis of the technical solution of the present application, where the power consumption is reduced, the accuracy of video processing can be improved.

Description

Video processing method and electronic device
This application claims priority to the Chinese patent application No. 202111016638.0, filed with the State Intellectual Property Office on August 31, 2021 and entitled "Video Processing Method and Electronic Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and in particular to a video processing method and an electronic device.
Background
With the rapid development of image technology, users' demand for video recording functions keeps increasing, for example, recording video through a camera application, recording video during a video call, and recording video in a surveillance scenario. Taking a video call as an example, a user can use an electronic device to shoot a video call; if the viewing range of the electronic device is not adjusted when the subject moves, the position of the subject in the video screen will drift toward the edge of the screen, resulting in a poor sensory experience of the video display. In order to obtain a better video experience and ensure that the subject always remains in the middle of the video display screen, the function of "moving with people" is proposed.
At present, user tracking in the video display screen is usually realized by performing human body detection on the user, so as to realize the function of "following people's movement"; however, human body detection involves a large amount of computation, which places high performance requirements on the electronic device, and human body detection of the user is also prone to misidentification.
Therefore, how to accurately process video and realize the "moving with people" function while reducing power consumption has become an urgent problem to be solved.
Summary of the Invention
The present application provides a video processing method and an electronic device. The video processing method of the embodiments of the present application can accurately process video while reducing power consumption and improve the accuracy of the "moving with people" function.
In a first aspect, a video processing method is provided, and the video processing method is applied to an electronic device, including:
displaying a first image frame, the first image frame being an image frame of a target object at a first position; when the target object moves to a second position, acquiring a second image frame, the second position being different from the first position, the second image frame referring to an image frame collected by the electronic device when the target object moves to the second position; performing face detection according to the second image frame to obtain coordinate information of a first detection frame, the first detection frame being used to indicate position information of the face of the target object in the second image frame; obtaining coordinate information of a cropping frame according to the first detection frame; cropping the second image frame according to the cropping frame to obtain display content including the target object; and displaying a third image frame according to the display content, wherein there is an intersection between a first area in the first image frame and a second area in the third image frame, the first area referring to the area where the target object is located in the first image frame, and the second area referring to the area where the target object is located in the third image frame.
It should be understood that the second image frame may refer to an image frame captured by the camera in real time after the target object moves; the target object may refer to part or all of the shooting objects; for example, in the case of receiving an owner identification instruction, the target object may refer to the owner user; in the case of not receiving an instruction to enable owner identification, the target object may refer to all shooting objects.
It should also be understood that the intersection between the first area of the first image and the second area of the third image may mean that the first area and the second area completely overlap; or, it may also mean that there is a partial intersection between the first area and the second area.
In a possible implementation, before the target object moves, the electronic device displays the first image frame of the target object; after the target object moves, the camera of the electronic device can collect the second image frame in real time; face detection is performed on the second image frame to obtain the coordinate information of the face frame corresponding to the target object; the coordinate information of the cropping frame is obtained according to the coordinate information of the face frame; the second image frame is cropped according to the cropping frame to obtain the display content including the target object; and the third image frame is displayed according to the display content; the third image frame refers to the image frame of the target object displayed by the electronic device after the target object moves.
In the embodiments of the present application, face detection is performed on the acquired second image frame to determine the coordinate information of the face frame of the target object, and the coordinate information of the cropping frame is obtained according to the coordinate information of the face frame; the second image frame is then cropped according to the cropping frame to obtain the display content including the target object, and the third image frame is displayed according to the display content; since the coordinate information of the cropping frame is determined by the coordinate information of the face frame, compared with a scheme that directly detects the key points of the target object's human body to determine the cropping frame, the video processing method of the present application can reduce the amount of computation of the electronic device and reduce its power consumption; in addition, since the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, it can avoid video tracking and display of the target object when the target object faces away from the electronic device in the second image frame; therefore, the solution of the present application can also improve the accuracy of video tracking and display while reducing power consumption.
With reference to the first aspect, in some implementations of the first aspect, the first area coincides with the second area.
In a possible implementation, the first area coincides with the second area, and the first area and the second area are located in the middle area of the display interface.
With reference to the first aspect, in some implementations of the first aspect, when the first image frame and the third image frame are displayed, the electronic device is located at the same position.
In the embodiments of the present application, the electronic device can keep its position unchanged; after the photographed target object moves, it can always be displayed in the middle position or middle area of the video display screen, realizing the tracking display of the target object, that is, the "moving with the shadow" function.
With reference to the first aspect, in some implementations of the first aspect, the method further includes:
detecting an operation indicating to run a camera application program; or,
detecting an operation indicating to run a video call application program.
In the embodiments of the present application, the video processing method may be applied in the process of shooting video with a camera application program; or, the video processing method may also be applied in a video call application program.
With reference to the first aspect, in some implementations of the first aspect, obtaining the coordinate information of the cropping frame according to the first detection frame includes:
performing a first extension process on the first detection frame to obtain a second detection frame;
performing a second extension process on the second detection frame to obtain the cropping frame;
wherein the first extension process refers to extending the boundary of the first detection frame with the first detection frame as the center, the second detection frame is used to indicate the position information of the body of the target object in the second image frame, and the second extension process refers to extending the boundary of the second detection frame with the second detection frame as the center.
In the embodiments of the present application, in order to avoid local jitter of the first detection frame across multiple image frames of the video and to ensure that the cropping frame can remain unchanged when the target object makes small movements, the cropping frame is obtained by performing the second extension process on the second detection frame, which can ensure the stability of the cropped image frames to a certain extent.
With reference to the first aspect, in some implementations of the first aspect, performing the first extension process on the first detection frame to obtain the second detection frame includes:
performing the first extension process on the first detection frame according to a first threshold to obtain the second detection frame, the first threshold being used to indicate body proportion data.
With reference to the first aspect, in some implementations of the first aspect, cropping the second image frame according to the cropping frame to obtain the display content including the target object includes:
determining whether the second detection frame and the cropping frame satisfy a preset condition, the preset condition meaning that the second detection frame and the cropping frame satisfy a preset proportional relationship;
when the second detection frame and the cropping frame satisfy the preset condition, cropping the second image frame according to the cropping frame to obtain the display content.
在一种可能的实现方式中,预设条件可以是指第二检测框与裁剪框满足一定的比例关系,并且第二检测框位于裁剪框的内部。In a possible implementation manner, the preset condition may refer to that the second detection frame and the cropping frame satisfy a certain proportional relationship, and the second detection frame is located inside the cropping frame.
结合第一方面,在第一方面的某些实现方式中,所述第一检测框的坐标信息是指在所述第二图像帧为第二分辨率时所述第一检测框对应的坐标信息,还包括:With reference to the first aspect, in some implementations of the first aspect, the coordinate information of the first detection frame refers to the coordinate information corresponding to the first detection frame when the second image frame is at the second resolution ,Also includes:
接收请求指令,所述请求指令用于请求第一分辨率;receiving a request instruction, where the request instruction is used to request a first resolution;
根据所述第一分辨率确定所述第二分辨率,所述第二分辨率大于所述第一分辨率。The second resolution is determined according to the first resolution, and the second resolution is greater than the first resolution.
在本申请的实施例中,可以将分辨率由第一分辨率扩展至第二分辨率,能够解决后续裁剪处理导致的第二图像帧清晰度下降的问题;通过进行分辨率扩展处理,能够在一定程度上使得剪裁处理后显示的第三图像帧的清晰度得到提高。In the embodiment of the present application, the resolution can be extended from the first resolution to the second resolution, which can solve the problem of lowering the definition of the second image frame caused by the subsequent cropping process; by performing the resolution expansion process, it is possible to To a certain extent, the clarity of the third image frame displayed after the trimming process is improved.
在一种可能的实现方式中,接收请求第一分辨率的请求指令;对第一分辨率进行扩展处理,确定第二分辨率;在第二图像帧中检测到第一检测框的坐标信息;将第一检测框的坐标信息转换至第二图像帧为第二分辨率时对应的坐标信息。In a possible implementation manner, a request instruction requesting the first resolution is received; the first resolution is expanded to determine the second resolution; the coordinate information of the first detection frame is detected in the second image frame; The coordinate information of the first detection frame is converted to the corresponding coordinate information when the second image frame is of the second resolution.
With reference to the first aspect, in some implementations of the first aspect, displaying the third image frame according to the display content includes:

performing scaling processing on the display content according to the first resolution to obtain processed display content;

displaying the third image frame according to the processed display content.
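A minimal sketch of this scaling step is given below (Python, using OpenCV purely for illustration; the interpolation choice and resolution value are assumptions):

```python
import cv2  # OpenCV is used here purely for illustration

def scale_to_first_resolution(display_content, first_res):
    """Scale the cropped display content to the resolution requested by
    the application (the first resolution), e.g. first_res = (1280, 720),
    yielding the third image frame to be displayed."""
    return cv2.resize(display_content, first_res,
                      interpolation=cv2.INTER_LINEAR)
```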
With reference to the first aspect, in some implementations of the first aspect, the target object is the owner user, and the method further includes:

receiving an owner identification instruction, where the owner identification instruction is used to instruct identification of the owner user;

performing face recognition according to the first detection frame to determine the owner user, where the owner user is a pre-configured user.

It should be understood that the owner may refer to the administrative user of the electronic device; alternatively, the owner may be any pre-configured user with a higher priority. Owner identification means that, during tracking display, the owner user among the target objects is identified through face detection, and tracking display is performed on the owner user.
With reference to the first aspect, in some implementations of the first aspect, the first detection frame refers to the face frame of the owner user.

With reference to the first aspect, in some implementations of the first aspect, the target object includes at least one user.

With reference to the first aspect, in some implementations of the first aspect, the target object includes a first user and a second user, and the first detection frame refers to the union frame of the face frame of the first user and the face frame of the second user.
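For illustration, the union frame of two face frames is simply the smallest axis-aligned box containing both (boxes assumed to be (x, y, w, h); a sketch, not the method's specified computation):

```python
def union_box(box_a, box_b):
    """Union frame of two face frames: the smallest axis-aligned box
    that contains both input boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x0 = min(ax, bx)
    y0 = min(ay, by)
    x1 = max(ax + aw, bx + bw)
    y1 = max(ay + ah, by + bh)
    return (x0, y0, x1 - x0, y1 - y0)
```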
In a second aspect, an electronic device is provided. The electronic device includes one or more processors, a memory, and a display screen. The memory is coupled to the one or more processors and is used to store computer program code, where the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform:

displaying a first image frame, where the first image frame is an image frame of a target object at a first position; when the target object moves to a second position, acquiring a second image frame, where the second position is different from the first position, and the second image frame refers to an image frame captured by the electronic device when the target object moves to the second position; performing face detection according to the second image frame to obtain coordinate information of a first detection frame, where the first detection frame is used to indicate position information of the face of the target object in the second image frame; and obtaining coordinate information of a cropping frame according to the first detection frame;

performing cropping processing on the second image frame according to the cropping frame to obtain display content including the target object; and displaying a third image frame according to the display content, where a first area in the first image frame and a second area in the third image frame have an intersection, the first area refers to the area where the target object is located in the first image frame, and the second area refers to the area where the target object is located in the third image frame.
With reference to the second aspect, in some implementations of the second aspect, the electronic device is at the same position when the first image frame is displayed and when the third image frame is displayed.

With reference to the second aspect, in some implementations of the second aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

detecting an operation indicating to run a camera application; or

detecting an operation indicating to run a video call application.

With reference to the second aspect, in some implementations of the second aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

performing first expansion processing on the first detection frame to obtain a second detection frame;

performing second expansion processing on the second detection frame to obtain the cropping frame;

where the first expansion processing refers to expanding the boundary of the first detection frame with the first detection frame as the center, the second detection frame is used to indicate position information of the body of the target object in the second image frame, and the second expansion processing refers to expanding the boundary of the second detection frame with the second detection frame as the center.
With reference to the second aspect, in some implementations of the second aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

performing the first expansion processing on the first detection frame according to a first threshold to obtain the second detection frame, where the first threshold is used to indicate body proportion data.

With reference to the second aspect, in some implementations of the second aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

determining whether the second detection frame and the cropping frame satisfy a preset condition, where the preset condition means that the second detection frame and the cropping frame satisfy a preset proportional relationship;

when the second detection frame and the cropping frame satisfy the preset condition, performing cropping processing on the second image frame according to the cropping frame to obtain the display content.

With reference to the second aspect, in some implementations of the second aspect, the coordinate information of the first detection frame refers to the coordinate information corresponding to the first detection frame when the second image frame is at a second resolution, and the one or more processors invoke the computer instructions to cause the electronic device to further perform:

receiving a request instruction, where the request instruction is used to request a first resolution;

determining the second resolution according to the first resolution, where the second resolution is greater than the first resolution.

With reference to the second aspect, in some implementations of the second aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

performing scaling processing on the display content according to the first resolution to obtain processed display content;

displaying the third image frame according to the processed display content.
With reference to the second aspect, in some implementations of the second aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

receiving an owner identification instruction, where the owner identification instruction is used to instruct identification of the owner user;

performing face recognition according to the first detection frame to determine the owner user, where the owner user is a pre-configured user.

With reference to the second aspect, in some implementations of the second aspect, the first detection frame refers to the face frame of the owner user.

With reference to the second aspect, in some implementations of the second aspect, the target object includes at least one user.

With reference to the second aspect, in some implementations of the second aspect, the target object includes a first user and a second user, and the first detection frame refers to the union frame of the face frame of the first user and the face frame of the second user.

With reference to the second aspect, in some implementations of the second aspect, the first area coincides with the second area.

It should be understood that the extensions, limitations, explanations, and descriptions of the relevant content in the first aspect above also apply to the same content in the second aspect.
In a third aspect, a video processing apparatus is provided, including units for performing any one of the video processing methods in the first aspect.

In a possible implementation, when the video processing apparatus is an electronic device, the processing unit may be a processor and the input unit may be a communication interface; the electronic device may further include a memory, where the memory is used to store computer program code, and when the processor executes the computer program code stored in the memory, the electronic device is caused to perform any one of the methods in the first aspect.

In a fourth aspect, a chip system is provided. The chip system is applied to an electronic device and includes one or more processors, where the processors are used to invoke computer instructions to cause the electronic device to perform any one of the video processing methods in the first aspect.

In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer program code, and when the computer program code is run by an electronic device, the electronic device is caused to perform any one of the video processing methods in the first aspect.

In a sixth aspect, a computer program product is provided. The computer program product includes computer program code, and when the computer program code is run by an electronic device, the electronic device is caused to perform any one of the video processing methods in the first aspect.
Description of the drawings

FIG. 1 is a schematic diagram of a hardware system applicable to an electronic device of the present application;

FIG. 2 is a schematic diagram of a software system applicable to the electronic device of the present application;

FIG. 3 is a schematic diagram of an application scenario applicable to the present application;

FIG. 4 is a schematic diagram of an intersection between the first area and the second area provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a video processing method provided by the present application;

FIG. 6 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 7 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 8 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 9 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 10 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 11 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 12 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 13 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 14 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 15 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 16 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 17 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 18 is a schematic diagram of a display interface of video processing provided by the present application;

FIG. 19 is a schematic structural diagram of a video processing apparatus provided by the present application;

FIG. 20 is a schematic structural diagram of an electronic device provided by the present application.
Detailed description of embodiments

The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 1 shows a hardware system applicable to the electronic device of the present application.

The electronic device 100 may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, a vehicle-mounted electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a projector, or the like; the embodiments of the present application do not impose any limitation on the specific type of the electronic device 100.

The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be noted that the structure shown in FIG. 1 does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than those shown in FIG. 1, may include a combination of some of the components shown in FIG. 1, or may include sub-components of some of the components shown in FIG. 1. The components shown in FIG. 1 may be implemented in hardware, in software, or in a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include at least one of the following processing units: an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU). Different processing units may be independent devices or integrated devices.

The controller may generate operation control signals according to instruction operation codes and timing signals, completing the control of instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. For example, the processor 110 may include at least one of the following interfaces: an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and a USB interface.

The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). The I2S interface can be used for audio communication. The PCM interface can also be used for audio communication, sampling, quantizing, and encoding analog signals. The UART interface is a universal serial data bus used for asynchronous communication; the bus may be a bidirectional communication bus, and it converts the data to be transmitted between serial communication and parallel communication. The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193, and includes a camera serial interface (CSI), a display serial interface (DSI), and the like.

In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface to implement the shooting function of the electronic device 100, and communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100. The GPIO interface can be configured by software, either as a control signal interface or as a data signal interface.

In some embodiments, the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, and the sensor module 180. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, or a MIPI interface.

The USB interface 130 is an interface conforming to the USB standard specification, and may be, for example, a Mini USB interface, a Micro USB interface, or a USB Type-C interface. The USB interface 130 can be used to connect a charger to charge the electronic device 100, to transmit data between the electronic device 100 and peripheral devices, or to connect an earphone so as to play audio through the earphone. The USB interface 130 can also be used to connect other electronic devices, such as AR devices.

The connection relationships between the modules shown in FIG. 1 are only schematic illustrations and do not constitute a limitation on the connection relationships between the modules of the electronic device 100. Optionally, the modules of the electronic device 100 may also adopt a combination of the various connection manners in the foregoing embodiments.

The charging management module 140 is used to receive power from a charger. While charging the battery 142, the charging management module 140 can also supply power to the electronic device 100 through the power management module 141. The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110; it receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (for example, leakage and impedance). Optionally, the power management module 141 may be provided in the processor 110, or the power management module 141 and the charging management module 140 may be provided in the same device.
The wireless communication function of the electronic device 100 may be implemented by components such as the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor. The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single communication frequency band or multiple communication frequency bands, and different antennas may also be multiplexed to improve antenna utilization.

The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 100, for example, at least one of the following: a second generation (2G) mobile communication solution, a third generation (3G) mobile communication solution, a fourth generation (4G) mobile communication solution, and a fifth generation (5G) mobile communication solution.

The modem processor may include a modulator and a demodulator. The modulator is used to modulate a to-be-sent low-frequency baseband signal into a medium- or high-frequency signal. The demodulator is used to demodulate a received electromagnetic wave signal into a low-frequency baseband signal, and then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs a sound signal through an audio device (for example, the speaker 170A or the receiver 170B) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device; in other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or another functional module.

Similar to the mobile communication module 150, the wireless communication module 160 may also provide wireless communication solutions applied to the electronic device 100, for example, at least one of the following: wireless local area network (WLAN), Bluetooth (BT), Bluetooth low energy (BLE), ultra wide band (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies.

In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 of the electronic device 100 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other electronic devices through wireless communication technologies.

The electronic device 100 may implement the display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 may be used to display images or videos and includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini light-emitting diode (Mini LED), a micro light-emitting diode (Micro LED), a Micro OLED, or quantum dot light emitting diodes (QLED). In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.

The electronic device 100 may implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.

The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light is transmitted through the lens to the photosensitive element of the camera, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can perform algorithm optimization on the noise, brightness, and color of the image, and can also optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or videos. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and then transmits the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as red green blue (RGB) or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
Exemplarily, in the embodiments of the present application, the camera 193 may acquire video image frames, where a video image frame may refer to a captured full-size image frame. The camera 193 may transmit the acquired video image frames to the ISP, and the ISP is used to process the video image frames acquired by the camera 193. For example, the ISP may obtain the target resolution and the parameters of the cropping processing and the scaling processing from the processor 110; the ISP may adjust the full-size video image frame to the target resolution according to the target resolution, and perform cropping processing and scaling processing on the video image frame at the target resolution according to those parameters, obtaining a processed video image frame that meets the resolution requested by the application program. The processed video image frame is transmitted to the application program, and the display screen 194 displays the processed video image frame.
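A minimal sketch of this frame path is given below, with OpenCV standing in for the hardware ISP and all parameter names assumed; the real processing happens in the ISP hardware, not in application code:

```python
import cv2  # stands in for the hardware ISP path; illustrative only

def isp_process(full_frame, target_res, crop_box, requested_res):
    """Adjust the full-size frame to the target resolution, crop it
    using the parameters computed by the processor, then scale the
    result to the resolution requested by the application."""
    frame = cv2.resize(full_frame, target_res)   # full size -> target resolution
    x, y, w, h = [int(v) for v in crop_box]
    cropped = frame[y:y + h, x:x + w]            # cropping processing
    return cv2.resize(cropped, requested_res)    # scaling processing
```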
Exemplarily, in the embodiments of the present application, the calculation of the target resolution of the video stream, face detection, and the calculation of the cropping and scaling parameters may be performed in the processor 110. It should be understood that the steps of determining parameters in the video processing method of the present application may be performed in the processor 110; the ISP is used to obtain the relevant parameters for processing the video image frames, and to process the video image frames according to those parameters to obtain output image frames that suit the display specifications of the display screen 194 of the electronic device.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.

The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The external memory interface 120 can be used to connect an external memory card, such as a secure digital (SD) card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos in the external memory card.

The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The internal memory 121 may include a program storage area and a data storage area.

The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone jack 170D, the application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal for output, and may also be used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals.

The speaker 170A, also called a loudspeaker, is used to convert an audio electrical signal into a sound signal. The electronic device 100 can play music or conduct a hands-free call through the speaker 170A. The receiver 170B, also called an earpiece, is used to convert an audio electrical signal into a sound signal.

In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, for example, resistive, inductive, or capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates with conductive material; when a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the touch operation through the pressure sensor 180A; the electronic device 100 may also calculate the touch position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.

The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (that is, the x-axis, the y-axis, and the z-axis) may be determined through the gyroscope sensor 180B. The gyroscope sensor 180B may be used for image stabilization during shooting: for example, when the shutter is pressed, the gyroscope sensor 180B detects the angle at which the electronic device 100 shakes, calculates from that angle the distance the lens module needs to compensate, and lets the lens counteract the shake of the electronic device 100 through reverse motion, thereby achieving image stabilization. The gyroscope sensor 180B may also be used in scenarios such as navigation and motion-sensing games.

The barometric pressure sensor 180C is used to measure air pressure. The magnetic sensor 180D includes a Hall sensor; the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip leather case.

The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally the x-axis, the y-axis, and the z-axis), and can detect the magnitude and direction of gravity when the electronic device 100 is stationary. The acceleration sensor 180E may also be used to identify the posture of the electronic device 100, serving as an input parameter for applications such as landscape/portrait switching and pedometers.

The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, for example in a shooting scene, the electronic device 100 may use the distance sensor 180F to measure distance so as to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The LED may be an infrared LED. The electronic device 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When reflected light is detected, the electronic device 100 may determine that there is an object nearby; when no reflected light is detected, the electronic device 100 may determine that there is no object nearby. The electronic device 100 may use the proximity light sensor 180G to detect whether the user is holding the electronic device 100 close to the ear for a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used for automatic unlocking and automatic screen locking in leather case mode or pocket mode.

The ambient light sensor 180L is used to sense the brightness of ambient light. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking photos, and can cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touches.

The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement functions such as unlocking, accessing an application lock, taking photos, and answering incoming calls.

The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing strategy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to prevent an abnormal shutdown of the electronic device 100 caused by low temperature. In still other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.

The touch sensor 180K is also called a touch device. The touch sensor 180K may be provided on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also called a touch-controlled screen. The touch sensor 180K is used to detect touch operations acting on or near it, and may transmit a detected touch operation to the application processor to determine the type of the touch event. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be provided on the surface of the electronic device 100 at a position different from that of the display screen 194.

The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human vocal part. The bone conduction sensor 180M can also contact the human pulse and receive the blood pressure beating signal.

The keys 190 include a power key and volume keys. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key input signals and implement functions related to the key input signals.

The motor 191 can generate vibration. The motor 191 can be used for incoming call alerts as well as for touch feedback. The motor 191 can generate different vibration feedback effects for touch operations acting on different applications, and can also generate different vibration feedback effects for touch operations acting on different areas of the display screen 194. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may correspond to different vibration feedback effects. The touch vibration feedback effects may also be customized.

The indicator 192 may be an indicator light, which may be used to indicate the charging status and battery level changes, and may also be used to indicate messages, missed calls, and notifications.

The SIM card interface 195 is used to connect a SIM card. A SIM card can be inserted into the SIM card interface 195 to come into contact with the electronic device 100, or pulled out of the SIM card interface 195 to be separated from the electronic device 100.
The hardware system of the electronic device 100 has been described in detail above; the software system of the electronic device 100 is introduced below. The software system may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present application take a layered architecture as an example to exemplarily describe the software system of the electronic device 100.

As shown in FIG. 2, a software system adopting a layered architecture is divided into several layers, each with a clear role and division of labor, and the layers communicate with each other through software interfaces. In some embodiments, the software system may be divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android Runtime and system libraries, and the kernel layer.

The application layer may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.

The video processing method of the embodiments of the present application may be applied to a camera application or a video application. For example, the "follow the person's movement" ("影随人动") function may be enabled in the settings of the electronic device, and after the electronic device detects an instruction from a video application requesting to open the camera, the function can be enabled; alternatively, the function may be enabled in the settings of the camera application, and after the electronic device detects an instruction from the camera application requesting to open the camera, the function can be enabled. For the "follow the person's movement" function, see the description of FIG. 3 below.

The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer, and may include some predefined functions.

For example, the application framework layer includes a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.

The window manager is used to manage window programs. The window manager can obtain the display screen size, determine whether there is a status bar, lock the screen, and capture the screen.

The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, and the phone book.

The view system includes visual controls, such as controls for displaying text and controls for displaying pictures, and can be used to build applications. A display interface may be composed of one or more views; for example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying pictures.

The phone manager is used to provide the communication functions of the electronic device 100, for example, management of the call status (connected or hung up).

The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.

The notification manager enables applications to display notification information in the status bar; it can be used to convey notification-type messages, which may disappear automatically after a short stay without user interaction.

The Android Runtime includes core libraries and a virtual machine, and is responsible for the scheduling and management of the Android system.

The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the Android core libraries.

The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.

The system libraries may include multiple functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example, the open graphics library for embedded systems (OpenGL ES)), and a 2D graphics engine (for example, the skia graphics library (SGL)).

The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.

The media libraries support the playback and recording of multiple audio and video formats, as well as still image files. The media libraries can support multiple audio and video encoding formats, for example: MPEG4, H.264, moving picture experts group audio layer III (MP3), advanced audio coding (AAC), adaptive multi-rate (AMR), joint photographic experts group (JPG), and portable network graphics (PNG).

The three-dimensional graphics processing library can be used to implement three-dimensional graphics drawing, image rendering, compositing, and layer processing.

The two-dimensional graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software, and may include driver modules such as a display driver, a camera driver, an audio driver, and a sensor driver.

The following exemplarily describes the workflow of the software system and the hardware system of the electronic device 100 in combination with a photographing display scenario.

When the user performs a touch operation on the touch sensor 180K, a corresponding hardware interrupt is sent to the kernel layer, and the kernel layer processes the touch operation into a raw input event, which includes information such as the touch coordinates and the timestamp of the touch operation. The raw input event is stored in the kernel layer; the application framework layer obtains the raw input event from the kernel layer, identifies the control corresponding to the raw input event, and notifies the application (APP) corresponding to that control. For example, if the touch operation is a click operation and the APP corresponding to the control is the camera APP, then after the camera APP is woken up by the click operation, it can call the camera driver of the kernel layer through the API and control the camera 193 to shoot through the camera driver.
FIG. 3 is a schematic diagram of an application scenario applicable to the present application, namely a scenario of the "follow the person's movement" function.
Exemplarily, the principle of "follow the person's movement" may be that the camera of the electronic device captures large-resolution frames over a fixed field of view, performs user detection and tracking on the captured video image frames, and locates the user's position in real time; when the user's position moves, the large-resolution video image frames are cropped and scaled according to the real-time user position to obtain a small-resolution image that fits the display specification and keeps the user in a specific region of the image. The display picture is thus adjusted in real time according to the user's position, achieving the "follow the person's movement" effect.
In one example, the electronic device is a tablet device. A display interface of the recording mode is shown in (a) of FIG. 3; the display interface may include a shooting interface 210, and the shooting interface 210 may include a viewfinder frame 211 and a control 212 for instructing video recording. Before it is detected that the user taps the control 212, a preview image may be displayed in the viewfinder frame 211.
When the operation of the user tapping the control 212 is detected, the tablet device can record video in response. When the first photographed subject is at a first position, the first image frame shown in (a) of FIG. 3 is displayed; during video recording, the first subject moves, for example from the first position to a second position, after which the third image frame shown in (b) of FIG. 3 is displayed. After the first subject moves, it remains in the middle of the viewfinder frame 211 at all times; this shooting function is the "follow the person's movement" function. In other words, after the tablet device enables the "follow the person's movement" function, the position of the tablet device can remain unchanged, and after the subject moves, the subject can always be displayed in the middle position, or middle area, of the video display picture.
In the embodiments of the present application, when the first subject is at the first position, the first subject is located in a first area of the first image frame; when the first subject moves to the second position, the first subject is located in a second area of the third image frame; an intersection exists between the first area and the second area.
In one example, the intersection between the first area and the second area may mean that the first area and the second area partially overlap, as shown in (a) and (b) of FIG. 4.
In one example, the intersection between the first area and the second area may mean that the first area and the second area completely overlap, as shown in (c) of FIG. 4.
Optionally, the first area and the second area may be located in the middle area of the display picture, with an intersection existing between them.
The scenario shown in FIG. 3 is described as an example; the video processing method provided in the embodiments of the present application can be applied to, but is not limited to, the following scenarios:
video calls, video conferencing applications, long- and short-form video applications, live video streaming applications, online video course applications, intelligent portrait camera-movement scenarios, video recording with the system camera's recording function, video surveillance, smart peepholes, and other portrait shooting scenarios.
At present, user tracking in the video display picture is usually realized by performing human body detection on the user, so as to realize the "follow the person's movement" function. Human body detection usually adopts a body detection-and-tracking algorithm, that is, key points of the user are detected; the key points may include but are not limited to the head, shoulders, arms, hands, legs, feet, eyes, nose, mouth and clothes. However, running a body detection-and-tracking algorithm on the user involves a large amount of computation, which imposes high performance requirements on the electronic device.
In view of this, the embodiments of the present application provide a video processing method. In the embodiments of the present application, a video image frame of a target object is acquired after the target object moves; face detection is performed on the video image frame to determine the coordinate information of the face frame of the target object, and the coordinate information of a cropping frame is obtained from the coordinate information of the face frame; the video image frame is then cropped according to the cropping frame to obtain the display content. Because the coordinate information of the cropping frame is obtained from the coordinate information of the face frame, compared with a scheme that determines the cropping frame by directly detecting the body key points of the target object, the video processing method of the present application can reduce the amount of computation and the power consumption of the electronic device. In addition, since the cropping frame is determined from the face frame, video tracking display of the target object can be avoided when the target object faces away from the electronic device in the video image frame; therefore, the solution of the present application can also improve the accuracy of video tracking display while reducing power consumption.
The video processing method provided by the embodiments of the present application is described in detail below with reference to FIG. 5 to FIG. 18.
The video processing method provided in the embodiments of the present application may be used in a video mode, where the video mode may mean that the electronic device performs video shooting, or that the electronic device conducts a video call.
In a possible implementation, the "follow the person's movement" function can be enabled in the settings interface of the electronic device; after an application used for video calls runs on the electronic device, the function can be enabled automatically to execute the video processing method of the embodiments of the present application.
In a possible implementation, the "follow the person's movement" function can be enabled in the camera of the electronic device; according to this setting, the function can be enabled when recording video, executing the video processing method of the embodiments of the present application.
FIG. 5 is a schematic flowchart of the video processing method provided by an embodiment of the present application. The video processing method 300 shown in FIG. 5 includes steps S301 to S316, which are described in detail below.
Step S301: request to turn on the camera.
For example, an application in the electronic device issues an instruction requesting to turn on the camera; the application may include but is not limited to: a WeChat video call application, a video conferencing application, a live video streaming application, a video recording application, a camera application, and so on.
In one example, when the camera application of the electronic device records video, it may request to turn on the camera.
For example, as shown in FIG. 6, the camera may be requested when the user taps the icon 411 of the camera application to shoot video.
In one example, when the WeChat video call application in the electronic device initiates or receives a video invitation, it may request to turn on the camera.
For example, as shown in FIG. 6, this may mean the camera is requested when the user taps the icon 412 of the video application to make a video call.
Step S302: the camera sensor detects the instruction requesting to turn on the camera, and the camera sensor acquires a video image frame (an example of the second image frame).
For example, the camera sensor may be the image sensor in the camera module; the video image frame may be an image frame acquired by the image sensor in real time as the user's position changes.
Exemplarily, the resolution of the video image frame acquired by the camera sensor may be full size.
For example, if the maximum resolution supported by the camera in the camera module is 4096*2160, then the resolution of the acquired full-size video image frame may be 4096*2160.
Step S303: the application issues a requested-resolution instruction.
Exemplarily, the application may issue a requested-resolution instruction requesting the video resolution w1*h1 (an example of the first resolution); the requested video resolution may be the resolution of the processed video image frames saved on the electronic device.
Step S304: calculate the target resolution of the video image frame (an example of the second resolution).
For example, the resolution requested by the application can be expanded to obtain the target resolution; for instance, the requested resolution w1*h1 can be expanded by a certain factor to a resolution w2*h2 (w2>w1, h2>h1), where w2*h2 may be the target resolution.
In the embodiments of the present application, expanding the resolution from w1*h1 to w2*h2 can solve the problem of reduced video image frame clarity caused by the subsequent cropping; the resolution expansion improves, to a certain extent, the clarity of the cropped video image frames.
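To make the expansion concrete, a minimal Python sketch follows. The expansion factor of 1.25 and the rounding to even dimensions are assumptions for illustration; the text only requires w2 > w1 and h2 > h1.

def expand_resolution(w1, h1, factor=1.25):
    """Expand the requested resolution w1*h1 to a larger target
    resolution w2*h2, leaving margin for the later cropping step.
    The factor 1.25 is an assumed value, not taken from the text."""
    w2 = int(w1 * factor) // 2 * 2  # keep dimensions even, as video pipelines usually require
    h2 = int(h1 * factor) // 2 * 2
    return w2, h2

# Example: a requested 1920*1080 expands to a 2400*1350 target.
print(expand_resolution(1920, 1080))  # (2400, 1350)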
Step S305: the ISP processes the video image frame according to the target resolution to obtain a video image frame at the target resolution.
Step S306: perform face detection on the video image frame to obtain the coordinate information of a face frame (an example of the first detection frame).
For example, an existing face detection algorithm may be used to perform face detection on the video image frames acquired by the camera sensor to obtain the coordinate information of the face frame.
In one example, processing a full-size video image frame involves a large amount of computation; therefore, to reduce the computation during video image frame processing, the full-size video image frame can be down-sampled. For example, the full-size video image frame is down-sampled to obtain a video image frame with a resolution of w3*h3 (w3<w1, h3<h1); face detection is performed on this w3*h3 frame to obtain the coordinate information of the face frame.
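As an illustration of this downsample-then-detect step, the following Python sketch uses OpenCV's Haar cascade detector as a stand-in for whatever face detection algorithm the device actually runs; the w3*h3 size of 640*360 is an assumed value.

import cv2

def detect_faces_downsampled(frame_full, w3=640, h3=360):
    """Down-sample a full-size frame to w3*h3 and run face detection
    on the smaller image to reduce computation. Returns face frames
    in w3*h3 coordinates as (x, y, w, h) tuples."""
    small = cv2.resize(frame_full, (w3, h3))
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # The Haar cascade is used here only as an illustrative detector.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)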
Step S307: determine whether owner identification is enabled; if owner identification is enabled, perform step S308; if owner identification is not enabled, perform step S310.
It should be understood that after owner identification is enabled, only the owner user in the video image frame may be tracked and displayed; when owner identification is not enabled, all users in the video image frame may be tracked and displayed. The owner may be the administrative user of the tablet device; alternatively, the owner may be any pre-configured user with higher priority.
Case 1: scenario with owner identification enabled
Step S308: perform face recognition according to the face frame.
Exemplarily, the image information in the face frame can be determined according to the coordinate information of the face frame, and face recognition is performed on that image information; when performing face recognition on the image information in the face frame, matching can be performed against a face information database pre-stored in the electronic device, so as to determine the user identity corresponding to the image information in the face frame.
In one example, when owner identification is enabled, the face information database includes the face information of the owner user, and the owner user can be determined by matching the image information in the face frame against the face information database.
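A sketch of this owner-matching logic is given below. The embedding function and the similarity threshold are assumptions; the text only requires that the image inside the detected face frame be matched against the pre-stored face information database.

import numpy as np

SIM_THRESHOLD = 0.6  # assumed decision threshold

def is_owner(face_crop, owner_embedding, embed):
    """Match the face inside the detected frame against the stored owner
    entry of the face information database. `embed` is a hypothetical
    feature extractor (any face-embedding model) returning a 1-D vector."""
    v = embed(face_crop)
    cosine = float(np.dot(v, owner_embedding) /
                   (np.linalg.norm(v) * np.linalg.norm(owner_embedding)))
    return cosine >= SIM_THRESHOLD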
It should be noted that the face detection in step S306 is used to detect the coordinate information of the face frame in the image, that is, to detect the face region in the image; face recognition is used to identify the user identity information corresponding to that face region.
Step S309: acquire the coordinate information of the face frame of the owner user.
For example, the owner user can be determined through step S308, and thus the coordinate information of the face frame corresponding to the owner user can be determined.
Exemplarily, the image frame shown in FIG. 18 may include a first user and a second user; if owner identification is enabled, the acquired coordinate information of the face frame may be the coordinate information of the face frame of the owner user 711, for example the coordinate information of the rectangular frame 710.
Step S310: perform coordinate conversion on the coordinate information of the face frame.
For example, the full-size video image frame is down-sampled to obtain a video image frame with a resolution of w3*h3; face detection is performed on the w3*h3 video image frame to obtain the coordinate information of the face frame of the owner user; the coordinate information of the owner user's face frame is then converted to the w2*h2 resolution coordinates, where w2>w3 and h2>h3.
Exemplarily, as shown in FIG. 18, the coordinate information of the four vertices of the rectangular frame 710 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, and the position information of the rectangular frame 720 in the w2*h2 resolution image is thereby determined.
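The coordinate conversion itself is a pure rescaling from the w3*h3 detection image to the w2*h2 image, as sketched below; the (x1, y1, x2, y2) box format is an assumption.

def convert_box(box, w3, h3, w2, h2):
    """Map a face frame detected on the w3*h3 image onto the w2*h2 image
    by scaling each vertex, as done for rectangular frame 710 -> 720."""
    x1, y1, x2, y2 = box
    sx, sy = w2 / w3, h2 / h3
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)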
Case 2: scenario with owner identification not enabled
When owner identification is not enabled, step S310 is performed after step S307; step S310: perform coordinate conversion on the coordinate information of the face frame detected in step S306.
In one example, as shown in FIG. 9, the video image frame includes a single user; the coordinate conversion of the face frame may mean converting the coordinate information of the four vertices of the rectangular frame 430 to obtain the corresponding vertex coordinate information at the w2*h2 resolution, thereby determining the position information of the rectangular frame 440 in the w2*h2 resolution image.
In one example, as shown in FIG. 12, the video image frame includes two users; the coordinate conversion of the face frame may mean converting the coordinate information of the four vertices of the rectangular frame 510 to obtain the corresponding vertex coordinate information at the w2*h2 resolution, thereby determining the position information of the rectangular frame 510 in the w2*h2 resolution image, that is, obtaining the rectangular frame 520.
Step S311: calculate the coordinate information of a body frame (an example of the second detection frame) from the coordinate information of the coordinate-converted face frame.
For example, the coordinate information of the coordinate-converted face frame can be determined according to step S310; boundary expansion (an example of the first expansion processing) is performed on the coordinate-converted face frame according to body proportion data to obtain the coordinate information of the body frame, where the body proportion data may be a preset value.
In one example, with the face rectangle as the center and taking the face rectangle as the reference, the upper boundary may be expanded outward by 0.5 times, the lower boundary by 1.0 times, and the left and right boundaries by 0.75 times each.
Exemplarily, as shown in FIG. 9, boundary expansion (an example of the first boundary expansion) may be performed on the single-user face frame shown as the rectangular frame 440 to obtain the coordinate information of the body frame shown as the rectangular frame 450.
Exemplarily, as shown in FIG. 12, boundary expansion (an example of the first boundary expansion) may be performed on the multi-user face frame shown as the rectangular frame 520 to obtain the coordinate information of the two-user body frame shown as the rectangular frame 530.
Step S312: calculate the coordinate information of the cropping frame from the coordinate information of the body frame.
For example, boundary expansion (an example of the second boundary expansion) may be performed on the body frame to obtain the coordinate information of the cropping frame.
In one example, with the body frame as the center and taking the body frame as the reference, the upper and lower boundaries may each be expanded outward by 0.025 times, and the left and right boundaries may each be expanded outward by 0.025 times, obtaining the cropping frame.
Exemplarily, as shown in FIG. 9, boundary expansion (an example of the second boundary expansion) may be performed on the single-user body frame shown as the rectangular frame 450 (an example of the second detection frame) to obtain the coordinate information of the cropping frame shown as the rectangular frame 460.
Exemplarily, as shown in FIG. 12, boundary expansion (an example of the second boundary expansion) may be performed on the multi-user body frame shown as the rectangular frame 530 (an example of the second detection frame) to obtain the coordinate information of the cropping frame shown as the rectangular frame 540.
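A sketch combining steps S311 and S312 follows, using the example ratios quoted above (0.5/1.0/0.75 for the body frame, 0.025 on every side for the cropping frame). Interpreting each ratio as a multiple of the reference frame's own height (top/bottom) and width (left/right) is an assumption, and clamping to the image bounds is omitted for brevity.

def expand(box, top, bottom, left, right):
    """Expand an (x1, y1, x2, y2) box outward by the given multiples of
    its own height (top/bottom) and width (left/right)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - left * w, y1 - top * h, x2 + right * w, y2 + bottom * h)

def face_to_crop(face_box):
    # Step S311: face frame -> body frame (example ratios from the text).
    body = expand(face_box, top=0.5, bottom=1.0, left=0.75, right=0.75)
    # Step S312: body frame -> cropping frame (0.025 on every side).
    crop = expand(body, top=0.025, bottom=0.025, left=0.025, right=0.025)
    return body, crop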
In the embodiments of the present application, since the face detection algorithm detects each image frame of the video individually, the output face frame may exhibit local jumps in the time domain. To avoid local jitter of the face frame across video image frames and to ensure that the cropping frame remains unchanged when the user makes small movements, obtaining the cropping frame by boundary expansion of the body frame can, to a certain extent, ensure the stability of the image frames after cropping.
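One simple way to realize this stability is hysteresis: keep the previous cropping frame as long as the newly detected body frame still fits inside it. This is an illustrative sketch of that idea, not necessarily the exact update policy used here.

def stabilized_crop(prev_crop, body_box, recompute):
    """Keep the previous cropping frame while the new body frame remains
    inside it; only recompute the crop when the subject's movement
    actually escapes the current frame."""
    if prev_crop is not None and contains(prev_crop, body_box):
        return prev_crop
    return recompute(body_box)

def contains(outer, inner):
    ox1, oy1, ox2, oy2 = outer
    ix1, iy1, ix2, iy2 = inner
    return ox1 <= ix1 and oy1 <= iy1 and ix2 <= ox2 and iy2 <= oy2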
Step S313: condition judgment for the cropping and scaling processing.
For example, it may be judged whether the body frame and cropping frame obtained above satisfy a preset condition, so as to determine whether to perform the subsequent cropping and scaling processing.
Exemplarily, the preset condition may mean that the body frame and the cropping frame satisfy a certain proportional relationship and that the body frame is located inside the cropping frame.
In a possible implementation, when the body frame and the cropping frame do not satisfy the preset condition, the above steps S306 to S312 are repeated to recalculate the coordinate information of the cropping frame.
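A sketch of the condition check in step S313 follows; since the exact proportional relationship is not specified in the text, the area-ratio bounds used here are assumptions.

def meets_preset_condition(body_box, crop_box, min_ratio=0.5, max_ratio=0.95):
    """Check that the body frame lies inside the cropping frame and that
    their areas satisfy an (assumed) proportional relationship."""
    bx1, by1, bx2, by2 = body_box
    cx1, cy1, cx2, cy2 = crop_box
    inside = cx1 <= bx1 and cy1 <= by1 and bx2 <= cx2 and by2 <= cy2
    ratio = ((bx2 - bx1) * (by2 - by1)) / ((cx2 - cx1) * (cy2 - cy1))
    return inside and min_ratio <= ratio <= max_ratio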
Step S314: parameter calculation for the cropping and scaling processing.
For example, based on the coordinate information of the cropping frame and the picture coordinate information of the video image frame, the parameters with which the ISP crops and scales the video image frames are calculated according to an adjustment strategy over N video image frames (for example, according to smoothness requirements), and the parameters are delivered to the ISP.
In one example, after the cropping frame is determined, if the user is far from the camera, that is, the user occupies a small display area in the picture, the cropping frame can be enlarged to a certain extent; for example, the enlargement can be centered on the cropping frame, up to at most 2 times the cropping frame size, before the video image frame is cropped.
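The center-anchored enlargement described here can be sketched as follows; only the 2x upper bound is taken from the text, and the clamping to the image bounds is an assumption.

def scale_crop_box(crop_box, scale, frame_w, frame_h):
    """Scale a cropping frame about its own center (at most 2x, per the
    text) and clamp the result to the image bounds."""
    scale = min(scale, 2.0)
    x1, y1, x2, y2 = crop_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(frame_w), cx + half_w), min(float(frame_h), cy + half_h))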
Step S315: the ISP receives the parameters of the cropping and scaling processing, and the ISP crops and scales the video image frame.
For example, the ISP crops the video image frame according to the coordinate information of the cropping frame to obtain the display content; the display content can then be scaled according to the requested resolution, so that the processed video image frame satisfies the requested resolution.
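In code, this crop-then-scale stage corresponds to the following sketch, done on the CPU with OpenCV purely for illustration; on the device itself this work runs in the ISP hardware pipeline.

import cv2

def crop_and_scale(frame, crop_box, w1, h1):
    """Crop the frame to the cropping frame, then scale the result to the
    requested resolution w1*h1 from step S303."""
    x1, y1, x2, y2 = (int(v) for v in crop_box)
    display_content = frame[y1:y2, x1:x2]
    return cv2.resize(display_content, (w1, h1), interpolation=cv2.INTER_LINEAR)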
Step S316: display the video image frame (an example of the third image frame) in the application.
For example, the video image frame cropped and scaled by the ISP is transmitted to the application, and the video image frame is displayed in the application.
Exemplarily, the resolution of the video image frame after the ISP cropping and scaling processing is the resolution requested in step S303; the video image frame processed by the ISP is transmitted to the application, and a video image frame suited to the display specification of the electronic device is displayed according to the resolution of the electronic device's display screen.
In the embodiments of the present application, the video image frame of the target object is acquired after the target object moves; face detection is performed on the video image frame to determine the coordinate information of the face frame of the target object, and the coordinate information of the cropping frame is obtained from the coordinate information of the face frame; the video image frame is then cropped according to the cropping frame to obtain the display content. In the embodiments of the present application, since the coordinate information of the cropping frame is determined from the coordinate information of the face frame, compared with a scheme that determines the cropping frame by directly detecting the body key points of the target object, the video processing method of the present application can reduce the amount of computation and the power consumption of the electronic device. In addition, since the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, video tracking display of the target object can be avoided when the target object faces away from the electronic device in the video image frame; therefore, the solution of the present application can also improve the accuracy of video tracking display while reducing power consumption.
The following describes in detail, with reference to FIG. 6 to FIG. 18, the video processing process in the cases where the target object is a single user, multiple users without owner identification enabled, and multiple users with owner identification enabled.
Exemplarily, the electronic device is illustrated as a tablet device; FIG. 6 shows a graphical user interface (GUI) of the tablet device, the GUI being the desktop 410 of the tablet device; the desktop 410 may include an icon 411 of a camera application and an icon 412 of a video application.
Case 1: single-user automatic camera movement
In one example, the video preview picture may include a single user, and the video picture will automatically track this user.
FIG. 7 is a display interface of a user conducting a video call with a tablet device; as shown in FIG. 7, the display interface may include a video call interface 420, and the video call interface 420 may include a preview image of the first subject 421, a video call box, a control for indicating cancel, and a control for indicating a switch to voice. After the user initiates a video invitation to the other party through the tablet device, the camera of the tablet device captures a preview image over a fixed field of view and displays the display interface shown in FIG. 7; after the other party answers the video call, the display interface shown in FIG. 8 can be displayed.
It should be understood that in FIG. 7 and FIG. 8 the electronic device has enabled the "follow the person's movement" function, and the preview image captured by the camera is cropped and scaled through the video processing method provided by the embodiments of the present application into a video image suited to the display specification of the tablet device. When the camera is turned on in the tablet device, the video processing method provided in the embodiments of the present application is executed.
The process of obtaining the video image shown in FIG. 7 is described in detail with reference to FIG. 9.
It should be understood that the processing shown in FIG. 9 is executed by a processor inside the tablet device or a chip configured in the tablet device, and this processing is not displayed on the display interface.
Exemplarily, for the single-user scenario, step S306 shown in FIG. 5 above may obtain the rectangular frame 430 shown in FIG. 9, where the rectangular frame 430 represents the face frame. Step S310 may, as shown in FIG. 9, convert the rectangular frame 430 into the rectangular frame 440, where the rectangular frame 440 represents the coordinate-converted face frame; for example, the coordinate information of the four vertices of the rectangular frame 430 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, thereby determining the position information of the rectangular frame 440 in the w2*h2 resolution image. Step S311 may, as shown in FIG. 9, perform boundary expansion according to the rectangular frame 440 to obtain the coordinate information of the rectangular frame 450, where the rectangular frame 450 represents the single user's body frame. Step S312 may, as shown in FIG. 9, perform boundary expansion according to the rectangular frame 450 to obtain the coordinate information of the rectangular frame 460, where the rectangular frame 460 represents the single user's cropping frame.
Further, when the rectangular frame 450 and the rectangular frame 460 satisfy the preset condition, the parameters of the cropping and scaling processing are determined from the coordinate information of the rectangular frame 460 and the coordinate information of the video image frame; the video image frame is cropped and scaled according to these parameters to obtain an output video image frame suited to the display specification of the tablet device.
For example, as shown in FIG. 9, the cropped display content can be obtained according to the cropping frame 460; the display content can be scaled according to the issued requested resolution to obtain the processed video image frame; the processed video image frame is sent to the video call application, and a video image frame suited to the display specification of the tablet device is obtained according to the resolution of the tablet device's display screen.
It should be noted that for the above specific steps, refer to the relevant description of FIG. 5, which is not repeated here.
Case 2: multi-user automatic camera movement
In one example, the video preview picture may include multiple users, and the video picture can be automatically adjusted according to the positions of all users, ensuring that all users are displayed in the video picture.
FIG. 10 is a display interface of a user conducting a video call with a tablet device; as shown in FIG. 10, the display interface may include a video call interface 501, and the video call interface 501 may include a first subject and a second subject, a control for indicating cancel, and a control for indicating a switch to voice. During the video call, the tablet device can display the display interface shown in FIG. 11.
It should be understood that in FIG. 10 and FIG. 11 the electronic device has enabled the "follow the person's movement" function, and the preview image captured by the camera is cropped and scaled through the video processing method provided by the embodiments of the present application into a video image suited to the display specification of the tablet device. When the camera is turned on in the tablet device, the video processing method provided in the embodiments of the present application is executed.
The process of obtaining the video image frame shown in FIG. 11 is described in detail with reference to FIG. 12. It should be understood that the processing shown in FIG. 12 is executed by a processor inside the tablet device or a chip configured in the tablet device, and this processing is not displayed on the display interface.
Exemplarily, for the multi-user scenario without owner identification enabled, step S306 shown in FIG. 5 may, as shown in FIG. 12, determine from the coordinate information of each user's face frame the coordinate information of the minimal union frame containing all of the multi-user face frames, the multi-user face frame being, for example, the rectangular frame 510. Step S310 may, as shown in FIG. 12, convert the rectangular frame 510 into the rectangular frame 520, where the rectangular frame 520 represents the coordinate-converted face frame; for example, the coordinate information of the four vertices of the rectangular frame 510 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, thereby determining the position information of the rectangular frame 520 in the w2*h2 resolution image. Step S311 may, as shown in FIG. 12, perform boundary expansion according to the rectangular frame 520 to obtain the coordinate information of the rectangular frame 530, where the rectangular frame 530 represents the multi-user body frame. Step S312 may, as shown in FIG. 12, perform boundary expansion according to the rectangular frame 530 to obtain the coordinate information of the rectangular frame 540, where the rectangular frame 540 represents the multi-user cropping frame.
Further, when the rectangular frame 540 and the rectangular frame 550 satisfy the preset condition, the parameters of the cropping and scaling processing are determined from the coordinate information of the rectangular frame 550 and the coordinate information of the video image frame; the video image frame is cropped and scaled according to these parameters to obtain an output video image frame suited to the display specification of the tablet device.
For example, as shown in FIG. 12, the cropped display content can be obtained according to the cropping frame 540; the display content can be scaled according to the issued requested resolution to obtain the processed video image frame; the processed video image frame is sent to the video call application, and a video image frame suited to the display specification of the tablet device is obtained according to the resolution of the tablet device's display screen.
It should be noted that for the above specific steps, refer to the relevant description of FIG. 5, which is not repeated here.
It should be understood that, compared with single-user automatic camera movement, multi-user automatic camera movement determines the coordinate information of the multi-user face frame as the minimal union of the face frames of the individual users; the remaining steps are the same as single-user camera movement, for which reference may be made to the relevant description of single-user camera movement, not repeated here.
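The minimal union frame over all detected face frames can be sketched as follows, again assuming the (x1, y1, x2, y2) box format.

def union_box(boxes):
    """Smallest rectangle containing every user's face frame, as used in
    the multi-user case (e.g. rectangular frame 510)."""
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)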
In one example, the shooting scene corresponding to the video call interface 503 shown in FIG. 13 may include a first subject 504, a second subject 505 and a third subject 506, where the faces of the first subject 504 and the second subject 505 face the camera and the face of the third subject 506 faces away from the camera. Therefore, when performing face detection, the video processing method provided by the embodiments of the present application cannot detect the coordinate information of the face frame of the third subject 506; during user tracking, the first subject 504 and the second subject 505 can be tracked and displayed, while the third subject 506 is not tracked and displayed. That is, after the first subject 504 and the second subject 505 move, they can be tracked and displayed so that they always remain in the middle area of the video display picture, as in the display interface shown in FIG. 14, for example.
Case 3: owner automatic camera movement
In one example, the owner tracking mode can be enabled; the video preview picture may include multiple users, and face detection and face recognition are performed on the multiple users to determine the target user, namely the owner user; the video picture can then automatically track the owner user.
FIG. 15 is the settings display interface for video calls; as shown in FIG. 15, a protagonist mode can be enabled in the settings display interface 601, and the protagonist mode may refer to enabling the owner identification shown in FIG. 5. FIG. 16 is a display interface of a user conducting a video call with a tablet device; the display interface may include a video call interface 602, and the video call interface 602 may include a first subject, a control for indicating cancel, and a control for indicating a switch to voice. During the video call, the display interface shown in FIG. 17 can be displayed.
It should be understood that in FIG. 16 and FIG. 17 the electronic device has enabled the "follow the person's movement" function, and the preview image captured by the camera is cropped and scaled through the video processing method provided by the embodiments of the present application into a video image suited to the display specification of the tablet device. When the camera is turned on in the tablet device, the video processing method provided in the embodiments of the present application is executed.
The process of obtaining the video image frame shown in FIG. 17 is described in detail with reference to FIG. 18. It should be understood that the processing shown in FIG. 18 is executed by a processor inside the tablet device or a chip configured in the tablet device, and this processing is not displayed on the display interface.
Exemplarily, for the multi-user scenario with owner identification enabled, step S306 shown in FIG. 5 above may acquire the coordinate information of the rectangular frame 710 as shown in FIG. 18. Step S310 may, as shown in FIG. 18, convert the rectangular frame 710 into the rectangular frame 720, where the rectangular frame 720 represents the coordinate-converted face frame of the owner user; for example, the coordinate information of the four vertices of the rectangular frame 710 is converted to obtain the corresponding vertex coordinate information at the w2*h2 resolution, thereby determining the position information of the rectangular frame 720 in the w2*h2 resolution image. Step S311 may, as shown in FIG. 18, perform boundary expansion according to the rectangular frame 720 to obtain the coordinate information of the rectangular frame 730, where the rectangular frame 730 represents the body frame of the owner user. Step S312 may, as shown in FIG. 18, perform boundary expansion according to the rectangular frame 730 to obtain the coordinate information of the rectangular frame 740, where the rectangular frame 740 may represent the cropping frame of the owner user.
Further, when the rectangular frame 730 and the rectangular frame 740 satisfy the preset condition, the parameters of the cropping and scaling processing can be determined from the coordinate information of the rectangular frame 740 and the coordinate information of the video image frame; the video image frame is cropped and scaled according to these parameters to obtain an output video image frame suited to the display specification of the tablet device.
For example, as shown in FIG. 18, the cropped display content can be obtained according to the cropping frame 740; the display content can be scaled according to the issued requested resolution to obtain the processed video image frame; the processed video image frame is sent to the video call application, and a video image frame suited to the display specification of the tablet device is obtained according to the resolution of the display screen.
It should be noted that for the above specific steps, refer to the relevant description of FIG. 5, which is not repeated here.
It should be understood that, compared with single-user automatic camera movement, owner-user automatic camera movement performs face recognition on each user's face frame after determining the coordinate information of the face frame of each of the multiple users, so as to determine the coordinate information of the owner user's face frame; the remaining steps are the same as single-user camera movement, for which reference may be made to the relevant description of single-user camera movement, not repeated here.
In the embodiments of the present application, face detection is performed on the acquired video image frames to determine the coordinate information of the face frame of the target object, and the coordinate information of the cropping frame is obtained from the coordinate information of the face frame; the video image frame is then processed according to the cropping frame to display the output video image frame. In the embodiments of the present application, since the coordinate information of the cropping frame is determined from the coordinate information of the face frame, compared with a scheme that determines the cropping frame by directly detecting the body key points of the target object, the video processing method of the present application can reduce the amount of computation and the power consumption of the electronic device. In addition, since the video processing method of the present application determines the coordinate information of the cropping frame according to the face frame, video tracking display of the target object can be avoided when the target object faces away from the electronic device in the second image frame; therefore, the solution of the present application can also improve the accuracy of video tracking display while reducing power consumption.
It should be understood that the above illustrations are intended to help those skilled in the art understand the embodiments of the present application, rather than to limit the embodiments of the present application to the specific values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the above illustrations, and such modifications or changes also fall within the scope of the embodiments of the present application.
The video processing method provided by the embodiments of the present application has been described in detail above with reference to FIG. 1 to FIG. 18; the apparatus embodiments of the present application are described in detail below with reference to FIG. 19 and FIG. 20. It should be understood that the apparatuses in the embodiments of the present application can execute the various methods of the foregoing embodiments of the present application; that is, for the specific working processes of the following products, reference may be made to the corresponding processes in the foregoing method embodiments.
FIG. 19 is a schematic structural diagram of a video processing apparatus provided by an embodiment of the present application. The video processing apparatus 800 includes a display unit 810 and a processing unit 820.
The display unit 810 is configured to display a first image frame, the first image frame being an image frame of a target object at a first position. The processing unit 820 is configured to: when the target object moves to a second position, acquire a second image frame, the second position being different from the first position, the second image frame being an image frame captured by the electronic device when the target object moves to the second position; perform face detection according to the second image frame to obtain coordinate information of a first detection frame, the first detection frame being used to indicate the position information of the face of the target object in the second image frame; obtain coordinate information of a cropping frame according to the first detection frame; and crop the second image frame according to the cropping frame to obtain display content including the target object. The display unit 810 is further configured to display a third image frame according to the display content, wherein an intersection exists between a first area in the first image frame and a second area in the third image frame, the first area being the area where the target object is located in the first image frame, and the second area being the area where the target object is located in the third image frame.
Optionally, as an embodiment, when the first image frame and the third image frame are displayed, the video processing apparatus is located at the same position.
Optionally, as an embodiment, the processing unit 820 is further configured to:
detect an operation indicating to run a camera application; or,
detect an operation indicating to run a video call application.
Optionally, as an embodiment, the processing unit 820 is specifically configured to:
perform first expansion processing on the first detection frame to obtain a second detection frame;
perform second expansion processing on the second detection frame to obtain the cropping frame;
wherein the first expansion processing refers to expanding the boundary of the first detection frame with the first detection frame as the center, the second detection frame is used to indicate the position information of the body of the target object in the second image frame, and the second expansion processing refers to expanding the boundary of the second detection frame with the second detection frame as the center.
Optionally, as an embodiment, the processing unit 820 is specifically configured to:
perform the first expansion processing on the first detection frame according to a first threshold to obtain the second detection frame, the first threshold being used to indicate body proportion data.
Optionally, as an embodiment, the processing unit 820 is specifically configured to:
determine whether the second detection frame and the cropping frame satisfy a preset condition, the preset condition meaning that the second detection frame and the cropping frame satisfy a preset proportional relationship;
when the second detection frame and the cropping frame satisfy the preset condition, crop the second image frame according to the cropping frame to obtain the display content.
Optionally, as an embodiment, the coordinate information of the first detection frame refers to the coordinate information corresponding to the first detection frame when the second image frame is at the second resolution, and the processing unit 820 is specifically configured to:
receive a request instruction, the request instruction being used to request a first resolution;
determine the second resolution according to the first resolution, the second resolution being greater than the first resolution.
Optionally, as an embodiment, the processing unit 820 is specifically configured to:
scale the display content according to the first resolution to obtain processed display content;
the display unit 810 is configured to:
display the third image frame according to the processed display content.
Optionally, as an embodiment, the target object is the owner user, and the processing unit 820 is specifically configured to:
receive an owner identification instruction, the owner identification instruction being used to instruct identification of the owner user;
perform face recognition according to the first detection frame to determine the owner user, the owner user being a pre-configured user.
Optionally, as an embodiment, the first detection frame refers to the face frame of the owner user.
Optionally, as an embodiment, the target object includes at least one user.
Optionally, as an embodiment, the target object includes a first user and a second user, and the first detection frame refers to the union frame of the face frame of the first user and the face frame of the second user.
Optionally, as an embodiment, the first area coincides with the second area.
It should be noted that the above video processing apparatus 800 is embodied in the form of functional units. The term "unit" here may be implemented in the form of software and/or hardware, which is not specifically limited.
For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (for example, a shared processor, a dedicated processor, or a group processor) and a memory, combinational logic circuits, and/or other suitable components that support the described functions.
Therefore, the units of the examples described in the embodiments of the present application can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.
FIG. 20 shows a schematic structural diagram of an electronic device provided by the present application. The dotted lines in FIG. 20 indicate that a unit or module is optional. The electronic device 900 can be used to implement the video processing method described in the foregoing method embodiments.
The electronic device 900 includes one or more processors 901, and the one or more processors 901 can support the electronic device 900 in implementing the method in the method embodiments. The processor 901 may be a general-purpose processor or a special-purpose processor. For example, the processor 901 may be a central processing unit (CPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another programmable logic device, such as a discrete gate, a transistor logic device, or a discrete hardware component.
The processor 901 can be used to control the electronic device 900, execute software programs, and process data of the software programs. The electronic device 900 may further include a communication unit 905 configured to implement input (reception) and output (transmission) of signals.
For example, the electronic device 900 may be a chip, and the communication unit 905 may be an input and/or output circuit of the chip, or the communication unit 905 may be a communication interface of the chip, and the chip may serve as a component of a terminal device or another electronic device.
For another example, the electronic device 900 may be a terminal device, and the communication unit 905 may be a transceiver of the terminal device, or the communication unit 905 may be a transceiver circuit of the terminal device.
The electronic device 900 may include one or more memories 902 on which a program 904 is stored. The program 904 may be run by the processor 901 to generate instructions 903, so that the processor 901 executes, according to the instructions 903, the video processing method described in the foregoing method embodiments.
Optionally, data may also be stored in the memory 902. Optionally, the processor 901 may also read the data stored in the memory 902; the data may be stored at the same storage address as the program 904, or at a different storage address from the program 904.
The processor 901 and the memory 902 may be provided separately or integrated together, for example, integrated on a system on chip (SOC) of a terminal device.
Exemplarily, the memory 902 can be used to store the program 904 related to the video processing method provided in the embodiments of the present application, and the processor 901 can be used to invoke, during video processing, the program 904 stored in the memory 902 to execute the video processing method of the embodiments of the present application; for example: displaying a first image frame, where the first image frame is an image frame of a target object at a first position; when the target object moves to a second position, acquiring a second image frame, where the second position is different from the first position and the second image frame refers to the image frame collected by the electronic device when the target object moves to the second position; performing face detection according to the second image frame to obtain coordinate information of a first detection frame, where the first detection frame is used to indicate position information of the face of the target object in the second image frame; obtaining coordinate information of a cropping frame according to the first detection frame; performing cropping processing on the second image frame according to the cropping frame to obtain display content including the target object; and displaying a third image frame according to the display content, where a first area in the first image frame and a second area in the third image frame have an intersection, the first area refers to the area where the target object is located in the first image frame, and the second area refers to the area where the target object is located in the third image frame.
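Pulling the earlier sketches together, one iteration of this follow-the-movement flow might read as below; detect_face is a placeholder for any face detector, and the cropping-frame margins are assumed:

    def process_frame(frame: np.ndarray, detect_face, out_w: int, out_h: int) -> np.ndarray:
        """Detect face -> expand to body -> expand to cropping frame -> crop -> scale."""
        face = detect_face(frame)            # first detection frame (face)
        body = body_frame(face)              # second detection frame (body)
        crop = expand_box(body, 1.2, 1.2)    # cropping frame with assumed margins
        shown = crop_if_valid(frame, body, crop)
        return cv2.resize(shown, (out_w, out_h), interpolation=cv2.INTER_AREA)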
The present application also provides a computer program product which, when executed by the processor 901, implements the video processing method described in any method embodiment of the present application.
The computer program product may be stored in the memory 902, for example as the program 904, which is finally converted, through processing steps such as preprocessing, compiling, assembling, and linking, into an executable object file that can be executed by the processor 901.
The present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a computer, the video processing method described in any method embodiment of the present application is implemented. The computer program may be a high-level language program or an executable object program.
Optionally, the computer-readable storage medium is, for example, the memory 902. The memory 902 may be a volatile memory or a non-volatile memory, or the memory 902 may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes and technical effects of the apparatuses and devices described above, reference may be made to the corresponding processes and technical effects in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in the present application, the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, some features of the method embodiments described above may be ignored or not executed. The apparatus embodiments described above are merely illustrative; the division of units is only a division by logical function, and there may be other division methods in actual implementation; multiple units or components may be combined or integrated into another system. In addition, the coupling between units or between components may be direct or indirect coupling, and the above coupling includes electrical, mechanical, or other forms of connection.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
In summary, the above descriptions are merely preferred embodiments of the technical solutions of the present application and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (17)

  1. A video processing method, applied to an electronic device, comprising:
    displaying a first image frame, wherein the first image frame is an image frame of a target object at a first position;
    when the target object moves to a second position, acquiring a second image frame, wherein the second position is different from the first position, and the second image frame refers to an image frame collected by the electronic device when the target object moves to the second position;
    performing face detection according to the second image frame to obtain coordinate information of a first detection frame, wherein the first detection frame is used to indicate position information of a face of the target object in the second image frame;
    obtaining coordinate information of a cropping frame according to the first detection frame;
    performing cropping processing on the second image frame according to the cropping frame to obtain display content comprising the target object; and
    displaying a third image frame according to the display content, wherein a first area in the first image frame and a second area in the third image frame have an intersection, the first area refers to an area where the target object is located in the first image frame, and the second area refers to an area where the target object is located in the third image frame.
  2. The video processing method according to claim 1, wherein the electronic device is located at the same position when displaying the first image frame and when displaying the third image frame.
  3. The video processing method according to claim 1 or 2, further comprising:
    detecting an operation indicating to run a camera application; or
    detecting an operation indicating to run a video call application.
  4. The video processing method according to any one of claims 1 to 3, wherein the obtaining coordinate information of a cropping frame according to the first detection frame comprises:
    performing first expansion processing on the first detection frame to obtain a second detection frame; and
    performing second expansion processing on the second detection frame to obtain the cropping frame;
    wherein the first expansion processing refers to expanding a boundary of the first detection frame with the first detection frame as a center, the second detection frame is used to indicate position information of a body of the target object in the second image frame, and the second expansion processing refers to expanding a boundary of the second detection frame with the second detection frame as a center.
  5. The video processing method according to claim 4, wherein the performing first expansion processing on the first detection frame to obtain a second detection frame comprises:
    performing the first expansion processing on the first detection frame according to a first threshold to obtain the second detection frame, wherein the first threshold is used to indicate body proportion data.
  6. The video processing method according to claim 4 or 5, wherein the performing cropping processing on the second image frame according to the cropping frame to obtain display content comprising the target object comprises:
    determining whether the second detection frame and the cropping frame satisfy a preset condition, wherein the preset condition means that the second detection frame and the cropping frame satisfy a preset proportional relationship; and
    when the second detection frame and the cropping frame satisfy the preset condition, performing cropping processing on the second image frame according to the cropping frame to obtain the display content.
  7. The video processing method according to any one of claims 1 to 6, wherein the coordinate information of the first detection frame refers to coordinate information corresponding to the first detection frame when the second image frame has a second resolution, and the method further comprises:
    receiving a request instruction, wherein the request instruction is used to request a first resolution; and
    determining the second resolution according to the first resolution, wherein the second resolution is greater than the first resolution.
  8. The video processing method according to claim 7, wherein the displaying a third image frame according to the display content comprises:
    performing scaling processing on the display content according to the first resolution to obtain processed display content; and
    displaying the third image frame according to the processed display content.
  9. The video processing method according to any one of claims 1 to 8, wherein the target object is an owner user of the device, and the method further comprises:
    receiving an owner identification instruction, wherein the owner identification instruction is used to instruct identification of the owner user; and
    performing face recognition according to the first detection frame to determine the owner user, wherein the owner user is a pre-configured user.
  10. The video processing method according to claim 9, wherein the first detection frame refers to a face frame of the owner user.
  11. The video processing method according to any one of claims 1 to 8, wherein the target object comprises at least one user.
  12. The video processing method according to claim 11, wherein the target object comprises a first user and a second user, and the first detection frame refers to a union frame of a face frame of the first user and a face frame of the second user.
  13. The video processing method according to any one of claims 1 to 12, wherein the first area coincides with the second area.
  14. An electronic device, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program from the memory, so that the electronic device executes the video processing method according to any one of claims 1 to 13.
  15. A chip, comprising a processor, wherein when the processor executes instructions, the processor executes the video processing method according to any one of claims 1 to 13.
  16. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the processor is caused to execute the video processing method according to any one of claims 1 to 13.
  17. A computer program product, comprising computer program code, wherein when the computer program code is executed by a processor, the processor is caused to execute the video processing method according to any one of claims 1 to 13.
PCT/CN2022/091447 2021-08-31 2022-05-07 Video processing method, and electronic device WO2023029547A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111016638.0 2021-08-31
CN202111016638.0A CN115633255B (en) 2021-08-31 2021-08-31 Video processing method and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023029547A1 true WO2023029547A1 (en) 2023-03-09

Family

ID=84903712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091447 WO2023029547A1 (en) 2021-08-31 2022-05-07 Video processing method, and electronic device

Country Status (2)

Country Link
CN (1) CN115633255B (en)
WO (1) WO2023029547A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710697A (en) * 2023-08-09 2024-03-15 荣耀终端有限公司 Object detection method, electronic device, storage medium, and program product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107872639A (en) * 2017-11-14 2018-04-03 维沃移动通信有限公司 Transmission method, device and the mobile terminal of communication video
US20180152666A1 (en) * 2016-11-29 2018-05-31 Facebook, Inc. Face detection for video calls
CN110334653A (en) * 2019-07-08 2019-10-15 聚好看科技股份有限公司 Image processing method, device and equipment in video communication
CN112446255A (en) * 2019-08-31 2021-03-05 华为技术有限公司 Video image processing method and device
CN112907617A (en) * 2021-01-29 2021-06-04 深圳壹秘科技有限公司 Video processing method and device
CN113014793A (en) * 2019-12-19 2021-06-22 华为技术有限公司 Video processing method and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229418B (en) * 2018-01-19 2021-04-02 北京市商汤科技开发有限公司 Human body key point detection method and apparatus, electronic device, storage medium, and program
CN111178343A (en) * 2020-04-13 2020-05-19 腾讯科技(深圳)有限公司 Multimedia resource detection method, device, equipment and medium based on artificial intelligence
CN112561840B (en) * 2020-12-02 2024-05-28 北京有竹居网络技术有限公司 Video clipping method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN115633255A (en) 2023-01-20
CN115633255B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
WO2020168970A1 (en) Screen display control method and electronic device
WO2020259452A1 (en) Full-screen display method for mobile terminal, and apparatus
WO2020168956A1 (en) Method for photographing the moon and electronic device
WO2021136050A1 (en) Image photographing method and related apparatus
WO2021147482A1 (en) Telephoto photographing method and electronic device
WO2021179773A1 (en) Image processing method and device
WO2021249053A1 (en) Image processing method and related apparatus
WO2021078001A1 (en) Image enhancement method and apparatus
WO2020093988A1 (en) Image processing method and electronic device
WO2022127787A1 (en) Image display method and electronic device
WO2020102978A1 (en) Image processing method and electronic device
CN112887583A (en) Shooting method and electronic equipment
WO2021238351A1 (en) Image correction method and electronic apparatus
US20230353862A1 (en) Image capture method, graphic user interface, and electronic device
WO2022143180A1 (en) Collaborative display method, terminal device, and computer readable storage medium
WO2022001258A1 (en) Multi-screen display method and apparatus, terminal device, and storage medium
CN111553846A (en) Super-resolution processing method and device
CN115272138B (en) Image processing method and related device
US20230351570A1 (en) Image processing method and apparatus
WO2022095744A1 (en) Vr display control method, electronic device, and computer readable storage medium
WO2022057384A1 (en) Photographing method and device
WO2023029547A1 (en) Video processing method, and electronic device
CN115150542B (en) Video anti-shake method and related equipment
US20240031675A1 (en) Image processing method and related device
WO2020233593A1 (en) Method for displaying foreground element, and electronic device

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE