WO2021238325A1 - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
WO2021238325A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
target
target subject
current frame
Prior art date
Application number
PCT/CN2021/079103
Other languages
English (en)
French (fr)
Inventor
彭焕文
宋楠
李宏俏
刘苑文
曾毅华
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021238325A1

Classifications

    • H04N23/80 Camera processing pipelines; Components thereof
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G06T7/269 Analysis of motion using gradient-based methods
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/70 Determining position or orientation of objects or cameras
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N5/265 Mixing (studio circuits for special effects)
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/20224 Image subtraction
    • G06T2207/30196 Human being; Person
    • G06T2207/30241 Trajectory

Definitions

  • This application relates to the field of multimedia processing technology, and in particular to an image processing method and device.
  • The existing solution is to process the image data of already generated video frames and add the motion path of the target object to generate a special effect video. For example, the actual movement trajectory of a football or a player in a football match video is displayed, that is, image processing technology is used in post-production to visualize the movement path of the football or player, for example, by adding a curve or a straight line representing the movement path, to generate the special effect video.
  • However, this solution can only be used for post-processing and cannot generate the special effect video in real time.
  • the present application provides an image processing method and device, which solves the problem in the prior art that the motion track special effect video of the target shooting object cannot be generated in real time.
  • In a first aspect, an image processing method includes: acquiring a current frame and N historical action frames, where both the current frame and the N historical action frames include a target subject, the scenes of the current frame and the N historical action frames overlap, the position of the target subject differs among the N historical action frames, and N is a positive integer greater than or equal to 1.
  • Image segmentation is performed on the N historical action frames to obtain images of N target subjects corresponding to the N historical action frames; N reference positions are determined in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and the images of the N target subjects are respectively fused at the N reference positions of the current frame to obtain the target frame.
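  • Viewed end to end, the first aspect amounts to a loop over the N historical action frames. The following is a minimal Python sketch of that flow, not the patented implementation; segment_subject, map_position and fuse_at are hypothetical placeholders for the segmentation, position-mapping and fusion steps detailed below.

```python
# Minimal sketch of the first-aspect pipeline (illustrative only).
# segment_subject(), map_position() and fuse_at() are hypothetical placeholders
# for the segmentation, position-mapping and fusion steps described later.
def build_target_frame(current_frame, historical_action_frames,
                       segment_subject, map_position, fuse_at):
    """Fuse the images of the N target subjects into the current frame."""
    target_frame = current_frame.copy()
    for hist_frame in historical_action_frames:
        # Image of the target subject, its mask, and its position in the historical scene.
        subject_img, subject_mask, subject_pos = segment_subject(hist_frame)
        # Map that position into the current frame's scene to get the reference position.
        ref_pos = map_position(hist_frame, current_frame, subject_pos)
        # Blend the subject image into the current frame at the reference position.
        target_frame = fuse_at(target_frame, subject_img, subject_mask, ref_pos)
    return target_frame
```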
  • the electronic device obtains a real-time video stream through the lens.
  • The real-time video stream is composed of a sequence of consecutive frames in time, and each video frame can be the current frame at the current moment.
  • Relative to a current frame corresponding to a moment after a key action frame has been determined, that key action frame may be referred to as a historical action frame.
  • For example, the electronic device starts video shooting at time t0, determines the real-time video frame corresponding to time t1 as a key action frame (historical action frame 1), and then determines the real-time video frame corresponding to time t2 as a key action frame (historical action frame 2). For the current frame corresponding to the current time t3, the N acquired historical action frames are historical action frame 1 and historical action frame 2.
  • the electronic device determines the at least one key action frame as the historical action frame in the real-time video frame stream, and segments the image of the at least one target subject corresponding to the at least one historical action frame.
  • the key action frame refers to the image corresponding to the specified action or obvious key action of the target subject in the video frame stream captured in real time by the electronic device.
  • the image of the target subject in each historical action frame is displayed in the current frame at the same time according to the position correspondence of the object in the multi-frame image.
  • the main application scenario of the technical solution is the segmentation of portraits and the fusion display of motion trajectories, so that special effect images or special effect videos of the motion trajectory of the target subject can be generated in real time, which enriches the user's shooting experience.
  • Before the acquisition of the current frame and the N historical action frames, the method further includes: receiving a user's first selection instruction, where the first selection instruction is used to instruct entering the automatic shooting mode or the manual shooting mode.
  • the electronic device determines the automatic shooting mode or the manual shooting mode by receiving a user's selection instruction. In this way, the electronic device can automatically detect or manually determine the historical action frames in the currently acquired video frame stream, and merge the multiple historical action frames into a special effect video effect showing the motion track, thereby increasing the user's shooting pleasure.
  • In the automatic shooting mode, acquiring historical action frames specifically includes: performing motion detection on the real-time video stream to determine the target subject; detecting the position of the target subject in the scene of each video frame included in the real-time video stream; and determining, as a historical action frame, a video frame in which the change of the target subject's position in the scene meets a preset threshold.
  • In this way, the electronic device can automatically detect the moving target subject from the real-time video frame stream according to the automatic shooting instruction indicated by the user, and determine the historical action frames that meet the preset conditions according to the image changes of the moving target subject. The fusion display is then automatically updated into the current frame in real time according to the at least one determined historical action frame, and the special effect video is synthesized, which enriches the user's shooting experience.
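  • A minimal sketch of that automatic selection, assuming NumPy arrays and a placeholder detect_subject_mask() detector for the moving subject (the threshold value is illustrative):

```python
import numpy as np

def centroid(mask):
    """Centroid of a binary subject mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return np.array([xs.mean(), ys.mean()])

def select_action_frames(frames, detect_subject_mask, threshold=80.0):
    """Record a frame as a historical action frame whenever the subject has
    moved more than `threshold` pixels since the last recorded action frame."""
    action_frames, last_pos = [], None
    for frame in frames:
        mask = detect_subject_mask(frame)      # binary mask of the moving target subject
        pos = centroid(mask)
        if pos is None:
            continue
        if last_pos is None or np.linalg.norm(pos - last_pos) > threshold:
            action_frames.append(frame)
            last_pos = pos
    return action_frames
```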
  • In the manual shooting mode, acquiring historical action frames specifically includes: receiving a user's second selection instruction for a video frame included in the real-time video stream; determining that the subject at the position in the video frame corresponding to the second selection instruction is the target subject; and determining that the video frame is a historical action frame.
  • In this way, through real-time interaction with the user, the electronic device can also perform real-time fusion display of multi-frame images based on the moving target subject determined by the user in the current video frame stream and at least one historical action frame determined by the user, updating the result into the current frame and synthesizing the special effect video, which enriches the user's shooting experience.
  • Performing image segmentation on a historical action frame to obtain the image of the target subject corresponding to the historical action frame specifically includes: reducing, according to motion detection technology, the image area of the historical action frame corresponding to the target subject to obtain a target image area in the historical action frame; and processing the image of the target image area with a deep learning algorithm to obtain the mask image of the target subject corresponding to the historical action frame.
  • In this way, the electronic device can perform image segmentation on the historical action frames to obtain the mask image of the target subject and track and record the motion of the target subject across multiple frames, so that the image of at least one target subject can subsequently be fused into the current frame according to the corresponding mask image.
  • In addition, reducing the image area on which segmentation is performed can improve the accuracy of image segmentation and reduce the complexity of the algorithm.
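  • A minimal sketch of this two-stage segmentation, assuming NumPy arrays, an (x, y, w, h) region of interest from motion detection, and a placeholder run_segmentation_model() standing in for the deep-learning segmenter:

```python
import numpy as np

def segment_subject_in_roi(frame, roi, run_segmentation_model):
    """Run segmentation only on the reduced target image area, then paste the
    resulting 0/255 mask back into a full-frame mask."""
    x, y, w, h = roi
    crop = frame[y:y + h, x:x + w]
    crop_mask = run_segmentation_model(crop)           # e.g. a portrait-segmentation network
    full_mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    full_mask[y:y + h, x:x + w] = crop_mask
    return full_mask
```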
  • In a case where multiple subjects overlap in the historical action frame, the method further includes: separating the mask image of the target subject from the overlapping mask images of the multiple subjects according to the depth information of the multiple subjects in the historical action frame.
  • In this way, the mask image of the target subject can be obtained from an overlapping multi-person mask image by using the depth information of the multiple subjects in the historical action frame.
  • techniques such as binocular visual depth, monocular depth estimation, structured light depth, or instance segmentation can also be used to achieve segmentation of a multi-person overlapping mask image.
  • the mask image of the target subject is segmented from the overlapped mask image of multiple people, and the accuracy of image processing is improved, so that the generated motion trajectory special effect video of the target subject is more realistic and natural.
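  • A minimal sketch of the depth-based separation, assuming a per-pixel depth map aligned with the frame and a seed point known to lie on the target subject (the depth band is illustrative):

```python
import numpy as np

def separate_by_depth(overlap_mask, depth_map, seed_xy, band=0.3):
    """Keep only the pixels of the overlapping mask whose depth is close to the
    target subject's depth at the seed point."""
    sx, sy = seed_xy
    subject_depth = depth_map[sy, sx]
    keep = (overlap_mask > 0) & (np.abs(depth_map - subject_depth) < band)
    return keep.astype(np.uint8) * 255      # mask image of the target subject only
```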
  • Determining the reference position in the current frame according to the position of the target subject in the scene of the historical action frame and the scene of the current frame specifically includes: obtaining, according to image registration technology or simultaneous localization and mapping (SLAM) technology, the correspondence between the position of at least one object in the historical action frame and its position in the current frame; and determining the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
  • In this way, position mapping across the multi-frame images is performed through image registration technology or SLAM technology. According to the correspondence between the image positions of objects in the multi-frame images, the reference position in the current frame of the image of the target subject of each historical action frame is determined, so that a special effect video with a realistic and natural motion trajectory can be generated, improving the user's shooting experience.
  • Fusing the images of the N target subjects at the N reference positions of the current frame specifically includes: performing weighted fusion processing on the pixel information of the images of the N target subjects and the pixel information of the current frame at the N reference positions of the current frame.
  • In addition, edge fusion can also be performed between the images of the target subjects and the background image in the current frame to update the target frame, so that the transition between the displayed images of the multiple target subjects and the background image is natural.
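  • A minimal sketch of the weighted fusion with a feathered mask edge, assuming OpenCV/NumPy; the subject image and mask are pasted at a reference position ref_xy given as the (x, y) of the top-left corner:

```python
import cv2
import numpy as np

def fuse_subject(current_frame, subject_img, subject_mask, ref_xy, feather=7):
    """Weighted (alpha) fusion of the subject image into the current frame; the
    blurred mask softens the edge so the transition to the background is natural."""
    x, y = ref_xy
    h, w = subject_mask.shape[:2]
    roi = current_frame[y:y + h, x:x + w].astype(np.float32)
    alpha = cv2.GaussianBlur(subject_mask, (feather, feather), 0).astype(np.float32) / 255.0
    alpha = alpha[..., None]                              # broadcast over colour channels
    blended = alpha * subject_img.astype(np.float32) + (1.0 - alpha) * roi
    out = current_frame.copy()
    out[y:y + h, x:x + w] = blended.astype(np.uint8)
    return out
```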
  • The method may further include: adding at least one grayscale image near the image of the target subject in the current frame to obtain the target frame, where the closer a grayscale image is to the image of the target subject in the current frame, the larger its grayscale value.
  • In this way, shadow-like images can be displayed as grayscale images, and different grayscale values are used to reflect the motion trajectory, which more intuitively shows the movement direction and trajectory of the target subject, increases the interest and intuitiveness of the special effect video, and further enhances the user's shooting experience.
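  • A minimal sketch of the grayscale trail, assuming NumPy arrays; positions is a list of (x, y) offsets of the subject silhouette from the oldest to the most recent, and the gray value grows as a copy gets closer to the current subject:

```python
import numpy as np

def add_gray_trail(target_frame, subject_mask, positions, max_gray=200):
    """Paint grayscale silhouettes of the subject along its path; copies closer
    to the current position get a larger gray value."""
    out = target_frame.copy()
    k = len(positions)
    h, w = subject_mask.shape[:2]
    for i, (x, y) in enumerate(positions):               # oldest position first
        gray = int(max_gray * (i + 1) / k)
        region = out[y:y + h, x:x + w]
        region[subject_mask > 0] = gray                   # uniform gray silhouette
    return out
```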
  • In a second aspect, an image processing device includes: an acquisition module, configured to acquire a current frame and N historical action frames, where both the current frame and the N historical action frames include a target subject, the scenes of the current frame and the N historical action frames overlap, the position of the target subject differs among the N historical action frames, and N is a positive integer greater than or equal to 1;
  • an image segmentation module, configured to perform image segmentation on the N historical action frames to obtain images of N target subjects corresponding to the N historical action frames;
  • a mapping module, configured to determine N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and
  • an image fusion module, configured to fuse the images of the N target subjects at the N reference positions of the current frame to obtain the target frame.
  • the device further includes: a receiving module for receiving a user's first selection instruction, where the first selection instruction is used for instructing to enter the automatic shooting mode or the manual shooting mode.
  • The acquisition module is specifically configured to: perform motion detection on the real-time video stream to determine the target subject; detect the position of the target subject in the scene of each video frame included in the real-time video stream; and determine, as a historical action frame, a video frame in which the change of the target subject's position in the scene satisfies the preset threshold.
  • The receiving module is further configured to receive the user's second selection instruction for a video frame included in the real-time video stream; the acquisition module is further specifically configured to: determine that the subject at the position in the video frame corresponding to the second selection instruction is the target subject, and determine that the video frame is a historical action frame.
  • The image segmentation module is specifically configured to: reduce the image area corresponding to the target subject in the historical action frame according to motion detection technology to obtain the target image area in the historical action frame; and process the image of the target image area with a deep learning algorithm to obtain the mask image of the target subject corresponding to the historical action frame.
  • The image segmentation module is further specifically configured to: separate the mask image of the target subject from the overlapping mask images of multiple subjects according to the depth information of the multiple subjects in the historical action frame.
  • The mapping module is specifically configured to: obtain, according to image registration technology or simultaneous localization and mapping (SLAM) technology, the correspondence between the position of at least one object in the historical action frame and its position in the current frame; and determine the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
  • the image fusion module is specifically configured to perform weighted fusion processing on the images of the N target subjects and the pixel information of the image in the current frame at the N reference positions of the current frame.
  • The image fusion module is further specifically configured to: add at least one grayscale image to the image of the target subject in the current frame to obtain the target frame, where the closer a grayscale image is to the image of the target subject in the current frame, the greater its gray value.
  • In a third aspect, an electronic device includes: a processor; and a memory for storing executable instructions of the processor; where the processor is configured to execute the instructions to implement the first aspect and any possible implementation manner of the first aspect described above.
  • In a fourth aspect, a computer-readable storage medium is provided.
  • When the instructions in the computer storage medium are executed by the processor of the electronic device, the electronic device can execute the first aspect and any possible implementation manner of the first aspect described above.
  • In a fifth aspect, a computer program product is provided, which, when run on a computer, causes the computer to execute the first aspect and any possible implementation manner of the first aspect described above.
  • Any of the image processing apparatus, electronic device, computer-readable storage medium, and computer program product provided above can be implemented by the corresponding method provided above; therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding methods provided above, which will not be repeated here.
  • FIG. 1A is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
  • FIG. 1B is a software system architecture diagram of an electronic device provided by an embodiment of this application.
  • FIG. 1C is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • FIG. 2 is a schematic diagram of an interface for shooting a special effect video of an electronic device according to an embodiment of the application
  • FIG. 3 is a schematic diagram of an interface for shooting a special effect video of another electronic device according to an embodiment of the application
  • FIG. 4 is a schematic diagram of user interaction of a shooting preview interface provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of an algorithm for determining a current frame as a key action frame according to an embodiment of the application
  • FIG. 7 is a schematic diagram of an image segmentation processing method provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a complementary mask image provided by an embodiment of the application.
  • FIG. 9A is a schematic diagram of separating overlapping portraits according to an embodiment of the application.
  • FIG. 9B is another schematic diagram of separating overlapping portraits according to an embodiment of the application.
  • FIG. 10 is a schematic diagram of multi-frame image mapping provided by an embodiment of the application.
  • FIG. 11 is a schematic flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 12 is a schematic flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 13 is a schematic flowchart of another image processing method provided by an embodiment of this application.
  • FIG. 14 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • The terms "first" and "second" are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the present embodiment, unless otherwise specified, "plurality" means two or more.
  • the embodiments of the present application provide an image processing method and device, which can be applied to a video shooting scene, and can generate a special effect video or a special effect image of the motion trajectory of a target shooting object in real time based on a video frame stream shot in real time.
  • the motion trajectory special effect can be used to record the key actions of the target subject in the timeline, or the location where it once appeared, and the recorded historical key actions of the target subject image are fused and displayed in the current frame , And merge with the background image, ground, etc. of the current frame.
  • the user can see the special effect video shooting effect in real time on the shooting preview screen, forming a unique user experience of interlacing time and space, and can also generate special effect video in real time. This solves the problem that the motion track special effect video cannot be generated in real time in the prior art, enriches the fun of video shooting, and improves the user's shooting and viewing experience.
  • the image processing method provided by the embodiments of this application can be applied to electronic devices with shooting capabilities and image processing capabilities.
  • The electronic device can be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an in-vehicle device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) or virtual reality (VR) device, etc.
  • the embodiments of the present disclosure do not impose special restrictions on the specific form of the electronic device.
  • FIG. 1A shows a schematic structural diagram of an electronic device 100.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light Sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
  • the processor 110 may include one or more interfaces.
  • The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • The display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, at least one application program (such as a sound playback function, an image playback function, etc.) required by at least one function.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the aforementioned internal memory 121 may store computer program code for implementing the steps in the method embodiment of the present application.
  • the foregoing processor 110 may run the computer program code of the steps in the method embodiment of the present application stored in the memory 121.
  • the above-mentioned display screen 194 may be used to display the photographed object of the camera and the real-time video frames involved in the embodiment of the present application.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 by way of example.
  • FIG. 1B is a software structure block diagram of an electronic device 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Communication between layers through software interface.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the embodiments of the present application are mainly implemented by improving the camera application at the application layer, for example, by adding a plug-in to the camera to expand its function.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
  • the camera program of the application layer can be improved through the application framework layer, so that when the subject is photographed, the special effect image or the special effect video of the target object's motion track can be displayed on the display screen 194.
  • the special effect image or special effect video is synthesized by real-time calculation and processing in the background of the electronic device.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can disappear automatically after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, and so on.
  • the notification manager can also be a notification that appears in the status bar at the top of the system in the form of a chart or a scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • For example, prompting text information in the status bar, playing a prompt sound, vibrating the electronic device, flashing an indicator light, etc.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • The core library consists of two parts: one part is the function functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a graphics engine for 2D drawing.
  • the kernel layer is the layer between hardware and software, and can also be called the driver layer.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • When a touch operation is received, the corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes the touch operation into the original input event (including touch coordinates, time stamp of the touch operation, etc.).
  • the original input events are stored in the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation as a touch click operation, and the control corresponding to the click operation is the control of the camera application icon as an example, the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
  • the camera 193 captures still images or videos.
  • The captured image or video can be temporarily stored in the content provider, and during a photographing or video shooting operation the completed photo or video can be displayed through the view system.
  • In the embodiments of the present application, before the captured image or video is displayed, multiple frames of images need to be fused before being displayed frame by frame on the preview interface through the view system.
  • the method may include:
  • S01 The electronic device acquires the current frame and the historical action frame, and both the current frame and the historical action frame include the target subject.
  • the shooting scenes applied in the embodiments of this application are as follows.
  • the user needs to open the camera application of the electronic device to take video shots of the target subject.
  • The target subject is the subject photographed by the electronic device that moves relative to the shooting scene.
  • For example, the target subject can be a person, an animal, or a piece of sports equipment, etc.
  • The movement can specifically refer to a change of the target subject's position, rotation, jumping, limb stretching, or a designated movement.
  • the camera of the electronic device follows its moving target subject to shoot in real time, so that through the technical method provided in this application, image processing can be performed according to the real-time video stream during the shooting process, and the special effect video of the movement trajectory can be generated in real time and can be previewed in real time.
  • the electronic device may obtain the current frame and N historical action frames according to the obtained real-time video stream, where N may be a positive integer greater than or equal to 1.
  • a real-time video stream refers to a stream of image frames acquired by a camera of an electronic device in real-time shooting, and may also be referred to as a video frame stream, which may include multiple historical action frames.
  • the frame currently displayed or currently processed by the electronic device may be referred to as the current frame.
  • Among the multiple images, a frame in which the target subject presents a specified action or an obvious key action is recorded as a key action frame.
  • The key action frames determined before the current frame can all be called historical action frames.
  • The target subject refers to a subject, among the one or more subjects photographed by the camera of the electronic device, that is in a motion state and is determined to be the moving target subject.
  • the method of determining the target subject may be automatically detected and determined by the electronic device, or manually determined by the user.
  • Optionally, before the electronic device acquires the current frame and at least one historical action frame, the method further includes: receiving a first selection instruction from the user, where the first selection instruction may include an automatic shooting instruction or a manual shooting instruction, respectively used to instruct the electronic device to enter the automatic shooting mode or the manual shooting mode.
  • If the first selection instruction is used to instruct the electronic device to enter the automatic shooting mode, the electronic device can automatically detect the target shooting object and automatically detect the key action frames to generate the special effect video of the motion track. If the first selection instruction is used to instruct the electronic device to enter the manual shooting mode, the electronic device further receives the user's second selection instruction, that is, an instruction through which the user manually operates the electronic device to determine the target shooting object and the frame in which the target shooting object performs a specified action; in other words, the electronic device can receive at least one second selection instruction input by the user.
  • the user's first selection instruction may include an automatic shooting instruction, and the user may determine to automatically shoot a special effect video by operating an electronic device, that is, to turn on the automatic shooting mode.
  • the user can open the camera application of the mobile phone through a touch or click operation. As shown in FIG. 2, click the "special effect video shooting" icon to switch to the special effect video shooting interface.
  • the electronic device can pre-configure the default state of special effects video shooting as automatic shooting, or the user can manually select "automatic shooting” or “manual shooting", that is, the shooting of special effects video can be started and the target shooting image can be viewed in real time on the preview interface.
  • Optionally, the top of the preview interface of the electronic device can display a playable "typical motion track special effect video" clip through a thumbnail, which the user can click to view, so that the user becomes familiar with the shooting operation method and the shooting effects of the special effect video in advance.
  • the electronic device can automatically detect the target subject and determine at least one key action frame according to the real-time shooting image, the moving object detection technology or the frame difference method and other technologies.
  • the specific methods for determining the target subject, determining at least one historical action frame, and determining the image of the target subject in the historical action frame will be described in detail below, and will not be described in detail here.
  • the user's first selection instruction may include a manual shooting instruction, and the user may determine to manually shoot a special effect video by operating the electronic device, that is, to turn on the manual shooting mode, and according to at least one second selection instruction input by the user, At least one target subject and at least one key action frame corresponding to the at least one second selection instruction are determined.
  • the electronic device may determine the corresponding target subject according to the corresponding position in the video frame according to the second selection instruction, and determine that the video frame is a key action frame.
  • The user can open the camera application of the mobile phone by touching or clicking and select the "Manual shooting" option; the shooting of the special effect video then starts, and the target shot image can be viewed in real time on the preview interface.
  • For example, after receiving the user's "manual shooting" operation, the electronic device may display the prompt message "Please click to select the subject portrait" on the interface to instruct the user to input the second selection instruction.
  • After the target subject is determined, the electronic device can continue to display prompt information on the interface, such as "Please click the favorite action", prompting the user to continue to input at least one second selection instruction through a touch or click operation, thereby determining multiple key action frames.
  • When the user is previewing the video frame stream, the user can determine a certain portrait or object in the preview screen as the target subject according to the prompt information or by actively clicking on it. During the subsequent continuous video frame stream, the user can also click on the preview screen to confirm multiple key action frames.
  • the electronic device can display prompt information on the interface, such as "optionally click to switch the main body".
  • the user initially determines portrait A as the target subject, and then clicks on portrait B in the shooting preview interface to select the target subject for subsequent generation of a special effect video of the target subject B.
  • The image of the target subject in the historical action frame refers to the image of the partial area in which the target subject is displayed, specifically the image of the area corresponding to the target subject obtained after the historical action frame undergoes certain image segmentation or matting processing.
  • For example, as shown in FIG. 2, excluding the background image and the still images in the shooting screen, the image of the moving target subject detected and determined in the current frame is a portrait.
  • the image of the target subject in the key action frame can be distinguished by image segmentation technology.
  • any historical action frame has a part that overlaps with the shooting scene in the current frame.
  • The shooting scene can refer to the shooting objects surrounding the target subject in the video frame, such as trees, lawns, or buildings.
  • Overlapping means that any historical action frame has the same part as the scene in the current frame.
  • the same tree in the historical action frame is also displayed in the same scene in the current frame.
  • the building in the historical action frame is also displayed in the same or different position in the shooting scene of the current frame.
  • For example, in the historical action frame, the position of the target subject A is to the left, in front of the tree.
  • In the current frame, the target subject A has moved to the front of the building.
  • The prerequisite for implementing the embodiments of the present application is that any determined historical action frame has a part of its scene that overlaps the scene in the current frame; if the scene of a historical action frame does not share any overlapping scene or object with the current frame, the electronic device cannot obtain the image mapping relationship between that historical action frame and the current frame, and thus cannot perform the multi-frame fusion display.
  • After the electronic device receives the user's instruction to start shooting, the electronic device obtains the real-time video stream through the lens, and each video frame included in the real-time video stream can be considered the current frame at the corresponding moment. Regardless of whether the electronic device obtains the key action frames automatically as described above or determines them in the manual mode according to the user's indications, relative to a current frame corresponding to a moment after a key action frame has been determined, that key action frame can be called a historical action frame.
  • For example, the electronic device starts video shooting at time t0, determines the real-time video frame corresponding to time t1 as a key action frame (first action frame 01), and then determines the real-time video frame corresponding to time t2 as a key action frame (second action frame 02). For the current frame corresponding to the current time t3, the N acquired historical action frames are the first action frame 01 and the second action frame 02.
  • S02 The electronic device performs image segmentation on the historical action frame to obtain an image of the target subject corresponding to the historical action frame.
  • Each time the electronic device acquires a historical action frame, in order to obtain the image of the target subject in that historical action frame, the electronic device can segment the historical action frames one by one to determine the target subject image in each historical action frame, which may specifically be a mask image.
  • Therefore, the electronic device can record, one by one, the N historical action frames included in the real-time video stream and the images of the N target subjects corresponding to the N historical action frames.
  • image segmentation is the technology and process of dividing the original image into a number of specific or unique areas, and extracting the target object of interest.
  • Image segmentation is a key step from image processing to image recognition and analysis.
  • the processing of image segmentation based on the portrait in the original image can also be referred to as a portrait segmentation technique, which can extract the portrait portion of the original image.
  • A mask image marks a specific target area in the image with a distinct mask value. For example, the image area of the target subject is marked with a mask value different from that of the background image, so that the image area of the target subject is separated from the other background image areas.
  • the pixel mask value of the target subject image area may be set to 255, and the pixel mask value of the remaining area may be set to 0.
  • the image of the target subject in the historical action frame can be separated according to the mask image.
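  • As a simple illustration (assuming OpenCV/NumPy and the 0/255 convention above), the subject image can be cut out of a historical action frame with its mask:

```python
import cv2

def extract_subject(hist_frame, mask):
    """mask: uint8, 255 inside the target subject, 0 elsewhere; the returned
    image keeps the subject's pixels and zeroes out the rest."""
    return cv2.bitwise_and(hist_frame, hist_frame, mask=mask)
```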
  • the target image area of each historical action frame can be processed through a deep learning algorithm to obtain a mask image of the target subject corresponding to each historical action frame, for example, through a neural network algorithm or a support vector machine algorithm, etc.
  • This application does not specifically limit the algorithm for image segmentation.
  • S03 The electronic device determines the reference position in the current frame according to the position of the target subject in the scene of the historical action frame and the scene of the current frame.
  • the electronic device may respectively map the reference positions of the N target subjects in the current frame based on the positions of the N target subjects in the scene of the N historical action frames in combination with the scene of the current frame.
  • Specifically, the electronic device can obtain the image mapping relationship between each historical action frame and the current frame according to the position of the background image in each historical action frame and the position of the background image in the current frame; then, according to the position of the image of the target subject in the historical action frame, the relative position of the image of the target subject in the target frame can be obtained, and the image of the target subject is fused into the current frame according to the determined relative position.
  • The relative position is used to indicate where, in the target frame, the image of the target subject taken from the historical action frame is located.
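  • One possible realization of this mapping step is classical feature-based image registration (the patent also mentions SLAM); the sketch below, assuming OpenCV, estimates a background homography between the historical frame and the current frame and projects the subject's position through it to obtain the reference position:

```python
import cv2
import numpy as np

def map_subject_position(hist_frame, current_frame, subject_xy):
    """Estimate a homography from background feature matches and use it to map
    the subject's (x, y) position in the historical frame into the current frame."""
    g1 = cv2.cvtColor(hist_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    ref = cv2.perspectiveTransform(np.float32([[subject_xy]]), H)[0][0]
    return int(ref[0]), int(ref[1])          # reference position in the current frame
```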
  • S04 The electronic device fuses the image of the target subject on the reference positions of the current frame respectively to obtain the target frame.
  • After the electronic device determines at least one historical action frame, it can use image fusion technology to draw the images of the multiple target subjects obtained in S02 above into the current frame, thereby generating the target frame.
  • For example, after the first action frame 01 and the second action frame 02 in the real-time video frame stream are determined, each frame image displayed in real time after the first action frame 01 is fused with the image of the first target subject in the first action frame 01. Taking the second action frame 02 as an example, the fusion display is shown in FIG. 5, that is, it includes the image (1) of the first target subject in the first action frame 01 together with the whole image of the second action frame 02.
  • Similarly, the current frame after the Nth action frame 0N is displayed, after fusion, as shown in FIG. 5, that is, it includes the image (1) of the first target subject in the first action frame 01, the image (2) of the second target subject in the second action frame 02, ..., up to the image (N) of the Nth target subject in the Nth action frame 0N.
  • N 5
  • N 5
  • the images (5) of the fifth target subject corresponding to the 5 action frame 05 are respectively fused and displayed at the corresponding reference positions.
  • The specific multi-frame image fusion process, that is, the algorithm, will be described in detail below and is not repeated here.
  • After the special effect video shooting ends, the electronic device may save the generated special effect video in the gallery.
  • To distinguish it from ordinary videos, a specific logo can be displayed in a corner of the special effect video thumbnail; for example, the words "motion track" can be superimposed on the play button of the special effect video, so that motion-track special effect video files are distinguished from ordinary video files and are easy for users to find.
  • In the above embodiment of this application, at least one key action frame is automatically detected or manually determined in the real-time video frame stream, and the image of at least one target subject in the at least one key action frame is displayed in the current frame through multi-frame fusion, so that a special effect image or video of the target subject's motion trajectory can be generated in real time.
  • At the same time, the currently generated target image can be transmitted in real time to the shooting preview screen and the video generation stream of the mobile phone, so that the user can preview the motion-trajectory effect online in real time, or view the complete motion-trajectory special effect video after shooting is completed, enriching the user's shooting experience.
  • In one embodiment, in the above step S01, if the user's first selection instruction includes an automatic shooting instruction, that is, it instructs the electronic device to enter the automatic shooting mode, the electronic device can automatically detect the moving target subject according to an algorithm and automatically detect at least one historical action frame (key action frame).
  • the electronic device can determine the target subject of the video frame in the real-time video stream according to the motion detection technology.
  • the motion detection of the target subject can be determined by portrait recognition or other target recognition technology, which can automatically detect moving objects in real-time video frames, such as people, animals, sports equipment, vehicles, or footballs. Since the main application scenario of the present application is the special effect shooting of the movement trajectory of a person, in the embodiment, person recognition and detection are taken as an example for introduction.
  • Specifically, to determine the target subject in the real-time video frames, the electronic device can perform image segmentation on the image, such as portrait segmentation or instance segmentation, to obtain mask images. If only one portrait mask is obtained, that portrait mask is determined to be the target subject; if multiple masks are obtained by segmentation, the electronic device can determine the mask with the largest area to be the target subject; if no portrait mask is obtained, the electronic device can display a prompt message on the preview interface indicating that no portrait was detected and ask the user to move the camera closer to the person being photographed.
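  • As a minimal illustration of the selection rule just described (single mask, largest mask, or no mask), the following hedged sketch assumes the portrait masks are available as a list of binary arrays; the function name is illustrative and not from the patent.

```python
import numpy as np

def select_target_mask(masks):
    """Pick the target-subject mask from the segmented portrait masks.

    masks: list of H x W binary masks (255 = portrait, 0 = background).
    Returns the chosen mask, or None if no portrait was detected
    (the caller would then prompt the user to move the camera closer).
    """
    if not masks:
        return None
    if len(masks) == 1:
        return masks[0]
    areas = [int((m > 0).sum()) for m in masks]   # mask with the largest area wins
    return masks[int(np.argmax(areas))]
```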
  • the electronic device can detect the scene position of the target subject in each video frame included in the real-time video stream, and obtain the scene position change of the target subject among multiple frames.
  • the scene position change of the target subject may be the position change of the target subject relative to the shooting scene, or the change of the limb posture, limb angle, or limb position of the target subject.
  • After determining the target subject, the electronic device determines, frame by frame during continuous shooting, which frames are key action frames.
  • The electronic device can determine the key action frames in the real-time video frames by the frame difference method.
  • The frame difference method obtains information such as the scene position change between adjacent video frames by comparing pixel positions in adjacent video frames. That is, the electronic device determines as key action frames those video frames, among the video frames included in the real-time video stream, in which the change of the target subject's position in the scene meets the preset threshold.
  • Since there is no reference frame before the first key action frame, the electronic device can determine the first frame in which the target subject is successfully segmented as the first key action frame. Alternatively, to allow for the latency of the image processing algorithm, the electronic device may determine the third or fourth frame after the first successfully segmented frame as the first key action frame.
  • The second and subsequent key action frames can each be determined by comparison with the previous key action frame.
  • Specifically, the electronic device can determine that a video frame in the real-time stream is a key action frame when the image of the target subject satisfies both of the following conditions:
  • Condition 1: the image location area of the target subject in the current frame does not overlap with the location area of the target subject in the previous key action frame mapped into the current frame.
  • Condition 2: the change between the image of the target subject in the current frame and the image of the target subject in the previous key action frame meets the preset threshold.
  • That is, through motion detection, the electronic device can automatically determine as historical action frames those video frames in which the change of the target subject's image meets the preset threshold and the image of the target subject in the current frame does not overlap with the image of the target subject in the previous key action frame.
  • When detection determines that the image change of the target subject in the current video frame meets the preset threshold, the frame is determined to be a key action frame (historical action frame). For example, when the image change of the target subject in the current video frame is greater than or equal to the preset threshold, the current video frame is determined to be a key action frame; when the image change is less than the preset threshold, the current video frame is determined not to be a key action frame.
  • For example, a center-of-gravity coincidence algorithm can be used to determine whether the change between the target subject image in the current frame and the target subject image in the previous key action frame meets the preset threshold.
  • The specific algorithm is as follows:
  • The electronic device calculates the center-of-gravity coordinates of the target subject mask image of the previous key action frame and the center-of-gravity coordinates of the target subject mask image of the current frame; after the two centers of gravity are made to coincide, it calculates the area of the non-overlapping region between the current frame's target subject mask image and the previous key action frame's target subject mask image.
  • The preset threshold may be configured as a certain proportion of the combined (union) area of the two target subject mask images, for example, 30%.
  • The preset threshold can be set in advance by those skilled in the art according to the image detection accuracy, the requirements of the special effect video, and engineering experience; this application does not specifically limit it.
  • The formula for calculating the center-of-gravity coordinates is as follows (the coordinates of the center of gravity can be rounded): x_0 = (1/n)·Σx_i, y_0 = (1/n)·Σy_i, where (x_i, y_i) are the coordinates of the n pixels in the target subject mask region.
  • The specific calculation for center-of-gravity coincidence can be as follows: if adding a coordinate offset (Δx, Δy) to the center-of-gravity coordinates of the target subject in the current frame makes them equal to the center-of-gravity coordinates of the target subject in the previous key action frame, then (Δx, Δy) is added to the coordinates of all pixels in the current frame's target subject area to obtain a new coordinate set for that area. The number of pixels whose coordinates differ between the target subject area of the previous key action frame and the new coordinate set of the current frame's target subject area is then determined. See the following formulas for the specific calculation.
  • The new coordinate set of the current frame's target subject area is:
  • new coordinates (x', y') = original coordinates (x, y) + (Δx, Δy),
  • where (Δx, Δy) = barycentric coordinates (x_0, y_0)_previous key action frame − barycentric coordinates (x_0, y_0)_current frame.
  • The non-overlap ratio is the area of the non-overlapping region divided by the area of the union of the two target subject mask images.
  • The formula for calculating the proportion of the non-overlapping area is as follows:
  • non-overlap ratio = (area(A ∪ B) − area(A ∩ B)) / area(A ∪ B),
  • where A is the target subject region of the previous key action frame and B is the target subject region of the current frame; A ∩ B denotes the intersection of the target subject region of the previous key action frame and that of the current frame, and A ∪ B denotes their union.
  • For example, if current frame 1 does not satisfy the above condition 1, current frame 1 is not a key action frame.
  • If current frame 2 does not meet the above condition 2, current frame 2 is not a key action frame.
  • If the non-overlapping area ratio exceeds the preset threshold, current frame 3 satisfies both condition 1 and condition 2 and is determined to be a key action frame.
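  • The following is a hedged sketch, not the patent's implementation, of how condition 1 and condition 2 could be checked with the center-of-gravity coincidence calculation above; the 30% threshold is the example value mentioned earlier, and all function and parameter names are assumptions.

```python
import numpy as np

def centroid(mask):
    """Rounded center-of-gravity (x0, y0) of a binary mask (255 = subject)."""
    ys, xs = np.nonzero(mask)
    return np.round([xs.mean(), ys.mean()]).astype(int)

def shift_mask(mask, dx, dy):
    """Translate a binary mask by (dx, dy); pixels shifted outside are dropped."""
    shifted = np.zeros_like(mask)
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    xs2, ys2 = xs + dx, ys + dy
    ok = (xs2 >= 0) & (xs2 < w) & (ys2 >= 0) & (ys2 < h)
    shifted[ys2[ok], xs2[ok]] = 255
    return shifted

def is_key_action_frame(cur_mask, prev_key_mask, prev_mask_in_cur, threshold=0.30):
    """Condition 1: the current subject region does not overlap the previous
    key-frame subject region mapped into the current frame.
    Condition 2: after aligning the two centers of gravity, the non-overlap
    ratio of the two mask images meets the preset threshold."""
    if np.any((cur_mask > 0) & (prev_mask_in_cur > 0)):
        return False                      # condition 1 violated
    dx, dy = centroid(prev_key_mask) - centroid(cur_mask)
    aligned = shift_mask(cur_mask, dx, dy)
    a, b = prev_key_mask > 0, aligned > 0
    union = np.logical_or(a, b).sum()
    inter = np.logical_and(a, b).sum()
    non_overlap_ratio = (union - inter) / max(union, 1)
    return bool(non_overlap_ratio >= threshold)
```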
  • Through the above embodiment, the electronic device can automatically detect the moving target object in the video in real time and automatically detect and determine the key action frames, so as to generate the motion-trajectory special effect video in real time according to the recorded key action frames of the target subject. This increases the fun and flexibility of video shooting and enhances the user's shooting experience.
  • In one embodiment, before performing image segmentation on a historical action frame, the moving target subject can be identified through motion detection technology, and the image area corresponding to the target subject in the historical action frame can then be narrowed down; that is, only the partial image area of the moving subject of interest in the historical action frame is cropped out for the image segmentation algorithm. Reducing the image area to be processed can improve the accuracy of image segmentation and reduce the complexity of the image segmentation algorithm.
  • the motion detection technology can be realized by the frame difference method, the background difference method, or the optical flow method.
  • For example, the three-frame difference method computes two difference images from three adjacent frames and then obtains a combined difference image from them, which can roughly detect the moving objects in the image.
  • For example, the image area of interest may first be narrowed down through motion detection, such as the portrait area in FIG. 7; portrait segmentation is then performed on the roughly obtained portrait area to obtain the mask image of the target subject.
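  • As an illustration of the frame-difference step described above, the following hedged OpenCV sketch roughly locates the moving region from three adjacent grayscale frames; the threshold and kernel size are example values, not from the patent.

```python
import cv2
import numpy as np

def motion_region(prev_gray, cur_gray, next_gray, thresh=25):
    """Roughly detect the moving-object region with the three-frame difference
    method: two difference images are combined to localize motion."""
    d1 = cv2.absdiff(cur_gray, prev_gray)
    d2 = cv2.absdiff(next_gray, cur_gray)
    _, b1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    moving = cv2.bitwise_and(b1, b2)                       # moving in both differences
    moving = cv2.dilate(moving, np.ones((9, 9), np.uint8)) # close small gaps
    if cv2.countNonZero(moving) == 0:
        return moving, None
    x, y, w, h = cv2.boundingRect(cv2.findNonZero(moving)) # reduced region of interest
    return moving, (x, y, w, h)
```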
  • In this way, the mask image of the target subject can be accurately separated from each historical action frame, and motion tracking and recording of the target subject can be realized, so that multi-frame image fusion can be performed on the current frame based on the mask image of at least one target subject, generating a motion-trajectory special effect video and enhancing the user's shooting experience.
  • the mask image of the segmented target subject may be incomplete or missing, as shown in FIG. 7.
  • the mask image of the target subject can be complemented with motion detection.
  • The specific process of completing the mask image of the target subject can be as follows: after the moving target subject is detected in the key action frame, an appropriate threshold is selected to separate out the image area of the target subject in the key action frame; this image area is then used to repair the segmented mask image of the target subject, so as to obtain a complete mask image of the target subject.
  • As shown in FIG. 8, mask image A of the target portrait is obtained by portrait segmentation, and mask image A is completed according to the target portrait in adjacent frames to obtain mask image B.
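  • Purely for illustration, the sketch below completes a segmentation mask with the motion-detected subject region; the patent only states that the motion-detected area is used to repair the mask, so the simple union used here is an assumption, as are the function and parameter names.

```python
import numpy as np

def repair_mask(seg_mask: np.ndarray, motion_mask: np.ndarray) -> np.ndarray:
    """Fill missing parts of the segmented mask (255/0) using the subject
    region obtained from motion detection (simple union as an example)."""
    repaired = seg_mask.copy()
    repaired[motion_mask > 0] = 255
    return repaired
```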
  • In other shooting scenarios, the real-time video frames may capture more than one moving subject, and the images of other subjects may overlap with the image of the target subject.
  • For example, the target subject is Portrait 1, and Portrait 1 and Portrait 2 partially overlap or block each other. The electronic device therefore needs to separate the mask image of the target subject from the mask images in which multiple subjects overlap, and continuously and automatically track and record the same target subject.
  • Specifically, the overlapping shooting subjects can be separated in the following manners.
  • The first manner is to segment the mask image in which multiple subjects overlap according to a depth map.
  • A depth map corresponding to the two-dimensional image can be used in combination: the electronic device obtains the mask image of the target subject according to the overlapping mask images of the multiple subjects in the historical action frame and the depth information corresponding to the multiple subjects. That is, the electronic device can separate the mask image of the target subject from the overlapping mask images of the multiple subjects according to the depth information of the multiple subjects and the depth information of the target subject in the historical action frame.
  • the depth map is an image or image channel that contains information about the distance between the shooting point and the surface of the target shooting object.
  • the depth map is similar to a grayscale image, except that each pixel value of the depth map reflects the actual distance between the shooting point and the target shooting object.
  • the RGB image and the depth map are registered, so there is a one-to-one correspondence between the pixels of the RGB image and the pixels of the depth map.
  • The depth map can be obtained from a distance-measuring camera based on Time of Flight (ToF), or the original two-dimensional image can be processed with an artificial neural network algorithm to obtain the depth value corresponding to each pixel and thus restore the depth map of the original two-dimensional image; the way the depth map is obtained is not specifically limited in this application.
  • Specifically, when the electronic device needs to distinguish the portrait of the target subject from multiple overlapping portraits, the pixels of the obtained depth map can be put into one-to-one correspondence with the pixels of the current key action frame, and the average or median of the depth values of the pixels in the mask area of the target subject's portrait is computed from the corresponding region of the depth map.
  • The electronic device then processes the depth map according to this average or median depth value of the target subject's portrait, extracts the depth value range covered by the subject's portrait in the depth map, and takes the intersection of this depth value range and the corresponding portrait mask, thereby separating the portrait mask of the target subject from the multiple overlapping portrait masks. This ensures that the separated portrait mask of the target subject always contains a single portrait.
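  • The following is a hedged sketch of the depth-based separation just described, assuming a depth map registered to the RGB frame; the relative depth margin of 15% and the function names are illustrative assumptions.

```python
import numpy as np

def separate_by_depth(overlap_mask, depth_map, prev_target_mask, margin=0.15):
    """Separate the target subject's mask from an overlapping portrait mask.

    overlap_mask:     H x W mask covering all overlapping portraits (255/0).
    depth_map:        H x W per-pixel distance to the camera, registered
                      one-to-one with the key action frame.
    prev_target_mask: target-subject mask from a nearby frame, used to
                      sample the subject's typical depth.
    margin:           tolerated relative deviation from the median depth.
    """
    ref_depth = np.median(depth_map[prev_target_mask > 0])
    in_range = np.abs(depth_map - ref_depth) <= margin * ref_depth
    # Intersection of the extracted depth range and the portrait mask.
    return np.where((overlap_mask > 0) & in_range, 255, 0).astype(np.uint8)
```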
  • The second manner is to separate the overlapping subjects through instance segmentation. Here, an instance refers to an object, and an object represents a specific instance of a type of shooting object.
  • Instance segmentation means that each pixel in the image is assigned to a corresponding category and, on the basis of this pixel-level classification, different instances within the same category are further distinguished. For example, dividing each pixel in the image into person and background is pixel-level classification; distinguishing different people among multiple people, such as A, B, and C, is instance segmentation.
  • the electronic device can perform instance segmentation through a deep learning algorithm.
  • the mask values of different portraits are different, and the portrait mask area of the target subject can be directly separated.
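  • Assuming the instance-segmentation result is a label map in which different portraits carry different values, a hedged sketch of keeping track of the same target subject could pick the instance that best overlaps the previously tracked target mask; this selection rule and the names below are assumptions rather than the patent's method.

```python
import numpy as np

def pick_target_instance(instance_map, prev_target_mask):
    """instance_map: H x W label map (0 = background, 1..K = portraits).
    Return the 255/0 mask of the instance most overlapping the tracked target."""
    best_label, best_overlap = 0, -1
    for label in np.unique(instance_map):
        if label == 0:
            continue
        overlap = int(np.logical_and(instance_map == label,
                                     prev_target_mask > 0).sum())
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return np.where(instance_map == best_label, 255, 0).astype(np.uint8)
```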
  • In addition, existing methods such as binocular stereo depth, monocular depth estimation, and structured-light depth can also be used to separate multiple overlapping masks, which are not described in detail here.
  • the electronic device can separate the target subject mask from multiple overlapping target subjects, thereby accurately tracking and recording the motion trajectory of the target subject in different frames, and generating a specific target subject's motion trajectory special effect video.
  • the electronic device determines the reference position in the current frame according to the position of the target subject in the scene of each historical action frame and the scene of the current frame, which may specifically include:
  • The electronic device can obtain the correspondence between the position of at least one object in each historical action frame and its position in the current frame by means of image registration technology or simultaneous localization and mapping technology. Then, according to the obtained correspondence and the image position of each target subject in its historical action frame determined above, the image position area corresponding to each target subject in the current frame, that is, the reference position, is obtained. The electronic device can therefore draw the image of each target subject corresponding to each historical action frame at its corresponding reference position in the current frame to obtain the target frame.
  • For example, the recorded historical action frames include the first action frame 01, and the target subject corresponding to the first action frame 01 is the first target subject. Then, for each subsequent frame, the image of the first target subject is drawn into the current frame 03 according to the mapping relationship between the position of at least one object in the first action frame 01 and the position of the same object in the current frame.
  • If the recorded historical action frames also include the second action frame 02, and the target subject corresponding to the second action frame 02 is the second target subject, then for every frame after the second action frame 02 is determined, the image of the first target subject and the image of the second target subject are drawn into the current frame 03 according to the mapping relationship between the position of at least one object in the first action frame 01 and its position in the current frame 03, and the mapping relationship between the position of at least one object in the second action frame 02 and its position in the current frame 03.
  • the rendering refers to the process of generating a two-dimensional image by a central processing unit (CPU) or a graphics processor (graphics processing unit, GPU) of an electronic device according to drawing instructions and pixel point information.
  • the target image can be displayed on the display screen of the electronic device through the display device.
  • the electronic device performs the above-mentioned fusion drawing processing on the determined key action frames one by one, and displays them in real time, so that the generated motion trajectory special effect video can be previewed online, and the final motion trajectory special effect video can be generated.
  • all historical action frames recorded in the process of real-time video frame streaming need to be mapped to the corresponding position of the current frame.
  • The specific mapping methods that can be used include image registration technology or simultaneous localization and mapping (SLAM) technology. The electronic device can therefore draw the image of the target subject in each historical action frame into the current frame according to the image mapping relationship between at least one historical action frame and the current frame. Specifically, the target image can be generated through the following processing.
  • Step 1: According to image registration technology or SLAM technology, obtain the correspondence between the image position of at least one object in each historical action frame and the image position of the same object in the current frame.
  • Image registration is the process of matching, mapping, or superimposing multiple images acquired at different times, with different imaging devices, or under different conditions (such as weather, brightness, camera position, or angle); it is widely used in data analysis, computer vision, and image processing.
  • For example, according to the position of at least one object in the first action frame and the position of the same object in the current frame, the electronic device can obtain the correspondence between the object's position in the first action frame and its position in the current frame, which can be called a mapping relationship. The electronic device can then obtain the reference position of the target subject in the current frame from the position of the target subject in the first action frame combined with this position correspondence. The position indicated by the dotted line in FIG. 10 may be the reference position.
  • When image registration technology is used, features need to be extracted from the historical action frame, for example Semantic Kernels Binarized (SKB) features. Feature matching is then performed, a homography matrix is calculated, and finally the historical key action frame is mapped to the corresponding position in the current frame according to the obtained homography matrix.
  • The SKB feature is a descriptor of image features. Image registration technology can achieve mapping and matching between two-dimensional images.
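  • For illustration, the hedged OpenCV sketch below estimates the homography between a historical action frame and the current frame through feature matching; ORB descriptors are used only as a widely available stand-in, since the SKB features named above are not part of common libraries, and all parameter values are assumptions.

```python
import cv2
import numpy as np

def estimate_homography(hist_gray, cur_gray):
    """Estimate the homography mapping a historical action frame onto the
    current frame via feature detection, matching, and RANSAC fitting."""
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(hist_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```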
  • SLAM technology is a technology that allows the device to gradually depict the three-dimensional location information of the surrounding environment while moving. Specifically, the device starts from an unknown location in an unknown environment, uses repeatedly observed map features (such as wall corners, pillars, etc.) to locate its own position and posture during the movement, and then builds a map incrementally based on its own location to achieve The purpose of synchronous positioning and map construction.
  • Since SLAM technology performs position mapping based on three-dimensional position information, it can be applied to three-dimensional motion between frames. Therefore, when the motion trajectory of the target subject captured by the electronic device involves three-dimensional motion, SLAM technology can be used for the mapping.
  • Step 2: According to the image position of each target subject in its historical action frame and the correspondence obtained in Step 1, obtain the reference position of each target subject in the current frame.
  • That is, the image of each target subject in each historical action frame is mapped to the corresponding image location area in the current frame.
  • Step 3: Draw the image of each target subject in each historical action frame at the reference position of that target subject in the current frame.
  • That is, the image of each target subject in each historical action frame is drawn at its corresponding reference position in the current frame, thereby obtaining a fused multi-frame image, which is updated and displayed as the current frame.
  • The first target subject in the first action frame 01 is mapped to the corresponding reference position in the second action frame 02 and drawn into the second action frame 02; the first target subject in the first action frame 01 is likewise mapped to the corresponding reference position in the current frame and drawn into the current frame.
  • Similarly, the second target subject in the second action frame 02 is mapped to the corresponding reference position in the current frame, drawn into the current frame, and the current frame is updated.
  • the image registration technology or SLAM technology is used to perform the mapping between multiple frames of images, thereby completing the fusion display of the target subject images in the multiple frames of images, so that the motion trajectory of the target subject can be displayed more accurately and naturally.
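  • A hedged sketch of Steps 2 and 3 is shown below: using the homography from the previous sketch, the target subject's image and mask from one historical action frame are warped to the reference position in the current frame and drawn there; the function names are illustrative.

```python
import cv2

def draw_subject_into_current(hist_frame, hist_mask, cur_frame, H):
    """Map one historical action frame's target subject into the current frame
    and draw it at the resulting reference position."""
    h, w = cur_frame.shape[:2]
    warped_img = cv2.warpPerspective(hist_frame, H, (w, h))
    warped_mask = cv2.warpPerspective(hist_mask, H, (w, h))
    out = cur_frame.copy()
    out[warped_mask > 127] = warped_img[warped_mask > 127]   # draw subject pixels
    return out, warped_mask
```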
  • In one embodiment, after all historical action frames have been mapped to their corresponding positions in the current frame using image registration technology or SLAM technology, and the mask image of the target subject in each historical action frame has been mapped to its corresponding position in the current frame accordingly, the method may further include performing edge fusion processing on the images of the target subjects from the historical action frames to update the target image, so that the transition between the added images of the target subjects and the background image of the current frame appears natural.
  • The above multi-frame image fusion merges images that do not belong to the current frame (the images of the target subjects from the historical action frames) into the current frame for display. Therefore, at the N reference positions of the current frame, the pixel information of the images of the N target subjects and that of the current frame's image needs to be further weighted and fused, so that the fused-in target subject images and the original current-frame image look natural and the boundary transitions appear more realistic.
  • The weighted fusion technique used may be alpha blending.
  • The specific processing can be as follows: given the edge mask value 255 of the target subject image and the edge mask value 0 of the background image, the mask values are adjusted from the original abrupt 255-to-0 step into a smooth 255-to-0 transition, for example by adjusting the transition mask values with a linear or non-linear function. The adjusted, smoothly transitioning mask values are then used as weights for a weighted superposition of the target subject image and the background image.
  • a Gaussian filtering method can also be used to process the edge area to weaken the boundary line.
  • Gaussian filtering is a linear smoothing filtering method in which the weights are chosen according to the shape of the Gaussian function.
  • image fusion technologies such as Poisson Blending technology and Laplacian Blending technology can also be used in the foregoing embodiments, and this application does not limit the specific image fusion technology.
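  • As a hedged illustration of the edge fusion described above, the sketch below softens the 255-to-0 step of the warped mask with a Gaussian blur and uses it as the alpha weight for blending; the blur kernel size is an example value, not from the patent.

```python
import cv2
import numpy as np

def blend_with_soft_edge(cur_frame, warped_subject, warped_mask, blur=21):
    """Alpha-blend the drawn subject into the current frame with a smoothed
    mask edge, so the subject-to-background transition looks natural."""
    alpha = cv2.GaussianBlur(warped_mask, (blur, blur), 0).astype(np.float32) / 255.0
    alpha = alpha[..., None]                          # broadcast over color channels
    blended = alpha * warped_subject.astype(np.float32) + \
              (1.0 - alpha) * cur_frame.astype(np.float32)
    return blended.astype(np.uint8)
```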
  • In one embodiment, after the images of multiple key action frames have been fused and displayed and the target image has been obtained, in order to show the movement trajectory of the target subject in the current frame more intuitively, the method may further include superimposing at least one shadow image behind the image of the target subject. A shadow image is generated based on the image of the target subject in several consecutive frames before the current frame.
  • At least one shadow image can be represented by a grayscale image, where the gray values of the shadow images may be the same or different.
  • For example, at least one shadow image may be superimposed behind the image of the second target subject in the second action frame 02, and multiple shadow images may be superimposed behind the movement direction of the target subject in the current frame 03.
  • The farther a shadow image is from the image of the target subject in the current frame 03, the lower its gray value, gradually decreasing toward 0.
  • By superimposing multiple shadow images behind the movement direction of the target subject in the current frame, the movement direction and trajectory of the target subject can be expressed more intuitively, which increases the fun and intuitiveness of the special effect video and further enhances the user's shooting experience.
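  • Purely as an illustration of the shadow images described above, the sketch below draws grayscale silhouettes of the subject from a few preceding frames, with the gray value decreasing for older silhouettes; the gray levels, blending weight, and function names are assumptions.

```python
import cv2
import numpy as np

def add_afterimages(cur_frame, trail_masks, H_list, max_gray=180, step=50):
    """Superimpose grayscale shadow images behind the subject's movement.

    trail_masks: subject masks from several frames just before the current
                 frame, newest first.
    H_list:      homographies mapping each of those frames to the current frame.
    The gray value decreases for older shadow images, fading toward 0."""
    out = cur_frame.copy()
    h, w = cur_frame.shape[:2]
    for i, (mask, H) in enumerate(zip(trail_masks, H_list)):
        gray = max(max_gray - i * step, 0)
        if gray == 0:
            break
        warped = cv2.warpPerspective(mask, H, (w, h))
        region = warped > 127
        # Blend a flat gray silhouette with what is already in the frame.
        out[region] = (0.5 * out[region] + 0.5 * gray).astype(np.uint8)
    return out
```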
  • The video frame stream is continuously updated, and the image output for the current frame is displayed on the video shooting preview screen of the electronic device. As shown in FIG. 12, after the user starts shooting the special effect video, the shooting effect of the special effect video can be seen in real time on the video shooting preview screen of the electronic device.
  • the real-time generated video frames can also be output to the final video generation stream. After the user completes the shooting, the generated complete motion track special effect video can be watched.
  • FIG. 13 shows a detailed implementation process for generating a motion-track special effect video provided by this embodiment of the present application.
  • The process mainly includes: 1. interaction on the shooting preview interface, determining the target subject and the key action frames; 2. image segmentation to obtain the image of the target subject; 3. mapping the key action frame to the current frame and drawing the image of the target subject in the key action frame into the current frame; 4. online preview and real-time generation of the video frame stream.
  • It should be noted that the processing procedures shown in FIG. 13 are neither exhaustive nor all mandatory; those skilled in the art can adjust and arrange the detailed processing procedures and their order according to design requirements. In addition, the above technical solution of the present application is not only suitable for generating motion-trajectory special effect videos, but can also be used to quickly develop other similar special effect videos, such as multi-portrait special effect synthesis or growth special effects; this application does not specifically limit this.
  • the device 1400 may include: an acquisition module 1401, an image segmentation module 1402, a mapping module 1403, and an image fusion module 1404.
  • the acquiring module 1401 is configured to acquire a current frame and N historical action frames, where the current frame and the N historical action frames both include a target subject, and the current frame and the N historical action frames The scenes are overlapped, the positions of the target subjects in the scenes of the N historical action frames are different, and N is a positive integer greater than or equal to 1.
  • the image segmentation module 1402 is configured to perform image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames.
  • the mapping module 1403 is configured to determine N reference positions in the current frame according to the positions of the N target subjects in the scene of the N historical action frames and the scene of the current frame, respectively.
  • the image fusion module 1404 is configured to fuse the images of the N target subjects on the N reference positions of the current frame, respectively, to obtain a target frame.
  • the device may further include: a receiving module, configured to receive a user's first selection instruction, where the first selection instruction is used to instruct to enter the automatic shooting mode or the manual shooting mode.
  • the acquisition module 1401 is specifically configured to: perform motion detection on the real-time video stream to determine the target subject; detect that the target subject is included in the real-time video stream The position of the scene in each video frame; it is determined that the video frame whose scene position change of the target subject in the video frame included in the real-time video stream satisfies a preset threshold is a historical action frame.
  • In a possible design, the receiving module is further configured to receive the user's second selection instruction on a video frame included in the real-time video stream; the acquisition module 1401 is further specifically configured to: determine that the subject at the position corresponding to the second selection instruction in the video frame is the target subject, and determine that the video frame is a historical action frame.
  • In a possible design, the image segmentation module 1402 is specifically configured to: narrow down the image area corresponding to the target subject in the historical action frame according to motion detection technology to obtain the target image area in the historical action frame; and process the image of the target image area through a deep learning algorithm to obtain the mask image of the target subject corresponding to the historical action frame.
  • the image segmentation module 1402 is also specifically used to: according to the depth information of multiple subjects in the historical action frame, from multiple subjects The mask image of the target subject is separated from the overlapped mask image.
  • In a possible design, the mapping module 1403 is specifically configured to: obtain the correspondence between the position of at least one object in the historical action frame and its position in the current frame according to image registration technology or simultaneous localization and mapping (SLAM) technology; and determine the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
  • the image fusion module 1404 is specifically configured to perform weighted fusion processing on the image of the N target subjects and the pixel information of the image in the current frame at the N reference positions of the current frame.
  • In a possible design, the image fusion module 1404 is further specifically configured to: add at least one grayscale image to the image of the target subject in the current frame to obtain the target frame, where the closer a grayscale image is to the image of the target subject in the current frame, the greater the gray value of that grayscale image.
  • For the specific execution process and embodiments of the device 1400, reference may be made to the steps performed by the electronic device in the above method embodiments and the related descriptions; for the technical problems solved and the technical effects achieved, reference may also be made to the content described in the previous embodiments, which is not repeated here.
  • The above device is presented in the form of functional modules divided in an integrated manner.
  • module herein may refer to a specific circuit, a processor and memory that executes one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-mentioned functions.
  • the device can take the form shown in Figure 15 below.
  • FIG. 15 is a schematic structural diagram showing an electronic device 1500 according to an exemplary embodiment.
  • the electronic device 1500 can be used to generate a special effect video of the motion track of a shooting subject according to the foregoing embodiment.
  • the electronic device 1500 may include at least one processor 1501, a communication line 1502, and a memory 1503.
  • The processor 1501 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present disclosure.
  • the communication line 1502 may include a path for transferring information between the above-mentioned components, such as a bus.
  • The memory 1503 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
  • the memory can exist independently, and is connected to the processor through a communication line 1502.
  • the memory can also be integrated with the processor.
  • the memory provided by the embodiments of the present disclosure may generally be non-volatile.
  • the memory 1503 is used to store and execute the computer-executable instructions involved in the solution of the present disclosure, and the processor 1501 controls the execution.
  • the processor 1501 is configured to execute computer-executable instructions stored in the memory 1503, so as to implement the method provided in the embodiment of the present disclosure.
  • the computer-executable instructions in the embodiments of the present disclosure may also be referred to as application program codes, which are not specifically limited in the embodiments of the present disclosure.
  • the processor 1501 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 15.
  • the electronic device 1500 may include multiple processors, such as the processor 1501 and the processor 1507 in FIG. 15. Each of these processors can be a single-CPU (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the electronic device 1500 may further include a communication interface 1504.
  • The communication interface 1504 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • The electronic device 1500 may further include an output device 1505 and an input device 1506.
  • the output device 1505 communicates with the processor 1501 and can display information in a variety of ways.
  • For example, the output device 1505 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • the input device 1506 communicates with the processor 1501, and can receive user input in a variety of ways.
  • the input device 1506 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • The electronic device 1500 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a structure similar to that in FIG. 15.
  • the embodiment of the present disclosure does not limit the type of the electronic device 1500.
  • the processor 1501 in FIG. 15 may invoke the computer-executable instructions stored in the memory 1503 to make the electronic device 1500 execute the method in the foregoing method embodiment.
  • the function/implementation process of the acquisition module 1401, the image segmentation module 1402, the mapping module 1403, and the image fusion module 1404 in FIG. 14 may be implemented by the processor 1501 in FIG. 15 calling the computer execution instructions stored in the memory 1503 .
  • a storage medium including instructions is also provided, for example, a memory 1503 including instructions, and the foregoing instructions may be executed by the processor 1501 of the electronic device 1500 to complete the foregoing method.
  • All or part of the above-mentioned embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • When implemented by a software program, they may be implemented wholly or partly in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer program instructions When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.

Abstract

This application provides an image processing method and device, relating to the field of multimedia processing technology, and used to solve the problem in the prior art that a special effect video of a target shooting object's motion trajectory cannot be generated in real time. The method includes: obtaining a current frame and N historical action frames, where the current frame and the N historical action frames all include a target subject, the scenes of the current frame and of the N historical action frames overlap, and the positions of the target subject in the scenes of the N historical action frames differ; performing image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames; determining N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and fusing the images of the N target subjects at the N reference positions of the current frame, respectively, to obtain a target frame.

Description

Image processing method and device
This application claims priority to the Chinese patent application No. 202010478673.3, filed with the China National Intellectual Property Administration on May 29, 2020 and entitled "Image processing method and device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of multimedia processing technology, and in particular to an image processing method and device.
Background
At present, more and more users choose to record their lives by taking photos or videos with the camera of a mobile electronic device such as a mobile phone. However, in the images or videos ordinarily captured by a camera, the motion trajectory of an object or a person cannot be shown intuitively within a single video frame; the interaction between portrait and background, or between portraits, is not rich enough and lacks interest.
The existing solution is to process the image data of already generated video frames and add the motion path of the target object to them, generating a special effect video. For example, in a football match video the actual motion trajectory of the football or of a player is shown; that is, the motion route of the football or player is visualized afterwards through image processing technology, for example by adding a motion route represented by a curve or a straight line, so as to generate a special effect video. However, this kind of solution can only be done in post-processing and cannot generate the special effect video in real time.
Summary
This application provides an image processing method and device, which solve the problem in the prior art that a special effect video of a target shooting object's motion trajectory cannot be generated in real time.
To achieve the above purpose, this application adopts the following technical solutions:
According to a first aspect, an image processing method is provided. The method includes: obtaining a current frame and N historical action frames, where the current frame and the N historical action frames all include a target subject, the scenes of the current frame and of the N historical action frames overlap, the positions of the target subject in the scenes of the N historical action frames differ, and N is a positive integer greater than or equal to 1; performing image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames; determining N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and fusing the images of the N target subjects at the N reference positions of the current frame, respectively, to obtain a target frame.
需要说明的是,当电子设备接收用户的开始拍摄指令后,电子设备通过镜头获取到实时视频流,实时视频流由时间上连续的帧序列构成,每一帧视频帧在当前时刻可以是当前帧。当电子设备通过下述的具体方法确定关键动作帧后,相对于确定关键动作帧之后时刻对应的当前帧,该关键动作帧可以称为历史动作帧。以实时拍摄的时间轴t为例,电子设备在t0时刻开始视频拍摄,电子设备将t1时刻对应的实时视频帧确定为关键动作帧(历史动作帧1),随后,电子设备又将t2时刻对应的实时视频帧确定为关键动作帧(历史动作帧2),则对于当前时刻t3对应的当前帧来说,获取的N个历史动作帧即为历史动作帧1和历史动作帧2。
上述技术方案中,电子设备通过在实时视频帧流中确定至少一个关键动作帧作为 历史动作帧,分割出至少一个历史动作帧中对应的至少一个目标主体的图像。其中,关键动作帧是指电子设备实时拍摄的视频帧流中,目标主体作出指定动作或者明显的关键动作时所对应的图像。再通过多帧融合显示的方法,根据多帧图像中物体的位置对应关系,将每个历史动作帧中的目标主体的图像同时显示在当前帧中。该技术方案主要的应用场景为对人像的分割和运动轨迹的融合显示,从而能够实时地生成拍摄的目标主体运动轨迹的特效图像或者特效视频,丰富用户的拍摄体验。
在一种可能的设计方式中,所述获取当前帧和N个历史动作帧之前,该方法还包括:接收用户的第一选择指令,第一选择指令用于指示进入自动拍摄模式或者手动拍摄模式。
上述可能的实现方式中,电子设备通过接收用户的选择指令确定自动拍摄模式或者手动拍摄模式。从而电子设备可以自动检测或者由用户手动确定出当前获取的视频帧流中的历史动作帧,根据多个历史动作帧融合出显示运动轨迹的特效视频效果,增加用户的拍摄乐趣。
在一种可能的设计方式中,若第一选择指令用于指示进入自动拍摄模式,则获取历史动作帧,具体包括:对实时视频流进行运动检测确定目标主体;检测目标主体在实时视频流包括的每个视频帧中场景的位置;确定目标主体在实时视频流包括的视频帧中场景的位置变化满足预设阈值的视频帧为历史动作帧。
上述可能的实现方式中,电子设备可以根据用户指示的自动拍摄指示,从实时视频帧流中自动检测出运动的目标主体,并根据运动的目标主体的图像变化确定符合预设条件的作为历史动作帧。从而自动根据确定的至少一个历史动作帧,实时进行融合显示更新到当前帧中,合成特效视频,丰富用户的拍摄体验。
在一种可能的设计方式中,若第一选择指令用于指示进入手动拍摄模式,则获取历史动作帧,具体包括:接收用户对实时视频流包括的视频帧的第二选择指令;确定第二选择指令在视频帧中对应位置的主体为目标主体,并确定该视频帧为历史动作帧。
上述可能的实现方式中,电子设备还可以通过与用户的实时交互,根据用户确定的当前视频帧流中的运动的目标主体,以及用户确定的至少一个历史动作帧,实时进行多帧图像的融合显示,更新到当前帧中合成特效视频,丰富用户的拍摄体验。
在一种可能的设计方式中,对历史动作帧进行图像分割,得到历史动作帧对应的目标主体的图像,具体包括:对历史动作帧根据运动检测技术缩小历史动作帧中对应目标主体的图像区域,得到历史动作帧中的目标图像区域;通过深度学习算法对目标图像区域的图像进行处理,得到历史动作帧对应的目标主体的掩码图像。
上述可能的实现方式中,电子设备可以根据历史动作帧进行图像分割得到目标主体的掩码mask图像,实现对多帧目标主体的运动跟踪与记录,从而根据至少一个目标主体的mask图像对当前帧进行多帧图像融合,生成运动轨迹的特效视频。另外,在进行图像分割之前,缩小图像分割的图像区域,可以提高图像分割的精度,并简化算法的复杂度。
在一种可能的设计方式中,若掩码图像中存在多个主体重叠的掩码图像,则该方法还包括:根据历史动作帧中所述多个主体的深度信息,从多个主体重叠的掩码图像中分离得到目标主体的掩码图像。
上述可能的实现方式中,当拍摄的目标主体的图像与其他主体图像存在重叠显示的问题时,可以根据历史动作帧中多个主体的深度信息与多人重叠的mask图像,分离得到目标主体的mask图像。除了上述的根据深度图像进行mask图像分割之外,还可以采用双目视觉深度、单目深度估计、结构光深度或者实例分割等技术实现对多人重叠的mask图像的分割。从多人重叠的mask图像中分割出目标主体的mask图像,提高图像处理的精度,使得生成的目标主体的运动轨迹特效视频更加真实、自然。
在一种可能的设计方式中,根据目标主体在历史动作帧的场景中位置以及当前帧的场景,在当前帧中确定出参考位置,具体包括:根据图像配准技术或者同步定位与建图SLAM技术,得到至少一个物体在历史动作帧中的位置与在当前帧中位置的对应关系;根据对应关系以及目标主体在历史动作帧中的位置,在当前帧中确定出目标主体的参考位置。
上述可能的实现方式中,通过图像配准技术或者同步定位与建图SLAM技术进行多帧图像的位置映射,根据多帧图像中不同物体的图像位置对应关系,从而确定出每一个历史动作帧中的目标主体的图像在当前帧中对应的参考位置,从而能够生成效果真实、自然的运动轨迹的特效视频,提升用户的拍摄体验。
在一种可能的设计方式中,将N个目标主体的图像分别融合在当前帧的N个参考位置上,具体包括:在当前帧的N个参考位置上,分别将N个目标主体的图像与当前帧中图像的像素信息进行加权融合处理。
上述可能的实现方式中,多个目标主体的图像进行融合显示后,还可以将目标主体的图像与当前帧中的背景图像等进行边缘融合处理,更新目标帧,使得融合显示的多个目标主体的图像与背景图像过渡自然。
在一种可能的设计方式中,将N个目标主体的图像分别融合在当前帧的N个参考位置上之后,该方法还包括:对当前帧中的目标主体的图像添加至少一个灰度图像得到目标帧,其中,若灰度图像与当前帧中的目标主体的图像之间的距离越近,则灰度图像的灰度值越大。
上述可能的实现方式中,通过在当前帧中目标主体的运动方向背后叠加多个留影图像,该留影图像可以通过灰度图像来显示,并且通过不同的灰度值来体现运动的轨迹,从而能够更加直观地表示出目标主体的运动方向和轨迹,增加特效视频的趣味性和直观性,进一步提升用户的拍摄体验。
第二方面,提供一种图像处理装置,该装置包括:获取模块,用于获取当前帧和N个历史动作帧,其中,当前帧和N个历史动作帧均包括目标主体,当前帧和N个历史动作帧的场景存在交叠,目标主体在N个历史动作帧中场景的位置不同,N为大于等于1的正整数;图像分割模块,用于对N个历史动作帧进行图像分割,得到N个历史动作帧分别对应的N个目标主体的图像;映射模块,用于根据N个目标主体分别在N个历史动作帧的场景中位置以及当前帧的场景,在当前帧中确定出N个参考位置;图像融合模块,用于将N个目标主体的图像分别融合在当前帧的N个参考位置上,得到目标帧。
在一种可能的设计方式中,该装置还包括:接收模块,用于接收用户的第一选择指令,第一选择指令用于指示进入自动拍摄模式或者手动拍摄模式。
在一种可能的设计方式中,若第一选择指令用于指示进入自动拍摄模式,则获取模块具体用于:对实时视频流进行运动检测确定目标主体;检测目标主体在实时视频流包括的每个视频帧中场景的位置;确定目标主体在实时视频流包括的视频帧中场景的位置变化满足预设阈值的视频帧为历史动作帧。
在一种可能的设计方式中,若第一选择指令用于指示进入手动拍摄模式,则接收模块还用于接收用户对实时视频流包括的视频帧的第二选择指令;获取模块具体还用于:确定第二选择指令在视频帧中对应位置的主体为目标主体,并确定视频帧为历史动作帧。
在一种可能的设计方式中,图像分割模块具体用于:根据运动检测技术缩小历史动作帧中对应目标主体的图像区域,得到历史动作帧中的目标图像区域;通过深度学习算法对目标图像区域的图像进行处理,得到历史动作帧对应的目标主体的掩码图像。
在一种可能的设计方式中,若掩码图像中存在多个主体重叠的掩码图像,则图像分割模块具体还用于:根据历史动作帧中多个主体的深度信息,从多个主体重叠的掩码图像中分离得到目标主体的掩码图像。
在一种可能的设计方式中,映射模块具体用于:根据图像配准技术或者同步定位与建图SLAM技术,得到至少一个物体在历史动作帧中的位置与在当前帧中位置的对应关系;根据对应关系以及目标主体在历史动作帧中的位置,在当前帧中确定出目标主体的参考位置。
在一种可能的设计方式中,图像融合模块具体用于:在当前帧的N个参考位置上,分别将N个目标主体的图像与当前帧中图像的像素信息进行加权融合处理。
在一种可能的设计方式中,图像融合模块具体还用于:对当前帧中的目标主体的图像添加至少一个灰度图像得到目标帧,其中,若灰度图像与当前帧中的目标主体的图像之间的距离越近,则灰度图像的灰度值越大。
第三方面,提供一种电子设备,其特征在于,该电子设备包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令,以实现如上述第一方面及第一方面中任一种可能的实施方式。
第四方面,提供一种计算机可读存储介质,当所述计算机存储介质中的指令由电子设备的处理器执行时,使得电子设备能够执行如上述第一方面及第一方面中任一种可能的实施方式。
第五方面,提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如上述第一方面及第一方面中任一种可能的实施方式。
可以理解地,上述提供的任一种图像处理装置、电子设备、计算机可读存储介质和计算机程序产品,均可以通过上文所提供的对应的方法来实现,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
Brief Description of the Drawings
FIG. 1A is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application;
FIG. 1B is a software system architecture diagram of an electronic device according to an embodiment of this application;
FIG. 1C is a schematic flowchart of an image processing method according to an embodiment of this application;
FIG. 2 is a schematic diagram of an interface for special effect video shooting on an electronic device according to an embodiment of this application;
FIG. 3 is a schematic diagram of another interface for special effect video shooting on an electronic device according to an embodiment of this application;
FIG. 4 is a schematic diagram of user interaction on a shooting preview interface according to an embodiment of this application;
FIG. 5 is a schematic flowchart of another image processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of an algorithm for determining that a current frame is a key action frame according to an embodiment of this application;
FIG. 7 is a schematic diagram of an image segmentation processing method according to an embodiment of this application;
FIG. 8 is a schematic diagram of completing a mask image according to an embodiment of this application;
FIG. 9A is a schematic diagram of separating overlapping portraits according to an embodiment of this application;
FIG. 9B is a schematic diagram of another way of separating overlapping portraits according to an embodiment of this application;
FIG. 10 is a schematic diagram of multi-frame image mapping according to an embodiment of this application;
FIG. 11 is a schematic flowchart of another image processing method according to an embodiment of this application;
FIG. 12 is a schematic flowchart of another image processing method according to an embodiment of this application;
FIG. 13 is a schematic flowchart of another image processing method according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of an image processing device according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of an electronic device according to an embodiment of this application.
具体实施方式
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
需要说明的是,本申请中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供一种图像处理方法和装置,可以应用于视频拍摄的场景中,能够基于实时拍摄的视频帧流,实时生成目标拍摄对象运动轨迹的特效视频或者特效图像。其中,运动轨迹特效可以用来记录目标拍摄对象在时间轴上曾经发生过的关键动作,或者曾经出现的所在位置,并将被记录的历史关键动作中的目标拍摄对象图像融合显示在当前帧中,并且与当前帧的背景图像、地面等融合在一起。用户在拍摄视频过程中即可在拍摄预览画面实时看到特效视频拍摄效果,形成交错时间和空间的独特用户体验,同时也可以实时生成特效视频。从而解决了现有技术中不能实时生成运动轨迹特效视频的问题,丰富了视频拍摄的趣味性,提升了用户的拍摄和观看体验。
本申请实施例提供的图像处理方法可以应用于具备拍摄能力和图像处理能力的电子设备,该电子设备可以为手机、平板电脑、桌面型、膝上型、手持计算机、笔记本电脑、车载设备、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实 (augmented reality,AR)\虚拟现实(virtual reality,VR)设备等,本公开实施例对该电子设备的具体形态不作特殊限制。
图1A示出了电子设备100的结构示意图。电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请的另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子 设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其 中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
在本申请实施例中,上述内部存储器121中可以存储有用于实现本申请方法实施例中步骤的计算机程序代码。上述处理器110可以运行存储器121中存储的本申请方法实施例中步骤的计算机程序代码。上述显示屏194可以用于显示相机的拍摄对象,以及本申请实施例中涉及的实时视频帧等。
电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图1B是本申请实施例的电子设备100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。如图1B所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。本申请实施例主要就是通过改进应用程序层的相机应用程序来实现的,例如通过对相机增加插件来扩展其功能。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图1B所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。在本申请实施例中,可以通过应用程序框架层对应用程序层的相机的程序进行改进,使得拍摄对象在拍摄时,可以在显示屏194中显示目标物体运动轨迹的特效图像或者特效视频,该特效图像或者特效视频是由电子设备后台通过实时的计算和处理合成的。
其中,窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的 消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层,也可以称为驱动层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合捕获拍照场景,示例性说明电子设备100软件以及硬件的工作流程。
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头193捕获静态图像或视频。
在本申请实施例中,用户使用电子设备拍摄视频的过程中,当通过摄像头193捕获到静态图像或者视频时,可以将捕获到的图像或者视频暂时存储于内容提供器中,当执行拍照操作或者视频拍摄操作时,可通过视图系统显示拍摄完成的照片或者视频,对于本申请的实施例,在显示图像之前,还需要经过将多帧图像进行融合处理后再通过视图系统逐帧显示在预览界面中。
在本申请实施例涉及到的上述硬件和软件的基础上,下面将结合附图,对本申请的实施例进行详细介绍。如图1C所示,该方法可以包括:
S01:电子设备获取当前帧和历史动作帧,当前帧和历史动作帧均包括目标主体。
首先,需要说明的是,本申请实施例应用的拍摄场景如下,用户需要打开电子设备的相机应用进行对目标主体进行视频拍摄,目标主体即为电子设备的拍摄对象,是 相对拍摄场景存在相对运动的目标主体,例如,目标主体可以为人物、动物或者运动的装置等。运动具体可以是指目标主体位置的移动、旋转、跳跃、肢体伸展或者指定动作等。电子设备的相机实时跟随其运动的目标主体进行拍摄,从而能够通过本申请提供的技术方法,在拍摄过程中根据实时视频流进行图像处理,实时生成运功轨迹的特效视频并可以实时预览。
其中,电子设备可以根据获取的实时视频流获取当前帧和N个历史动作帧,其中,N可以为大于或者等于1的正整数。实时视频流是指电子设备的相机实时拍摄获取的图像帧流,也可称为视频帧流,可以包括多个历史动作帧。根据实时视频流的实时获取的性质,可以将电子设备当前显示或者当前处理的帧称为当前帧。
在实时视频流中包括多张图像,动作帧就是指多张图像中,判断目标主体做出类似起舞、跳跃、转身或者肢体伸展等关键动作时,即将当前帧记录为关键动作帧,可以简称为动作帧。当前帧之前所确定的关键动作帧都可以称为历史动作帧。
目标主体是指电子设备的相机拍摄的一个或多个拍摄对象中,存在运动状态且被确定为运动目标主体的拍摄对象。确定目标主体的方式可以是电子设备自动检测确定的,也可以是由用户手动确定的。
因此,在一种实施方式中,电子设备获取当前帧和至少一个历史动作帧之前,该方法还包括:接收用户的第一选择指令,其中,该第一选择指令可以包括自动拍摄指示或者手动拍摄指示,分别用于指示电子设备进入自动拍摄模式或者手动拍摄模式。
其中,若第一选择指令用于指示电子设备进入自动拍摄模式,则电子设备可以自动检测目标拍摄对象,并自动检测关键动作帧生成运动轨迹的特效视频。若第一选择指令用于指示电子设备进入手动拍摄模式,则电子设备通过进一步接收用户的第二选择指令,也就是用户手动操作电子设备,确定目标拍摄对象,并且确定目标拍摄对象的指定拍摄动作帧的指令,即电子设备可以接收用户输入的至少一个第二选择指令。接下来,将结合附图详细说明应用的场景。
在一种实施方式中,用户的第一选择指令可以包括自动拍摄指示,用户可以通过操作电子设备确定自动拍摄特效视频,即开启自动拍摄模式。
示例性的,以电子设备是手机为例,用户可以通过触摸或者点击操作打开手机的相机应用,如图2所示,可以点击“特效视频拍摄”图标,切换到特效视频的拍摄界面。电子设备可以预配置特效视频拍摄的默认状态为自动拍摄,或者也可以由用户手动选择“自动拍摄”或者“手动拍摄”,即可以开始特效视频的拍摄并可以在预览界面实时查看目标拍摄图像。
进一步的,点击“特效视频拍摄”图标之后,电子设备的预览界面上方可以通过缩略图显示一个“典型运动轨迹特效视频”片段播放,用户可以点击进行查看,以便用户预先熟悉特效视频的拍摄操作方法和拍摄的效果等。
自动拍摄模式下,电子设备可以根据实时拍摄图像,根据运动物体检测技术或者帧差法等技术,自动检测出目标主体,并且确定出至少一个关键动作帧。具体的确定目标主体、确定至少一个历史动作帧以及确定历史动作帧中的目标主体的图像的方法,将在下文详细介绍,此处不再详述。
在另一种实施方式中,用户的第一选择指令可以包括手动拍摄指示,用户可以通 过操作电子设备确定手动拍摄特效视频,即开启手动拍摄模式,并根据用户输入的至少一个第二选择指令,确定至少一个第二选择指令对应的至少一个目标主体和至少一个关键动作帧。具体的,电子设备可以根据第二选择指令在视频帧中对应位置,确定出对应的目标主体,并确定该视频帧为关键动作帧。
示例性的,以电子设备是手机为例,用户可以通过触摸或者点击操作打开手机的相机应用,如图3所示的,可以点击“特效视频拍摄”图标,切换到特效视频的拍摄界面,再点击选择“手动拍摄”选项,即可以开始特效视频的拍摄并可以在预览界面实时查看目标拍摄图像。
进一步的,为了方便提示用户操作电子设备以确定目标主体以及关键动作帧,电子设备可以在接收到用户点击“手动拍摄”的操作后,在界面上显示提示信息“请点击选择主体人像”,以指示用户输入第二选择指令。当用户点击或者触摸电子设备的显示区域,选择一个目标主体之后,电子设备可以持续在界面上显示提示信息,如“请点击喜爱的动作”,提示用户通过触摸操作或者点击操作,继续输入至少一个第二选择指令,进一步确定多个关键动作帧。
在手动拍摄模式下,用户在预览视频帧流的过程中,可以根据提示信息或者主动点击预览画面中的某个人像或者物体确定为目标主体。在随后的持续视频帧流过程中,用户也可以点击预览画面确定多个关键动作帧。
另外,当用户手动确定目标主体后,在后续的拍摄过程中,当拍摄界面中出现不止一个主体时,用户也可以自由切换为其他目标主体。此时,电子设备可以在界面上显示提示信息,如“可选择点击切换主体”。示例性的,如图4所示,用户初始确定人像A为目标主体,后续又点击拍摄预览界面中的人像B选择为目标主体,用于后续生成该目标主体B的特效视频。
其中,历史动作帧(关键动作帧)中的目标主体的图像是指图像中显示目标主体的部分区域的图像,具体是指对历史动作帧进行一定图像分割或者抠图处理后,分割得到或者抠出的显示目标主体对应区域的图像。例如,如图2中所示,检测确定当前帧中除背景图像和静止不动的物体之外的、运动的目标主体的图像为人像。具体可以通过图像分割技术将关键动作帧中的目标主体的图像区分出来。
需要说明的是,电子设备获取的当前帧和多个历史动作帧的场景是存在交叠的,目标主体在多个历史动作帧中场景的位置不同。也就是说任意一个历史动作帧中都存在与当前帧中的拍摄场景交叠的部分,其中,拍摄场景可以指目标主体在视频帧中周围存在的拍摄物体,例如,树木、草坪或者建筑物等。
交叠是指任意一个历史动作帧中都存在与当前帧中场景相同的部分。示例性的,如图4中所示的,历史动作帧中的同一棵树木也显示在当前帧拍摄场景中相同或者不同的位置,历史动作帧中的建筑物也显示在当前帧拍摄场景中相同或者不同的位置;在历史动作帧中,目标主体A的位置在树木的左前方,在当前帧中,该目标主体A的位置移动到了建筑物的正前方。因此,本申请的实施例可以实现的前提是,确定的任意一个历史动作帧中都存在与当前帧中场景交叠的部分;如果一个历史动作帧的场景与当前帧没有任何存在交叠的场景或者物体,则电子设备无法根据历史动作帧与当前帧得到图像映射关系,从而不能进行多帧融合显示。
综上所述,当电子设备接收用户的开始拍摄指令后,电子设备通过镜头获取到实时视频流,该实时视频流中包括的每一帧视频帧在对应的时刻可以认为是当前帧。无论电子设备是通过上述自动检测的方式,还是在手动模式下根据用户指示的方式确定关键动作帧,相对于确定该关键动作帧之后的时刻所对应的当前帧,该关键动作帧都可以称为历史动作帧。结合图5所示的,以实时拍摄的时间轴t为例,电子设备在t0时刻开始视频拍摄,电子设备将t1时刻对应的实时视频帧确定为关键动作帧(第一动作帧01),随后,电子设备又将t2时刻对应的实时视频帧确定为关键动作帧(第二动作帧02),则对于当前时刻t3对应的当前帧来说,获取的N个历史动作帧即为第一动作帧01和第二动作帧02。
S02:电子设备对历史动作帧进行图像分割,得到历史动作帧对应的目标主体的图像。
在拍摄过程中,当电子设备每获取到一个历史动作帧时,为了能够根据历史动作帧得到每个历史动作帧中的目标主体的图像,电子设备可以逐个对历史动作帧进行图像分割,确定历史动作帧中的目标主体图像,具体可以为掩码图像。从而电子设备可以逐个记录实时视频流中包括的N个历史动作帧,以及N个历史动作帧对应的N个目标主体的图像。
其中,图像分割就是把原始图像分成若干个特定的或者具有独特性质的区域,并提取出感兴趣的目标对象的技术和过程。图像分割是由图像处理到图像识别和分析的关键步骤。具体的,基于原始图像中的人像进行图像分割的处理也可以称为人像分割技术,可以把原始图像中的人像部分提取出来。
掩码图像就是通过不同的掩码(mask)值,来标记图像中的特定目标区域,例如,用与背景图像不同的mask值标记目标主体的图像区域,以此来将目标主体的图像区域和其他的背景图像区域进行分离。示例性的,常见的掩码图像中,可以将目标主体图像区域的像素点mask值设置为255,其余区域的像素点mask值设置为0。从而可以根据掩码图像将历史动作帧中的目标主体的图像分离出来。
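为了直观说明掩码图像如何用于分离目标主体的图像,下面给出一个示意性的代码草图(假设使用numpy,函数名与变量名均为示例性命名,并非对本申请实现方式的限定):

```python
import numpy as np

def extract_subject(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """根据掩码图像从历史动作帧中分离出目标主体的图像。

    frame:H×W×3的图像;mask:H×W的掩码图像,目标主体区域的mask值为255,其余为0。
    返回仅保留目标主体像素、其余像素置0的图像。
    """
    subject = np.zeros_like(frame)
    subject[mask == 255] = frame[mask == 255]
    return subject
```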
示例性的,可以通过深度学习算法对每个历史动作帧的目标图像区域进行处理,得到每个历史动作帧对应的目标主体的掩码图像,例如,通过神经网络算法或者支持向量机算法等,本申请对实现图像分割的算法不作具体限定。
S03:电子设备根据目标主体在历史动作帧的场景中位置以及当前帧的场景,在当前帧中确定出参考位置。
电子设备可以逐个根据N个目标主体在N个历史动作帧的场景中位置,结合当前帧的场景,分别映射出所述N个目标主体在当前帧中的参考位置。
具体的,电子设备可以根据每个历史动作帧中背景图像的位置与当前帧中背景图像的位置,得到每个历史动作帧与当前帧的图像映射关系,从而根据历史动作帧中目标主体的图像位置结合上述映射关系,可以得到目标主体的图像在目标帧中的相对位置,根据确定的相对位置将目标主体的图像在当前帧中进行融合处理。其中,该相对位置用于表示历史动作帧中的目标主体的图像在目标帧中所处的位置。
S04:电子设备将目标主体的图像分别融合在当前帧的参考位置上,得到目标帧。
电子设备确定至少一个历史动作帧之后,可以通过图像融合技术,将上述S02得到的多个目标主体的图像绘制到当前帧中,融合生成目标帧。
示例性的,如图5所示的,确定实时视频帧流中的第一动作帧01以及第二动作帧02,第一动作帧01之后实时显示的每一帧图像都融合第一动作帧01中的第一目标主体的图像进行显示。以第二动作帧02为例,经过融合显示为如图5中所示的,即包括第一动作帧01中的第一目标主体的图像(1)和第二动作帧02中的全部图像。而确定第N动作帧0N之后的当前帧经过融合显示为如图5中的,即包括第一动作帧01中的第一目标主体的图像(1)、第二动作帧02中的第二目标主体的图像(2)……第N动作帧0N中的全部图像,如图中第N动作帧0N的第N目标主体的图像(N)。当N为5时,即表示在当前帧中将第一动作帧01对应的第一目标主体的图像(1)、第二动作帧02对应的第二目标主体的图像(2)……和第5动作帧05对应的第5目标主体的图像(5)分别在对应的参考位置进行融合显示。具体的多帧图像融合过程即算法将在下文中详细介绍,此处不再赘述。
进一步的,在特效视频拍摄结束后,电子设备可以将生成的特效视频保存到图库中。为了区别于普通视频,可在特效视频的缩略图一角显示特定的标志,例如,特效视频的播放按钮上面叠加“运动轨迹”四个字,以此来将运动轨迹的特效视频文件和普通的视频文件进行区分,方便用户查看。
上述本申请的实施方式,通过在实时视频帧流中自动检测或者手动确定至少一个关键动作帧,将至少一个关键动作帧中的至少一个目标主体的图像通过多帧融合显示的方法,同时显示在当前帧中,从而能够实时地生成目标主体运动轨迹的特效图像或视频。同时可以实时将当前生成的目标图像传送到手机的拍摄预览画面和视频生成流,使得用户既可以在线实时预览运动轨迹的效果,也可以在拍摄完成后查看完整的运动轨迹特效视频,丰富用户的拍摄体验。
在一种实施方式中,上述的步骤S01中,若用户的第一选择指令包括自动拍摄指令,也就是指示电子设备进入自动拍摄模式下,电子设备能够根据算法自动检测出运动的目标主体,以及自动检测出至少一个历史动作帧(关键动作帧)。
首先,电子设备可以根据运动检测技术对实时视频流中的视频帧确定出目标主体。目标主体的运动检测可以通过人像识别或者其他目标识别技术确定,能够自动检测出实时视频帧中的运动物体,例如,人、动物、运动装置、车辆或者足球等。由于本申请的主要应用场景为人物的运动轨迹特效拍摄,因此实施例中以人像识别和检测作为示例进行介绍。
具体的,电子设备确定实时视频帧中的目标主体,可以通过对图像进行图像分割,例如人像分割或者实例分割,得到目标主体的掩码图像。如果得到的掩码图像只有一个人像mask,那确定该人像mask为目标主体;如果分割得到多个掩码图像,则电子设备可以将mask面积最大的确定为目标主体;如果没有得到人像mask,则电子设备可以通过在预览界面显示提示信息,提示用户没有检测到人像,请用户移动摄像头靠近被拍摄者。
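针对上述"以mask面积最大者作为目标主体"的判断逻辑,下面给出一个示意性草图(假设人像分割已输出若干张二值掩码,函数名为示例性命名):

```python
from typing import List, Optional
import numpy as np

def pick_target_subject(person_masks: List[np.ndarray]) -> Optional[np.ndarray]:
    """从人像分割得到的多个掩码中选出mask面积最大者作为目标主体。

    person_masks:每个元素为H×W的二值掩码(目标区域mask值为255)。
    若列表为空则返回None,此时可在预览界面提示用户未检测到人像。
    """
    if not person_masks:
        return None
    areas = [int((m == 255).sum()) for m in person_masks]
    return person_masks[int(np.argmax(areas))]
```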
接着,电子设备可以检测该目标主体在实时视频流包括的每个视频帧中场景的位置,得到多帧之间目标主体的场景位置变化。目标主体的场景位置变化可以为目标主体相对于拍摄场景的位置变化,或者目标主体的肢体姿势、肢体角度或肢体位置变化等。
电子设备确定目标主体之后,在持续拍摄过程中,逐个确定哪些帧是关键动作帧。电子设备可以通过帧差法来确定实时视频帧中的关键动作帧,帧差法即是指通过对比相邻的视频帧中像素点位置得到相邻视频帧之间的场景位置变化等信息。也就是电子设备可以通过检测出目标主体在实时视频流包括的视频帧中场景的位置变化满足预设阈值的视频帧,确定为关键动作帧。
其中,由于第一个关键动作帧之前没有参考帧,因此,电子设备可以将成功分割出目标主体的第一帧图像确定为第一个关键动作帧。或者,考虑到图像处理算法存在一定时延,电子设备也可以将成功分割出目标主体的第一帧图像之后的第三帧或者第四帧确定为第一个关键动作帧。
第二个及后续的关键动作帧,都可以与前一个关键动作帧做比较进行确定。具体的,电子设备可以通过确定实时视频帧中的目标主体的图像同时满足以下两个条件的为关键动作帧:
条件一:当前帧中目标主体的图像位置区域与前一个关键动作帧中目标主体的图像映射到当前帧中的位置区域没有重合。
条件二:当前帧中目标主体的图像与前一个关键动作帧中目标主体的图像变化满足预设阈值。
也就是说,电子设备通过运动检测,可以自动将实时视频流中目标主体的图像变化满足预设阈值、且目标主体的图像与前一个关键动作帧中目标主体的图像没有重合的视频帧,确定为历史动作帧。
当检测确定当前视频帧中的目标主体的图像变化满足预设阈值,则确定为关键动作帧(历史动作帧)。例如,当检测确定当前视频帧中的目标主体的图像变化大于或者等于预设阈值,则确定当前视频帧为关键动作帧;当检测确定当前视频帧中的目标主体的图像变化小于预设阈值,则确定当前视频帧不是关键动作帧。
示例性的,可以通过重心重合算法,确定当前帧中目标主体图像与前一个关键动作帧中目标主体图像的变化是否满足预设阈值。具体算法如下:
电子设备通过计算前一个关键动作帧目标主体掩码图像的重心坐标,以及当前帧目标主体掩码图像的重心坐标,将两者重心重合后,计算当前帧目标主体掩码图像与前一个关键动作帧目标主体掩码图像的非重叠区域面积。当非重叠区域面积超出预设阈值则将该当前帧确定为关键动作帧,否则确定当前帧不是关键动作帧。其中,预设阈值可以配置为两个目标主体掩码图像取并集后面积的一定比例,例如30%。
需要说明的是,预设阈值的设置可以由本领域技术人员根据图像检测精度,结合特效视频的需求和技术经验进行预先设定,本申请对此不做具体限定。
其中,计算重心坐标的公式如下(重心坐标可以取整):
重心坐标:$x_0=\frac{1}{n}\sum_{i=1}^{n}x_i$,$y_0=\frac{1}{n}\sum_{i=1}^{n}y_i$,
其中,$(x_i,y_i)$为目标主体掩码区域内第$i$个像素点的坐标,$n$为该区域内像素点的总数。
重心重合的具体计算方法可以为:如将当前帧目标主体的重心坐标加上坐标偏移(Δx,Δy)后与前一关键动作帧目标主体的重心坐标相等,则将当前帧目标主体区域内所有像素点的坐标加上(Δx,Δy)后,得到新的当前帧目标主体区域的坐标集,然后判断前一个关键动作帧中目标主体区域坐标集与新的当前帧中目标主体区域坐标集中坐标不相等的像素点数量。具体计算参见如下公式。
新的当前帧目标主体区域的坐标集为:
新坐标$(x',y')$=原坐标$(x,y)+(\Delta x,\Delta y)$,
其中,$(\Delta x,\Delta y)=(x_0,y_0)_{\text{前一个关键动作帧}}-(x_0,y_0)_{\text{当前帧}}$。
重心重合后,计算当前帧目标主体掩码图像与前一个关键动作帧目标主体掩码图像的非重叠区域比例,也就是计算两者的非重叠区域面积相对于两个目标主体掩码图像取并集后面积的占比。非重叠区域比例计算公式如下:
$$\text{非重叠区域比例}=\frac{S(A\cup B)-S(A\cap B)}{S(A\cup B)}$$
其中,$A$表示前一个关键动作帧中目标主体的区域,$B$表示重心重合后当前帧中目标主体的区域,$S(\cdot)$表示区域面积;$A\cap B$表示前一个关键动作帧中目标主体的区域与当前帧中目标主体的区域的交集,$A\cup B$表示两者的并集。
结合图6所示,当前一个关键动作帧中的目标主体区域与当前帧1中的目标主体区域重叠,则不满足上述的条件一,当前帧1不是关键动作帧。当前一个关键动作帧中的目标主体重心与当前帧2中的目标主体重心重合后,非重叠区域比例不满足预设阈值,则不满足上述的条件二,当前帧2不是关键动作帧。当前一个关键动作帧中的目标主体区域与当前帧3中的目标主体区域不重叠,且前一个关键动作帧中的目标主体重心与当前帧3中的目标主体重心重合后,非重叠区域比例超过预设阈值,则当前帧3同时满足上述的条件一和条件二,确定当前帧3为关键动作帧。
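结合上述条件一、条件二以及重心重合与非重叠区域比例的计算,下面给出一个示意性的判定草图(假设前一关键动作帧的掩码已映射到当前帧坐标系,30%阈值仅为示例值):

```python
import numpy as np

def centroid(mask: np.ndarray) -> tuple:
    """计算掩码(mask值为255)区域的重心坐标,结果取整。"""
    ys, xs = np.nonzero(mask == 255)
    return int(round(xs.mean())), int(round(ys.mean()))

def is_key_action_frame(prev_mask: np.ndarray, cur_mask: np.ndarray,
                        ratio_thresh: float = 0.3) -> bool:
    """判断当前帧是否为关键动作帧。

    prev_mask:前一关键动作帧中目标主体映射到当前帧坐标系后的掩码;
    cur_mask:当前帧中目标主体的掩码;ratio_thresh:非重叠区域比例的预设阈值。
    """
    # 条件一:当前帧目标主体区域与前一关键动作帧目标主体区域不得重合
    if np.any((prev_mask == 255) & (cur_mask == 255)):
        return False

    # 条件二:将当前帧掩码平移使两重心重合,再计算非重叠区域比例
    px, py = centroid(prev_mask)
    cx, cy = centroid(cur_mask)
    dx, dy = px - cx, py - cy
    h, w = cur_mask.shape
    shifted = np.zeros_like(cur_mask)
    ys, xs = np.nonzero(cur_mask == 255)
    xs2, ys2 = xs + dx, ys + dy
    valid = (xs2 >= 0) & (xs2 < w) & (ys2 >= 0) & (ys2 < h)
    shifted[ys2[valid], xs2[valid]] = 255

    a, b = prev_mask == 255, shifted == 255
    union = np.logical_or(a, b).sum()
    non_overlap = union - np.logical_and(a, b).sum()
    return union > 0 and non_overlap / union > ratio_thresh
```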
上述的实施方式中,通过上述算法,电子设备可以实时、自动地检测到视频中的目标运动物体,并自动检测确定关键动作帧,从而能根据记录的关键动作帧中目标主体实时生成运动轨迹的特效视频,增加视频拍摄的趣味性和灵活性,提升用户的拍摄体验。
在一种实施方式中,对历史动作帧进行图像分割之前,可以先通过运动检测技术识别出运动的目标主体,然后缩小历史动作帧中对应的目标主体的图像区域,也就是只截取历史动作帧中感兴趣的运动主体的部分图像区域来进行图像分割算法的处理。由此缩小进行图像分割处理的图像区域,可以提高图像分割的精度,简化图像分割算法的数据处理复杂度。
其中,运动检测技术可以通过帧差法、背景差法或者光流法等实现。比如帧差法,是通过对相邻的三帧图像两两作差,再通过两个差值图像得到相邻帧的差分图像,就可以大致将图像中的运动物体检测出来。
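以三帧帧差法为例,下面给出一个示意性草图(假设使用OpenCV,阈值与形态学核大小均为示例值):

```python
import cv2
import numpy as np

def motion_region_by_frame_diff(f1, f2, f3, thresh: int = 25) -> np.ndarray:
    """对相邻三帧两两作差,再对两个差值图像取与,粗略检测出运动物体区域。"""
    g1, g2, g3 = (cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f1, f2, f3))
    d12 = cv2.absdiff(g2, g1)
    d23 = cv2.absdiff(g3, g2)
    _, b12 = cv2.threshold(d12, thresh, 255, cv2.THRESH_BINARY)
    _, b23 = cv2.threshold(d23, thresh, 255, cv2.THRESH_BINARY)
    motion = cv2.bitwise_and(b12, b23)
    # 形态学闭运算去噪并连通区域,得到感兴趣的运动图像区域
    return cv2.morphologyEx(motion, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
```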
示例性的,如图7所示,可以先通过运动检测缩小感兴趣的图像区域,例如图7中的人像区域。再根据大致得到的人像区域进行人像分割,得到目标主体的mask图像。
通过上述的实施方式,可以从历史动作帧中准确分离出目标主体的mask图像,实现对目标主体的运动跟踪与记录,从而根据至少一个目标主体的mask图像对当前帧进行多帧图像融合,生成运动轨迹的特效视频,提升用户的拍摄体验。
在上述的实施方式中,对关键动作帧进行图像分割的处理过程中,可能造成分割出来的目标主体的mask图像不完整或者有缺失,如图7所示。为了得到完整的目标主体的mask图像,可以结合运动检测补全目标主体的mask图像。
补全目标主体mask图像的具体处理过程可以为:在检测出关键动作帧中运动的目标主体后,通过选择合适的阈值将关键动作帧图像中的目标主体的图像区域分离出来;再利用此目标主体的图像区域对分割的目标主体的mask图像进行修复,从而得到完整的目标主体的mask图像。示例性的,如图8所示的,根据人像分割得到目标人像的mask图像A,根据相邻帧中的该目标人像对所述mask图像A进行补全,得到mask图像B。
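对于上述利用运动检测结果修复分割掩码的过程,下面给出一种假设性的实现思路草图(取分割掩码与运动检测区域的并集并做形态学闭运算,仅为一种可能做法):

```python
import cv2
import numpy as np

def repair_subject_mask(seg_mask: np.ndarray, motion_region: np.ndarray) -> np.ndarray:
    """用运动检测得到的目标主体区域修补人像分割掩码中的缺失部分。

    seg_mask/motion_region:H×W二值图,255表示目标区域。
    """
    repaired = cv2.bitwise_or(seg_mask, motion_region)
    # 闭运算填补小孔洞,使目标主体的mask图像更完整
    kernel = np.ones((7, 7), np.uint8)
    return cv2.morphologyEx(repaired, cv2.MORPH_CLOSE, kernel)
```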
在一种实施方式中,实时视频帧拍摄的对象可能不止一个运动主体,且多个目标拍摄对象可能会与目标主体的图像相互重叠,例如,目标主体为人像1,在关键动作帧中,人像1与人像2存在部分重叠或者相互遮挡的情况。因此,电子设备需要从多个主体重叠的掩码图像中分离出目标主体的掩码图像,并持续地自动对同一个目标主体进行跟踪记录。具体的,可以通过如下方式分割重叠的目标拍摄对象。
方式一、根据深度图分割多个主体重叠的掩码图像。
可以结合二维图像对应的深度图,电子设备根据历史动作帧中多个主体重叠的掩码图像与多个主体对应的深度信息,得到目标主体的掩码图像。也就是电子设备可以根据历史动作帧中的多个主体的深度信息和目标主体的深度信息,从所述多个主体重叠的掩码图像中分离得到目标主体的掩码图像。
其中,深度图,是包含拍摄点与目标拍摄物体的表面距离有关的信息的图像或图像通道。深度图类似于灰度图像,只是深度图的每个像素值反映的是拍摄点距离目标拍摄物体的实际距离。通常RGB图像和深度图是配准的,因而RGB图像的像素点和深度图的像素点之间具有一一对应的关系。
深度图具体可以根据基于飞行时间(Time of Flight,ToF)的测距相机得到,或者可以对原始二维图像通过人工神经网络算法进行计算,得到每个像素点对应的深度值,还原得到原始二维图像的深度图,本申请对此不做具体限定。
通过对深度图进行处理,可以将多个不同目标拍摄对象进行区分。示例性的,如图9A所示,电子设备需要将多个重叠的人像区分出目标主体的人像,可以将得到的深度图的像素点与当前的关键动作帧的像素点一一对应,统计出深度图中对应的目标主体人像mask区域像素点的深度值的平均值或者中值。电子设备根据目标主体人像深度值的平均值或者中值对深度图进行处理,提取出主体人像在深度图中覆盖的深度值范围,然后将此深度值范围与对应的人像mask取交集,从而在多个重叠的人像mask中分离出目标主体的人像mask。保证分离出的目标主体的人像mask始终是单一的人像。
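对于方式一,可用如下示意性草图理解其处理思路(假设深度图已与关键动作帧逐像素对齐,深度容差为示例性假设的参数):

```python
import numpy as np

def separate_by_depth(portrait_mask: np.ndarray, rough_subject_mask: np.ndarray,
                      depth: np.ndarray, depth_tolerance: float = 0.3) -> np.ndarray:
    """根据深度信息从多人重叠的人像掩码中分离出目标主体的掩码。

    portrait_mask:多个主体重叠的人像掩码(255表示人像区域);
    rough_subject_mask:目标主体的大致区域(例如由跟踪得到,仅用于统计深度);
    depth:与图像逐像素对应的深度图;depth_tolerance:围绕深度中值允许的范围。
    """
    d_med = np.median(depth[rough_subject_mask == 255])  # 目标主体区域深度的中值
    in_range = np.abs(depth - d_med) <= depth_tolerance   # 主体人像覆盖的深度值范围
    return np.where((portrait_mask == 255) & in_range, 255, 0).astype(np.uint8)
```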
方式二、实例分割重叠的目标拍摄对象。
其中,实例是指对象,对象代表了一类拍摄对象中的一个特定的实例。
实例分割是指在对图像中的每个像素都划分出对应的类别、即实现像素级别分类的基础上,还需在具体的类别基础上区分开不同的实例。例如,先根据图像中的每个像素划分出人和背景物体,再从多个人(例如甲、乙和丙)中区分开不同的人,即是进行实例分割。
具体的,电子设备可以通过深度学习算法进行实例分割。可参照图9B,实例分割mask中,不同人像的mask数值不相同,可直接分离出目标主体的人像mask区域。
需要说明的是,除了采用上述技术来分离多人重叠mask之外,现有的双目视觉深度、单目深度估计、结构光深度等方法也可以用于分离多人重叠mask,本申请对此不再赘述。
通过上述的实施方式,电子设备可以对多个重叠的目标拍摄对象分离出目标主体mask,从而准确对不同帧的目标主体进行运动轨迹的跟踪和记录,生成特定的目标主体的运动轨迹特效视频。
在一种实施方式中,上述的步骤S03中,电子设备根据目标主体在每个历史动作帧的场景中位置以及当前帧的场景,在当前帧中确定出参考位置,具体可以包括:
电子设备可以根据图像配准技术或者同步定位与建图技术,得到至少一个物体在每个历史动作帧中的位置与在当前帧中位置的对应关系;再根据每个历史动作帧中每个目标主体的图像位置以及上述确定的对应关系,得到当前帧中每个目标主体对应的图像位置区域,也即参考位置。从而电子设备可以将每个历史动作帧对应的每个目标主体的图像绘制到当前帧中对应的每个参考位置,即可得到目标帧。
示例性的,以下将结合图5,以历史动作帧包括第一动作帧01和第二动作帧02为例对此进行介绍。
如图5所示,若记录的历史动作帧包括第一动作帧01,且第一动作帧01对应的目标主体为第一目标主体,则后续每一帧图像都根据第一动作帧01中至少一个物体的位置与当前帧中至少一个物体的位置的映射关系,将第一目标主体的图像绘制到当前帧03中。
如图5所示,若记录的历史动作帧中还包括第二动作帧02,第二动作帧02对应的目标主体为第二目标主体,则当确定第二动作帧02之后的后续每一帧图像都根据第一动作帧01中至少一个物体的位置与当前帧03中至少一个物体的位置的映射关系,以及第二动作帧02中至少一个物体的位置与当前帧03中至少一个物体的位置的映射关系,将第一目标主体的图像和第二目标主体的图像绘制到当前帧03中。
其中,所述绘制是指电子设备的中央处理器(Central Processing Unit,CPU)或者图形处理器(Graphics Processing Unit,GPU)根据绘制指令以及像素点信息等生成二维图像的过程。电子设备完成图像绘制之后,即可通过显示器件将目标图像显示在电子设备的显示屏上。
根据上述记载的实施方式,电子设备逐个对确定的关键动作帧做上述融合绘制处理,并进行实时显示,即可在线预览生成的运动轨迹特效视频,并生成最终的运动轨迹特效视频。
在上述实施方式中,在实时视频帧流过程中所记录的所有历史动作帧都需要映射到当前帧的相应位置,具体可采用的映射方法有图像配准技术或者同步定位与建图技术(Simultaneous Localization And Mapping,SLAM)。从而,电子设备可以根据至少一个历史动作帧与当前帧的图像映射关系,将每个历史动作帧中的目标主体的图像绘制到当前帧中,具体的,可以通过如下处理生成目标图像。
Step1:根据图像配准技术或者SLAM技术,得到每个历史动作帧中至少一个物体的图像位置与当前帧中至少一个物体的图像位置的对应关系。
其中,图像配准就是将不同时间、不同成像设备或不同条件下(如天候、亮度、摄像位置或角度等)获取的多张图像进行匹配、映射或者叠加的过程,可以广泛地应用于数据分析、计算机视觉和图像处理等领域。
如图10所示,电子设备可以根据第一动作帧中至少一个物体的位置,和当前帧中该相同物体的位置,得到第一动作帧中物体位置与当前帧中物体位置的对应关系,也可称为映射关系。则电子设备可以根据第一动作帧中目标主体的位置,结合该位置对应关系,得到该目标主体在当前帧中的参考位置,如图10中的虚线示意的位置可以为参考位置。
采用图像配准技术时,需要提取出历史动作帧中的特征,比如可以为语义内核二值化(Semantic Kernels Binarized,SKB)特征。再进行特征匹配并计算出单应性矩阵,最后根据得到的单应性矩阵将历史关键动作帧映射到当前帧中的对应位置。其中,SKB特征是一种图像特征的描述算子。图像配准技术可以实现二维图像之间的映射匹配。
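以图像配准为例,下面给出一个示意性草图说明"特征提取—特征匹配—计算单应性矩阵"的流程(文中提到的SKB特征此处以OpenCV中的ORB特征代替,仅作流程示意,并非本申请限定的特征类型):

```python
import cv2
import numpy as np

def homography_to_current(action_frame: np.ndarray, current_frame: np.ndarray) -> np.ndarray:
    """提取并匹配特征点,计算把历史动作帧映射到当前帧的3x3单应性矩阵。"""
    g1 = cv2.cvtColor(action_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```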
SLAM技术是一种可以让设备一边移动一边逐步描绘出周围环境三维位置信息的技术。具体的,设备从未知环境的未知地点出发,在运动过程中通过重复观测到的地图特征(比如,墙角,柱子等)定位自身位置和姿态,再根据自身位置增量式的构建地图,从而达到同步定位和地图构建的目的。
采用SLAM技术时,需要通过电子设备中的SLAM模块计算得到历史动作帧中物体的三维位置信息,根据物体的三维位置信息将历史动作帧映射到当前帧中的相应位置。
由于SLAM技术是基于三维位置信息进行位置映射的,而三维位置信息可适用于帧间三维运动。因此,当电子设备拍摄的目标主体的运动轨迹涉及三维运动时,可以采用SLAM技术进行映射。
Step2:根据每个历史动作帧中每个目标主体的图像位置与对应关系,得到每个目标主体在当前帧中的参考位置。
也就是将每个历史动作帧中每个目标主体的图像映射到当前帧中的相应的图像位置区域。
Step3:将每个历史动作帧中的每个目标主体的图像绘制到当前帧中每个目标主体在当前帧对应的参考位置。
根据上述映射得到的每个目标主体的图像在当前帧中的参考位置,将每个目标主体的图像绘制到当前帧中相应的参考位置,从而得到多帧图像的融合图像,更新显示为当前帧。
示例性的,如图5所示,将第一动作帧01中的第一目标主体映射到第二动作帧02中相应的参考位置,并绘制到第二动作帧02中;将第一动作帧01中的第一目标主体映射到当前帧中的相应的参考位置,并绘制到当前帧中,同时将第二动作帧02中的第二目标主体映射到当前帧中的相应的参考位置,并绘制到当前帧中,更新当前帧。
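上述"映射并绘制"的过程可以用如下示意性草图理解(假设已通过图像配准得到历史动作帧到当前帧的单应性矩阵H,仅为一种可能实现):

```python
import cv2
import numpy as np

def draw_subject_on_current(action_frame, subject_mask, current_frame, H):
    """把历史动作帧中的目标主体按单应性矩阵H映射并绘制到当前帧的参考位置。

    action_frame/current_frame:H×W×3图像;subject_mask:目标主体掩码(255/0);
    H:历史动作帧到当前帧的3x3单应性矩阵。
    """
    h, w = current_frame.shape[:2]
    warped_subject = cv2.warpPerspective(action_frame, H, (w, h))
    warped_mask = cv2.warpPerspective(subject_mask, H, (w, h))
    out = current_frame.copy()
    region = warped_mask > 127   # 映射后的参考位置区域
    out[region] = warped_subject[region]
    return out, warped_mask
```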
上述的实施方式,通过图像配准技术或者SLAM技术进行多帧图像之间的映射,从而完成多帧图像中的目标主体图像的融合显示,使得目标主体的运动轨迹能够较准确、自然地显示在同一帧图像中的相应位置,从而形成交错时间、空间的运动轨迹特效视频,丰富用户的拍摄体验。
在一种实施方式中,把所有的历史动作帧都用图像配准技术或者SLAM技术映射到当前帧的相应位置,并结合每个历史动作帧中的目标主体的mask图像,将各个目标主体的图像映射到当前帧的相应位置之后,为了使添加的目标主体的图像与当前帧的背景图像的显示过渡更加自然,该方法还可以包括:将目标图像中的每个历史动作帧的目标主体的图像进行边缘融合处理,更新目标图像,使得目标主体的图像和背景图像过渡自然。
其中,上述的多帧图像的融合处理就是把本不属于当前帧的图像(历史动作帧中的目标主体的图像)融合显示到当前帧中;因此,需要进一步在当前帧的N个参考位置上,分别将N个目标主体的图像与当前帧中图像的像素信息进行加权融合处理,从而使得融合添加进来的目标主体的图像与当前帧原有的图像显示自然,边界过渡更加真实。
示例性的,采用的加权融合技术可以为alpha融合。具体处理过程可以为,根据目标主体图像的边缘mask值255,背景图像的边缘mask值0,将mask值由原始的255~0的垂直过渡,调整为255~0的平缓过渡,例如,可以通过线性或者非线性函数调整过渡的mask值。再把调整后的平缓过渡的mask值作为权重对目标主体的图像和背景图像做加权叠加。可选的,也可以采用高斯滤波方法对边缘区域处理,弱化边界线。其中,高斯滤波是根据高斯函数的形状来选择权值的非线性平滑滤波方式。
除了alpha融合技术外,泊松融合(Poisson Blending)技术、拉普拉斯融合(Laplacian Blending)技术等图像融合技术也可以用于上述实施方式,本申请对具体的图像融合技术不作限定。
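以alpha融合为例,可以把掩码边缘由255到0的突变调整为平缓过渡,并以其作为权重对目标主体图像与当前帧图像做加权叠加,一个示意性草图如下(高斯模糊核大小为示例值):

```python
import cv2
import numpy as np

def alpha_blend_edges(composited, current_frame, warped_mask, ksize: int = 15):
    """对融合后的目标主体边缘做平缓过渡,使其与当前帧背景衔接自然。

    composited:已绘制目标主体的当前帧;current_frame:原当前帧;
    warped_mask:映射到当前帧坐标系下的目标主体掩码(255/0)。
    """
    # 将255~0的垂直过渡调整为平缓过渡,归一化后作为逐像素权重alpha∈[0,1]
    alpha = cv2.GaussianBlur(warped_mask.astype(np.float32) / 255.0, (ksize, ksize), 0)
    alpha = alpha[..., None]  # 扩展维度以便与三通道图像广播
    blended = alpha * composited.astype(np.float32) + (1.0 - alpha) * current_frame.astype(np.float32)
    return blended.astype(np.uint8)
```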
在一种实施方式中,对多帧关键动作帧的图像进行融合显示,得到目标图像之后,为了更加直观显示出当前帧中的目标主体的运动轨迹,该方法还可以包括:对当前帧中目标主体的图像叠加至少一个留影图像。该留影图像是根据当前帧之前连续若干帧的目标主体的图像生成的。
具体的,至少一个留影图像可以用灰度图像来表示,其中,每个留影图像的灰度值可以一样,也可以不一样。
示例性的,如图11所示,可以在第二动作帧02中的第二目标主体图像的背后,叠加至少一个留影图像,并且,在当前帧03中的目标主体的运动方向背后叠加多个留影图像。留影图像距离当前帧03中目标主体的图像越远,留影图像的强度可以越弱;留影图像距离当前帧03中目标主体的图像越近,则留影图像的强度可以越强。留影图像可以随着距离当前帧03中目标主体图像逐渐变远,其强度逐渐减弱到0为止。
其中,本申请对留影图像的个数不做限定,本领域技术人员可以根据设计需要进行设置。
当留影图像用灰度图像表示的时候,至少一个灰度图像与当前帧中的目标主体的图像之间的距离越近,则该灰度图像的灰度值越大;至少一个灰度图像与当前帧中的目标主体的图像之间的距离越远,则该灰度图像的灰度值越小。
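留影图像的叠加过程可以用如下示意性草图理解(假设此前若干帧的目标主体图像及掩码均已映射到当前帧坐标系,强度上限为示例值):

```python
import numpy as np

def add_afterimages(target_frame, past_subjects, past_masks, max_strength: float = 0.6):
    """在目标帧上叠加若干留影图像,距离当前目标主体越远(越早的帧)强度越弱。

    past_subjects/past_masks:按时间由早到晚排列的目标主体图像及其掩码列表。
    """
    out = target_frame.astype(np.float32)
    n = len(past_subjects)
    for i, (subj, mask) in enumerate(zip(past_subjects, past_masks)):
        strength = max_strength * (i + 1) / n   # 越早的帧,叠加强度越弱
        region = mask == 255
        out[region] = (1 - strength) * out[region] + strength * subj.astype(np.float32)[region]
    return out.astype(np.uint8)
```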
上述的实施方式,通过在当前帧中目标主体的运动方向背后叠加多个留影图像,能够更加直观地表示出目标主体的运动方向和轨迹,增加特效视频的趣味性和直观性,进一步提升用户的拍摄体验。
根据上述的任一种实施方式,在实时将记录的所有历史动作帧中的目标主体的图像映射到当前帧的图像中之后,视频帧流持续更新,并将当前帧输出的图像显示到电子设备的视频拍摄预览画面。如图12所示,用户在开始拍摄特效视频后,同时能够在电子设备的视频拍摄预览画面中实时看到特效视频的拍摄效果。另外,也可以将实时生成的视频帧输出到最终的视频生成流中,在用户完成拍摄之后,即可观看生成的完整的运动轨迹特效视频。
结合上述的任一种可能的实施方式,如图13所示,为本申请实施例提供的一种生成运动轨迹特效视频的详细实施流程。该流程主要包括:1、拍摄预览界面交互、确定目标主体和关键动作帧;2、图像分割得到目标主体的图像;3、关键动作帧映射到当前帧,并将关键动作帧中的目标主体的图像绘制到当前帧;4、在线预览和实时生成视频帧流。
其中,图13中所示的处理流程中,并不是全部的处理流程,也不都是必选的处理流程,本领域技术人员可以根据设计需要,对详细的处理过程和顺序进行调整和设置。同时,本申请的上述技术方案不仅适用于生成运动轨迹的特效视频,还可以用于快速开发其他的类似特效视频,例如,多人像特效合成或者成长特效等,本申请对此不做具体限制。
本申请实施例还提供一种图像处理装置,如图14所示,该装置1400可以包括:获取模块1401、图像分割模块1402、映射模块1403和图像融合模块1404。
其中,获取模块1401,用于获取当前帧和N个历史动作帧,其中,所述当前帧和所述N个历史动作帧均包括目标主体,所述当前帧和所述N个历史动作帧的场景存在交叠,所述目标主体在所述N个历史动作帧中场景的位置不同,N为大于或者等于1的正整数。
图像分割模块1402,用于对所述N个历史动作帧进行图像分割,得到所述N个历史动作帧分别对应的N个目标主体的图像。
映射模块1403,用于根据所述N个目标主体分别在所述N个历史动作帧的场景中位置以及所述当前帧的场景,在所述当前帧中确定出N个参考位置。
图像融合模块1404,用于将所述N个目标主体的图像分别融合在所述当前帧的N个参考位置上,得到目标帧。
在一种可能的设计方式中,该装置还可以包括:接收模块,用于接收用户的第一选择指令,第一选择指令用于指示进入自动拍摄模式或者手动拍摄模式。
在一种可能的设计方式中,若第一选择指令用于指示进入自动拍摄模式,则获取模块1401具体用于:对实时视频流进行运动检测确定目标主体;检测目标主体在实时视频流包括的每个视频帧中场景的位置;确定目标主体在实时视频流包括的视频帧中场景的位置变化满足预设阈值的视频帧为历史动作帧。
在一种可能的设计方式中,若第一选择指令用于指示进入手动拍摄模式,则接收模块还用于接收用户对实时视频流包括的视频帧的第二选择指令;获取模块1401具体还用于:确定第二选择指令在视频帧中对应位置的主体为目标主体,并确定视频帧为历史动作帧。
在一种可能的设计方式中,图像分割模块1402具体用于:根据运动检测技术缩小历史动作帧中对应目标主体的图像区域,得到历史动作帧中的目标图像区域;通过深度学习算法对目标图像区域的图像进行处理,得到历史动作帧对应的目标主体的掩码图像。
在一种可能的设计方式中,若掩码图像中存在多个主体重叠的掩码图像,则图像分割模块1402具体还用于:根据历史动作帧中多个主体的深度信息,从多个主体重叠的掩码图像中分离得到目标主体的掩码图像。
在一种可能的设计方式中,映射模块1403具体用于:根据图像配准技术或者同步定位与建图SLAM技术,得到至少一个物体在历史动作帧中的位置与在当前帧中位置的对应关系;根据对应关系以及目标主体在历史动作帧中的位置,在当前帧中确定出目标主体的参考位置。
在一种可能的设计方式中,图像融合模块1404具体用于:在当前帧的N个参考位置上,分别将N个目标主体的图像与当前帧中图像的像素信息进行加权融合处理。
在一种可能的设计方式中,图像融合模块1404具体还用于:对当前帧中的目标主体的图像添加至少一个灰度图像得到目标帧,其中,若灰度图像与当前帧中的目标主体的图像之间的距离越近,则灰度图像的灰度值越大。
此外,该装置1400具体的执行过程和实施例可以参照上述方法实施例中电子设备执行的步骤和相关的描述,所解决的技术问题和带来的技术效果也可以参照前述实施例所述的内容,此处不再一一赘述。
在本实施例中,该图像处理装置以采用集成的方式划分各个功能模块的形式来呈现。这里的"模块"可以指特定电路、执行一个或多个软件或固件程序的处理器和存储器、集成逻辑电路、和/或其他可以提供上述功能的器件。在一个简单的实施例中,本领域的技术人员可以想到该装置可以采用如下图15所示的形式。
图15是根据一示例性实施例示出的一种电子设备1500的结构示意图,该电子设备1500可以用于根据上述实施方式生成拍摄主体的运动轨迹特效视频。如图15所示,该电子设备1500可以包括至少一个处理器1501,通信线路1502以及存储器1503。
处理器1501可以是一个通用中央处理器(central processing unit,CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本公开方案程序执行的集成电路。
通信线路1502可包括一通路,在上述组件之间传送信息,例如总线。
存储器1503可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信线路1502与处理器相连接。存储器也可以和处理器集成在一起。本公开实施例提供的存储器通常可以具有非易失性。其中,存储器1503用于存储执行本公开方案所涉及的计算机执行指令,并由处理器1501来控制执行。处理器1501用于执行存储器1503中存储的计算机执行指令,从而实现本公开实施例提供的方法。
可选的,本公开实施例中的计算机执行指令也可以称之为应用程序代码,本公开实施例对此不作具体限定。
在具体实现中,作为一种实施例,处理器1501可以包括一个或多个CPU,例如图15中的CPU0和CPU1。
在具体实现中,作为一种实施例,电子设备1500可以包括多个处理器,例如图15中的处理器1501和处理器1507。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,电子设备1500还可以包括通信接口1504。通信接口1504,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如以太网接口,无线接入网接口(radio access network,RAN),无线局域网接口(wireless local area networks,WLAN)等。
在具体实现中,作为一种实施例,电子设备1500还可以包括输出设备1505和输入设备1506。输出设备1505和处理器1501通信,可以以多种方式来显示信息。例如,输出设备1505可以是液晶显示器(liquid crystal display,LCD),发光二极管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等。输入设备1506和处理器1501通信,可以以多种方式接收用户的输入。例如,输入设备1506可以是鼠标、键盘、触摸屏设备或传感设备等。
在具体实现中,电子设备1500可以是台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备或有图15中类似结构的设备。本公开实施例不限定电子设备1500的类型。
在一些实施例中,图15中的处理器1501可以通过调用存储器1503中存储的计算机执行指令,使得电子设备1500执行上述方法实施例中的方法。
示例性的,图14中的获取模块1401、图像分割模块1402、映射模块1403和图像融合模块1404的功能/实现过程可以通过图15中的处理器1501调用存储器1503中存储的计算机执行指令来实现。
在示例性实施例中,还提供了一种包括指令的存储介质,例如包括指令的存储器1503,上述指令可由电子设备1500的处理器1501执行以完成上述方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其它实施方案。本申请旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
最后应说明的是:以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (21)

  1. 一种图像处理方法,其特征在于,所述方法包括:
    获取当前帧和N个历史动作帧,其中,所述当前帧和所述N个历史动作帧均包括目标主体,所述当前帧和所述N个历史动作帧的场景存在交叠,所述目标主体在所述N个历史动作帧中场景的位置不同,N为大于或者等于1的正整数;
    对所述N个历史动作帧进行图像分割,得到所述N个历史动作帧分别对应的N个目标主体的图像;
    根据所述N个目标主体分别在所述N个历史动作帧的场景中位置以及所述当前帧的场景,在所述当前帧中确定出N个参考位置;
    将所述N个目标主体的图像分别融合在所述当前帧的N个参考位置上,得到目标帧。
  2. 根据权利要求1所述的方法,其特征在于,所述获取当前帧和N个历史动作帧之前,所述方法还包括:
    接收用户的第一选择指令,所述第一选择指令用于指示进入自动拍摄模式或者手动拍摄模式。
  3. 根据权利要求2所述的方法,其特征在于,若所述第一选择指令用于指示进入所述自动拍摄模式,则获取所述历史动作帧,具体包括:
    对实时视频流进行运动检测确定所述目标主体;
    检测所述目标主体在所述实时视频流包括的每个视频帧中场景的位置;
    确定所述目标主体在所述实时视频流包括的视频帧中场景的位置变化满足预设阈值的视频帧为所述历史动作帧。
  4. 根据权利要求2所述的方法,其特征在于,若所述第一选择指令用于指示进入所述手动拍摄模式,则获取所述历史动作帧,具体包括:
    接收用户对实时视频流包括的视频帧的第二选择指令;
    确定所述第二选择指令在所述视频帧中对应位置的主体为所述目标主体,并确定所述视频帧为所述历史动作帧。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,对所述历史动作帧进行图像分割,得到所述历史动作帧对应的目标主体的图像,具体包括:
    根据运动检测技术缩小所述历史动作帧中对应目标主体的图像区域,得到所述历史动作帧中的目标图像区域;
    通过深度学习算法对所述目标图像区域的图像进行处理,得到所述历史动作帧对应的目标主体的掩码图像。
  6. 根据权利要求5所述的方法,其特征在于,若所述掩码图像中存在多个主体重叠的掩码图像,则所述方法还包括:
    根据所述历史动作帧中所述多个主体的深度信息,从所述多个主体重叠的掩码图像中分离得到所述目标主体的掩码图像。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,根据所述目标主体在所述历史动作帧的场景中位置以及所述当前帧的场景,在所述当前帧中确定出参考位置,具体包括:
    根据图像配准技术或者同步定位与建图SLAM技术,得到至少一个物体在所述历史动作帧中的位置与在所述当前帧中位置的对应关系;
    根据所述对应关系以及所述目标主体在所述历史动作帧中的位置,在所述当前帧中确定出所述目标主体的参考位置。
  8. 根据权利要求1-7任一项所述的方法,其特征在于,所述将所述N个目标主体的图像分别融合在所述当前帧的N个参考位置上,具体包括:
    在所述当前帧的N个参考位置上,分别将所述N个目标主体的图像与所述当前帧中图像的像素信息进行加权融合处理。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述将所述N个目标主体的图像分别融合在所述当前帧的N个参考位置上之后,所述方法还包括:
    对所述当前帧中的目标主体的图像添加至少一个灰度图像得到所述目标帧,其中,若所述灰度图像与所述当前帧中的目标主体的图像之间的距离越近,则所述灰度图像的灰度值越大。
  10. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取当前帧和N个历史动作帧,其中,所述当前帧和所述N个历史动作帧均包括目标主体,所述当前帧和所述N个历史动作帧的场景存在交叠,所述目标主体在所述N个历史动作帧中场景的位置不同,N为大于或者等于1的正整数;
    图像分割模块,用于对所述N个历史动作帧进行图像分割,得到所述N个历史动作帧分别对应的N个目标主体的图像;
    映射模块,用于根据所述N个目标主体分别在所述N个历史动作帧的场景中位置以及所述当前帧的场景,在所述当前帧中确定出N个参考位置;
    图像融合模块,用于将所述N个目标主体的图像分别融合在所述当前帧的N个参考位置上,得到目标帧。
  11. 根据权利要求10所述的装置,其特征在于,所述装置还包括:
    接收模块,用于接收用户的第一选择指令,所述第一选择指令用于指示进入自动拍摄模式或者手动拍摄模式。
  12. 根据权利要求11所述的装置,其特征在于,若所述第一选择指令用于指示进入所述自动拍摄模式,则所述获取模块具体用于:
    对实时视频流进行运动检测确定所述目标主体;
    检测所述目标主体在所述实时视频流包括的每个视频帧中场景的位置;
    确定所述目标主体在所述实时视频流包括的视频帧中场景的位置变化满足预设阈值的视频帧为所述历史动作帧。
  13. 根据权利要求11所述的装置,其特征在于,若所述第一选择指令用于指示进入所述手动拍摄模式,则所述接收模块还用于接收用户对实时视频流包括的视频帧的第二选择指令;
    所述获取模块具体还用于:确定所述第二选择指令在所述视频帧中对应位置的主体为所述目标主体,并确定所述视频帧为所述历史动作帧。
  14. 根据权利要求10-13任一项所述的装置,其特征在于,所述图像分割模块具体用于:
    根据运动检测技术缩小所述历史动作帧中对应目标主体的图像区域,得到所述历史动作帧中的目标图像区域;
    通过深度学习算法对所述目标图像区域的图像进行处理,得到所述历史动作帧对应的目标主体的掩码图像。
  15. 根据权利要求14所述的装置,其特征在于,若所述掩码图像中存在多个主体重叠的掩码图像,则所述图像分割模块具体还用于:
    根据所述历史动作帧中所述多个主体的深度信息,从所述多个主体重叠的掩码图像中分离得到所述目标主体的掩码图像。
  16. 根据权利要求10-15任一项所述的装置,其特征在于,所述映射模块具体用于:
    根据图像配准技术或者同步定位与建图SLAM技术,得到至少一个物体在所述历史动作帧中的位置与在所述当前帧中位置的对应关系;
    根据所述对应关系以及所述目标主体在所述历史动作帧中的位置,在所述当前帧中确定出所述目标主体的参考位置。
  17. 根据权利要求10-16任一项所述的装置,其特征在于,所述图像融合模块具体用于:
    在所述当前帧的N个参考位置上,分别将所述N个目标主体的图像与所述当前帧中图像的像素信息进行加权融合处理。
  18. 根据权利要求10-17任一项所述的装置,其特征在于,所述图像融合模块具体还用于:
    对所述当前帧中的目标主体的图像添加至少一个灰度图像得到所述目标帧,其中,若所述灰度图像与所述当前帧中的目标主体的图像之间的距离越近,则所述灰度图像的灰度值越大。
  19. 一种电子设备,其特征在于,包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令,以实现如权利要求1至9中任一项所述的图像处理方法。
  20. 一种计算机可读存储介质,当所述计算机可读存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行如权利要求1至9中任一项所述的图像处理方法。
  21. 一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1至9中任一项所述的图像处理方法。
PCT/CN2021/079103 2020-05-29 2021-03-04 一种图像处理方法及装置 WO2021238325A1 (zh)
