CN113810587A - Image processing method and device - Google Patents
Image processing method and device
- Publication number
- CN113810587A (application number CN202010478673.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- frame
- target
- current frame
- historical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N23/80—Camera processing pipelines; Components thereof
- G06T7/20—Analysis of motion
- G06T7/254—Analysis of motion involving subtraction of images
- G06T7/269—Analysis of motion using gradient-based methods
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/70—Determining position or orientation of objects or cameras
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
- G06T2207/10016—Video; Image sequence
- G06T2207/20221—Image fusion; Image merging
- G06T2207/20224—Image subtraction
- G06T2207/30196—Human being; Person
- G06T2207/30241—Trajectory
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Studio Devices (AREA)
- Image Analysis (AREA)
Abstract
The application provides an image processing method and device, relates to the technical field of multimedia processing, and is used for solving the problem that, in the prior art, a motion-trajectory special-effect video of a target shooting object cannot be generated in real time. The method comprises the following steps: acquiring a current frame and N historical action frames, wherein the current frame and the N historical action frames each comprise a target subject, the scenes of the current frame and the N historical action frames overlap, and the positions of the target subject in the N historical action frames are different; performing image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames; determining N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and fusing the images of the N target subjects onto the N reference positions of the current frame respectively to obtain a target frame.
Description
Technical Field
The present application relates to the field of multimedia processing technologies, and in particular, to an image processing method and apparatus.
Background
At present, more and more users choose to take photos or videos with the cameras of mobile electronic devices such as mobile phones to record their lives. However, in the images or videos generally taken with such cameras, the motion trajectory of an object or a person cannot be visually presented within a single video frame, and the interaction between portraits and the background, or between portraits, is not rich and lacks interest.
An existing solution is to post-process the image data of already generated video frames, add the motion path of the target object to the processed image data, and thereby generate a special-effect video. For example, the actual motion trajectory of a football or a player is displayed in a football match video; that is, the motion route of the football or the player is visually presented through image processing in post-production, for example by overlaying a curve or a straight line representing the motion route, so that a special-effect video is generated. However, this scheme only works as post-processing and cannot generate the special-effect video in real time.
Disclosure of Invention
The application provides an image processing method and device, which solve the problem that, in the prior art, a motion-trajectory special-effect video of a target shooting object cannot be generated in real time.
In order to achieve the above purpose, the following technical solutions are adopted in the present application:
In a first aspect, an image processing method is provided, which includes: acquiring a current frame and N historical action frames, wherein the current frame and the N historical action frames each comprise a target subject, the scenes of the current frame and the N historical action frames overlap, the positions of the target subject in the scenes of the N historical action frames are different, and N is a positive integer greater than or equal to 1; performing image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames; determining N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and fusing the images of the N target subjects onto the N reference positions of the current frame respectively to obtain a target frame.
It should be noted that, after the electronic device receives a shooting start instruction from a user, the electronic device acquires a real-time video stream through its lens. The real-time video stream consists of a temporally continuous sequence of frames, and each frame of the video stream may be the current frame at the corresponding time. When the electronic device determines a key action frame by a specific method described below, that key action frame may be referred to as a historical action frame with respect to any current frame corresponding to a time after its determination. Taking a real-time shooting time axis t as an example, the electronic device starts video shooting at time t0, determines the real-time video frame corresponding to time t1 as a key action frame (historical action frame 1), and then determines the real-time video frame corresponding to time t2 as a key action frame (historical action frame 2); thus, for the current frame corresponding to the current time t3, the N acquired historical action frames are historical action frame 1 and historical action frame 2.
In the above technical solution, the electronic device determines at least one key action frame in the real-time video frame stream as a historical action frame and segments the image of the corresponding target subject from each historical action frame. A key action frame refers to an image corresponding to the target subject performing a specified action or an obvious key action in the video frame stream captured by the electronic device in real time. The images of the target subject in the historical action frames are then displayed simultaneously in the current frame by multi-frame fusion display, according to the positional correspondence of objects across the multiple frames. The main application scenario of this technical solution is portrait segmentation and fused display of the motion trajectory, so that a special-effect image or special-effect video of the motion trajectory of the photographed target subject can be generated in real time, enriching the user's shooting experience.
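For illustration only, the overall flow of the first aspect can be sketched in Python with OpenCV as follows. The helpers `segment_subject` (returning the mask of the target subject) and `estimate_homography` (mapping a historical action frame into the scene of the current frame) are placeholders for the segmentation and registration steps discussed later, and the simple paste at the end stands in for the weighted fusion described below; this is a minimal sketch under those assumptions, not the claimed implementation.

```python
import cv2
import numpy as np

def render_target_frame(current_frame, history_frames, segment_subject,
                        estimate_homography):
    """Fuse the target-subject image of each historical action frame into the
    current frame at its mapped reference position (minimal sketch)."""
    target_frame = current_frame.copy()
    h, w = current_frame.shape[:2]
    for hist in history_frames:
        mask = segment_subject(hist)                   # uint8 mask, 255 = target subject
        H = estimate_homography(hist, current_frame)   # 3x3 mapping: historical -> current scene
        warped = cv2.warpPerspective(hist, H, (w, h))  # subject moved to its reference position
        warped_mask = cv2.warpPerspective(mask, H, (w, h))
        m = warped_mask > 0
        target_frame[m] = warped[m]                    # simple paste; weighted blending is shown later
    return target_frame
```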
In a possible design, before acquiring the current frame and the N historical action frames, the method further includes: receiving a first selection instruction of a user, wherein the first selection instruction is used to instruct entry into an automatic shooting mode or a manual shooting mode.
In the above possible implementation manner, the electronic device determines the automatic shooting mode or the manual shooting mode by receiving the user's selection instruction. In this way, the historical action frames in the currently acquired video frame stream can be detected automatically by the electronic device or determined manually by the user, and a special-effect video showing the motion trajectory is produced by fusion from the plurality of historical action frames, which adds to the fun of shooting.
In a possible design, if the first selection instruction is used to instruct entry into the automatic shooting mode, acquiring the historical action frames specifically includes: performing motion detection on the real-time video stream to determine the target subject; detecting the position of the target subject in the scene in each video frame included in the real-time video stream; and determining, as a historical action frame, a video frame in which the change of the position of the target subject in the scene meets a preset threshold.
In the above possible implementation manner, the electronic device may automatically detect a moving target subject from the real-time video frame stream according to the automatic shooting instruction given by the user, and determine the historical action frames that meet a preset condition according to changes in the image of the moving target subject. In this way, fused display is automatically updated into the current frame in real time according to the at least one determined historical action frame, a special-effect video is synthesized, and the user's shooting experience is enriched.
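As a non-authoritative sketch of this automatic mode, the following uses frame differencing for motion detection and records a new action frame once the detected subject has moved by more than a preset pixel threshold; the threshold values and the bounding-box-centre position measure are assumptions of the sketch, not values prescribed by this design.

```python
import cv2
import numpy as np

def detect_action_frame(prev_frame, frame, prev_pos, min_shift=80, diff_thresh=25):
    """Return (is_new_action_frame, subject_position) using frame differencing."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                         # motion detection by frame difference
    _, motion = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    motion = cv2.morphologyEx(motion, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return False, prev_pos
    x, y, bw, bh = cv2.boundingRect(max(contours, key=cv2.contourArea))
    pos = np.array([x + bw / 2.0, y + bh / 2.0])                # position of the moving subject
    if prev_pos is None or np.linalg.norm(pos - prev_pos) >= min_shift:
        return True, pos                                        # position change meets the threshold
    return False, prev_pos
```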
In a possible design, if the first selection instruction is used to instruct entry into the manual shooting mode, acquiring the historical action frames specifically includes: receiving a second selection instruction of the user for a video frame included in the real-time video stream; and determining the subject at the position in the video frame corresponding to the second selection instruction as the target subject, and determining that video frame as a historical action frame.
In this possible implementation manner, through real-time interaction with the user, the electronic device can likewise fuse and display multiple frames of images in real time according to the moving target subject determined by the user in the current video frame stream and the at least one historical action frame determined by the user, update the fused display into the current frame to synthesize the special-effect video, and enrich the user's shooting experience.
In a possible design, performing image segmentation on a historical action frame to obtain the image of the target subject corresponding to that historical action frame specifically includes: narrowing, according to a motion detection technique, the image region of the historical action frame corresponding to the target subject, to obtain a target image region in the historical action frame; and processing the image of the target image region through a deep learning algorithm to obtain a mask image of the target subject corresponding to the historical action frame.
In this possible implementation manner, the electronic device may perform image segmentation on a historical action frame to obtain the mask image of the target subject, so as to track and record the motion of the target subject across multiple frames, and perform multi-frame image fusion on the current frame according to the mask image of at least one target subject, thereby generating the motion-trajectory special-effect video. In addition, narrowing the image region before performing image segmentation can improve the accuracy of the segmentation and reduce the complexity of the algorithm.
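A minimal sketch of this design, assuming the motion-detected bounding box from the previous step and a portrait segmentation network `seg_model` that maps an image crop to a per-pixel probability (the network itself is not specified here), might look like this:

```python
import numpy as np

def subject_mask(action_frame, motion_bbox, seg_model, pad=0.2):
    """Narrow the frame to the motion region, segment the crop, paste the mask back."""
    h, w = action_frame.shape[:2]
    x, y, bw, bh = motion_bbox
    # Expand the detected region slightly so limbs are not clipped (pad is an assumption).
    x0, y0 = max(0, int(x - pad * bw)), max(0, int(y - pad * bh))
    x1, y1 = min(w, int(x + (1 + pad) * bw)), min(h, int(y + (1 + pad) * bh))
    crop = action_frame[y0:y1, x0:x1]
    prob = seg_model(crop)                          # float mask in [0, 1] from the network
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y0:y1, x0:x1] = (prob > 0.5).astype(np.uint8) * 255
    return mask
```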
In one possible design, if the mask image contains multiple overlapping subjects, the method further includes: separating the mask image of the target subject from the mask image of the overlapping subjects according to the depth information of the subjects in the historical action frame.
In this possible implementation manner, when the captured image of the target subject overlaps with images of other subjects, the mask image of the target subject may be separated out according to the depth information of the subjects in the historical action frame and the mask image in which multiple persons overlap. Besides segmentation according to a depth image as described above, separating a mask image with multiple overlapping persons can also be realized by techniques such as binocular stereo depth, monocular depth estimation, structured-light depth or instance segmentation. Separating the mask image of the target subject from the mask image of multiple overlapping persons improves the image processing precision and makes the generated motion-trajectory special-effect video of the target subject more real and natural.
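For illustration, a depth-based separation could be sketched as follows, assuming a per-pixel depth map aligned with the frame and a known depth of the target subject (for example obtained with one of the techniques listed above); the tolerance value is an assumption:

```python
import numpy as np

def separate_target_mask(overlap_mask, depth_map, target_depth, tolerance=0.3):
    """Keep only the pixels of the overlapping mask whose depth matches the target subject."""
    near_target = np.abs(depth_map - target_depth) <= tolerance
    return np.where((overlap_mask > 0) & near_target, 255, 0).astype(np.uint8)
```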
In a possible design, determining a reference position in the current frame according to the position of the target subject in the scene of a historical action frame and the scene of the current frame specifically includes: obtaining, through an image registration technique or a simultaneous localization and mapping (SLAM) technique, the correspondence between the position of at least one object in the historical action frame and its position in the current frame; and determining the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
In this possible implementation manner, position mapping across multiple frames is performed through an image registration technique or a SLAM technique, and the reference position in the current frame corresponding to the image of the target subject in each historical action frame is determined according to the correspondence of the image positions of different objects across the frames, so that a real and natural motion-trajectory special-effect video can be generated and the user's shooting experience is improved.
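As one possible illustration of the registration-based mapping (the SLAM-based variant is not shown), the sketch below estimates a homography from ORB feature matches and uses it to map the target subject's position from a historical action frame into the current frame; the choice of ORB and RANSAC is an assumption of this sketch rather than the registration method required by the design.

```python
import cv2
import numpy as np

def reference_position(hist_frame, current_frame, subject_pos):
    """Map the subject's position in a historical action frame to the current frame."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(cv2.cvtColor(hist_frame, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = orb.detectAndCompute(cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # scene correspondence: historical -> current
    p = cv2.perspectiveTransform(np.float32([[subject_pos]]), H)
    return tuple(p[0, 0])                                  # reference position in the current frame
```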
In a possible design, fusing the images of the N target subjects onto the N reference positions of the current frame respectively specifically includes: performing, at the N reference positions of the current frame, weighted fusion processing on the images of the N target subjects and the pixel information of the image in the current frame.
In this possible implementation manner, after the images of the multiple target subjects are fused for display, edge fusion processing may be performed between the images of the target subjects and the background image in the current frame, and the target frame is updated, so that the fused images of the multiple target subjects transition naturally into the background image.
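A hedged sketch of such weighted fusion is shown below: the subject mask (already warped to the reference position) is feathered and used as a per-pixel weight so that the subject's edges blend into the current-frame background; the feather radius is an assumed parameter.

```python
import cv2
import numpy as np

def blend_subject(current_frame, warped_subject, warped_mask, feather=15):
    """Weighted fusion of a subject image with the current frame using a feathered mask."""
    alpha = cv2.GaussianBlur(warped_mask.astype(np.float32) / 255.0,
                             (2 * feather + 1, 2 * feather + 1), 0)[..., None]
    out = alpha * warped_subject.astype(np.float32) + (1.0 - alpha) * current_frame.astype(np.float32)
    return out.astype(np.uint8)
```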
In one possible design, after fusing the images of the N target subjects onto the N reference positions of the current frame respectively, the method further includes: adding at least one gray-scale image to the image of the target subject in the current frame to obtain the target frame, wherein a gray-scale image closer to the image of the target subject in the current frame has a larger gray value.
In this possible implementation manner, several afterimages are superimposed behind the target subject along its motion direction in the current frame; the afterimages can be displayed as gray-scale images, and the motion trajectory is conveyed through the different gray values, so that the motion direction and trajectory of the target subject are represented more intuitively, the special-effect video becomes more interesting and intuitive, and the user's shooting experience is further improved.
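For illustration, the afterimage effect could be sketched as follows; the base gray value and the step between afterimages are assumptions chosen only to show that an afterimage closer to the current subject image receives a larger gray value:

```python
import numpy as np

def add_afterimages(target_frame, trail_masks, base_gray=200, step=40):
    """Overlay gray silhouettes behind the subject; closer afterimage -> larger gray value.

    trail_masks: subject masks ordered from the position nearest the current
    subject image to the farthest one.
    """
    out = target_frame.copy()
    for i, mask in enumerate(trail_masks):
        gray = max(base_gray - i * step, 0)
        out[mask > 0] = gray
    return out
```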
In a second aspect, an image processing apparatus is provided, comprising: an acquisition module, configured to acquire a current frame and N historical action frames, wherein the current frame and the N historical action frames each comprise a target subject, the scenes of the current frame and the N historical action frames overlap, the positions of the target subject in the scenes of the N historical action frames are different, and N is a positive integer greater than or equal to 1; an image segmentation module, configured to perform image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames; a mapping module, configured to determine N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame; and an image fusion module, configured to fuse the images of the N target subjects onto the N reference positions of the current frame respectively to obtain a target frame.
In one possible embodiment, the apparatus further comprises: a receiving module, configured to receive a first selection instruction of a user, wherein the first selection instruction is used to instruct entry into an automatic shooting mode or a manual shooting mode.
In a possible design, if the first selection instruction is used to instruct entry into the automatic shooting mode, the acquisition module is specifically configured to: perform motion detection on the real-time video stream to determine the target subject; detect the position of the target subject in the scene in each video frame included in the real-time video stream; and determine, as historical action frames, video frames in which the change of the position of the target subject in the scene meets a preset threshold.
In a possible design, if the first selection instruction is used to instruct entry into the manual shooting mode, the receiving module is further configured to receive a second selection instruction of the user for a video frame included in the real-time video stream; and the acquisition module is further specifically configured to: determine the subject at the position in the video frame corresponding to the second selection instruction as the target subject, and determine that video frame as a historical action frame.
In one possible design, the image segmentation module is specifically configured to: narrow, according to a motion detection technique, the image region corresponding to the target subject in a historical action frame to obtain a target image region in the historical action frame; and process the image of the target image region through a deep learning algorithm to obtain a mask image of the target subject corresponding to the historical action frame.
In a possible design, if the mask image contains multiple overlapping subjects, the image segmentation module is further specifically configured to: separate the mask image of the target subject from the mask image of the overlapping subjects according to the depth information of the subjects in the historical action frame.
In one possible design, the mapping module is specifically configured to: obtain, through an image registration technique or a simultaneous localization and mapping (SLAM) technique, the correspondence between the position of at least one object in a historical action frame and its position in the current frame; and determine the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
In one possible design, the image fusion module is specifically configured to: perform, at the N reference positions of the current frame, weighted fusion processing on the images of the N target subjects and the pixel information of the image in the current frame.
In a possible design, the image fusion module is further specifically configured to: add at least one gray-scale image to the image of the target subject in the current frame to obtain the target frame, wherein a gray-scale image closer to the image of the target subject in the current frame has a larger gray value.
In a third aspect, an electronic device is provided, which includes: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement any of the possible embodiments of the first aspect and the first aspect as described above.
In a fourth aspect, a computer-readable storage medium is provided, in which instructions that, when executed by a processor of an electronic device, enable the electronic device to perform any one of the possible implementations of the first aspect and the first aspect as described above.
In a fifth aspect, a computer program product is provided, which, when run on a computer, causes the computer to perform any of the possible implementations of the first aspect and the first aspect as described above.
It is understood that the image processing apparatus, the electronic device, the computer-readable storage medium and the computer program product provided above each correspond to the method provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method provided above, and details are not described herein again.
Drawings
Fig. 1A is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure;
fig. 1B is a software system architecture diagram of an electronic device according to an embodiment of the present application;
Fig. 1C is a schematic flowchart of an image processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic interface diagram of special-effect video shooting of an electronic device according to an embodiment of the present disclosure;
fig. 3 is a schematic interface diagram of special-effect video shooting of another electronic device according to an embodiment of the present application;
fig. 4 is a schematic view of a user interaction of a shooting preview interface according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of an algorithm for determining a current frame as a key action frame according to an embodiment of the present application;
fig. 7 is a schematic diagram of an image segmentation processing method according to an embodiment of the present application;
fig. 8 is a schematic diagram of a completion mask image according to an embodiment of the present disclosure;
FIG. 9A is a schematic view of a separated overlapping portrait provided by an embodiment of the present application;
FIG. 9B is a schematic view of another embodiment of the present application showing the separation of overlapping human images;
FIG. 10 is a diagram illustrating a multi-frame image mapping according to an embodiment of the present disclosure;
fig. 11 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 12 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 13 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
It is noted that the terms "exemplary" or "such as" and the like are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an image processing method and device, which can be applied to video shooting scenarios and can generate, in real time, a special-effect video or special-effect image of the motion trajectory of a target shooting object based on the video frame stream captured in real time. The motion-trajectory special effect can record the key actions that the target shooting object has performed along the time axis, or the positions where it has appeared, fuse the images of the target shooting object from the recorded historical key actions into the current frame for display, and fuse them with the background image, the ground and the like of the current frame. During video shooting, the user can see the special-effect result in real time in the shooting preview screen, which creates a unique user experience of interleaved time and space, while the special-effect video is generated in real time. This solves the problem that a motion-trajectory special-effect video cannot be generated in real time in the prior art, enriches the interest of video shooting, and improves the user's shooting and viewing experience.
The image processing method provided in the embodiment of the present application can be applied to an electronic device with shooting capability and image processing capability, where the electronic device can be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR) \ Virtual Reality (VR) device, and the like.
Fig. 1A shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The controller may be, among other things, a neural center and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the time sequence signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only an exemplary illustration, and does not limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, with N being a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The internal memory 121 may be used to store computer executable program code, including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, a phonebook, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
In the embodiment of the present application, the internal memory 121 may store computer program codes for implementing the steps in the embodiment of the method of the present application. The processor 110 may execute the computer program code stored in the memory 121 for the steps of the method embodiments of the present application. The display screen 194 may be used to display a subject of the camera, a real-time video frame referred to in the embodiment of the present application, and the like.
The software system of the electronic device 100 may employ a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example, and exemplarily illustrates a software structure of the electronic device 100.
Fig. 1B is a block diagram of a software structure of the electronic device 100 according to the embodiment of the present application.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. And the layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages. As shown in fig. 1B, the application package may include applications such as camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc. The embodiment of the application is mainly realized by improving the camera application program of the application program layer, for example, adding plug-in to a camera to expand the function of the camera.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 1B, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like. In the embodiment of the present application, the application framework layer may improve the program of the camera of the application layer, so that when a shooting object shoots, a special effect image or a special effect video of the motion trajectory of the target object may be displayed in the display screen 194, and the special effect image or the special effect video is synthesized by the electronic device background through real-time calculation and processing.
Wherein, the window manager is used for managing the window program. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide communication functions of the electronic device 100. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, flashing an indicator light, etc.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is the function libraries that need to be called by the Java language, and the other part is the core libraries of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application layer and the application framework layer as binary files. The virtual machine is used for performing the functions of object life cycle management, stack management, thread management, safety and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software, and may also be referred to as a driver layer. The kernel layer comprises at least a display driver, a camera driver, an audio driver and a sensor driver.
The following describes exemplary workflow of the software and hardware of the electronic device 100 in connection with capturing a photo scene.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the time stamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event. Taking a touch click operation as the touch operation and the control of the camera application icon as the control corresponding to the click operation as an example, the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer and captures a still image or video through the camera 193.
In the embodiment of the present application, in the process of shooting a video by using an electronic device, when a still image or a video is captured by using the camera 193, the captured image or video may be temporarily stored in the content provider, and when a shooting operation or a video shooting operation is performed, a shot picture or video may be displayed by the view system.
Based on the above hardware and software, embodiments of the present application will be described in detail below with reference to the accompanying drawings. As shown in fig. 1C, the method may include:
s01: the electronic device acquires a current frame and a historical action frame, both of which include a target subject.
First, it should be noted that, in the shooting scenario to which the embodiment of the present application applies, the user opens the camera application of the electronic device to shoot video of a target subject. The target subject is the shooting object of the electronic device and is a subject that moves relative to the shooting scene; it may be, for example, a person, an animal, or a moving device. The motion may specifically refer to movement of the target subject's position, rotation, jumping, limb stretching, a specified action, or the like. The camera of the electronic device follows and shoots the moving target subject in real time, so the technical method provided in the present application can process images based on the real-time video stream during shooting, generate the motion-trajectory special-effect video in real time, and preview it in real time.
The electronic device may obtain a current frame and N historical motion frames according to the obtained real-time video stream, where N may be a positive integer greater than or equal to 1. The real-time video stream refers to an image frame stream acquired by a camera of the electronic device through real-time shooting, and may also be referred to as a video frame stream, and may include a plurality of historical action frames. Depending on the nature of the real-time acquisition of the real-time video stream, the frame currently displayed or currently processed by the electronic device may be referred to as the current frame.
The real-time video stream comprises a plurality of images. When the target subject is judged to have made a key action, such as dancing, jumping, turning around or stretching a limb, the current frame is recorded as a key action frame, which may be simply called an action frame. All key action frames determined before the current frame may be referred to as historical action frames.
The target subject refers to a photographic subject which has a motion state and is determined as a motion target subject, among one or more photographic subjects photographed by a camera of the electronic device. The target subject can be determined by automatic detection of the electronic device or manually by the user.
Therefore, in one embodiment, before the electronic device acquires the current frame and the at least one historical action frame, the method further comprises: receiving a first selection instruction of a user, wherein the first selection instruction may include an automatic shooting instruction or a manual shooting instruction, and respectively instructs the electronic device to enter an automatic shooting mode or a manual shooting mode.
If the first selection instruction is used to instruct the electronic device to enter the automatic shooting mode, the electronic device can automatically detect the target shooting object and automatically detect key action frames, and generate the motion-trajectory special-effect video. If the first selection instruction is used to instruct the electronic device to enter the manual shooting mode, the electronic device further receives second selection instructions of the user, that is, instructions by which the user manually operates the electronic device to determine the target shooting object and the specified action frames of the target shooting object; in other words, the electronic device can receive at least one second selection instruction input by the user. Next, application scenarios will be described in detail with reference to the drawings.
In one embodiment, the first selection instruction of the user may include an automatic shooting instruction, and the user may determine to automatically shoot the special effect video by operating the electronic device, that is, to start the automatic shooting mode.
For example, taking a mobile phone as the electronic device, the user may open the camera application of the mobile phone through a touch or click operation, as shown in fig. 2, click the "special effect video shooting" icon, and switch to the special-effect video shooting interface. The electronic device may preconfigure automatic shooting as the default mode for special-effect video shooting, or the user may manually select "automatic shooting" or "manual shooting"; shooting of the special-effect video can then be started and the resulting target frames can be viewed in real time on the preview interface.
Furthermore, after the special-effect video shooting icon is clicked, a typical motion-trajectory special-effect video clip can be shown as a thumbnail on the preview interface of the electronic device, and the user can click to play it, so that the user becomes familiar in advance with the operation method and the effect of special-effect video shooting.
In the automatic shooting mode, the electronic device can automatically detect the target subject from the images shot in real time, using technologies such as moving object detection or a frame-difference method, and determine at least one key action frame. The specific methods of determining the target subject, determining at least one historical action frame, and determining the image of the target subject in a historical action frame will be described in detail below, and are not detailed here.
In another embodiment, the first selection instruction of the user may include a manual shooting instruction, and the user may determine to manually shoot the special effect video by operating the electronic device, that is, to start a manual shooting mode, and determine, according to at least one second selection instruction input by the user, at least one target subject and at least one key action frame corresponding to the at least one second selection instruction. Specifically, the electronic device may determine the corresponding target subject according to the corresponding position of the second selection instruction in the video frame, and determine that the video frame is the key action frame.
For example, taking the electronic device as a mobile phone as an example, a user may open a camera application of the mobile phone through a touch or click operation, as shown in fig. 3, click a "special effect video shooting" icon, switch to a shooting interface of a special effect video, and click and select a "manual shooting" option, that is, shooting of the special effect video may be started and a target shooting image may be viewed in real time on a preview interface.
Further, in order to prompt the user to operate the electronic device to determine the target subject and the key action frames, after receiving the user's "manual shooting" operation, the electronic device may display a prompt message such as "please click to select a subject portrait" on the interface to instruct the user to input a second selection instruction. After the user clicks or touches the display area of the electronic device and selects a target subject, the electronic device may continue to display a prompt message on the interface, such as "please click a favorite action", prompting the user to keep inputting at least one second selection instruction through touch or click operations and thereby determine a plurality of key action frames.
In the manual shooting mode, a user can determine a certain portrait or object in a preview picture as a target subject according to prompt information or active click in the process of previewing the video frame stream. The user may also click on the preview screen to determine multiple key action frames during the subsequent continuous video frame stream.
In addition, after the user manually determines the target subject, in the subsequent shooting process, when more than one subject appears in the shooting interface, the user can also freely switch to other target subjects. At this time, the electronic device may display a prompt message such as "click-selectable switch body" on the interface. Illustratively, as shown in fig. 4, the user initially determines a portrait a as a target subject, and then clicks a portrait B in the shooting preview interface to select the portrait B as the target subject for subsequently generating a special effect video of the target subject B.
The image of the target subject in a historical action frame (key action frame) refers to the image of the partial region of the frame in which the target subject is displayed, and specifically to the image of the region displaying the target subject that is obtained by segmentation or matting after image segmentation or matting processing is performed on the historical action frame. For example, as shown in fig. 2, besides the background image and still images in the captured picture, the image of the moving target subject in the current frame is detected and determined to be a portrait. Specifically, the image of the target subject in a key action frame may be distinguished through an image segmentation technique.
It should be noted that the scenes of the current frame and of the plurality of historical action frames acquired by the electronic device overlap, and the positions of the target subject in the scenes of the plurality of historical action frames are different. That is, every historical action frame has a portion that overlaps with the shooting scene of the current frame, where the shooting scene may refer to shooting objects, such as a tree, a lawn or a building, that are present around the target subject in the video frame.
Overlapping means that some part of any historical action frame is the same as the scene in the current frame. For example, as shown in fig. 4, the same tree in the historical action frame is also displayed, at the same or a different position, in the shooting scene of the current frame, and the building in the historical action frame is likewise displayed in the shooting scene of the current frame; the target subject A is in front of and to the left of the tree in the historical action frame, and has moved to the front right of the building in the current frame. Therefore, the embodiment of the application is implemented on the premise that every determined historical action frame has a portion overlapping with the scene in the current frame; if the scene of a historical action frame has no overlapping scene or object with the current frame, the electronic device cannot obtain an image mapping relationship from that historical action frame and the current frame, and multi-frame fused display cannot be performed.
In summary, after the electronic device receives a shooting start instruction from the user, the electronic device acquires a real-time video stream through the lens, where each video frame included in the real-time video stream may be regarded as the current frame at the corresponding time. Whether the electronic device acquires a key action frame automatically or determines it according to a user instruction in the manual mode, that key action frame may be referred to as a historical action frame with respect to any current frame corresponding to a later time. Taking the real-time shooting time axis t shown in fig. 5 as an example, the electronic device starts video shooting at time t0, determines the real-time video frame corresponding to time t1 as a key action frame (first action frame 01), and then determines the real-time video frame corresponding to time t2 as a key action frame (second action frame 02); thus, for the current frame corresponding to the current time t3, the acquired N historical action frames are the first action frame 01 and the second action frame 02.
S02: and the electronic equipment performs image segmentation on the historical action frame to obtain an image of the target main body corresponding to the historical action frame.
In the shooting process, when the electronic device acquires each historical motion frame, in order to obtain an image of a target subject in each historical motion frame according to the historical motion frame, the electronic device may perform image segmentation on the historical motion frames one by one, and determine a target subject image in the historical motion frame, which may be specifically a mask image. Thus, the electronic device can record the N historical motion frames included in the real-time video stream and the images of the N target subjects corresponding to the N historical motion frames one by one.
Image segmentation is a technique and process of dividing an original image into a plurality of regions with specific or unique properties and extracting a target object of interest. Image segmentation is a key step from image processing to image recognition and analysis. Specifically, the image segmentation process based on the portrait in the original image may also be referred to as a portrait segmentation technique, and the portrait portion in the original image may be extracted.
The mask image marks a specific target region in an image by using different mask values, for example marking the image region of the target subject with a mask value different from that of the background image, so that the image region of the target subject can be separated from the other background image regions. For example, in a common mask image, the mask value of the pixel points in the target subject image region may be set to 255, and the mask values of the pixel points in the remaining regions may be set to 0, so that the image of the target subject in the historical action frame can be separated out according to the mask image.
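As a minimal illustration of the mask convention described above (255 inside the target subject region, 0 elsewhere), separating the subject image from a historical action frame might look as follows; the function and variable names are illustrative only:

```python
import numpy as np

def extract_subject(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the pixels of the target subject in a historical action frame.

    frame: H x W x 3 image of the historical action frame.
    mask:  H x W mask image, 255 inside the target subject region, 0 elsewhere.
    """
    subject = np.zeros_like(frame)
    keep = mask == 255                  # boolean map of the subject region
    subject[keep] = frame[keep]         # copy subject pixels, leave background black
    return subject
```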
For example, the target image region of each historical motion frame may be processed by a deep learning algorithm to obtain a mask image of a target subject corresponding to each historical motion frame, for example, by a neural network algorithm or a support vector machine algorithm, and the application does not specifically limit the algorithm for implementing image segmentation.
S03: the electronic equipment determines a reference position in the current frame according to the position of the target main body in the scene of the historical action frame and the scene of the current frame.
The electronic device can map the reference positions of the N target bodies in the current frame respectively according to the positions of the N target bodies in the scenes of the N historical action frames and in combination with the scene of the current frame.
Specifically, the electronic device may obtain an image mapping relationship between each historical action frame and the current frame according to the position of the background image in that historical action frame and the position of the background image in the current frame. The electronic device may then obtain the relative position of the image of the target subject in the target frame according to the image position of the target subject in the historical action frame in combination with the mapping relationship, and fuse the image of the target subject into the current frame at the determined relative position. The relative position is used to indicate where, in the target frame, the image of the target subject from the historical action frame is to be placed.
S04: and the electronic equipment respectively fuses the images of the target main body on the reference positions of the current frame to obtain the target frame.
After the electronic device determines at least one historical motion frame, the images of the plurality of target subjects obtained in S02 may be rendered into the current frame by an image fusion technique, and fused to generate the target frame.
Illustratively, as shown in fig. 5, a first action frame 01 and a second action frame 02 are determined in the real-time video frame stream, and each frame displayed in real time after the first action frame 01 is displayed after being fused with the image of the first target subject in the first action frame 01. Taking the second action frame 02 as an example, it is displayed after fusion as shown in fig. 5, i.e., it includes the image (1) of the first target subject in the first action frame 01 and the entire image of the second action frame 02. A current frame after the N-th action frame 0N is determined is displayed after fusion as shown in fig. 5, i.e., it includes the image (1) of the first target subject in the first action frame 01, the image (2) of the second target subject in the second action frame 02, ..., and the image (N) of the N-th target subject in the N-th action frame 0N. When N is 5, this means that the image (1) of the first target subject corresponding to the first action frame 01, the image (2) of the second target subject corresponding to the second action frame 02, ..., and the image (5) of the fifth target subject corresponding to the fifth action frame 05 are fused and displayed at the corresponding reference positions in the current frame. The specific multi-frame image fusion process, i.e., the algorithm, will be described in detail below and is not repeated here.
Further, after the shooting of the special effect video is finished, the electronic device may store the generated special effect video in the gallery. In order to distinguish from a common video, a specific mark can be displayed at one corner of a thumbnail of the special effect video, for example, four characters of a 'motion track' are superimposed on a play button of the special effect video, so that a special effect video file of the motion track is distinguished from a common video file, and the special effect video file is convenient for a user to view.
According to the embodiment of the application, at least one key action frame is automatically detected or manually determined in a real-time video frame stream, and the image of at least one target subject in the at least one key action frame is displayed in the current frame simultaneously through a multi-frame fusion display method, so that a special effect image or video of the motion track of the target subject can be generated in real time. Meanwhile, the currently generated target image can be transmitted to a shooting preview picture of the mobile phone and a video generation stream in real time, so that a user can preview the effect of the motion track in real time on line, the complete motion track special-effect video can be viewed after shooting is finished, and the shooting experience of the user is enriched.
In one embodiment, in step S01, if the first selection instruction of the user includes an automatic shooting instruction, that is, the electronic device is instructed to enter the automatic shooting mode, the electronic device can automatically detect a moving target subject according to an algorithm and automatically detect at least one historical action frame (key action frame).
First, the electronic device may determine a target subject for a video frame in a real-time video stream according to a motion detection technique. The motion detection of the target subject can be determined by portrait recognition or other target recognition technologies, and can automatically detect moving objects in real-time video frames, such as people, animals, sports devices, vehicles or soccer balls. Since the main application scene of the present application is special effect shooting of a motion trajectory of a person, the embodiment is described by taking portrait recognition and detection as an example.
Specifically, to determine the target subject in a real-time video frame, the electronic device may perform image segmentation on the image, such as portrait segmentation or instance segmentation, to obtain a mask image of the target subject. If the obtained mask image contains only one portrait mask, that portrait mask is determined as the target subject; if a plurality of portrait masks are obtained by segmentation, the electronic device may determine the one with the largest mask area as the target subject; if no portrait mask is obtained, the electronic device may prompt the user, by displaying prompt information on the preview interface, that no portrait is detected and ask the user to move the camera closer to the photographed person.
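One possible way to realize the rule above (a single mask is taken directly, the largest of several masks is taken, and the user is prompted when none is found) is sketched below with OpenCV connected-component analysis; this is an assumed implementation, not the only one:

```python
import cv2
import numpy as np

def pick_target_subject(portrait_mask):
    """Return the mask (255/0) of the largest segmented portrait, or None if
    no portrait was segmented, in which case the caller may prompt the user."""
    binary = (portrait_mask > 0).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    if num <= 1:                                   # only label 0 (background) present
        return None
    areas = stats[1:, cv2.CC_STAT_AREA]            # area of each portrait mask
    target = 1 + int(np.argmax(areas))             # label of the largest mask
    return np.where(labels == target, 255, 0).astype(np.uint8)
```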
Then, the electronic device may detect a position of the target subject in each video frame included in the real-time video stream, and obtain a scene position change of the target subject between the multiple frames. The scene position change of the target subject may be a position change of the target subject with respect to the shooting scene, or a change in a limb posture, a limb angle, or a limb position of the target subject.
After the electronic device determines the target subject, it determines frame by frame which frames are key action frames during continuous shooting. The electronic device may determine the key action frames in the real-time video frames by using a frame difference method, where the frame difference method obtains information such as scene position changes between adjacent video frames by comparing pixel point positions in the adjacent video frames. That is, the electronic device may determine a video frame as a key action frame by detecting that the position change of the target subject with respect to the scene in that video frame of the real-time video stream satisfies a preset threshold.
The electronic device may determine the first frame image in which the target subject is successfully segmented as the first key action frame, because the first key action frame has no preceding reference frame. Alternatively, to allow for the time delay of the image processing algorithm, the electronic device may determine the third or fourth frame after the first frame in which the target subject is successfully segmented as the first key action frame.
The second and subsequent key action frames may be determined by comparison with the previous key action frame. Specifically, the electronic device may determine that an image of a target subject in the real-time video frame simultaneously satisfies the following two conditions as a key action frame:
the first condition is as follows: the image position area of the target subject in the current frame is not overlapped with the position area mapped to the current frame by the image of the target subject in the previous key action frame.
And a second condition: the change between the image of the target subject in the current frame and the image of the target subject in the previous key action frame meets a preset threshold value.
That is, through motion detection, the electronic device may automatically determine as a historical action frame a video frame of the real-time video stream in which the image change of the target subject satisfies a preset threshold and the image of the target subject does not coincide with the image of the target subject in the previous key action frame.
When it is detected that the image change of the target subject in the current video frame satisfies the preset threshold, the current video frame is determined to be a key action frame (historical action frame). For example, when it is detected that the image change of the target subject in the current video frame is greater than or equal to the preset threshold, the current video frame is determined to be a key action frame; when it is detected that the image change of the target subject in the current video frame is smaller than the preset threshold, the current video frame is determined not to be a key action frame.
For example, whether the change between the target subject image in the current frame and the target subject image in the previous key action frame meets a preset threshold may be determined through a centroid overlapping algorithm. The specific algorithm is as follows:
The electronic device calculates the barycentric coordinates of the target subject mask image of the previous key action frame and the barycentric coordinates of the target subject mask image of the current frame, makes the two gravity centers coincide, and then calculates the non-overlapping area between the target subject mask image of the current frame and the target subject mask image of the previous key action frame. When the non-overlapping area exceeds a preset threshold, the current frame is determined to be a key action frame; otherwise, the current frame is not a key action frame. The preset threshold may be configured as a certain proportion, for example 30% of the area of the union of the two target subject mask regions.
It should be noted that the setting of the preset threshold may be preset by a person skilled in the art according to the image detection precision and in combination with the requirement and technical experience of the special video, and this is not specifically limited in this application.
The formula for calculating the barycentric coordinates is as follows (the barycentric coordinates may be rounded to integers):

x0 = (Σ xi) / M, y0 = (Σ yi) / M,

where (xi, yi) are the coordinates of the M pixel points in the target subject mask region.
The specific calculation for making the gravity centers coincide may be as follows: the coordinate offset (Δx, Δy) is chosen such that the barycentric coordinates of the target subject of the current frame, after the offset is added, equal the barycentric coordinates of the target subject of the previous key action frame; the offset (Δx, Δy) is then added to the coordinates of all pixel points in the target subject region of the current frame to obtain a new coordinate set of the current frame's target subject region; finally, the number of pixel points whose coordinates in the target subject region coordinate set of the previous key action frame are not contained in the new coordinate set of the current frame is counted. See the following formulas for the specific calculation.
The new coordinate set of the current frame target subject region is:

new coordinates (x', y') = original coordinates (x, y) + (Δx, Δy),

where (Δx, Δy) = barycentric coordinates (x0, y0) of the previous key action frame - barycentric coordinates (x0, y0) of the current frame.
After the gravity centers coincide, the proportion of the non-overlapping area between the current frame target subject mask image and the previous key action frame target subject mask image is calculated, that is, the area of the non-overlapping region of the two target subject mask images is taken relative to the area of the union of the two target subject mask regions. The non-overlapping area proportion is calculated as follows:

non-overlapping proportion = 1 - area(target subject region_previous key action frame ∩ target subject region_current frame) / area(target subject region_previous key action frame ∪ target subject region_current frame),

where target subject region_previous key action frame ∩ target subject region_current frame represents the intersection of the target subject region in the previous key action frame and the target subject region in the current frame, and target subject region_previous key action frame ∪ target subject region_current frame represents the union of the two regions.
As shown in fig. 6, if the target subject region in the current key action frame (i.e., the most recently determined key action frame) overlaps with the target subject region in current frame 1, condition one is not satisfied and current frame 1 is not a key action frame. If, after the gravity center of the target subject in that key action frame is made to coincide with the gravity center of the target subject in current frame 2, the proportion of the non-overlapping area does not reach the preset threshold, condition two is not satisfied and current frame 2 is not a key action frame. If the target subject region in that key action frame does not overlap with the target subject region in current frame 3, and after their gravity centers are made to coincide the proportion of the non-overlapping area exceeds the preset threshold, then current frame 3 satisfies condition one and condition two at the same time and is determined to be a key action frame.
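Assuming the target subject masks of the previous key action frame and of the current frame are available as binary arrays in the current frame's coordinate system, the two conditions and the barycentre-alignment calculation above might be sketched as follows (the 30% threshold and all names are illustrative, not part of the embodiment):

```python
import numpy as np

def barycentre(mask):
    """Rounded barycentric coordinates (x0, y0) of a binary mask region."""
    ys, xs = np.nonzero(mask)
    return int(round(xs.mean())), int(round(ys.mean()))

def non_overlap_ratio(mask_prev, mask_cur):
    """Shift the current-frame mask so the barycentres coincide, then return
    (union - intersection) / union of the two mask regions."""
    (px, py), (cx, cy) = barycentre(mask_prev), barycentre(mask_cur)
    dx, dy = px - cx, py - cy                       # offset applied to the current frame mask
    h, w = mask_cur.shape
    ys, xs = np.nonzero(mask_cur)
    xs, ys = xs + dx, ys + dy
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    shifted = np.zeros_like(mask_cur, dtype=bool)
    shifted[ys[keep], xs[keep]] = True
    prev = mask_prev > 0
    union = np.logical_or(prev, shifted).sum()
    inter = np.logical_and(prev, shifted).sum()
    return 1.0 - inter / union if union else 0.0

def is_key_action_frame(mask_prev, mask_cur, threshold=0.3):
    """Condition one: the two subject regions do not overlap at their original
    positions. Condition two: after barycentre alignment the non-overlapping
    proportion reaches the preset threshold (30% assumed here)."""
    condition_one = not np.logical_and(mask_prev > 0, mask_cur > 0).any()
    condition_two = non_overlap_ratio(mask_prev, mask_cur) >= threshold
    return condition_one and condition_two
```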
In the above embodiment, through the above algorithm, the electronic device can automatically detect the target moving object in the video in real time and automatically detect and determine the key action frame, so that the special effect video of the motion track can be generated in real time according to the target subject in the recorded key action frame, the interest and flexibility of video shooting are increased, and the shooting experience of the user is improved.
In one embodiment, before image segmentation is performed on a historical action frame, the moving target subject may first be identified by a motion detection technique, and the image region corresponding to the target subject in the historical action frame may be narrowed down; that is, only the partial image region of the moving subject of interest in the historical action frame is cropped out and fed to the image segmentation algorithm. This reduces the image area on which image segmentation is performed, improves the segmentation precision, and simplifies the data processing complexity of the image segmentation algorithm.
The motion detection technique may be implemented by a frame difference method, a background difference method, an optical flow method, or the like. For example, the frame difference method takes three consecutive frames, differences each pair of adjacent frames, and then combines the two resulting difference images to obtain an approximate detection of the moving object in the image.
Illustratively, as shown in fig. 7, the image region of interest, such as the portrait region in fig. 7, may first be narrowed down by motion detection. Portrait segmentation is then performed on the roughly obtained portrait region to obtain the mask image of the target subject.
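A rough sketch of the frame-difference idea used to narrow down the region of interest, assuming three consecutive grayscale frames and an illustrative binarization threshold:

```python
import cv2
import numpy as np

def three_frame_difference(prev_gray, cur_gray, next_gray, thresh=25):
    """Approximate moving-object region from three consecutive grayscale frames:
    difference each adjacent pair, binarize, then combine the two difference images."""
    d1 = cv2.absdiff(cur_gray, prev_gray)
    d2 = cv2.absdiff(next_gray, cur_gray)
    _, b1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    motion = cv2.bitwise_and(b1, b2)                       # pixels that changed in both pairs
    return cv2.dilate(motion, np.ones((5, 5), np.uint8))   # expand into a coarse region of interest
```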
Through this implementation, the mask image of the target subject can be accurately separated from the historical action frame, and motion tracking and recording of the target subject are achieved, so that multi-frame image fusion can be performed on the current frame according to the mask image of at least one target subject to generate a motion trajectory special effect video, improving the shooting experience of the user.
In the above-described embodiment, in the process of image segmentation of the key motion frame, the mask image of the segmented target subject may be incomplete or missing, as shown in fig. 7. In order to obtain a complete mask image of the target subject, the mask image of the target subject may be supplemented in combination with motion detection.
The specific processing procedure for completing the target subject mask image may be as follows: after the moving target subject in the key action frame is detected, the image region of the target subject in the key action frame is separated out by selecting a suitable threshold; the segmented target subject mask image is then repaired using this image region of the target subject, so as to obtain a complete mask image of the target subject. Illustratively, as shown in fig. 8, a mask image A of the target portrait is obtained by portrait segmentation, and the mask image A is complemented according to the target portrait in an adjacent frame to obtain a mask image B.
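One simple possibility, assuming the segmented mask and the motion-detected subject region are both available as 255/0 masks of the same frame, is to complete the segmentation by taking their union; this is only an illustrative sketch:

```python
import numpy as np

def complete_subject_mask(segmented_mask, motion_region_mask):
    """Repair holes in the segmented target subject mask using the image region
    of the moving subject obtained by motion detection (both masks are 255/0)."""
    completed = (segmented_mask > 0) | (motion_region_mask > 0)
    return np.where(completed, 255, 0).astype(np.uint8)
```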
In one embodiment, the scene captured in the real-time video frames may include more than one moving subject, and the images of several photographic objects may overlap with the image of the target subject; for example, the target subject is portrait 1, and in a key action frame portrait 1 and portrait 2 partially overlap or occlude each other. In this case the electronic device needs to separate the mask image of the target subject from the mask image in which the plurality of subjects overlap, so as to continuously and automatically track and record the same target subject. Specifically, the overlapped target photographic objects can be separated in the following manners.
In the first manner, the mask image in which a plurality of subjects overlap is separated according to a depth map.
The electronic device may obtain the mask image of the target subject according to the mask image in which the plurality of subjects overlap in the historical action frame and the depth information corresponding to the plurality of subjects, by combining the depth map corresponding to the two-dimensional image. That is, the electronic device may separate the mask image of the target subject from the mask image in which the plurality of subjects overlap, based on the depth information of the plurality of subjects and the depth information of the target subject in the historical action frame.
The depth map is an image or an image channel containing information on the distance between the shooting point and the surface of the target shooting object. The depth map is similar to a grayscale image except that each pixel value of the depth map reflects the actual distance of the shot point from the target photographic object. Usually, the RGB image and the depth map are registered, so that there is a one-to-one correspondence between pixel points of the RGB image and pixel points of the depth map.
The depth map may be obtained by a Time of Flight (ToF) ranging camera; alternatively, the original two-dimensional image may be processed by an artificial neural network algorithm to calculate a depth value corresponding to each pixel point, thereby recovering the depth map of the original two-dimensional image.
By processing the depth map, a plurality of different target photographic objects can be distinguished. For example, as shown in fig. 9A, when the electronic device needs to pick the portrait of the target subject out of a plurality of overlapped portraits, it may put the pixel points of the obtained depth map into one-to-one correspondence with the pixel points of the current key action frame, and compute the average or median of the depth values of the pixel points in the mask region of the target subject portrait in the depth map. The electronic device then processes the depth map according to this average or median depth value of the target subject portrait, extracts the depth value range covered by the subject portrait in the depth map, and intersects that range with the corresponding portrait mask, so that the portrait mask of the target subject is separated from the plurality of overlapped portrait masks. This ensures that the separated portrait mask of the target subject is always a single portrait.
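Assuming the depth map is registered pixel-to-pixel with the frame and that a seed mask of the target subject is available from tracking, the depth-based separation might be sketched as follows (the depth tolerance and all names are assumptions):

```python
import numpy as np

def separate_target_by_depth(overlapped_mask, depth_map, target_seed_mask, tolerance=0.3):
    """Separate the target portrait mask from a mask in which several portraits overlap.

    overlapped_mask:  255/0 mask covering all overlapped portraits.
    depth_map:        per-pixel distance (e.g. in metres), registered with the frame.
    target_seed_mask: pixels already known to belong to the target subject
                      (for example from tracking in the previous key action frame).
    tolerance:        assumed depth range kept around the target's median depth.
    """
    target_depth = np.median(depth_map[target_seed_mask > 0])
    in_range = np.abs(depth_map - target_depth) <= tolerance
    # intersect the depth slice with the overlapped portrait mask
    return np.where((overlapped_mask > 0) & in_range, 255, 0).astype(np.uint8)
```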
In the second manner, instance segmentation is performed on the overlapped target photographic objects.
An instance refers to an object, that is, a specific individual within a class of photographic objects.
Instance segmentation means that, on the basis of assigning each pixel in an image to its category, i.e., realizing pixel-level classification, different instances within the same category must also be distinguished. For example, persons and background objects are first demarcated pixel by pixel; further distinguishing different persons among multiple persons, e.g., persons a, b, and c, is instance segmentation.
Specifically, the electronic device may perform instance segmentation through a deep learning algorithm. Referring to fig. 9B, in the instance segmentation mask the mask values of different portraits are different, so the portrait mask region of the target subject can be separated directly.
It should be noted that, besides the above-mentioned technology for separating the multi-person overlapping masks, the existing methods of binocular visual depth, monocular depth estimation, structured light depth, etc. may also be used for separating the multi-person overlapping masks, and this application is not described herein again.
Through the embodiment, the electronic equipment can separate the target main body mask from a plurality of overlapped target shooting objects, so that the target main bodies of different frames are accurately tracked and recorded, and the special-effect video of the motion track of the specific target main body is generated.
In an embodiment, in step S03, the electronic device determines the reference position in the current frame according to the position of the target subject in the scene of each historical motion frame and the scene of the current frame, which may specifically include:
According to an image registration technique or a simultaneous localization and mapping technique, the electronic device can obtain the correspondence between the position of at least one object in each historical action frame and the position of that object in the current frame; then, according to this correspondence and the image position of each target subject in its historical action frame, it obtains the image position region corresponding to each target subject in the current frame, i.e., the reference position. The electronic device can then draw the image of the target subject corresponding to each historical action frame to its corresponding reference position in the current frame, thereby obtaining the target frame.
Illustratively, this will be described below in conjunction with fig. 5, taking the example that the historical action frames include a first action frame 01 and a second action frame 02.
As shown in fig. 5, suppose the recorded historical action frames include the first action frame 01, and the target subject corresponding to the first action frame 01 is the first target subject. Then, for each subsequent frame, the image of the first target subject is drawn into the current frame 03 according to the mapping relationship between the position of at least one object in the first action frame 01 and the position of that object in the current frame.
As shown in fig. 5, if the recorded historical action frames further include the second action frame 02, and the target subject corresponding to the second action frame 02 is the second target subject, then for each frame after the second action frame 02, the image of the first target subject and the image of the second target subject are drawn into the current frame 03 according to the mapping relationship between the position of at least one object in the first action frame 01 and its position in the current frame 03, and the mapping relationship between the position of at least one object in the second action frame 02 and its position in the current frame 03.
The drawing refers to a process of generating a two-dimensional image by a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU) of the electronic device according to a drawing instruction, pixel point information, and the like. After the electronic equipment finishes drawing the image, the target image can be displayed on a display screen of the electronic equipment through the display device.
According to the embodiment described above, the electronic device performs the fusion rendering processing on the determined key motion frames one by one, and performs real-time display, that is, the generated motion trajectory special effect video can be previewed online, and a final motion trajectory special effect video is generated.
In the above embodiment, all the historical action frames recorded during the real-time video frame stream need to be mapped to the corresponding positions of the current frame, and a specific mapping method that can be adopted is an image registration technique or a simultaneous localization and mapping (SLAM) technique. The electronic device may thus draw the image of the target subject in each historical action frame into the current frame according to the image mapping relationship between at least one historical action frame and the current frame; specifically, the target image may be generated through the following processing.
Step 1: and obtaining the corresponding relation between the image position of at least one object in each historical action frame and the image position of at least one object in the current frame according to an image registration technology or an SLAM technology.
The image registration is a process of matching, mapping or superimposing a plurality of images acquired at different times and under different imaging devices or under different conditions (such as weather, brightness, camera positions or angles and the like), and can be widely applied to the fields of data analysis, computer vision, image processing and the like.
As shown in fig. 10, the electronic device may obtain, according to the position of at least one object in the first action frame and the position of the same object in the current frame, the correspondence between the two positions, which may also be referred to as a mapping relationship. The electronic device may then obtain the reference position of the target subject in the current frame according to the position of the target subject in the first action frame combined with this position correspondence; the position indicated by the dotted line in fig. 10 may be the reference position.
When the image registration technique is adopted, features need to be extracted from the historical action frame, for example Semantic Kernel Binary (SKB) features; the extracted features are matched against the features of the current frame to estimate a homography matrix, and the historical key action frame is finally mapped to the corresponding position in the current frame according to the obtained homography matrix. The SKB feature is a descriptor of an image feature. Image registration techniques can realize mapping and matching between two-dimensional images.
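Because SKB features are not a commonly available open-source primitive, the sketch below uses ORB features from OpenCV as a stand-in to illustrate the registration step: features of the historical action frame are matched against those of the current frame, a homography is estimated, and the subject mask is warped to its reference position. All names and parameters are illustrative assumptions:

```python
import cv2
import numpy as np

def map_subject_to_current(hist_frame, hist_mask, cur_frame):
    """Map the target subject mask of a historical action frame into the
    current frame via feature-based image registration. Returns None if
    registration fails (e.g. too few matches)."""
    g1 = cv2.cvtColor(hist_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    if d1 is None or d2 is None:
        return None
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if len(matches) < 4:                       # a homography needs at least 4 correspondences
        return None
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    h, w = cur_frame.shape[:2]
    return cv2.warpPerspective(hist_mask, H, (w, h))   # subject mask at its reference position
```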
The SLAM technique is a technique that allows a device to move while gradually depicting three-dimensional positional information of the surrounding environment. Specifically, the device starts from an unknown place of an unknown environment, positions the position and the posture of the device by repeatedly observing map features (such as a wall corner, a column and the like) in the movement process, and constructs a map incrementally according to the position of the device, so that the purposes of synchronous positioning and map construction are achieved.
When the SLAM technology is adopted, the three-dimensional position information of the object in the historical action frame needs to be obtained through calculation by a SLAM module in the electronic equipment, and the historical action frame is mapped to the corresponding position in the current frame according to the three-dimensional position information of the object.
Since the SLAM technique performs position mapping based on three-dimensional position information, it is applicable when there is three-dimensional motion between frames. Therefore, when the motion trajectory of the target subject photographed by the electronic device involves three-dimensional motion, the mapping may be performed using the SLAM technique.
Step 2: and obtaining the reference position of each target main body in the current frame according to the image position and the corresponding relation of each target main body in each historical action frame.
I.e. maps the image of each target subject in each historical action frame to the corresponding image location area in the current frame.
Step 3: and drawing an image of each target subject in each historical action frame to a corresponding reference position of each target subject in the current frame.
And drawing the image of each target subject to the corresponding reference position in the current frame according to the reference position of the image of each target subject in the current frame obtained by mapping, thereby obtaining a fused image of the multi-frame images, and updating and displaying the fused image as the current frame.
Illustratively, as shown in fig. 5, a first target subject in a first action frame 01 is mapped to a corresponding reference position in a second action frame 02 and is drawn into the second action frame 02; and mapping the first target body in the first action frame 01 to a corresponding reference position in the current frame and drawing the first target body in the current frame, mapping the second target body in the second action frame 02 to a corresponding reference position in the current frame and drawing the second target body in the current frame, and updating the current frame.
In the embodiment, the image registration technology or the SLAM technology is used for mapping the multiple frames of images, so that the fusion display of the target main body image in the multiple frames of images is completed, the motion track of the target main body can be accurately and naturally displayed at the corresponding position in the same frame of image, the time-staggered and space-staggered motion track special-effect video is formed, and the shooting experience of a user is enriched.
In one embodiment, after all the historical action frames are mapped to the corresponding positions of the current frame by the image registration technique or the SLAM technique, and the mask image of the target subject in each historical action frame is mapped to its corresponding position in the current frame, the method may further include, in order to make the added image of the target subject blend more naturally with the background image of the current frame: performing edge fusion processing on the image of the target subject of each historical action frame in the target image, and updating the target image, so that the image of the target subject transitions naturally into the background image.
The fusion processing of the multi-frame images fuses images that are not in the current frame (the images of the target subjects in the historical action frames) into the current frame for display. Therefore, it is necessary to further perform weighted fusion processing on the images of the N target subjects and the pixel information of the current frame image at the N reference positions of the current frame, so that the fused-in images of the target subjects and the original image of the current frame are displayed naturally and the boundary transition looks more real.
Illustratively, the weighted fusion technique employed may be alpha fusion. The specific processing may be as follows: starting from the edge mask value 255 of the target subject image and the edge mask value 0 of the background image, the mask values at the boundary are adjusted from an abrupt 255-to-0 step to a gradual 255-to-0 transition, for example by a linear or non-linear function; the image of the target subject and the background image are then weighted and superimposed using the adjusted, smoothly transitioning mask values as weights. Optionally, the boundary line may also be weakened by processing the edge region with Gaussian filtering, which is a linear smoothing filter whose weights are chosen according to the shape of the Gaussian function.
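A sketch of such alpha fusion, using a Gaussian blur of the mask edge as one possible way to obtain the gradual 255-to-0 transition (the kernel size and names are assumptions):

```python
import cv2
import numpy as np

def alpha_blend_subject(current_frame, subject_image, subject_mask, ksize=21):
    """Blend a historical target subject image into the current frame.

    subject_image and subject_mask are already warped to the reference position
    and have the same size as current_frame; the mask is 255 inside the subject."""
    soft = cv2.GaussianBlur(subject_mask, (ksize, ksize), 0)       # gentle edge transition
    alpha = (soft.astype(np.float32) / 255.0)[..., None]           # H x W x 1 weights
    fused = alpha * subject_image + (1.0 - alpha) * current_frame
    return fused.astype(np.uint8)
```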
In addition to the alpha fusion technique, image fusion techniques such as Poisson fusion (Poisson Blending) technique and Laplacian fusion (Laplacian Blending) technique may also be used in the above embodiments, and the present application does not limit the specific image fusion techniques.
In an embodiment, after the images of the multiple key action frames are fused and displayed to obtain the target image, in order to display the motion trajectory of the target subject in the current frame more intuitively, the method may further include: superimposing at least one afterimage on the image of the target subject in the current frame, where an afterimage is generated from images of the target subject in several consecutive frames preceding the current frame.
Specifically, the at least one afterimage can be represented by grayscale images, and the grayscale values of the afterimages may be the same or different.
For example, as shown in fig. 11, at least one afterimage may be superimposed behind the image of the second target subject in the second action frame 02, and a plurality of afterimages may be superimposed behind the moving direction of the target subject in the current frame 03. The farther an afterimage is from the image of the target subject in the current frame 03, the weaker its intensity may be; the closer it is, the stronger its intensity may be. The intensity of an afterimage may gradually decrease, down to 0, as it gets farther from the image of the target subject in the current frame 03.
The number of afterimages is not limited and can be set by a person skilled in the art according to design requirements.
When the afterimages are represented by grayscale images, the closer a grayscale image is to the image of the target subject in the current frame, the larger its grayscale value; the farther it is, the smaller its grayscale value.
According to this embodiment, by superimposing a plurality of afterimages behind the moving direction of the target subject in the current frame, the motion direction and trajectory of the target subject can be represented more intuitively, which increases the interest and intuitiveness of the special effect video and further improves the shooting experience of the user.
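Under the assumption that the subject images and masks of the few preceding frames have already been mapped to their reference positions, the afterimage superposition might be sketched as follows (the number of afterimages and the decay weights are illustrative):

```python
import numpy as np

def add_afterimages(fused_frame, past_subject_images, past_subject_masks,
                    weights=(0.6, 0.4, 0.2)):
    """Overlay afterimages of the target subject behind its current position.

    past_subject_images / past_subject_masks come from the few frames preceding
    the current frame, already mapped to their reference positions and ordered
    newest first; newer afterimages get larger weights (stronger intensity)."""
    out = fused_frame.astype(np.float32)
    for img, mask, w in zip(past_subject_images, past_subject_masks, weights):
        region = (mask > 0)[..., None]                    # broadcast over colour channels
        out = np.where(region, (1.0 - w) * out + w * img, out)
    return out.astype(np.uint8)
```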
According to any of the embodiments described above, after the image of the target subject in all the recorded historical motion frames is mapped into the image of the current frame in real time, the video frame stream is continuously updated, and the image output by the current frame is displayed on the video capture preview screen of the electronic device. As shown in fig. 12, after the user starts shooting the special effect video, the user can see the shooting effect of the special effect video in real time in the video shooting preview screen of the electronic device. In addition, the video frames generated in real time can be output to the final video generation stream, and the generated complete motion track special effect video can be watched after the user finishes shooting.
With reference to any one of the foregoing possible implementation manners, as shown in fig. 13, a detailed implementation flow for generating a motion trajectory special effect video is provided in this application embodiment. The process mainly comprises the following steps: 1. shooting preview interface interaction, and determining a target main body and a key action frame; 2. obtaining an image of a target main body by image segmentation; 3. mapping the key action frame to the current frame, and drawing an image of a target subject in the key action frame to the current frame; 4. a stream of video frames is generated for online preview and real-time.
Not all of the processing flows shown in fig. 13 need to be performed, nor is every shown step indispensable; those skilled in the art may adjust and arrange the detailed processing procedures and their order according to design requirements. Meanwhile, the technical solution of the present application is not only suitable for generating motion trajectory special effect videos, but can also be used to quickly develop other similar special effect videos, such as multi-person image special effect synthesis or growth special effects, which is not specifically limited in the present application.
An embodiment of the present application further provides an image processing apparatus, as shown in fig. 14, the apparatus 1400 may include: an acquisition module 1401, an image segmentation module 1402, a mapping module 1403, and an image fusion module 1404.
An obtaining module 1401, configured to obtain a current frame and N historical action frames, where the current frame and the N historical action frames both include a target main body, scenes of the current frame and the N historical action frames overlap, the target main body has different positions in the scenes of the N historical action frames, and N is a positive integer greater than or equal to 1.
An image segmentation module 1402, configured to perform image segmentation on the N historical motion frames to obtain images of N target subjects corresponding to the N historical motion frames, respectively.
A mapping module 1403, configured to determine N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame, respectively.
An image fusion module 1404, configured to fuse the images of the N target subjects on the N reference positions of the current frame, respectively, to obtain a target frame.
In one possible embodiment, the device may further include: the receiving module is used for receiving a first selection instruction of a user, and the first selection instruction is used for indicating to enter an automatic shooting mode or a manual shooting mode.
In a possible design, if the first selection instruction is used to instruct entry into the automatic shooting mode, the obtaining module 1401 is specifically configured to: carrying out motion detection on the real-time video stream to determine a target subject; detecting a position of a target subject in a scene in each video frame included in the real-time video stream; and determining the video frames of which the position change of the scene in the video frames included in the real-time video stream meets the preset threshold value as historical action frames.
In a possible design manner, if the first selection instruction is used to instruct to enter the manual shooting mode, the receiving module is further configured to receive a second selection instruction of the user for a video frame included in the real-time video stream; the obtaining module 1401 is further specifically configured to: and determining a main body of a corresponding position of the second selection instruction in the video frame as a target main body, and determining the video frame as a historical action frame.
In one possible design, the image segmentation module 1402 is specifically configured to: reducing an image area corresponding to a target subject in the historical action frame according to a motion detection technology to obtain a target image area in the historical action frame; and processing the image of the target image area through a depth learning algorithm to obtain a mask image of the target main body corresponding to the historical action frame.
In a possible design, if there are multiple mask images with overlapped subjects in the mask images, the image segmentation module 1402 is further specifically configured to: and separating the mask image of the target subject from the mask images overlapped by the subjects according to the depth information of the subjects in the historical action frame.
In one possible design, the mapping module 1403 is specifically configured to: obtaining the corresponding relation between the position of at least one object in a historical action frame and the position of the object in a current frame according to an image registration technology or a synchronous positioning and mapping SLAM technology; and determining the reference position of the target main body in the current frame according to the corresponding relation and the position of the target main body in the historical action frame.
In one possible design, the image fusion module 1404 is specifically configured to: and respectively carrying out weighted fusion processing on the images of the N target subjects and the pixel information of the image in the current frame at the N reference positions of the current frame.
In one possible design, the image fusion module 1404 is further specifically configured to: and adding at least one gray level image to the image of the target subject in the current frame to obtain the target frame, wherein the gray level value of the gray level image is larger if the distance between the gray level image and the image of the target subject in the current frame is shorter.
In addition, the specific implementation process and embodiment of the apparatus 1400 may refer to the steps executed by the electronic device in the foregoing method embodiment and the related description, and the technical problem to be solved and the technical effect brought about may also refer to the contents described in the foregoing embodiment, which are not described herein again.
In the present embodiment, the apparatus is presented in a form in which the respective functional modules are divided in an integrated manner. A "module" herein may refer to a specific circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the described functionality. In a simple embodiment, those skilled in the art will appreciate that the apparatus may take the form shown in fig. 15 below.
Fig. 15 is a schematic structural diagram of an electronic device 1500 according to an exemplary embodiment, where the electronic device 1500 may be used to generate a motion trajectory special effect video of a photographic subject according to the foregoing embodiments. As shown in fig. 15, the electronic device 1500 may include at least one processor 1501, communication lines 1502, and memory 1503.
Communication link 1502 may include a path, such as a bus, for communicating information between the aforementioned components.
Alternatively, the computer executed instructions in the embodiments of the present disclosure may also be referred to as application program codes, which are not specifically limited in the embodiments of the present disclosure.
In a particular implementation, as an example, the processor 1501 may include one or more CPUs, such as CPU0 and CPU1 in fig. 15.
In a particular implementation, as an example, the electronic device 1500 may include multiple processors, such as the processor 1501 and the processor 1507 in fig. 15. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, electronic device 1500 may also include communications interface 1504, as one embodiment. The communication interface 1504 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet interface, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
In a particular implementation, as one embodiment, the electronic device 1500 may also include an output device 1505 and an input device 1506. The output device 1505 is in communication with the processor 1501 and may display information in a variety of ways. For example, the output device 1505 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 1506 communicates with the processor 1501 and may receive user input in a variety of ways. For example, the input device 1506 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.
In a specific implementation, the electronic device 1500 may be a desktop, a laptop, a web server, a Personal Digital Assistant (PDA), a mobile phone, a tablet, a wireless terminal device, an embedded device, or a device with a similar structure as in fig. 15. The disclosed embodiments do not limit the type of electronic device 1500.
In some embodiments, the processor 1501 in fig. 15 may cause the electronic device 1500 to perform the methods in the above-described method embodiments by calling the computer-executable instructions stored in the memory 1503.
Illustratively, the functions/implementation processes of the acquisition module 1401, the image segmentation module 1402, the mapping module 1403, and the image fusion module 1404 in fig. 14 may be implemented by the processor 1501 in fig. 15 calling computer-executable instructions stored in the memory 1503.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as the memory 1503 comprising instructions, which are executable by the processor 1501 of the electronic device 1500 to perform the above-described method.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (21)
1. An image processing method, characterized in that the method comprises:
acquiring a current frame and N historical action frames, wherein the current frame and the N historical action frames both comprise a target main body, scenes of the current frame and the N historical action frames are overlapped, the positions of the target main body in the scenes of the N historical action frames are different, and N is a positive integer greater than or equal to 1;
performing image segmentation on the N historical action frames to obtain images of N target bodies corresponding to the N historical action frames respectively;
determining N reference positions in the current frame according to the positions of the N target main bodies in the scenes of the N historical action frames and the scene of the current frame;
and respectively fusing the images of the N target subjects on the N reference positions of the current frame to obtain a target frame.
2. The method of claim 1, wherein prior to obtaining the current frame and the N historical action frames, the method further comprises:
receiving a first selection instruction of a user, wherein the first selection instruction is used for indicating to enter an automatic shooting mode or a manual shooting mode.
3. The method according to claim 2, wherein if the first selection instruction is used to instruct entry into the automatic shooting mode, acquiring the historical motion frame specifically includes:
performing motion detection on a real-time video stream to determine the target subject;
detecting a position of the target subject in a scene in each video frame included in the real-time video stream;
and determining the video frames of which the position change of the target subject in the video frames included in the real-time video stream meets a preset threshold value as the historical action frames.
4. The method according to claim 2, wherein if the first selection instruction is used to indicate that the manual shooting mode is entered, acquiring the historical motion frame specifically includes:
receiving a second selection instruction of the user for the video frames included in the real-time video stream;
and determining a subject of the corresponding position of the second selection instruction in the video frame as the target subject, and determining the video frame as the historical action frame.
5. The method according to any one of claims 1 to 4, wherein performing image segmentation on the historical motion frame to obtain an image of a target subject corresponding to the historical motion frame specifically includes:
reducing an image area corresponding to a target subject in the historical action frame according to a motion detection technology to obtain a target image area in the historical action frame;
and processing the image of the target image area through a deep learning algorithm to obtain a mask image of the target main body corresponding to the historical action frame.
6. The method of claim 5, wherein if there are multiple mask images in which the subject overlaps in the mask image, the method further comprises:
and separating the mask image of the target subject from the mask image overlapped by the subjects according to the depth information of the subjects in the historical action frame.
7. The method according to any one of claims 1 to 6, wherein determining a reference position in the current frame according to the position of the target subject in the scene of a historical action frame and the scene of the current frame specifically comprises:
obtaining, by means of image registration or simultaneous localization and mapping (SLAM), a correspondence between the position of at least one object in the historical action frame and the position of that object in the current frame;
and determining the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
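A sketch of the image-registration branch of claim 7: ORB features matched between the historical frame and the current frame yield a homography that maps the subject's historical position into the current frame. The SLAM-based alternative named in the claim is not shown; feature counts and the RANSAC threshold are illustrative choices.

```python
import cv2
import numpy as np

def map_position_to_current(hist_frame, curr_frame, hist_point):
    """Map a point (x, y) from a historical action frame into the current frame via a homography."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(cv2.cvtColor(hist_frame, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = orb.detectAndCompute(cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # needs at least 4 good matches
    pt = np.float32([[hist_point]])                       # shape (1, 1, 2)
    return cv2.perspectiveTransform(pt, H)[0, 0]          # reference position in the current frame
```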
8. The method according to any one of claims 1 to 7, wherein fusing the images of the N target subjects at the N reference positions of the current frame, respectively, comprises:
performing weighted fusion of the image of each of the N target subjects with the pixel information of the current frame at the corresponding one of the N reference positions.
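The weighted fusion of claim 8 can be illustrated with a feathered mask blend: the subject's mask is softened so the pasted pixels are weighted against the current-frame pixels near the seam. This sketch assumes the subject image and its mask are already rendered full-frame at the reference position; the feathering radius is an illustrative assumption.

```python
import cv2
import numpy as np

def fuse_subject(current_frame, subject_img, subject_mask, feather=15):
    """Blend a segmented subject into the current frame with per-pixel weights.

    subject_img / subject_mask are assumed to be full-frame, with the subject
    already placed at its reference position in the current frame.
    """
    # Feather the binary mask so the subject-to-background transition is gradual.
    weights = cv2.GaussianBlur(subject_mask.astype(np.float32) / 255.0, (0, 0), feather)
    weights = weights[..., None]  # HxWx1, broadcast over the colour channels
    fused = weights * subject_img.astype(np.float32) \
            + (1.0 - weights) * current_frame.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)
```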
9. The method according to any one of claims 1 to 8, wherein, after fusing the images of the N target subjects at the N reference positions of the current frame, respectively, the method further comprises:
adding at least one grayscale image to the image of the target subject in the current frame to obtain the target frame, wherein the shorter the distance between a grayscale image and the image of the target subject in the current frame, the larger the grayscale value of that grayscale image.
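Claim 9 describes a motion-trail effect: desaturated copies of the subject are laid along its path, brighter the closer they lie to the subject's position in the current frame. The sketch below assumes the trail copies and their distances are already available; the intensity ramp is an illustrative choice.

```python
import cv2
import numpy as np

def add_grayscale_trail(target_frame, subject_imgs, subject_masks, distances):
    """Overlay grayscale copies of the subject; copies at shorter distances are brighter.

    subject_imgs / subject_masks: full-frame subject images and masks, one per trail copy.
    distances: distance of each copy from the subject's position in the current frame (pixels).
    """
    out = target_frame.copy()
    max_d = max(distances) + 1e-6
    for img, mask, d in zip(subject_imgs, subject_masks, distances):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        level = 1.0 - 0.7 * (d / max_d)  # shorter distance -> larger grayscale value
        gray3 = cv2.cvtColor((gray * level).astype(np.uint8), cv2.COLOR_GRAY2BGR)
        m = mask > 0
        out[m] = gray3[m]
    return out
```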
10. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a current frame and N historical action frames, wherein the current frame and the N historical action frames each comprise a target subject, the scenes of the current frame and of the N historical action frames overlap, the positions of the target subject in the scenes of the N historical action frames differ from one another, and N is a positive integer greater than or equal to 1;
an image segmentation module, configured to perform image segmentation on the N historical action frames to obtain images of N target subjects respectively corresponding to the N historical action frames;
a mapping module, configured to determine N reference positions in the current frame according to the positions of the N target subjects in the scenes of the N historical action frames and the scene of the current frame;
and an image fusion module, configured to fuse the images of the N target subjects at the N reference positions of the current frame, respectively, to obtain a target frame.
11. The apparatus according to claim 10, further comprising:
a receiving module, configured to receive a first selection instruction from a user, wherein the first selection instruction indicates entry into an automatic shooting mode or a manual shooting mode.
12. The apparatus according to claim 11, wherein, if the first selection instruction indicates entry into the automatic shooting mode, the acquisition module is specifically configured to:
perform motion detection on a real-time video stream to determine the target subject;
detect the position of the target subject in the scene in each video frame comprised in the real-time video stream;
and determine, as the historical action frames, those video frames of the real-time video stream in which the change in position of the target subject meets a preset threshold.
13. The apparatus according to claim 11, wherein, if the first selection instruction indicates entry into the manual shooting mode, the receiving module is further configured to receive a second selection instruction from the user for a video frame comprised in a real-time video stream;
and the acquisition module is further specifically configured to determine the subject at the position indicated by the second selection instruction in the video frame as the target subject, and to determine that video frame as a historical action frame.
14. The apparatus according to any one of claims 10 to 13, wherein the image segmentation module is specifically configured to:
narrow down, by means of motion detection, the image region corresponding to the target subject in the historical action frame to obtain a target image region in the historical action frame;
and process the image of the target image region with a deep learning algorithm to obtain a mask image of the target subject corresponding to the historical action frame.
15. The apparatus according to claim 14, wherein, if the mask images comprise a plurality of mask images in which subjects overlap, the image segmentation module is further configured to:
separate the mask image of the target subject from the mask images with overlapping subjects according to depth information of the subjects in the historical action frame.
16. The apparatus according to any one of claims 10 to 15, wherein the mapping module is specifically configured to:
obtain, by means of image registration or simultaneous localization and mapping (SLAM), a correspondence between the position of at least one object in the historical action frame and the position of that object in the current frame;
and determine the reference position of the target subject in the current frame according to the correspondence and the position of the target subject in the historical action frame.
17. The apparatus according to any one of claims 10 to 16, wherein the image fusion module is specifically configured to:
perform weighted fusion of the image of each of the N target subjects with the pixel information of the current frame at the corresponding one of the N reference positions.
18. The apparatus according to any one of claims 10 to 17, wherein the image fusion module is further configured to:
add at least one grayscale image to the image of the target subject in the current frame to obtain the target frame, wherein the shorter the distance between a grayscale image and the image of the target subject in the current frame, the larger the grayscale value of that grayscale image.
19. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the image processing method of any one of claims 1 to 9.
20. A computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method according to any one of claims 1 to 9.
21. A computer program product which, when run on a computer, causes the computer to perform the image processing method of any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010478673.3A CN113810587B (en) | 2020-05-29 | 2020-05-29 | Image processing method and device |
PCT/CN2021/079103 WO2021238325A1 (en) | 2020-05-29 | 2021-03-04 | Image processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010478673.3A CN113810587B (en) | 2020-05-29 | 2020-05-29 | Image processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113810587A (en) | 2021-12-17 |
CN113810587B (en) | 2023-04-18 |
Family ID: 78745570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010478673.3A Active CN113810587B (en) | 2020-05-29 | 2020-05-29 | Image processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113810587B (en) |
WO (1) | WO2021238325A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114401360B (en) * | 2021-12-07 | 2024-05-31 | 影石创新科技股份有限公司 | Method, device, equipment and medium for generating multi-frame delay special effects of video |
CN114302234B (en) * | 2021-12-29 | 2023-11-07 | 杭州当虹科技股份有限公司 | Quick packaging method for air skills |
CN114531553B (en) * | 2022-02-11 | 2024-02-09 | 北京字跳网络技术有限公司 | Method, device, electronic equipment and storage medium for generating special effect video |
CN115567633A (en) * | 2022-02-24 | 2023-01-03 | 荣耀终端有限公司 | Photographing method, medium, program product and electronic device |
CN114693780A (en) * | 2022-04-11 | 2022-07-01 | 北京字跳网络技术有限公司 | Image processing method, device, equipment, storage medium and program product |
CN115037992A (en) * | 2022-06-08 | 2022-09-09 | 中央广播电视总台 | Video processing method, device and storage medium |
CN115273565A (en) * | 2022-06-24 | 2022-11-01 | 苏州数智源信息技术有限公司 | Airplane apron early warning method, device and terminal based on AI big data |
CN115147441A (en) * | 2022-07-31 | 2022-10-04 | 江苏云舟通信科技有限公司 | Cutout special effect processing system based on data analysis |
CN115689963B (en) * | 2022-11-21 | 2023-06-06 | 荣耀终端有限公司 | Image processing method and electronic equipment |
CN116229337B (en) * | 2023-05-10 | 2023-09-26 | 瀚博半导体(上海)有限公司 | Method, apparatus, system, device and medium for video processing |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120106869A1 (en) * | 2010-10-27 | 2012-05-03 | Sony Corporation | Image processing apparatus, image processing method, and program |
CN102480598A (en) * | 2010-11-19 | 2012-05-30 | 信泰伟创影像科技有限公司 | Imaging apparatus, imaging method and computer program |
CN104113693A (en) * | 2014-07-22 | 2014-10-22 | 深圳市中兴移动通信有限公司 | Shooting method and shooting device |
CN104125407A (en) * | 2014-08-13 | 2014-10-29 | 深圳市中兴移动通信有限公司 | Object motion track shooting method and mobile terminal |
CN104159033A (en) * | 2014-08-21 | 2014-11-19 | 深圳市中兴移动通信有限公司 | Method and device of optimizing shooting effect |
CN104751488A (en) * | 2015-04-08 | 2015-07-01 | 努比亚技术有限公司 | Photographing method for moving track of moving object and terminal equipment |
CN107077720A (en) * | 2016-12-27 | 2017-08-18 | 深圳市大疆创新科技有限公司 | Method, device and the equipment of image procossing |
CN107943837A (en) * | 2017-10-27 | 2018-04-20 | 江苏理工学院 | A kind of video abstraction generating method of foreground target key frame |
CN109922294A (en) * | 2019-01-31 | 2019-06-21 | 维沃移动通信有限公司 | A kind of method for processing video frequency and mobile terminal |
CN110536087A (en) * | 2019-05-06 | 2019-12-03 | 珠海全志科技股份有限公司 | Electronic equipment and its motion profile picture synthesis method, device and embedded equipment |
CN111105434A (en) * | 2018-10-25 | 2020-05-05 | 中兴通讯股份有限公司 | Motion trajectory synthesis method and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101804383B1 (en) * | 2014-01-14 | 2017-12-04 | 한화테크윈 주식회사 | System and method for browsing summary image |
JP2015167676A (en) * | 2014-03-06 | 2015-09-28 | 株式会社横浜DeNAベイスターズ | pitching analysis support system |
CN104243819B (en) * | 2014-08-29 | 2018-02-23 | 小米科技有限责任公司 | Photo acquisition methods and device |
KR102375864B1 (en) * | 2015-02-10 | 2022-03-18 | 한화테크윈 주식회사 | System and method for browsing summary image |
CN105049674A (en) * | 2015-07-01 | 2015-11-11 | 中科创达软件股份有限公司 | Video image processing method and system |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114040117A (en) * | 2021-12-20 | 2022-02-11 | 努比亚技术有限公司 | Photographing processing method of multi-frame image, terminal and storage medium |
CN114302071A (en) * | 2021-12-28 | 2022-04-08 | 影石创新科技股份有限公司 | Video processing method and device, storage medium and electronic equipment |
CN114302071B (en) * | 2021-12-28 | 2024-02-20 | 影石创新科技股份有限公司 | Video processing method and device, storage medium and electronic equipment |
CN114288647A (en) * | 2021-12-31 | 2022-04-08 | 深圳方舟互动科技有限公司 | Artificial intelligence game engine based on AI Designer, game rendering method and device |
CN114288647B (en) * | 2021-12-31 | 2022-07-08 | 深圳方舟互动科技有限公司 | Artificial intelligence game engine based on AI Designer, game rendering method and device |
CN114440920A (en) * | 2022-01-27 | 2022-05-06 | 电信科学技术第十研究所有限公司 | Track flow display method and device based on electronic map |
CN114494328B (en) * | 2022-02-11 | 2024-01-30 | 北京字跳网络技术有限公司 | Image display method, device, electronic equipment and storage medium |
CN114494328A (en) * | 2022-02-11 | 2022-05-13 | 北京字跳网络技术有限公司 | Image display method, image display device, electronic device, and storage medium |
CN115115679A (en) * | 2022-06-02 | 2022-09-27 | 华为技术有限公司 | Image registration method and related equipment |
CN115175005A (en) * | 2022-06-08 | 2022-10-11 | 中央广播电视总台 | Video processing method and device, electronic equipment and storage medium |
CN116048379A (en) * | 2022-06-30 | 2023-05-02 | 荣耀终端有限公司 | Data recharging method and device |
CN116048379B (en) * | 2022-06-30 | 2023-10-24 | 荣耀终端有限公司 | Data recharging method and device |
CN114863036A (en) * | 2022-07-06 | 2022-08-05 | 深圳市信润富联数字科技有限公司 | Data processing method and device based on structured light, electronic equipment and storage medium |
WO2024093854A1 (en) * | 2022-11-02 | 2024-05-10 | 华为技术有限公司 | Image processing method and electronic device |
Also Published As
Publication number | Publication date |
---|---|
WO2021238325A1 (en) | 2021-12-02 |
CN113810587B (en) | 2023-04-18 |
Similar Documents
Publication | Title |
---|---|
CN113810587B (en) | Image processing method and device | |
KR20220030263A (en) | texture mesh building | |
CN111726536A (en) | Video generation method and device, storage medium and computer equipment | |
EP3383036A2 (en) | Information processing device, information processing method, and program | |
CN112287852B (en) | Face image processing method, face image display method, face image processing device and face image display equipment | |
CN112262563A (en) | Image processing method and electronic device | |
EP4109879A1 (en) | Image color retention method and device | |
CN109448050B (en) | Method for determining position of target point and terminal | |
CN111833461A (en) | Method and device for realizing special effect of image, electronic equipment and storage medium | |
CN111833403A (en) | Method and apparatus for spatial localization | |
CN113570614A (en) | Image processing method, device, equipment and storage medium | |
WO2022143311A1 (en) | Photographing method and apparatus for intelligent view-finding recommendation | |
WO2022057384A1 (en) | Photographing method and device | |
WO2022068522A1 (en) | Target tracking method and electronic device | |
CN114926351A (en) | Image processing method, electronic device, and computer storage medium | |
CN113747044A (en) | Panoramic shooting method and device | |
CN117724781A (en) | Playing method for starting animation by application program and electronic equipment | |
WO2022206605A1 (en) | Method for determining target object, and photographing method and device | |
CN115880348B (en) | Face depth determining method, electronic equipment and storage medium | |
CN117729421B (en) | Image processing method, electronic device, and computer-readable storage medium | |
CN115797815B (en) | AR translation processing method and electronic equipment | |
CN116091572B (en) | Method for acquiring image depth information, electronic equipment and storage medium | |
WO2023072113A1 (en) | Display method and electronic device | |
CN117115481B (en) | Positioning method, electronic equipment and medium | |
CN116740777B (en) | Training method of face quality detection model and related equipment thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |