CN113596350A - Image processing method, mobile terminal and readable storage medium

Info

Publication number: CN113596350A
Application number: CN202110853257.1A
Authority: CN
Inventors: 张腾
Original assignee: Shenzhen Transsion Holdings Co Ltd
Current assignee: Shenzhen Transsion Holdings Co Ltd
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2021-11-02
Anticipated expiration: 2041-07-27
Also published as: CN113596350B

Abstract

The application discloses an image processing method, a mobile terminal and a readable storage medium. The method comprises the following steps: constructing a first image layer according to object information of a target object in a video image frame; constructing a second image layer from a target image; and forming a target image frame from the first image layer and the second image layer. In the present application, the target image frame is obtained from the first image layer, constructed from the video image frame, and the second image layer, constructed from the target image. In the video formed by the target image frames, the target object is displayed dynamically and the rest of the picture is displayed statically, so that dynamic and static content are combined in one video and the video becomes more interesting.

Description

Image processing method, mobile terminal and readable storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to an image processing method, a mobile terminal, and a readable storage medium.

Background

With the progress of technology, videos can be produced in an increasing variety of ways, and more and more users tend to produce videos with special effects.

In the course of conceiving and implementing the present application, the inventors found that at least the following problems existed: in the current video production mode, the effect of dynamic and static combination cannot be realized.

The foregoing description is provided for general background information and is not admitted to be prior art.

Disclosure of Invention

In view of the above technical problems, the present application provides an image processing method, a mobile terminal and a readable storage medium to provide a new video production method to achieve the effect of combining moving and static images.

In order to solve the above technical problem, the present application provides an image processing method, including:

constructing a first image layer according to object information of a target object in a video image frame; and

constructing a second image layer according to the target image;

forming a target image frame from the first image layer and the second image layer.

Optionally, the constructing a second image layer according to the target image includes:

and acquiring pixel information of the target image, and constructing the second image layer according to the pixel information of the target image and/or the position information of the target object in the video image frame.

Optionally, if the target image includes the target object;

the constructing the second image layer according to the pixel information of the target image and/or the position information of the target object in the video image frame comprises:

constructing a candidate second image layer according to the pixel information of the target image and/or the position information of the target object in the video image frame;

determining a region to be filled of the target image according to the position information of the target object in the target image, and filling the region to be filled with target pixels to construct an alternative second image layer; and

fusing the candidate second image layer and the alternative second image layer to construct the second image layer.

Optionally, the determining, according to the position information of the target object in the target image, a region to be filled in the target image includes:

determining a region corresponding to the position information of the target object in the target image as the region to be filled; or,

acquiring a first area corresponding to the position information of the target object in the target image and/or a second area corresponding to the position information of the target object in the video image frame, and determining the area to be filled according to the first area and/or the second area.

Optionally, the filling the region to be filled with the target pixel includes:

filling the area to be filled with pixel information corresponding to the area to be filled in the video image frame; or,

and filling the area to be filled with pixel information adjacent to the target object in a target video image frame, wherein optionally, the target video image frame and the video image frame are from the same video.
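The first filling option can be illustrated with a minimal sketch. It assumes, purely for illustration, that images are nested lists of RGB tuples and that the region to be filled is a set of (row, column) coordinates; the name `fill_region` is hypothetical and does not come from the application.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Region = Set[Tuple[int, int]]  # assumed representation: (row, column) pairs


def fill_region(target_image: Image, video_frame: Image, region: Region) -> Image:
    """Return a copy of target_image in which every pixel inside region
    is replaced by the pixel at the same coordinates in video_frame."""
    filled = [row[:] for row in target_image]  # leave the input untouched
    for r, c in region:
        filled[r][c] = video_frame[r][c]
    return filled


# Tiny 2x2 example: the target object occupies pixel (0, 1) of the target image.
target_image = [[(0, 0, 0), (255, 0, 0)],
                [(0, 0, 0), (0, 0, 0)]]
video_frame = [[(10, 10, 10), (20, 20, 20)],
               [(30, 30, 30), (40, 40, 40)]]
alternative_layer = fill_region(target_image, video_frame, {(0, 1)})
```

The second option (filling from pixels adjacent to the target object) would replace the per-pixel copy with some form of neighbourhood sampling, which the application does not specify further.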

Optionally, the forming a target image frame from the first image layer and the second image layer comprises:

when an occlusion object of the target object exists in the video image frame, extracting the occlusion object from the video image frame to construct a third image layer;

and fusing the second image layer, the first image layer and the third image layer according to a preset sequence to form the target image frame.
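The fusion of three layers can be sketched as follows. The sketch assumes that each layer marks transparent pixels with `None` and that the preset sequence, from bottom to top, is second layer, first layer, third layer; the application only states that a preset sequence is used, so this ordering and the name `fuse_layers` are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
Layer = List[List[Optional[Pixel]]]  # None marks a transparent pixel (assumption)


def fuse_layers(layers_bottom_to_top: Sequence[Layer]) -> Layer:
    """Stack layers so that each opaque pixel overwrites whatever lies beneath it."""
    result = [row[:] for row in layers_bottom_to_top[0]]
    for layer in layers_bottom_to_top[1:]:
        for r, row in enumerate(layer):
            for c, pixel in enumerate(row):
                if pixel is not None:
                    result[r][c] = pixel
    return result


BG, OBJ, OCC = (0, 0, 255), (255, 0, 0), (0, 255, 0)
second_layer = [[BG, BG, BG]]        # static target image
first_layer = [[None, OBJ, OBJ]]     # target object taken from the video frame
third_layer = [[None, None, OCC]]    # occluding object in front of the target
target_frame = fuse_layers([second_layer, first_layer, third_layer])
```

Placing the third layer on top keeps the occluding object in front of the target object, as it was in the original video image frame.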

Optionally, the method further comprises:

acquiring edge information of a target object in the video image frame;

and when the edge information of the target object meets a preset condition, judging that an occlusion object of the target object exists in the video image frame.

Optionally, the target object is a target subject identified in the video image frame; or, the target object is a target background, other than a target subject, identified in the video image frame.

Optionally, the target subject is at least one target person object identified from each of the video image frames corresponding to the video; or,

the target subject is at least one target person object identified from the video image frame after switching to a target mode in the video recording process.

Optionally, the target image is acquired in a manner including:

acquiring a preset image, and taking the preset image as the target image; or,

acquiring a target video image frame in a video, and using the target video image frame as the target image, wherein optionally, the target video image frame comprises an initial video image frame or a main body video image frame.

Optionally, there are at least two video image frames;

after forming a target image frame from the first image layer and the second image layer, the method further comprises:

and splicing the target image frames corresponding to the video image frames according to the time information corresponding to the video image frames to form a target video.
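Splicing by time information amounts to ordering the target image frames by the timestamp of the video image frame each one was derived from. The sketch below assumes that each target image frame is paired with a timestamp in seconds and uses strings as stand-ins for image data; `splice_frames` is a hypothetical name.

```python
from typing import List, Tuple

Frame = str  # stand-in for image data in this sketch


def splice_frames(timed_frames: List[Tuple[float, Frame]]) -> List[Frame]:
    """Order target image frames by the timestamp of their source video frame."""
    return [frame for _, frame in sorted(timed_frames, key=lambda item: item[0])]


# Frames may finish processing out of order; the timestamps restore the sequence.
target_video = splice_frames([
    (0.066, "frame_c"),
    (0.000, "frame_a"),
    (0.033, "frame_b"),
])
```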

Optionally, the constructing a first image layer according to object information of a target object in a video image frame includes:

acquiring pixel information of a target object in the video image frame and/or position information of the target object in the video image frame, and constructing the first image layer according to the pixel information and/or the position information.

Optionally, before the step of constructing the first image layer from object information of the target object in the video image frame, the method further comprises:

acquiring the video image frame, and identifying a subject in the video image frame;

when the subject is present in the video image frame, executing the step of constructing a first image layer according to the object information of the target object in the video image frame; and/or,

when the subject is not present in the video image frames, regarding the video image frames as the target image frames.
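The branch on subject presence can be sketched as follows. The detector is passed in as a callable, and `bright_pixels` is a deliberately trivial, hypothetical stand-in (it treats grayscale values above 200 as the subject); the application does not prescribe a particular recognition algorithm.

```python
from typing import Callable, List, Set, Tuple

Image = List[List[int]]            # grayscale image, for brevity
Region = Set[Tuple[int, int]]      # (row, column) pairs covered by the subject


def process_frame(frame: Image, target_image: Image,
                  find_subject: Callable[[Image], Region]) -> Image:
    """Composite the subject over the target image, or pass the frame through."""
    region = find_subject(frame)
    if not region:
        return frame               # no subject: the video frame is the target frame
    out = [row[:] for row in target_image]
    for r, c in region:
        out[r][c] = frame[r][c]    # subject pixels come from the live frame
    return out


def bright_pixels(frame: Image) -> Region:
    """Hypothetical detector: pixels brighter than 200 count as the subject."""
    return {(r, c) for r, row in enumerate(frame)
            for c, p in enumerate(row) if p > 200}


target_image = [[0, 0], [0, 0]]
with_subject = process_frame([[10, 250], [10, 10]], target_image, bright_pixels)
without_subject = process_frame([[10, 20], [30, 40]], target_image, bright_pixels)
```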

The present application further provides a mobile terminal, including a memory and a processor, wherein the memory stores an image processing program which, when executed by the processor, implements the steps of the method described above.

The present application also provides a computer storage medium having a computer program stored thereon, which, when being executed by a processor, carries out the steps of the method as described above.

As described above, the image processing method of the present application is applied to a mobile terminal. During video recording, each video image frame is processed: for example, a first image layer is constructed according to object information of a target object in the video image frame, a second image layer is constructed according to the target image, and a target image frame is then formed from the first image layer and the second image layer. In the target image frame obtained by the present application, the first image layer displays the target object acquired in real time, and the second image layer displays a fixed target image. When the target image frames form a video, the object information of the target object differs from frame to frame, while the picture other than the target object is formed by the fixed target image. When the video is played, the target object is displayed dynamically and the rest of the picture is displayed statically. This new video production method combines dynamic and static content and makes the video more interesting.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.

Fig. 1 is a schematic hardware structure diagram of a mobile terminal implementing embodiments of the present application;

Fig. 2 is a communication network system architecture diagram according to an embodiment of the present application;

Fig. 3 is a flowchart of an image processing method in a first embodiment of the present application;

Fig. 4 is a flowchart of an image processing method in a second embodiment of the present application;

Fig. 5 is a flowchart illustrating a method for constructing a second image layer according to a second embodiment of the present application;

Fig. 6 is a flowchart illustrating an image processing method according to a third embodiment of the present application;

Fig. 7 is a flowchart illustrating an image processing method according to a fourth embodiment of the present application;

Fig. 8 is a flowchart illustrating an image processing method according to a fifth embodiment of the present application;

Fig. 9 is a schematic diagram of mask information of a video image frame in a first embodiment of the present application;

Fig. 10 is a schematic diagram illustrating the processing effect when an occluding object is present in the video image frame in the third embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element. In addition, identically named components, features, and elements in different embodiments of the present application may have different meanings, as determined by their interpretation in the embodiment or by the further context within the embodiment.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The word "if" as used herein may be interpreted as "when", "upon" or "in response to a determination", depending on the context. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or," "and/or," "including at least one of the following," and the like, as used herein, are to be construed as inclusive, meaning any one or any combination. For example, "includes at least one of: A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; as a further example, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.

It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless explicitly stated herein. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, in different orders, and may be performed alternately or at least partially with respect to other steps or sub-steps of other steps.

The word "if", as used herein, may be interpreted as "when", "upon", "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.

It should be noted that step numbers such as S10 and S20 are used herein for the purpose of more clearly and briefly describing the corresponding content, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S20 first and then S10 in specific implementation, which should be within the scope of the present application.

It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module", "component" or "unit" may be used mixedly.

The mobile terminal may be implemented in various forms. For example, the mobile terminal described in the present application may include mobile terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and fixed terminals such as a Digital TV, a desktop computer, and the like.

The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.

Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, A/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

The following describes each component of the mobile terminal in detail with reference to fig. 1:

The radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call. Specifically, it receives downlink information from a base station and forwards it to the processor 110 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 may also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplex Long Term Evolution), and TDD-LTE (Time Division Duplex Long Term Evolution).

WiFi belongs to short-distance wireless transmission technology, and the mobile terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 102, and provides wireless broadband internet access for the user. Although fig. 1 shows the WiFi module 102, it is understood that it does not belong to the essential constitution of the mobile terminal, and may be omitted entirely as needed within the scope not changing the essence of the invention.

The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.

The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound in a phone call mode, a recording mode, a voice recognition mode, or the like, and may process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101 and then output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.

The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor that may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.

The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Optionally, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Optionally, the touch detection device detects the touch orientation of the user, detects a signal caused by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. Optionally, the touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Optionally, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

Alternatively, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.

The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.

The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area. Optionally, the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, and the like), and the like; the data storage area may store data created according to the use of the mobile terminal (such as audio data, a phonebook, etc.), and the like. Optionally, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, optionally, the application processor mainly handles operating systems, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.

The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.

Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.

In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based is described below.

Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present disclosure, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.

Optionally, the UE201 may be the terminal 100 described above, and is not described herein again.

The E-UTRAN 202 includes an eNodeB 2021 and other eNodeBs 2022, among others. Optionally, the eNodeB 2021 may be connected with the other eNodeBs 2022 through a backhaul (e.g., an X2 interface), the eNodeB 2021 is connected to the EPC 203, and the eNodeB 2021 may provide the UE 201 with access to the EPC 203.

The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving gateway) 2034, a PGW (PDN gateway) 2035, and a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME2031 is a control node that handles signaling between the UE201 and the EPC203, providing bearer and connection management. HSS2032 is used to provide registers to manage functions such as home location register (not shown) and holds subscriber specific information about service characteristics, data rates, etc. All user data may be sent through SGW2034, PGW2035 may provide IP address assignment for UE201 and other functions, and PCRF2036 is a policy and charging control policy decision point for traffic data flow and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).

The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.

Although the LTE system is described as an example, it should be understood by those skilled in the art that the present application is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.

Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are provided.

First embodiment

Fig. 3 shows a flowchart of the image processing method of the present embodiment, and referring to fig. 3, the image processing method may specifically include the following steps:

step S10, constructing a first image layer according to the object information of the target object in the video image frame;

step S20, constructing a second image layer according to the target image;

step S30, forming a target image frame from the first image layer and the second image layer.

Optionally, the present embodiment is applied to a mobile terminal, and the mobile terminal has a video recording function. The embodiment can be applied to the process of image processing of each frame of video image frame by the mobile terminal in the video recording process. Optionally, the present embodiment may also be applied to a process of performing image processing on each frame of video image frame in a formed video after the video recording is finished, and the following describes processing each frame of video image frame in the video recording process as an example.

Optionally, in the process of recording the video by the mobile terminal, after the video image frame is collected by the camera, the video image frame is transmitted in two ways. Optionally, one approach is to transmit the video image frame to an image processor, and the image processor constructs a first image layer according to object information of a target object in the video image frame after receiving the video image frame; acquiring a target image, and constructing a second image layer based on the target image; the target image frame is then formed from the first image layer and the second image layer. Optionally, another approach is to transmit the video image frames to a preview interface for displaying the preview interface, so that a user can view video content before image processing through the preview interface during recording of a video.

In this embodiment, the object information of the target object includes pixel information and/or position information. Optionally, pixel information of the target object in the video image frame and/or position information of the target object in the video image frame may be acquired, and the first image layer may be constructed according to the pixel information and/or the position information of the target object in the video image frame. Optionally, after the mobile terminal acquires the video image frame, a target object in the video image frame is identified, and the video image frame is subjected to target object extraction processing. Optionally, the tracked target object is segmented to extract the target object. Alternatively, mask information of a display area including the target object is determined based on the position information of the target object (as shown in fig. 9), and then a first image layer is formed based on the pixel information of the target object and the mask information. Since the first image layer is constructed and formed according to the pixel information of the target object and the position information of the target object, the first image layer displays the target object only in the display area. If the target object is a dynamic object during the video recording process, the positions of the target object in the video image frames at different times may be different. That is, the position of the target object in the first image layer constructed based on different video image frames is different.
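The construction of the first image layer from the mask information and the pixel information can be sketched as follows. This is only an illustration under stated assumptions: the grid-of-values image representation, the use of None for transparent positions, and the names build_mask and build_first_layer are hypothetical and are not specified by the present application, and the identification and segmentation of the target object are assumed to have already produced its positions.

```python
from typing import List, Optional, Set, Tuple

Pixel = int                                  # assumption: one grey value per pixel
Layer = List[List[Optional[Pixel]]]          # None marks a transparent position


def build_mask(height: int, width: int,
               object_positions: Set[Tuple[int, int]]) -> List[List[bool]]:
    """Mask information: True inside the display area of the target object."""
    return [[(r, c) in object_positions for c in range(width)]
            for r in range(height)]


def build_first_layer(frame: List[List[Pixel]],
                      mask: List[List[bool]]) -> Layer:
    """Keep the frame's pixels where the mask is True, transparent elsewhere."""
    return [[px if inside else None for px, inside in zip(row, mask_row)]
            for row, mask_row in zip(frame, mask)]


frame = [[10, 11, 12],
         [13, 99, 15]]                       # 99 stands for the target object
mask = build_mask(2, 3, {(1, 1)})
first_layer = build_first_layer(frame, mask)
```

In this sketch the first image layer carries the target object only inside the masked display area, and every other position stays transparent so that another layer can supply its pixels.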

Optionally, the present embodiment further performs static layer processing on the video image frame. Optionally, a target image is obtained to form the second image layer, and the second image layer performs pixel filling on a position except for a position where the target object is located in the first image layer, so that the first image layer forms a complete image picture. Optionally, each of the video image frames constructs a second image layer using the target image. In this way, when a video is formed, the image displayed by the second image layer is fixed, and other images except the target object are displayed in a static state.

As can be seen, the present embodiment forms a target image frame based on the first image layer and the second image layer. When a video is formed according to the target image frame, a first image layer presents a dynamic picture, a second image layer presents a static picture, and the video formed by splicing the target image frames formed by the first image layer and the second image layer presents a part of dynamic objects and a part of static objects. Therefore, the mobile terminal can record the video combined by moving and static, and the interestingness of the video is increased.
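The dynamic and static combination described above can be illustrated with a small sketch. It assumes, purely for illustration, that a layer is a grid in which None marks a transparent position and that the first image layer is placed over the second image layer; the name compose_target_frame is hypothetical.

```python
from typing import List, Optional

Layer = List[List[Optional[int]]]            # None marks a transparent position


def compose_target_frame(first_layer: Layer,
                         second_layer: List[List[int]]) -> List[List[int]]:
    """Show the first layer where it has pixels, the second layer elsewhere."""
    return [[top if top is not None else bottom
             for top, bottom in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(first_layer, second_layer)]


second_layer = [[1, 2, 3, 4]]                # fixed target image, reused for every frame
first_layers = [[[None, 9, None, None]],     # target object (9) at column 1
                [[None, None, 9, None]]]     # target object has moved to column 2
target_frames = [compose_target_frame(layer, second_layer)
                 for layer in first_layers]
```

Across the two resulting target image frames only the position of the target object changes, while the pixels supplied by the second image layer remain the same.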

Alternatively, the target subject may be a person, an animal, or other object such as a vehicle. The target subject may be preset by a user, or may be automatically identified by the mobile terminal based on a video image frame. The target subject can be a subject tracked from the Nth video image frame, and can also be a subject identified in each image frame. N is a positive integer greater than or equal to 1.

Next, taking the target subject as a person as an example, the steps of the image processing method of the present embodiment are described: the target object may be a person identified in the video image frame or a background object other than the person in the video image frame.

When the target object is a person identified in a video image frame, the first image layer is a main body layer of the target image frame, and the second image layer is a background layer of the target image frame. And forming the target image frame by superposing the first image layer and the second image layer up and down, wherein the video formed by the target image frame realizes the effect that the background is static and the character is dynamically displayed.

When the target object is a background object except a person in a video image frame, the first image layer is a background layer of the target image frame, and the second image layer is a main body layer of the target image frame. And forming the target image frame by vertically superposing the second image layer and the first image layer, wherein the video formed by the target image frame realizes the effect that the background is dynamically displayed and the character is statically displayed. Optionally, the target image is an image containing a target subject.

Alternatively, the target subject may be at least one target human object identified from respective video image frames corresponding to the video. Optionally, determining the same person tracked in each video image frame as a target subject; or the tracked moving person in each video image frame is taken as a target subject; or taking the person identified in the preset recording time length as the target subject; or the Nth identified person or the Nth identified moving person at the beginning of the video recording is taken as the target subject.

Optionally, the target subject may also be at least one target human object identified from the video image frames during video recording after switching to a target mode. Optionally, the video image frame may include a first frame video image frame after the target mode is switched, or an nth frame video image frame. Alternatively, the switching target mode may be a switching recording mode during video recording. For example, the target mode may be a dynamic and static combination mode, so that the video presents partial video segment dynamic and static combination display, the effect of partial video segment dynamic display is achieved, and the display mode of the video is enriched.

In an embodiment, the target image may be a preset fixed image, or may be any frame image in the same video. Alternatively, a preset image may be acquired, and the preset image may be taken as the target image. Optionally, a target video image frame in a video may also be acquired, and the target video image frame is taken as the target image. Optionally, the target video image frame comprises an initial video image frame or any frame of a subject video image frame.

Optionally, when the target image is a preset image, the second image layer is a layer formed by the preset image. And the video formed by the target image frame statically displays the content of the preset image. Alternatively, the preset image may be a user-defined image, such as an image that is set to be statically displayed in a video setting of the mobile terminal. The preset image may also be a default image set by the mobile terminal when the mobile terminal leaves the factory. Or, the mobile terminal can also identify a recording scene based on a video recording process, and then download an image matched with the current recording scene based on big data analysis and comparison. When the preset image is adopted as the target image, the preset image can be directly used as the second image layer.

Optionally, the target image may also be any image frame in the same video. Using an image frame from the same video as the target image avoids degrading the presentation effect of the video when the recording scene changes greatly. Optionally, the target image is an initial video image frame. It can be understood that, when the mobile terminal records a video, the mobile terminal is fixed at the same position for recording. Therefore, scenes in the same video are scenes in the same field of view. When the initial video image frame is taken as the target image, the scene of the video remains unchanged throughout; the scene is static while the target object is dynamic, so that the video is more harmonious and natural, and video distortion can be avoided. Alternatively, a subject video image frame may be used as the target image, and the subject video image frame may be any video image frame including a subject. For example, the target image is the video image frame in which the subject is recognized for the first time; using this video image frame as the target image avoids degrading the video display effect due to a large display difference between the picture before and the picture after the subject appears.

Optionally, based on the different acquisition manners of the target image, the construction manner of the second image layer in this embodiment includes, but is not limited to, the following manners: pixel information of the target image can be directly adopted as the second image layer; the pixel information of the target image can be acquired, and the second image layer is constructed according to the pixel information of the target image and/or the position information of the target object in the video image frame.

Optionally, the other positions in the video image frame except for the position information of the target object are filled with the pixel information of the target image to form the second image layer.

In this embodiment, in the video recording process, each video image frame is processed. Alternatively, the process of processing each video image frame may be: constructing a first image layer according to pixel information of a target object in a video image frame and position information of the target object; constructing a second image layer according to the target image; a target image frame is then formed from the first image layer and the second image layer. And displaying the target object acquired in real time on the first image layer in the processed target image frame, displaying the fixed target image on the second image layer, and dynamically displaying the target object based on the change of the recording process. Therefore, the video formed by the target image frame displays the picture in a dynamic and static combination mode, and the interestingness of the video is increased.

Second embodiment

Referring to fig. 4, based on the first embodiment, step S20 includes:

step S21, acquiring pixel information of the target image, and constructing the second image layer according to the pixel information of the target image and/or the position information of the target object in the video image frame.

Optionally, the second image layer is formed by filling with pixel information of the target image; for example, the entire second image layer may be filled with pixel information of the target image. Optionally, only the positions in the second image layer other than the position of the target object in the video image frame may be filled with pixel information of the target image to form the second image layer. In that case, the position of the target object of the video image frame in the second image layer holds no pixel information of the target image, so that when the first image layer and the second image layer are fused, the target object on the first image layer is not influenced by the pixel information of the second image layer, which makes the fusion easier and the fusion effect better.

Optionally, if the target image does not include a target object, it is determined that the target image is a pure background image, and even if the target object exists in the current video image frame, when the second image layer and the first image layer are fused, two target objects or a ghost of the target objects is not presented. Thus, the second image layer can be constructed directly using the pixel information of the target image.

Optionally, if the target image includes the target object, the position of the target object in the target image may be inconsistent with the position of the target object in the video image frame, that is, the target object has moved. If the second image layer is then directly filled with the pixel information of the target image, and that pixel information includes the pixel information of the target object, the target object is presented at inconsistent positions in the first image layer and the second image layer. When the first image layer and the second image layer are fused, two target objects or a ghost of the target object is presented in the target image frame.

Optionally, the present embodiment proposes an embodiment of further constructing a second image layer. Referring to fig. 5, step S21 includes:

step S211, constructing a candidate second image layer according to the pixel information of the target image and/or the position information of the target object in the video image frame;

step S212, determining a region to be filled of the target image according to the position information of the target object in the target image, and filling the region to be filled with target pixels to construct an alternative second image layer;

step S213, fusing the candidate second image layer and the alternative second image layer to construct the second image layer.

In this embodiment, when constructing the candidate second image layer, the candidate second image layer may be directly filled based on the pixel information of the target image. The filling area for the pixel information of the target image may also be determined based on the position information of the target object in the video image frame; for example, the filling area is the positions other than the position corresponding to the position information of the target object in the video image frame. Based on the above, after the candidate second image layer is constructed, the target object of the target image may exist in the candidate second image layer.

Optionally, after the position information of the target object in the target image is determined, the region to be filled is determined based on the position information, and then the region to be filled is filled with the target pixels. The alternative second image layer is formed after the region to be filled is filled. The candidate second image layer and the alternative second image layer are then fused, so that the position of the target object in the second image layer is filled with the target pixels and the pixel information of the target object in the second image layer is hidden. In this way, when the second image layer and the first image layer are fused into the target image frame, the target image frame only displays the target object in the first image layer, and the problem that two target objects or a ghost of the target object is presented in the target image frame is solved.
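Steps S211 to S213 can be sketched as follows. The sketch assumes that the region to be filled is given as a set of (row, column) positions and that the target pixels come from a separate grid of the same size; the name build_second_layer and both of these representations are hypothetical choices made for illustration.

```python
from typing import List, Optional, Set, Tuple

Image = List[List[int]]
Layer = List[List[Optional[int]]]            # None marks a transparent position


def build_second_layer(target_image: Image,
                       region_to_fill: Set[Tuple[int, int]],
                       target_pixels: Image) -> Image:
    # S211: candidate layer, filled directly with the target image's pixels
    candidate: Image = [row[:] for row in target_image]
    # S212: alternative layer, target pixels inside the region to be filled only
    alternative: Layer = [
        [target_pixels[r][c] if (r, c) in region_to_fill else None
         for c in range(len(row))]
        for r, row in enumerate(target_image)]
    # S213: fuse, letting the alternative layer hide the old target object
    return [[alt if alt is not None else cand
             for alt, cand in zip(alt_row, cand_row)]
            for alt_row, cand_row in zip(alternative, candidate)]


target_image = [[1, 9, 3]]                   # 9 is the target object in the target image
target_pixels = [[1, 2, 3]]                  # assumed source of the target pixels
second_layer = build_second_layer(target_image, {(0, 1)}, target_pixels)
```

After fusion the second image layer no longer contains the target object of the target image, so a later fusion with the first image layer would show the target object only once.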

In an embodiment, the determination manner of the region to be filled includes, but is not limited to, one of the following:

optionally, a region corresponding to the position information of the target object in the target image may be determined as the region to be filled. Optionally, after a target object is identified in the target image, the target object is extracted, and the position of the target object is determined as the region to be filled.

Optionally, the area to be filled may be a frame selection area capable of completely frame selecting the target object, or the area to be filled may be an area formed by a contour of the target object.

In some embodiments, after the region to be filled is determined, the region to be filled is directly filled with the target pixels. Optionally, in other embodiments, after the area to be filled is determined, the area to be filled is divided so that the area to be filled is a blank area, and at this time, the blank area is filled with the target pixels. Therefore, the fusion effect of the target pixel and other areas of the target image is better.

Optionally, the region to be filled is a non-overlapping region between a target object in the target image and a target object in the video image frame, and only the non-overlapping region is filled.

Optionally, a first region corresponding to position information of a target object in the target image and/or a second region corresponding to position information of the target object in the video image frame may be acquired, and the area to be filled is determined according to the first area and/or the second area. For example, the overlapping area of the first area and the second area is obtained, the area of the first area other than the overlapping area is taken as the area to be filled, and the area to be filled is then filled with the target pixels. Compared with the foregoing embodiment, this reduces the amount of pixel filling and improves the image processing efficiency.
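The determination of the area to be filled from the first area and the second area amounts to a set difference, which can be sketched as below. Representing each area as a set of (row, column) positions and the name region_to_fill are assumptions made for illustration.

```python
from typing import Set, Tuple

Region = Set[Tuple[int, int]]


def region_to_fill(first_region: Region, second_region: Region) -> Region:
    """Part of the first region that does not overlap the second region."""
    overlap = first_region & second_region
    return first_region - overlap


first_region = {(0, 0), (0, 1), (0, 2)}      # target object in the target image
second_region = {(0, 2), (0, 3), (0, 4)}     # target object in the video image frame
to_fill = region_to_fill(first_region, second_region)
```

Here position (0, 2) is excluded from filling, which is consistent with the reduced filling workload described above: in the fused target image frame that position would be covered by the target object of the first image layer.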

Optionally, the first region refers to a region formed by lines constituting an outer edge of the target object in the target image, or a region in which the target object is framed. The second area is an area formed by lines constituting an outer edge of the target object shape in the video image frame, or an area where the target object is framed.

Optionally, the target pixel may be a pixel at any other position in the target image except for the target object, may also be a pixel of a preset image, or may also be a pixel in an image matched with the target image and automatically matched based on big data. Optionally, the target pixel may also be a pixel in a current video image frame, or may be a pixel of another video image frame that is the same video as the current video image frame.

Optionally, when the target pixel is a pixel at any other position in the target image except the target object, the filling the region to be filled with the target pixel includes: and filling the area to be filled with the pixel information adjacent to the target object in the target image so as to avoid overlarge difference between the pixel information of the area to be filled and other pixel information of the target image and avoid influencing the display effect.

Optionally, when the target pixel is a pixel in the current video image frame, the filling the region to be filled with the target pixel includes: and filling the area to be filled with pixel information corresponding to the area to be filled in the video image frame. Namely, the target position information in the video image frame is determined based on the position information of the area to be filled, then the pixel information on the target position information in the video image frame is obtained, and the area to be filled is filled by adopting the pixel information on the target position. It can be understood that the position information of the region to be padded coincides with the target position information in the video image frame.

In the process of recording the video by the mobile terminal, the mobile terminal is fixed at the same position, and pictures of the same view field at different times are shot, so that the pictures of the video image frames collected at different times are consistent for the fixed object, and the area to be filled is filled by adopting the pixels of the current video image frame, so that a better display effect can be achieved.

Optionally, taking the example that the target image is a target video image frame in the same video, the captured target video image frame and the captured fixed object in the current video image frame are in the same position. Based on this, if only the target object moves in the image frame, the position of the target object in the target video image frame is the first position, the position in the current video image frame is the second position, and the first position and the second position do not overlap (the target object has a position change). The background of the first position of the target video image frame is occluded by the target object, while the actual background is captured at the first position of the current video image frame. At this time, the actual background pixel is used to fill the pixel at the first position in the target video image frame, and the filling of the pixel achieves more natural and smooth fusion, so that the display effect of the background is more real and natural.
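A minimal sketch of filling the first position of the target video image frame with the actual background captured in the current video image frame. It assumes a fixed camera, identical frame sizes, and a region given as (row, column) positions; the name fill_from_current_frame is hypothetical.

```python
from typing import List, Set, Tuple

Image = List[List[int]]


def fill_from_current_frame(target_frame: Image, current_frame: Image,
                            first_position: Set[Tuple[int, int]]) -> Image:
    """Copy the pixels at the same coordinates from the current frame."""
    filled = [row[:] for row in target_frame]    # leave the input untouched
    for r, c in first_position:
        filled[r][c] = current_frame[r][c]
    return filled


target_frame = [[1, 9, 3, 4]]    # target object (9) hides the background at column 1
current_frame = [[1, 2, 3, 9]]   # object has moved; column 1 now shows the background
second_layer = fill_from_current_frame(target_frame, current_frame, {(0, 1)})
```

The sketch relies on the first position and the second position not overlapping; if they overlapped, the copied pixels would contain part of the target object rather than background.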

Optionally, in another embodiment, when the target pixel is a pixel of another video image frame of the same video as the current video image frame, the filling the area to be filled with the target pixel includes: and filling the area to be filled with pixel information adjacent to the target object in the target video image frame.

Optionally, the target object refers to a target object in the target video image frame, and the present embodiment fills the region to be filled with pixel information adjacent to the target object in the target video image frame. And filling the region to be filled with the background pixels adjacent to the target object based on the fact that the pixel information adjacent to the target object is closer to the background information of the corresponding position of the target object, so that the second image layer is more natural, and the region to be filled is supplemented to achieve a better display effect.
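Filling the region to be filled with adjacent pixel information can be sketched as a breadth-first propagation from the known pixels into the region. This particular propagation strategy, the 4-neighbour adjacency, and the name fill_from_adjacent_pixels are assumptions chosen for illustration; the present application does not prescribe how adjacent pixels are selected.

```python
from collections import deque
from typing import List, Set, Tuple

Image = List[List[int]]


def fill_from_adjacent_pixels(image: Image,
                              region: Set[Tuple[int, int]]) -> Image:
    """Give each region pixel the value of a nearest pixel outside the region."""
    filled = [row[:] for row in image]
    remaining = set(region)                  # region positions still unfilled
    queue = deque((r, c)
                  for r in range(len(image))
                  for c in range(len(image[0]))
                  if (r, c) not in remaining)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            neighbour = (r + dr, c + dc)
            if neighbour in remaining:       # also guards the image bounds
                filled[neighbour[0]][neighbour[1]] = filled[r][c]
                remaining.discard(neighbour)
                queue.append(neighbour)
    return filled


image = [[5, 0, 0, 7]]                       # columns 1 and 2 held the target object
result = fill_from_adjacent_pixels(image, {(0, 1), (0, 2)})
```

Each filled position takes the value of the closest surrounding background pixel, which keeps the filled region close to its neighbourhood in this simplified setting.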

Optionally, the target video image frame may be the same image frame as the target image, such as the target video image frame being an initial video image frame (the captured Nth video image frame, or the captured Nth video image frame in which the subject is identified); or the target video image frame may be a different image frame from the target image, such as a video image frame preceding the current video image frame.

Third embodiment

Referring to fig. 6, this embodiment proposes that before forming a target image frame according to the first image layer and the second image layer, the method further includes:

step S40, when an occlusion object of the target object exists in the video image frame, extracting the occlusion object from the video image frame to construct a third image layer.

Optionally, when a third image layer is constructed, in the process of forming the target image frame, the third image layer is directly overlaid on the first image layer and the second image layer, so that the occlusion object in the third image layer covers the target object, forming a display effect in which the target object is occluded by the occlusion object.

Or, when a third image layer is constructed, the forming a target image frame according to the first image layer and the second image layer comprises:

step S31, fusing the second image layer, the first image layer and the third image layer according to a preset sequence to form the target image frame.

Namely, when an occlusion object exists in the video image frame, a third image layer is constructed in advance, the third image layer serves as an occlusion layer, and the three image layers are fused in the order of the background layer (the second image layer), the main body layer (the first image layer) and the occlusion layer (the third image layer) to form the target image frame. It can be understood that the preset sequence refers to a sequence from bottom to top, that is, the second image layer is located at the lowest layer, the first image layer is located at the middle layer, and the third image layer is located at the uppermost layer. Based on the principle that an upper layer blocks a lower layer, the occlusion object on the third image layer blocks the first image layer and the second image layer. In the formed video, the occlusion object is located between the target object and the camera of the mobile terminal, so that the visual effect that the occlusion object occludes the target object is formed, the layering sense is embodied, and the three-dimensional quality and the realism of the video are improved (as shown in fig. 10).

Optionally, the third image layer is constructed in a manner including, but not limited to, the following manners: the occlusion object is extracted from the video image frame, a filling area of a third image layer is determined based on position information of the occlusion object, and the filling area is filled with pixel information of the occlusion object to form the third image layer.

It can be understood that, in the third image layer, the area other than the area where the occlusion object is located is set as a transparent area, so that when the third image layer is fused onto the second image layer, only the area where the occlusion object is located covers the second image layer, and the remaining areas of the second image layer are not covered.
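The bottom-to-top fusion of the three image layers can be sketched as follows, assuming that None marks a transparent position and that the lowest layer is fully opaque. The name fuse_bottom_to_top is hypothetical.

```python
from typing import List, Optional

Layer = List[List[Optional[int]]]            # None marks a transparent position


def fuse_bottom_to_top(layers: List[Layer]) -> Layer:
    """Fuse layers in the given order; each later layer covers earlier ones."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for r, row in enumerate(layer):
            for c, px in enumerate(row):
                if px is not None:           # transparent areas cover nothing
                    result[r][c] = px
    return result


second_layer = [[1, 2, 3, 4]]                # background layer (lowest)
first_layer = [[None, 9, 9, None]]           # main body layer with the target object (9)
third_layer = [[None, None, 7, 7]]           # occlusion layer with the occlusion object (7)
target_frame = fuse_bottom_to_top([second_layer, first_layer, third_layer])
```

In the result the occlusion object covers part of the target object and part of the background, while the transparent area of the third image layer leaves everything beneath it visible.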

Optionally, the identification manner of the occlusion object includes: acquiring edge information of a target object in the video image frame; and when the edge information of the target object meets a preset condition, judging that an occlusion object of the target object exists in the video image frame.

Optionally, the preset condition may be that the edge information of the target object does not have continuity. The edge information of the target object may refer to pixel information of an edge of the area where the target object is located. The edge information having continuity means that the pixel information of the edge of the area where the target object is located is continuous, and the edge information not having continuity means that the pixel information of the edge of the area where the target object is located is discontinuous. Optionally, pixel information of an edge of the area where the target object is located in the video image frame is obtained, and whether the edge information of the target object meets the preset condition is determined by judging whether the pixel information of the edge of the area is the same or is a continuous function. If the edge information of the target object meets the preset condition, the edge of the area where the target object is located is incomplete, and it is therefore determined that an occlusion object of the target object exists in the video image frame.

In the video recording process, when an occlusion object appears on a target object, the target object in a video image frame acquired by a camera is at least partially occluded by the occlusion object, so that the edge information of the target object does not meet the continuity requirement. Based on this, in this embodiment, edge information of the target object is identified, and then whether an occlusion object of the target object exists is determined according to the continuity of the edge information: if the edge information is continuous, it is determined that no occlusion object exists on the target object, and/or if the edge information is not continuous, it is determined that an occlusion object exists on the target object.
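One possible reading of the continuity judgment is sketched below. It assumes that the edge information is available as an ordered list of edge positions and treats the edge as continuous when every consecutive pair of positions, including the last and the first, is adjacent. This interpretation and the names edge_is_continuous and occluder_present are hypothetical; the present application does not fix a particular continuity test.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def edge_is_continuous(contour: List[Point]) -> bool:
    """True if each consecutive pair of edge points is 8-connected (closed loop)."""
    if len(contour) < 3:
        return False
    shifted = contour[1:] + contour[:1]      # pairs each point with its successor
    return all(max(abs(r1 - r0), abs(c1 - c0)) <= 1
               for (r0, c0), (r1, c1) in zip(contour, shifted))


def occluder_present(contour: List[Point]) -> bool:
    """Preset condition of this sketch: the edge lacks continuity."""
    return not edge_is_continuous(contour)


intact_edge = [(0, 0), (0, 1), (1, 1), (1, 0)]
broken_edge = [(0, 0), (0, 1), (0, 4), (1, 4), (1, 0)]   # gap between columns 1 and 4
```

With these inputs occluder_present reports no occlusion object for intact_edge and reports an occlusion object for broken_edge.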

Fourth embodiment

Referring to fig. 7, in this embodiment, based on all the embodiments described above, if at least two target image frames are received by an encoder of the mobile terminal, that is, at least two video image frames are processed based on all the embodiments described above, the encoder encodes each target image frame based on a time sequence of acquiring the video image frames to form a target video. Optionally, after forming a target image frame from the first image layer and the second image layer, the method further comprises:

step S50, stitching target image frames corresponding to the video image frames according to the time information corresponding to the video image frames to form a target video.

Optionally, the time information is a time stamp corresponding to recording each video image frame. In this embodiment, target image frames corresponding to the video image frames are spliced according to the sequence of the timestamps corresponding to the video image frames to form a target video.
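The ordering step can be sketched as a sort by timestamp. The (timestamp, frame) pair representation and the name stitch_by_timestamp are assumptions for illustration, and the actual encoding performed by the encoder is outside the scope of this sketch.

```python
from typing import Any, List, Tuple


def stitch_by_timestamp(timed_frames: List[Tuple[float, Any]]) -> List[Any]:
    """Order target image frames by the timestamps of their video image frames."""
    ordered = sorted(timed_frames, key=lambda item: item[0])
    return [frame for _, frame in ordered]


timed_frames = [(0.066, "frame_c"), (0.000, "frame_a"), (0.033, "frame_b")]
target_video = stitch_by_timestamp(timed_frames)
```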

The target video in this embodiment is formed by the target image frames, and the target image frames are fused based on a first image layer and a second image layer, the second image layer in each target image frame displays the same image, and the first image layer displays a target object captured by each video image frame. Therefore, the formed video presents dynamic display of the target object, and other images except the target object are displayed statically, so that the recorded video achieves the effect of dynamic and static combination, and the video interestingness is increased.

Fifth embodiment

Referring to fig. 8, this embodiment is a further embodiment based on all the above embodiments. In this embodiment, a dynamic and static combined video is recorded when a subject is present, and a standard dynamic video is recorded when no subject is present, so as to enrich the video recording function and increase the diversity of display effects of the video. Optionally, the method further includes:

step S60, acquiring the video image frame, and identifying a main body in the video image frame;

step S70, determining whether the subject is present in the video image frame;

if the subject exists in the video image frame, executing step S10: constructing a first image layer according to object information of a target object in a video image frame; and/or,

if the subject does not exist in the video image frame, step S80 is executed to use the video image frame as the target image frame.

In this embodiment, after receiving the video image frame, it is identified in advance whether there is a subject, such as a person, an animal, or another object, in the video image frame. If the main body exists in the video image frame, the video image frame is processed according to the method of the above embodiments, and then a dynamic and static combined video can be shot. And/or if the main body does not exist in the video image frame, namely the video image frame contains static plants, sky, ground and other things, directly transmitting the collected video image frame as a target image frame to an encoder for encoding to form a video. It can be understood that in the video recording process, the recording of the dynamic and static combined video and the standard video is combined, the effect that the background is static when a main body appears in the video and the background is dynamic when the main body leaves the video is achieved, and the interestingness of video recording is increased.
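The branch between steps S10 and S80 can be sketched as follows. The subject detector and the layer-based processing are passed in as callables because the present application does not tie them to specific algorithms; the names select_target_frame, find_subject and combine, as well as the one-dimensional frames in the usage lines, are hypothetical.

```python
from typing import Callable, List, Optional

Frame = List[int]


def select_target_frame(frame: Frame,
                        find_subject: Callable[[Frame], Optional[int]],
                        combine: Callable[[Frame, int], Frame]) -> Frame:
    """Use layer-based processing when a subject exists, else pass the frame on."""
    subject = find_subject(frame)            # S60 / S70: identify and check
    if subject is None:
        return frame                         # S80: frame itself is the target frame
    return combine(frame, subject)           # S10 to S30: dynamic and static combination


static_background = [1, 2, 3]                # assumed fixed target image


def find_subject(frame: Frame) -> Optional[int]:
    return frame.index(9) if 9 in frame else None    # 9 stands for the subject


def combine(frame: Frame, position: int) -> Frame:
    return [frame[i] if i == position else static_background[i]
            for i in range(len(frame))]


with_subject = select_target_frame([5, 9, 6], find_subject, combine)
without_subject = select_target_frame([5, 6, 7], find_subject, combine)
```

When the subject is present the output keeps the static background around it, and when the subject is absent the captured frame is passed through unchanged.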

The present application also provides a mobile terminal, which comprises a memory and a processor, wherein the memory stores an image processing program, and the image processing program, when executed by the processor, implements the steps of the image processing method in any of the above embodiments.

The present application further provides a computer-readable storage medium, on which an image processing program is stored, and the image processing program, when executed by a processor, implements the steps of the image processing method in any of the above embodiments.

The embodiments of the mobile terminal and the computer-readable storage medium provided in the present application may include all technical features of any one of the embodiments of the image processing method. Their expanded description and explanation are substantially the same as those of the method embodiments and are not repeated here.

Embodiments of the present application also provide a computer program product, which includes computer program code that, when run on a computer, causes the computer to execute the method in the above various possible embodiments.

Embodiments of the present application further provide a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device in which the chip is installed executes the method in the above various possible embodiments.

It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

The steps in the methods of the embodiments of the present application can be reordered, combined, and deleted according to actual needs.

The units in the device in the embodiment of the application can be merged, divided and deleted according to actual needs.

In the present application, the same or similar terms, concepts, technical solutions, and/or application scenarios are generally described in detail only at their first occurrence and, for brevity, are generally not described in detail again when they recur. When understanding the technical solutions of the present application, reference may be made to the earlier detailed description for any such terms, concepts, technical solutions, and/or application scenarios that are not described in detail later.

In the present application, the description of each embodiment has its own emphasis, and for parts that are not described or recited in a certain embodiment, reference may be made to the description of other embodiments.

The technical features of the technical solutions of the present application may be arbitrarily combined. For brevity of description, not all possible combinations of the technical features in the embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of the present application.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any usable medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, storage disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. All equivalent structures or equivalent process modifications made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present application.

Claims

1. An image processing method, comprising:

constructing a second image layer according to the target image;

2. The method of claim 1, wherein constructing the second image layer from the target image comprises:

3. The method of claim 2, wherein if the target image includes the target object;

4. The method as claimed in claim 3, wherein the determining the region to be filled in of the target image according to the position information of the target object in the target image comprises:

5. The method of claim 3, wherein the filling the area to be filled with the target pixel comprises:

and filling the area to be filled with pixel information adjacent to the target object in the target video image frame.

6. The method of any of claims 1 to 5, wherein said forming a target image frame from the first image layer and the second image layer comprises:

7. The method of claim 6, wherein the method further comprises:

acquiring edge information of a target object in the video image frame;

8. The method of any one of claims 1 to 5, wherein the target object is a target subject identified in the video image frames; or, the target object is a target background, other than a target subject, identified in the video image frame.

9. The method of claim 8, wherein said target subject is at least one target human object identified from each of said video image frames to which the video corresponds; or,

10. The method of any one of claims 1 to 5, wherein the target image is acquired in a manner comprising:

and acquiring a target video image frame in a video, and taking the target video image frame as the target image.

11. The method of any of claims 1 to 5, wherein the video image frames comprise at least two;

12. The method of any of claims 1 to 5, wherein constructing the first image layer from object information of a target object in the video image frame comprises:

13. The method of any of claims 1 to 5, wherein prior to the step of constructing the first image layer from object information of target objects in the video image frames, the method further comprises:

14. A mobile terminal, characterized in that the mobile terminal comprises: memory, processor, wherein the memory has stored thereon a program which, when executed by the processor, carries out the steps of the image processing method according to any one of claims 1 to 13.

15. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the image processing method according to any one of claims 1 to 13.