CN111263073A - Image processing method and electronic device - Google Patents

Image processing method and electronic device

Info

Publication number
CN111263073A
CN111263073A (application CN202010124535.5A)
Authority
CN
China
Prior art keywords
image
target
facial expression
human body
posture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010124535.5A
Other languages
Chinese (zh)
Other versions
CN111263073B (en)
Inventor
徐有健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202010124535.5A priority Critical patent/CN111263073B/en
Publication of CN111263073A publication Critical patent/CN111263073A/en
Application granted granted Critical
Publication of CN111263073B publication Critical patent/CN111263073B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • H04N23/61: Control of cameras or camera modules based on recognised objects
    • H04N23/611: Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95: Computational photography systems, e.g. light-field imaging systems
    • H04N23/951: Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio

Abstract

The embodiment of the invention discloses an image processing method and an electronic device. The method comprises the following steps: acquiring a target video, wherein the target video comprises a plurality of objects; performing face recognition and human body recognition on each frame of image in a plurality of frames of images in the target video to obtain a facial expression set and a human body posture characteristic information set of each object; screening target facial expressions from the facial expression set of each object based on preset expression screening conditions, and screening target human body posture characteristic information from the human body posture characteristic information set of each object based on preset posture screening conditions; determining an image to be synthesized of each object according to the target facial expression and the target human body posture characteristic information of each object; and processing the image to be synthesized of each object to obtain a target image comprising the plurality of objects. The embodiment of the invention can improve the shooting efficiency of group photos.

Description

Image processing method and electronic device
Technical Field
The embodiment of the invention relates to the field of communication technologies, and in particular to an image processing method and an electronic device.
Background
In daily life, the need for a multi-person group photo often arises on certain occasions and in certain places. Many times, the group wants to be photographed in some coordinated arrangement, such as jumping in unison or forming a heart shape.
However, because each person moves differently, it is often impossible for several people to achieve uniform motions, expressions, and the like at the same moment, which makes it difficult to capture the perfect picture a user wants. At present, obtaining a better picture requires shooting repeatedly until a satisfactory group photo is captured, so shooting efficiency is low and the user experience is poor.
Disclosure of Invention
The embodiment of the invention provides an image processing method and an electronic device, which can solve the problem of low efficiency in shooting group photos that meet the requirements of the people being photographed.
In order to solve the above technical problem, the embodiment of the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides an image processing method applied to an electronic device, where the method includes:
acquiring a target video, wherein the target video comprises a plurality of objects;
carrying out face recognition and human body recognition on each frame of image in a plurality of frames of images in a target video to obtain a facial expression set and a human body posture characteristic information set of each object;
screening target facial expressions from the facial expression set of each object based on preset expression screening conditions, and screening target human body posture characteristic information from the human body posture characteristic information set of each object based on preset posture screening conditions;
determining an image to be synthesized of each object according to the target facial expression and the target human body posture characteristic information of each object;
and processing the image to be synthesized of each object to obtain a target image comprising a plurality of objects.
In a second aspect, an embodiment of the present invention provides an electronic device, including:
the video acquisition module is used for acquiring a target video, and the target video comprises a plurality of objects;
the recognition module is used for carrying out face recognition and human body recognition on each frame of image in a plurality of frames of images in the target video to obtain a facial expression set and a human body posture characteristic information set of each object;
the screening module is used for screening a target facial expression from the facial expression set of each object based on a preset expression screening condition and screening target human body posture characteristic information from the human body posture characteristic information set of each object based on a preset posture screening condition;
the image to be synthesized determining module is used for determining an image to be synthesized of each object according to the target facial expression and the target human body posture characteristic information of each object;
the first processing module is used for processing the image to be synthesized of each object to obtain a target image comprising a plurality of objects.
In a third aspect, an embodiment of the present invention provides an electronic device, where the device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the image processing method as provided in the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, on which computer program instructions are stored, and when executed by a processor, the computer program instructions implement the image processing method provided in the first aspect.
In the embodiment of the invention, a facial expression set and a human body posture characteristic information set of each object are obtained from a captured target video containing a plurality of objects by using face recognition and human body recognition technologies. Target facial expressions and target human body posture characteristic information meeting each object's requirements are then screened out using preset expression screening conditions and preset posture screening conditions. Next, an image to be synthesized is determined for each object according to that object's target facial expression and target human body posture characteristic information, and the images to be synthesized are processed to obtain a target image containing the plurality of objects. Because a group photo meeting the requirements of every photographed object is obtained by automatically processing each object's image to be synthesized, the video only needs to be recorded once instead of the objects being shot many times, which shortens the user's operation time and improves shooting efficiency.
Drawings
The present invention will be better understood from the following description of specific embodiments thereof taken in conjunction with the accompanying drawings, in which like or similar reference characters designate like or similar features.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
FIG. 2 is a diagram of a background image provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 4 is a second schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Based on the above technical problem, the embodiment of the invention provides an image processing method, which can improve the efficiency of shooting group photos that meet the requirements of the people being photographed. The following detailed description is to be read with reference to the specific drawings and examples.
Fig. 1 is a schematic flow chart illustrating an image processing method according to an embodiment of the present invention. As shown in fig. 1, the image processing method provided by the embodiment of the present invention is applied to an electronic device, and includes steps 110 to 150.
In step 110, the electronic device obtains a target video, where the target video includes a plurality of objects.
And 120, the electronic equipment performs face recognition and human body recognition on each frame of image in the multiple frames of images in the target video to obtain a facial expression set and a human body posture characteristic information set of each object.
In step 130, the electronic device screens out a target facial expression from the facial expression set of each object based on a preset expression screening condition, and screens out target human posture feature information from the human posture feature information set of each object based on a preset posture screening condition.
In step 140, the electronic device determines an image to be synthesized of each object according to the target facial expression and the target human body posture feature information of each object.
Step 150, the electronic device processes the image to be synthesized of each object to obtain a target image including a plurality of objects.
In the embodiment of the invention, a facial expression set and a human body posture characteristic information set of each object are obtained from a captured target video containing a plurality of objects by using face recognition and human body recognition technologies. Target facial expressions and target human body posture characteristic information meeting each object's requirements are then screened out using preset expression screening conditions and preset posture screening conditions. Next, an image to be synthesized is determined for each object according to that object's target facial expression and target human body posture characteristic information, and the images to be synthesized are processed to obtain a target image containing the plurality of objects. Because a group photo meeting the requirements of every photographed object is obtained by automatically processing each object's image to be synthesized, the video only needs to be recorded once instead of the objects being shot many times, which shortens the user's operation time and improves shooting efficiency.
The specific implementation of each of steps 110 to 150 is described in detail below.
First, a specific implementation of step 110 will be described. In some embodiments, the object may be a person. The multiple objects are multiple persons needing group photo, namely multiple persons needing to shoot group photo images.
When a photographer opens the camera acquisition assembly of the electronic device, it is aimed at the plurality of objects that need a group photo, so that all of the objects fall within the viewfinder range.
The electronic device receives a video recording request from a user, and in response to the request, the camera acquisition assembly of the electronic device starts recording the plurality of objects. For example, the request to record a video may be a click input on a shoot button.
During recording, the photographed objects strike the desired shooting postures and expressions (for example, jumping into the air together or lining up in a row). After the objects finish their actions, the photographer clicks the end-shooting button, and the electronic device thereby obtains a target video containing the plurality of photographed objects.
The specific implementation of step 120 is described below. In some embodiments, a plurality of frames of images in the target video are first acquired. For example, a plurality of frames of images in the target video may be selected at preset time intervals, for example, one frame of image may be selected every 100 ms. For another example, a plurality of frames of images may be randomly selected.
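As a concrete illustration of this sampling step, the following Python sketch pulls one frame roughly every 100 ms using OpenCV; the file name and interval are illustrative assumptions rather than values fixed by the embodiment.

```python
import cv2

def sample_frames(video_path, interval_ms=100):
    """Sample frames from the target video at a preset time interval."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0        # fall back if FPS metadata is missing
    step = max(1, int(round(fps * interval_ms / 1000.0)))  # frames per interval
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append((index, frame))          # keep the frame index for later association
        index += 1
    cap.release()
    return frames

frames = sample_frames("target_video.mp4")         # hypothetical path; ~one frame per 100 ms
```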
For each frame of the selected multi-frame images, face recognition and human body recognition are performed on the image to obtain the facial expression and human body posture characteristics of each object in the image. For each object, gathering the object's facial expressions across the frames yields the object's facial expression set; likewise, gathering the object's human body posture characteristic information across the frames yields the object's human body posture characteristic information set.
As an example, an input image may be processed by a pre-trained face recognition model to obtain the facial expression of each object in the image, and by a pre-trained human body recognition model to obtain the human body posture characteristic information of each object in the image.
As one example, the human posture feature information is some feature information for embodying a human posture. For example, the human posture feature information may include pixel position information of limbs, the trunk, and the head of the human and relative position information between these parts.
It should be noted that, for each object, when the object's facial expression information is stored, the frame of the multi-frame images corresponding to that facial expression information is stored in association with it for subsequent image synthesis. Likewise, when the object's human body posture characteristic information is stored, the corresponding frame of the multi-frame images is stored in association with it for subsequent image synthesis.
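The bookkeeping described above can be sketched as follows. The `face_model` and `pose_model` objects stand in for whatever pre-trained recognizers are used (the embodiment does not name concrete models), so their `detect` interface and the per-detection fields are assumptions; the point is that every expression and pose entry is stored together with its frame index for later synthesis.

```python
from collections import defaultdict

def build_feature_sets(frames, face_model, pose_model):
    expression_sets = defaultdict(list)  # object_id -> [(frame_idx, expression), ...]
    pose_sets = defaultdict(list)        # object_id -> [(frame_idx, keypoints), ...]
    for frame_idx, frame in frames:
        for det in face_model.detect(frame):    # assumed interface: id + expression label
            expression_sets[det.object_id].append((frame_idx, det.expression))
        for det in pose_model.detect(frame):    # assumed interface: id + joint keypoints
            pose_sets[det.object_id].append((frame_idx, det.keypoints))
    return expression_sets, pose_sets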
The specific implementation of step 130 is described below. In some embodiments, step 130 comprises: if the expression change corresponding to the facial expression set does not meet the preset change condition, the electronic equipment takes the facial expression of the object in the first target image as a target facial expression; and if the expression change corresponding to the facial expression set meets a preset change condition, the electronic equipment takes the facial expression with the highest matching degree with the preset facial expression in the facial expression set as the target facial expression.
Here, the first target image is the frame, among the multiple frames, in which the object's face image meets a first preset definition condition.
In some embodiments, the expression change corresponding to the facial expression set may be characterized by the number of facial expression types in the facial expression set. As an example, the preset change condition is that the number of types of facial expressions in the facial expression set exceeds a preset threshold. For example, the preset threshold is 2.
For a single object, if the expression change corresponding to the object's facial expression set does not satisfy the preset change condition, for example the number of facial expression types in the set is less than or equal to 2, the object's expression does not change much in the target video.
In that case, the facial expression of the object in the frame whose face image satisfies the first preset definition condition (i.e., the first target image) may be selected as the target facial expression.
As an example, if multiple frames satisfy the first preset definition condition, any one of them may be chosen, or the frame with the highest definition may be chosen as the first target image.
By taking the facial expression of the object in the first target image as the target facial expression of the object, the definition of the group image is improved.
For a single object, if the expression change corresponding to the object's facial expression set meets the preset change condition, the object's expression changes considerably in the target video, and the facial expression in the set with the highest matching degree to the preset facial expression may be taken as the target facial expression. For example, the preset facial expression may be a laugh or a smile.
In the embodiment of the invention, the preset facial expression can be customized by the user taking the photo, so as to meet that user's shooting requirements.
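A minimal sketch of this expression screening, under stated assumptions: "definition" is approximated by the variance of the Laplacian (a common sharpness proxy, not mandated by the embodiment), and `match_fn` is an assumed similarity score between an observed expression and the preset one.

```python
import cv2

def sharpness(face_crop):
    # Variance of the Laplacian: a simple, widely used definition proxy.
    gray = cv2.cvtColor(face_crop, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def screen_expression(expr_entries, face_crops, preset_expr, match_fn, max_kinds=2):
    # expr_entries: [(frame_idx, expression_label), ...] for one subject
    # face_crops:   {frame_idx: face image} for the same subject
    kinds = {label for _, label in expr_entries}
    if len(kinds) <= max_kinds:
        # Little expression change: pick the frame with the sharpest face
        # image (the "first target image" branch).
        return max(expr_entries, key=lambda e: sharpness(face_crops[e[0]]))
    # Large expression change: pick the expression that best matches the
    # preset one (e.g. a smile); the matching metric is left open here.
    return max(expr_entries, key=lambda e: match_fn(e[1], preset_expr))
```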
In some embodiments, step 130 comprises: the electronic equipment determines the human posture change trend of the object based on each human posture characteristic information in the human posture characteristic information set of the object; the electronic equipment determines first human posture characteristic information with the maximum posture change amplitude based on the human posture change trend; the electronic device takes the first human posture feature information as target human posture feature information of the object.
In some embodiments, for each object, the human posture variation trend of the object is a posture variation trend of the object successively with the time of the multi-frame image.
As one example, the human posture change tendency may include a position change tendency of the head, a position change tendency of the head with respect to the torso, and a position change tendency of the limbs with respect to the torso, and the like.
For each object, the pixel position of the head of the object in each frame of image can be obtained according to each body posture characteristic information of the object. According to the pixel position of the head of the object in each frame of image, the position change trend of the head of the object can be determined.
First, whether the height of each object's head changes is determined from the position change trend of the object's head. If the head height changes, the object has undergone a jump-like posture change, and the object's human body posture characteristic information with the largest posture change amplitude is selected as the target human body posture characteristic information. For example, the first human body posture characteristic information with the maximum posture change amplitude is the one corresponding to the highest head position in the head's position change trend.
For each object, if the position change trend of the object's head is relatively gentle, the first human body posture characteristic information is instead determined from the position change trend of the head relative to the trunk and the position change trend of the limbs relative to the trunk.
As one example, for each person, the human posture feature information of the person includes pixel position information of a preset organ in the head and pixel position information of a preset joint point (chest) in the torso of the person. Based on the pixel position information of the preset organ in the head and the pixel position information of the preset joint point in the trunk, the distance between the two can be calculated. The position change trend of the head relative to the trunk can be obtained through the distance change trend between the preset organs in the head and the preset joint points in the trunk. Similarly, the trend of the position change of the limbs relative to the trunk can be derived from the distance between important joint points (e.g. fingers, elbows, shoulders) on the limbs and preset joint points in the trunk.
Because the change trends of the different joints are similar, the first human body posture characteristic information with the largest posture change amplitude is the one corresponding to the maximum of these distances, namely the frame in which the distance between the preset organ in the head and the preset joint point in the trunk, and the distances between the important limb joint points (such as the fingers, elbows, and shoulders) and the preset joint point in the trunk, reach their maximum.
If the human body posture changes gently, that is, the body position does not change much over the whole shooting process, the object can be judged to be standing still. In that case, the human body posture characteristic information of the object in a frame of the multi-frame images whose posture image meets the preset definition condition can be used as the target human body posture characteristic information. A sketch covering these cases follows.
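In the sketch below, the joint names (`head`, `chest`, `finger`, `elbow`, `shoulder`) and the thresholds are illustrative assumptions; image y-coordinates grow downward, so the highest head position corresponds to the minimum y value.

```python
import numpy as np

def screen_pose(pose_entries, jump_eps=20.0, limb_eps=15.0):
    # pose_entries: [(frame_idx, keypoints), ...] for one subject, in time order;
    # keypoints is assumed to map joint names to (x, y) pixel positions.
    head_y = np.array([kp["head"][1] for _, kp in pose_entries])
    if head_y.max() - head_y.min() > jump_eps:
        # Head height changed: a jump-like pose. The highest head position
        # is the minimum y, since image coordinates grow downward.
        return pose_entries[int(np.argmin(head_y))]
    # Otherwise measure limb extension: distances from limb joints to a
    # preset torso joint (the chest), summed per frame.
    def extension(kp):
        chest = np.asarray(kp["chest"], dtype=float)
        return sum(np.linalg.norm(np.asarray(kp[j], dtype=float) - chest)
                   for j in ("finger", "elbow", "shoulder"))
    ext = np.array([extension(kp) for _, kp in pose_entries])
    if ext.max() - ext.min() > limb_eps:
        return pose_entries[int(np.argmax(ext))]
    return None  # essentially static; fall back to the sharpest posture frame
```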
By using the preset expression screening conditions and preset posture screening conditions to screen out the facial expression information and human body posture characteristic information that meet each person's shooting requirements, there is no need to shoot repeatedly in search of a group photo that satisfies everyone, and shooting efficiency is improved.
The specific implementation of step 140 is described below. In some embodiments, step 140 comprises: the electronic equipment acquires a human body image of the object in the second target image; the electronic equipment updates the human body image based on the target human body posture characteristic information and the target facial expression of the object to obtain an image to be synthesized of the object.
Here, the second target image is the frame of the multi-frame images corresponding to the object's target human body posture characteristic information, or a frame of the multi-frame images in which the object's posture image meets a preset definition condition.
In some embodiments, to improve processing efficiency, the second target image may be the frame corresponding to the target human body posture characteristic information. For each object, if the second target image is that frame, the expression in the object's human body image in the frame is updated to the object's target facial expression, yielding the object's image to be synthesized.
In other embodiments, to improve the definition of the target image, the second target image is a frame of the multi-frame images in which the object's posture image satisfies the preset definition condition. In that case, the human body posture characteristic information in the object's human body image in the second target image is updated to the object's target human body posture characteristic information, and the facial expression in the human body image is updated to the object's target facial expression, yielding the object's image to be synthesized.
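For the expression update itself, one possible realization is to blend the face crop carrying the target expression into the subject's body image with OpenCV's seamless cloning; the face box argument stands in for the face location found during recognition and is an assumption of this sketch.

```python
import cv2
import numpy as np

def update_expression(body_img, target_face, face_box):
    # face_box: (x, y, w, h) location of the subject's face in body_img,
    # assumed to come from the earlier face recognition step.
    x, y, w, h = face_box
    src = cv2.resize(target_face, (w, h))               # fit the target-expression crop
    mask = 255 * np.ones(src.shape[:2], dtype=np.uint8) # clone the whole crop
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(src, body_img, mask, center, cv2.NORMAL_CLONE)
```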
A specific implementation of step 150 is described below. In step 150, the electronic device may perform image stitching on the images to be synthesized of two adjacent objects according to the arrangement order of each object, so as to obtain a group image including each object, i.e., a target image.
In some embodiments, the position of each object may also be adjusted so that all the people are aligned. The final result is a group photo with neat, attractive postures, clear expressions, and harmonious overall action.
In some embodiments, the image processing method according to an embodiment of the present invention further includes: the electronic device acquires the background overlapping area of the multi-frame images. On this basis, step 150 includes the electronic device performing fusion processing on the image to be synthesized of each object and the background overlapping area to obtain the target image.
Fig. 2 shows one frame of the target video containing three objects. Matting the objects out of the image in Fig. 2 yields the background image corresponding to that frame; applying the same object matting to every frame of the target video yields a background image for each frame. Feature extraction on these background images then yields the common background overlapping area shared by the multiple frames.
Then, the image to be synthesized of each object is embedded in the background overlapping area, so that a group image including a plurality of objects can be obtained.
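A rough sketch of this background step, assuming per-frame person masks are available from the matting stage: the per-pixel median over the matted-out frames approximates the common background overlapping area, and each subject's image to be synthesized is then pasted back in (a real implementation would blend edges rather than hard-paste).

```python
import numpy as np

def common_background(frames, person_masks):
    # frames: [(frame_idx, HxWx3 uint8 image), ...]
    # person_masks[i]: HxW boolean array, True where any subject appears in frame i
    stack = np.stack([img.astype(np.float32) for _, img in frames])
    masks = np.stack(person_masks)[..., None]           # (N, H, W, 1)
    stack[np.broadcast_to(masks, stack.shape)] = np.nan # drop subject pixels
    # Pixels covered by a subject in every frame stay NaN; a real
    # implementation would inpaint them.
    return np.nanmedian(stack, axis=0).astype(np.uint8)

def embed_subjects(background, subject_imgs, subject_masks, positions):
    out = background.copy()
    for img, mask, (x, y) in zip(subject_imgs, subject_masks, positions):
        h, w = img.shape[:2]
        roi = out[y:y + h, x:x + w]
        roi[mask] = img[mask]   # hard paste; edge blending is omitted in this sketch
    return out
```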
In some embodiments, in order to further satisfy the group photo requirement of each object in the plurality of objects, the image processing method provided by the embodiment of the present invention further includes step 160 and step 170. At step 160, the electronic device receives a first input. In step 170, the electronic device processes the target image in response to the first input, and obtains a processed target image.
In some embodiments of the present invention, the first input may be a click input, a long press input, a slide input, or a preset gesture operation. Wherein the first input is an input to process the target image.
In the embodiment of the invention, the processed target image is obtained by processing the target image in response to the user's first input, so that the user can adjust the target image according to his or her own needs, which improves convenience.
In some embodiments, the first input is associated with a processing parameter of a property of a target object of the plurality of objects, wherein step 170 comprises: and the electronic equipment processes the attribute of the target object based on the processing parameter to obtain a processed target image.
Wherein the attribute of the target object includes a position of the target object in the composite image, a facial expression of the target object, or a pose of the target object.
As one example, the attribute of the target object is a position of the target object in the target image. For example, the first input includes a selection input of a target location template. For example, a plurality of position templates may be provided in advance, each position template being used to characterize the position of each object in the plurality of objects in the target image. For example, the position templates may be arranged in a line or in an arc, etc. For another example, the first input may be associated with a position change parameter of the target object. And changing the position of the target object based on the position change parameter of the target object to obtain a target image after the position of the target object is changed. That is, the user can adjust the position of each object by one key, or can adjust the position of each object individually.
In other embodiments, the attribute of the target object is a facial expression of the target object. The first input includes a selection input for the target object and a selection input for a target facial expression. For example, when the user clicks and selects an object in the target image (the target object), each expression in that object's facial expression set is displayed for the user to choose from; the expression the user clicks is the target facial expression. Then, in response to the first input, the original facial expression of the target object in the target image is replaced with the target facial expression selected by the user.
In other embodiments, the attribute of the target object is a posture of the target object. For example, the first input includes a selection input for the target object and a selection input for a target limb action of the target object. For example, when the user clicks and selects an object in the target image (the target object), each limb action the object performed during shooting is displayed for the user to choose from; the limb action the user clicks is the target limb action. Then, in response to the first input, the original limb action of the target object in the target image is replaced with the target limb action selected by the user.
Similarly, the user may also adjust the torso and head of the target object as described above.
In some embodiments, the user can also adjust a figure's stature appropriately, changing the length or width of the head, trunk, and limbs; the different body parts can be stretched and spliced appropriately until the figure satisfies the user, so as to obtain the best group photo effect.
In some embodiments, the user may also perform a beautifying process on each object or background picture in the target image to obtain a group image meeting the user's needs.
The embodiment of the invention can thus help the user personally customize each figure's posture, expression, stature, and arrangement, fully meeting the user's requirements; this not only further optimizes the group photo effect but also makes the group photo operation more enjoyable for the user.
Fig. 3 shows a schematic structural diagram of an electronic device 300 according to an embodiment of the present invention. As shown in fig. 3, an electronic device 300 provided in an embodiment of the present invention includes:
the video acquiring module 310 is configured to acquire a target video, where the target video includes a plurality of objects.
The identifying module 320 is configured to perform face identification and human body identification on each frame of image in multiple frames of images in the target video, so as to obtain a facial expression set and a human body posture feature information set of each object.
And the screening module 330 is configured to screen a target facial expression from the facial expression set of each object based on a preset expression screening condition, and screen target human body posture characteristic information from the human body posture characteristic information set of each object based on a preset posture screening condition.
And the image to be synthesized determining module 340 is configured to determine an image to be synthesized of each object according to the target facial expression and the target human body posture feature information of each object.
The first processing module 350 is configured to process an image to be synthesized of each object to obtain a target image including a plurality of objects.
In the embodiment of the invention, a facial expression set and a human body posture characteristic information set of each object are obtained from a captured target video containing a plurality of objects by using face recognition and human body recognition technologies. Target facial expressions and target human body posture characteristic information meeting each object's requirements are then screened out using preset expression screening conditions and preset posture screening conditions. Next, an image to be synthesized is determined for each object according to that object's target facial expression and target human body posture characteristic information, and the images to be synthesized are processed to obtain a target image containing the plurality of objects. Because a group photo meeting the requirements of every photographed object is obtained by automatically processing each object's image to be synthesized, the video only needs to be recorded once instead of the objects being shot many times, which shortens the user's operation time and improves shooting efficiency.
In some embodiments of the invention, the screening module 330 is configured to:
if the expression change corresponding to the facial expression set does not meet the preset change condition, taking the facial expression of the object in the first target image as the target facial expression; the first target image is a frame image of which the face image of the object in the multi-frame image meets a first preset definition condition;
and if the expression change corresponding to the facial expression set meets a preset change condition, taking the facial expression with the highest matching degree with the preset facial expression in the facial expression set as the target facial expression.
In some embodiments of the invention, the screening module 330 is configured to:
determining a human posture change trend of the object based on each human posture characteristic information in the human posture characteristic information set of the object;
determining first human posture characteristic information with the maximum posture change amplitude based on the human posture change trend;
and taking the first human posture characteristic information as the target human posture characteristic information of the object.
In some embodiments of the present invention, the image to be synthesized determining module 340 is configured to:
acquiring a human body image of an object in a second target image; the second target image is a frame image corresponding to target human body posture characteristic information of an object in the multi-frame image, or is a frame image of which the posture image of the object in the multi-frame image meets a preset definition condition;
and updating the human body image based on the target human body posture characteristic information and the target facial expression of the object to obtain an image to be synthesized of the object.
In some embodiments of the present invention, electronic device 300 further comprises:
the acquisition module is used for acquiring a background overlapping area of a plurality of frames of images;
wherein, the first processing module 350 is configured to:
and carrying out fusion processing on the image to be synthesized of each object and the background overlapping area to obtain a target image.
The electronic device 300 provided in the embodiment of the present invention can implement each process in the embodiment of the image processing method provided in the embodiment of the present invention, and is not described here again to avoid repetition.
Fig. 4 is a schematic diagram of a hardware structure of an electronic device 400 for implementing various embodiments of the present invention, where the electronic device 400 includes, but is not limited to: radio frequency unit 401, network module 402, audio output unit 403, input unit 404, sensor 405, display unit 406, user input unit 407, interface unit 408, memory 409, processor 410, and power supply 411. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 4 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
The processor 410 is configured to obtain a target video, where the target video includes a plurality of objects; perform face recognition and human body recognition on each frame of image in a plurality of frames of images in the target video to obtain a facial expression set and a human body posture characteristic information set of each object; screen target facial expressions from the facial expression set of each object based on preset expression screening conditions, and screen target human body posture characteristic information from the human body posture characteristic information set of each object based on preset posture screening conditions; determine an image to be synthesized of each object according to the target facial expression and the target human body posture characteristic information of each object; and process the image to be synthesized of each object to obtain a target image comprising the plurality of objects.
In the embodiment of the invention, a facial expression set and a human body posture characteristic information set of each object are obtained from a captured target video containing a plurality of objects by using face recognition and human body recognition technologies. Target facial expressions and target human body posture characteristic information meeting each object's requirements are then screened out using preset expression screening conditions and preset posture screening conditions. Next, an image to be synthesized is determined for each object according to that object's target facial expression and target human body posture characteristic information, and the images to be synthesized are processed to obtain a target image containing the plurality of objects. Because a group photo meeting the requirements of every photographed object is obtained by automatically processing each object's image to be synthesized, the video only needs to be recorded once instead of the objects being shot many times, which shortens the user's operation time and improves shooting efficiency.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 401 may be used for receiving and sending signals during a message transmission or call process; specifically, it receives downlink data from a base station and delivers it to the processor 410 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. Furthermore, the radio frequency unit 401 can also communicate with a network and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 402, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 403 may convert audio data received by the radio frequency unit 401 or the network module 402 or stored in the memory 409 into an audio signal and output as sound. Also, the audio output unit 403 may also provide audio output related to a specific function performed by the electronic apparatus 400 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 403 includes a speaker, a buzzer, a receiver, and the like.
The input unit 404 is used to receive audio or video signals. The input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processor 4041 processes image data of still pictures or video obtained by an image capturing apparatus (such as a camera module) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 406. The image frames processed by the graphics processor 4041 may be stored in the memory 409 (or other storage medium) or transmitted via the radio frequency unit 401 or the network module 402. The microphone 4042 may receive sound and process it into audio data. In the phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 401.
The electronic device 400 also includes at least one sensor 405, such as light sensors, motion sensors, and other sensors. Specifically, the light sensor includes an ambient light sensor that adjusts the brightness of the display panel 4061 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 4061 and/or the backlight when the electronic apparatus 400 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of an electronic device (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 405 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which will not be described in detail herein.
The display unit 406 is used to display information input by the user or information provided to the user. The Display unit 406 may include a Display panel 4061, and the Display panel 4061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 407 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 407 includes a touch panel 4071 and other input devices 4072. The touch panel 4071, also referred to as a touch screen, can collect touch operations by a user on or near it (for example, operations by a user on or near the touch panel 4071 using a finger, a stylus, or any suitable object or attachment). The touch panel 4071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 410, receives commands from the processor 410, and executes them. In addition, the touch panel 4071 can be implemented as a resistive, capacitive, infrared, or surface acoustic wave type. Besides the touch panel 4071, the user input unit 407 may include other input devices 4072. Specifically, the other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a track ball, a mouse, and a joystick, which are not described here again.
Further, the touch panel 4071 can be overlaid on the display panel 4061, and when the touch panel 4071 detects a touch operation thereon or nearby, the touch operation is transmitted to the processor 410 to determine the type of the touch event, and then the processor 410 provides a corresponding visual output on the display panel 4061 according to the type of the touch event. Although in fig. 4, the touch panel 4071 and the display panel 4061 are two independent components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 4071 and the display panel 4061 may be integrated to implement the input and output functions of the electronic device, and the implementation is not limited herein.
The interface unit 408 is an interface for connecting an external device to the electronic apparatus 400. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 408 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 400 or may be used to transmit data between the electronic apparatus 400 and an external device.
The memory 409 may be used to store software programs as well as various data. The memory 409 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 409 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 410 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by operating or executing software programs and/or modules stored in the memory 409 and calling data stored in the memory 409, thereby performing overall monitoring of the electronic device. Processor 410 may include one or more processing units; preferably, the processor 410 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 410.
The electronic device 400 may further include a power supply 411 (e.g., a battery) for supplying power to various components, and preferably, the power supply 411 may be logically connected to the processor 410 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the electronic device 400 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements each process of the above-mentioned image processing method embodiment and can achieve the same technical effect; to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the embodiment of the image processing method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An image processing method applied to an electronic device, the method comprising:
acquiring a target video, wherein the target video comprises a plurality of objects;
carrying out face recognition and human body recognition on each frame of image in a plurality of frames of images in the target video to obtain a facial expression set and a human body posture characteristic information set of each object;
screening target facial expressions from the facial expression set of each object based on preset expression screening conditions, and screening target human body posture characteristic information from the human body posture characteristic information set of each object based on preset posture screening conditions;
determining an image to be synthesized of each object according to the target facial expression and target human body posture characteristic information of each object;
and processing the image to be synthesized of each object to obtain a target image comprising the plurality of objects.
2. The method of claim 1, wherein the screening of the target facial expression from the facial expression set of each subject based on preset expression screening conditions comprises:
if the expression change corresponding to the facial expression set does not meet a preset change condition, taking the facial expression of the object in the first target image as the target facial expression; the first target image is one frame image of the face image of the object in the multi-frame images, and the first target image meets a first preset definition condition;
and if the expression change corresponding to the facial expression set meets the preset change condition, taking the facial expression with the highest matching degree with the preset facial expression in the facial expression set as the target facial expression.
3. The method according to claim 1, wherein the screening target human body posture characteristic information from the human body posture characteristic information set of each object based on a preset posture screening condition comprises:
determining a human posture change trend of the object based on each human posture feature information in the set of human posture feature information of the object;
determining first human posture feature information with the largest posture change amplitude based on the human posture change trend;
and taking the first human posture characteristic information as target human posture characteristic information of the object.
4. The method of claim 1, wherein the determining the image to be synthesized of each object according to the target facial expression and target human posture feature information of each object comprises:
acquiring a human body image of the object in a second target image; the second target image is a frame image corresponding to target human body posture characteristic information of the object in the multi-frame images, or is a frame image in which the posture image of the object in the multi-frame images meets a preset definition condition;
and updating the human body image based on the target human body posture characteristic information and the target facial expression of the object to obtain an image to be synthesized of the object.
5. The method of claim 1, further comprising:
acquiring a background overlapping area of the multi-frame image;
wherein the processing the image to be synthesized of each object to obtain a target image including the plurality of objects includes:
and performing fusion processing on the image to be synthesized of each object and the background overlapping area to obtain the target image.
6. An electronic device, characterized in that the electronic device comprises:
a video acquisition module, configured to acquire a target video, wherein the target video comprises a plurality of objects;
a recognition module, configured to carry out face recognition and human body recognition on each frame of a plurality of frames of images in the target video to obtain a facial expression set and a human body posture characteristic information set of each object;
a screening module, configured to screen a target facial expression from the facial expression set of each object based on a preset expression screening condition, and to screen target human body posture characteristic information from the human body posture characteristic information set of each object based on a preset posture screening condition;
an image to be synthesized determining module, configured to determine an image to be synthesized of each object according to the target facial expression and the target human body posture characteristic information of each object;
and a first processing module, configured to process the image to be synthesized of each object to obtain a target image comprising the plurality of objects.
7. The electronic device according to claim 6, wherein the screening module is configured to:
if an expression change corresponding to the facial expression set does not meet a preset change condition, take the facial expression of the object in a first target image as the target facial expression, wherein the first target image is a frame of the multi-frame images that contains the face image of the object and meets a first preset sharpness condition;
and if the expression change corresponding to the facial expression set meets the preset change condition, take, as the target facial expression, the facial expression in the facial expression set that has the highest degree of matching with a preset facial expression.
8. The electronic device according to claim 6, wherein the screening module is configured to:
determine a human body posture change trend of the object based on each piece of human body posture characteristic information in the human body posture characteristic information set of the object;
determine, based on the human body posture change trend, first human body posture characteristic information with the largest posture change amplitude;
and take the first human body posture characteristic information as the target human body posture characteristic information of the object.
9. The electronic device according to claim 6, wherein the image to be synthesized determining module is configured to:
acquire a human body image of the object in a second target image, wherein the second target image is a frame of the multi-frame images corresponding to the target human body posture characteristic information of the object, or a frame of the multi-frame images in which the posture image of the object meets a preset sharpness condition;
and update the human body image based on the target human body posture characteristic information and the target facial expression of the object to obtain the image to be synthesized of the object.
10. The electronic device according to claim 6, further comprising:
an acquisition module, configured to acquire a background overlapping area of the multi-frame images;
wherein the first processing module is configured to:
perform fusion processing on the image to be synthesized of each object and the background overlapping area to obtain the target image.
CN202010124535.5A 2020-02-27 2020-02-27 Image processing method and electronic device Active CN111263073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010124535.5A CN111263073B (en) 2020-02-27 2020-02-27 Image processing method and electronic device


Publications (2)

Publication Number Publication Date
CN111263073A 2020-06-09
CN111263073B 2021-11-09

Family

ID=70949585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010124535.5A Active CN111263073B (en) 2020-02-27 2020-02-27 Image processing method and electronic device

Country Status (1)

Country Link
CN (1) CN111263073B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190116306A1 * 2016-09-01 2019-04-18 Duelight Llc Systems and methods for adjusting focus based on focus target information
CN108961158A * 2017-05-17 2018-12-07 China Mobile Communication Co., Ltd. Research Institute Image composition method and device
CN108307116A * 2018-02-07 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd. Image capturing method, device, computer equipment and storage medium
CN110213476A * 2018-02-28 2019-09-06 Tencent Technology (Shenzhen) Co., Ltd. Image processing method and device
CN108574803A * 2018-03-30 2018-09-25 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Image selection method and device, storage medium, and electronic device
CN108712603A * 2018-04-27 2018-10-26 Vivo Mobile Communication Co., Ltd. Image processing method and mobile terminal
CN109151325A * 2018-10-26 2019-01-04 Kunshan Yiqu Information Technology Research Institute Co., Ltd. Smiling face synthesis processing method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705401A * 2021-08-18 2021-11-26 Shenzhen Transsion Holdings Co., Ltd. Image processing method, terminal device and storage medium


Similar Documents

Publication Publication Date Title
CN110740259B (en) Video processing method and electronic equipment
CN107995429B (en) Shooting method and mobile terminal
CN108184050B (en) Photographing method and mobile terminal
CN108712603B (en) Image processing method and mobile terminal
CN108989672B (en) Shooting method and mobile terminal
CN109005336B (en) Image shooting method and terminal equipment
CN107592459A Photographing method and mobile terminal
CN108683850B (en) Shooting prompting method and mobile terminal
CN110062171B (en) Shooting method and terminal
CN109819167B (en) Image processing method and device and mobile terminal
CN110866038A (en) Information recommendation method and terminal equipment
CN111031234B (en) Image processing method and electronic equipment
CN108182031A Photographing method, terminal and computer-readable storage medium
CN108984143B (en) Display control method and terminal equipment
CN109448069B (en) Template generation method and mobile terminal
CN108881782B (en) Video call method and terminal equipment
CN109461124A Image processing method and terminal device
CN110825897A (en) Image screening method and device and mobile terminal
CN109167914A Image processing method and mobile terminal
CN108958623A Application program launching method and terminal device
CN108174110A Photographing method and flexible screen terminal
CN107995417A Photographing method and mobile terminal
CN110928407A (en) Information display method and device
CN111064888A (en) Prompting method and electronic equipment
CN110765170A (en) User portrait generation method and wearable device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant