WO2021259185A1

WO2021259185A1 - Image processing method and apparatus, device, and readable storage medium

Info

Publication number: WO2021259185A1
Application number: PCT/CN2021/101166
Authority: WO
Inventors: 陈露兰
Original assignee: 维沃移动通信有限公司
Priority date: 2020-06-24
Filing date: 2021-06-21
Publication date: 2021-12-30
Also published as: CN111722775A

Abstract

Disclosed in the present application are an image processing method and apparatus, a device, and a readable storage medium. The method comprises: when displaying a target image, receiving a first input of a user for a target subject in the target image; and, in response to the first input, playing a first video corresponding to the target subject, the first video being a video obtained by performing enlargement processing on the expression or action of the target subject in a second video, and the second video being a video of the target subject recorded during the filming process of the target image.

Description

Image processing method, device, equipment and readable storage medium

Cross-references to related applications

This application claims the priority of Chinese Patent Application No. 202010595589.X filed in China on June 24, 2020, the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of image technology, and in particular to an image processing method, device, device, and readable storage medium.

Background technique

The use of electronic devices such as mobile terminals is becoming more and more popular. At present, when using a mobile terminal or the like to generate a video with a magnified expression or action, it is generally necessary for the user to record a video or take multiple photos, and then process the video or picture through video processing software. However, the operation of generating a video in this way is complicated.

Summary of the invention

The embodiments of the present application provide an image processing method, device, equipment, and readable storage medium to solve the problem of complicated operations when generating a video.

In the first aspect, an embodiment of the present application provides an image processing method. The image processing method includes the following steps:

In the case of displaying the target image, receiving a user's first input on the target object in the target image;

In response to the first input, play a first video corresponding to the target object;

Wherein, the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is recorded for the target object during the shooting of the target image Video.

In the second aspect, an embodiment of the present application provides an image processing device, including:

The first receiving module is configured to receive the first input of the user to the target object in the target image when the target image is displayed;

The first play module is configured to play the first video corresponding to the target object in response to the first input;

In a third aspect, an embodiment of the present application provides an electronic device that includes a processor, a memory, and a program or instruction that is stored on the memory and can run on the processor. The program or instruction is The processor implements the steps of the method described in the first aspect when executed.

In a fourth aspect, an embodiment of the present application provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented .

In a fifth aspect, an embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, and the processor is used to run a program or an instruction to implement the chip as in the first aspect The method described.

In the embodiment of the present application, when the target image is displayed, the first input of the user to the target object in the target image is received, and in response to the first input, the first video corresponding to the target object is played , And the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is recorded for the target object during the shooting of the target image Video. It can be seen that there is no need to call other video processing software when generating a video that includes expressions or magnified actions, which simplifies the operation when generating a video.

Description of the drawings

FIG. 1 is one of the flowcharts of the image processing method provided by the embodiment of the present application;

FIG. 2 is the second flowchart of the image processing method provided by the embodiment of the present application;

Figures 3 to 4 are respectively schematic diagrams of display interfaces provided by embodiments of the present application;

FIG. 5 is a schematic diagram of a preprocessed video of a target object generated in the implementation of this application;

6 to 9 are schematic diagrams of display interfaces provided by embodiments of the present application;

Fig. 10(a) and Fig. 10(b) are schematic diagrams of the display interfaces provided by the embodiments of the present application;

FIG. 11 is the third flowchart of the image processing method provided by the embodiment of the present application;

Figures 12(a) and 12(b) are schematic diagrams of display interfaces provided by embodiments of the present application;

13(a) and 13(b) are schematic diagrams of display interfaces provided by embodiments of the present application;

14(a) and 14(b) are schematic diagrams of display interfaces provided by embodiments of the present application;

15(a) and 15(b) are schematic diagrams of display interfaces provided by embodiments of the present application;

16(a) and 16(b) are schematic diagrams of display interfaces provided by an embodiment of the present application;

Figure 17 is a structural diagram of an image processing device provided by an embodiment of the present application;

FIG. 18 is one of the structural diagrams of an electronic device provided by an embodiment of the present application;

FIG. 19 is the second structural diagram of the electronic device provided by the embodiment of the present application.

detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The terms "first" and "second" in the specification and claims of this application are used to distinguish similar objects, but not to describe a specific sequence or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here, and the "first", "second", etc. are distinguished The objects are usually of one type, and the number of objects is not limited. For example, the first object may be one or multiple. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally means that the associated objects before and after are in an "or" relationship.

The image processing method provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios thereof.

Referring to FIG. 1, FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present application. As shown in FIG. 1, it includes the following steps:

Step 101: In the case of displaying a target image, receive a user's first input of a target object in the target image.

Wherein, the target image may be an image captured by an electronic device when implementing the embodiments of the present application, may also be an image stored in an album of the electronic device, or may be an image corresponding to a shooting preview interface.

Wherein, the first input may be touch input, click input, etc., for example. The target object is not limited in the embodiment of the present application, and may be, for example, a certain person or a certain object.

Step 102: In response to the first input, play a first video corresponding to the target object.

Wherein, the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is recorded for the target object during the shooting of the target image Video. Since the second video is a video recorded for the target object during the shooting of the target image, in this case, the obtained image and video can reflect the action information of the target object. Therefore, in this way , Can make the video obtained more in line with the needs of users.

Among them, the video action amplification processing technology is based on signal processing technology, which treats the data of each pixel in the video as a time sequence, and amplifies a specific frequency domain signal through signal processing to obtain an action-amplified video.

Enlarging the video through video amplification technology can make those small changes that are hard to detect by the human eye amplify into an obvious look. Through AI (Artificial Intelligence) recognition technology, the user’s area of interest can be intercepted to obtain the emoticon package. Allow users to easily and conveniently obtain emoticons based on micro-expressions.

Wherein, the length of the first video may be about 1s-2s.

In the embodiments of this application, the above method can be applied to electronic devices, such as: mobile phones, tablet computers (Tablet Personal Computer), laptop computers (Laptop Computer), personal digital assistants (personal digital assistant, PDA), mobile Internet devices (Mobile Internet Device, MID) or Wearable Device (Wearable Device), etc.

Optionally, before step 102, the image processing method of the embodiment of the present application may further include:

In the case where the multiple target frames of the second video include the target object, an area image is respectively captured from the multiple target frames, and each of the area images includes the target object. Then, a plurality of the regional images are spliced to obtain a pre-processed video. Afterwards, a video action amplification process is performed on the preprocessed video to obtain the first video.

Specifically, in the above process, the AI technology can be used to identify the video frame of the second video, and identify whether the target object is included. If the target object is included, it can be used as the target frame. Wherein, the target object may be one or multiple.

Optionally, in this embodiment of the present application, in different target frames, the display positions of the captured regional images may be the same. Taking the target object as an example, the first display position of the target object in the first target frame is determined through intelligent recognition. Wherein, the first display position can be understood as the display position of the target object on the display screen of the electronic device. Since the size and pixel size of the display screen are known, if the target object is recognized, the first display position can be determined. In order to clearly identify the target object, a rectangular frame or the like can be used to identify the display position of the target object in the image. Optionally, what is identified by the rectangular frame needs to include at least the face of the target object, or may also include the overall outline of the target object.

After the first display position is determined, the region images corresponding to the first display position can be respectively intercepted from multiple target image frames. That is to say, the display position of the region image intercepted from different frames on the display screen is also the first display position.

By performing video action zoom processing on the pre-processed video, taking the target object as an example, those micro expressions that are not easily detectable by the naked eye, such as slightly lifting the corners of the mouth, slightly squeezing the eyebrows, slightly widening the eyes, etc. Processing, the amplitude of those micro-expression movements can be magnified, which is obvious. At this time, the first video is a video that contains information about the zoom-in of the micro-expression action.

Of course, in the embodiment of the present application, the first video obtained based on the video action magnification processing includes not only the magnification information of the micro-expression action, but also the magnification information of the movement such as limbs.

Through the above method, the region image is intercepted from the target frame including the target object, and then the region image is spliced to form a preprocessed video, and the video action is enlarged. In this way, since it is the first video obtained by processing the regional image, the processing speed is faster.

Extracting at least two images in the second video, wherein the image content of the at least two images includes micro-actions or micro-expressions of the target object. Then, the micro-action or micro-expression in the at least two images is subjected to video action enlargement processing. After that, the at least two images after the video action enlargement processing are spliced to obtain the first video. In this way, since the first video is obtained by processing at least two images, optionally, the target object may be one or multiple, and the obtained first video may include one or more If the target object includes multiple objects, the user can select the corresponding object as needed.

In the case where the target object includes at least two objects, in order to facilitate the user to select the desired content from the obtained first video, after the first video is obtained, the user's comments on the first object in the target image may be received And in response to the second input, obtain the video corresponding to the first object from the first video. Wherein, the second input may be an operation such as a click or a touch.

Specifically, in the process of acquiring the video corresponding to the first object from the first video, in the case where the multiple target frames of the first video include the first object, from the multiple The area images are respectively captured in the target frame, and each of the area images includes the first object, and then the area images are spliced to obtain the video of the first object.

In this way, after the micro-action or micro-expression is zoomed in, the user can select the corresponding video.

On the basis of the above embodiment, the target image can also be recognized to obtain the recognition result. Then, according to the recognition result, the object in the target image is highlighted. For example, the target image is intelligently recognized, and the object in the target image is highlighted according to the recognition result, for example, the location of the target object is displayed normally, and other locations of the target image are displayed blurred. Through the above method, it is convenient for the user to select the target object.

Optionally, after the first video corresponding to the target object is played, the method further includes receiving a third input from the user;

In response to the third input, a dynamic image corresponding to the target object is obtained based on the first video corresponding to the target object.

Optionally, the third input may be a touch input, a click input, and so on.

The dynamic image may be an image in the gif format, so that the first video can be quickly generated into a dynamic image, such as generating an emoticon package, which can increase interest.

Fig. 2 is a flowchart of an image processing method provided by an embodiment of the present application. In the embodiment of the present application, based on the video action amplification technology and the AI recognition technology, the micro-expression during the shooting of the subject is acquired. For a specified object, an enlarged version of the video based on its micro-expression action can be generated. Wherein, the video may be an emoticon package or the like.

As shown in Figure 2, it includes the following steps:

Step 201: Obtain an image of the target object.

In the embodiments of the present application, the target object is a person as an example for description.

In this step, turn on the shooting mode. As shown in Figure 3, when the camera function is activated, the terminal enters the camera preview interface. In response to the user's input, the video generation function can be turned on. For example, when the user's operation on 21 in FIG. 3 is detected, as shown in FIG. 4, the video generation function can be turned on.

Step 202: Obtain a video of the target object.

When the photo is taken, a photo is generated and stored in the album. And, the video recording function is turned on while taking a picture, and a video is obtained, that is, the video is recorded at the same time during the picture. The time of the video can be very short, for example, about 1s-2s. In the embodiment of the present application, only one photo is displayed in the album, and the video content is not displayed.

Step 203: Generate a preprocessed video of the target object.

As shown in Figure 5, after the video is obtained, the location of the target person in the video can be identified through AI detection. Based on the interception of the image of the target person at the same position in each frame of the video, the video of the target person is obtained. If there are multiple people in the video, you can capture multiple videos. Then, through the video action amplification technology, the intercepted video is processed to generate the pre-processed video after the action is amplified, that is, those micro expressions in the original intercepted video that are not easily detectable by the naked eye, such as slightly lifting the corner of the mouth, slightly squeezing the eyebrows, and slightly opening the eyes Wait, in the new video, the amplitude of those micro-expression movements can be enlarged due to the processing of the video action amplification technology, which is obvious.

After the video is generated as above, the position of the person in the corresponding image is recognized through AI, and the pre-processed video (the video containing the information of the micro-expression action enlargement) is one-to-one correspondence with the position of the person in the image. At this time, the album will store a photo and a video corresponding to the location of the person, as well as their corresponding information.

Step 204: Preview the generated preprocessed video.

When the pre-processed video is generated, a preview function can be provided. At this time, in response to the user's input, the target object that the user desires to preview is determined, and a video of the object is obtained.

As shown in FIG. 6, when the user enters the album and needs to view the photos taken in the video shooting mode, he can click the button 22 in FIG. 6. In response to the user's input, the terminal enters the album video preview mode. When the album video preview mode is turned on, in response to the user's input, on the image obtained in step 201, a rectangular dashed frame is displayed at the position of the recognized person according to the recognition result of the AI. Different characters have different colors of the dashed frame, and the terminal can display rectangular buttons with the same color as the rectangular dashed frame, as shown in the

buttons

23 and 24 in Figure 7, that is, the

buttons

23 and 24 of different colors are used to distinguish the interface The objects within the dashed box.

At this time, the user can long press any position in a certain dotted frame 25. As shown in Figure 8, in response to the user's input, except for the content of the dotted frame, other parts of the image become semi-transparent, and the content of the character corresponding to the character’s zoomed-in information video including micro-expression actions is played once in the dotted frame. . At this point, the user can observe the micro-expression of the person in the dashed frame at the moment of taking the photo, and the micro-expression actions have been processed to become obvious, and the user can clearly see the changes in the micro-actions that are usually difficult to notice.

After a preview is over, the photo content is restored from the translucent interface when the emoji package was previewed to the video preview mode interface, as shown in Figure 9. If the user wants to preview again, he can press and hold the area in the dashed rectangle again.

Step 205: Generate a video of the target object.

If the user finishes the preview of the current micro-emoji action zoomed-in video content, as shown in Figure 10(a), click the rectangular button below the photo with the same color as the rectangle dotted line of the previewed micro-emoji action zoomed video. In response to this operation, the corresponding micro-expression action magnifies the video content to generate a video (for example, generating an emoticon package in gif format) and saves it in the album, and prompts that the saving is successful (as shown in FIG. 10(b)).

When the user clicks the button 22 again, in response to the user's input, the video preview can be exited, and the normal photo preview mode can be restored.

In the above embodiment, the micro-expression during the shooting of the person is acquired through the video action amplification technology and the AI recognition technology. The user can specify a certain character to generate an enlarged version of the video based on its micro-expression, and the user does not need to perform additional video processing, which simplifies the operation and improves the user experience.

Fig. 11 is a flowchart of an image processing method provided by an embodiment of the present application. In the embodiment of the present application, based on the video action amplification technology and the AI recognition technology, the micro-expression during the shooting of the person is acquired. For a specified character, an expression package based on the enlarged version of the micro expression action can be generated. At the same time, the user can set the location of the emoticon package so that the generated emoticon package can meet the needs of more users and enhance the user interaction experience. Wherein, the video may be an emoticon package or the like.

As shown in Figure 11, it includes the following steps:

Step 1101: Obtain an image of the target object.

Step 1102: Obtain a video of the target object.

Among them, the detailed description of

steps

1101 and 1102 can refer to the description of the foregoing

steps

201 and 202.

Step 1103: Generate a preprocessed video of the target object.

The difference from step 203 is that in this step, the recorded video is processed through the video action amplification technology. After the processing is completed, a new video is generated after all the micro actions in the video are amplified by the action, that is, the micro expression action amplification information video.

Step 1104: Preview the generated preprocessed video.

As shown in Fig. 12(a), when the user enters the album and needs to view the photos taken in the video shooting mode, he can click the button 26 in Fig. 12(a). In response to the user's input, the terminal enters the video preview mode. When the video preview mode is turned on, in response to the user's operation, on the image obtained in step 201, a rectangular dashed frame is displayed at the position of the recognized person according to the recognition result of the AI. Different characters have different colors of the dashed frame, and the terminal can display

rectangular buttons

27 and 28 with the same color as the rectangular dashed frame, that is, the

buttons

27 and 28 of different colors are used to distinguish the objects in the dashed frame on the interface. At the same time, as shown in FIG. 12(b), the electronic device may display a button 29. When the user finishes adjusting the dashed frame, click button 29 to determine the final position of the dashed frame.

At this time, the user can preview the micro-expression action once to zoom in on the information video content by long pressing the rectangular button in Fig. 13(a) corresponding to the color of the dashed rectangular frame. As shown in Figure 13(b), in response to the user's input, except for the content of the dashed frame, other parts of the image become semi-transparent.

When the user long presses a dashed box, the video content of the micro-expression action magnification information corresponds to the position of the dotted rectangular box on the photo. It is the micro-emoji action magnification information video obtained after processing by applying the video action magnification technology (that is, the result of step 1103). ) Carry out the corresponding position tailoring generation. That is, after the user operates the dotted rectangular frame, the information of the person included in the rectangular frame can be recognized in response to the user's operation, and then, from the video obtained in step 1103, the video of the location of the person is intercepted .

After a preview is over, the photo content is restored from the translucent interface when the emoji package was previewed to the video preview mode interface. If the user wants to preview again, he can press and hold the area in the dashed rectangle again.

In addition, the user can also adjust the position of the dashed rectangular frame. As shown in Figure 14 (a) and (b), for example, if the user wants to preview the zoom-in action of the micro-expression of the entire face, the user can touch the dotted frame and move it. As shown in Fig. 14(a), for example, the dashed frame can be adjusted upward. In response to the user's operation, the dashed frame is adjusted to encompass the entire face. If the user only wants to preview the zooming action of the micro-expression of a certain eye, the dashed frame is adjusted to the position of a certain eye.

When the user is satisfied with the position of the current rectangular frame after adjusting, click the button 29 in Figure 15(a), that is, the confirmation button for adjusting the position of the rectangular frame to exit the editing mode, as shown in Figure 15(b).

Step 1105: Generate a video of the target object.

If the user finishes the preview of the current micro-emoji action zoomed-in video content, as shown in Figure 16(a), the user can click the rectangular button below the photo with the same color as the rectangle dotted line of the previewed micro-emoji action zoomed-in video. In response to this operation, the corresponding micro-expression action zoomed in on the video content will generate the video (such as an emoticon package in gif format) and save it in the album, and prompt the save success (as shown in Figure 16(b)).

When the user clicks the button 25 again, the video preview can be exited in response to the user's input, and the normal photo preview mode can be restored.

In the above embodiment, the micro-expression during the shooting of the person is acquired through the video action amplification technology and the AI recognition technology. The user can specify a certain character to generate an action-enlarged emoticon package based on its micro-expression, and the user does not need to perform additional video processing, which simplifies the operation and improves the user experience. At the same time, the user can set the location of the emoticon package so that the generated emoticon package can meet the needs of more users and enhance the user interaction experience.

It should be noted that the execution subject of the image processing method provided in the embodiments of the present application may be an image processing device, or a control module for executing the image processing method in the image processing device. In the embodiment of the present application, an image processing method executed by an image processing apparatus is taken as an example to illustrate the image processing apparatus provided in the embodiment of the present application.

Referring to FIG. 17, FIG. 17 is a structural diagram of an image processing device provided by an embodiment of the present application. As shown in FIG. 17, the image processing device 1700 includes:

The first receiving module 1701 is configured to receive a user's first input of the target object in the target image when the target image is displayed; the first playing module 1702 is configured to respond to the first input to play and The first video corresponding to the target object; wherein, the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is a video in the target image The video recorded for the target object during the shooting process.

Optionally, the device further includes:

The first interception module is configured to separately intercept regional images from the multiple target frames when the multiple target frames of the second video include the target object, and each of the regional images includes the target;

The first splicing module is used for splicing a plurality of the regional images to obtain a pre-processed video;

The first processing module is configured to perform video action amplifying processing on the pre-processed video to obtain the first video.

Optionally, the device further includes:

The first extraction module is configured to extract at least two images in the second video, wherein the image content of the at least one image includes the micro-actions or micro-expressions of the target object;

The second processing module is configured to perform video action magnification processing on the micro-actions or micro-expressions in the at least two images;

The third processing module is configured to splice the at least two images after the video action enlargement processing is performed to obtain the first video.

Optionally, the device further includes:

A second receiving module, configured to receive a second input of a user on the first object in the target image when the target object includes at least two objects;

The first obtaining module is configured to obtain a video corresponding to the first object from the first video in response to the second input.

Optionally, the first obtaining module includes:

The first interception submodule is configured to separately intercept regional images from the multiple target frames when the multiple target frames of the first video include the first object, and each of the regional images includes The first object;

The first splicing sub-module is used for splicing the regional images to obtain the video of the first object.

Optionally, the device further includes:

The recognition module is used to recognize the target image to obtain a recognition result;

The fourth processing module is configured to highlight the object in the target image according to the recognition result.

Optionally, the device further includes:

The third receiving module is used to receive the third input of the user;

The fifth processing module is configured to obtain a dynamic image corresponding to the target object based on the first video corresponding to the target object in response to the third input.

In the embodiment of the present application, when the target image is displayed, the first input of the user to the target object in the target image is received, and in response to the first input, the first video corresponding to the target object is played , And the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is recorded for the target object during the shooting of the target image Video. It can be seen from this that, by using the embodiments of the present application, when generating a video that includes expressions or magnified actions, there is no need to call other video processing software, which simplifies the operation when generating a video.

The image processing device in the embodiment of the present application may be a device, or a component, an integrated circuit, or a chip in a terminal. The device can be a mobile electronic device or a non-mobile electronic device. Exemplarily, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (personal digital assistant). assistant, PDA), etc. Non-mobile electronic devices can be servers, network attached storage (NAS), personal computers (PC), televisions (television, TV), teller machines or self-service machines, etc., this application The embodiments are not specifically limited.

The image processing device in the embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which are not specifically limited in the embodiment of the present application.

The image processing apparatus provided by the embodiment of the present application can implement the various processes implemented by the method embodiments in FIG. 1 to FIG. 16. In order to avoid repetition, details are not described herein again.

Optionally, as shown in FIG. 18, an embodiment of the present application further provides an electronic device 1800, including a processor 1801, a memory 1802, and programs or instructions that are stored in the memory 1802 and run on the processor 1801, When the program or instruction is executed by the processor 1801, each process of the above-mentioned image processing method embodiment is realized, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.

It should be noted that the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.

FIG. 19 is a schematic diagram of the hardware structure of an electronic device that implements an embodiment of the present application. The electronic device 1900 includes, but is not limited to: a radio frequency unit 1901, a network module 1902, an audio output unit 1903, an input unit 1904, a sensor 1905, a display unit 1906, a user input unit 1907, an interface unit 1908, a memory 1909, a processor 1910, etc. part.

Those skilled in the art can understand that the electronic device 1900 may also include a power source (such as a battery) for supplying power to various components. The power source may be logically connected to the processor 1910 through a power management system, so that the power management system can manage charging, discharging, and power management. Consumption management and other functions. The structure of the electronic device shown in FIG. 19 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than those shown in the figure, or some components may be combined, or different component arrangements, which will not be repeated here. .

Wherein, the radio frequency unit 1901 is configured to receive a user's first input to the target object in the target image when the target image is displayed; the processor 1910 is configured to respond to the first input and play and The first video corresponding to the target object; wherein, the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is a shot of the target image The video recorded for the target object during the process.

Optionally, the processor 1910 is configured to, when the multiple target frames of the second video include the target object, respectively intercept regional images from the multiple target frames, and each of the regional images is Including the target object; splicing a plurality of the regional images to obtain a pre-processed video; performing video action amplification processing on the pre-processed video to obtain the first video.

Optionally, the processor 1910 is configured to extract at least two images in the second video, where the image content of the at least two images includes micro-actions or micro-expressions of the target object; The micro-actions or micro-expressions in the at least two images are subjected to video action enlargement processing; the at least two images after the video action enlargement processing are spliced to obtain the first video.

Optionally, the processor 1910 is configured to receive a second input from the user to the first object in the target image when the target object includes at least two objects; in response to the second input, from Obtain the video corresponding to the first object from the first video.

Optionally, the processor 1910 is configured to, when the multiple target frames of the first video include the first object, respectively intercept regional images from the multiple target frames, each of the regional images Both include the first object; stitching the region images to obtain a video of the first object.

Optionally, the processor 1910 is configured to recognize the target image to obtain a recognition result; according to the recognition result, highlight an object in the target image.

Optionally, the processor 1910 is configured to receive a third input of the user; in response to the third input, obtain a dynamic image corresponding to the target object based on the first video corresponding to the target object.

It should be understood that, in the embodiment of the present application, the input unit 1904 may include a graphics processing unit (GPU) 19041 and a microphone 19042. The graphics processor 19041 is paired by the image capture device ( For example, the image data of the still picture or video obtained by the camera) is processed. The display unit 1906 may include a display panel 19061, and the display panel 19061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1907 includes a touch panel 19071 and other input devices 19072. The touch panel 19071 is also called a touch screen. The touch panel 19071 may include two parts, a touch detection device and a touch controller. Other input devices 19072 may include, but are not limited to, a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, and joystick, which will not be repeated here. The memory 1909 may be used to store software programs and various data, including but not limited to application programs and operating systems. The processor 1910 may integrate an application processor and a modem processor. The application processor mainly processes an operating system, a user interface, and an application program, and the modem processor mainly processes wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 1910.

The embodiment of the present application also provides a readable storage medium with a program or instruction stored on the readable storage medium. When the program or instruction is executed by a processor, each process of the above-mentioned image processing method embodiment is realized, and the same technology can be achieved. The effect, in order to avoid repetition, will not be repeated here.

Wherein, the readable storage medium, such as read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk, or optical disk, etc.

An embodiment of the present application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run a program or an instruction to implement each process of the above method embodiment , And can achieve the same technical effect, in order to avoid repetition, I will not repeat it here.

It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-level chips, system-on-chips, system-on-chips, or system-on-chips.

It should be noted that in this article, the terms "include", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements not only includes those elements, It also includes other elements that are not explicitly listed, or elements inherent to the process, method, article, or device. If there are no more restrictions, the element defined by the sentence "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element. In addition, it should be pointed out that the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed, and may also include performing functions in a substantially simultaneous manner or in reverse order according to the functions involved. The functions are performed, for example, the described method may be performed in an order different from the described order, and various steps may also be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the description of the above implementation manners, those skilled in the art can clearly understand that the above-mentioned embodiment methods can be implemented by means of software plus the necessary general hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is better.的实施方式。 Based on this understanding, the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, The optical disc) includes several instructions to make a terminal (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

The embodiments of the present application are described above with reference to the accompanying drawings, but the present application is not limited to the above-mentioned specific embodiments. The above-mentioned specific embodiments are only illustrative and not restrictive. Those of ordinary skill in the art are Under the enlightenment of this application, many forms can be made without departing from the purpose of this application and the scope of protection of the claims, all of which fall within the protection of this application.

Claims

An image processing method includes the following steps:

In the case of displaying the target image, receiving a user's first input on the target object in the target image;

In response to the first input, play a first video corresponding to the target object;

Wherein, the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is recorded for the target object during the shooting of the target image Video.
The method according to claim 1, wherein, before the playing the first video corresponding to the target object, the method further comprises:

In the case that the multiple target frames of the second video include the target object, respectively intercepting area images from the multiple target frames, and each of the area images includes the target object;

Splicing a plurality of the regional images to obtain a pre-processed video;

Perform video action amplification processing on the preprocessed video to obtain the first video.
The method according to claim 1, wherein, before the playing the first video corresponding to the target object, the method further comprises:

Extracting at least two images in the second video, wherein the image content of the at least two images includes micro-actions or micro-expressions of the target object;

Performing video action magnification processing on the micro-actions or micro-expressions in the at least two images;

Splicing the at least two images after the video action enlargement processing is performed to obtain the first video.
The method according to claim 3, wherein, after said obtaining the first video, the method comprises:

In a case where the target object includes at least two objects, receiving a second input from the user to the first object in the target image;

In response to the second input, a video corresponding to the first object is obtained from the first video.
The method according to claim 4, wherein said obtaining the video of the first object from the first video comprises:

In the case that the multiple target frames of the first video include the first object, respectively intercepting regional images from the multiple target frames, and each of the regional images includes the first object;

Splicing the regional images to obtain the video of the first object.
The method according to claim 1, further comprising:

Recognizing the target image to obtain a recognition result;

According to the recognition result, the object in the target image is highlighted.
The method according to claim 1, wherein after the playing the first video corresponding to the target object, the method further comprises:

Receive the user's third input;

In response to the third input, a dynamic image corresponding to the target object is obtained based on the first video corresponding to the target object.
An image processing device, including:

The first receiving module is configured to receive the user's first input of the target object in the target image when the target image is displayed;

The first play module is configured to play the first video corresponding to the target object in response to the first input;

Wherein, the first video is a video obtained by amplifying the expression or action of the target object in the second video, and the second video is recorded for the target object during the shooting of the target image Video.
The device according to claim 8, further comprising:

The first interception module is configured to separately intercept regional images from the multiple target frames when the multiple target frames of the second video include the target object, and each of the regional images includes the target;

The first splicing module is used for splicing a plurality of the regional images to obtain a pre-processed video;

The first processing module is configured to perform video action amplifying processing on the pre-processed video to obtain the first video.
The device according to claim 8, further comprising:

The first extraction module is configured to extract at least two images in the second video, wherein the image content of the at least one image includes the micro-actions or micro-expressions of the target object;

The second processing module is configured to perform video action magnification processing on the micro-actions or micro-expressions in the at least two images;

The third processing module is configured to splice the at least two images after the video action enlargement processing is performed to obtain the first video.
The device according to claim 10, further comprising:

A second receiving module, configured to receive a second input of a user on the first object in the target image when the target object includes at least two objects;

The first obtaining module is configured to obtain a video corresponding to the first object from the first video in response to the second input.
The apparatus according to claim 11, wherein the first obtaining module comprises:

The first interception submodule is configured to separately intercept regional images from the multiple target frames when the multiple target frames of the first video include the first object, and each of the regional images includes The first object;

The first splicing sub-module is used for splicing the regional images to obtain the video of the first object.
The device according to claim 8, further comprising:

The recognition module is used to recognize the target image to obtain a recognition result;

The fourth processing module is configured to highlight the object in the target image according to the recognition result.
The device according to claim 8, further comprising:

The third receiving module is used to receive the third input of the user;

The fifth processing module is configured to obtain a dynamic image corresponding to the target object based on the first video corresponding to the target object in response to the third input.
An electronic device comprising: a memory, a processor, and a program or instruction stored on the memory and capable of being run on the processor. The processor executes the program or instruction as described in any one of claims 1 to 7 The steps in the image processing method described above.
A readable storage medium storing a program or instruction on the readable storage medium, and when the program or instruction is executed by a processor, the steps in the image processing method according to any one of claims 1 to 7 are realized.
A computer program product, which is executed by at least one processor to implement the image processing method according to any one of claims 1-7.
An image processing device configured to execute the image processing method according to any one of claims 1-7.
An electronic device configured to execute the image processing method according to any one of claims 1-7.
A chip comprising a processor and a communication interface, the communication interface is coupled with the processor, and the processor is used to run a program or an instruction to implement the image processing according to any one of claims 1-7 method.