WO2024140123A1 - Stop motion animation generation method, electronic device, cloud server, and system

Info

Publication number: WO2024140123A1
Application number: PCT/CN2023/137534
Authority: WO
Inventors: 贾美霞; 黄宸宇; 金磊磊; 钟伟才
Original assignee: 华为技术有限公司
Priority date: 2022-12-29
Filing date: 2023-12-08
Publication date: 2024-07-04

Abstract

The present application relates to the technical field of terminals and provides a stop motion animation generation method, an electronic device, a cloud server, and a system, which solve, to a certain extent, the problems that existing stop motion animation production processes are tedious and have low production efficiency. The method is applied to an electronic device and comprises: in response to a first operation of a user, determining a dynamic object; and determining a stop motion animation according to the dynamic object and a video to be processed, wherein the video to be processed comprises the dynamic object, and each image frame in the stop motion animation is a video frame in the video to be processed.

Description

Stop-motion animation generation method, electronic device, cloud server and system

This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on December 29, 2022, with application number 202211711630.0 and application name “A stop-motion animation generation method, electronic device, cloud server and system”, the entire contents of which are incorporated by reference in this application.

Technical Field

The present application relates to the field of terminal technology, and in particular to a stop-motion animation generation method, electronic device, cloud server and system.

Background

Stop-motion animation, also known as frame-by-frame animation, is widely used in commercials, promotional videos, short films, and handmade creations. There are two common ways to generate stop-motion animation: one is to shoot multiple precise images frame by frame and synthesize them into a stop-motion animation; the other is to first shoot a complete video and then manually edit the shot video to generate a stop-motion animation. With either method, the existing stop-motion animation production process is cumbersome and complicated, and the production efficiency is low.

Summary of the invention

The present application provides a stop-motion animation generation method, electronic device, cloud server and system, which to a certain extent solve the problems of complicated production process and low production efficiency of existing stop-motion animation.

In order to achieve the above objectives, this application adopts the following technical solutions:

In a first aspect, the present application provides a stop-motion animation generation method, which is applied to an electronic device, and the method comprises:

In response to a first operation by a user, determining a dynamic object;

A stop-motion animation is determined according to the dynamic object and a video to be processed, wherein the video to be processed includes the dynamic object, and each frame image in the stop-motion animation is a video frame in the video to be processed.

Based on the stop-motion animation generation method provided by the present application, in the process of generating the stop-motion animation, the stop-motion animation corresponding to the dynamic object can be automatically generated from the video to be processed based on the dynamic object specified by the user, without the need for manual editing or separate shooting of each frame of the image. The corresponding stop-motion animation can be automatically generated based on the video to be processed that has been shot, thereby shortening the production cycle of the stop-motion animation and improving the production efficiency of the stop-motion animation.

In a possible implementation of the first aspect, determining the dynamic object in response to a first operation of the user includes:

Obtaining the video to be processed;

Displaying a plurality of first annotated images in the video to be processed, each of the first annotated images being annotated with at least one object;

In response to the first operation, the dynamic object is determined from the at least one object annotated in each of the first annotated images.

In a possible implementation of the first aspect, displaying the plurality of first annotated images in the video to be processed includes:

Determining multiple shooting scenes in the video to be processed;

Performing frame extraction processing on each shooting scene to obtain a scene image corresponding to each shooting scene;

Object recognition is performed on each of the scene images to obtain the first annotated image corresponding to each of the scene images.

Based on the above possible implementations, when the video to be processed is a video that has been shot in advance, after the electronic device obtains the video to be processed, the user can select the corresponding dynamic object from the multiple first annotated images corresponding to the video to be processed, so that the corresponding stop-motion animation can subsequently be generated according to the dynamic object of each first annotated image. This reduces the interference of other content in the video to be processed with the stop-motion animation and improves the accuracy of the generated stop-motion animation. In addition, compared with determining the dynamic object in every video frame of the video to be processed, the above possible implementations can also shorten the time needed to determine the dynamic object and speed up the generation of the stop-motion animation.
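Exemplarily, the three steps above (determining shooting scenes, frame extraction, and object recognition) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the present application: frames are modeled as flat lists of grayscale values, a scene boundary is assumed wherever the mean absolute difference between consecutive frames exceeds a hypothetical `threshold`, the middle frame of each scene is taken as the scene image, and `recognize_objects` is a stand-in for whatever object recognition model is actually used.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[int]  # flat list of grayscale pixel values (illustrative assumption)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_scenes(frames: List[Frame], threshold: float = 50.0) -> List[List[int]]:
    """Group frame indices into scenes; a large frame-to-frame change starts a new scene."""
    if not frames:
        return []
    scenes = [[0]]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            scenes.append([i])
        else:
            scenes[-1].append(i)
    return scenes


def annotate_scenes(
    frames: List[Frame],
    recognize_objects: Callable[[Frame], List[str]],
    threshold: float = 50.0,
) -> List[Tuple[int, List[str]]]:
    """Return one (frame_index, recognized_objects) pair per shooting scene."""
    annotated = []
    for scene in split_scenes(frames, threshold):
        idx = scene[len(scene) // 2]  # middle frame serves as the scene image
        annotated.append((idx, recognize_objects(frames[idx])))
    return annotated
```

Because only one scene image per shooting scene is passed to the recognizer, the number of recognition calls grows with the number of scenes rather than with the number of video frames.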

Optionally, in response to a first operation of the user, determining the dynamic object includes:

Obtaining the video to be processed;

Display each video frame of the video to be processed, each of the video frames being marked with at least one object;

In response to a first operation of a user, the dynamic object is determined from the at least one object in each of the video frames.

In the process of shooting the video to be processed, when it is detected that the shooting scene changes from the first scene to the second scene, a video frame sequence corresponding to the first scene is acquired from the shot video clips;

Performing frame extraction processing on the video frame sequence to obtain a scene image corresponding to the first scene;

Object recognition is performed on the scene image to obtain the first annotated image of the first scene.

Based on the above possible implementations, a stop-motion animation shooting mode can be set in the electronic device, further enriching the stop-motion animation shooting mode. In the stop-motion animation shooting mode, if the electronic device detects that the shooting scene is updated, the corresponding scene image is determined according to the video frame before the shooting scene is updated, so as to quickly determine the dynamic object from at least one object in the scene image, realize the rapid processing of the video to be processed, and improve the shooting efficiency of the stop-motion animation.
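Exemplarily, detecting a scene change while shooting and obtaining a scene image from the frames captured before the change can be sketched as follows. The class name `SceneWatcher`, the difference-based change test, the `threshold` value, and the choice of the middle buffered frame as the scene image are all illustrative assumptions rather than details given in the present application.

```python
from typing import List, Optional, Sequence

Frame = Sequence[int]  # flat list of grayscale pixel values (illustrative assumption)


class SceneWatcher:
    """Buffers frames of the current scene and emits a scene image when the scene changes."""

    def __init__(self, threshold: float = 50.0) -> None:
        self.threshold = threshold
        self.buffer: List[Frame] = []

    def push(self, frame: Frame) -> Optional[Frame]:
        """Add a newly captured frame.

        Returns the scene image of the scene that just ended, or None if the
        shooting scene has not changed.
        """
        scene_image = None
        if self.buffer:
            prev = self.buffer[-1]
            diff = sum(abs(a - b) for a, b in zip(prev, frame)) / len(frame)
            if diff > self.threshold:
                # Scene changed: use the middle frame of the finished scene.
                scene_image = self.buffer[len(self.buffer) // 2]
                self.buffer = []
        self.buffer.append(frame)
        return scene_image
```

Each returned scene image could then be passed to object recognition to obtain the first annotated image of the scene that has just ended, while shooting continues.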

Sending the video to be processed to a cloud server;

Receiving the plurality of first annotated images sent by the cloud server;

A plurality of the first annotated images are displayed.

Acquire a first image, wherein the first image includes at least one object;

displaying a second annotated image according to the first image, wherein the second annotated image is annotated with the at least one object;

In response to the first operation, the dynamic object is determined from the at least one object.

In a possible implementation of the first aspect, displaying the second annotated image according to the first image includes:

Performing object recognition on the first image to obtain the second annotated image annotated with at least one object;

The second annotated image is displayed.

Based on the above possible implementation methods, before shooting the video to be processed, the dynamic object can be determined based on the acquired first image. Then, during the process of shooting the video to be processed, the electronic device can process the shot video to be processed according to the determined dynamic object. When the shooting of the video to be processed is completed, the electronic device can quickly generate a stop-motion animation corresponding to the dynamic object, thereby improving the shooting efficiency of the stop-motion animation.

Sending the first image to a cloud server;

receiving the second annotated image corresponding to the first image and sent by the cloud server;

The second annotated image is displayed.

In a possible implementation of the first aspect, the first operation is a focusing operation, and determining the dynamic object in response to the first operation of the user includes:

Acquire a first image, wherein the first image includes at least one object;

In response to the focusing operation of the user on the first image, the dynamic object in the first image is determined from the at least one object.
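Exemplarily, mapping a focusing operation to a dynamic object can be sketched as follows. The sketch assumes that each recognized object is available as a bounding box and that the focusing operation yields a single point in image coordinates; the `DetectedObject` type and the rule of preferring the smallest box that contains the focus point are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedObject:
    label: str
    x: int       # left edge of the bounding box
    y: int       # top edge of the bounding box
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this bounding box."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def pick_dynamic_object(
    objects: List[DetectedObject], focus_x: int, focus_y: int
) -> Optional[DetectedObject]:
    """Return the object under the focus point, preferring the smallest box."""
    hits = [o for o in objects if o.contains(focus_x, focus_y)]
    if not hits:
        return None
    return min(hits, key=lambda o: o.width * o.height)
```

Preferring the smallest enclosing box is one way to resolve overlaps, for example when a small toy stands on a large table and both boxes contain the focus point.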

In a possible implementation of the first aspect, the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.

In a possible implementation manner of the first aspect, determining the stop-motion animation according to the dynamic object and the video to be processed includes:

Determine a plurality of frames of images corresponding to each action of the dynamic object in the video to be processed;

Performing frame extraction processing on the multiple frames of images corresponding to each of the actions respectively to obtain a key frame sequence corresponding to each of the actions;

The stop-motion animation is generated according to the key frame sequence corresponding to each of the actions.
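Exemplarily, the three steps above can be sketched as a small pipeline over frame indices. The sketch assumes that an action label is already available for every video frame (how actions are recognized is outside the sketch), and it uses fixed-step sampling as a placeholder for the frame extraction processing; the function names and the `step` parameter are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def group_by_action(action_labels: Sequence[str]) -> List[Tuple[str, List[int]]]:
    """Group consecutive frame indices that share the same action label."""
    groups = []
    for label, run in groupby(enumerate(action_labels), key=lambda p: p[1]):
        groups.append((label, [i for i, _ in run]))
    return groups


def extract_key_frames(indices: List[int], step: int) -> List[int]:
    """Placeholder frame extraction: keep every `step`-th frame of one action."""
    return indices[::step]


def build_stop_motion(action_labels: Sequence[str], step: int = 3) -> List[int]:
    """Return the indices of the video frames that make up the stop-motion animation."""
    animation: List[int] = []
    for _, indices in group_by_action(action_labels):
        animation.extend(extract_key_frames(indices, step))
    return animation
```

Every element of the result is an index into the video to be processed, which matches the requirement that each frame image in the stop-motion animation is a video frame in that video.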

In a possible implementation of the first aspect, the method further includes:

If an interfering object exists in the first key frame of the key frame sequence, the interfering object in the first key frame is eliminated.

In a possible implementation manner of the first aspect, eliminating the interference object in the first key frame includes:

The interfering object in the first key frame is eliminated according to a region of the first key frame corresponding to the interfering object in adjacent frames of the key frame sequence.
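Exemplarily, eliminating an interfering object by reusing the corresponding region of an adjacent key frame can be sketched as follows. The sketch assumes that images are two-dimensional lists of pixel values, that the adjacent frame is spatially aligned with the first key frame (for example, shot from a fixed camera), and that the interfering object is absent from the region in the adjacent frame; the `Region` layout and the function name are hypothetical.

```python
import copy
from typing import List, Tuple

Image = List[List[int]]              # rows of pixel values (illustrative assumption)
Region = Tuple[int, int, int, int]   # (top, left, height, width)


def remove_interfering_object(
    key_frame: Image, adjacent_frame: Image, region: Region
) -> Image:
    """Return a copy of key_frame whose region is filled from adjacent_frame."""
    top, left, height, width = region
    cleaned = copy.deepcopy(key_frame)  # leave the original key frame untouched
    for r in range(top, top + height):
        cleaned[r][left:left + width] = adjacent_frame[r][left:left + width]
    return cleaned
```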

In a possible implementation of the first aspect, the performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions includes:

The key frame sequence corresponding to each action is determined according to the pixel average value of the area corresponding to the dynamic object in the multiple frames of images corresponding to each action.

Exemplarily, the image with the smallest absolute deviation in the multiple frames of images corresponding to each action can be determined as the key frame corresponding to the multiple frames of images, and the key frame sequence corresponding to each action can be determined based on the key frame corresponding to each action.
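Exemplarily, one way to read the selection rule above is sketched below: the pixel average of the dynamic-object region is computed for every frame of an action, and the frame whose average deviates least, in absolute terms, from the mean of those averages is taken as the key frame. This reading of "smallest absolute deviation" is an assumption, as are the function names and the representation of each region as a flat list of pixel values.

```python
from typing import List, Sequence


def region_mean(region_pixels: Sequence[int]) -> float:
    """Pixel average of the area corresponding to the dynamic object in one frame."""
    return sum(region_pixels) / len(region_pixels)


def select_key_frame(regions: List[Sequence[int]]) -> int:
    """Return the index of the frame whose region average is closest to the overall mean.

    `regions` holds, for each frame of one action, the pixels of the
    dynamic-object area in that frame.
    """
    means = [region_mean(r) for r in regions]
    overall = sum(means) / len(means)
    deviations = [abs(m - overall) for m in means]
    return deviations.index(min(deviations))
```

Applying `select_key_frame` to successive groups of frames within an action would yield one key frame per group, and the key frames collected in order would form the key frame sequence for that action.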

Sending the dynamic object indication information to a cloud server;

The stop motion animation is received.

Sending the dynamic object indication information and the video to be processed to a cloud server;

The stop motion animation is received.

Based on this possible implementation, in actual applications, a stop-motion animation shooting mode can be set in the camera. After the user selects the stop-motion animation shooting mode, the stop-motion animation can be shot in this mode. This not only expands the way of generating stop-motion animation, but also greatly reduces the manual processing of the video to be processed, compared with the prior art method of generating stop-motion animation by manually editing the video to be processed, thereby improving the production efficiency of the stop-motion animation.

In a second aspect, an embodiment of the present application provides a stop-motion animation generation method, which is applied to a cloud server, and the method includes:

receiving indication information corresponding to a dynamic object and a video to be processed sent by an electronic device, and determining the dynamic object according to the indication information;

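determining a plurality of frames of images corresponding to each action of the dynamic object in the video to be processed;

performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions;

generating a stop-motion animation according to the key frame sequence corresponding to each of the actions;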

The stop-motion animation is transmitted to an electronic device.

In a possible implementation manner of the second aspect, the receiving the indication information corresponding to the dynamic object and the video to be processed sent by the electronic device includes:

Receiving the video to be processed sent by the electronic device;

Sending a plurality of first annotated images determined from the video to be processed to the electronic device, each of the first annotated images being annotated with at least one object;

Receiving, from the electronic device, the indication information corresponding to the dynamic object, the dynamic object being determined from the at least one object annotated in each of the first annotated images.

In a possible implementation of the second aspect, the method for determining the plurality of first annotated images includes:

Determining multiple shooting scenes in the video to be processed;

In a possible implementation manner of the second aspect, the method for determining the plurality of first annotated images includes:

In a possible implementation manner of the second aspect, the receiving of the indication information corresponding to the dynamic object sent by the electronic device includes:

Receiving a first image sent by an electronic device, wherein the first image includes at least one object;

determining a second annotated image according to the first image, wherein the second annotated image is annotated with the at least one object;

Sending the second annotated image corresponding to the first image to the electronic device;

Receiving, from the electronic device, the indication information of the dynamic object determined from the at least one object.

In a possible implementation of the second aspect, determining the second annotated image according to the first image includes:

Perform object recognition on the first image to obtain the second annotated image.

In a possible implementation manner of the second aspect, the receiving indication information corresponding to the dynamic object sent by the electronic device includes:

Receive the indication information of the dynamic object in the first image sent by the electronic device.

In a possible implementation of the second aspect, the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.

In a possible implementation manner of the second aspect, the method further includes:

If an interfering object exists in a first key frame of the key frame sequence, the interfering object in the first key frame is eliminated.

In a possible implementation manner of the second aspect, eliminating the interference object in the first key frame includes:

In a possible implementation of the second aspect, the performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions includes:

Exemplarily, the image with the smallest absolute deviation in the multiple frames of images corresponding to each of the actions can be determined as the key frame corresponding to the multiple frames of images, and the key frame sequence corresponding to each of the actions can be determined based on the key frames corresponding to each of the actions.

In a third aspect, an embodiment of the present application provides an electronic device, the electronic device comprising:

a dynamic object determining unit, configured to determine a dynamic object in response to a first operation of a user;

The stop-motion animation determining unit is used to determine the stop-motion animation according to the dynamic object and the video to be processed, wherein the video to be processed includes the dynamic object, and each frame image in the stop-motion animation is a video frame in the video to be processed.

In a possible implementation manner of the third aspect, the dynamic object determining unit is further configured to:

Obtaining the video to be processed;

In response to a first operation of a user, the dynamic object is determined from the at least one object annotated in each of the first annotated images.

In a possible implementation manner of the third aspect, displaying the plurality of first annotated images in the video to be processed includes:

Determining multiple shooting scenes in the video to be processed;

Sending the video to be processed to a cloud server;

Receiving the plurality of first annotated images sent by the cloud server;

A plurality of the first annotated images are displayed.

Acquire a first image, wherein the first image includes at least one object;

In response to the first operation of the user, the dynamic object is determined from the at least one object.

In a possible implementation of the third aspect, displaying the second annotated image according to the first image includes:

Performing object recognition on the first image to obtain the second annotated image;

The second annotated image is displayed.

Sending the first image to a cloud server;

Receiving a second annotated image corresponding to the first image and sent by the cloud server;

The second annotated image is displayed.

In a possible implementation of the third aspect, the first operation is a focusing operation, and determining the dynamic object in response to the first operation of the user includes:

Acquire a first image, wherein the first image includes at least one object;

In a possible implementation of the third aspect, the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.

In a possible implementation manner of the third aspect, the stop-motion animation determination unit is further configured to:

In a possible implementation manner of the third aspect, the method further includes:

In a possible implementation manner of the third aspect, the eliminating the interference object in the first key frame includes:

In a possible implementation manner of the third aspect, the performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions includes:

In a possible implementation manner of the third aspect, determining the stop-motion animation according to the dynamic object and the video to be processed includes:

Sending indication information of the dynamic object to a cloud server;

The stop motion animation is received.

Sending the indication information of the dynamic object and the video to be processed to a cloud server;

The stop motion animation is received.

In a fourth aspect, an embodiment of the present application provides a cloud server, the cloud server comprising:

A receiving unit, configured to receive indication information corresponding to a dynamic object and a video to be processed sent by an electronic device, and determine the dynamic object according to the indication information;

A determination unit, used to determine a plurality of frames of images corresponding to each action of the dynamic object in the video to be processed;

A processing unit, used for performing frame extraction processing on the multiple frames of images corresponding to each of the actions, to obtain a key frame sequence corresponding to each of the actions;

A generating unit, configured to generate a stop-motion animation according to the key frame sequence corresponding to each of the actions;

The sending unit is used to send the stop-motion animation to an electronic device.

In a possible implementation manner of the fourth aspect, the receiving unit is further configured to:

Receiving the video to be processed sent by the electronic device;

Receiving, from the electronic device, the indication information of the dynamic object, the dynamic object being determined from the at least one object annotated in each of the first annotated images.

In a possible implementation manner of the fourth aspect, the method for determining the plurality of first annotated images includes:

Determining multiple shooting scenes in the video to be processed;

Sending a second annotated image corresponding to the first image to the electronic device;

In a possible implementation manner of the fourth aspect, determining the second annotated image according to the first image includes:

In a possible implementation of the fourth aspect, the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.

In a possible implementation manner of the fourth aspect, the cloud server further includes:

The eliminating unit is configured to eliminate the interfering object in the first key frame of the key frame sequence if there is an interfering object in the first key frame.

In a possible implementation manner of the fourth aspect, the elimination unit is further used to:

In a possible implementation manner of the fourth aspect, the performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions includes:

In a fifth aspect, an embodiment of the present application provides an electronic device, comprising: a processor, wherein the processor is used to run a computer program stored in a memory to implement the method in the first aspect or any possible implementation manner of the first aspect.

In a sixth aspect, an embodiment of the present application provides a cloud server, comprising: a processor, the processor being used to run a computer program stored in a memory to implement the method in the second aspect or any possible implementation manner of the second aspect.

In a seventh aspect, the present application provides a stop-motion animation generation system, which includes the electronic device described in the fifth aspect and/or the cloud server described in the sixth aspect.

In an eighth aspect, the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program. When the computer program is executed by a processor, the method in any possible implementation manner of the first aspect to the second aspect is implemented.

In a ninth aspect, the present application provides a computer program product. When the computer program product runs on an electronic device, the electronic device executes the method in any possible implementation of the first to second aspects.

For the technical effects of the second to ninth aspects provided in the present application, refer to the technical effects of the possible implementations of the first aspect described above; they are not repeated here.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.

FIG. 2 is a schematic diagram of a software structure of an electronic device provided in an embodiment of the present application.

FIG. 3-1 to FIG. 3-4 are flow charts of an embodiment of a stop-motion animation generation method provided in an embodiment of the present application.

FIG. 4 to FIG. 8 are schematic diagrams of scenes corresponding to a stop-motion animation generation method provided in an embodiment of the present application.

FIG. 9 is an interactive schematic diagram of an embodiment of another stop-motion animation generation method provided in an embodiment of the present application.

FIG. 10 is an interactive schematic diagram of another embodiment of another stop-motion animation generation method provided in an embodiment of the present application.

FIG. 11 is an interactive schematic diagram of yet another embodiment of another stop-motion animation generation method provided in an embodiment of the present application.

FIG. 12 is a structural block diagram of an electronic device corresponding to a stop-motion animation generation method provided in an embodiment of the present application.

FIG. 13 is a structural block diagram of a cloud server corresponding to a stop-motion animation generation method provided in an embodiment of the present application.

Detailed Description

Stop-motion animation, also known as frame-by-frame animation, is widely used in commercials, promotional videos, short films, and handmade creations. There are generally two ways to make stop-motion animation. The first way is to manually shoot multiple frames of precise images frame by frame, and then synthesize the stop-motion animation based on the multiple frames of precise images. Using this method to make stop-motion animation requires shooting a large number of precise images in the early stage. The shooting process is cumbersome and complicated, which prolongs the production cycle of stop-motion animation and reduces the production efficiency of stop-motion animation.

The second method is to shoot a complete video in advance, and then manually edit the video to form a stop-motion animation. Using this method to make a stop-motion animation requires shooting a video in the early stage, and manual video editing is required in the later stage, which makes the production of stop-motion animation complicated. This method also has the problems of long production cycle and low production efficiency.

Therefore, in response to the above-mentioned problems, the present application provides a stop-motion animation generation method. In the process of generating the stop-motion animation, the stop-motion animation corresponding to the dynamic object can be automatically generated from the video to be processed based on the dynamic object specified by the user. There is no need to manually edit the video or shoot each frame of the image separately, which shortens the production cycle of the stop-motion animation and improves the production efficiency of the stop-motion animation.

The technical solutions in the embodiments of the present application are described below in conjunction with the drawings and related embodiments of the present application. The terms used in the following embodiments are only for the purpose of describing specific embodiments and are not intended to limit the present application. As used in the specification and the appended claims of the present application, the singular expressions "a", "said", "above", "the" and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" refer to one, two, or more than two. The term "and/or" describes the association relationship of associated objects and indicates that three relationships can exist; for example, A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

References to "one embodiment" or "some embodiments" etc. described in this specification mean that one or more embodiments of the present application include specific features, structures or characteristics described in conjunction with the embodiment. Therefore, the statements "in one embodiment", "in some embodiments", "in other embodiments", "in some other embodiments", etc. that appear in different places in this specification do not necessarily refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having" and their variations all mean "including but not limited to", unless otherwise specifically emphasized. The term "connection" includes direct connection and indirect connection, unless otherwise specified. "First" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated.

In the embodiments of the present application, the words "exemplarily" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as "exemplarily" or "for example" is intended to present related concepts in a specific way.

The stop-motion animation generation method provided in the embodiments of the present application can be applied to an electronic device. The electronic device can be a mobile phone, a tablet computer, a wearable device, an AR device, a VR device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a vehicle-mounted device, a smart screen, a cloud server, or the like. The embodiments of the present application do not impose any restrictions on the specific type of the electronic device.

Referring to FIG. 1 , it is a schematic diagram of the structure of an electronic device 100 provided in the present application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 131, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a Subscriber Identification Module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.

It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.

For example, when the electronic device 100 is a mobile phone or a tablet computer, it may include all the components shown in the figure, or may include only some of the components shown in the figure.

The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or integrated in one or more processors.

The controller may be the nerve center and command center of the electronic device 100. The controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.

The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may store instructions or data that the processor 110 has just used or cyclically used. If the processor 110 needs to use the instruction or data again, it may be directly called from the memory. This avoids repeated access, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interface may include an Inter-integrated Circuit (I2C) interface, an Inter-integrated Circuit Sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a Universal Asynchronous Receiver/Transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a General-Purpose Input/Output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.

The USB interface 130 is an interface that complies with the USB standard specification, and specifically can be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and a peripheral device. It can also be used to connect headphones to play audio through the headphones. The interface can also be used to connect other electronic devices, such as AR devices, etc.

It is understandable that the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.

The charging management module 140 is used to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While the charging management module 140 is charging the battery 142, it may also power the electronic device through the power management module 141.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 to power the processor 110, the internal memory 131, the external memory interface 120, the display screen 194, the camera 193, and the wireless communication module 160. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle number, and battery health status (leakage, impedance).

In some other embodiments, the power management module 141 may also be disposed in the processor 110. In some other embodiments, the power management module 141 and the charging management module 140 may also be disposed in the same device.

The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.

Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve the utilization of antennas. For example, antenna 1 can be reused as a diversity antenna for a wireless local area network. In some other embodiments, the antenna can be used in combination with a tuning switch.

The mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1.

In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the same device as at least some modules of the processor 110.

The modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be set in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR) and the like. The wireless communication module 160 can be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation and amplification on the signal, and convert it into electromagnetic waves for radiation via the antenna 2.

In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. GNSS can include the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the Beidou Navigation Satellite System (BDS), the Quasi-Zenith Satellite System (QZSS) and/or the Satellite Based Augmentation Systems (SBAS).

The electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, etc., for example, the icons, folders, and folder names of the APP in the embodiment of the present application. The display screen 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, a quantum dot light-emitting diode (QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.

The electronic device 100 can realize the shooting function through ISP, camera 193, video codec, GPU, display screen 194 and application processor.

ISP is used to process the data fed back by camera 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to ISP for processing and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, ISP can be set in camera 193.

The camera 193 is used to capture still images or videos. The object generates an optical image through the lens and projects it onto the photosensitive element. The focal length of the lens can be used to indicate the camera's field of view. The smaller the focal length of the lens, the larger the lens's field of view. The photosensitive element can be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP for conversion into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format.
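The relationship between focal length and field of view stated above can be illustrated with the standard angle-of-view formula for an ideal lens focused at infinity. This is only a minimal sketch: the function name `horizontal_fov_degrees` and the 36 mm sensor width are assumptions chosen for illustration and are not taken from the present application.

```python
import math


def horizontal_fov_degrees(focal_length_mm: float, sensor_width_mm: float = 36.0) -> float:
    """Horizontal angle of view of an ideal lens focused at infinity.

    sensor_width_mm defaults to 36 mm, an assumed value used only for illustration.
    """
    if focal_length_mm <= 0 or sensor_width_mm <= 0:
        raise ValueError("focal length and sensor width must be positive")
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


# A shorter focal length gives a wider field of view than a longer one.
wide = horizontal_fov_degrees(24.0)
tele = horizontal_fov_degrees(70.0)
```

With the same photosensitive element, `wide` is larger than `tele`, which matches the statement that a smaller focal length corresponds to a larger field of view.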

In the present application, the electronic device 100 may include cameras 193 with 2 or more focal lengths.

The digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.

Video codecs are used to compress or decompress digital videos. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.

NPU is a neural network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transmission mode between neurons in the human brain, it can quickly process input information and can also continuously self-learn. Through NPU, applications such as intelligent cognition of the electronic device 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, etc.

In an embodiment of the present application, the NPU or other processors may be used to perform operations such as analyzing and processing images in a video stored in the electronic device 100.

The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos can be stored in the external memory card.

The internal memory 131 can be used to store computer executable program codes, and the executable program codes include instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 131. The internal memory 131 may include a program storage area and a data storage area. Among them, the program storage area can store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.). The data storage area can store data created during the use of the electronic device 100 (such as audio data, a phone book, etc.).

In addition, the internal memory 131 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (Universal Flash Storage, UFS), etc.

The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.

The audio module 170 is used to convert digital audio signals into analog audio signals for output, and is also used to convert analog audio inputs into digital audio signals. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 can be arranged in the processor 110, or some functional modules of the audio module 170 can be arranged in the processor 110.

The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to hands-free calls through the speaker 170A of the electronic device 100. For example, the speaker can play the comparison analysis results provided in the embodiment of the present application.

The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 receives a call or voice message, the user can listen to the voice by placing the receiver 170B close to the ear.

The microphone 170C, also called a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 can be provided with at least one microphone 170C. In other embodiments, the electronic device 100 can be provided with two microphones 170C, which can not only collect sound signals but also implement a noise reduction function. In other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify the sound source, implement a directional recording function, etc.

The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

The button 190 includes a power button, a volume button, etc. The button 190 may be a mechanical button or a touch button. The electronic device 100 may receive button input and generate button signal input related to user settings and function control of the electronic device 100.

Motor 191 can generate vibration prompts. Motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, motor 191 can also correspond to different vibration feedback effects. Different application scenarios (for example: time reminders, receiving messages, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also support customization.

The indicator 192 may be an indicator light, which may be used to indicate the charging status, power changes, messages, missed calls, notifications, etc.

The SIM card interface 195 is used to connect a SIM card. The SIM card can be connected to or separated from the electronic device 100 by inserting it into or removing it from the SIM card interface 195. The electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards. The SIM card interface 195 can also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications. In some embodiments, the electronic device 100 uses an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

See Figure 2, which is a schematic diagram of the software structure of an electronic device in an embodiment of the present application. The operating system in the electronic device may be an Android system, a Microsoft Windows system, an Apple mobile operating system (iOS), a Hongmeng system (Harmony OS), etc. Here, the Hongmeng system is taken as an example of the operating system of the electronic device for explanation.

In some embodiments, the Hongmeng system can be divided into four layers, including the kernel layer, the system service layer, the framework layer, and the application layer, and the layers communicate with each other through software interfaces.

As shown in Figure 2, the kernel layer includes the kernel abstraction layer (KAL) and the driver subsystem. The KAL includes multiple kernels, such as the Linux kernel of the Linux system and the LiteOS kernel of the lightweight IoT system. The driver subsystem can include the hardware driver framework (HDF), which can provide unified peripheral access capabilities and a driver development and management framework. The multi-kernel kernel layer can select the corresponding kernel for processing according to the needs of the system.

The system service layer is the core capability set of the Hongmeng system, and provides services to applications through the framework layer. This layer may include the system basic capability subsystem set, the basic software service subsystem set, the enhanced software service subsystem set, and the hardware service subsystem set.

The system basic capability subsystem set provides basic capabilities for the operation, scheduling, migration and other operations of distributed applications on devices of Hongmeng system. It may include distributed soft bus, distributed data management, distributed task scheduling, Ark multi-language runtime, public basic library, multi-mode input, graphics, security, artificial intelligence (AI), user program framework and other subsystems. Among them, Ark multi-language runtime provides C or C++ or JavaScript (JS) multi-language runtime and basic system class library, and can also provide runtime for Java programs statically compiled by Ark compiler (that is, the part developed in Java language in the application or framework layer).

The basic software service subsystem set provides public and general software services for the Hongmeng system, including event notification, telephone, multimedia, Design For X (DFX), MSDP&DV and other subsystems.

The enhanced software service subsystem set provides the Hongmeng system with differentiated capability-enhanced software services for different devices, including smart screen proprietary services, wearable proprietary services, and Internet of Things (IoT) proprietary service subsystems.

The hardware service subsystem set provides hardware services for the Hongmeng system, including location services, biometric recognition, wearable proprietary hardware services, IoT proprietary hardware services and other subsystems.

The framework layer provides multi-language user program frameworks and capability frameworks in Java, C, C++, JS, and other languages for Hongmeng system application development, two user interface (UI) frameworks (including the Java UI framework for Java language and the JS UI framework for JS language), and multi-language framework application programming interfaces (APIs) open to various software and hardware services. Depending on the degree of componentization of the system, the APIs supported by Hongmeng system devices will also vary.

The application layer includes system applications and third-party applications (or extended applications). System applications may include applications installed by default on electronic devices, such as the desktop, control bar, settings, and phone. Extended applications can be non-essential applications developed and designed by the manufacturer of the electronic device, such as electronic device managers, device migration, notes, weather, and other applications. Third-party non-system applications are applications that are developed by other manufacturers but can run in the Hongmeng system, such as games, navigation, social or shopping applications.

A particle ability (PA) provides the ability to run tasks in the background and a unified data access abstraction. The PA mainly provides support for a feature ability (FA), for example by providing computing power as a background service or providing data access capabilities as a data warehouse. Applications developed based on FA or PA can implement specific business functions, support cross-device scheduling and distribution, and provide users with a consistent and efficient application experience.

Multiple electronic devices running the Hongmeng system can achieve hardware mutual assistance and resource sharing through distributed soft bus, distributed device virtualization, distributed data management and distributed task scheduling.

The stop-motion animation generation method provided in the present application can be executed by an electronic device alone, or collaboratively by an electronic device and a cloud server. The following takes these two application modes as examples and, with reference to the drawings and related embodiments of the present application, illustrates the stop-motion animation generation method provided in the embodiments of the present application.

First, the case in which the method is executed by the electronic device is described by way of example.

FIG. 3-1 to FIG. 3-4 are flowcharts of an embodiment of a stop-motion animation generation method provided in an embodiment of the present application. Referring to FIG. 3-1, the stop-motion animation generation method includes:

301 : In response to a first operation of a user, determine a dynamic object.

In the embodiment of the present application, the dynamic object may also be referred to as the target object, that is, the object whose motion state or shape changes in the stop-motion animation. The first operation may be at least one determination operation for determining the dynamic object, wherein the determination operation may be a voice control operation, a touch selection operation, an air gesture control operation, or a physical button selection operation, etc.

In one example, referring to FIG. 3-2, the method for determining a dynamic object may include: 3011a, acquiring a video to be processed; 3012a, displaying a plurality of first annotated images in the video to be processed, each of which is annotated with at least one object; and 3013a, in response to a first operation, determining a dynamic object from the at least one object annotated in each of the first annotated images.

In this example, the video to be processed refers to a video clip used to generate a stop-motion animation. The video to be processed can be a complete video that has been shot in advance; it can also be a partial video obtained while the above video clip is still being shot, where the partial video can be understood as a video shot in real time.

The method of obtaining the video to be processed includes but is not limited to directly obtaining the video to be processed from the video storage module of the electronic device. For example, as shown in FIG. 4, an add control for the video to be processed is provided on the display interface of the display screen of the electronic device. After the user clicks the add control, the interface jumps to the video storage module of the electronic device. After the user selects the video to be processed and clicks the upload control, the selected video to be processed is added successfully and is displayed on the display interface.

To ensure that the dynamic objects determined from the video to be processed are accurate and complete, in one possible implementation, when the acquired video to be processed is a complete video shot in advance, each video frame of the video to be processed can be acquired, and at least one object in each video frame can be identified, to obtain a plurality of first annotated images that correspond to the video to be processed and are each annotated with at least one object. The user's first operation on at least one object in each first annotated image is then detected, and the dynamic object is determined from the at least one object annotated in each first annotated image. The method for identifying at least one object in each video frame includes but is not limited to target detection and recognition algorithms such as the region-based convolutional neural network (Region CNN, R-CNN) and the fast region-based convolutional network (Fast R-CNN).

Exemplarily, as shown in the interface of FIG. 5, assume that 9 video frames corresponding to the video to be processed are obtained, and object recognition is performed on each video frame. Taking the first video frame as an example, at least one object in the first video frame is recognized. When the "pepper" object is detected, the center coordinates of the detection box corresponding to the "pepper" object in the first video frame can be obtained. The center coordinates are used to indicate the "pepper" object and are displayed on the "pepper" object in the first video frame, which forms the first annotated image corresponding to the first video frame. At least one object in each of the other video frames is then identified in the same way, to obtain the first annotated images corresponding to the 9 video frames of the video to be processed. The user can click on the center coordinates of at least one object in each first annotated image to determine the dynamic object in each first annotated image.
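The annotation and selection steps in this example can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the `Detection` class, the `pick_dynamic_object` function, the `(x_min, y_min, x_max, y_max)` box layout, and the 40-pixel tap tolerance are all assumptions introduced here. The detection boxes themselves are assumed to come from some object detector that is outside the sketch.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """One recognized object; box is (x_min, y_min, x_max, y_max) in pixels (assumed layout)."""
    label: str
    box: Tuple[float, float, float, float]

    @property
    def center(self) -> Tuple[float, float]:
        # Center coordinates of the detection box, used as the on-screen marker.
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def pick_dynamic_object(
    detections: Sequence[Detection],
    tap_xy: Tuple[float, float],
    max_distance: float = 40.0,  # hypothetical tap tolerance in pixels
) -> Optional[Detection]:
    """Return the detection whose center marker is nearest to the tap, or None if none is close enough."""
    best, best_dist = None, max_distance
    for det in detections:
        cx, cy = det.center
        dist = hypot(cx - tap_xy[0], cy - tap_xy[1])
        if dist <= best_dist:
            best, best_dist = det, dist
    return best


frame_detections = [
    Detection("pepper", (100, 100, 200, 300)),
    Detection("plate", (300, 50, 500, 150)),
]
selected = pick_dynamic_object(frame_detections, tap_xy=(155, 195))
```

Here a tap near the marker of the "pepper" object selects that object, while a tap far from every marker selects nothing.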

To speed up the determination of dynamic objects in the video to be processed, in another possible implementation, when the acquired video to be processed is a complete video shot in advance, the video can also be divided into multiple shooting scenes. Frame extraction is performed on each shooting scene to obtain a scene image corresponding to that shooting scene, object recognition is performed on each scene image to obtain a first annotated image corresponding to that scene image, and, in response to a first operation of the user, a dynamic object is determined from at least one object in each first annotated image.

The pre-shot video can be divided into multiple shooting scenes according to the similarity between video frames. For example, the pixel values of each video frame in the pre-shot video are obtained, and the pixel value difference between video frames is compared. Two video frames with a smaller pixel value difference can be determined as video frames in the same shooting scene, and two video frames with a larger pixel value difference can be determined as video frames in different shooting scenes. Specifically, if the pixel value difference between two adjacent video frames is less than a first preset threshold, the two video frames are determined to be similar; conversely, if the pixel value difference between two adjacent video frames is greater than or equal to the first preset threshold, the two video frames are determined to be dissimilar, and the earlier of the two video frames, together with the preceding video frames that are similar to it, is determined as one shooting scene. In this way, multiple shooting scenes corresponding to the pre-shot video can be obtained, and each shooting scene corresponds to at least one video frame.
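The threshold-based division described above can be sketched as follows. This is a minimal illustration under stated assumptions: each video frame is represented as a flat list of grayscale pixel values, the "pixel value difference" is taken to be the mean absolute per-pixel difference, and the names `mean_abs_diff` and `split_into_scenes` are hypothetical. The present application does not fix a particular difference measure.

```python
from typing import List, Sequence


def mean_abs_diff(frame_a: Sequence[float], frame_b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames (assumed measure)."""
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def split_into_scenes(frames: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group consecutive frame indices into shooting scenes.

    Adjacent frames whose difference is below the threshold stay in the same scene;
    a difference at or above the threshold closes the current scene and starts a new one.
    """
    if not frames:
        return []
    scenes = [[0]]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) < threshold:
            scenes[-1].append(i)  # similar to the previous frame: same shooting scene
        else:
            scenes.append([i])    # dissimilar: the previous scene ends here
    return scenes


video = [
    [10, 10, 10, 10],
    [11, 10, 10, 12],
    [200, 200, 200, 200],
    [201, 199, 200, 200],
    [50, 50, 50, 50],
]
scenes = split_into_scenes(video, threshold=20)
```

Each returned scene holds at least one frame index, consistent with the statement that each shooting scene corresponds to at least one video frame.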

After the pre-shot video is divided into a plurality of shooting scenes, each shooting scene is subjected to frame extraction processing to obtain a scene image corresponding to each shooting scene. For example, a video frame in each shooting scene may be randomly extracted as the scene image corresponding to the shooting scene; or, a video frame with the most objects may be extracted from each shooting scene as the scene image corresponding to each shooting scene; or, a video frame with the smallest absolute deviation in at least one video frame of each shooting scene may be determined as the scene image corresponding to the shooting scene.
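Two of the frame extraction strategies above can be sketched as follows. The text does not define "absolute deviation", so the sketch assumes it means the mean absolute deviation of a frame from the per-pixel average of all frames in the scene; that interpretation, the flat grayscale frame representation, and the names `pick_by_deviation`, `pick_by_object_count`, and `count_objects` are assumptions made for illustration only.

```python
from typing import Callable, Sequence

Frame = Sequence[float]


def pick_by_deviation(frames: Sequence[Frame]) -> int:
    """Index of the frame closest to the scene's per-pixel average frame (assumed interpretation)."""
    if not frames:
        raise ValueError("a shooting scene must contain at least one frame")
    n = len(frames)
    average = [sum(pixels) / n for pixels in zip(*frames)]

    def deviation(frame: Frame) -> float:
        return sum(abs(p - m) for p, m in zip(frame, average)) / len(average)

    return min(range(n), key=lambda i: deviation(frames[i]))


def pick_by_object_count(frames: Sequence[Frame], count_objects: Callable[[Frame], int]) -> int:
    """Index of the frame in which the caller-supplied detector finds the most objects."""
    if not frames:
        raise ValueError("a shooting scene must contain at least one frame")
    return max(range(len(frames)), key=lambda i: count_objects(frames[i]))


scene = [[10, 10], [12, 12], [20, 20]]
representative = pick_by_deviation(scene)
```

In `pick_by_object_count`, `count_objects` stands in for whatever object recognition the device uses; random extraction, the first strategy in the text, needs no more than `random.choice` over the scene's frames.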

After obtaining the scene image corresponding to each shooting scene, at least one object in each scene image is identified to obtain a first annotated image corresponding to each scene image, and then, based on the first operation of the user, a dynamic object is determined from at least one object in each first annotated image. The method for obtaining the first annotated image is the same as the method for obtaining the first annotated image in the previous possible implementation manner, and will not be repeated here.

It should be noted that, in addition to being a complete video that has been shot in advance, the video to be processed can also be a video shot in real time (multiple video frames or images). Exemplarily, taking the display interface shown in Figure 6 as an example, the user clicks the "camera" icon to display the shooting interface, and then slides left or right to select the stop-motion shooting mode in the shooting interface. When the user clicks the "shoot" control, shooting of the video to be processed begins. In this example, during the shooting of the video to be processed, when it is detected from the captured video frames that the shooting scene is updated, the video frames captured before the update are obtained; frame extraction is performed on these video frames to obtain a scene image corresponding to them; object recognition is performed on the scene image to obtain a first annotated image annotated with at least one object; and, in response to the user's first operation, a dynamic object is determined from at least one object in each first annotated image.

In combination with actual application scenarios, during the process of shooting a video to be processed, the reason for updating the shooting scene may be an increase in objects in the video frame, a decrease in objects, a change in the motion state or shape of at least one object, and so on.

In the process of shooting the video to be processed, if it is detected that the shooting scene changes from the first scene to the second scene, the video frame sequence corresponding to the first scene can be obtained from the shot video clip. That is to say, if it is detected that the shooting scene is updated, the video frame sequence before the shooting scene is updated can be obtained from the shot video clip. Among them, whether the shooting scene is updated can be determined according to the following methods, which are not specifically limited in this application. For example, object recognition can be performed on each video frame, and whether the shooting scene is updated can be determined based on the object recognition result. Of course, the similarity between adjacent video frames in the shot video clip can also be compared to determine whether the shooting scene is updated, etc.
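As an illustration of the object-recognition-based option, the sketch below assumes that an object detector has already produced a set of labels for every captured frame, and treats any change in that set (an object added or removed) as an update of the shooting scene. The detector itself and the function name `split_by_object_sets` are assumptions made for this example only.

```python
from typing import FrozenSet, List, Tuple


def split_by_object_sets(labels_per_frame: List[FrozenSet[str]]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame index ranges, one per shooting scene.

    A new scene starts whenever the set of recognized objects differs
    from the set recognized in the previous frame.
    """
    scenes = []
    start = 0
    for i in range(1, len(labels_per_frame)):
        if labels_per_frame[i] != labels_per_frame[i - 1]:
            scenes.append((start, i - 1))
            start = i
    if labels_per_frame:
        scenes.append((start, len(labels_per_frame) - 1))
    return scenes


# Hypothetical recognition results for five captured frames.
labels = [
    frozenset({"pepper", "plate"}),
    frozenset({"pepper", "plate"}),
    frozenset({"pepper", "plate", "knife"}),
    frozenset({"pepper", "plate", "knife"}),
    frozenset({"plate", "knife"}),
]
scene_ranges = split_by_object_sets(labels)
```

Here the knife appearing and the pepper disappearing each start a new scene, so the frames before each change form the video frame sequence of the preceding scene.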

After obtaining a video frame sequence corresponding to the first scene, multiple video frames in the video frame sequence can be determined as a shooting scene, and then a scene image of the shooting scene can be determined from at least one video frame of the shooting scene, so as to perform object recognition on the scene image, obtain a first annotated image annotated with at least one object, and then determine the dynamic object in the scene image. It should be understood that the method of determining the scene image from at least one video frame in the shooting scene can be understood by referring to the above method of determining the scene image corresponding to each shooting scene when the video to be processed is a video that has been shot, and will not be repeated here.

In another example, referring to FIG. 3-3, the method for determining a dynamic object may also include:

3011b, acquiring a first image, wherein the first image includes at least one object.

3012b, displaying a second annotated image according to the first image, wherein the second annotated image displays at least one object.

3013b, determining a dynamic object from the at least one object in response to a first operation of a user.

It should be understood that the first image may be an image captured before the video to be processed is captured. Taking the interface shown in FIG6 as an example, the user clicks the “camera” icon to display the capture interface, and then the user selects the capture mode as photo shooting in the capture interface. When the user clicks the “shoot” control, the first image can be obtained. After the electronic device determines the dynamic object based on the acquired first image, the user can slide the capture mode left and right to select the stop motion animation capture mode, so that the user clicks the “shoot” control to start capturing the video to be processed.

If the video to be processed is a complete video that has been shot, the first image may also be an image in the video to be processed. For example, if the video to be processed has 100 video frames, the first image may be the first video frame. For another example, if the video to be processed is a video with a duration of 100 minutes, the first image may be the image corresponding to the first second, or the image corresponding to the first minute.

After the electronic device acquires the first image, it can perform object recognition on the first image, obtain a second annotated image annotated with at least one object according to the object recognition result, and then determine the dynamic object from the second annotated image annotated with at least one object in response to the user's first operation. The method for determining the second annotated image can refer to the method for determining the first annotated image in the previous example, and will not be repeated here.

In other examples, referring to FIG. 3-4, the method for determining a dynamic object may further include:

3011c, acquiring a first image, wherein the first image includes at least one object.

3012c, in response to a user's focusing operation on the first image, determining a dynamic object in the first image from the at least one object.

It should be understood that, in this example, the first operation is a focusing operation, and when the electronic device detects the focusing operation of the user, the dynamic object in the first image is determined according to the focusing operation.

As an example but not limitation, for the shooting interface shown in Figure 7, assuming that the first image is the image corresponding to the shooting interface, after the user clicks on the display screen area corresponding to the pepper in the first image, a focusing operation is triggered. After the electronic device detects the focusing operation, the pepper in the first image is determined as a dynamic object based on the focusing operation.
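One way to map a focusing operation to an object is a simple hit test against detected bounding boxes. The sketch below is a hedged illustration: the bounding boxes, the `DetectedObject` type, and the rule of preferring the smallest box when several contain the tapped point are all assumptions, not details given in the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedObject:
    """Hypothetical detection result: a label and an axis-aligned bounding box."""
    label: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def object_at_focus_point(objects: List[DetectedObject], x: int, y: int) -> Optional[DetectedObject]:
    """Return the object under the tapped point, preferring the smallest box."""
    hits = [o for o in objects if o.contains(x, y)]
    return min(hits, key=lambda o: o.area()) if hits else None


# The pepper lies on the plate, so the plate's box also contains the tap.
detected = [
    DetectedObject("plate", 50, 50, 400, 400),
    DetectedObject("pepper", 100, 100, 200, 200),
]
dynamic_object = object_at_focus_point(detected, 150, 150)
```

Preferring the smallest enclosing box is a design choice for this sketch: it resolves the tap to the pepper rather than to the larger plate underneath it.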

The above several possible examples are only for illustration, and the present application does not limit the specific content of the first operation, nor the specific method for determining the dynamic object. After determining the dynamic object, the stop-motion animation can be further determined according to the dynamic object and the video to be processed.

302, determining a stop-motion animation according to the dynamic object and the video to be processed, where the video to be processed includes the dynamic object, and each frame of the stop-motion animation is a video frame in the video to be processed.

It should be understood that multiple frames of images corresponding to each action of the dynamic object can be determined from the video to be processed, and frame extraction can be performed on the multiple frames corresponding to each action to obtain a key frame sequence corresponding to each action, and a stop-motion animation can be generated based on the corresponding key frame sequence of each action.

Exemplarily, assuming that the video to be processed includes 100 video frames, the inter-frame pixel value difference between every two adjacent video frames in the 100 video frames can be calculated, and the video frames with the same pixel values (or whose inter-frame pixel value difference is less than a preset threshold) are determined as one action of the dynamic object, thereby determining the multiple frames of images corresponding to each action from the 100 video frames; or, the dynamic object and the video to be processed are input into an action classification model for processing, multiple actions of the dynamic object in the video to be processed are output, and the multiple frames of images corresponding to each action are then determined based on the multiple actions.
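The threshold-based grouping can be sketched as below. This is a minimal illustration that assumes grayscale frames of equal size and uses the mean absolute pixel difference as the inter-frame difference; the application does not fix a particular difference measure, and the function names are hypothetical.

```python
from typing import List

# Assumed representation: a grayscale frame is a list of rows of pixel values.
Frame = List[List[int]]


def inter_frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def group_frames_by_action(frames: List[Frame], threshold: float) -> List[List[int]]:
    """Group frame indices so that adjacent frames whose difference is below
    the threshold fall into the same action."""
    groups: List[List[int]] = []
    for i in range(len(frames)):
        if groups and inter_frame_difference(frames[i - 1], frames[i]) < threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# Five one-row frames: the content jumps between the second and third frame.
video = [[[10, 10]], [[11, 10]], [[90, 90]], [[90, 91]], [[90, 90]]]
actions = group_frames_by_action(video, threshold=5.0)
```

With a threshold of 5, the first two frames form one action and the last three form another.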

In this embodiment, after identifying multiple frames of images corresponding to each action of the dynamic object from the video to be processed, the pixel mean of the area corresponding to the dynamic object in the multiple frames of images corresponding to each action can be obtained, and the key frame sequence corresponding to each action can be determined based on the above pixel mean.

In actual application, after obtaining the pixel mean of the area corresponding to the dynamic object in the multi-frame images corresponding to each action, the difference between the multi-frame images corresponding to each action and the pixel mean can be compared, and the image with the smallest absolute deviation can be determined as the key frame corresponding to each action. The key frame sequence in the video to be processed is determined according to the key frame of each action, and the key frame sequence is connected to generate a stop motion animation. It is not difficult to understand that the absolute deviation can be the difference between the pixel value of the area corresponding to the dynamic object in each frame image and the average value determined according to the pixel value of the area corresponding to the dynamic object in the multi-frame images.

In addition, after determining the average value of the pixel values of the multiple frames corresponding to each action, the difference between the pixel value of each frame in the multiple frames corresponding to each action and the average value can be compared, and the image whose difference is the smallest, or is less than a preset threshold value, can be determined as the key frame corresponding to the action. For example, assuming that 4 frames of images corresponding to a certain cutting action of peppers are identified from the video to be processed, the pixel value corresponding to the first frame is a, the pixel value corresponding to the second frame is b, the pixel value corresponding to the third frame is c, and the pixel value corresponding to the fourth frame is d, then the average pixel value of the 4 frames corresponding to the cutting action is (a+b+c+d)/4. The difference between the first frame image and the mean is a-(a+b+c+d)/4, the difference between the second frame image and the mean is b-(a+b+c+d)/4, the difference between the third frame image and the mean is c-(a+b+c+d)/4, and the difference between the fourth frame image and the mean is d-(a+b+c+d)/4. The image corresponding to the smallest difference, taken in absolute value in line with the absolute deviation described above, can be determined as the key frame of the cutting action.
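The four-frame example can be worked through in code. The sketch below assumes that the "pixel value" of a frame is the mean over the area corresponding to the dynamic object, given here as a rectangular region, and that the smallest difference is taken in absolute value. The region format, the sample values, and the function names are assumptions for illustration.

```python
from typing import List, Tuple

# Assumed representation: a grayscale frame is a list of rows of pixel values.
Frame = List[List[int]]
# Assumed region format: (top, left, bottom, right), bottom and right exclusive.
Region = Tuple[int, int, int, int]


def region_mean(frame: Frame, region: Region) -> float:
    """Mean pixel value inside the area corresponding to the dynamic object."""
    top, left, bottom, right = region
    values = [frame[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(values) / len(values)


def select_key_frame(action_frames: List[Frame], region: Region) -> int:
    """Return the index of the frame with the smallest absolute deviation
    from the mean over all frames of the action."""
    per_frame = [region_mean(f, region) for f in action_frames]
    action_mean = sum(per_frame) / len(per_frame)
    deviations = [abs(v - action_mean) for v in per_frame]
    return deviations.index(min(deviations))


# Four frames of one cutting action; only the top-left pixel belongs to the
# dynamic object, with values a=100, b=120, c=130, d=170.
cutting_frames = [[[v, 0], [0, 0]] for v in (100, 120, 130, 170)]
key_frame_index = select_key_frame(cutting_frames, (0, 0, 1, 1))
```

Here (a+b+c+d)/4 is 130, the deviations are 30, 10, 0 and 40, and the third frame (index 2) becomes the key frame of the cutting action.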

Based on the above-mentioned stop-motion animation generation method, in the process of generating the stop-motion animation, in response to the user's first operation, the dynamic object in the stop-motion animation is determined, and the electronic device can generate a stop-motion animation corresponding to the dynamic object from the video to be processed according to the dynamic object. There is no need to manually shoot a large number of images in advance or manually edit the video to be processed, which simplifies the production process of the stop-motion animation and improves the production efficiency of the stop-motion animation.

In a possible implementation, after the stop-motion animation is generated, the user can also perform custom editing on the generated stop-motion animation to generate a new stop-motion animation. The user's custom editing of the stop-motion animation can be to add at least one frame of image to the stop-motion animation. For example, the first frame of image in the stop-motion animation corresponds to the video frame at the first second of the video to be processed, and the second frame of image in the stop-motion animation corresponds to the video frame at the tenth second of the video to be processed. Then, the user can select one or more video frames from the video frames at the first second to the tenth second of the video to be processed and add them between the first frame of image and the second frame of image of the stop-motion animation.

The user's custom editing of the stop-motion animation can also be to delete one or more frames of images from the generated stop-motion animation. For example, the stop-motion animation generated by the stop-motion animation generation method provided in the embodiment of the present application includes 100 frames of images, among which the images corresponding to a certain action account for a large proportion. Then the user can select and delete multiple frames of images from the images corresponding to the action to reduce the proportion of the images corresponding to the action in the stop-motion animation.

The user's customized editing of the stop-motion animation can also be to replace or modify one or more frames of images in the stop-motion animation. It should be understood that replacement can refer to overwriting one or more frames of images in the stop-motion animation with one or more frames of images in the video to be processed (or multiple frames of images corresponding to each action of the dynamic object); modification can refer to the improvement of at least one frame of the stop-motion animation in terms of color contrast, exposure, filters, text, special effects, etc.

In practical applications, the user's customized editing of the stop-motion animation may also be switching or moving the position of at least one frame of the stop-motion animation in the stop-motion animation, and so on.
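The editing operations described above (adding, deleting, replacing, and moving frames) can be modelled on a simple list. The sketch below assumes the stop-motion animation is stored as a list of indices into the video to be processed, which is consistent with every animation frame being a video frame of that video. The function names and the sample indices are hypothetical.

```python
from typing import List


def insert_frames(animation: List[int], position: int, new_frames: List[int]) -> List[int]:
    """Add source-video frames before the given position of the animation."""
    return animation[:position] + new_frames + animation[position:]


def delete_frames(animation: List[int], positions: List[int]) -> List[int]:
    """Remove the animation frames at the given positions."""
    drop = set(positions)
    return [f for i, f in enumerate(animation) if i not in drop]


def replace_frame(animation: List[int], position: int, new_frame: int) -> List[int]:
    """Overwrite one animation frame with another source-video frame."""
    edited = list(animation)
    edited[position] = new_frame
    return edited


def move_frame(animation: List[int], src: int, dst: int) -> List[int]:
    """Move one animation frame to a different position."""
    edited = list(animation)
    edited.insert(dst, edited.pop(src))
    return edited


animation = [30, 300, 450]                            # indices of source video frames
animation = insert_frames(animation, 1, [120, 210])   # add frames between the first two
animation = delete_frames(animation, [2])             # drop the frame at position 2
animation = replace_frame(animation, 3, 460)          # swap in a different source frame
animation = move_frame(animation, 0, 1)               # reorder
```

Each function returns a new list, so the originally generated animation can be kept alongside the user's edited version.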

In addition, the generated stop-motion animation is obtained from the video to be processed based on the dynamic object determined by the user. Therefore, the stop-motion animation corresponding to the dynamic object can be obtained from the video to be processed according to the different dynamic objects selected by the user. Even if there is noise interference in the video to be processed, it will not affect the generation of the stop-motion animation corresponding to the dynamic object. In other words, when there is noise interference in the video to be processed, there is no need to re-shoot the video to be processed. The stop-motion animation generation method provided in the present application can be used to generate the stop-motion animation corresponding to the dynamic object determined by the user, thereby avoiding the problem of repeatedly obtaining the video to be processed due to noise interference in the video to be processed, shortening the production cycle of the stop-motion animation, and further improving the production efficiency of the stop-motion animation.

In order to reduce the influence of noise interference (such as interfering objects) that may exist in the video to be processed on the accuracy of the stop-motion animation, optionally, after obtaining the key frame sequence corresponding to each action, the object in each key frame can also be identified. If there is an interfering object in the first key frame in the key frame sequence, the interfering object in the first key frame can be eliminated first, and then the key frame sequence can be connected to generate the stop-motion animation. Alternatively, the first key frame with the interfering object is directly deleted from the key frame sequence, and the other key frames in the key frame sequence except the first key frame are connected to generate the stop-motion animation.
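The second option, deleting key frames that contain an interfering object, can be sketched as a filter. The example assumes that object recognition has already produced a set of labels for each key frame; the labels, frame indices, and function name are illustrative assumptions.

```python
from typing import List, Set


def drop_interfered_key_frames(
    key_frames: List[int],
    labels_per_key_frame: List[Set[str]],
    interfering: Set[str],
) -> List[int]:
    """Keep only the key frames in which no interfering object was recognized."""
    kept = []
    for frame_index, labels in zip(key_frames, labels_per_key_frame):
        if labels & interfering:
            continue  # this key frame contains an interfering object
        kept.append(frame_index)
    return kept


# Hypothetical key frame sequence and per-frame recognition results.
key_frames = [12, 40, 77, 95]
recognized = [{"pepper"}, {"pepper", "human hand"}, {"pepper", "knife"}, {"pepper"}]
clean_sequence = drop_interfered_key_frames(key_frames, recognized, {"human hand"})
```

The remaining key frames are then connected in their original order to generate the stop-motion animation.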

In practical applications, the interference objects may be pre-defined for different application fields, or may be determined by the user from the video to be processed.

It should be understood that when the video to be processed is a video that has been shot in advance, the interference object may refer to an object other than a dynamic object. If there is an interference object in the first key frame, the content of the area corresponding to the interference object in the first key frame where the interference object exists can be inferred based on the area corresponding to the interference object in the adjacent frames of the key frame sequence of the first key frame, so as to eliminate the interference object in the first key frame.

Among them, the adjacent frame is the key frame adjacent to the first key frame with the interference object. Assuming that the key frame with the interference object is the Nth frame key frame, the key frame adjacent to the Nth frame key frame can be the N-1th frame key frame and/or the N+1th frame key frame, or the N-xth frame key frame and/or the N+yth frame key frame, wherein x≥0, y≥0, and the specific values of x and y can be determined according to the actual number of key frames, and this application does not limit it.

For example, taking the interface shown in FIG. 8 as an example, assuming that during the process of identifying the objects in the key frames, it is determined that there is an interference object "human hand" in key frame 5, key frames 4 and 6 adjacent to key frame 5 can be obtained, and the first area and the second area corresponding to the interference object in key frames 4 and 6 can be determined. Assuming that the value of each pixel in the first area and the second area is 255, then based on the first area and the second area, it can be inferred that the value of each pixel in the area where the interference object is located in key frame 5 is also 255. Then, each pixel in the area where the interference object is located in key frame 5 can be updated to 255 to eliminate the interference object in key frame 5.

It is not difficult to understand that the method of eliminating the interfering object in the first key frame includes but is not limited to updating the specific values of the pixels in the area where the interfering object is located, or obtaining the image of the area corresponding to the interfering object in the adjacent key frame to cover the area corresponding to the interfering object in the first key frame where the interfering object exists, etc. This application does not make specific limitations on this.
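The pixel-update approach can be sketched as follows. The example assumes grayscale frames, a rectangular area for the interference object, and that each pixel is inferred by averaging the two adjacent key frames. The averaging rule is an assumption of this sketch; it reproduces the example above, where both neighbours hold 255, but the application does not prescribe how the inference is made.

```python
from typing import List, Tuple

# Assumed representation: a grayscale frame is a list of rows of pixel values.
Frame = List[List[int]]
# Assumed region format: (top, left, bottom, right), bottom and right exclusive.
Region = Tuple[int, int, int, int]


def fill_region_from_neighbors(prev_frame: Frame, target: Frame, next_frame: Frame, region: Region) -> Frame:
    """Return a copy of target in which the region occupied by the interference
    object is replaced by the average of the two adjacent key frames."""
    top, left, bottom, right = region
    repaired = [row[:] for row in target]
    for r in range(top, bottom):
        for c in range(left, right):
            repaired[r][c] = (prev_frame[r][c] + next_frame[r][c]) // 2
    return repaired


key_frame_4 = [[255, 255], [255, 255]]
key_frame_5 = [[255, 12], [255, 40]]   # right column is covered by the "human hand"
key_frame_6 = [[255, 255], [255, 255]]
hand_region = (0, 1, 2, 2)

repaired_5 = fill_region_from_neighbors(key_frame_4, key_frame_5, key_frame_6, hand_region)
```

Copying the region directly from one adjacent key frame, the other approach mentioned above, would amount to replacing the average with `prev_frame[r][c]` or `next_frame[r][c]`.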

Optionally, taking the interface shown in FIG6 as an example, the user selects the stop-motion shooting mode in the shooting interface, clicks the "shoot" control to start shooting the video to be processed, and during the shooting of the video to be processed, if the electronic device detects a newly added object, the electronic device can directly display a first indication information, and the first indication information is used to determine whether the newly added object is an interference object. In response to the user's confirmation of the above-mentioned first indication information, if the newly added object is an interference object, at least one video frame containing the interference object can be deleted from the already shot video to be processed. On the contrary, if the newly added object is not an interference object, the video to be processed can continue to be shot.

Optionally, if the electronic device detects a new object and the new object is an interference object, the electronic device may stop adding the captured images within the shooting range to the video to be processed, and continue shooting the video to be processed when no interference object is detected within the detection range.

Alternatively, if the electronic device detects a new object and the new object is an interference object, the electronic device stops shooting the video to be processed and displays a confirmation message such as "Do you want to continue shooting the video to be processed?", and continues shooting the video to be processed in response to the user's confirmation of the confirmation message.

It can be understood that the above-mentioned optional methods are only some examples that can be executed in the embodiments of the present application. In the actual process of shooting the video to be processed, there may be other operations or variations of various operations.

The following is an exemplary description of the collaborative execution between the electronic device and the cloud server.

FIG. 9 is a flowchart of an embodiment of a stop-motion animation generation method provided in an embodiment of the present application. Referring to FIG. 9 , the stop-motion animation generation method is applied to a cloud server, and includes the following steps 901 to 906:

901, the electronic device sends indication information of the dynamic object in the first image and the video to be processed to the cloud server.

In this embodiment, the first image may be an image taken before the video to be processed is shot, or an image in the video to be processed. The indication information may be the position information of the dynamic object in the first image. The cloud server may directly receive the indication information of the dynamic object in the first image sent by the electronic device, and determine the dynamic object in the first image according to the indication information of the dynamic object. Exemplarily, the electronic device may detect objects within the shooting range. Taking the shooting interface shown in FIG. 7 as an example, the electronic device detects the "pepper" object within the shooting range. After the user clicks on the display screen area corresponding to the pepper in the shooting interface, the focusing operation on the dynamic object in the shooting interface is triggered, and the position information corresponding to the focusing operation in the shooting interface is determined as the indication information of the dynamic object. After the cloud server receives the position information of the dynamic object in the shooting interface sent by the electronic device, the dynamic object in the shooting interface may be determined according to the position information.

In other examples, the indication information may also be identification information of a dynamic object in the first image. It can be understood that the object in the first image is detected to obtain identifications corresponding to at least one object in the first image, and in response to the first operation of the user, the identification of the dynamic object is determined from the identification corresponding to at least one object, and the electronic device sends the identification information of the dynamic object in the first image to the cloud server, and the cloud server determines the dynamic object according to the identification information after receiving the identification information. For example, the object in the first image is detected to obtain a "pepper" object, a "tomato" object, and a "plate" object in the first image, wherein "pepper", "tomato" and "plate" respectively identify the objects in the first image, and in response to the first operation of the user, the identification of the dynamic object is selected from the identification corresponding to at least one object as the "pepper" identification, that is, the "pepper" is determined as the identification information of the dynamic object, and after the cloud server receives the identification information of "pepper", it can be determined that the dynamic object is pepper.
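The two forms of indication information (position information or identification information) can be illustrated with a small data structure and a server-side lookup. Everything in this sketch is hypothetical: the `Indication` type, the bounding-box format, and the assumption that the cloud server has its own detection results for the first image are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed bounding-box format: (left, top, right, bottom).
Box = Tuple[int, int, int, int]


@dataclass
class Indication:
    """Hypothetical message sent by the electronic device: either a tapped
    position in the first image or the identifier of the chosen object."""
    position: Optional[Tuple[int, int]] = None
    identifier: Optional[str] = None


def resolve_dynamic_object(indication: Indication, detections: Dict[str, Box]) -> Optional[str]:
    """Cloud-server side: map the indication information to a detected object."""
    if indication.identifier is not None:
        return indication.identifier if indication.identifier in detections else None
    if indication.position is not None:
        x, y = indication.position
        for label, (left, top, right, bottom) in detections.items():
            if left <= x <= right and top <= y <= bottom:
                return label
    return None


# Hypothetical detections in the first image.
detections = {
    "pepper": (100, 100, 200, 200),
    "tomato": (220, 100, 300, 180),
    "plate": (50, 220, 400, 400),
}
by_position = resolve_dynamic_object(Indication(position=(150, 150)), detections)
by_identifier = resolve_dynamic_object(Indication(identifier="pepper"), detections)
```

Both forms resolve to the same "pepper" object here; sending an identifier avoids repeating the hit test on the server, while sending a position does not require the device and server to share a label vocabulary.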

902, the cloud server determines the dynamic object according to the indication information.

903, the cloud server determines multiple frames of images corresponding to each action of the dynamic object in the video to be processed.

904, the cloud server performs frame extraction processing on the multiple frames of images corresponding to each action to obtain a key frame sequence corresponding to each action.

905, the cloud server generates a stop-motion animation according to the key frame sequence corresponding to each action.

906, the cloud server sends the stop-motion animation to the electronic device.

It should be understood that after the cloud server generates a stop-motion animation based on the dynamic object and the video to be processed, the generated stop-motion animation is sent to the electronic device. After the electronic device receives the stop-motion animation, the user can view the generated stop-motion animation in the interface shown in Figure 4, or can view the generated stop-motion animation in the video storage module of the electronic device.
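Steps 903 to 905 can be sketched together as one small pipeline. To keep the example short, it assumes that each video frame has already been reduced to a single number, the mean pixel value of the area corresponding to the dynamic object, and that actions are separated by thresholding the change in that number between adjacent frames. These simplifications and the function name are assumptions of this sketch.

```python
from typing import List


def generate_stop_motion(region_means: List[float], threshold: float) -> List[int]:
    """Return the indices of the source video frames that make up the animation."""
    if not region_means:
        return []

    # Step 903: group adjacent frames into actions.
    actions = [[0]]
    for i in range(1, len(region_means)):
        if abs(region_means[i] - region_means[i - 1]) < threshold:
            actions[-1].append(i)
        else:
            actions.append([i])

    # Step 904: per action, keep the frame closest to the action's mean.
    key_frames = []
    for action in actions:
        mean = sum(region_means[i] for i in action) / len(action)
        key_frames.append(min(action, key=lambda i: abs(region_means[i] - mean)))

    # Step 905: connecting the key frames in order gives the animation.
    return key_frames


# Hypothetical per-frame values for a seven-frame video to be processed.
animation_frames = generate_stop_motion([10, 11, 12, 50, 52, 51, 90], threshold=5)
```

Every returned index refers to a frame of the video to be processed, so each image frame of the resulting animation is a video frame of that video.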

If the video to be processed is a video that has been pre-shot, a stop-motion animation can be obtained by the method shown in FIG10 . Referring to FIG10 , the stop-motion animation generation method specifically includes the following steps 1001 to 1011:

1001, the electronic device sends the video to be processed to the cloud server.

It should be understood that the video to be processed is a video that has been shot in advance. Taking the interface shown in FIG4 as an example, an add control for the video to be processed is set on the display interface of the electronic device display screen. After the user clicks the add control, it jumps to the video storage module of the electronic device. After the user selects the video to be processed and clicks the upload control, the electronic device sends the video to be processed to the cloud server.

1002. The cloud server determines a plurality of first annotated images in the video to be processed, each of which is annotated with at least one object.

Exemplarily, after receiving the video to be processed sent by the electronic device, the cloud server can obtain multiple video frames corresponding to the video to be processed, identify at least one object in each video frame, obtain a first annotated image marked with at least one object, and send the multiple first annotated images to the electronic device. Alternatively, after receiving the video to be processed sent by the electronic device, the cloud server can first divide the video to be processed into multiple shooting scenes, perform frame extraction processing on each shooting scene, obtain a scene image corresponding to each shooting scene, and then perform object recognition on each scene image to obtain a first annotated image marked with at least one object, and then send the multiple first annotated images to the electronic device.

1003. The cloud server sends a plurality of first annotated images to the electronic device.

1004. The electronic device displays a plurality of first annotated images.

After receiving the multiple first annotated images, the electronic device displays the multiple first annotated images on a display interface so that the user can determine the dynamic object from at least one object annotated in each first annotated image.

1005, in response to a first operation of the user, the electronic device determines a dynamic object from at least one object annotated in each first annotated image.

1006, the electronic device sends, to the cloud server, indication information of a dynamic object determined from at least one object annotated in each first annotated image.

After receiving the indication information of the dynamic object determined from at least one object annotated in each first annotated image sent by the electronic device, the cloud server determines the dynamic object in each first annotated image according to the indication information of the dynamic object in each first annotated image.

1007, the cloud server determines the dynamic object according to the indication information.

After receiving the indication information of the dynamic object sent by the electronic device, the cloud server can determine the dynamic object in the video to be processed according to the indication information.

1008. The cloud server determines multiple frames of images corresponding to each action of the dynamic object in the video to be processed.

1009, the cloud server performs frame extraction processing on the multiple frames of images corresponding to each action to obtain a key frame sequence corresponding to each action.

1010, the cloud server generates a stop-motion animation according to the key frame sequence corresponding to each action.

1011, the cloud server sends the stop-motion animation to the electronic device.

The above steps 1001 to 1011 can be understood with reference to the stop-motion animation generation method of steps 301 to 302 and the stop-motion animation generation method of steps 901 to 906, and will not be described in detail here.

Based on the above implementation, when the video to be processed is a video that has been shot in advance, after obtaining the video to be processed, the user can select the corresponding dynamic object from multiple first annotated images corresponding to the video to be processed, so as to facilitate the subsequent generation of a stop-motion animation corresponding to the video to be processed according to the dynamic objects of each first annotated image, thereby reducing the interference of other noises in the video to be processed on the stop-motion animation and improving the accuracy of generating the stop-motion animation.

In order to expand the shooting mode of stop-motion animation in practical applications, the video to be processed can also be a video shot in real time, so that the dynamic object can be determined before shooting the video to be processed, and then the stop-motion animation can be generated according to the determined dynamic object and the video shot in real time. FIG. 11 is a flowchart of another embodiment of a stop-motion animation generation method provided in an embodiment of the present application. Referring to FIG. 11, the stop-motion animation generation method includes the following steps 1101 to 1111:

1101. An electronic device sends a first image to a cloud server, where the first image includes at least one object.

It should be understood that the first image can be an image taken before shooting the video to be processed. Taking the interface shown in Figure 6 as an example, the user clicks the "camera" icon to display the shooting interface, and then the user selects the shooting mode as photo shooting in the shooting interface. When the user clicks the "shoot" control, the image taken before shooting the video to be processed can be obtained.

1102. The cloud server determines a second annotated image according to the first image, where at least one object is annotated in the second annotated image.

After receiving the first image sent by the electronic device, the cloud server can identify the object in the first image to obtain a second annotated image corresponding to the first image and annotated with at least one object.

1103. The cloud server sends a second annotated image corresponding to the first image to the electronic device.

1104, the electronic device displays the second annotated image.

1105, in response to a first operation of the user, the electronic device determines indication information of a dynamic object from the at least one object.

1106. The electronic device sends the indication information of the dynamic object and the video to be processed to the cloud server.

Still taking the interface shown in FIG. 6 as an example, the video to be processed is a video shot in real time, that is, the video shot after the user selects the stop-motion animation shooting mode in the shooting interface and clicks the “shoot” control.

1107, the cloud server determines the dynamic object according to the indication information.

1108. The cloud server determines multiple frames of images corresponding to each action of the dynamic object in the video to be processed.

After receiving the indication information of the dynamic object and the video to be processed sent by the electronic device, the cloud server can determine the dynamic object in the video to be processed according to the indication information of the dynamic object, and then determine multiple frames of images corresponding to each action of the dynamic object from the video to be processed.

1109, the cloud server performs frame extraction processing on the multiple frames of images corresponding to each action to obtain a key frame sequence corresponding to each action.

1110, the cloud server generates a stop-motion animation according to the key frame sequence corresponding to each action.

1111, the cloud server sends the stop-motion animation to the electronic device.

The above steps 1101 to 1111 can be understood by referring to the stop-motion animation generation method of steps 301 to 302 and the stop-motion animation generation method of steps 901 to 906, which will not be described in detail here.

Based on the above implementation, before shooting the video to be processed, the dynamic object can be determined based on the acquired first image, and then in the process of shooting the video to be processed, the shot video to be processed can be processed according to the determined dynamic object. When the video to be processed is shot, it can be quickly turned into a corresponding stop-motion animation, which not only enriches the shooting mode of the stop-motion animation, but also speeds up the processing of the video to be processed and improves the shooting efficiency of the stop-motion animation.

It should be understood that the magnitude of the step numbers in the above embodiments does not imply an order of execution. The execution order of the steps should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

Corresponding to the stop-motion animation generation method described in the above embodiment, FIG. 12 is a schematic block diagram of an electronic device 1200 provided in an embodiment of the present application. The electronic device 1200 shown in FIG. 12 includes a dynamic object determination unit 1210 and a stop-motion animation determination unit 1220.

The dynamic object determination unit 1210 is configured to determine a dynamic object in response to a first operation of a user.

The stop-motion animation determination unit 1220 is configured to determine the stop-motion animation according to the dynamic object and the video to be processed, wherein the video to be processed includes the dynamic object, and each frame image in the stop-motion animation is a video frame in the video to be processed.

Optionally, the dynamic object determination unit 1210 is further used to: obtain the video to be processed; display multiple first annotated images in the video to be processed, each of the first annotated images annotated with at least one object; and determine the dynamic object from the at least one object annotated in each of the first annotated images in response to a first operation of the user.

Optionally, displaying multiple first annotated images in the video to be processed includes: determining multiple shooting scenes in the video to be processed; performing frame extraction processing on each shooting scene to obtain a scene image corresponding to each of the shooting scenes; performing object recognition on each of the scene images to obtain the first annotated images corresponding to each scene image.
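As a non-limiting illustration, determining the shooting scenes and extracting one scene image per scene can be sketched in Python as follows. The scene boundary test (mean absolute pixel difference above a threshold) and the choice of the middle frame as the scene image are assumptions made only for this sketch; object recognition on the scene images is not shown.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image stored as rows of pixel intensities


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute per-pixel difference between two equally sized images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def split_into_scenes(frames: Sequence[Image], threshold: float = 50.0) -> List[List[int]]:
    """Group frame indices into scenes; a large frame-to-frame change starts a new scene."""
    scenes: List[List[int]] = []
    for i in range(len(frames)):
        if i == 0 or mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            scenes.append([])
        scenes[-1].append(i)
    return scenes


def scene_image_indices(frames: Sequence[Image], threshold: float = 50.0) -> List[int]:
    """Frame extraction: pick the middle frame of each scene as its scene image."""
    return [scene[len(scene) // 2] for scene in split_into_scenes(frames, threshold)]


dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
frames = [dark, dark, dark, bright, bright]
print(split_into_scenes(frames))
print(scene_image_indices(frames))
```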

Optionally, displaying the plurality of first annotated images in the video to be processed includes: in the process of shooting the video to be processed, when detecting that the shooting scene changes from a first scene to a second scene, acquiring a video frame sequence corresponding to the first scene from a shot video clip; performing frame extraction processing on the video frame sequence to obtain a scene image corresponding to the first scene; and performing object recognition on the scene image to obtain the first annotated image corresponding to the first scene and annotated with at least one object.

Optionally, displaying the multiple first annotated images in the video to be processed includes: sending the video to be processed to a cloud server; receiving the multiple first annotated images sent by the cloud server, and displaying the multiple first annotated images.

Optionally, the dynamic object determination unit 1210 is further used to: acquire a first image, the first image including at least one object; display a second annotated image based on the first image, the second annotated image annotated with at least one object; and determine the dynamic object from the at least one object in response to a first operation of a user.

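Optionally, displaying a second annotated image according to the first image includes: performing object recognition on the first image to obtain the second annotated image; and displaying the second annotated image.

Optionally, displaying a second annotated image according to the first image includes: sending the first image to a cloud server; receiving the second annotated image corresponding to the first image sent by the cloud server; and displaying the second annotated image.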

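Optionally, the first operation is a focusing operation, and determining the dynamic object in response to the first operation of the user includes: acquiring a first image, wherein the first image includes at least one object; and determining the dynamic object from the at least one object in response to the focusing operation of the user on the first image.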
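As a non-limiting illustration, one possible way to map a focusing operation on the first image to an object is to hit-test the focus point against the bounding boxes of the recognized objects. The `DetectedObject` type, the bounding box layout, and the smallest-box tie-break in the following Python sketch are assumptions and are not specified by the embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DetectedObject:
    label: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom exclusive


def box_area(obj: DetectedObject) -> int:
    left, top, right, bottom = obj.box
    return (right - left) * (bottom - top)


def object_at_focus(objects: List[DetectedObject],
                    point: Tuple[int, int]) -> Optional[DetectedObject]:
    """Return the object whose box contains the focus point, or None if there is none."""
    x, y = point
    hits = [o for o in objects
            if o.box[0] <= x < o.box[2] and o.box[1] <= y < o.box[3]]
    # Assumed tie-break: prefer the smallest box, so a small object inside a
    # larger one is the one selected.
    return min(hits, key=box_area, default=None)


objects = [
    DetectedObject("table", (0, 0, 100, 100)),
    DetectedObject("toy", (40, 40, 60, 60)),
]
selected = object_at_focus(objects, (50, 50))
print(selected.label if selected else None)
```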

Optionally, the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.

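Optionally, the stop-motion animation determination unit 1220 is further configured to: determine multiple frames of images corresponding to each action of the dynamic object in the video to be processed; perform frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions; and generate the stop-motion animation according to the key frame sequence corresponding to each of the actions.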

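Optionally, the method further comprises: if an interfering object exists in a first key frame of the key frame sequence, eliminating the interfering object in the first key frame.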

Optionally, eliminating the interfering object in the first key frame includes:

The interfering object in the first key frame is eliminated according to a region of the first key frame corresponding to the interfering object in adjacent frames of a key frame sequence.
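As a non-limiting illustration, eliminating an interfering object by using the corresponding region of an adjacent key frame can be sketched in Python as follows. The rectangular `Region` representation and the direct pixel copy are assumptions made only for this sketch; the embodiments do not limit how the region is represented or filled.

```python
from typing import List, Tuple

Image = List[List[int]]             # grayscale image stored as rows of pixel intensities
Region = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right exclusive


def eliminate_interference(key_frame: Image, adjacent: Image, region: Region) -> Image:
    """Return a copy of key_frame whose region is filled from the adjacent frame."""
    top, left, bottom, right = region
    cleaned = [row[:] for row in key_frame]  # leave the input frame untouched
    for y in range(top, bottom):
        cleaned[y][left:right] = adjacent[y][left:right]
    return cleaned


# The value 9 marks an interfering object (for example, a hand) in the first key frame.
first_key_frame = [[1, 1, 1],
                   [1, 9, 9],
                   [1, 1, 1]]
adjacent_key_frame = [[1, 1, 1],
                      [1, 1, 1],
                      [1, 1, 1]]
cleaned = eliminate_interference(first_key_frame, adjacent_key_frame, (1, 1, 2, 3))
print(cleaned)
```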

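Optionally, the performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions includes: determining the key frame sequence corresponding to each of the actions according to the pixel mean of the area corresponding to the dynamic object in the multiple frames of images corresponding to each of the actions.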

Optionally, determining the stop-motion animation according to the dynamic object and the video to be processed includes: sending indication information of the dynamic object to a cloud server; and receiving the stop-motion animation sent by the cloud server.

Fig. 13 is a schematic block diagram of a cloud server 1300 provided in an embodiment of the present application. The cloud server 1300 shown in Fig. 13 includes a receiving unit 1310, a determining unit 1320, a processing unit 1330, a generating unit 1340 and a sending unit 1350.

The receiving unit 1310 is used to receive indication information of a dynamic object and a video to be processed sent by an electronic device, and determine the dynamic object according to the indication information; the determining unit 1320 is used to determine a plurality of frames of images corresponding to each action of the dynamic object in the video to be processed; the processing unit 1330 is used to perform frame extraction processing on the plurality of frames of images corresponding to each action, respectively, to obtain a key frame sequence corresponding to each action; the generating unit 1340 is used to generate a stop-motion animation according to the key frame sequence corresponding to each action; and the sending unit 1350 is used to send the stop-motion animation to the electronic device.

Optionally, the receiving unit 1310 is further used to: receive the video to be processed sent by an electronic device; send a plurality of first annotated images determined from the video to be processed to the electronic device, each of the first annotated images being annotated with at least one object; and receive indication information of the dynamic object determined from the at least one object annotated in each of the first annotated images sent by the electronic device.

Optionally, the method for determining the plurality of first annotated images comprises: determining a plurality of shooting scenes in the video to be processed; performing frame extraction processing on each shooting scene to obtain a scene image corresponding to each shooting scene; and performing object recognition on each scene image to obtain the first annotated image corresponding to each scene image.

Optionally, the method for determining the multiple first annotated images includes: in the process of shooting the video to be processed, when it is detected that the shooting scene changes from a first scene to a second scene, obtaining a video frame sequence corresponding to the first scene from the shot video clip; performing frame extraction processing on the video frame sequence to obtain a scene image corresponding to the first scene; and performing object recognition on the scene image to obtain the first annotated image of the first scene.
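As a non-limiting illustration, producing a scene image each time the shooting scene changes during shooting can be sketched in Python as a generator over incoming frames. The `same_scene` predicate and the choice of the middle frame as the scene image are hypothetical; how a scene change is detected and how the last scene is handled when shooting stops are not shown.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

F = TypeVar("F")


def scene_images_while_shooting(frames: Iterable[F],
                                same_scene: Callable[[F, F], bool]) -> Iterator[F]:
    """Yield one scene image whenever the scene changes from one scene to the next.

    The scene that is still being shot when the input ends is not flushed.
    """
    current: List[F] = []  # video frame sequence of the scene being shot
    for frame in frames:
        if current and not same_scene(current[-1], frame):
            # Scene change detected: extract a frame from the finished scene.
            yield current[len(current) // 2]
            current = []
        current.append(frame)


# Frames are modelled as single brightness values purely for demonstration.
stream = [1, 2, 3, 50, 51, 100]
images = list(scene_images_while_shooting(stream, lambda a, b: abs(a - b) < 10))
print(images)
```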

Optionally, the receiving unit 1310 is further used to: receive a first image sent by an electronic device, the first image including at least one object; determine a second annotated image based on the first image, the second annotated image annotated with at least one object; send a second annotated image corresponding to the first image to the electronic device; and receive indication information of the dynamic object determined from the at least one object sent by the electronic device.

Optionally, determining the second annotated image according to the first image includes: performing object recognition on the first image to obtain the second annotated image.

Optionally, the receiving unit 1310 is further used to: receive indication information of the dynamic object in the first image sent by the electronic device.

Optionally, the cloud server 1300 further includes: an elimination unit, configured to eliminate an interfering object in a first key frame of the key frame sequence if the interfering object exists in the first key frame.

Optionally, the elimination unit is further configured to eliminate the interfering object in the first key frame according to a region of the first key frame that corresponds to the interfering object in adjacent frames of the key frame sequence.

Optionally, the processing unit 1330 is further used to determine the key frame sequence corresponding to each of the actions according to the pixel mean of the area corresponding to the dynamic object in the multiple frames of images corresponding to each of the actions.
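As a non-limiting illustration, one possible use of the pixel mean of the area corresponding to the dynamic object is to keep a frame as a key frame only when that mean has changed sufficiently since the last kept frame. The selection rule, the `min_change` threshold, and the rectangular region in the following Python sketch are assumptions; the embodiments state only that the key frame sequence is determined according to the pixel mean.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]             # grayscale image stored as rows of pixel intensities
Region = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right exclusive


def region_mean(image: Image, region: Region) -> float:
    """Mean pixel value inside the area corresponding to the dynamic object."""
    top, left, bottom, right = region
    pixels = [p for row in image[top:bottom] for p in row[left:right]]
    return sum(pixels) / len(pixels)


def select_key_frames(frames: Sequence[Image], regions: Sequence[Region],
                      min_change: float = 10.0) -> List[int]:
    """Return indices of frames whose region mean differs enough from the last kept one."""
    kept: List[int] = []
    last_mean: Optional[float] = None
    for i, (frame, region) in enumerate(zip(frames, regions)):
        mean = region_mean(frame, region)
        if last_mean is None or abs(mean - last_mean) >= min_change:
            kept.append(i)
            last_mean = mean
    return kept


# Five 1x2 frames of one action; the dynamic object covers the whole frame.
action_frames = [[[10, 10]], [[12, 12]], [[30, 30]], [[31, 31]], [[60, 60]]]
object_regions = [(0, 0, 1, 2)] * len(action_frames)
key_indices = select_key_frames(action_frames, object_regions)
print(key_indices)
```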

It should be understood that the description of the device embodiment can refer to the above-mentioned description of the electronic device and the stop-motion animation generation method embodiment. Its implementation principle and technical effects are similar to those of the above-mentioned method embodiment and will not be repeated here.

Based on the methods provided in the above embodiments, the embodiments of the present application also provide the following contents:

An embodiment of the present application provides a computer program product, which includes a program. When the program is executed by an electronic device, the electronic device implements the stop-motion animation generation method shown in the above embodiments.

An embodiment of the present application provides a computer-readable storage medium, which stores a computer program. When the computer program is executed by a processor, the stop-motion animation generation method shown in the above embodiments is implemented.

An embodiment of the present application provides a chip, which includes a memory and a processor. The processor executes a computer program stored in the memory to control the above-mentioned electronic device to execute the stop-motion animation generation method shown in the above-mentioned embodiments.

It should be understood that the processor mentioned in the embodiments of the present application may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or the processor may also be any conventional processor, etc.

It should also be understood that the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. Among them, the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM) and direct memory bus random access memory (Direct Rambus RAM, DR RAM).

Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division of the above-mentioned functional units and modules is illustrated by way of example. In practical applications, the above-mentioned functions can be distributed to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the scope of protection of this application. The specific working process of the units and modules in the above-mentioned system can refer to the corresponding process in the aforementioned method embodiment, which will not be repeated here.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

In the embodiments provided in the present application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the system embodiments described above are merely schematic. For example, the division of the modules or units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, which can be electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above-mentioned method embodiments of the present application can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and the computer program can implement the steps of the above-mentioned various method embodiments when executed by the processor. Among them, the computer program includes computer program code, and the computer program code can be in source code form, object code form, executable file or some intermediate form. The computer-readable medium may at least include: any entity or device that can carry the computer program code to a large-screen device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a mobile hard disk, a magnetic disk or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electrical carrier signals and telecommunication signals.

Finally, it should be noted that the above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions within the technical scope disclosed in the present application should be included in the protection scope of the present application. Therefore, the protection scope of the present application should be based on the protection scope of the claims.

Claims

A stop-motion animation generation method, characterized in that it is applied to an electronic device, the method comprising:

In response to a first operation by a user, determining a dynamic object;

A stop-motion animation is determined according to the dynamic object and a video to be processed, wherein the video to be processed includes the dynamic object, and each frame image in the stop-motion animation is a video frame in the video to be processed.
The method according to claim 1, wherein determining the dynamic object in response to the first operation of the user comprises:

Obtaining the video to be processed;

Displaying a plurality of first annotated images in the video to be processed, each of the first annotated images being annotated with at least one object;

In response to the first operation, the dynamic object is determined from the at least one object annotated in each of the first annotated images.
The method according to claim 2, characterized in that the displaying of the plurality of first annotated images in the video to be processed comprises:

Determining multiple shooting scenes in the video to be processed;

Performing frame extraction processing on each shooting scene to obtain a scene image corresponding to each shooting scene;

Object recognition is performed on each of the scene images to obtain the first annotated image corresponding to each of the scene images.
The method according to claim 2, characterized in that the displaying of the plurality of first annotated images in the video to be processed comprises:

In the process of shooting the video to be processed, when it is detected that the shooting scene changes from the first scene to the second scene, a video frame sequence corresponding to the first scene is acquired from the shot video clips;

Performing frame extraction processing on the video frame sequence to obtain a scene image corresponding to the first scene;

Object recognition is performed on the scene image to obtain the first annotated image of the first scene.
The method according to claim 2, characterized in that the displaying of the plurality of first annotated images in the video to be processed comprises:

Sending the video to be processed to a cloud server;

Receiving the plurality of first annotated images sent by the cloud server;

A plurality of the first annotated images are displayed.
The method according to claim 1, wherein determining the dynamic object in response to the first operation of the user comprises:

Acquire a first image, wherein the first image includes at least one object;

displaying a second annotated image according to the first image, wherein the second annotated image is annotated with the at least one object;

In response to the first operation, the dynamic object is determined from the at least one object.
The method according to claim 6, characterized in that displaying the second annotated image according to the first image comprises:

Performing object recognition on the first image to obtain the second annotated image;

The second annotated image is displayed.
The method according to claim 6, characterized in that displaying the second annotated image according to the first image comprises:

Sending the first image to a cloud server;

receiving the second annotated image corresponding to the first image and sent by the cloud server;

The second annotated image is displayed.
The method according to claim 1, wherein the first operation is a focusing operation, and determining the dynamic object in response to the first operation of the user comprises:

Acquire a first image, wherein the first image includes at least one object;

In response to the focusing operation of the user on the first image, the dynamic object is determined from the at least one object.
The method according to any one of claims 6 to 9, characterized in that the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.
The method according to any one of claims 1 to 10, characterized in that the step of determining the stop motion animation according to the dynamic object and the video to be processed comprises:

Determine a plurality of frames of images corresponding to each action of the dynamic object in the video to be processed;

Performing frame extraction processing on the multiple frames of images corresponding to each of the actions respectively to obtain a key frame sequence corresponding to each of the actions;

The stop-motion animation is generated according to the key frame sequence corresponding to each of the actions.
The method according to claim 11, characterized in that the method further comprises:

If an interfering object exists in a first key frame of the key frame sequence, the interfering object in the first key frame is eliminated.
The method according to claim 12, characterized in that eliminating the interfering object in the first key frame comprises:

The interfering object in the first key frame is eliminated according to a region of the first key frame corresponding to the interfering object in adjacent frames of the key frame sequence.
The method according to any one of claims 11 to 13, characterized in that the step of performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions comprises:

The key frame sequence corresponding to each action is determined according to the pixel average value of the area corresponding to the dynamic object in the multiple frames of images corresponding to each action.
The method according to claim 5, characterized in that the step of determining the stop motion animation according to the dynamic object and the video to be processed comprises:

Sending the dynamic object indication information to a cloud server;

The stop motion animation is received.
The method according to claim 8 or 9, characterized in that the step of determining the stop motion animation according to the dynamic object and the video to be processed comprises:

Sending the indication information of the dynamic object and the video to be processed to a cloud server;

The stop motion animation is received.
A stop-motion animation generation method, characterized in that it is applied to a cloud server, and the method comprises:

Receiving indication information of a dynamic object and a video to be processed sent by an electronic device, and determining the dynamic object according to the indication information;

Determine a plurality of frames of images corresponding to each action of the dynamic object in the video to be processed;

Performing frame extraction processing on the multiple frames of images corresponding to each of the actions respectively to obtain a key frame sequence corresponding to each of the actions;

generating a stop-motion animation according to the key frame sequence corresponding to each of the actions;

Sending the stop-motion animation to the electronic device.
The method according to claim 17, characterized in that the step of receiving the indication information of the dynamic object and the video to be processed sent by the electronic device comprises:

Receiving the video to be processed sent by the electronic device;

Sending a plurality of first annotated images determined from the video to be processed to the electronic device, each of the first annotated images being annotated with at least one object;

Receiving the indication information of the dynamic object sent by the electronic device, the dynamic object being determined from the at least one object annotated in each of the first annotated images.
The method according to claim 18, characterized in that the method for determining the plurality of first annotated images comprises:

Determining multiple shooting scenes in the video to be processed;

Performing frame extraction processing on each shooting scene to obtain a scene image corresponding to each shooting scene;

Object recognition is performed on each of the scene images to obtain the first annotated image corresponding to each of the scene images.
The method according to claim 18, characterized in that the method for determining the plurality of first annotated images comprises:

In the process of shooting the video to be processed, when it is detected that the shooting scene changes from the first scene to the second scene, a video frame sequence corresponding to the first scene is acquired from the shot video clips;

Performing frame extraction processing on the video frame sequence to obtain a scene image corresponding to the first scene;

Object recognition is performed on the scene image to obtain the first annotated image of the first scene.
The method according to claim 17, wherein the receiving the indication information of the dynamic object sent by the electronic device comprises:

Receiving a first image sent by the electronic device, wherein the first image includes at least one object;

determining a second annotated image according to the first image, wherein the second annotated image is annotated with the at least one object;

Sending the second annotated image corresponding to the first image to the electronic device;

Receiving the indication information of the dynamic object sent by the electronic device, the dynamic object being determined from the at least one object.
The method according to claim 21, characterized in that determining the second annotated image according to the first image comprises:

Perform object recognition on the first image to obtain the second annotated image.
The method according to claim 17, wherein the receiving the indication information corresponding to the dynamic object sent by the electronic device comprises:

Receive the indication information of the dynamic object in the first image sent by the electronic device.
The method according to any one of claims 21 to 23, characterized in that the first image is an image captured before the video to be processed is captured, or is an image in the video to be processed.
The method according to any one of claims 17 to 24, characterized in that the method further comprises:

If an interfering object exists in a first key frame of the key frame sequence, the interfering object in the first key frame is eliminated.
The method according to claim 25, characterized in that eliminating the interfering object in the first key frame comprises:

The interfering object in the first key frame is eliminated according to a region of the first key frame corresponding to the interfering object in adjacent frames of the key frame sequence.
The method according to any one of claims 17 to 26, characterized in that the step of performing frame extraction processing on the multiple frames of images corresponding to each of the actions to obtain a key frame sequence corresponding to each of the actions comprises:

The key frame sequence corresponding to each action is determined according to the pixel average value of the area corresponding to the dynamic object in the multiple frames of images corresponding to each action.
An electronic device, characterized in that it comprises: a processor, wherein the processor is used to run a computer program stored in a memory to implement the method according to any one of claims 1 to 16.
A cloud server, characterized in that it comprises: a processor, wherein the processor is used to run a computer program stored in a memory to implement the method according to any one of claims 17 to 27.
A stop-motion animation generation system, characterized in that it comprises: at least one electronic device as described in claim 28 and/or a cloud server as described in claim 29.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it implements the method according to any one of claims 1 to 27.