CN112825544A - Picture processing method and device and storage medium - Google Patents


Info

Publication number
CN112825544A
CN112825544A
Authority
CN
China
Prior art keywords
target
picture
audio
pictures
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911150201.9A
Other languages
Chinese (zh)
Inventor
梁瑀航
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201911150201.9A
Publication of CN112825544A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/62: Control of parameters via user interfaces
    • H04N 23/80: Camera processing pipelines; Components thereof

Abstract

The present disclosure relates to a picture processing method, an apparatus and a storage medium. The method may include: identifying the picture content contained in a plurality of pictures in an album; selecting, from the album, at least two target pictures associated by their picture content; selecting, from a set audio set, a first target audio matching the picture content of the at least two target pictures; and generating a target video based on the at least two target pictures and the first target audio. With this technical scheme, target pictures can be selected automatically according to the picture content, and the target video can be generated from the target pictures and the target audio associated with them. On one hand, this reduces the number of user operations and thus the user's workload; on the other hand, the target audio is matched automatically, so that its style can match the style of the generated target video, improving the user experience.

Description

Picture processing method and device and storage medium
Technical Field
The present disclosure relates to computer technologies, and in particular, to a picture processing method and apparatus, and a storage medium.
Background
At present, most electronic devices have an image capturing function. Mobile terminals, for example, have become an indispensable part of people's life, work and study, and more and more users use the camera on a mobile terminal to capture pictures and then process the captured pictures as required, for example, to automatically generate a picture movie.
Disclosure of Invention
The disclosure provides a picture processing method, a picture processing device and a storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided an image processing method, including:
identifying picture contents contained in a plurality of pictures in the album;
selecting at least two target pictures related to the picture content from the photo album;
selecting a first target audio matched with the picture contents of the at least two target pictures from a set audio set;
and generating a target video based on the at least two target pictures and the first target audio.
Optionally, the method further includes:
setting labels of a plurality of pictures according to the picture content;
the selecting at least two target pictures related to the picture content from the photo album comprises:
and selecting at least two target pictures with the same label from the photo album.
Optionally, the method further includes:
acquiring shooting time of the at least two target pictures;
arranging the at least two target pictures according to the sequence of the shooting time;
generating a target video based on the at least two target pictures and the first target audio, including:
generating a picture sequence based on at least two target pictures which are arranged according to the sequence of the shooting time;
and combining the picture sequence and the first target audio to generate the target video.
Optionally, the method further includes:
acquiring the playing time length of the first target audio;
and calculating the playing time interval between each target picture in the picture sequence based on the playing time length of the first target audio.
Optionally, the method further includes:
storing the target video;
when an audio update operation for the target video is detected, determining a second target audio from the set of set audios based on the audio update operation;
and updating the first target audio contained in the target video into the second target audio.
According to a second aspect of the embodiments of the present disclosure, there is provided a picture processing apparatus including:
the identification module is configured to identify the picture content contained in a plurality of pictures in the album;
the first selection module is configured to select at least two target pictures related to the picture content from the photo album;
the second selection module is configured to select a first target audio matched with the picture contents of the at least two target pictures from a set audio set;
a generating module configured to generate a target video based on the at least two target pictures and the first target audio.
Optionally, the apparatus further comprises:
the setting module is configured to set labels of a plurality of pictures according to the picture content;
the first selecting module is further configured to:
and selecting at least two target pictures with the same label from the photo album.
Optionally, the apparatus further comprises:
the first acquisition module is configured to acquire the shooting time of the at least two target pictures;
the sorting module is configured to arrange the at least two target pictures according to the sequence of the shooting time;
the generation module is further configured to:
generating a picture sequence based on at least two target pictures which are arranged according to the sequence of the shooting time;
and combining the picture sequence and the first target audio to generate the target video.
Optionally, the apparatus further comprises:
the second acquisition module is configured to acquire the playing time length of the first target audio;
and the calculation module is configured to calculate the playing time interval between each target picture in the picture sequence based on the playing time length of the first target audio.
Optionally, the apparatus further comprises:
a storage module configured to store the target video;
a determining module configured to determine a second target audio from the set of set audios based on an audio update operation when the audio update operation for the target video is detected;
and the updating module is configured to update the first target audio contained in the target video to the second target audio.
According to a third aspect of the embodiments of the present disclosure, there is provided a picture processing apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the steps of the picture processing method of the first aspect when executing the instructions.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of a picture processing apparatus, enable the apparatus to perform the picture processing method of the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the embodiment, the target video is generated by selecting at least two target pictures related to the picture content from the photo album, selecting the first target audio matched with the picture content of the at least two target pictures from the set audio set, and based on the at least two target pictures and the first target audio. Therefore, the target picture can be automatically selected according to the picture content, and the target video is generated according to the target picture and the target audio associated with the target picture, so that the steps of user operation are reduced on one hand, and the workload of a user is further reduced; on the other hand, the automatic matching process of the target audio is realized, the style of the target audio can be matched with the style of the generated target video, and the user experience is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a first flowchart illustrating a picture processing method according to an exemplary embodiment.
Fig. 2 is a second flowchart illustrating a picture processing method according to an exemplary embodiment.
Fig. 3 is a third flowchart illustrating a picture processing method according to an exemplary embodiment.
Fig. 4 is a block diagram of a picture processing apparatus shown in accordance with an example embodiment.
Fig. 5 is a block diagram of a picture processing apparatus shown in accordance with an exemplary embodiment.
Fig. 6 is a block diagram of a picture processing apparatus shown in accordance with an exemplary embodiment.
Fig. 7 is a block diagram of a picture processing apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
An embodiment of the present disclosure provides a picture processing method, fig. 1 is a first flowchart illustrating the picture processing method according to an exemplary embodiment, and as shown in fig. 1, the method mainly includes the following steps:
in step 101, identifying picture content contained in a plurality of pictures in an album;
at step 102, at least two target pictures related to the picture content are selected from the photo album;
in step 103, selecting a first target audio matched with the picture contents of at least two target pictures from a set audio set;
in step 104, a target video is generated based on the at least two target pictures and the first target audio.
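As a rough illustration only (not the patented implementation), the four steps above can be sketched in Python. The dict-based album, the `identify_content` stand-in for a recognition model, and the style-string audio matcher are all hypothetical simplifications:

```python
def identify_content(picture):
    # Hypothetical stand-in for a scene-recognition model: here each
    # picture is a dict that already carries a content label (step 101).
    return picture["content"]

def generate_target_video(album, audio_set):
    # Step 101: identify the picture content of every picture in the album.
    contents = {pic["name"]: identify_content(pic) for pic in album}

    # Step 102: select at least two target pictures that share content.
    by_content = {}
    for name, content in contents.items():
        by_content.setdefault(content, []).append(name)
    target_content, targets = max(by_content.items(), key=lambda kv: len(kv[1]))
    if len(targets) < 2:
        return None  # not enough related pictures to build a video

    # Step 103: pick a first target audio whose style matches the content.
    first_target_audio = next(
        (a for a in audio_set if a["style"] == target_content), None)

    # Step 104: the "target video" here is just the ordered picture
    # list associated with the selected audio.
    return {"pictures": targets, "audio": first_target_audio}

album = [
    {"name": "a.jpg", "content": "family"},
    {"name": "b.jpg", "content": "family"},
    {"name": "c.jpg", "content": "sport"},
]
audio_set = [{"title": "warm_song", "style": "family"}]
video = generate_target_video(album, audio_set)
```

Real systems would of course run a trained model in step 101 and invoke a video encoder in step 104; the sketch only shows the data flow between the four steps.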
In the embodiment of the disclosure, the picture processing method may be applied to an electronic device, which may be a mobile terminal or a fixed terminal. The mobile terminal may include a mobile phone, a tablet computer, a notebook computer, and the like; the fixed terminal may include a personal computer and the like. Image recognition is a technology that recognizes objects of various modes in an image; in implementation, pictures may be recognized based on different image recognition models. For example, an image recognition model may be trained on a set of image scenes to obtain a scene recognition model, and the scene content of a target picture may then be recognized based on the scene recognition model. Here, the image recognition model may be a neural network model. Taking the electronic device being a mobile terminal as an example, the plurality of pictures may be pictures collected by an image collection component of the mobile terminal and stored in its album, or pictures collected by an application installed on the mobile terminal. For example, a user may collect a picture through the image collection component and store the collected picture in the album. The picture content may include collection-location information of the picture, information about the use of the collection location shown in the picture, and/or identity information of a portrait contained in the picture.
After the information contained in the pictures is determined, whether the pictures are related or not can be determined, and at least two target pictures can be further determined. For example, if the acquisition location information of the first picture is the same as that of the second picture, it is determined that the first picture and the second picture are associated, and thus the first picture and the second picture can be determined as the target picture.
After the target picture is determined, a first target audio matched with the picture content of the target picture can be selected from the set audio set. Here, also taking the first picture and the second picture with the same acquisition location as an example, if the acquisition location information indicates that the location where the first picture and the second picture are acquired is a family party address, it can be determined that the audio matching the acquisition location information should match the family party, and then the audio with a warmer style can be determined from the set audio set as the first target audio. The audio included in the set of audio may be the audio acquired by an audio acquisition component included in the electronic device, or may be the audio downloaded based on the network, and is not limited specifically herein.
After the first target audio is determined, a target video may be generated based on the target picture and the first target audio. The target video may be generated by combining the target picture and the first target audio, for example, a picture sequence generated based on at least two target pictures and the first target audio are placed in a setting folder, and an association relationship between the picture sequence and the first target audio is established, so as to play the first target audio while playing the picture sequence.
In the embodiment of the disclosure, each picture in the picture sequence serves as an image frame of the target video. For example, during playback of the target video, the target pictures of the at least two target pictures may be played one after another at a set time interval, and the first target audio may be played at the same time.
In the embodiment of the disclosure, the target picture can be automatically selected according to the picture content, and the target video is generated according to the target picture and the target audio associated with the target picture, so that on one hand, the steps of user operation are reduced, and further the workload of the user is reduced; on the other hand, the automatic matching process of the target audio is realized, the style of the target audio can be matched with the style of the generated target video, and the user experience is improved.
Fig. 2 is a second flowchart illustrating a picture processing method according to an exemplary embodiment. As shown in fig. 2, the method mainly includes the following steps:
in step 201, identifying picture content contained in a plurality of pictures in an album;
in step 202, setting labels of a plurality of pictures according to the picture contents of the plurality of pictures;
in step 203, at least two target pictures with the same label are selected from the photo album;
in step 204, selecting a first target audio matched with the picture contents of at least two target pictures from a set audio set;
in step 205, a target video is generated based on the at least two target pictures and the first target audio.
Here, the labels of the plurality of pictures may be set according to the picture contents of the plurality of pictures. For example, if it is determined that the picture content of the first picture includes information related to a family party, such as address information of the family party, based on the picture content of the first picture, the tag of the first picture may be set as a family tag; if the picture content of the second picture is determined to include information related to the motion, such as fitness equipment information and the like, based on the picture content of the second picture, the tag of the second picture can be set as a motion tag; if it is determined that the picture content of the third picture includes the related information of the family party, for example, the identity information of the family members, based on the picture content of the third picture, the tag of the third picture may be set as the family tag.
After setting the tags of the respective pictures, at least two related target pictures may be selected from the album based on the tags of the respective pictures, that is, at least two pictures with the same tag are selected. For example, if the tags of the first picture and the third picture are both family tags, the first picture and the third picture may be determined as at least two target pictures. The picture content of each picture is determined by analyzing the picture content of the picture, and a label corresponding to the style of the picture is added according to the style of each picture. Therefore, the associated target picture can be determined quickly according to the label of the picture, and the picture processing efficiency can be improved.
In other alternative embodiments, after a picture is acquired, the label of the picture may be set according to the image information included in each picture. For example, after a picture is acquired based on the image acquisition component, the content of the picture contained in the picture can be analyzed, the label of the picture is further set, and then the association relationship between the picture and the label is established and stored. Therefore, when the target video needs to be generated, the label of the corresponding picture can be directly determined based on the association relation, and the label can be directly used, so that the process of generating the label can be saved, and the efficiency of generating the target video is further improved.
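The tag-based selection described above can be sketched as a simple grouping pass. This is a minimal illustration assuming each picture already carries a precomputed tag; the field names are hypothetical:

```python
from collections import defaultdict

def select_by_tag(album):
    # Group pictures by their tag, then keep only groups that contain
    # at least two pictures, i.e. at least two target pictures per tag.
    groups = defaultdict(list)
    for pic in album:
        groups[pic["tag"]].append(pic["name"])
    return {tag: names for tag, names in groups.items() if len(names) >= 2}

album = [
    {"name": "first.jpg",  "tag": "family"},
    {"name": "second.jpg", "tag": "sport"},
    {"name": "third.jpg",  "tag": "family"},
]
targets = select_by_tag(album)
```

Because the tags are stored with the pictures when they are collected, this lookup avoids re-running recognition at video-generation time, which is the efficiency gain the embodiment describes.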
In other optional embodiments, the method further comprises:
acquiring shooting time of at least two target pictures;
arranging at least two target pictures according to the sequence of shooting time;
generating a target video based on the at least two target pictures and the first target audio, comprising:
generating a picture sequence based on at least two target pictures which are arranged according to the sequence of the shooting time;
and combining the picture sequence and the first target audio to generate a target video.
Fig. 3 is a third schematic flowchart illustrating a picture processing method according to an exemplary embodiment, where as shown in fig. 3, the method mainly includes the following steps:
in step 31, identifying the picture content contained in a plurality of pictures in the album;
at step 32, at least two target pictures related to the picture content are selected from the photo album;
in step 33, acquiring the shooting time of at least two target pictures;
in step 34, arranging at least two target pictures according to the sequence of the shooting time;
in step 35, selecting a first target audio matched with the picture contents of at least two target pictures from a set audio set;
in step 36, a picture sequence is generated based on at least two target pictures arranged according to the sequence of the shooting time;
in step 37, the sequence of pictures and the first target audio are combined to generate a target video.
Here, when a target picture is collected by the image collection component, the shooting time of the target picture can be acquired and stored at the same time. Arranging the at least two target pictures in order of shooting time includes: sorting the at least two target pictures from earliest to latest by shooting time; or sorting the at least two target pictures from latest to earliest by shooting time.
In the embodiment of the disclosure, after at least two target pictures are selected from the photo album, the shooting time of the at least two target pictures can be simultaneously obtained, and the at least two selected target pictures are sequenced based on the shooting time, so that the target pictures with similar shooting time are sequenced together, the continuity of the content can be ensured, and the user can watch the target pictures conveniently.
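A minimal sketch of the ordering step, assuming the stored shooting time is an ISO-8601 string (an assumption for illustration; any comparable timestamp works):

```python
from datetime import datetime

def order_by_shooting_time(target_pictures, reverse=False):
    # Sort target pictures by their stored shooting time; reverse=True
    # gives latest-first ordering, as the embodiment allows either order.
    return sorted(target_pictures,
                  key=lambda p: datetime.fromisoformat(p["shot_at"]),
                  reverse=reverse)

pics = [
    {"name": "late.jpg",  "shot_at": "2019-11-21T10:30:00"},
    {"name": "early.jpg", "shot_at": "2019-11-21T09:00:00"},
]
sequence = order_by_shooting_time(pics)
```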
In other optional embodiments, the method further comprises:
acquiring the playing time length of a first target audio;
and calculating the playing time interval between each target picture in the picture sequence based on the playing time length of the first target audio.
Here, after the first target audio is determined, the playing time duration of the first target audio may be obtained, so that the playing time interval between each target picture in the picture sequence can be calculated according to the playing time duration of the first target audio. For example, if the playing duration of the first target audio is one minute and there are thirty target pictures in total, the playing time interval between the target pictures can be set to two seconds, so that it can be ensured that when the playing of the first target audio is completed, the target pictures in the picture sequence are also just played, and the matching degree between the first target audio and the target pictures included in the picture sequence is higher.
In other alternative embodiments, the relationship between the playing time length of the first target audio and the playing time interval between the target pictures may also be set as required.
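Following the worked example above (a one-minute audio and thirty pictures gives a two-second interval), the interval computation reduces to a division. The optional `min_interval` clamp illustrates the "set as required" relationship mentioned above and is an assumption, not part of the disclosure:

```python
def playing_interval(audio_duration_s, picture_count, min_interval=0.0):
    # Spread the pictures evenly over the audio so the picture sequence
    # finishes exactly when the first target audio finishes playing.
    if picture_count == 0:
        raise ValueError("picture sequence is empty")
    return max(audio_duration_s / picture_count, min_interval)

interval = playing_interval(60.0, 30)  # one-minute audio, thirty pictures
```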
In other optional embodiments, the method further comprises:
storing the target video;
when an audio update operation for the target video is detected, determining a second target audio from the set audio set based on the audio update operation;
and updating the first target audio contained in the target video into the second target audio.
Here, when a user wants to update the first target audio contained in the target video during its playback, an audio update operation may be input on the electronic device. After receiving the audio update operation, the electronic device may determine a second target audio from the set audio set based on the operation and update the first target audio in the target video to the second target audio, for example, replacing family-style audio with sports-style audio. In the embodiment of the disclosure, the first target audio contained in the target video can be updated based on the detected audio update operation, which improves both the flexibility of the picture processing process and the user experience.
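The audio update can be sketched as a swap keyed on the style requested by the operation. This is a hypothetical illustration; the style-string representation of the update operation is an assumption:

```python
def update_audio(target_video, audio_set, requested_style):
    # On an audio-update operation, pick a second target audio of the
    # requested style from the set audio set and swap it into the video.
    second = next((a for a in audio_set if a["style"] == requested_style), None)
    if second is not None:
        target_video["audio"] = second
    return target_video

video = {"pictures": ["a.jpg", "b.jpg"],
         "audio": {"title": "warm_song", "style": "family"}}
audio_set = [{"title": "warm_song", "style": "family"},
             {"title": "upbeat_song", "style": "sport"}]
video = update_audio(video, audio_set, "sport")  # family -> sports style
```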
In other optional embodiments, identifying the picture content included in the plurality of pictures in the album includes at least one of:
acquiring shooting location information of a plurality of pictures;
obtaining scene contents contained in a plurality of pictures based on a scene identification technology;
and obtaining the identity information of the portrait contained in the plurality of pictures based on an image recognition technology.
Here, a plurality of pictures shot in the same shooting place can be acquired, a picture sequence is generated from the plurality of pictures shot in the same shooting place according to the shooting time of the pictures, and then a target video is generated based on the picture sequence and the first target audio; or acquiring a plurality of pictures shot at a plurality of different shooting places, generating a picture sequence from the plurality of pictures shot at the plurality of different shooting places according to the distance between the shooting places, and further generating a target video based on the picture sequence and the first target audio. For example, pictures having relatively close shooting places are arranged together.
The method can also obtain the scene content contained in the pictures, generate a picture sequence from a plurality of pictures with the same scene content according to the shooting time of the pictures, and further generate the target video based on the picture sequence and the first target audio. For example, if the scene content of the collection location contained in the picture contains motion information, a plurality of pictures containing the motion information may be generated into a picture sequence according to the shooting time of the picture.
The identity information of the portrait contained in the picture can be obtained, a picture sequence is generated by a plurality of pictures with the same identity information of the portrait according to the shooting time of the pictures, and then the target video is generated based on the picture sequence and the first target audio. Therefore, on one hand, different picture contents can be identified, and the target video corresponding to the picture contents is generated based on the different picture contents, so that the diversity of the generated target video is improved; on the other hand, various pictures are classified and collected, and target audio corresponding to emotion is matched, so that a target video meeting the requirements of a user can be generated, and the use experience of the user is improved.
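The three grouping criteria above (shooting location, scene content, portrait identity) share one pattern: group by a content key, then order each group by shooting time. A minimal sketch under that assumption, with hypothetical field names:

```python
from collections import defaultdict

def group_pictures(album, key):
    # key may be "location", "scene", or "identity", matching the three
    # kinds of picture content the embodiment recognizes; each group can
    # then be turned into a picture sequence ordered by shooting time.
    groups = defaultdict(list)
    for pic in album:
        groups[pic[key]].append(pic)
    for pics in groups.values():
        pics.sort(key=lambda p: p["shot_at"])
    return dict(groups)

album = [
    {"name": "b.jpg", "location": "gym",  "shot_at": "2019-11-21T10:00:00"},
    {"name": "a.jpg", "location": "gym",  "shot_at": "2019-11-21T09:00:00"},
    {"name": "c.jpg", "location": "home", "shot_at": "2019-11-21T11:00:00"},
]
groups = group_pictures(album, "location")
```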
In other alternative embodiments, the scene content of the picture may be identified based on a scene identification technology, then the theme of the picture is determined based on the scene content, and a corresponding tag is added to the picture. In the process of generating the target video, the pictures (target pictures) with the same label can be classified into a collection, background music (first target audio) with the same type style is matched, and a picture movie (target video) is generated.
For example, based on a specific theme, the pictures in the picture library are filtered and combined according to the conditions of a timeline, people, places, tags and the like; and then generating a picture movie based on the background music matched with the corresponding emotional style by the system.
In the embodiment of the disclosure, the system can identify the scene content of a picture based on an intelligent scene recognition technology, judge the theme of the picture according to the identified scene content, classify and collect pictures with the same label, and match them with background music of the same style to generate a picture movie. By intelligently classifying and collecting the pictures and matching background music of a similar emotion, the system generates a picture movie of commemorative significance, letting the user perceive warm memories without performing complicated operations, which improves the user experience.
Fig. 4 is a block diagram of a first image processing apparatus according to an exemplary embodiment, and as shown in fig. 4, the first image processing apparatus 400 mainly includes:
the identification module 401 is configured to identify picture contents included in a plurality of pictures in the album;
a first selecting module 402 configured to select at least two target pictures associated with the picture content from the album;
a second selecting module 403, configured to select a first target audio matched with the picture contents of the at least two target pictures from the set audio set;
a generating module 404 configured to generate a target video based on the at least two target pictures and the first target audio.
Fig. 5 is a block diagram of a second image processing apparatus according to an exemplary embodiment, and as shown in fig. 5, the image processing apparatus 500 mainly includes:
the identification module 401 is configured to identify picture contents included in a plurality of pictures in the album;
a setting module 501 configured to set labels of a plurality of pictures according to picture contents;
a first selecting module 402 configured to select at least two target pictures with the same label from the album;
a second selecting module 403, configured to select a first target audio matched with the picture contents of the at least two target pictures from the set audio set;
a generating module 404 configured to generate a target video based on the at least two target pictures and the first target audio.
Fig. 6 is a block diagram of a third image processing apparatus according to an exemplary embodiment, and as shown in fig. 6, the image processing apparatus 600 mainly includes:
the identification module 401 is configured to identify picture contents included in a plurality of pictures in the album;
a first selecting module 402 configured to select at least two target pictures associated with the picture content from the album;
a first obtaining module 601 configured to obtain shooting times of at least two target pictures;
the sorting module 602 is configured to arrange at least two target pictures according to the order of shooting time;
a second selecting module 403, configured to select a first target audio matched with the picture contents of the at least two target pictures from the set audio set;
a generating module 404 configured to generate a picture sequence based on at least two target pictures arranged according to the sequence of the shooting time; and combining the picture sequence and the first target audio to generate a target video.
In other optional embodiments, the apparatus further comprises:
the second acquisition module is configured to acquire the playing time length of the first target audio;
and the calculation module is configured to calculate the playing time interval between each target picture in the picture sequence based on the playing time length of the first target audio.
In other optional embodiments, the apparatus further comprises:
a storage module configured to store the target video;
a determining module configured to determine, when an audio update operation for the target video is detected, a second target audio from the set audio set based on that operation;
an updating module configured to update the first target audio contained in the target video to the second target audio.
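The determine-and-update flow can be sketched as swapping the audio reference on a stored video record. The dictionary representation, the `title` field, and the `requested_title` parameter are illustrative assumptions, not the patent's data model:

```python
def update_audio(target_video, audio_set, requested_title):
    """On an audio-update operation, determine the second target audio
    from the set audio set and swap it into the stored target video."""
    second = next((a for a in audio_set if a["title"] == requested_title), None)
    if second is None:
        return target_video  # requested audio not in the set; keep the current track
    updated = dict(target_video)
    updated["audio"] = second["title"]
    return updated

audio_set = [{"title": "calm-piano"}, {"title": "upbeat-pop"}]
video = {"frames": ["a.jpg", "b.jpg"], "audio": "calm-piano"}
video = update_audio(video, audio_set, "upbeat-pop")
# video["audio"]: "upbeat-pop"
```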
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 7 is a block diagram of a picture processing apparatus 300 according to an example embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 7, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls overall operation of the device 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 302 can also include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the apparatus 300. Examples of such data include instructions for any application or method operating on the device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks.
Power components 306 provide power to the various components of device 300. The power assembly 306 may include: a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 300.
The multimedia component 308 includes a screen that provides an output interface between the device 300 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 300 is in an operating mode, such as a shooting mode or a video mode. Each front camera and/or rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, audio component 310 includes a Microphone (MIC) configured to receive external audio signals when apparatus 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 314 includes one or more sensors for providing various aspects of status assessment for the device 300. For example, the sensor assembly 314 may detect the open/closed status of the device 300 and the relative positioning of components, such as the display and keypad of the device 300. The sensor assembly 314 may also detect a change in the position of the device 300 or of a component of the device 300, the presence or absence of user contact with the device 300, the orientation or acceleration/deceleration of the device 300, and a change in the temperature of the device 300. The sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The device 300 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 304 comprising instructions, executable by the processor 320 of the apparatus 300 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Accordingly, the present disclosure also provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a picture processing apparatus, the apparatus is enabled to perform the picture processing method of the above embodiments, the method comprising:
identifying picture contents contained in a plurality of pictures in the album;
selecting at least two target pictures related to the picture content from the photo album;
selecting a first target audio matched with the picture contents of the at least two target pictures from a set audio set;
and generating a target video based on the at least two target pictures and the first target audio.
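The four steps above can be tied together in one end-to-end sketch. Everything concrete here is an assumption for illustration: labels stand in for the content-identification result, and the audio set pairs each track with a mood tag that is matched against the picture label.

```python
def make_target_video(album, audio_set):
    """Identify content (labels assumed precomputed), select at least two
    pictures sharing a label, pick a matching audio, combine into a video record."""
    groups = {}
    for name, label in album:
        groups.setdefault(label, []).append(name)
    # Select the first content label shared by at least two target pictures.
    label, targets = next(
        ((l, ps) for l, ps in groups.items() if len(ps) >= 2), (None, [])
    )
    if not targets:
        return None  # fewer than two related pictures: no video to generate
    # Select the first target audio whose mood tag matches the picture content.
    audio = next((a for a, mood in audio_set if mood == label), None)
    return {"frames": targets, "audio": audio}

album = [("p1.jpg", "travel"), ("p2.jpg", "food"), ("p3.jpg", "travel")]
audio_set = [("ballad.mp3", "food"), ("road-trip.mp3", "travel")]
video = make_target_video(album, audio_set)
# video: {"frames": ["p1.jpg", "p3.jpg"], "audio": "road-trip.mp3"}
```

A real implementation would render the frames and mux the audio into a video container; the dictionary result merely shows which inputs each step consumes.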
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A picture processing method, comprising:
identifying picture contents contained in a plurality of pictures in the album;
selecting at least two target pictures related to the picture content from the photo album;
selecting a first target audio matched with the picture contents of the at least two target pictures from a set audio set;
and generating a target video based on the at least two target pictures and the first target audio.
2. The method of claim 1, further comprising:
setting labels of a plurality of pictures according to the picture content;
the selecting at least two target pictures related to the picture content from the photo album comprises:
and selecting at least two target pictures with the same label from the photo album.
3. The method of claim 1, further comprising:
acquiring shooting time of the at least two target pictures;
arranging the at least two target pictures according to the sequence of the shooting time;
generating a target video based on the at least two target pictures and the first target audio, including:
generating a picture sequence based on at least two target pictures which are arranged according to the sequence of the shooting time;
and combining the picture sequence and the first target audio to generate the target video.
4. The method of claim 3, further comprising:
acquiring the playing time length of the first target audio;
and calculating the playing time interval between each target picture in the picture sequence based on the playing time length of the first target audio.
5. The method of claim 1, further comprising:
storing the target video;
when an audio update operation for the target video is detected, determining a second target audio from the set of set audios based on the audio update operation;
and updating the first target audio contained in the target video into the second target audio.
6. A picture processing apparatus, comprising:
the identification module is configured to identify the picture content contained in a plurality of pictures in the album;
the first selection module is configured to select at least two target pictures related to the picture content from the photo album;
the second selection module is configured to select a first target audio matched with the picture contents of the at least two target pictures from a set audio set;
a generating module configured to generate a target video based on the at least two target pictures and the first target audio.
7. The apparatus of claim 6, further comprising:
the setting module is configured to set labels of a plurality of pictures according to the picture content;
the first selecting module is further configured to:
and selecting at least two target pictures with the same label from the photo album.
8. The apparatus of claim 6, further comprising:
the first acquisition module is configured to acquire the shooting time of the at least two target pictures;
the sorting module is configured to arrange the at least two target pictures according to the sequence of the shooting time;
the generation module is further configured to:
generating a picture sequence based on at least two target pictures which are arranged according to the sequence of the shooting time;
and combining the picture sequence and the first target audio to generate the target video.
9. The apparatus of claim 8, further comprising:
the second acquisition module is configured to acquire the playing time length of the first target audio;
and the calculation module is configured to calculate the playing time interval between each target picture in the picture sequence based on the playing time length of the first target audio.
10. The apparatus of claim 6, further comprising:
a storage module configured to store the target video;
a determining module configured to determine a second target audio from the set of set audios based on an audio update operation when the audio update operation for the target video is detected;
and the updating module is configured to update the first target audio contained in the target video to the second target audio.
11. A picture processing apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: when the instructions are executed, perform the steps of the picture processing method of any one of claims 1 to 5.
12. A non-transitory computer readable storage medium, wherein instructions, when executed by a processor of a picture processing apparatus, enable the apparatus to perform the picture processing method of any of claims 1 to 5.
CN201911150201.9A 2019-11-21 2019-11-21 Picture processing method and device and storage medium Pending CN112825544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911150201.9A CN112825544A (en) 2019-11-21 2019-11-21 Picture processing method and device and storage medium

Publications (1)

Publication Number Publication Date
CN112825544A true CN112825544A (en) 2021-05-21


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113655930A (en) * 2021-08-30 2021-11-16 北京字跳网络技术有限公司 Information publishing method, information display method and device, electronic equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120237090A1 (en) * 2009-12-04 2012-09-20 Sony Computer Entertainment Inc. Music recommendation system, information processing device, and information processing method
CN104268150A (en) * 2014-08-28 2015-01-07 小米科技有限责任公司 Method and device for playing music based on image content
US20150149908A1 (en) * 2013-11-22 2015-05-28 Samsung Electronics Co., Ltd. Slide show-providing system and method
CN105142017A (en) * 2015-08-12 2015-12-09 北京金山安全软件有限公司 Picture switching method and picture switching device during picture video playing
CN107025295A (en) * 2017-04-14 2017-08-08 维沃移动通信有限公司 A kind of photo film making method and mobile terminal
CN109151338A (en) * 2018-07-10 2019-01-04 Oppo广东移动通信有限公司 Image processing method and related product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Apple Inc.: "Viewing Memories in Photos on iPhone", iPhone User Guide *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210521