CN107122393B - electronic album generating method and device - Google Patents

Electronic album generating method and device

Info

Publication number
CN107122393B
Authority
CN
China
Prior art keywords
photo
descriptive information
adjective
sample
electronic album
Prior art date
Legal status
Active
Application number
CN201710138877.0A
Other languages
Chinese (zh)
Other versions
CN107122393A (en)
Inventor
沙安澜
Current Assignee
Beijing Small Mutual Entertainment Technology Co Ltd
Original Assignee
Beijing Small Mutual Entertainment Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Small Mutual Entertainment Technology Co Ltd filed Critical Beijing Small Mutual Entertainment Technology Co Ltd
Priority to CN201710138877.0A priority Critical patent/CN107122393B/en
Publication of CN107122393A publication Critical patent/CN107122393A/en
Application granted granted Critical
Publication of CN107122393B publication Critical patent/CN107122393B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an electronic photo album generating method and device. The method comprises the following steps: analyzing the content of a photo to generate descriptive information of the photo; and performing video synthesis according to the photo and the descriptive information to generate a target electronic photo album. In this way, the electronic album is produced intelligently, without requiring the user to have image processing knowledge: manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.

Description

Electronic album generating method and device
Technical Field
The invention relates to the technical field of image processing, in particular to an electronic album generating method and device.
Background
With the development of computer technology and multimedia technology, the multimedia resources that people come into contact with are increasingly abundant. As people's interests expand, many users shoot videos directly with a video camera or digital camera and watch video programs with player software on a computer, which has become a very common way of learning, leisure and entertainment.
Meanwhile, users often notice some wonderful pictures in their footage and hope to preserve them by assembling them into an exquisite electronic photo album. An electronic photo album is a video produced from a given set of photos, usually accompanied by background music and descriptive text. In the related art, most electronic photo albums are produced by the user matching descriptive text and background music to the photos in video editing software.
However, this approach has problems: because the electronic photo album is completed in video editing software, its production is purely manual, and only a user with certain professional knowledge can match descriptive text to the photos. This greatly increases the manual production cost and is not intelligent.
Disclosure of Invention
The object of the present invention is to solve, at least to some extent, one of the above-mentioned technical problems.
To this end, a first object of the present invention is to provide an electronic album generating method. The method saves manual production cost while enriching the album content, making the album more interesting and imaginative, and improving the user experience.
A second object of the present invention is to provide an electronic album creating apparatus.
In order to achieve the above object, an electronic album generating method according to an embodiment of the first aspect of the present invention includes: analyzing the content of a photo to generate descriptive information of the photo; and performing video synthesis according to the photo and the descriptive information to generate a target electronic photo album.
According to the electronic album generating method of the embodiment of the present invention, the descriptive information of a photo is generated by analyzing the content of the photo, and the target electronic album is generated by performing video synthesis according to the photo and the descriptive information. Because the image content of the photo is analyzed and suitable descriptive information is automatically assigned to the photo based on the analysis result, the electronic album is produced intelligently without requiring the user to have image processing knowledge; manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.
In order to achieve the above object, an electronic album generating apparatus according to an embodiment of the second aspect of the present invention includes: a descriptive information generating module, configured to analyze the content of a photo and generate descriptive information of the photo; and an electronic photo album generating module, configured to perform video synthesis according to the photo and the descriptive information to generate a target electronic photo album.
According to the electronic album generating apparatus of the embodiment of the present invention, the descriptive information generating module analyzes the content of a photo to generate its descriptive information, and the electronic album generating module performs video synthesis according to the photo and the descriptive information to generate the target electronic album. Because the image content of the photo is analyzed and suitable descriptive information is automatically assigned to the photo based on the analysis result, the electronic album is produced intelligently without requiring the user to have image processing knowledge; manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of an electronic album creating method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an electronic album creating method according to an embodiment of the present invention;
FIGS. 3(a), (b), (c) and (d) are schematic diagrams of the classification results of object classes in photographs according to embodiments of the invention;
FIG. 3(e) is a schematic diagram of a vector representation of the word "gift" in accordance with an embodiment of the invention;
FIG. 4 is an exemplary diagram of vgg16 model structures, according to an embodiment of the invention;
FIG. 5 is a flow diagram of generating an alternative set of descriptive information according to one embodiment of the invention;
FIG. 6 is a flow diagram of generating descriptive information according to one embodiment of the invention;
Fig. 7 is a schematic structural view of an electronic album creating apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural view of an electronic album creating apparatus according to an embodiment of the present invention;
FIG. 9 is a block diagram of a generate sub-module according to one embodiment of the invention;
Fig. 10 is a schematic structural view of an electronic album creating apparatus according to another embodiment of the present invention;
Fig. 11 is a schematic structural view of an electronic album creating apparatus according to still another embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
An electronic album creating method and apparatus according to embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an electronic album creating method according to an embodiment of the present invention. It should be noted that the electronic album creating method according to the embodiment of the present invention is applicable to the electronic album creating apparatus according to the embodiment of the present invention. The electronic album creating apparatus may be configured in a terminal device, and the terminal device may be a mobile terminal (e.g., a hardware device such as a mobile phone, a tablet computer, a personal digital assistant, etc.), a PC, etc.
As shown in fig. 1, the electronic album creating method may include:
S110, analyzing the content of the photo to generate descriptive information of the photo.
It will be appreciated that, to facilitate content analysis of the photo, the photo may be preprocessed before its content is analyzed. For example, the original photo may be too large or have an inconsistent aspect ratio, so a size conversion is performed to scale the photo to a width and height suitable for the target electronic album while keeping the aspect ratio unchanged.
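As an illustrative sketch only (the patent does not specify an implementation), the size conversion described above could be done with the Pillow library, assuming a hypothetical target album frame of 1280x720 pixels:

    from PIL import Image

    def preprocess_photo(path, target_w=1280, target_h=720):
        # Scale the photo to fit inside the target album frame
        # while keeping its aspect ratio unchanged.
        img = Image.open(path).convert("RGB")
        scale = min(target_w / img.width, target_h / img.height)
        new_size = (int(img.width * scale), int(img.height * scale))
        return img.resize(new_size, Image.LANCZOS)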
After the photo is preprocessed, content analysis may be performed on the preprocessed photo, and descriptive information of the photo may be generated based on the analysis result. Specifically, a CNN (Convolutional Neural Network) method may be adopted: a pre-trained model is loaded to analyze the objects in the photo and obtain the object class in the photo, and the descriptive information of the photo is then selected from a pre-generated alternative descriptive information set according to the object class. Specific implementations are described in the subsequent embodiments.
It should be noted that the language of the descriptive information is not limited to Chinese; it may also be English, German, Russian, etc. The format of the descriptive information may be text and/or speech. That is, the descriptive information attached to the target electronic album may be Chinese text, text in another language, Chinese speech, and so on; the specific presentation mode may be determined according to actual requirements and is not limited herein.
S120, performing video synthesis according to the photos and the descriptive information to generate a target electronic photo album.
Specifically, video synthesis of the electronic album can be performed through a video coding technology according to the photos and the descriptive information, and finally the target electronic album is obtained. Specific implementation processes can be referred to in the description of the subsequent embodiments.
According to the electronic album generating method of the embodiment of the present invention, the descriptive information of a photo is generated by analyzing the content of the photo, and the target electronic album is generated by performing video synthesis according to the photo and the descriptive information. Because the image content of the photo is analyzed and suitable descriptive information is automatically assigned to the photo based on the analysis result, the electronic album is produced intelligently without requiring the user to have image processing knowledge; manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.
In order to make the implementation of the present invention clearer to those skilled in the art, the electronic album generating method according to the present invention will be further described below with reference to Fig. 2.
Fig. 2 is a flowchart of an electronic album creating method according to an embodiment of the present invention. As shown in Fig. 2, the electronic album creating method may include:
S210, identifying the photo according to a preset classification model to obtain the object class in the photo.
It will be appreciated that, to facilitate content analysis of the photo, the photo may be preprocessed before its content is analyzed. For example, the original photo may be too large or have an inconsistent aspect ratio, so a size conversion is performed to scale the photo to a width and height suitable for the target electronic album while keeping the aspect ratio unchanged.
For example, taking descriptive information in the form of Chinese text as an example, after the photo is preprocessed, a pre-trained classification model may be loaded using the CNN method to classify the objects in the photo. The classification result is the probability that the object in the photo belongs to each category. A threshold min_prob may be preset (for example, 0.5); the maximum of the category probabilities is taken, and if it is greater than min_prob, the Chinese label corresponding to that category is used; otherwise label extraction fails, that is, the object category cannot be obtained. If the object category corresponds to several Chinese labels, the label with the highest word frequency is selected.
For example, the objects in the photo may be classified using a pre-trained vgg16 model, the classification result being the probability that the object in the photo belongs to each of 1000 classes. As shown in Fig. 3(a), prob is the classification probability, predict_index is the vgg16 class index, and cn_label is the Chinese label corresponding to the original 1000 vgg16 classes. The maximum probability value in Fig. 3(a) is 0.628759, which is greater than the threshold of 0.5, so the main object in the photo can be determined to be a hamster.
It should be noted that CNN is used as the classification algorithm in the present invention because current CNN classification accuracy has reached a sufficient level. As shown in Figs. 3(b)-3(d), the classification results are obtained by classifying the objects of each photo with the vgg16 model. The highest probability value in Fig. 3(b) is 0.374364, the maximum among the category probabilities, so the main object in that photo can be determined to be a doll. The maximum probability value in Fig. 3(c) is 0.403069, and the corresponding category maps to several Chinese labels (different Chinese terms for a billiard table); since one of these labels has a higher word frequency than the others, the main object in that photo is taken to be a billiard table. The maximum probability value in Fig. 3(d) is 0.756128, the maximum among the category probabilities, so the main object in that photo can be determined to be an upright piano.
As an example, Fig. 4 shows the vgg16 model structure, a 16-layer convolutional neural network. It should be noted that other CNN classification models may also be used in this step; all that is required is a multi-class probability result for the photo, which is not specifically limited herein.
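As a minimal sketch of this classification step (an assumption for illustration, not the patent's own code), a pre-trained VGG16 model from Keras can be loaded, the class probabilities thresholded against min_prob, and the Chinese label looked up in an assumed index-to-label mapping cn_label:

    import numpy as np
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.preprocessing import image

    model = VGG16(weights="imagenet")  # 1000-class classifier, as in the embodiment

    def classify_photo(path, cn_label, min_prob=0.5):
        # Returns the Chinese label of the dominant object, or None if extraction fails.
        img = image.load_img(path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        probs = model.predict(x)[0]            # probabilities over the 1000 classes
        predict_index = int(np.argmax(probs))
        if probs[predict_index] < min_prob:
            return None                        # label extraction fails
        return cn_label.get(predict_index)     # assumed {class index: Chinese label} dict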
S220, selecting descriptive information of the photo from a pre-generated alternative descriptive information set according to the object type.
It should be noted that, in the embodiment of the present invention, the alternative descriptive information set may be generated in advance. As an example, as shown in FIG. 5, the alternative set of descriptive information may be generated in advance by:
S510, obtaining a descriptive word list from a high-frequency word list, wherein the descriptive word list includes a plurality of adjective samples and a plurality of adverb samples.
For example, taking descriptive information in the form of Chinese text as an example, a descriptive word list may be extracted from a Chinese high-frequency word list collected from the Internet, and the list may include a plurality of adjective samples and a plurality of adverb samples. For instance, the adjective samples taken from the Chinese high-frequency word list may include common adjectives such as "big, good, new, old, long, fast, natural, free, red, simple, far, rich, obvious, objective, happy, beautiful, clear, advanced, balanced, huge, necessary, stable, wide, normal, serious, famous", and so on; the adverb samples taken from the list may be manually filtered to keep those describing a positive degree, such as "very, particularly, quite, too, indeed, extremely, unusually, enough", and so on.
S520, obtaining vector representations of each adjective sample and each adverb sample, and calculating a second similarity between each adjective sample and each adverb sample.
Specifically, continuing with the example of Chinese descriptive text, a Chinese corpus can be collected from the Internet, and vector representations of the words in the corpus can be obtained through Chinese word segmentation and neural network training. For example, the words in the corpus may be processed with word2vec to generate a K-dimensional real-valued vector for each word; the vector representation computed by word2vec for the word "gift" is shown in Fig. 3(e). From this corpus word vector set, the vector representation of each adjective sample and each adverb sample can be obtained, and a second similarity (e.g., cosine similarity) between each adjective sample and each adverb sample can be calculated.
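A minimal sketch of this step with the gensim implementation of word2vec (the library choice and training parameters are assumptions; "sentences" stands for an assumed iterable of word-segmented Chinese sentences):

    from gensim.models import Word2Vec

    # Train K-dimensional word vectors on the segmented Chinese corpus (here K = 100).
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

    vec_gift = model.wv["礼物"]                        # vector representation of the word "gift"
    second_sim = model.wv.similarity("美丽", "非常")   # cosine similarity of an adjective and an adverb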
S530, for each adjective sample, obtaining the N adverb samples with the highest second similarity to that adjective sample, where N is a positive integer.
That is, for each adjective sample, the N adverb samples most similar to it are obtained. For example, if N is 3, the 3 adverb samples with the highest similarity to each adjective sample are obtained.
S540, combining each adjective sample with its corresponding N adverb samples to generate the alternative descriptive information set.
That is, each adjective sample is combined with its corresponding N adverb samples to form a group of collocations (adverb plus adjective pairs, such as "very good" or "quite beautiful"), and finally all collocations are put together to obtain the alternative descriptive information set, as sketched below. The collocations obtained in this way are text descriptions that conform to the language habits of the corpus and serve as candidate photo descriptions.
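Steps S530 and S540 can then be sketched as follows (a non-authoritative illustration; wv is the trained word-vector set from the previous sketch, and the function name is hypothetical):

    def build_candidate_set(adjectives, adverbs, wv, n=3):
        # For each adjective sample, keep the N adverb samples with the
        # highest second similarity, forming the alternative descriptive set.
        candidates = {}
        for adj in adjectives:
            scored = [(adv, wv.similarity(adj, adv))
                      for adv in adverbs if adj in wv and adv in wv]
            scored.sort(key=lambda t: t[1], reverse=True)
            candidates[adj] = scored[:n]      # [(adverb, second_similarity), ...]
        return candidates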
It should be added that the composition of the corpus may determine the style of generating the descriptive information. For example, daily descriptions may be generated from encyclopedia, new word phrases having a network style may be generated from social networks, and so on.
It should be noted that, in an embodiment of the present invention, the new text may be continuously obtained from the internet to update the corpus. Therefore, the freshness of words in the corpus is ensured by continuously updating the corpus, so that the accuracy and the flexibility of generated descriptive information can be improved.
Thus, the alternative descriptive information set can be generated through the above steps S510-S540. In this step, the descriptive information of the photo can then be selected from the pre-generated alternative descriptive information set according to the object category using natural language processing (NLP) techniques; that is, descriptive information is generated for the photo. As an example, as shown in Fig. 6, the specific implementation of selecting the descriptive information of the photo from the pre-generated alternative descriptive information set according to the object category may include the following steps:
S610, obtaining the text word vector of the object type.
Specifically, the text of the object category may be processed with word2vec to generate the word vector of the object category text.
S620, a high-frequency adjective list corresponding to the object category is obtained, wherein the high-frequency adjective list comprises a plurality of high-frequency adjectives.
As an example, a high-frequency adjective list corresponding to the object category may be obtained from an adjective sample in a descriptive word list obtained in advance.
S630, calculating a first similarity between the text word vector of the object category and the word vector of each high-frequency adjective.
S640, acquiring the target adjectives whose first similarity is greater than a preset threshold.
It is understood that a preset threshold min_similar may be predefined; for example, it may take a value of 0.3.
S650, selecting the descriptive information from the alternative descriptive information set according to the target adjectives.
Specifically, the collocation of the adjective sample and adverb sample corresponding to a target adjective may be found in the alternative descriptive information set, and the descriptive information of the photo for that target adjective may then be determined according to the second similarity of that adjective-adverb collocation.
Optionally, in an embodiment of the present invention, before selecting descriptive information from the alternative descriptive information set, the M adjectives with the highest first similarity among the target adjectives whose first similarity is greater than the preset threshold may first be selected as the target adjectives, where M is a positive integer, for example 3. Descriptive information is then selected from the alternative descriptive information set based on these M target adjectives, as in the sketch below.
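Steps S610-S650 can be sketched as follows (again an assumed, illustrative implementation; "candidates" is the alternative descriptive information set built above and wv the word-vector set):

    def describe_photo(label, high_freq_adjs, candidates, wv, min_similar=0.3, m=3):
        # First similarity: object-category label text vs. each high-frequency adjective.
        scored = [(adj, wv.similarity(label, adj))
                  for adj in high_freq_adjs if label in wv and adj in wv]
        # Keep the target adjectives above the threshold, top M by similarity.
        targets = sorted((t for t in scored if t[1] > min_similar),
                         key=lambda t: t[1], reverse=True)[:m]
        descriptions = []
        for adj, _ in targets:
            adverb, _ = max(candidates[adj], key=lambda t: t[1])  # best collocation by second similarity
            descriptions.append(adverb + adj)                     # adverb + adjective text
        return descriptions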
For example, taking the upright piano shown in Fig. 3(d) as an example, in Table 1, Label->Adj is the similarity between the photo label text and the top-3 target adjectives, Adj->Adv is the word collocation from the alternative descriptive information set, and Description is the selected photo descriptive information. Thus, by obtaining descriptive information for a photo in this way, the text description remains good even when some recognition results are not absolutely accurate.
S230, converting the photo and the descriptive information of the photo into a video clip.
For example, taking 3 pieces of descriptive information per photo as an example, three positions are randomly chosen from the upper-left 1/3, upper-right 1/3, lower-left 1/3 and lower-right 1/3 of the frame, based on the predefined target size of the electronic album, as the text positions of the descriptive information. The photo and its descriptive information can then be converted into a video clip as follows: first, with the original photo as background, 20 frames numbered 1-20 are copied, and the first piece of descriptive information is drawn at its text position using a predefined font whose size changes from small to large and back to small across the frames; then, with frame 20 as background, 20 frames numbered 21-40 are copied and the second piece of descriptive information is drawn in the same way; the third piece of descriptive information is then drawn with frame 40 as background; finally, the 60 frames are combined and encoded into a video clip played at 10-20 frames per second.
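A rough sketch of this clip rendering with OpenCV follows; it is only illustrative (the patent does not name a library), the four positions approximate the quadrant choices described above, and cv2.putText cannot render Chinese glyphs, so a real implementation would rasterize the text with a CJK font (for example via Pillow) before compositing:

    import random
    import numpy as np
    import cv2

    def photo_to_clip(photo_bgr, descriptions, out_path, fps=15):
        # Render 20 frames per description over the photo and encode them as one clip.
        h, w = photo_bgr.shape[:2]
        positions = [(w // 6, h // 6), (2 * w // 3, h // 6),
                     (w // 6, 5 * h // 6), (2 * w // 3, 5 * h // 6)]
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        background = photo_bgr.copy()
        for text in descriptions:
            pos = random.choice(positions)
            frame = background
            for i in range(20):
                frame = background.copy()
                scale = 0.5 + 1.0 * np.sin(np.pi * i / 19)  # text grows then shrinks
                cv2.putText(frame, text, pos, cv2.FONT_HERSHEY_SIMPLEX,
                            scale, (255, 255, 255), 2)
                writer.write(frame)
            background = frame    # the next description is drawn on top of the last frame
        writer.release()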
S240, linking the video clips to generate the target electronic album.
Specifically, all video clips are linked; that is, the video clips generated from the individual photos are joined together into a sequential photo video, the linked video is matched with background music and given a title, and the target electronic photo album is thereby produced.
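A minimal sketch of this linking step using the moviepy 1.x editor API (an assumed choice; the patent only specifies video coding technology, and clip_paths, the music file and the output name are placeholders). A title could be added in the same way by prepending a short title clip before concatenation:

    from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips

    clips = [VideoFileClip(p) for p in clip_paths]      # clips produced from each photo
    album = concatenate_videoclips(clips)               # link clips into one sequence video
    music = AudioFileClip("background.mp3").subclip(0, album.duration)
    album = album.set_audio(music)                      # match background music to the linked video
    album.write_videofile("album.mp4")                  # encode the target electronic album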
The electronic album generating method of the embodiment of the present invention combines several technologies: the photo is preprocessed using image processing technology, descriptive text is generated for the photo using CNN and NLP technology, and the electronic album video is synthesized using video coding technology. In this way, the electronic album is produced intelligently without requiring the user to have image processing knowledge; manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.
Corresponding to the electronic album creating methods provided in the foregoing embodiments, an embodiment of the present invention further provides an electronic album creating apparatus, and since the electronic album creating apparatus provided in the embodiment of the present invention corresponds to the electronic album creating methods provided in the foregoing embodiments, the implementation of the foregoing electronic album creating method is also applicable to the electronic album creating apparatus provided in this embodiment, and will not be described in detail in this embodiment. Fig. 7 is a schematic structural view of an electronic album creating apparatus according to an embodiment of the present invention. As shown in fig. 7, the electronic album creating apparatus may include: a descriptive information generating module 700 and an electronic album generating module 800.
Specifically, the descriptive information generating module 700 is configured to perform content analysis on the photo and generate descriptive information of the photo.
It will be appreciated that, to facilitate content analysis of the photo, the photo may be preprocessed before its content is analyzed. For example, the original photo may be too large or have an inconsistent aspect ratio, so a size conversion is performed to scale the photo to a width and height suitable for the target electronic album while keeping the aspect ratio unchanged.
After the photo is preprocessed, the descriptive information generating module 700 may perform content analysis on the preprocessed photo and generate descriptive information of the photo based on the analysis result. Specifically, a CNN method may be adopted: a pre-trained model is loaded to analyze the objects in the photo and obtain the object class in the photo, and the descriptive information of the photo is then selected from a pre-generated alternative descriptive information set according to the object class. Specific implementations are described in the subsequent embodiments.
It should be noted that the language of the descriptive information is not limited to Chinese; it may also be English, German, Russian, etc. The format of the descriptive information may be text and/or speech. That is, the descriptive information attached to the target electronic album may be Chinese text, text in another language, Chinese speech, and so on; the specific presentation mode may be determined according to actual requirements and is not limited herein.
The electronic album creating module 800 is configured to perform video composition according to the photos and the descriptive information to create a target electronic album. More specifically, the electronic album creating module 800 may perform video synthesis of the electronic album by using a video coding technique according to the photos and the descriptive information, and finally obtain the target electronic album. Specific implementation processes can be referred to in the description of the subsequent embodiments.
According to the electronic album generating apparatus of the embodiment of the present invention, the descriptive information generating module analyzes the content of a photo to generate its descriptive information, and the electronic album generating module performs video synthesis according to the photo and the descriptive information to generate the target electronic album. Because the image content of the photo is analyzed and suitable descriptive information is automatically assigned to the photo based on the analysis result, the electronic album is produced intelligently without requiring the user to have image processing knowledge; manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.
Fig. 8 is a schematic structural diagram of an electronic album creating apparatus according to an embodiment of the present invention. As shown in fig. 8, the electronic album creating apparatus may include: a descriptive information generating module 700 and an electronic album generating module 800. The descriptive information generating module 700 may include: an identification submodule 710 and a generation submodule 720.
the recognition sub-module 710 is configured to recognize the photo according to a preset classification model to obtain an object class in the photo.
The generation submodule 720 is configured to select descriptive information of the photo from a pre-generated candidate descriptive information set according to the object category. As an example, as shown in fig. 9, the generating sub-module 720 may include: a first acquisition unit 721, a second acquisition unit 722, a calculation unit 723, a third acquisition unit 724, and a generation unit 725.
The first obtaining unit 721 is configured to obtain a text word vector of an object category.
The second obtaining unit 722 is configured to obtain a high-frequency adjective list corresponding to the object category, where the high-frequency adjective list has a plurality of high-frequency adjectives therein.
The calculating unit 723 is configured to calculate a first similarity between the text word vector of the object category and the word vector of each high-frequency adjective.
the third obtaining unit 724 is configured to obtain a target adjective of which the first similarity is greater than a preset threshold.
the generating unit 725 is configured to select descriptive information from the alternative descriptive information sets according to the target adjective.
it should be noted that, in the embodiment of the present invention, the alternative descriptive information set may be generated in advance. As an example, as shown in fig. 10, the electronic album creating apparatus may further include: a pre-processing module 900, configured to generate a set of alternative descriptive information in advance. As shown in fig. 10, the preprocessing module 900 may include: a first acquisition submodule 910, a calculation submodule 920, a second acquisition submodule 930 and a generation submodule 940.
The first obtaining sub-module 910 is configured to obtain a descriptive word list in the high-frequency word list, where the descriptive word list includes a plurality of adjective samples and a plurality of adverb samples.
The calculating sub-module 920 is configured to obtain vector representations of each adjective sample and each adverb sample, and calculate a second similarity between each adjective sample and each adverb sample.
the second obtaining sub-module 930 is configured to obtain, for each adjective sample, N adverb samples with the highest second similarity to each adjective sample, where N is a positive integer.
The generating sub-module 940 is configured to combine each adjective sample with the corresponding N adverb samples to generate an alternative set of descriptive information.
As an example, as shown in fig. 11, the electronic album creating module 800 may include: a conversion sub-module 810 and a generation sub-module 820. The converting sub-module 810 is configured to convert the photo and the descriptive information of the photo into a video clip. The generation sub-module 820 is used for linking the video clips to generate a target electronic album.
The electronic album generating device of the embodiment of the present invention combines several technologies: the photo is preprocessed using image processing technology, descriptive text is generated for the photo using CNN and NLP technology, and the electronic album video is synthesized using video coding technology. In this way, the electronic album is generated intelligently without requiring the user to have image processing knowledge; manual production cost is saved while the album content is enriched, the album becomes more interesting and imaginative, and the user experience is improved.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
in the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. An electronic album creating method is characterized by comprising the following steps:
Analyzing the content of the photo to generate descriptive information of the photo; wherein the analyzing the content of the photo to generate the descriptive information of the photo comprises: identifying the photo according to a preset classification model to acquire the object category in the photo; selecting descriptive information of the photo from a pre-generated alternative descriptive information set according to the object category; wherein, according to the object category, selecting the descriptive information of the photo from a pre-generated alternative descriptive information set includes:
Acquiring a text word vector of the object type;
Acquiring a high-frequency adjective list corresponding to the object category, wherein the high-frequency adjective list is provided with a plurality of high-frequency adjectives;
Calculating a first similarity between the text word vector of the object category and the word vector of each high-frequency adjective;
Acquiring a target adjective with the first similarity larger than a preset threshold;
selecting the descriptive information from the alternative descriptive information set according to the target adjective; and performing video synthesis according to the photos and the descriptive information to generate a target electronic photo album.
2. The electronic album creating method according to claim 1, wherein the alternative descriptive information set is created in advance by:
Obtaining a descriptive word list in a high-frequency word list, wherein the descriptive word list comprises a plurality of adjective samples and a plurality of adverb samples;
respectively obtaining vector representations of each adjective sample and each adverb sample, and calculating a second similarity between each adjective sample and each adverb sample;
For each adjective sample, obtaining N adverb samples with the highest second similarity with each adjective sample, wherein N is a positive integer;
Combining the each adjective sample with the corresponding N adverb samples to generate the set of alternative descriptive information.
3. The electronic album creating method according to claim 1, wherein said video composition based on said photo and said descriptive information to create a target electronic album comprises:
Converting the photo and descriptive information of the photo into a video clip;
And linking the video clips to generate the target electronic photo album.
4. An electronic album creating apparatus comprising:
The descriptive information generating module is used for analyzing the content of the photo and generating the descriptive information of the photo; wherein the descriptive information generating module comprises: the recognition submodule is used for recognizing the photo according to a preset classification model so as to acquire the object category in the photo; the generation submodule is used for selecting the descriptive information of the photo from a pre-generated alternative descriptive information set according to the object category; wherein the generation submodule comprises:
The first acquisition unit is used for acquiring the text word vector of the object type;
A second obtaining unit, configured to obtain a high-frequency adjective list corresponding to the object category, where the high-frequency adjective list has a plurality of high-frequency adjectives;
The calculation unit is used for calculating first similarity between the text word vector of the object category and the word vector of each high-frequency adjective;
The third acquisition unit is used for acquiring the target adjectives with the first similarity greater than a preset threshold;
The generating unit is used for selecting the descriptive information from the alternative descriptive information set according to the target adjective; and the electronic photo album generating module is used for carrying out video synthesis according to the photos and the descriptive information so as to generate a target electronic photo album.
5. the electronic album creating apparatus according to claim 4, further comprising:
The preprocessing module is used for generating the alternative descriptive information set in advance;
Wherein the preprocessing module comprises:
The first obtaining sub-module is used for obtaining a descriptive word list in the high-frequency word list, wherein the descriptive word list comprises a plurality of adjective samples and a plurality of adverb samples;
The calculation submodule is used for respectively obtaining the vector representation of each adjective sample and each adverb sample and calculating the second similarity between each adjective sample and each adverb sample;
The second obtaining sub-module is used for obtaining N adverb samples with the highest second similarity with each adjective sample aiming at each adjective sample, wherein N is a positive integer;
A generating sub-module, configured to combine each adjective sample with the corresponding N adverb samples to generate the candidate descriptive information set.
6. The electronic album creating apparatus according to claim 4, wherein the electronic album creating module comprises:
The conversion sub-module is used for converting the photos and the descriptive information of the photos into video clips;
And the generation submodule is used for linking the video clips to generate the target electronic album.
CN201710138877.0A 2017-03-09 2017-03-09 electronic album generating method and device Active CN107122393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710138877.0A CN107122393B (en) 2017-03-09 2017-03-09 electronic album generating method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710138877.0A CN107122393B (en) 2017-03-09 2017-03-09 electronic album generating method and device

Publications (2)

Publication Number Publication Date
CN107122393A CN107122393A (en) 2017-09-01
CN107122393B true CN107122393B (en) 2019-12-10

Family

ID=59717949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710138877.0A Active CN107122393B (en) 2017-03-09 2017-03-09 electronic album generating method and device

Country Status (1)

Country Link
CN (1) CN107122393B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107948730B (en) * 2017-10-30 2020-11-20 百度在线网络技术(北京)有限公司 Method, device and equipment for generating video based on picture and storage medium
CN108495059A (en) * 2018-03-07 2018-09-04 迷你高(北京)科技有限公司 album creating method and system
CN108446728A (en) * 2018-03-14 2018-08-24 深圳乐信软件技术有限公司 User personality extracting method, device, terminal and storage medium
CN111209423B (en) * 2020-01-07 2023-04-07 腾讯科技(深圳)有限公司 Image management method and device based on electronic album and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101783881A (en) * 2010-03-05 2010-07-21 公安部第三研究所 Intelligent web camera with video structural description function
CN103930901A (en) * 2011-11-17 2014-07-16 微软公司 Automatic tag generation based on image content
CN104331437A (en) * 2014-10-24 2015-02-04 百度在线网络技术(北京)有限公司 Method and device for generating picture description information
CN105488156A (en) * 2015-11-30 2016-04-13 广州一刻影像科技有限公司 Method for automatically selecting electronic album template and generating electronic album
CN106021364A (en) * 2016-05-10 2016-10-12 百度在线网络技术(北京)有限公司 Method and device for establishing picture search correlation prediction model, and picture search method and device
CN106202183A (en) * 2016-06-24 2016-12-07 四川长虹电器股份有限公司 A kind of realize automatically joining literary composition and dub in background music the method for generation phonic photograph album/e-book

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160253833A1 (en) * 2015-02-26 2016-09-01 Grace Lew System and method for photo album journaling

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101783881A (en) * 2010-03-05 2010-07-21 公安部第三研究所 Intelligent web camera with video structural description function
CN103930901A (en) * 2011-11-17 2014-07-16 微软公司 Automatic tag generation based on image content
CN104331437A (en) * 2014-10-24 2015-02-04 百度在线网络技术(北京)有限公司 Method and device for generating picture description information
CN105488156A (en) * 2015-11-30 2016-04-13 广州一刻影像科技有限公司 Method for automatically selecting electronic album template and generating electronic album
CN106021364A (en) * 2016-05-10 2016-10-12 百度在线网络技术(北京)有限公司 Method and device for establishing picture search correlation prediction model, and picture search method and device
CN106202183A (en) * 2016-06-24 2016-12-07 四川长虹电器股份有限公司 A kind of realize automatically joining literary composition and dub in background music the method for generation phonic photograph album/e-book

Also Published As

Publication number Publication date
CN107122393A (en) 2017-09-01

Similar Documents

Publication Publication Date Title
CN108986186B (en) Method and system for converting text into video
CN110119711B (en) Method and device for acquiring character segments of video data and electronic equipment
JP5691289B2 (en) Information processing apparatus, information processing method, and program
CN106878632B (en) Video data processing method and device
CN107122393B (en) electronic album generating method and device
WO2012020667A1 (en) Information processing device, information processing method, and program
CN112533051A (en) Bullet screen information display method and device, computer equipment and storage medium
US11790271B2 (en) Automated evaluation of acting performance using cloud services
CN112733660B (en) Method and device for splitting video strip
CN114173067B (en) Video generation method, device, equipment and storage medium
CN111046225A (en) Audio resource processing method, device, equipment and storage medium
CN113766314A (en) Video segmentation method, device, equipment, system and storage medium
CN111400513A (en) Data processing method, data processing device, computer equipment and storage medium
CN110750996A (en) Multimedia information generation method and device and readable storage medium
CN112199932A (en) PPT generation method, device, computer-readable storage medium and processor
CN112800263A (en) Video synthesis system, method and medium based on artificial intelligence
CN115580758A (en) Video content generation method and device, electronic equipment and storage medium
CN114363714B (en) Title generation method, title generation device and storage medium
CN114363695B (en) Video processing method, device, computer equipment and storage medium
CN109376145B (en) Method and device for establishing movie and television dialogue database and storage medium
CN114048335A (en) Knowledge base-based user interaction method and device
CN117319765A (en) Video processing method, device, computing equipment and computer storage medium
CN117676277A (en) Video generation method, device, electronic equipment and storage medium
CN113407766A (en) Visual animation display method and related equipment
CN109800326B (en) Video processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant