WO2021212601A1 - Image-based writing assisting method and apparatus, medium, and device - Google Patents


Info

Publication number
WO2021212601A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
model
information
image
memory
Prior art date
Application number
PCT/CN2020/092724
Other languages
French (fr)
Chinese (zh)
Inventor
杨翰章 (Yang Hanzhang)
邓黎明 (Deng Liming)
庄伯金 (Zhuang Bojin)
王少军 (Wang Shaojun)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021212601A1 publication Critical patent/WO2021212601A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of software technology and ancient poetry generation, and in particular to an image-based writing assisting method and apparatus, medium, and device.
  • Poetry is a text form with concise language and condensed expression, and it is also subject to certain structural and phonological requirements. Ancient poetry generation tools are used in many scenarios. For example, in teaching, a teacher may select a poem matching a certain scene and therefore needs to use the image information of the relevant scene to generate, in the poetry generation tool, the verse information associated with the image; in a park, a visitor may need to use the poetry generation tool to generate, according to a certain scene, the verse information associated with the image the visitor desires.
  • the inventor realized that, for an ancient poetry generation tool, the task of generating poetry based on image content is more difficult than generating ordinary text: the meaning expressed by the generated poetry may deviate from the image content, and the contextual semantic coherence of the poem lines cannot be guaranteed, so that the correlation between the poems generated by the generation tool and the images is not strong, and the semantic coherence of the generated poems is weak.
  • this application proposes an image-based auxiliary writing method, including:
  • the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the generated verses.
  • This application also proposes an image-based auxiliary writing device, including:
  • an image information acquisition module, used to obtain the image information of a target image;
  • a first attribute keyword module, used to input the image information into a multi-label classification model to obtain keyword tags of a first attribute of the image;
  • a second attribute keyword module, used to input the image information into a single-label classification model to obtain candidate tags of a second attribute of the image, where the second attribute and the first attribute are feature information of different content in the target image;
  • a keyword mapping module, used to map the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information;
  • an ancient poetry generation module, used to input the image keyword tag information into the poem generation model, which generates the poem content based on a model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix;
  • the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store the information of the generated verses.
  • this application also proposes a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above image-based auxiliary writing method.
  • This application also proposes an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, by executing the executable instructions, the above image-based auxiliary writing method, in which the model memory matrix includes a keyword memory matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the generated verses.
  • in this way, the generated verses have strong continuity, and the relevance between the generated poem content and the given target image is further enhanced.
  • FIG. 1 is a block diagram of an image-based auxiliary writing method provided by an embodiment of this application.
  • FIG. 2 schematically shows an example diagram of an application scenario of an image-based auxiliary writing method.
  • FIG. 3 is a construction diagram of a model memory matrix provided by an embodiment of the application.
  • FIG. 4 is a block diagram of an image-based auxiliary writing device provided by an embodiment of the application.
  • FIG. 5 is an example block diagram of an electronic device for an image-based auxiliary writing method provided by an embodiment of the application.
  • FIG. 6 shows a computer-readable storage medium for an image-based auxiliary writing method provided by an embodiment of the application.
  • this application proposes an auxiliary writing method based on the image classification model and the model memory matrix to generate the poems associated with the target image information.
  • the image-based auxiliary writing method provided by the embodiments of this application can realize auxiliary writing through neural networks, deep learning, etc., and can also be applied to robotics, knowledge representation and reasoning, etc.; this can be determined according to the actual application scenario and is not restricted here.
  • Fig. 1 is an image-based auxiliary writing method provided by an embodiment of the present application. The method includes but is not limited to the following steps:
  • Step S110: Obtain the image information of the target image. It should be noted that the image information may include a variety of feature information, such as people, landscapes, and animals.
  • Step S120: Input the image information into a multi-label classification model to obtain the keyword tag of the first attribute of the target image.
  • the deep residual convolutional neural network ResNet-101 is used as the image classification model, and it is trained for the multi-label image classification task on the large-scale multi-label image data set ML-Images.
  • in an exemplary embodiment of the present application, the first attribute of the image may be information about a person.
  • a picture I is input into the trained model, and the output is the nouns corresponding to multiple objects in the image.
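As a rough sketch of this prediction step, the multi-label output can be thresholded per label: each logit passes through a sigmoid, and every label whose probability clears the threshold is kept. The label names, logits, and threshold below are illustrative assumptions; a real implementation would obtain the logits from the trained ResNet-101 backbone.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_label_predict(logits, labels, threshold=0.5):
    """Return all labels whose sigmoid probability exceeds the threshold."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) > threshold]

# Toy logits standing in for the backbone's per-label scores.
labels = ["couple", "beach", "dog", "mountain"]
logits = [2.1, 1.3, -0.7, -2.5]
print(multi_label_predict(logits, labels))  # ['couple', 'beach']
```

Unlike a softmax classifier, each label is decided independently, which is what allows several nouns to be returned for one picture.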
  • this application uses the above-trained multi-label classification model as a pre-training model and uses the ImageNet-ILSVRC2012 data set (which has 1000 labels) to perform fine-tuning training of a single-label model, obtaining a single-label image classification model; that is, after a picture I is input into the model, the output is the most likely top-k labels corresponding to the picture I.
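The top-k selection of the single-label model might look like the following sketch: a softmax over the logits followed by a sort by probability. The labels and scores are made up for illustration; the real model would produce 1000 ImageNet-ILSVRC2012 logits.

```python
import math

def top_k_labels(logits, labels, k=3):
    """Softmax the logits and return the k most likely (label, prob) pairs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # numerically stabilised softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda t: -t[1])
    return ranked[:k]

labels = ["seashore", "lakeside", "volcano", "castle"]
logits = [3.2, 2.4, 0.1, -1.0]
for lab, p in top_k_labels(logits, labels, k=2):
    print(f"{lab}: {p:.3f}")
```

The softmax makes the 1000 labels compete, so exactly one distribution is produced and only its k largest entries are reported.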
  • in an exemplary embodiment of the present application, the image keyword tags have been extended so that the image attribute tags obtained by the multi-label classification model can meet the required degree of association with the relevant image information; that is, in this embodiment of the application, the multi-label classification model is used to predict the keyword tags related to the person in the picture.
  • the extension of the multi-label classification model is specifically reflected in the processing of its tags.
  • the tags of the ML-Images data set are organized according to the WordNet hierarchical structure, with the synset as the basic construction unit; synsets are related through a certain number of relationship types, including hypernym-hyponym relationships, whole-part relationships, inheritance relationships, and so on.
  • each label of the data set is searched for its hypernyms; if a hypernym contains the word "person", then this label is regarded as a keyword tag related to the person.
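The hypernym filter described above can be sketched as follows. A real implementation would query WordNet for the hypernym chains (for instance via NLTK's wordnet corpus); here a tiny hand-made hierarchy stands in for it, so both the hierarchy and the labels are illustrative assumptions.

```python
# Toy hypernym chains standing in for WordNet lookups.
TOY_HYPERNYMS = {
    "couple":  ["pair", "person", "organism", "entity"],
    "beach":   ["geological_formation", "object", "entity"],
    "teacher": ["educator", "professional", "person", "entity"],
}

def person_related(label, hypernyms=TOY_HYPERNYMS):
    """Keep a label if 'person' occurs anywhere in its hypernym chain."""
    return "person" in hypernyms.get(label, [])

print([lab for lab in ["couple", "beach", "teacher"] if person_related(lab)])
# ['couple', 'teacher']
```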
  • for example, if the candidate tags are "couple, beach, beach", the filtered word "couple" is a keyword tag related to the person; compared with an existing target detection model that only outputs "male" or "female" tags, it has richer semantic characteristics.
  • Step S130 Input the image information into a single-label classification model to obtain keyword candidate labels of the second attribute of the target image.
  • the single-label classification model is obtained by using the above-trained multi-label classification model as a pre-training model and performing fine-tuning training of the single-label model on the ImageNet-ILSVRC2012 data set (which has 1000 labels); that is, after a picture I is input into the model, the output is the most likely top-k labels corresponding to the picture I.
  • Step S140: Map the keyword tags of the first attribute and the candidate keyword tags of the second attribute to obtain image keyword tag information.
  • the embodiment of the application uses the "Shixue Hanying" (诗学含英) dictionary to map the keyword tags of the first attribute and the candidate tags of the second attribute of the image.
  • the "Shixue Hanying" dictionary is composed of 1016 theme words and their corresponding ancient-poem words; each line of the dictionary is separated by spaces, the first word representing the theme word (basically expressed in modern Chinese) and the remaining words in the line being the ancient-poem words corresponding to that theme. Therefore, this application separately embeds each combined predicted label and each theme word using pre-trained word2vec, and then calculates the word similarity between each predicted label and each theme word:
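Assuming the word similarity is cosine similarity over word2vec embeddings (the similarity formula itself is not reproduced in this text), the mapping can be sketched as follows; the 3-dimensional vectors and the two theme words are toy stand-ins for real pre-trained embeddings and the dictionary's 1016 theme words.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_theme(label_vec, themes):
    """Map one predicted label to the most similar theme word."""
    return max(themes, key=lambda t: cosine(label_vec, themes[t]))

themes = {"seaside": [0.9, 0.1, 0.0], "love": [0.0, 0.8, 0.6]}
print(best_theme([0.85, 0.2, 0.05], themes))  # 'seaside'
```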
  • this application combines the output tags of the single-label classification model and the output tags of the multi-label classification model, and then obtains the final keywords through the mapping relationship defined by the "Shixue Hanying" dictionary.
  • the keywords corresponding to the pictures are generated in the above manner so that they are associated, as much as possible, with the contents of the pictures and the semantics represented by the objects in the pictures. Since the ancient poem generation model of this scheme is directly driven by the input keywords, the accuracy of the keywords also ensures the relevance between the generated ancient poems and the images.
  • Step S150: Input the image keyword tag information into the poem generation model, and generate the poem content based on the model memory matrix.
  • the poetry generation model inputs the image information into a sequence-to-sequence structure and generates ancient poems based on the model memory matrix.
  • the model memory matrix includes: the keyword memory matrix M1, the current memory matrix M2, and the historical memory matrix M3.
  • the keyword memory matrix is used to store the image keyword tag information;
  • the current memory matrix and the historical memory matrix are used to store information about the generated verses.
  • each row of the model memory matrix represents a memory segment;
  • dm represents the size of a memory segment;
  • K2 and K3 represent the segment lengths of the current memory and the historical memory respectively;
  • the keyword memory matrix M1 is composed of the hidden layer state information of all keywords and remains unchanged during the entire poetry generation process.
  • the model refers to the keyword memory information when generating each line of verse, determines the probability of each memory segment being selected through the memory reading function, and uses this probability as the weight for the weighted summation of the memory segments in the final read memory information.
  • the hidden layer state information corresponding to each character of the previous line of verse Li-1 is written into the current memory matrix M2.
  • this application saves the information of Li-1 in the current memory to provide complete recent memory information, where Li is the generated i-th line of verse and i is an integer greater than 2.
  • when saving historical memory information, the model selects some significant model state information from the historical verses L1:i-2 for writing; in this way, the historical memory matrix M3 stores long-distance historical memory information.
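The memory layout described above can be sketched as one matrix whose rows are memory segments: the keyword rows of M1 followed by K2 current-memory rows and K3 historical-memory rows, each of size dm. The sizes and the zero initialisation of M2 and M3 are illustrative assumptions.

```python
D_M, K2, K3 = 4, 3, 5          # segment size and segment counts (assumed)

def build_memory(keyword_vecs, d_m=D_M, k2=K2, k3=K3):
    """Return [M1; M2; M3]: keyword rows plus zeroed current/history rows."""
    m1 = [list(v) for v in keyword_vecs]      # fixed keyword memory M1
    m2 = [[0.0] * d_m for _ in range(k2)]     # current memory M2, filled per line
    m3 = [[0.0] * d_m for _ in range(k3)]     # historical memory M3
    return m1 + m2 + m3

mem = build_memory([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
print(len(mem))   # 2 keyword rows + 3 + 5 = 10 memory segments
```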
  • constructing the model memory matrix may specifically include, but is not limited to, the following steps:
  • Step S1501: Embed the image keyword tags into the keyword memory matrix as keyword tag information.
  • the predicted labels are expanded using the multi-label model, and these expanded words have rich semantic meanings, so the final output keywords not only have relevance to the image but also carry the semantic characteristics of the image representation.
  • this application embeds each keyword learned by the poetry model into the memory of the model independently, so that, when decoding in the sequence-to-sequence structure, the model can select the key part of the keyword information according to the model state and the global information, thereby ensuring the relevance between the image keyword information input into the poetry model and the image.
  • Step S1502: Read, from the keyword matrix, the keyword tag information associated with the generation of the i-th line of verse through the memory reading function.
  • the specific implementation of this step is as follows:
  • the query q represents the state information of the current model;
  • formula: zk = vT σ(M[k,:], q), where zk is a variable representing the degree of correlation calculated from the memory segment M[k,:] and the current model state q, and vT represents model parameters;
  • formula: αr[k,:] = softmax(zk), where αr[k,:] represents the probability that the memory segment M[k,:] is selected;
  • the query vector q is formed by concatenating the hidden layer vector st-1 of the decoder and the global tracking vector vi-1; vi-1 is used here to prevent the model from reading redundant content.
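A minimal sketch of this read operation follows: zk scores each memory segment against the query, a softmax turns the scores into selection probabilities αr, and the read-out is the α-weighted sum of segments. Realising σ as a tanh over an element-wise combination of segment and query, and the fixed toy parameter vector v, are both assumptions; the text only names σ as a non-linear mapping.

```python
import math

def memory_read(M, q, v):
    """Attention-style read: scores -> softmax weights -> weighted sum."""
    # z_k = v^T sigma(M[k,:], q), with sigma taken as tanh(m + q) here.
    z = [sum(vi * math.tanh(mi + qi) for vi, mi, qi in zip(v, row, q))
         for row in M]
    mx = max(z)
    exps = [math.exp(s - mx) for s in z]
    total = sum(exps)
    alpha = [e / total for e in exps]          # selection probabilities
    d = len(M[0])
    o = [sum(alpha[k] * M[k][j] for k in range(len(M))) for j in range(d)]
    return alpha, o

M = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
alpha, o = memory_read(M, q=[1.0, 0.0], v=[1.0, 1.0])
print(alpha)   # three probabilities summing to 1
```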
  • Step S1503: Fill the hidden state information corresponding to the character information in the (i-1)-th line of the poem into the current memory matrix as the current memory information.
  • the hidden layer state information corresponding to each character of the previous line of verse Li-1 is written into the current memory matrix M2.
  • the current memory information is the input of the model at the current moment, and it represents the semantic features input at the current moment of the model.
  • by training the model to select these semantic features, this application makes the model learn the grammar and prosody format of the ancient poetry corpus.
  • Step S1504: Calculate, through the memory writing function, the hidden state information corresponding to the character information in the (i-2)-th line of the poem, and fill the calculated information into the historical memory matrix.
  • when saving historical memory information, the model selects some significant model state information from the historical verses L1:i-2 for writing; in this way, the historical memory matrix M3 saves long-distance historical memory information.
  • the write function calculates a selection over the slots of the historical memory matrix, and then fills the state ht into the historical memory matrix:
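The write formula itself is elided in this text, so the following is only a guessed shape consistent with the surrounding description: score each historical slot, pick one, and write the state ht into it. Scoring by dot-product similarity to ht is an illustrative assumption.

```python
def memory_write(M3, h_t):
    """Write state h_t into the historical-memory slot that scores highest."""
    scores = [sum(m * h for m, h in zip(row, h_t)) for row in M3]
    k = max(range(len(M3)), key=lambda i: scores[i])   # selected slot
    M3[k] = list(h_t)                                  # overwrite with h_t
    return k

M3 = [[0.0, 0.0], [0.2, 0.1], [0.9, 0.8]]
slot = memory_write(M3, h_t=[1.0, 1.0])
print(slot, M3[slot])
```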
  • Step S1505: Read the keyword matrix and the current memory matrix and historical memory matrix information for the verse Li, and generate the i-th line of verse through a sequence-to-sequence structure.
  • this application uses a sequence-to-sequence neural network model with a memory mechanism.
  • the memory of the model consists of three segments: keyword memory information, historical memory information and current memory information.
  • the characteristics of model memory are described as follows:
  • the final output keywords not only have relevance to the image but also carry the semantic characteristics of the image representation.
  • this application embeds each keyword learned by the model into the memory of the model independently, so that the model can select the key part of the keyword information according to the model state and the global information when decoding; in fact, the key information of the image is indirectly used in this way.
  • the model memory only retains limited historical information, which is different from the existing method of retaining all the state information during the historical generation process.
  • this approach requires the trained model to learn to focus on information closely related to the poem being generated, while screening out and ignoring interference information in the generation process, to further ensure the coherence between the verses generated by the model.
  • the current memory information is the input of the model at the current moment, which represents the semantic features input by the model at the current moment.
  • this solution selects these semantic features by training the model to learn the grammar and prosody format in the ancient poetry corpus.
  • the ancient poetry generation model of this scheme gradually learns to read from memory the keyword information and memory information most relevant to the line of verse, to guide the generation of the current line. This reading is the realization of a kind of attention mechanism: through training, the model learns to find the most important information in memory when generating each character of the verse, such as choosing whether to generate characters associated with keywords or characters that are semantically connected to the historical verses.
  • after generating a line of verse, the model is trained to select the part of its current memory state that played the most prominent role in generating that line and to write it into the historical memory of the model, completing the update of the model's overall memory. This writing is also the realization of an attention mechanism. Through dynamic reading and writing of the historical memory, the model is trained to focus on information relevant to verse generation and to ignore interference information in the generation process, so as to ensure the continuity between verses.
  • the image information of the target image is input into the poetry generation model, and the entire poem information associated with the image is generated according to the poetic generation method of the model memory matrix.
  • the specific implementation of this step is as follows:
  • a sequence-to-sequence structure is used to generate each ancient poem.
  • the structure is composed of an encoder and a decoder.
  • a bidirectional gated recurrent unit (GRU) is used as the encoder of the model, and a unidirectional GRU is used as the decoder of the model.
  • ht and st represent the hidden layer states of the encoder and the decoder; e(yt) represents the word embedding vector of the character yt output by the decoder at time t; the model calculates the character generation distribution at time t when generating the verse Li:
  • st = GRU(st-1, [e(yt-1); ot; pt; vi-1]), where ot represents the output vector of the model memory matrix, and pt is the splicing vector of the prosody embedding and the verse-length embedding learned in training;
  • W represents the parameters the model needs to train; the formula indicates that, when learning to generate the verse Li, the model needs to reference the information y1:t-1 already generated in the current line, the preamble information of the generated lines L1:i-1, and the keyword information.
  • vi-1 represents the global tracking vector; its initial value is an all-zero vector; it records the content information generated up to the current line Li-1 and provides global information to the model when generating the next verse Li.
  • σ represents the non-linear layer mapping function.
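The decoder update st = GRU(st-1, [e(yt-1); ot; pt; vi-1]) can be sketched with a plain GRU cell over the concatenated input. The dimensions and constant weights below are toy values, and a real model would also project st through a softmax layer to obtain the character distribution.

```python
import math

def gru_cell(s_prev, x, Wz, Wr, Wh):
    """One standard GRU step: update gate z, reset gate r, candidate h."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    concat = s_prev + x
    z = [sig(sum(w * c for w, c in zip(row, concat))) for row in Wz]  # update gate
    r = [sig(sum(w * c for w, c in zip(row, concat))) for row in Wr]  # reset gate
    gated = [ri * si for ri, si in zip(r, s_prev)] + x
    h = [math.tanh(sum(w * c for w, c in zip(row, gated))) for row in Wh]
    return [(1 - zi) * si + zi * hi for zi, si, hi in zip(z, s_prev, h)]

# Toy 1-d pieces of the concatenated input [e(y_{t-1}); o_t; p_t; v_{i-1}].
e_y, o_t, p_t, v_prev = [0.1], [0.2], [0.0], [0.3]
x_t = e_y + o_t + p_t + v_prev            # concatenation, length 4
s_prev = [0.0, 0.0]                       # previous decoder state, length 2
W = [[0.1] * 6, [0.2] * 6]                # 2 state dims over 2+4 inputs
s_t = gru_cell(s_prev, x_t, W, W, W)
print(len(s_t))   # 2
```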
  • This application uses a sequence-to-sequence neural network model with a memory mechanism.
  • the trained model effectively attends to and selects the important keyword information and the key information in the historically generated verses, while filtering out and ignoring the interfering part of the information.
  • when generating a line of verse, the memory information is read to find the information the model most needs to attend to when generating a single character, such as the association with keywords or the continuity with historical verses, so as to guide the generation of single characters.
  • the historical memory information is updated to ensure that the model writes the state information related to the generation of the verse into the memory.
  • the above-mentioned multi-label classification model, single-label model, and poetry generation model can also be stored in a node of a blockchain; through blockchain storage, data information can be shared between different platforms, and the data can also be prevented from being tampered with.
  • the blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • the blockchain is essentially a decentralized database, a series of data blocks associated using cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • Fig. 4 shows an image-based auxiliary writing device 300 according to another exemplary embodiment, including:
  • the image information acquisition module 310 is used to acquire the image information of the target image.
  • the first attribute keyword module 320 is used to input the image information into the multi-label classification model to obtain the keyword label of the first attribute of the image.
  • the second attribute keyword module 330 is used to input the image information into the single-label classification model to obtain the candidate tags of the second attribute of the image.
  • the keyword mapping module 340 is configured to map the keyword tags of the first attribute and the candidate keyword tags of the second attribute to obtain image keyword tag information.
  • the poem generation module 350 is used to read the keyword matrix and the current memory matrix of the i-th line of verses and the historical memory matrix information to generate the i-th line of verses through the sequence-to-sequence structure.
  • the multi-label classification model is used to extract the person keyword tag information related to the image, and the predicted labels are expanded into person-related words using the multi-label model; these expanded words have rich semantic meanings, so the final output keywords are not only related to the image but also have the semantic characteristics of the image representation.
  • the relevance of the extracted person keyword information to the image information is thus guaranteed.
  • the image attribute keyword tag information obtained by the image classification models is mapped to select image keywords that have a certain relevance to the image, which ensures that the image attribute keyword parameters initially input into the poetry model have strong relevance.
  • through dynamic reading and writing of the historical memory, the model memory matrix retains only limited historical information, which is different from the existing method of retaining all the state information from the history generation process.
  • in this way the user's needs can be met: the generated verse information has strong relevance to the target image, and the coherence between the verses is enhanced.
  • the electronic device 400 according to this embodiment of the present application will be described with reference to FIG. 5.
  • the electronic device 400 shown in FIG. 5 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
  • the electronic device 400 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 400 may include, but are not limited to: the aforementioned at least one processing unit 410, the aforementioned at least one storage unit 420, and a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410).
  • the storage unit stores program code, and the program code can be executed by the processing unit 410, so that the processing unit 410 executes the steps of the various exemplary method implementations described in the "Exemplary Method" section of this specification.
  • the processing unit 410 may perform step S110 as shown in FIG. 1: acquiring the image information of the target image.
  • the image information may include a variety of feature information, such as people, landscapes, animals, etc.;
  • Step S120: input the image information into a multi-label classification model to obtain the keyword tags of the first attribute of the target image;
  • Step S130: input the image information into a single-label classification model to obtain the keyword candidate tags of the second attribute of the target image;
  • Step S140: map the keyword tags of the first attribute with the candidate keyword tags of the second attribute to obtain image keyword tag information;
  • Step S150: input the image keyword tag information into the poetry generation model, and generate the poetry content based on the model memory matrix.
  • the storage unit 420 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 4201 and/or a cache storage unit 4202, and may further include a read-only storage unit (ROM) 4203.
  • the storage unit 420 may also include a program/utility tool 4204 having a set (at least one) program module 4205.
  • the program module 4205 includes, but is not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • the bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 400 may also communicate with one or more external devices 600 (such as keyboards, pointing devices, Bluetooth devices, etc.), and may also communicate with one or more devices that enable a user to interact with the electronic device 400, and/or communicate with Any device (such as a router, modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 450.
  • the electronic device 400 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 460.
  • networks for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet
  • the network adapter 460 communicates with other modules of the electronic device 400 through the bus 430. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives And data backup storage system, etc.
  • the example embodiments described here can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
  • a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored.
  • the computer-readable storage medium may be non-volatile or volatile.
  • each aspect of the present application can also be implemented in the form of a program product, which includes program code.
  • when the program product runs on a terminal device, the program code is used to make the terminal device execute the steps according to the various exemplary embodiments of the present application described in the above "Exemplary Method" section of this specification.
  • a program product 500 for implementing the above-mentioned method according to an embodiment of the present application is described. It can adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of this application is not limited to this.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of the present application can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the field of artificial intelligence, and provides an image-based writing assisting method, comprising: acquiring image information of a target image; inputting the image information into a multi-label classification model to obtain keyword labels of a first attribute of the image; inputting the image information into a single-label classification model to obtain candidate keyword labels of a second attribute of the image, the second attribute and the first attribute being features of different information in the target image; mapping the keyword labels of the first attribute and the candidate keyword labels of the second attribute to obtain image keyword label information; and inputting the image keyword label information into a poem generation model, and generating poem content on the basis of a model memory matrix. In addition, the present application also relates to the field of blockchains. The multi-label classification model, single-label model, and poem generation model can be stored in a blockchain. The present application can increase the relevance of generated verses to the image information and the target image.

Description

Image-based writing assisting method, apparatus, medium, and device
This application claims priority to Chinese patent application No. 2020103324099, entitled "Image-based writing assisting method, apparatus, medium, and device" and filed with the Chinese Patent Office on April 24, 2020, the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of software technology and classical Chinese poetry generation, and in particular to an image-based writing assisting method, apparatus, medium, and device.
Background
Poetry is a form of text with concise language and condensed expression, subject to certain structural and phonological requirements. Poetry generation tools are used in many scenarios. For example, in teaching, a teacher who wants to teach with poems matching a particular scene needs the image information of that scene so that a poetry generation tool can generate verses associated with the image; in a park, a visitor may want a poetry generation tool to generate verses associated with an image of a particular view.
However, the inventors realized that generating poetry from image content is a more difficult task than generating it from ordinary text: the meaning expressed by the generated classical poem may deviate from the image content, and the semantic coherence between consecutive verses cannot be guaranteed. As a result, the poems produced by existing generation tools are only weakly related to the image, and the generated verses lack contextual semantic coherence.
Summary
To solve the problem that generated classical poems are weakly related to the image content and lack contextual semantic coherence between verses, this application proposes an image-based writing assisting method, including:
acquiring image information of a target image; inputting the image information into a multi-label classification model to obtain keyword tags of a first attribute of the image; inputting the image information into a single-label classification model to obtain candidate tags of a second attribute of the image, the second attribute and the first attribute being features of different information in the target image; mapping the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information; and inputting the image keyword tag information into a poetry generation model and generating poem content based on a model memory matrix, the model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix, wherein the keyword memory matrix stores the image keyword tag information, and the current memory matrix and the historical memory matrix store information about verses that have already been generated.
This application also proposes an image-based writing assisting apparatus, including:
an image information acquisition module for acquiring information of a target image; a first-attribute keyword module for inputting the image information into a multi-label classification model to obtain keyword tags of the first attribute of the image; a second-attribute keyword module for inputting the image information into a single-label classification model to obtain candidate tags of the second attribute of the image, the second attribute and the first attribute being feature information of different information in the target image; a keyword mapping module for mapping the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information; and a poetry generation module for inputting the image keyword tag information into a poetry generation model and generating poem content based on a model memory matrix, the model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix, wherein the keyword memory matrix stores the image keyword tag information, and the current memory matrix and the historical memory matrix store information about verses that have already been generated.
This application also proposes a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above image-based writing assisting method.
This application also proposes an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured, by executing the executable instructions, to perform:
acquiring image information of a target image;
inputting the image information into a multi-label classification model to obtain keyword tags of a first attribute of the image;
inputting the image information into a single-label classification model to obtain candidate keyword tags of a second attribute of the image, the second attribute and the first attribute being features of different information in the target image;
mapping the keyword tags of the first attribute with the candidate keyword tags of the second attribute to obtain image keyword tag information;
inputting the image keyword tag information into a poetry generation model and generating poem content based on a model memory matrix, the model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix, wherein the keyword memory matrix stores the image keyword tag information, and the current memory matrix and the historical memory matrix store information about verses that have already been generated.
This application can give the generated verses strong coherence with one another, strengthen the association between the generated verses and the given image information, and further strengthen the association between the generated poem content and the target image.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a block diagram of an image-based writing assisting method provided by an embodiment of this application;
Fig. 2 schematically shows an example application scenario of an image-based writing assisting method;
Fig. 3 is a diagram of the construction of a model memory matrix provided by an embodiment of this application;
Fig. 4 is a block diagram of an image-based writing assisting apparatus provided by an embodiment of this application;
Fig. 5 is an example block diagram of an electronic device for an image-based writing assisting method provided by an embodiment of this application;
Fig. 6 shows a computer-readable storage medium for an image-based writing assisting method provided by an embodiment of this application.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
Existing poetry generation tools suffer from weak association with the image content and weak contextual semantic coherence between verses. To strengthen the association between the generated classical poems and the image keywords, as well as the coherence between verses, this application proposes a writing assisting method based on image classification models and a model memory matrix to generate poem information associated with a target image.
The image-based writing assisting method provided by the embodiments of this application can realize assisted writing through neural networks, deep learning, and the like, and can equally be applied to robotics, knowledge representation and reasoning, and other fields; the specifics can be determined by the actual application scenario and are not limited here.
To achieve the above purpose, this application provides the following technical solutions:
Fig. 1 shows an image-based writing assisting method provided by an embodiment of this application. The method includes, but is not limited to, the following steps:
Step S110: acquire image information of a target image. It should be noted that the image information may include multiple kinds of feature information, such as people, landscapes, and animals.
Step S120: input the image information into a multi-label model to obtain keyword tags of the first attribute of the target image.
In this application, the deep residual convolutional neural network ResNet-101 is used as the image classification model, and the model is trained for a multi-label image classification task on the large-scale multi-label image data set ML-Images. When a picture I is input into the trained model, the output is the nouns corresponding to multiple objects in the image [formula image: the set of predicted object nouns], where k denotes the object names corresponding to the top-k predicted probability values. Therefore, in an exemplary embodiment of this application, the multi-label classification model is obtained by training for a multi-label image classification task on the multi-label data set ML-Images.
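As a rough illustration of the top-k selection described above (the patent's actual model is a ResNet-101 trained on ML-Images; the class names and scores below are invented for illustration), the step from per-class probabilities to label names can be sketched as:

```python
import numpy as np

def top_k_labels(scores, class_names, k=3):
    """Return the k class names with the highest predicted probability."""
    idx = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    return [class_names[i] for i in idx]

# Hypothetical per-class sigmoid outputs of a multi-label classifier for one image.
class_names = ["couple", "beach", "sea", "dog", "mountain"]
scores = np.array([0.91, 0.85, 0.62, 0.08, 0.11])

print(top_k_labels(scores, class_names, k=3))  # ['couple', 'beach', 'sea']
```

In the multi-label setting each class score is independent (e.g. a per-class sigmoid), so several labels can be returned for one image; the single-label model described later would instead take the top-k of a single softmax distribution.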
It should be noted that, in an exemplary embodiment of this application, the first attribute of the image may be person information.
Next, this application uses the above-trained multi-label classification model as a pre-trained model and fine-tunes a single-label model on the ImageNet-ILSVRC2012 data set (which has 1000 labels) to obtain a single-label image classification model: after a picture I is input into this model, the output is the most likely top-k labels corresponding to picture I [formula image: the set of top-k predicted labels].
In addition, the ImageNet-ILSVRC2012 data set lacks semantic labels related to people, while selfies and landscape photos containing people often appear in the task scenario of composing poems from pictures. An exemplary embodiment of this application therefore expands the tags corresponding to image keywords so that the image attribute tags obtained by the multi-label classification model adequately match the relevant image information. That is, in this embodiment, the multi-label classification model is used to predict the person-related keyword tags in a picture [formula image: the set of person-related keyword tags].
The expansion of the multi-label classification model is embodied in the processing of its labels. The labels of the ML-Images data set are based on the WordNet hierarchy, which is organized around synonym sets (synsets) as its basic building blocks, with synsets linked by a number of relation types, including hypernym/hyponym relations, part-whole relations, inheritance relations, and so on. In this embodiment, using the hypernym/hyponym relations between synsets, the hypernyms of each label in the data set are looked up; if the hypernyms contain the word "person", the label is treated as a person-related keyword tag. This processing not only filters out the person-related labels among the candidate predicted labels, but also enriches the imagery of the keywords because the labels carry richer semantics. For example, for a picture whose content is a man and a woman holding hands on a beach by the sea, with candidate labels "couple, seaside, beach", the filtered word "couple" is a person-related keyword tag; compared with an existing object detection model that only outputs "male" or "female" labels, it has richer semantic properties.
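The hypernym lookup described above can be sketched with a toy WordNet-style hierarchy. The miniature hypernym map below is invented for illustration; the patent uses the full WordNet synset graph underlying ML-Images:

```python
# Toy hypernym map: each label points to its parent concept (hypernym).
HYPERNYMS = {
    "couple": "person",
    "fisherman": "person",
    "person": "entity",
    "beach": "geological_formation",
    "geological_formation": "entity",
}

def is_person_related(label, hypernyms=HYPERNYMS):
    """Walk up the hypernym chain; True if 'person' is the label or one of its ancestors."""
    while label in hypernyms:
        if label == "person":
            return True
        label = hypernyms[label]
    return label == "person"

# Keep only the person-related candidate labels.
print([t for t in ["couple", "beach", "fisherman"] if is_person_related(t)])
# ['couple', 'fisherman']
```

This is why "couple" survives the filter in the example above: its hypernym chain passes through "person", while "beach" climbs to "geological_formation" and is discarded.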
Step S130: input the image information into a single-label classification model to obtain keyword candidate tags of the second attribute of the target image.
Here, the single-label classification model refers to the model obtained by taking the above-trained multi-label classification model as a pre-trained model and fine-tuning a single-label model on the ImageNet-ILSVRC2012 data set (which has 1000 labels); after a picture I is input into this model, the output is the most likely top-k labels corresponding to picture I [formula image: the set of top-k predicted labels].
Step S140: map the keyword tags of the first attribute with the candidate keyword tags of the second attribute to obtain image keyword tag information.
In an embodiment of this application, the keyword tags of the first attribute and the candidate tags of the second attribute of the image are mapped to keywords using the Shixue Hanying (《诗学含英》) dictionary. The dictionary consists of 1016 topic words and their corresponding ancient-poem words: on each space-separated line, the first word is the topic word, generally expressed in modern Chinese, and the remaining words on the line are words from ancient poems corresponding to that topic. This application therefore applies a pre-trained word2vec model to embed both the merged predicted labels [formula image: the set of merged predicted labels] and the topic words [formula image: the set of topic words], and then computes the word similarity between each predicted label and each topic word:

[formula image: the word-similarity function over the two embedding vectors]

where [formula image: the two embedding vectors] are the word embedding vectors of the label and the topic word, respectively, and d denotes the vector dimension. With the similarity threshold set to δ = 0.6, keywords are selected in this application as follows:
When there are t (t ≥ 1) similarities satisfying similarity ≥ δ, the topic word with the maximum similarity is selected, and one word is randomly chosen from its corresponding ancient-poem words and mapped as the keyword.
For the small number of labels whose similarity is below the threshold (similarity < δ), the mapping is performed manually according to the Shixue Hanying dictionary.
Finally, K1 keywords are selected from the mapped keyword set as the keyword tags of the image information and input into the poetry generation model for poem composition.
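A minimal sketch of the mapping step, assuming cosine similarity as the word-similarity measure and using invented 3-dimensional embeddings and a two-topic stand-in for the Shixue Hanying dictionary (the actual system uses pre-trained word2vec vectors, 1016 topics, and the similarity function shown in the formula above):

```python
import random
import numpy as np

random.seed(0)

# Hypothetical embeddings (the patent uses pre-trained word2vec vectors)
# and a miniature topic dictionary standing in for Shixue Hanying.
EMB = {
    "couple": np.array([0.9, 0.1, 0.0]),
    "love":   np.array([0.8, 0.2, 0.1]),
    "moon":   np.array([0.0, 0.9, 0.3]),
}
TOPICS = {"love": ["相思", "鸳鸯"], "moon": ["明月", "婵娟"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def map_label(label, delta=0.6):
    """Map a predicted label to an ancient-poem word via its most similar topic word."""
    sims = {t: cosine(EMB[label], EMB[t]) for t in TOPICS}
    best = max(sims, key=sims.get)
    if sims[best] >= delta:
        return random.choice(TOPICS[best])  # random poem word of the best topic
    return None  # below threshold: would require manual mapping

print(map_label("couple"))  # one of the poem words of topic "love"
```

For "couple" the most similar topic is "love" (cosine well above δ = 0.6), so one of that topic's poem words is returned; a label whose best similarity falls below δ would return None and be mapped by hand.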
For generating keywords from images, this application merges the output labels of the single-label classification model and the multi-label classification model, and then obtains the final keywords through the mapping relations of the Shixue Hanying dictionary. Keywords generated in this way correspond to the picture and are associated as closely as possible with the picture content and the semantics represented by the objects in the picture. Since the poem generation model of this solution is directly conditioned on the input keywords, the accuracy of the keywords also ensures the association between the generated poem and the image.
Step S150: input the image keyword tag information into the poetry generation model and generate poem content based on the model memory matrix.
The poetry generation model generates classical poems by feeding the image information into a sequence-to-sequence structure based on the model memory matrix.
The model memory matrix includes a keyword memory matrix M1 ∈ R^(K1×dm), a current memory matrix M2 ∈ R^(K2×dm), and a historical memory matrix M3 ∈ R^(K3×dm). The keyword memory matrix stores the image keyword tag information, while the current memory matrix and the historical memory matrix store information about verses that have already been generated. Each row of the model memory matrix represents a memory segment, dm denotes the size of a memory segment, and K2 and K3 denote the segment lengths of the current memory and the historical memory, respectively. The memory of the entire model memory matrix is the concatenation of the three memories, M = [M1; M2; M3] ∈ R^(K×dm), where [;] denotes matrix concatenation and K = K1 + K2 + K3.
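The layout of the model memory matrix can be sketched as follows; the segment lengths K1, K2, K3 and the segment size dm are hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical sizes: K1 keyword slots, K2 current-memory slots,
# K3 history slots, each memory segment of size d_m.
K1, K2, K3, d_m = 4, 7, 7, 512

M1 = np.zeros((K1, d_m))  # keyword memory: fixed for the whole poem
M2 = np.zeros((K2, d_m))  # current memory: states of the previous line
M3 = np.zeros((K3, d_m))  # history memory: salient states of earlier lines

M = np.concatenate([M1, M2, M3], axis=0)  # M = [M1; M2; M3]
assert M.shape == (K1 + K2 + K3, d_m)     # K = K1 + K2 + K3
print(M.shape)  # (18, 512)
```

Row-wise concatenation matches the [;] notation in the text: each of the K = K1 + K2 + K3 rows of M is one memory segment of size dm.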
During model learning and generation, the keyword memory matrix M1 is composed of the hidden-layer state information of all the keywords [formula image: the set of keywords] and remains unchanged throughout the poem generation process. The model consults the keyword memory when generating each line of verse: a memory reading function determines the probability of each memory segment being selected, and this probability is used as the weight in the weighted sum of memory segments when computing the memory information finally read out. Before the i-th verse line Li is generated, the hidden-layer state corresponding to each character of the previous line Li-1 is written into the current memory matrix M2. In classical Chinese poetry, adjacent lines tend to have strong semantic associations, so this application saves the information of Li-1 into the current memory to provide complete recent memory information. Here, Li is the information of the generated i-th verse line, and i is an integer greater than 2.
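A minimal sketch of writing the previous line's states into the current memory, assuming the hidden states of line Li-1 are given as an array of per-character vectors and the current memory has a fixed segment length K2 (the zero-padding/truncation policy is an assumption for illustration, not stated in the patent):

```python
import numpy as np

K2, d_m = 7, 8  # hypothetical current-memory length and segment size

def write_current_memory(prev_line_states, K2=K2, d_m=d_m):
    """Overwrite the current memory M2 with the hidden states of line L_{i-1},
    zero-padding (or truncating) to the fixed segment length K2."""
    M2 = np.zeros((K2, d_m))
    n = min(len(prev_line_states), K2)
    M2[:n] = prev_line_states[:n]
    return M2

# Hypothetical hidden states for a 5-character previous line.
states = np.random.default_rng(0).normal(size=(5, d_m))
M2 = write_current_memory(states)
print(M2.shape, np.count_nonzero(M2[5:]))  # (7, 8) 0
```

Rewriting M2 in full before each new line reflects the text above: the current memory always holds exactly the states of the immediately preceding line.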
Unlike the other two parts, when saving historical memory information, the model selects some salient model-state information from the historical lines L_{1:i-2} to write; in this way the historical memory matrix M_3 retains long-range historical memory information.
In addition, constructing the model memory matrix may specifically include, but is not limited to, the following steps:
Step S1501: Embed the image keyword tags into the keyword matrix of the memory entity as keyword tag information.
When the image information is processed by the image classification model, the multi-label model expands the predicted labels with person-related words, and these expanded words carry rich semantic meaning; the final output keywords are therefore not only relevant to the image but also carry the semantic characteristics of the image representation. Exploiting this property, this application stores each keyword embedding learned by the poetry model independently in the model's memory, so that within the sequence-to-sequence structure the model can, during decoding, select the key parts of the keyword information according to the model state and global information, thereby ensuring the relevance between the image keyword information fed into the poetry model and the image.
Step S1502: Read, from the keyword matrix through the memory reading function, the keyword tag information associated with generating the i-th line of verse. This step may be implemented as follows:
a. Based on the attention mechanism, the memory reading function α_r = A_r(M, query) is used to determine the probability that each memory segment M[k,:] is selected:
z_k = v^T σ(M[k,:], q),
α_r[k,:] = softmax(z_k),
where query denotes the current state information of the model; in the formula z_k = v^T σ(M[k,:], q), z_k is a relevance variable computed from the memory segment M and the model's current state query, and v^T denotes model parameters; in the formula α_r[k,:] = softmax(z_k), α_r[k,:] denotes the probability that memory segment M[k,:] is selected.
b. Next, the weighted average of the memory segments, with each segment weighted by its selection probability, is computed to obtain the vector o_t of memory information read from the model memory matrix, which determines whether the character generated at time t of line L_i relates to the input image keywords or continues the content of the historical lines:
α_r = A_r(M, [s_{t-1}; v_{i-1}]),
o_t = Σ_k α_r[k,:] × M[k,:],
In the formula α_r = A_r(M, [s_{t-1}; v_{i-1}]), the query vector is formed by concatenating the decoder's hidden-layer vector s_{t-1} with the global tracking vector v_{i-1}; v_{i-1} is used here to prevent the model from reading redundant content.
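A minimal numpy sketch of this attention-style memory read (the scoring function σ is assumed here to be a tanh layer over the concatenated segment and query, and all weights and sizes are random placeholders, not the application's parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d_m, d_q = 17, 8, 12          # hypothetical memory and query sizes
M = rng.normal(size=(K, d_m))    # model memory matrix
query = rng.normal(size=d_q)     # stand-in for [s_{t-1}; v_{i-1}]

# z_k = v^T sigma(M[k,:], q): score each memory segment against the query
W = rng.normal(size=(d_m + d_q, d_m))
v = rng.normal(size=d_m)
z = np.array([v @ np.tanh(np.concatenate([M[k], query]) @ W) for k in range(K)])

# alpha_r = softmax(z): probability that each segment is selected
alpha_r = np.exp(z - z.max())
alpha_r /= alpha_r.sum()

# o_t: selection-probability-weighted sum of the memory segments (the read vector)
o_t = alpha_r @ M
print(o_t.shape)  # (8,)
```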
Step S1503: Fill the hidden-state information corresponding to the character information of line i-1 of the poem into the current memory matrix as the current memory information.
Before the i-th line L_i is generated, the hidden-layer state information corresponding to each character of the previous line L_{i-1} is written into the current memory matrix M_2. In classical Chinese poetry, adjacent lines tend to have a strong semantic connection, so this application saves the information of L_{i-1} in the current memory to provide complete recent-memory information. The current memory information is the model's input at the current time step and represents the semantic features of that input; during text generation, this application trains the model to select among these semantic features and thereby learn the grammar and prosody of the classical-poetry corpus.
Step S1504: Compute, through the memory writing function, the hidden-state information corresponding to the character information of line i-2 of the poem, and fill the computed information into the historical memory matrix.
This step may be implemented as follows:
When saving historical memory information, the model selects some salient model-state information from the historical lines L_{1:i-2} for writing; in this way the historical memory matrix M_3 retains long-range historical memory information.
After line L_i has been generated and before line L_{i+1} is generated, each character of the historical line L_{i-1} has a corresponding hidden state h_t after passing through the encoder; the memory writing function computes, for each such state, a historical memory slot, and the state h_t is then filled into the historical memory matrix:
α_w = A_w(M_3, [h_t; v_{i-1}]),
β = tanh(γ × (α_w - 1 × max(α_w))) + 1,
M_3[k,:] := (1 - β[k,:]) × M_3[k,:] + β[k,:] × h_t,
where, in the formula α_w = A_w(M_3, [h_t; v_{i-1}]), the function A_w is computed in the same way as in the formula α_r = A_r(M, query), and α_w denotes the probability of writing the hidden state h_t into each memory segment;
in the formula β = tanh(γ × (α_w - 1 × max(α_w))) + 1, 1 denotes a vector whose elements are all 1, and γ denotes an empirically chosen positive threshold. The formula is differentiable, and through it the hidden state h_t can be filled into the historical memory slots that have a higher write probability.
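The differentiable write step can be sketched as follows (sizes, γ, and the stand-in write probabilities α_w are illustrative assumptions; note that β equals exactly 1 at the slot where α_w attains its maximum, so that slot is overwritten with h_t, while slots with low write probability are left nearly unchanged):

```python
import numpy as np

rng = np.random.default_rng(1)
K3, d_m = 6, 8                    # hypothetical sizes
M3 = rng.normal(size=(K3, d_m))   # historical memory matrix
h_t = rng.normal(size=d_m)        # encoder hidden state to be written

alpha_w = rng.random(K3)          # stand-in for A_w(M3, [h_t; v_{i-1}])
alpha_w /= alpha_w.sum()
gamma = 10.0                      # empirically chosen positive threshold

# beta = tanh(gamma * (alpha_w - 1 * max(alpha_w))) + 1, with 1 the all-ones vector
beta = np.tanh(gamma * (alpha_w - np.ones(K3) * alpha_w.max())) + 1

# M3[k,:] := (1 - beta[k]) * M3[k,:] + beta[k] * h_t
M3 = (1 - beta)[:, None] * M3 + beta[:, None] * h_t
```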
Step S1505: Read the keyword matrix and the current memory matrix and historical memory matrix information of line L_i, and generate line L_i through the sequence-to-sequence structure.
The i-th line of verse is generated based on the information L_{1:i-1} of the preceding i-1 lines and the image keyword information
Figure PCTCN2020092724-appb-000019
so that a poem is generated line by line, yielding verses strongly associated with the picture, with strong correlation between the lines.
For generating classical poems from keywords, this application adopts a sequence-to-sequence neural network model with a memory mechanism. The model's memory consists of three segments: keyword memory information, historical memory information, and current memory information. The characteristics of the model memory are described as follows:
Because the multi-label model in the first stage expands the predicted labels with person-related words, and these expanded words carry rich semantic meaning, the final output keywords are not only relevant to the image but also carry the semantic characteristics of the image representation. Exploiting this property, this application stores each keyword embedding learned by the model independently in the model's memory, so that during decoding the model can select the key parts of the keyword information according to the model state and global information; in this way the key information in the image is also used indirectly.
At the same time, by dynamically reading and writing the model's historical memory, the model memory retains only limited historical information. This differs from existing methods, which retain all state information produced during generation. This approach requires the model to be trained to focus on information closely related to poem generation while filtering out and ignoring interfering information produced during generation, further ensuring the coherence between the generated lines.
Finally, the current memory information is the model's input at the current time step and represents the semantic features of that input; during text generation, this solution trains the model to select among these semantic features and thereby learn the grammar and prosody of the classical-poetry corpus.
During generation, the model must selectively read the memory information and selectively update it. This also differs from existing patented methods, which feed only the keyword information and the preceding lines into the model decoder. The key steps are as follows:
When generating a line of verse, the poem-generation model of this solution progressively learns to read from memory the keyword information and memory information most relevant to that line, to guide the generation of the current line. This reading is an implementation of an attention mechanism: through training, the model learns to find, when generating each character of the verse, the information in memory that most deserves attention, for example choosing whether to generate a character associated with the keywords or a character semantically coherent with the historical lines.
After generating a line of verse, the model is trained to pick out the parts of its current memory state that contributed most to generating that line and to write them into the model's historical memory, completing the update of the model's overall memory. This writing is likewise an implementation of an attention mechanism: through dynamic reading and writing of the historical memory, the model is trained to focus on information relevant to verse generation and to ignore interfering information produced during generation, thereby ensuring coherence between the lines.
After the model memory matrix is constructed, the image information of the target image is input into the poem-generation model, and the entire poem associated with the image is generated according to the poem-generation method of the model memory matrix. This step may be implemented as follows:
In this application, each classical poem is generated with a sequence-to-sequence structure consisting of an encoder and a decoder. A bidirectional gated recurrent unit (GRU) is used as the model's encoder, and a unidirectional GRU as the model's decoder. The encoder input X is a line of verse L_{i-1}, X = (x_1, x_2, ..., x_{L_enc}), where L_enc denotes the maximum input length allowed by the encoder; the decoder output Y is likewise a line of verse L_i, Y = (y_1, y_2, ..., y_{L_dec}), where L_dec denotes the maximum output length allowed by the decoder.
h_t and s_t denote the hidden-layer states of the encoder and the decoder, respectively, and e(y_t) denotes the word-embedding vector of the character y_t output by the decoder at time t. The model computes the probability distribution over the character generated at time t of line L_i:
s_t = GRU(s_{t-1}, [e(y_{t-1}); o_t; p_t; v_{i-1}])
Figure PCTCN2020092724-appb-000020
In the formula s_t = GRU(s_{t-1}, [e(y_{t-1}); o_t; p_t; v_{i-1}]), o_t denotes the output vector read from the model memory matrix, and p_t is the concatenation vector of the prosody embedding and the line-length embedding learned by the model during training.
In the formula
Figure PCTCN2020092724-appb-000021
W denotes the model parameters to be trained. The formula indicates that, when learning to generate line L_i, the model must refer to the information y_{1:t-1} already generated in that line, the information L_{1:i-1} already generated in the preceding lines, and the keyword information
Figure PCTCN2020092724-appb-000022
v_{i-1} denotes the global tracking vector, whose initial value is an all-zero vector; it records the content information generated up to the current line L_{i-1} and provides global information to the model when generating the next line L_i.
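One decoder step can be sketched as follows (a from-scratch numpy GRU cell with random weights; the sizes, the composition of the input vector, and the softmax over a linear map of s_t as the output distribution are illustrative assumptions, not the application's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(2)
d_s, d_in, V = 16, 32, 50  # hidden size, input size, vocab size (hypothetical)

def gru_cell(s_prev, x, p):
    """One GRU step: update gate z, reset gate r, candidate state."""
    Wz, Wr, Wh = p
    zx = np.concatenate([s_prev, x])
    z = 1 / (1 + np.exp(-(Wz @ zx)))                  # update gate
    r = 1 / (1 + np.exp(-(Wr @ zx)))                  # reset gate
    h = np.tanh(Wh @ np.concatenate([r * s_prev, x])) # candidate state
    return (1 - z) * s_prev + z * h

params = tuple(rng.normal(scale=0.1, size=(d_s, d_s + d_in)) for _ in range(3))
W_out = rng.normal(scale=0.1, size=(V, d_s))

# input at step t: stand-in for the concatenation [e(y_{t-1}); o_t; p_t; v_{i-1}]
x_t = rng.normal(size=d_in)
s_t = gru_cell(np.zeros(d_s), x_t, params)

# distribution over the next character, assumed here to be softmax(W_out @ s_t)
logits = W_out @ s_t
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(round(probs.sum(), 6))  # 1.0
```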
Finally, after the generation of line L_i is finished, the global tracking vector needs to be updated using an ordinary recurrent neural network (RNN):
Figure PCTCN2020092724-appb-000023
In the formula
Figure PCTCN2020092724-appb-000024
σ denotes the nonlinear layer mapping function.
The above model memory matrix is trained so as to maximize the log likelihood of the training data, and the entire poem associated with the image is output from the sequence-to-sequence structure:
Figure PCTCN2020092724-appb-000025
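As an illustration of this training objective, maximizing the log likelihood of the training data is equivalent to minimizing the summed negative log probability that the model assigns to the gold characters (the per-step distributions below are made-up numbers, not model outputs):

```python
import numpy as np

# per-step model distributions over a 5-character vocabulary (illustrative)
probs = np.array([
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.2, 0.6, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.05, 0.05],
])
targets = [0, 1, 2]  # gold character indices y_1..y_3

log_likelihood = sum(np.log(probs[t, y]) for t, y in enumerate(targets))
loss = -log_likelihood  # training minimizes this negative log likelihood
print(round(loss, 4))  # 1.2242
```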
This application adopts a sequence-to-sequence neural network model with a memory mechanism. The model is trained to attend effectively to, and select, the important information in the keywords and the key information in the historically generated lines, while filtering out and ignoring the interfering parts of the information. When generating a line of verse, the memory information is read to find what the model most needs to attend to when generating a single character, such as emphasizing the association with the keywords or coherence with the historical lines, thereby guiding the generation of each character. After a line has been generated, the historical memory information is updated to ensure that the model writes the state information relevant to verse generation into memory. By dynamically reading and writing the model memory, the relevance of the generated poem to the keywords, and the coherence between the lines, are ultimately guaranteed.
It should be emphasized that, to further ensure the privacy and security of the above multi-label classification model, single-label model, and poem-generation model, these models may also be stored in a node of a blockchain. Blockchain storage enables data to be shared between different platforms and prevents the data from being tampered with.
A blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
Fig. 4 shows an image-based auxiliary writing apparatus 300 according to another exemplary embodiment, including:
an image information acquisition module 310, configured to acquire image information of a target image;
a first-attribute keyword module 320, configured to input the image information into the multi-label classification model to obtain keyword tags of a first attribute of the image;
a second-attribute keyword module 330, configured to further input the image information into the single-label classification model to obtain candidate tags of a second attribute of the image;
a keyword mapping module 340, configured to map the keyword tags of the first attribute to the candidate keyword tags of the second attribute to obtain image keyword tag information; and
a poem generation module 350, configured to read the keyword matrix and the current memory matrix and historical memory matrix information of the i-th line, and to generate the i-th line of verse through the sequence-to-sequence structure.
The multi-label classification model extracts person-related keyword tag information from the image, and the multi-label model expands the predicted labels with person-related words; because these expanded words carry rich semantic meaning, the final output keywords are not only relevant to the image but also carry the semantic characteristics of the image representation. This guarantees the relevance between the extracted person keyword information and the image information. The image-attribute keyword tag information obtained from the image classification models is then mapped to select the image keywords that have a definite relevance to the image, ensuring that the image-attribute keyword parameters initially fed into the poetry model are strongly relevant.
In addition, in the model memory matrix, the historical memory is read and written dynamically so that the model memory retains only limited historical information, which differs from existing methods that retain all state information produced during generation. In this approach, the model is trained to focus on information closely related to poem generation while filtering out and ignoring interfering information, further ensuring coherence between the generated lines. The user's needs can thus be met: the generated verses are strongly associated with the target image, and the coherence between lines is enhanced.
An electronic device 400 according to this embodiment of the present application is described with reference to Fig. 5. The electronic device 400 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 5, the electronic device 400 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 410, the at least one storage unit 420, and a bus 430 connecting the different system components (including the storage unit 420 and the processing unit 410).
The storage unit stores program code executable by the processing unit 410, so that the processing unit 410 performs the steps of the various exemplary embodiments of the present application described in the "Exemplary Method" section of this specification. For example, the processing unit 410 may perform step S110 shown in Fig. 1: acquiring image information of a target image (the image information may include various kinds of feature information, such as people, landscapes, or animals); step S120: inputting the image information into the multi-label model to obtain keyword tags of a first attribute of the target image; step S130: inputting the image information into the single-label classification model to obtain keyword candidate tags of a second attribute of the target image; step S140: mapping the keyword tags of the first attribute to the candidate keyword tags of the second attribute to obtain image keyword tag information; and step S150: inputting the image keyword tag information into the poem-generation model and generating poem content based on the model memory matrix.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 4201 and/or a cache 4202, and may further include a read-only memory (ROM) 4203.
The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205, including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage-unit bus or storage-unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 400 may also communicate with one or more external devices 600 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router or modem) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 450. The electronic device 400 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 460. As shown in the figure, the network adapter 460 communicates with the other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实 施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、终端装置、或者网络设备等)执行根据本申请实施方式的方法。Through the description of the above embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or can be implemented by combining software with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network , Including several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present application.
在本申请的示例性实施例中,还提供了一种计算机可读存储介质,其上存储有能够实现本说明书上述方法的程序产品。其中,该计算机可读存储介质可以是非易失性,也可以是易失性。在一些可能的实施方式中,本申请的各个方面还可以实现为一种程序产品的形式,其包括程序代码,当所述程序产品在终端设备上运行时,所述程序代码用于使所述终端设备执行本说明书上述“示例性方法”部分中描述的根据本申请各种示例性实施方式的步骤。In the exemplary embodiment of the present application, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. Wherein, the computer-readable storage medium may be non-volatile or volatile. In some possible implementation manners, each aspect of the present application can also be implemented in the form of a program product, which includes program code. When the program product runs on a terminal device, the program code is used to make the The terminal device executes the steps according to various exemplary embodiments of the present application described in the above-mentioned "Exemplary Method" section of this specification.
参考图6所示,描述了根据本申请的实施方式的用于实现上述方法的程序产品500,其可以采用便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在终端设备,例如个人电脑上运行。然而,本申请的程序产品不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。Referring to FIG. 6, a program product 500 for implementing the above-mentioned method according to an embodiment of the present application is described. It can adopt a portable compact disk read-only memory (CD-ROM) and include program code, and can be installed in a terminal device, For example, running on a personal computer. However, the program product of this application is not limited to this. In this document, the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
所述程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以为但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。The program product can use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable Type programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。The computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、有线、光缆、RF等等,或者上述的任意合适的组合。The program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
可以以一种或多种程序设计语言的任意组合来编写用于执行本申请操作的程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。The program code used to perform the operations of the present application can be written in any combination of one or more programming languages. The programming languages include object-oriented programming languages—such as Java, C++, etc., as well as conventional procedural programming languages. Programming language-such as "C" language or similar programming language. The program code can be executed entirely on the user's computing device, partly on the user's device, executed as an independent software package, partly on the user's computing device and partly executed on the remote computing device, or entirely on the remote computing device or server Executed on. In the case of a remote computing device, the remote computing device can be connected to a user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, using Internet service providers). Business to connect via the Internet).
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链（Blockchain），本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性（防伪）和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain can include a blockchain underlying platform, a platform product service layer, and an application service layer.
此外,上述附图仅是根据本申请示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。In addition, the above-mentioned drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present application, and are not intended for limitation. It is easy to understand that the processing shown in the above drawings does not indicate or limit the time sequence of these processings. In addition, it is easy to understand that these processes can be executed synchronously or asynchronously in multiple modules, for example.
应当理解的是，本申请并不局限于上面已经描述并在附图中示出的精确结构，并且可以在不脱离其范围的情况下执行各种修改和改变。本申请的范围仅由所附的权利要求来限制。It should be understood that the present application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (20)

  1. 一种基于图像的辅助写作方法,其中,包括:An image-based auxiliary writing method, which includes:
    获取目标图像的图像信息;Acquiring image information of the target image;
    将所述图像信息输入多标签分类模型中，得到所述图像的第一属性的关键词标签；Inputting the image information into a multi-label classification model to obtain a keyword tag of the first attribute of the image;
    将所述图像信息输入单标签分类模型中,得到所述图像的第二属性的候选关键词标签;所述第二属性与所述第一属性为所述目标图像中不同信息的特征;Inputting the image information into a single-label classification model to obtain candidate keyword tags of the second attribute of the image; the second attribute and the first attribute are features of different information in the target image;
    将所述第一属性的关键词标签与所述第二属性的候选关键词标签进行映射,得到图像关键词标签信息;Mapping the keyword tag of the first attribute with the candidate keyword tag of the second attribute to obtain image keyword tag information;
    将所述图像关键词标签信息输入至诗歌生成模型中，基于模型记忆矩阵生成诗词内容，所述模型记忆矩阵包括关键词矩阵、当前记忆矩阵及历史记忆矩阵；所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息。Inputting the image keyword tag information into a poetry generation model, and generating poem content based on a model memory matrix, where the model memory matrix includes a keyword matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses.
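Read as a pipeline, the four steps of this claim can be sketched as follows; the two classifiers and the mapping step are stubs with invented names and outputs, standing in for the trained models described in the claim, not the patent's actual implementation.

```python
# Illustrative sketch of the claimed pipeline (claim 1). All names and the
# stub outputs below are hypothetical, not the patent's actual models.
def multi_label_classify(image):
    return ["mountain", "river"]        # first-attribute keyword tags (stub)

def single_label_classify(image):
    return ["landscape"]                # second-attribute candidate tags (stub)

def map_tags(keyword_tags, candidate_tags):
    # stand-in for the word2vec-based mapping step of claim 4
    return sorted(set(keyword_tags + candidate_tags))

def image_keywords(image):
    return map_tags(multi_label_classify(image), single_label_classify(image))

print(image_keywords(None))  # these keywords would then seed the poem model
```

In a real system the returned keywords would be written into the keyword memory matrix of the poetry generation model rather than printed.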
  2. 根据权利要求1所述的方法，其中，所述模型记忆矩阵存储于所述诗歌生成模型中，所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息，包括：The method according to claim 1, wherein the model memory matrix is stored in the poetry generation model, the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses, including:
    将所述图像关键词标签嵌入所述模型记忆矩阵的关键词矩阵中作为关键词标签信息;Embedding the image keyword tags into the keyword matrix of the model memory matrix as keyword tag information;
    从所述关键词矩阵中通过记忆读取函数读取与生成第i行诗句信息关联的关键词标签信息;Read the keyword tag information associated with the generation of the ith line verse information from the keyword matrix through a memory reading function;
    将第i-1行诗句中字符信息对应的隐藏状态信息填充到所述模型记忆矩阵的当前记忆矩阵中作为当前记忆信息;Filling the hidden state information corresponding to the character information in the i-1th line of the poem into the current memory matrix of the model memory matrix as the current memory information;
    将第i-2行诗句中字符信息对应的隐藏状态信息通过记忆写入函数计算后，将计算后的信息填充到所述模型记忆矩阵的历史记忆矩阵中；Computing the hidden state information corresponding to the character information in the (i-2)-th line of the poem with the memory write function, and filling the computed information into the historical memory matrix of the model memory matrix;
    读取所述关键词矩阵以及所述第i行诗句的当前记忆矩阵以及历史记忆矩阵信息通过诗歌生成模型生成所述第i行诗句。Read the keyword matrix and the current memory matrix and historical memory matrix information of the i-th line of verses to generate the i-th line of verses through a poetry generation model.
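The per-line memory bookkeeping of this claim can be sketched as a loop, assuming an unbounded history and a caller-supplied generator in place of the encoder-decoder; every name here is illustrative.

```python
# Hedged sketch of claim 2's bookkeeping: when line i is generated, the
# "current" memory holds line i-1 and the "history" memory holds lines up
# to i-2. gen_line is a stub for the actual memory-augmented model.
def generate_poem_lines(keywords, n_lines, gen_line):
    current, history = "", []
    lines = []
    for i in range(n_lines):
        lines.append(gen_line(keywords, current, history))
        if current:                  # the previous line moves into history
            history.append(current)
        current = lines[-1]          # the new line becomes the current memory
    return lines
```

In the patent the "lines" stored would be per-character hidden states rather than raw text, but the rotation of current and historical memory is the same.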
  3. 根据权利要求1所述的方法,其中,还包括:The method according to claim 1, further comprising:
    所述多标签分类模型是基于多标签数据集ML-Images进行多标签图像分类任务的模型训练得到的;The multi-label classification model is obtained by model training for multi-label image classification tasks based on the multi-label data set ML-Images;
    所述单标签模型是基于ImageNet-ILSVRC2012数据集在所述多标签分类模型上进行单标签模型的微调训练得到的。The single-label model is obtained by fine-tuning the single-label model on the multi-label classification model based on the ImageNet-ILSVRC2012 data set.
  4. 根据权利要求1所述的方法,其中,将所述第一属性关键词标签与所述第二属性候选关键词标签进行映射,得到图像关键词标签信息,包括:The method according to claim 1, wherein mapping the first attribute keyword tag and the second attribute candidate keyword tag to obtain image keyword tag information includes:
    将所述第一属性的关键词标签和所述第二属性的候选标签输入预训练的word2vec模型进行词嵌入，计算每个所述第一属性的关键词标签和每个所述第二属性的候选标签的词语相似度；Inputting the keyword tags of the first attribute and the candidate tags of the second attribute into a pre-trained word2vec model for word embedding, and calculating the word similarity between each keyword tag of the first attribute and each candidate tag of the second attribute;
    当存在t个词语相似度大于等于预定阈值时，选择最大的词语相似度对应的关键词标签及候选标签作为图像主题词，并从所述图像主题词对应的古诗词语中随机选择一个词语映射为映射关键词标签；When there are t word similarities greater than or equal to a predetermined threshold, selecting the keyword tag and candidate tag corresponding to the largest word similarity as the image topic word, and randomly selecting one word from the classical-poetry words corresponding to the image topic word as the mapped keyword tag;
    当存在t个词语相似度小于预定阈值时,根据预设关键词词典,对t个所述关键词标签及候选标签进行映射,并选择出一个词语映射为映射关键词标签;When there are t words with similarity less than a predetermined threshold, map the t keyword tags and candidate tags according to the preset keyword dictionary, and select one word to map as the mapped keyword tag;
    基于所述映射关键词标签,从所述映射关键词标签中选择多个所述映射关键词标签作为输入至模型记忆矩阵中的图像关键词标签信息。Based on the mapped keyword tags, a plurality of the mapped keyword tags are selected from the mapped keyword tags as image keyword tag information input into the model memory matrix.
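A minimal sketch of the similarity-and-mapping step above, using toy two-dimensional vectors in place of pre-trained word2vec embeddings; the threshold, dictionaries, and vectors are invented for illustration and the fallback picks a dictionary entry for the best-scoring tag rather than reproducing the patent's exact selection rule.

```python
# Toy version of claim 4: score every (keyword tag, candidate tag) pair by
# cosine similarity, then map via the poetry-word dictionary if the best
# score clears the threshold, otherwise via the preset fallback dictionary.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def map_keywords(tag_vecs, cand_vecs, threshold, poem_dict, fallback_dict):
    scored = [(cosine(tv, cv), t, c)
              for t, tv in tag_vecs.items()
              for c, cv in cand_vecs.items()]
    best_sim, best_tag, _ = max(scored)
    if best_sim >= threshold:
        return poem_dict[best_tag]      # classical-poetry word for the topic
    return fallback_dict[best_tag]      # preset keyword-dictionary mapping

print(map_keywords({"mountain": [1.0, 0.1]}, {"peak": [0.9, 0.2]},
                   0.8, {"mountain": "青山"}, {"mountain": "山"}))
```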
  5. 根据权利要求2所述的方法，其中，从所述关键词矩阵中通过记忆读取函数读取与生成第i行诗句信息关联的关键词标签信息包括：The method according to claim 2, wherein reading, from the keyword matrix through a memory reading function, the keyword tag information associated with generating the i-th line of verse includes:
    根据记忆读取函数，确定每个记忆片段被选中的概率，并以该概率对各记忆片段加权平均，得到从所述模型记忆矩阵中读取的记忆信息向量，从而生成第i行诗句的第t时刻字符。According to the memory reading function, determining the probability of each memory segment being selected and taking the probability-weighted average of the segments to obtain the vector of memory information read from the model memory matrix, so as to generate the character at time t of the i-th line of verse.
  6. 根据权利要求1所述的方法,其中,所述将第i-2行诗句中字符信息对应的隐藏状态信息通过记忆写入函数计算后将计算后的信息填充到历史记忆矩阵中,包括:The method according to claim 1, wherein said calculating the hidden state information corresponding to the character information in the i-2th line of the poem through the memory write function and then filling the calculated information into the historical memory matrix comprises:
    在第i-1行诗句生成后并且在第i行诗句生成前，将历史时刻的第i-2行诗句的每个字符经过编码器后对应的隐藏状态，通过记忆写入函数计算后为其在所述历史记忆矩阵中选择一个记忆片段，然后将所述隐藏状态填入所述历史记忆矩阵中。After the (i-1)-th line of the poem is generated and before the i-th line is generated, the hidden state corresponding to each character of the (i-2)-th line after passing through the encoder is processed by the memory write function, a memory segment in the historical memory matrix is selected for it, and the hidden state is then filled into the historical memory matrix.
  7. 根据权利要求1所述的方法，其中，所述模型记忆矩阵包括关键词记忆矩阵M_1∈R^(K_1×d_m)、当前记忆矩阵M_2∈R^(K_2×d_m)以及历史记忆矩阵M_3∈R^(K_3×d_m)；The method according to claim 1, wherein the model memory matrix includes a keyword memory matrix M_1 ∈ R^(K_1×d_m), a current memory matrix M_2 ∈ R^(K_2×d_m), and a historical memory matrix M_3 ∈ R^(K_3×d_m);
    所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和历史记忆矩阵用于存储所述已生成的诗句的信息；The keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store the information of the generated verses;
    所述模型记忆矩阵的每一行表示一个记忆片段，d_m表示记忆片段的尺寸，K_2和K_3分别表示当前记忆和历史记忆的片段长度；所述模型记忆矩阵的记忆表示为三段记忆的拼接M=[M_1; M_2; M_3]，M∈R^(K×d_m)，其中[M_1; M_2; M_3]表示矩阵拼接，且K=K_1+K_2+K_3。Each row of the model memory matrix represents a memory segment, d_m denotes the size of a memory segment, and K_2 and K_3 denote the segment lengths of the current memory and the historical memory, respectively; the memory of the model memory matrix is expressed as the concatenation of the three memories, M = [M_1; M_2; M_3], M ∈ R^(K×d_m), where [M_1; M_2; M_3] denotes matrix concatenation and K = K_1 + K_2 + K_3.
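The three-segment concatenation M = [M_1; M_2; M_3] can be illustrated with NumPy; the sizes below are arbitrary examples, not values from the patent.

```python
# Row-wise concatenation of the three memory segments of claim 7.
import numpy as np

d_m, K1, K2, K3 = 8, 4, 2, 6                # example sizes only
M1 = np.zeros((K1, d_m))                    # keyword memory
M2 = np.zeros((K2, d_m))                    # current memory
M3 = np.zeros((K3, d_m))                    # history memory
M = np.concatenate([M1, M2, M3], axis=0)    # [M1; M2; M3]
print(M.shape)                              # (K1 + K2 + K3, d_m)
```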
  8. 根据权利要求5所述的方法，其中，所述记忆读取函数为：α_r = A_r(M, query)，所述每个记忆片段被选择的概率为α_r[k,:] = softmax(z_k)；The method according to claim 5, wherein the memory reading function is α_r = A_r(M, query), and the probability of each memory segment being selected is α_r[k,:] = softmax(z_k);
    其中，α_r表示所述记忆读取函数，query表示所述诗歌生成模型的状态信息，z_k表示根据每个记忆片段M[k,:]和所述诗歌生成模型的状态信息query计算的相关性程度变量，且z_k = v^T σ(M[k,:], q)，v^T表示所述诗歌生成模型的模型参数。Where α_r denotes the memory reading function, query denotes the state information of the poetry generation model, z_k denotes the degree of relevance between each memory segment M[k,:] and the state information query of the poetry generation model, z_k = v^T σ(M[k,:], q), and v^T denotes a model parameter of the poetry generation model.
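A plain-Python sketch of the softmax-weighted memory read described in claims 5 and 8: the relevance score here is a simple dot product standing in for v^T σ(M[k,:], q), so this illustrates the mechanism rather than the patent's exact scoring function.

```python
# Memory read: softmax over per-segment relevance scores, then a
# probability-weighted average of the segments gives the read vector.
import math

def memory_read(M, query):
    # relevance z_k of each segment (dot product as a stand-in score)
    z = [sum(m * q for m, q in zip(row, query)) for row in M]
    mx = max(z)                              # shift for numerical stability
    e = [math.exp(zk - mx) for zk in z]
    s = sum(e)
    alpha = [ek / s for ek in e]             # softmax selection probabilities
    d = len(M[0])
    return [sum(alpha[k] * M[k][j] for k in range(len(M))) for j in range(d)]

M = [[1.0, 0.0], [0.0, 1.0]]   # two toy memory segments
q = [1.0, 0.0]                 # toy query (model state)
print(memory_read(M, q))
```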
  9. 根据权利要求1-8任一项所述的方法,其中,还包括:The method according to any one of claims 1-8, further comprising:
    将所述单标签分类模型,所述多标签分类模型以及所述诗歌生成模型存储至区块链中。The single-label classification model, the multi-label classification model, and the poetry generation model are stored in a blockchain.
  10. 一种基于图像的辅助写作装置,其中,包括:An image-based auxiliary writing device, which includes:
    图像信息获取模块,用于获取目标图像的信息;The image information acquisition module is used to acquire the information of the target image;
    第一属性关键词模块,用于将图像信息输入多标签分类模型中,得到所述图像的第一属性的关键词标签;The first attribute keyword module is used to input the image information into the multi-label classification model to obtain the keyword label of the first attribute of the image;
    第二属性关键词模块,用于将图像信息输入单标签分类模型中,得到所述图像的第二属性的候选标签;所述第二属性与所述第一属性为目标图像中不同信息的特征信息;The second attribute keyword module is used to input image information into a single-label classification model to obtain candidate labels of the second attribute of the image; the second attribute and the first attribute are features of different information in the target image information;
    关键词映射模块,用于将所述第一属性的关键词标签与所述第二属性的候选标签进行映射,得到图像关键词标签信息;The keyword mapping module is used to map the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information;
    古诗词生成模块，用于将所述图像关键词标签信息输入至诗歌生成模型中，基于模型记忆矩阵生成诗词内容，所述模型记忆矩阵包括关键词矩阵、当前记忆矩阵及历史记忆矩阵；所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息。The ancient poetry generation module is configured to input the image keyword tag information into a poetry generation model and generate poem content based on a model memory matrix, where the model memory matrix includes a keyword matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses.
  11. 一种电子设备,其中,包括:An electronic device, including:
    处理器;以及Processor; and
    存储器,用于存储所述处理器的可执行指令;A memory for storing executable instructions of the processor;
    其中,所述处理器配置为经由执行所述可执行指令来执行:Wherein, the processor is configured to execute by executing the executable instruction:
    获取目标图像的图像信息;Acquiring image information of the target image;
    将所述图像信息输入多标签分类模型中，得到所述图像的第一属性的关键词标签；Inputting the image information into a multi-label classification model to obtain a keyword tag of the first attribute of the image;
    将所述图像信息输入单标签分类模型中,得到所述图像的第二属性的候选关键词标签;所述第二属性与所述第一属性为所述目标图像中不同信息的特征;Inputting the image information into a single-label classification model to obtain candidate keyword tags of the second attribute of the image; the second attribute and the first attribute are features of different information in the target image;
    将所述第一属性的关键词标签与所述第二属性的候选关键词标签进行映射,得到图像关键词标签信息;Mapping the keyword tag of the first attribute with the candidate keyword tag of the second attribute to obtain image keyword tag information;
    将所述图像关键词标签信息输入至诗歌生成模型中，基于模型记忆矩阵生成诗词内容，所述模型记忆矩阵包括关键词矩阵、当前记忆矩阵及历史记忆矩阵；所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息。Inputting the image keyword tag information into a poetry generation model, and generating poem content based on a model memory matrix, where the model memory matrix includes a keyword matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses.
  12. 根据权利要求11所述的电子设备，其中，所述模型记忆矩阵存储于所述诗歌生成模型中，所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息，所述处理器配置为经由执行所述可执行指令来执行：The electronic device according to claim 11, wherein the model memory matrix is stored in the poetry generation model, the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses, and the processor is configured to execute, by executing the executable instructions:
    将所述图像关键词标签嵌入所述模型记忆矩阵的关键词矩阵中作为关键词标签信息;Embedding the image keyword tags into the keyword matrix of the model memory matrix as keyword tag information;
    从所述关键词矩阵中通过记忆读取函数读取与生成第i行诗句信息关联的关键词标签信息;Read the keyword tag information associated with the generation of the ith line verse information from the keyword matrix through a memory reading function;
    将第i-1行诗句中字符信息对应的隐藏状态信息填充到所述模型记忆矩阵的当前记忆矩阵中作为当前记忆信息;Filling the hidden state information corresponding to the character information in the i-1th line of the poem into the current memory matrix of the model memory matrix as the current memory information;
    将第i-2行诗句中字符信息对应的隐藏状态信息通过记忆写入函数计算后，将计算后的信息填充到所述模型记忆矩阵的历史记忆矩阵中；Computing the hidden state information corresponding to the character information in the (i-2)-th line of the poem with the memory write function, and filling the computed information into the historical memory matrix of the model memory matrix;
    读取所述关键词矩阵以及所述第i行诗句的当前记忆矩阵以及历史记忆矩阵信息通过诗歌生成模型生成所述第i行诗句。Read the keyword matrix and the current memory matrix and historical memory matrix information of the i-th line of verses to generate the i-th line of verses through a poetry generation model.
  13. 根据权利要求11所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 11, wherein the processor is configured to execute via execution of the executable instructions:
    所述多标签分类模型是基于多标签数据集ML-Images进行多标签图像分类任务的模型训练得到的;The multi-label classification model is obtained by model training for multi-label image classification tasks based on the multi-label data set ML-Images;
    所述单标签模型是基于ImageNet-ILSVRC2012数据集在所述多标签分类模型上进行单标签模型的微调训练得到的。The single-label model is obtained by fine-tuning the single-label model on the multi-label classification model based on the ImageNet-ILSVRC2012 data set.
  14. 根据权利要求11所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 11, wherein the processor is configured to execute via execution of the executable instructions:
    将所述第一属性的关键词标签和所述第二属性的候选标签输入预训练的word2vec模型进行词嵌入，计算每个所述第一属性的关键词标签和每个所述第二属性的候选标签的词语相似度；Inputting the keyword tags of the first attribute and the candidate tags of the second attribute into a pre-trained word2vec model for word embedding, and calculating the word similarity between each keyword tag of the first attribute and each candidate tag of the second attribute;
    当存在t个词语相似度大于等于预定阈值时，选择最大的词语相似度对应的关键词标签及候选标签作为图像主题词，并从所述图像主题词对应的古诗词语中随机选择一个词语映射为映射关键词标签；When there are t word similarities greater than or equal to a predetermined threshold, selecting the keyword tag and candidate tag corresponding to the largest word similarity as the image topic word, and randomly selecting one word from the classical-poetry words corresponding to the image topic word as the mapped keyword tag;
    当存在t个词语相似度小于预定阈值时,根据预设关键词词典,对t个所述关键词标签及候选标签进行映射,并选择出一个词语映射为映射关键词标签;When there are t words with similarity less than a predetermined threshold, map the t keyword tags and candidate tags according to the preset keyword dictionary, and select one word to map as the mapped keyword tag;
    基于所述映射关键词标签,从所述映射关键词标签中选择多个所述映射关键词标签作为输入至模型记忆矩阵中的图像关键词标签信息。Based on the mapped keyword tags, a plurality of the mapped keyword tags are selected from the mapped keyword tags as image keyword tag information input into the model memory matrix.
  15. 根据权利要求12所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 12, wherein the processor is configured to execute via execution of the executable instructions:
    根据记忆读取函数，确定每个记忆片段被选中的概率，并以该概率对各记忆片段加权平均，得到从所述模型记忆矩阵中读取的记忆信息向量，从而生成第i行诗句的第t时刻字符。According to the memory reading function, determining the probability of each memory segment being selected and taking the probability-weighted average of the segments to obtain the vector of memory information read from the model memory matrix, so as to generate the character at time t of the i-th line of verse.
  16. 根据权利要求11所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 11, wherein the processor is configured to execute via execution of the executable instructions:
    在第i-1行诗句生成后并且在第i行诗句生成前，将历史时刻的第i-2行诗句的每个字符经过编码器后对应的隐藏状态，通过记忆写入函数计算后为其在所述历史记忆矩阵中选择一个记忆片段，然后将所述隐藏状态填入所述历史记忆矩阵中。After the (i-1)-th line of the poem is generated and before the i-th line is generated, the hidden state corresponding to each character of the (i-2)-th line after passing through the encoder is processed by the memory write function, a memory segment in the historical memory matrix is selected for it, and the hidden state is then filled into the historical memory matrix.
  17. 根据权利要求11所述的电子设备，其中，所述模型记忆矩阵包括关键词记忆矩阵M_1∈R^(K_1×d_m)、当前记忆矩阵M_2∈R^(K_2×d_m)以及历史记忆矩阵M_3∈R^(K_3×d_m)；The electronic device according to claim 11, wherein the model memory matrix includes a keyword memory matrix M_1 ∈ R^(K_1×d_m), a current memory matrix M_2 ∈ R^(K_2×d_m), and a historical memory matrix M_3 ∈ R^(K_3×d_m);
    所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和历史记忆矩阵用于存储所述已生成的诗句的信息；The keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store the information of the generated verses;
    所述模型记忆矩阵的每一行表示一个记忆片段，d_m表示记忆片段的尺寸，K_2和K_3分别表示当前记忆和历史记忆的片段长度；所述模型记忆矩阵的记忆表示为三段记忆的拼接M=[M_1; M_2; M_3]，M∈R^(K×d_m)，其中[M_1; M_2; M_3]表示矩阵拼接，且K=K_1+K_2+K_3。Each row of the model memory matrix represents a memory segment, d_m denotes the size of a memory segment, and K_2 and K_3 denote the segment lengths of the current memory and the historical memory, respectively; the memory of the model memory matrix is expressed as the concatenation of the three memories, M = [M_1; M_2; M_3], M ∈ R^(K×d_m), where [M_1; M_2; M_3] denotes matrix concatenation and K = K_1 + K_2 + K_3.
  18. 根据权利要求15所述的电子设备，其中，所述记忆读取函数为：α_r = A_r(M, query)，所述每个记忆片段被选择的概率为α_r[k,:] = softmax(z_k)；The electronic device according to claim 15, wherein the memory reading function is α_r = A_r(M, query), and the probability of each memory segment being selected is α_r[k,:] = softmax(z_k);
    其中，α_r表示所述记忆读取函数，query表示所述诗歌生成模型的状态信息，z_k表示根据每个记忆片段M[k,:]和所述诗歌生成模型的状态信息query计算的相关性程度变量，且z_k = v^T σ(M[k,:], q)，v^T表示所述诗歌生成模型的模型参数。Where α_r denotes the memory reading function, query denotes the state information of the poetry generation model, z_k denotes the degree of relevance between each memory segment M[k,:] and the state information query of the poetry generation model, z_k = v^T σ(M[k,:], q), and v^T denotes a model parameter of the poetry generation model.
  19. 根据权利要求11-18任一项所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device according to any one of claims 11-18, wherein the processor is configured to execute via execution of the executable instruction:
    将所述单标签分类模型,所述多标签分类模型以及所述诗歌生成模型存储至区块链中。The single-label classification model, the multi-label classification model, and the poetry generation model are stored in a blockchain.
  20. 一种计算机可读存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时实现权利要求1-9任一项所述的基于图像的辅助写作方法。A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the image-based auxiliary writing method according to any one of claims 1-9 is realized.
PCT/CN2020/092724 2020-04-24 2020-05-27 Image-based writing assisting method and apparatus, medium, and device WO2021212601A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010332409.9A CN111611805B (en) 2020-04-24 2020-04-24 Auxiliary writing method, device, medium and equipment based on image
CN202010332409.9 2020-04-24

Publications (1)

Publication Number Publication Date
WO2021212601A1 true WO2021212601A1 (en) 2021-10-28

Family

ID=72197888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092724 WO2021212601A1 (en) 2020-04-24 2020-05-27 Image-based writing assisting method and apparatus, medium, and device

Country Status (2)

Country Link
CN (1) CN111611805B (en)
WO (1) WO2021212601A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080786A (en) * 2022-08-22 2022-09-20 科大讯飞股份有限公司 Picture poetry-based method, device and equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766284B (en) * 2021-01-26 2023-11-21 北京有竹居网络技术有限公司 Image recognition method and device, storage medium and electronic equipment
CN113010717B (en) * 2021-04-26 2022-04-22 中国人民解放军国防科技大学 Image verse description generation method, device and equipment
CN114419402B (en) * 2022-03-29 2023-08-18 中国人民解放军国防科技大学 Image story description generation method, device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120269436A1 (en) * 2011-04-20 2012-10-25 Xerox Corporation Learning structured prediction models for interactive image labeling
CN107480132A (en) * 2017-07-25 2017-12-15 浙江工业大学 A kind of classic poetry generation method of image content-based
CN108664989A (en) * 2018-03-27 2018-10-16 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108829667A (en) * 2018-05-28 2018-11-16 南京柯基数据科技有限公司 It is a kind of based on memory network more wheels dialogue under intension recognizing method
CN109214409A (en) * 2018-07-10 2019-01-15 上海斐讯数据通信技术有限公司 A kind of vegetable recognition methods and system
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 A kind of answering method and system based on local attention mechanism memory network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8873867B1 (en) * 2012-07-10 2014-10-28 Google Inc. Assigning labels to images
CN110297933A (en) * 2019-07-01 2019-10-01 山东浪潮人工智能研究院有限公司 A kind of theme label recommended method and tool based on deep learning
CN110728255B (en) * 2019-10-22 2022-12-16 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111368514B (en) * 2019-12-10 2024-04-19 爱驰汽车有限公司 Model training and ancient poem generating method, ancient poem generating device, equipment and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120269436A1 (en) * 2011-04-20 2012-10-25 Xerox Corporation Learning structured prediction models for interactive image labeling
CN107480132A (en) * 2017-07-25 2017-12-15 浙江工业大学 A kind of classic poetry generation method of image content-based
CN108664989A (en) * 2018-03-27 2018-10-16 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108829667A (en) * 2018-05-28 2018-11-16 南京柯基数据科技有限公司 It is a kind of based on memory network more wheels dialogue under intension recognizing method
CN109214409A (en) * 2018-07-10 2019-01-15 上海斐讯数据通信技术有限公司 A kind of vegetable recognition methods and system
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 A kind of answering method and system based on local attention mechanism memory network


Also Published As

Publication number Publication date
CN111611805A (en) 2020-09-01
CN111611805B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
WO2021212601A1 (en) Image-based writing assisting method and apparatus, medium, and device
CN110717017B (en) Method for processing corpus
WO2021223323A1 (en) Image content automatic description method based on construction of chinese visual vocabulary list
CN111026861B (en) Text abstract generation method, training device, training equipment and medium
CN112685565A (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN109543820B (en) Image description generation method based on architecture phrase constraint vector and double vision attention mechanism
JP2021152963A (en) Word meaning feature generating method, model training method, apparatus, device, medium, and program
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111881292B (en) Text classification method and device
US20220292805A1 (en) Image processing method and apparatus, and device, storage medium, and image segmentation method
RU2712101C2 (en) Prediction of probability of occurrence of line using sequence of vectors
CN109710842B (en) Business information pushing method and device and readable storage medium
EP4302234A1 (en) Cross-modal processing for vision and language
CN114416995A (en) Information recommendation method, device and equipment
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN115129839A (en) Visual dialogue answer generation method and device based on graph perception
CN113392265A (en) Multimedia processing method, device and equipment
CN113360683B (en) Method for training cross-modal retrieval model and cross-modal retrieval method and device
Cao et al. Visual question answering research on multi-layer attention mechanism based on image target features
CN110717316B (en) Topic segmentation method and device for subtitle dialog flow
CN117093687A (en) Question answering method and device, electronic equipment and storage medium
CN116977992A (en) Text information identification method, apparatus, computer device and storage medium
CN116561272A (en) Open domain visual language question-answering method and device, electronic equipment and storage medium
CN114092931B (en) Scene character recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932255

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932255

Country of ref document: EP

Kind code of ref document: A1