WO2021212601A1 - Image-based writing assisting method and apparatus, medium, and device - Google Patents


Info

Publication number
WO2021212601A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
model
information
image
memory
Prior art date
Application number
PCT/CN2020/092724
Other languages
French (fr)
Chinese (zh)
Inventor
杨翰章 (Yang Hanzhang)
邓黎明 (Deng Liming)
庄伯金 (Zhuang Bojin)
王少军 (Wang Shaojun)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021212601A1 publication Critical patent/WO2021212601A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of software technology and ancient poetry generation, and in particular to an image-based writing assisting method and apparatus, medium, and device.
  • Poetry is a text form with concise language and condensed expression, and it is also subject to certain structural and phonological requirements. Ancient poetry generation tools are used in many scenarios. For example, in teaching, a teacher may select a poem matching a certain scene and therefore needs to use the image information of the relevant scene to generate, in the poetry generation tool, the verse information associated with the image; in a park, a visitor may need to use the poetry generation tool to generate, according to a certain scene, the verse information associated with the image the visitor desires.
  • the inventor realized that, for an ancient poetry generation tool, the task of generating poetry based on image content is more difficult than generating ordinary text: the meaning expressed by the generated poetry may deviate from the image content, and the contextual semantic coherence of the poem lines cannot be guaranteed, so that the correlation between the poems generated by the generation tool and the images is not strong, and the semantic coherence of the generated poems is weak.
  • this application proposes an image-based auxiliary writing method, including:
  • the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the generated verses.
  • This application also proposes an image-based auxiliary writing device, including:
  • an image information acquisition module, used to obtain the image information of a target image;
  • a first attribute keyword module, used to input the image information into a multi-label classification model to obtain keyword tags of a first attribute of the image;
  • a second attribute keyword module, used to input the image information into a single-label classification model to obtain candidate tags of a second attribute of the image, where the second attribute and the first attribute are feature information of different content in the target image;
  • a keyword mapping module, used to map the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information;
  • an ancient poetry generation module, used to input the image keyword tag information into the poem generation model, which generates the poem content based on a model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix;
  • the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store the information of the generated verses.
  • this application also proposes a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above image-based auxiliary writing method.
  • This application also proposes an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, by executing the executable instructions, the above image-based auxiliary writing method, in which the model memory matrix includes a keyword memory matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the generated verses.
  • in this way, the generated verses have strong continuity, and the relevance between the generated poem content and the given target image is further enhanced.
  • FIG. 1 is a block diagram of an image-based auxiliary writing method provided by an embodiment of this application.
  • FIG. 2 schematically shows an example diagram of an application scenario of an image-based auxiliary writing method.
  • FIG. 3 is a construction diagram of a model memory matrix provided by an embodiment of the application.
  • FIG. 4 is a block diagram of an image-based auxiliary writing device provided by an embodiment of the application.
  • FIG. 5 is an example block diagram of an electronic device for an image-based auxiliary writing method provided by an embodiment of the application.
  • FIG. 6 shows a computer-readable storage medium for an image-based auxiliary writing method provided by an embodiment of the application.
  • this application proposes an auxiliary writing method based on the image classification model and the model memory matrix to generate the poems associated with the target image information.
  • the image-based auxiliary writing method provided by the embodiments of this application can realize auxiliary writing through neural networks, deep learning, etc., and can also be applied to robotics, knowledge representation and reasoning, etc.; this can be determined according to the actual application scenario and is not restricted here.
  • Fig. 1 is an image-based auxiliary writing method provided by an embodiment of the present application. The method includes but is not limited to the following steps:
  • Step S110: Obtain the image information of the target image. It should be noted that the image information may include a variety of feature information, such as people, landscapes, and animals.
  • Step S120: Input the image information into a multi-label classification model to obtain the keyword tag of the first attribute of the target image.
  • the deep residual convolutional neural network ResNet-101 is used as the image classification model, and it is trained for the multi-label image classification task on the large-scale multi-label image data set ML-Images.
  • in an exemplary embodiment of the present application, the first attribute of the image may be information about a person.
  • a picture I is input into the trained model, and the output is the nouns corresponding to multiple objects in the image.
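As a rough sketch of this prediction step, the multi-label output can be thresholded per label: each logit passes through a sigmoid, and every label whose probability clears the threshold is kept. The label names, logits, and threshold below are illustrative assumptions; a real implementation would obtain the logits from the trained ResNet-101 backbone.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_label_predict(logits, labels, threshold=0.5):
    """Return all labels whose sigmoid probability exceeds the threshold."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) > threshold]

# Toy logits standing in for the backbone's per-label scores.
labels = ["couple", "beach", "dog", "mountain"]
logits = [2.1, 1.3, -0.7, -2.5]
print(multi_label_predict(logits, labels))  # ['couple', 'beach']
```

Unlike a softmax classifier, each label is decided independently, which is what allows several nouns to be returned for one picture.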
  • this application uses the above-trained multi-label classification model as a pre-training model and uses the ImageNet-ILSVRC2012 data set (which has 1000 labels) to perform fine-tuning training of a single-label model, obtaining a single-label image classification model; that is, after a picture I is input into the model, the output is the most likely top-k labels corresponding to the picture I.
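The top-k selection of the single-label model might look like the following sketch: a softmax over the logits followed by a sort by probability. The labels and scores are made up for illustration; the real model would produce 1000 ImageNet-ILSVRC2012 logits.

```python
import math

def top_k_labels(logits, labels, k=3):
    """Softmax the logits and return the k most likely (label, prob) pairs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # numerically stabilised softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda t: -t[1])
    return ranked[:k]

labels = ["seashore", "lakeside", "volcano", "castle"]
logits = [3.2, 2.4, 0.1, -1.0]
for lab, p in top_k_labels(logits, labels, k=2):
    print(f"{lab}: {p:.3f}")
```

The softmax makes the 1000 labels compete, so exactly one distribution is produced and only its k largest entries are reported.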
  • in an exemplary embodiment of the present application, the image keyword tags have been extended so that the image attribute tags obtained by the multi-label classification model can meet the required degree of association with the relevant image information; that is, in this embodiment of the application, the multi-label classification model is used to predict the keyword tags related to the person in the picture.
  • the extension of the multi-label classification model is specifically reflected in the processing of its tags.
  • the tags of the ML-Images data set are organized according to the WordNet hierarchical structure, with the synset as the basic construction unit; synsets are related through a certain number of relationship types, including hypernym-hyponym relationships, whole-part relationships, inheritance relationships, and so on.
  • each label of the data set is searched for its hypernyms; if a hypernym contains the word "person", then this label is regarded as a keyword tag related to the person.
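The hypernym filter described above can be sketched as follows. A real implementation would query WordNet for the hypernym chains (for instance via NLTK's wordnet corpus); here a tiny hand-made hierarchy stands in for it, so both the hierarchy and the labels are illustrative assumptions.

```python
# Toy hypernym chains standing in for WordNet lookups.
TOY_HYPERNYMS = {
    "couple":  ["pair", "person", "organism", "entity"],
    "beach":   ["geological_formation", "object", "entity"],
    "teacher": ["educator", "professional", "person", "entity"],
}

def person_related(label, hypernyms=TOY_HYPERNYMS):
    """Keep a label if 'person' occurs anywhere in its hypernym chain."""
    return "person" in hypernyms.get(label, [])

print([lab for lab in ["couple", "beach", "teacher"] if person_related(lab)])
# ['couple', 'teacher']
```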
  • for example, if the candidate tags are "couple, beach, beach", the filtered word "couple" is a keyword tag related to the person; compared with an existing target detection model that only outputs "male" or "female" tags, it has richer semantic characteristics.
  • Step S130 Input the image information into a single-label classification model to obtain keyword candidate labels of the second attribute of the target image.
  • the single-label classification model is obtained by using the above-trained multi-label classification model as a pre-training model and performing fine-tuning training of the single-label model on the ImageNet-ILSVRC2012 data set (which has 1000 labels); that is, after a picture I is input into the model, the output is the most likely top-k labels corresponding to the picture I.
  • Step S140: Map the keyword tags of the first attribute and the candidate keyword tags of the second attribute to obtain image keyword tag information.
  • the embodiment of the application uses the "Shixue Hanying" (诗学含英) dictionary to map the keyword tags of the first attribute and the candidate tags of the second attribute of the image.
  • the "Shixue Hanying" dictionary is composed of 1016 theme words and their corresponding ancient-poem words; each line of the dictionary is separated by spaces, the first word representing the theme word (basically expressed in modern Chinese) and the remaining words in the line being the ancient-poem words corresponding to that theme. Therefore, this application separately embeds each combined predicted label and each theme word using pre-trained word2vec, and then calculates the word similarity between each predicted label and each theme word:
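Assuming the word similarity is cosine similarity over word2vec embeddings (the similarity formula itself is not reproduced in this text), the mapping can be sketched as follows; the 3-dimensional vectors and the two theme words are toy stand-ins for real pre-trained embeddings and the dictionary's 1016 theme words.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_theme(label_vec, themes):
    """Map one predicted label to the most similar theme word."""
    return max(themes, key=lambda t: cosine(label_vec, themes[t]))

themes = {"seaside": [0.9, 0.1, 0.0], "love": [0.0, 0.8, 0.6]}
print(best_theme([0.85, 0.2, 0.05], themes))  # 'seaside'
```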
  • this application combines the output tags of the single-label classification model and the output tags of the multi-label classification model, and then obtains the final keywords through the mapping relationship defined by the "Shixue Hanying" dictionary.
  • the keywords corresponding to the pictures are generated in the above manner so that they are associated, as much as possible, with the contents of the pictures and the semantics represented by the objects in the pictures. Since the ancient poem generation model of this scheme is directly driven by the input keywords, the accuracy of the keywords also ensures the relevance between the generated ancient poems and the images.
  • Step S150: Input the image keyword tag information into the poem generation model, and generate the poem content based on the model memory matrix.
  • the poetry generation model inputs the image information into a sequence-to-sequence structure and generates ancient poems based on the model memory matrix.
  • the model memory matrix includes: the keyword memory matrix M1, the current memory matrix M2, and the historical memory matrix M3.
  • the keyword memory matrix is used to store the image keyword tag information;
  • the current memory matrix and the historical memory matrix are used to store information about the generated verses.
  • each row of the model memory matrix represents a memory segment;
  • dm represents the size of a memory segment;
  • K2 and K3 represent the segment lengths of the current memory and the historical memory respectively;
  • the keyword memory matrix M1 is composed of the hidden layer state information of all keywords and remains unchanged during the entire poetry generation process.
  • the model refers to the keyword memory information when generating each line of verse, determines the probability of each memory segment being selected through the memory reading function, and uses this probability as the weight for the weighted summation of the memory segments in the final read memory information.
  • the hidden layer state information corresponding to each character of the previous line of verse Li-1 is written into the current memory matrix M2.
  • this application saves the information of Li-1 in the current memory to provide complete recent memory information, where Li is the generated i-th line of verse and i is an integer greater than 2.
  • when saving historical memory information, the model selects some significant model state information from the historical verses L1:i-2 for writing; in this way, the historical memory matrix M3 stores long-distance historical memory information.
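The memory layout described above can be sketched as one matrix whose rows are memory segments: the keyword rows of M1 followed by K2 current-memory rows and K3 historical-memory rows, each of size dm. The sizes and the zero initialisation of M2 and M3 are illustrative assumptions.

```python
D_M, K2, K3 = 4, 3, 5          # segment size and segment counts (assumed)

def build_memory(keyword_vecs, d_m=D_M, k2=K2, k3=K3):
    """Return [M1; M2; M3]: keyword rows plus zeroed current/history rows."""
    m1 = [list(v) for v in keyword_vecs]      # fixed keyword memory M1
    m2 = [[0.0] * d_m for _ in range(k2)]     # current memory M2, filled per line
    m3 = [[0.0] * d_m for _ in range(k3)]     # historical memory M3
    return m1 + m2 + m3

mem = build_memory([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
print(len(mem))   # 2 keyword rows + 3 + 5 = 10 memory segments
```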
  • constructing the model memory matrix may specifically include, but is not limited to, the following steps:
  • Step S1501: Embed the image keyword tags into the keyword memory matrix as keyword tag information.
  • the predicted labels are expanded using the multi-label model, and these expanded words have rich semantic meanings, so the final output keywords not only have relevance to the image but also carry the semantic characteristics of the image representation.
  • this application embeds each keyword learned by the poetry model into the memory of the model independently, so that, when decoding in the sequence-to-sequence structure, the model can select the key part of the keyword information according to the model state and the global information, thereby ensuring the relevance between the image keyword information input into the poetry model and the image.
  • Step S1502: Read, from the keyword matrix, the keyword tag information associated with the generation of the i-th line of verse through the memory reading function.
  • the specific implementation of this step is as follows:
  • the query q represents the state information of the current model;
  • formula: zk = vT σ(M[k,:], q), where zk is a variable representing the degree of correlation calculated from the memory segment M[k,:] and the current model state q, and vT represents model parameters;
  • formula: αr[k,:] = softmax(zk), where αr[k,:] represents the probability that the memory segment M[k,:] is selected;
  • the query vector q is formed by concatenating the hidden layer vector st-1 of the decoder and the global tracking vector vi-1; vi-1 is used here to prevent the model from reading redundant content.
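A minimal sketch of this read operation follows: zk scores each memory segment against the query, a softmax turns the scores into selection probabilities αr, and the read-out is the α-weighted sum of segments. Realising σ as a tanh over an element-wise combination of segment and query, and the fixed toy parameter vector v, are both assumptions; the text only names σ as a non-linear mapping.

```python
import math

def memory_read(M, q, v):
    """Attention-style read: scores -> softmax weights -> weighted sum."""
    # z_k = v^T sigma(M[k,:], q), with sigma taken as tanh(m + q) here.
    z = [sum(vi * math.tanh(mi + qi) for vi, mi, qi in zip(v, row, q))
         for row in M]
    mx = max(z)
    exps = [math.exp(s - mx) for s in z]
    total = sum(exps)
    alpha = [e / total for e in exps]          # selection probabilities
    d = len(M[0])
    o = [sum(alpha[k] * M[k][j] for k in range(len(M))) for j in range(d)]
    return alpha, o

M = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
alpha, o = memory_read(M, q=[1.0, 0.0], v=[1.0, 1.0])
print(alpha)   # three probabilities summing to 1
```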
  • Step S1503: Fill the hidden state information corresponding to the character information in the (i-1)-th line of the poem into the current memory matrix as the current memory information.
  • the hidden layer state information corresponding to each character of the previous line of verse Li-1 is written into the current memory matrix M2.
  • the current memory information is the input of the model at the current moment, and it represents the semantic features input at the current moment of the model.
  • by training the model to select these semantic features, this application makes the model learn the grammar and prosody format of the ancient poetry corpus.
  • Step S1504: Calculate, through the memory writing function, the hidden state information corresponding to the character information in the (i-2)-th line of the poem, and fill the calculated information into the historical memory matrix.
  • when saving historical memory information, the model selects some significant model state information from the historical verses L1:i-2 for writing; in this way, the historical memory matrix M3 saves long-distance historical memory information.
  • the write function calculates a selection over the slots of the historical memory matrix, and then fills the state ht into the historical memory matrix:
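The write formula itself is elided in this text, so the following is only a guessed shape consistent with the surrounding description: score each historical slot, pick one, and write the state ht into it. Scoring by dot-product similarity to ht is an illustrative assumption.

```python
def memory_write(M3, h_t):
    """Write state h_t into the historical-memory slot that scores highest."""
    scores = [sum(m * h for m, h in zip(row, h_t)) for row in M3]
    k = max(range(len(M3)), key=lambda i: scores[i])   # selected slot
    M3[k] = list(h_t)                                  # overwrite with h_t
    return k

M3 = [[0.0, 0.0], [0.2, 0.1], [0.9, 0.8]]
slot = memory_write(M3, h_t=[1.0, 1.0])
print(slot, M3[slot])
```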
  • Step S1505: Read the keyword matrix and the current memory matrix and historical memory matrix information for the verse Li, and generate the i-th line of verse through a sequence-to-sequence structure.
  • this application uses a sequence-to-sequence neural network model with a memory mechanism.
  • the memory of the model consists of three segments: keyword memory information, historical memory information and current memory information.
  • the characteristics of model memory are described as follows:
  • the final output keywords not only have relevance to the image but also carry the semantic characteristics of the image representation.
  • this application embeds each keyword learned by the model into the memory of the model independently, so that the model can select the key part of the keyword information according to the model state and the global information when decoding; in fact, the key information of the image is indirectly used in this way.
  • the model memory only retains limited historical information, which is different from the existing method of retaining all the state information during the historical generation process.
  • this approach requires the trained model to learn to focus on information closely related to the poem being generated, while screening out and ignoring interference information in the generation process, to further ensure the coherence between the verses generated by the model.
  • the current memory information is the input of the model at the current moment, which represents the semantic features input by the model at the current moment.
  • this solution selects these semantic features by training the model to learn the grammar and prosody format in the ancient poetry corpus.
  • the ancient poetry generation model of this scheme gradually learns to read from memory the keyword information and memory information most relevant to the line of verse, to guide the generation of the current line. This reading is the realization of a kind of attention mechanism: through training, the model learns to find the most important information in memory when generating each character of the verse, such as choosing whether to generate characters associated with keywords or characters that are semantically connected to the historical verses.
  • after generating a line of verse, the model is trained to select the part of its current memory state that played the most prominent role in generating that line and to write it into the historical memory of the model, completing the update of the model's overall memory. This writing is also the realization of an attention mechanism. Through dynamic reading and writing of the historical memory, the model is trained to focus on information relevant to verse generation and to ignore interference information in the generation process, so as to ensure the continuity between verses.
  • the image information of the target image is input into the poetry generation model, and the entire poem information associated with the image is generated according to the poetic generation method of the model memory matrix.
  • the specific implementation of this step is as follows:
  • a sequence-to-sequence structure is used to generate each ancient poem.
  • the structure is composed of an encoder and a decoder.
  • a bidirectional gated recurrent unit (GRU) is used as the encoder of the model, and a unidirectional GRU is used as the decoder of the model.
  • ht and st represent the hidden layer states of the encoder and the decoder; e(yt) represents the word embedding vector of the character yt output by the decoder at time t; the model calculates the character generation distribution at time t when generating the verse Li:
  • st = GRU(st-1, [e(yt-1); ot; pt; vi-1]), where ot represents the output vector of the model memory matrix, and pt is the splicing vector of the prosody embedding and the verse-length embedding learned in training;
  • W represents the parameters the model needs to train; the formula indicates that, when learning to generate the verse Li, the model needs to reference the information y1:t-1 already generated in the current line, the preamble information of the generated lines L1:i-1, and the keyword information.
  • vi-1 represents the global tracking vector; its initial value is an all-zero vector; it records the content information generated up to the current line Li-1 and provides global information to the model when generating the next verse Li.
  • σ represents the non-linear layer mapping function.
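The decoder update st = GRU(st-1, [e(yt-1); ot; pt; vi-1]) can be sketched with a plain GRU cell over the concatenated input. The dimensions and constant weights below are toy values, and a real model would also project st through a softmax layer to obtain the character distribution.

```python
import math

def gru_cell(s_prev, x, Wz, Wr, Wh):
    """One standard GRU step: update gate z, reset gate r, candidate h."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    concat = s_prev + x
    z = [sig(sum(w * c for w, c in zip(row, concat))) for row in Wz]  # update gate
    r = [sig(sum(w * c for w, c in zip(row, concat))) for row in Wr]  # reset gate
    gated = [ri * si for ri, si in zip(r, s_prev)] + x
    h = [math.tanh(sum(w * c for w, c in zip(row, gated))) for row in Wh]
    return [(1 - zi) * si + zi * hi for zi, si, hi in zip(z, s_prev, h)]

# Toy 1-d pieces of the concatenated input [e(y_{t-1}); o_t; p_t; v_{i-1}].
e_y, o_t, p_t, v_prev = [0.1], [0.2], [0.0], [0.3]
x_t = e_y + o_t + p_t + v_prev            # concatenation, length 4
s_prev = [0.0, 0.0]                       # previous decoder state, length 2
W = [[0.1] * 6, [0.2] * 6]                # 2 state dims over 2+4 inputs
s_t = gru_cell(s_prev, x_t, W, W, W)
print(len(s_t))   # 2
```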
  • This application uses a sequence-to-sequence neural network model with a memory mechanism.
  • the trained model effectively attends to and selects the important keyword information and the key information in the historically generated verses, while filtering out and ignoring the interfering part of the information.
  • when generating a line of verse, the memory information is read to find the information the model most needs to attend to when generating a single character, such as the association with keywords or the continuity with historical verses, so as to guide the generation of single characters.
  • the historical memory information is updated to ensure that the model writes the state information related to the generation of the verse into the memory.
  • the above-mentioned multi-label classification model, single-label model, and poetry generation model can also be stored in a node of a blockchain; through blockchain storage, data information can be shared between different platforms, and the data can also be prevented from being tampered with.
  • the blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • the blockchain is essentially a decentralized database, a series of data blocks associated using cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • Fig. 4 shows an image-based auxiliary writing device 300 according to another exemplary embodiment, including:
  • the image information acquisition module 310 is used to acquire the image information of the target image.
  • the first attribute keyword module 320 is used to input the image information into the multi-label classification model to obtain the keyword label of the first attribute of the image.
  • the second attribute keyword module 330 is used to input the image information into the single-label classification model to obtain the candidate tags of the second attribute of the image.
  • the keyword mapping module 340 is configured to map the keyword tags of the first attribute and the candidate keyword tags of the second attribute to obtain image keyword tag information.
  • the poem generation module 350 is used to read the keyword matrix and the current memory matrix of the i-th line of verses and the historical memory matrix information to generate the i-th line of verses through the sequence-to-sequence structure.
  • the multi-label classification model is used to extract the person keyword tag information related to the image, and the predicted labels are expanded into person-related words using the multi-label model; these expanded words have rich semantic meanings, so the final output keywords are not only related to the image but also have the semantic characteristics of the image representation.
  • the relevance of the extracted person keyword information to the image information is thus guaranteed.
  • the image attribute keyword tag information obtained by the image classification models is mapped to select image keywords that have a certain relevance to the image, which ensures that the image attribute keyword parameters initially input into the poetry model have strong relevance.
  • through dynamic reading and writing of the historical memory, the model memory matrix retains only limited historical information, which is different from the existing method of retaining all the state information from the history generation process.
  • in this way the user's needs can be met: the generated verse information has strong relevance to the target image, and the coherence between the verses is enhanced.
  • the electronic device 400 according to this embodiment of the present application will be described with reference to FIG. 5.
  • the electronic device 400 shown in FIG. 5 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
  • the electronic device 400 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 400 may include, but are not limited to: the aforementioned at least one processing unit 410, the aforementioned at least one storage unit 420, and a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410).
  • the storage unit stores program code, and the program code can be executed by the processing unit 410, so that the processing unit 410 executes the steps of the various exemplary method implementations described in the "Exemplary Method" section of this specification.
  • the processing unit 410 may perform step S110 as shown in FIG. 1: acquiring the image information of the target image.
  • the image information may include a variety of feature information, such as people, landscapes, animals, etc.;
  • Step S120: input the image information into a multi-label classification model to obtain the keyword tags of the first attribute of the target image;
  • Step S130: input the image information into a single-label classification model to obtain the keyword candidate tags of the second attribute of the target image;
  • Step S140: map the keyword tags of the first attribute with the candidate keyword tags of the second attribute to obtain image keyword tag information;
  • Step S150: input the image keyword tag information into the poetry generation model, and generate the poetry content based on the model memory matrix.
  • the storage unit 420 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 4201 and/or a cache storage unit 4202, and may further include a read-only storage unit (ROM) 4203.
  • the storage unit 420 may also include a program/utility tool 4204 having a set (at least one) program module 4205.
  • the program module 4205 includes, but is not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • the bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 400 may also communicate with one or more external devices 600 (such as keyboards, pointing devices, Bluetooth devices, etc.), and may also communicate with one or more devices that enable a user to interact with the electronic device 400, and/or communicate with Any device (such as a router, modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 450.
  • the electronic device 400 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 460.
  • networks for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet
  • the network adapter 460 communicates with other modules of the electronic device 400 through the bus 430. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives And data backup storage system, etc.
  • the example embodiments described here can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
  • a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored.
  • the computer-readable storage medium may be non-volatile or volatile.
  • each aspect of the present application can also be implemented in the form of a program product, which includes program code.
  • when the program product runs on a terminal device, the program code is used to make the terminal device execute the steps according to the various exemplary embodiments of the present application described in the above "Exemplary Method" section of this specification.
  • a program product 500 for implementing the above-mentioned method according to an embodiment of the present application is described. It can adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of this application is not limited to this.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of the present application can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to the field of artificial intelligence, and provides an image-based writing assisting method, comprising: acquiring image information of a target image; inputting the image information into a multi-label classification model to obtain keyword labels of a first attribute of the image; inputting the image information into a single-label classification model to obtain candidate keyword labels of a second attribute of the image, the second attribute and the first attribute being features of different information in the target image; mapping the keyword labels of the first attribute and the candidate keyword labels of the second attribute to obtain image keyword label information; and inputting the image keyword label information into a poem generation model, and generating poem content on the basis of a model memory matrix. In addition, the present application also relates to the field of blockchains. The multi-label classification model, single-label model, and poem generation model can be stored in a blockchain. The present application can increase the relevance of generated verses to the image information and the target image.

Description

Image-based writing assisting method, apparatus, medium, and device
This application claims priority to Chinese patent application No. 2020103324099, entitled "Image-based writing assisting method, apparatus, medium, and device" and filed with the Chinese Patent Office on April 24, 2020, the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of software technology and classical Chinese poetry generation, and in particular to an image-based writing assisting method, apparatus, medium, and device.
Background
Poetry is a form of text with concise language and condensed expression, subject to certain structural and phonological requirements. Poetry generation tools are used in many scenarios. For example, in teaching, a teacher who wants to teach with poems matching a particular scene needs the image information of that scene so that a poetry generation tool can generate verses associated with the image; in a park, a visitor may want a poetry generation tool to generate verses associated with an image of a particular view.
However, the inventors realized that generating poetry from image content is a more difficult task than generating it from ordinary text: the meaning expressed by the generated classical poem may deviate from the image content, and the semantic coherence between consecutive verses cannot be guaranteed. As a result, the poems produced by existing generation tools are only weakly related to the image, and the generated verses lack contextual semantic coherence.
Summary
To solve the problem that generated classical poems are weakly related to the image content and lack contextual semantic coherence between verses, this application proposes an image-based writing assisting method, including:
acquiring image information of a target image; inputting the image information into a multi-label classification model to obtain keyword tags of a first attribute of the image; inputting the image information into a single-label classification model to obtain candidate tags of a second attribute of the image, the second attribute and the first attribute being features of different information in the target image; mapping the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information; and inputting the image keyword tag information into a poetry generation model and generating poem content based on a model memory matrix, the model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix, wherein the keyword memory matrix stores the image keyword tag information, and the current memory matrix and the historical memory matrix store information about verses that have already been generated.
This application also proposes an image-based writing assisting apparatus, including:
an image information acquisition module for acquiring information of a target image; a first-attribute keyword module for inputting the image information into a multi-label classification model to obtain keyword tags of the first attribute of the image; a second-attribute keyword module for inputting the image information into a single-label classification model to obtain candidate tags of the second attribute of the image, the second attribute and the first attribute being feature information of different information in the target image; a keyword mapping module for mapping the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information; and a poetry generation module for inputting the image keyword tag information into a poetry generation model and generating poem content based on a model memory matrix, the model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix, wherein the keyword memory matrix stores the image keyword tag information, and the current memory matrix and the historical memory matrix store information about verses that have already been generated.
This application also proposes a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above image-based writing assisting method.
This application also proposes an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured, by executing the executable instructions, to perform:
acquiring image information of a target image;
inputting the image information into a multi-label classification model to obtain keyword tags of a first attribute of the image;
inputting the image information into a single-label classification model to obtain candidate keyword tags of a second attribute of the image, the second attribute and the first attribute being features of different information in the target image;
mapping the keyword tags of the first attribute with the candidate keyword tags of the second attribute to obtain image keyword tag information;
inputting the image keyword tag information into a poetry generation model and generating poem content based on a model memory matrix, the model memory matrix including a keyword memory matrix, a current memory matrix, and a historical memory matrix, wherein the keyword memory matrix stores the image keyword tag information, and the current memory matrix and the historical memory matrix store information about verses that have already been generated.
This application can give the generated verses strong coherence with one another, strengthen the association between the generated verses and the given image information, and further strengthen the association between the generated poem content and the target image.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a block diagram of an image-based writing assisting method provided by an embodiment of this application;
Fig. 2 schematically shows an example application scenario of an image-based writing assisting method;
Fig. 3 is a diagram of the construction of a model memory matrix provided by an embodiment of this application;
Fig. 4 is a block diagram of an image-based writing assisting apparatus provided by an embodiment of this application;
Fig. 5 is an example block diagram of an electronic device for an image-based writing assisting method provided by an embodiment of this application;
Fig. 6 shows a computer-readable storage medium for an image-based writing assisting method provided by an embodiment of this application.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
Existing poetry generation tools suffer from weak association with the image content and weak contextual semantic coherence between verses. To strengthen the association between the generated classical poems and the image keywords, as well as the coherence between verses, this application proposes a writing assisting method based on image classification models and a model memory matrix to generate poem information associated with a target image.
The image-based writing assisting method provided by the embodiments of this application can realize assisted writing through neural networks, deep learning, and the like, and can equally be applied to robotics, knowledge representation and reasoning, and other fields; the specifics can be determined by the actual application scenario and are not limited here.
To achieve the above purpose, this application provides the following technical solutions:
Fig. 1 shows an image-based writing assisting method provided by an embodiment of this application. The method includes, but is not limited to, the following steps:
Step S110: acquire image information of a target image. It should be noted that the image information may include multiple kinds of feature information, such as people, landscapes, and animals.
Step S120: input the image information into a multi-label model to obtain keyword tags of the first attribute of the target image.
In this application, the deep residual convolutional neural network ResNet-101 is used as the image classification model, and the model is trained for a multi-label image classification task on the large-scale multi-label image data set ML-Images. When a picture I is input into the trained model, the output is the nouns corresponding to multiple objects in the image [formula image: the set of predicted object nouns], where k denotes the object names corresponding to the top-k predicted probability values. Therefore, in an exemplary embodiment of this application, the multi-label classification model is obtained by training for a multi-label image classification task on the multi-label data set ML-Images.
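As a rough illustration of the top-k selection described above (the patent's actual model is a ResNet-101 trained on ML-Images; the class names and scores below are invented for illustration), the step from per-class probabilities to label names can be sketched as:

```python
import numpy as np

def top_k_labels(scores, class_names, k=3):
    """Return the k class names with the highest predicted probability."""
    idx = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    return [class_names[i] for i in idx]

# Hypothetical per-class sigmoid outputs of a multi-label classifier for one image.
class_names = ["couple", "beach", "sea", "dog", "mountain"]
scores = np.array([0.91, 0.85, 0.62, 0.08, 0.11])

print(top_k_labels(scores, class_names, k=3))  # ['couple', 'beach', 'sea']
```

In the multi-label setting each class score is independent (e.g. a per-class sigmoid), so several labels can be returned for one image; the single-label model described later would instead take the top-k of a single softmax distribution.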
It should be noted that, in an exemplary embodiment of this application, the first attribute of the image may be person information.
Next, this application uses the above-trained multi-label classification model as a pre-trained model and fine-tunes a single-label model on the ImageNet-ILSVRC2012 data set (which has 1000 labels) to obtain a single-label image classification model: after a picture I is input into this model, the output is the most likely top-k labels corresponding to picture I [formula image: the set of top-k predicted labels].
In addition, the ImageNet-ILSVRC2012 data set lacks semantic labels related to people, while selfies and landscape photos containing people often appear in the task scenario of composing poems from pictures. An exemplary embodiment of this application therefore expands the tags corresponding to image keywords so that the image attribute tags obtained by the multi-label classification model adequately match the relevant image information. That is, in this embodiment, the multi-label classification model is used to predict the person-related keyword tags in a picture [formula image: the set of person-related keyword tags].
The expansion of the multi-label classification model is embodied in the processing of its labels. The labels of the ML-Images data set are based on the WordNet hierarchy, which is organized around synonym sets (synsets) as its basic building blocks, with synsets linked by a number of relation types, including hypernym/hyponym relations, part-whole relations, inheritance relations, and so on. In this embodiment, using the hypernym/hyponym relations between synsets, the hypernyms of each label in the data set are looked up; if the hypernyms contain the word "person", the label is treated as a person-related keyword tag. This processing not only filters out the person-related labels among the candidate predicted labels, but also enriches the imagery of the keywords because the labels carry richer semantics. For example, for a picture whose content is a man and a woman holding hands on a beach by the sea, with candidate labels "couple, seaside, beach", the filtered word "couple" is a person-related keyword tag; compared with an existing object detection model that only outputs "male" or "female" labels, it has richer semantic properties.
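The hypernym lookup described above can be sketched with a toy WordNet-style hierarchy. The miniature hypernym map below is invented for illustration; the patent uses the full WordNet synset graph underlying ML-Images:

```python
# Toy hypernym map: each label points to its parent concept (hypernym).
HYPERNYMS = {
    "couple": "person",
    "fisherman": "person",
    "person": "entity",
    "beach": "geological_formation",
    "geological_formation": "entity",
}

def is_person_related(label, hypernyms=HYPERNYMS):
    """Walk up the hypernym chain; True if 'person' is the label or one of its ancestors."""
    while label in hypernyms:
        if label == "person":
            return True
        label = hypernyms[label]
    return label == "person"

# Keep only the person-related candidate labels.
print([t for t in ["couple", "beach", "fisherman"] if is_person_related(t)])
# ['couple', 'fisherman']
```

This is why "couple" survives the filter in the example above: its hypernym chain passes through "person", while "beach" climbs to "geological_formation" and is discarded.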
Step S130: input the image information into a single-label classification model to obtain keyword candidate tags of the second attribute of the target image.
Here, the single-label classification model refers to the model obtained by taking the above-trained multi-label classification model as a pre-trained model and fine-tuning a single-label model on the ImageNet-ILSVRC2012 data set (which has 1000 labels); after a picture I is input into this model, the output is the most likely top-k labels corresponding to picture I [formula image: the set of top-k predicted labels].
Step S140: map the keyword tags of the first attribute with the candidate keyword tags of the second attribute to obtain image keyword tag information.
In an embodiment of this application, the keyword tags of the first attribute and the candidate tags of the second attribute of the image are mapped to keywords using the Shixue Hanying (《诗学含英》) dictionary. The dictionary consists of 1016 topic words and their corresponding ancient-poem words: on each space-separated line, the first word is the topic word, generally expressed in modern Chinese, and the remaining words on the line are words from ancient poems corresponding to that topic. This application therefore applies a pre-trained word2vec model to embed both the merged predicted labels [formula image: the set of merged predicted labels] and the topic words [formula image: the set of topic words], and then computes the word similarity between each predicted label and each topic word:

[formula image: the word-similarity function over the two embedding vectors]

where [formula image: the two embedding vectors] are the word embedding vectors of the label and the topic word, respectively, and d denotes the vector dimension. With the similarity threshold set to δ = 0.6, keywords are selected in this application as follows:
When there are t (t ≥ 1) similarities satisfying similarity ≥ δ, the topic word with the maximum similarity is selected, and one word is randomly chosen from its corresponding ancient-poem words and mapped as the keyword.
For the small number of labels whose similarity is below the threshold (similarity < δ), the mapping is performed manually according to the Shixue Hanying dictionary.
Finally, K1 keywords are selected from the mapped keyword set as the keyword tags of the image information and input into the poetry generation model for poem composition.
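A minimal sketch of the mapping step, assuming cosine similarity as the word-similarity measure and using invented 3-dimensional embeddings and a two-topic stand-in for the Shixue Hanying dictionary (the actual system uses pre-trained word2vec vectors, 1016 topics, and the similarity function shown in the formula above):

```python
import random
import numpy as np

random.seed(0)

# Hypothetical embeddings (the patent uses pre-trained word2vec vectors)
# and a miniature topic dictionary standing in for Shixue Hanying.
EMB = {
    "couple": np.array([0.9, 0.1, 0.0]),
    "love":   np.array([0.8, 0.2, 0.1]),
    "moon":   np.array([0.0, 0.9, 0.3]),
}
TOPICS = {"love": ["相思", "鸳鸯"], "moon": ["明月", "婵娟"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def map_label(label, delta=0.6):
    """Map a predicted label to an ancient-poem word via its most similar topic word."""
    sims = {t: cosine(EMB[label], EMB[t]) for t in TOPICS}
    best = max(sims, key=sims.get)
    if sims[best] >= delta:
        return random.choice(TOPICS[best])  # random poem word of the best topic
    return None  # below threshold: would require manual mapping

print(map_label("couple"))  # one of the poem words of topic "love"
```

For "couple" the most similar topic is "love" (cosine well above δ = 0.6), so one of that topic's poem words is returned; a label whose best similarity falls below δ would return None and be mapped by hand.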
For generating keywords from images, this application merges the output labels of the single-label classification model and the multi-label classification model, and then obtains the final keywords through the mapping relations of the Shixue Hanying dictionary. Keywords generated in this way correspond to the picture and are associated as closely as possible with the picture content and the semantics represented by the objects in the picture. Since the poem generation model of this solution is directly conditioned on the input keywords, the accuracy of the keywords also ensures the association between the generated poem and the image.
Step S150: input the image keyword tag information into the poetry generation model and generate poem content based on the model memory matrix.
The poetry generation model generates classical poems by feeding the image information into a sequence-to-sequence structure based on the model memory matrix.
The model memory matrix includes a keyword memory matrix M1 ∈ R^(K1×dm), a current memory matrix M2 ∈ R^(K2×dm), and a historical memory matrix M3 ∈ R^(K3×dm). The keyword memory matrix stores the image keyword tag information, while the current memory matrix and the historical memory matrix store information about verses that have already been generated. Each row of the model memory matrix represents a memory segment, dm denotes the size of a memory segment, and K2 and K3 denote the segment lengths of the current memory and the historical memory, respectively. The memory of the entire model memory matrix is the concatenation of the three memories, M = [M1; M2; M3] ∈ R^(K×dm), where [;] denotes matrix concatenation and K = K1 + K2 + K3.
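The layout of the model memory matrix can be sketched as follows; the segment lengths K1, K2, K3 and the segment size dm are hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical sizes: K1 keyword slots, K2 current-memory slots,
# K3 history slots, each memory segment of size d_m.
K1, K2, K3, d_m = 4, 7, 7, 512

M1 = np.zeros((K1, d_m))  # keyword memory: fixed for the whole poem
M2 = np.zeros((K2, d_m))  # current memory: states of the previous line
M3 = np.zeros((K3, d_m))  # history memory: salient states of earlier lines

M = np.concatenate([M1, M2, M3], axis=0)  # M = [M1; M2; M3]
assert M.shape == (K1 + K2 + K3, d_m)     # K = K1 + K2 + K3
print(M.shape)  # (18, 512)
```

Row-wise concatenation matches the [;] notation in the text: each of the K = K1 + K2 + K3 rows of M is one memory segment of size dm.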
During model learning and generation, the keyword memory matrix M1 is composed of the hidden-layer state information of all the keywords [formula image: the set of keywords] and remains unchanged throughout the poem generation process. The model consults the keyword memory when generating each line of verse: a memory reading function determines the probability of each memory segment being selected, and this probability is used as the weight in the weighted sum of memory segments when computing the memory information finally read out. Before the i-th verse line Li is generated, the hidden-layer state corresponding to each character of the previous line Li-1 is written into the current memory matrix M2. In classical Chinese poetry, adjacent lines tend to have strong semantic associations, so this application saves the information of Li-1 into the current memory to provide complete recent memory information. Here, Li is the information of the generated i-th verse line, and i is an integer greater than 2.
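A minimal sketch of writing the previous line's states into the current memory, assuming the hidden states of line Li-1 are given as an array of per-character vectors and the current memory has a fixed segment length K2 (the zero-padding/truncation policy is an assumption for illustration, not stated in the patent):

```python
import numpy as np

K2, d_m = 7, 8  # hypothetical current-memory length and segment size

def write_current_memory(prev_line_states, K2=K2, d_m=d_m):
    """Overwrite the current memory M2 with the hidden states of line L_{i-1},
    zero-padding (or truncating) to the fixed segment length K2."""
    M2 = np.zeros((K2, d_m))
    n = min(len(prev_line_states), K2)
    M2[:n] = prev_line_states[:n]
    return M2

# Hypothetical hidden states for a 5-character previous line.
states = np.random.default_rng(0).normal(size=(5, d_m))
M2 = write_current_memory(states)
print(M2.shape, np.count_nonzero(M2[5:]))  # (7, 8) 0
```

Rewriting M2 in full before each new line reflects the text above: the current memory always holds exactly the states of the immediately preceding line.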
Unlike the other two parts, when saving historical memory information, the model selects some salient model-state information from the historical lines L_{1:i-2} to write; in this way the historical memory matrix M_3 retains long-range historical memory information.
In addition, constructing the model memory matrix may specifically include, but is not limited to, the following steps:
Step S1501: Embed the image keyword tags into the keyword matrix of the memory entity as keyword tag information.
When the image information is processed by the image classification model, the multi-label model expands the predicted labels with person-related words, and these expanded words carry rich semantic meaning; the final output keywords are therefore not only relevant to the image but also carry the semantic characteristics of the image representation. Exploiting this property, this application stores each keyword embedding learned by the poetry model independently in the model's memory, so that within the sequence-to-sequence structure the model can, during decoding, select the key parts of the keyword information according to the model state and global information, thereby ensuring the relevance between the image keyword information fed into the poetry model and the image.
Step S1502: Read, from the keyword matrix through the memory reading function, the keyword tag information associated with generating the i-th line of verse. This step may be implemented as follows:
a. Based on the attention mechanism, the memory reading function α_r = A_r(M, query) is used to determine the probability that each memory segment M[k,:] is selected:
z_k = v^T σ(M[k,:], q),
α_r[k,:] = softmax(z_k),
where query denotes the current state information of the model; in the formula z_k = v^T σ(M[k,:], q), z_k is a relevance variable computed from the memory segment M and the model's current state query, and v^T denotes model parameters; in the formula α_r[k,:] = softmax(z_k), α_r[k,:] denotes the probability that memory segment M[k,:] is selected.
b. Next, the weighted average of the memory segments, with each segment weighted by its selection probability, is computed to obtain the vector o_t of memory information read from the model memory matrix, which determines whether the character generated at time t of line L_i relates to the input image keywords or continues the content of the historical lines:
α_r = A_r(M, [s_{t-1}; v_{i-1}]),
o_t = Σ_k α_r[k,:] × M[k,:],
In the formula α_r = A_r(M, [s_{t-1}; v_{i-1}]), the query vector is formed by concatenating the decoder's hidden-layer vector s_{t-1} with the global tracking vector v_{i-1}; v_{i-1} is used here to prevent the model from reading redundant content.
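A minimal numpy sketch of this attention-style memory read (the scoring function σ is assumed here to be a tanh layer over the concatenated segment and query, and all weights and sizes are random placeholders, not the application's parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d_m, d_q = 17, 8, 12          # hypothetical memory and query sizes
M = rng.normal(size=(K, d_m))    # model memory matrix
query = rng.normal(size=d_q)     # stand-in for [s_{t-1}; v_{i-1}]

# z_k = v^T sigma(M[k,:], q): score each memory segment against the query
W = rng.normal(size=(d_m + d_q, d_m))
v = rng.normal(size=d_m)
z = np.array([v @ np.tanh(np.concatenate([M[k], query]) @ W) for k in range(K)])

# alpha_r = softmax(z): probability that each segment is selected
alpha_r = np.exp(z - z.max())
alpha_r /= alpha_r.sum()

# o_t: selection-probability-weighted sum of the memory segments (the read vector)
o_t = alpha_r @ M
print(o_t.shape)  # (8,)
```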
Step S1503: Fill the hidden-state information corresponding to the character information of line i-1 of the poem into the current memory matrix as the current memory information.
Before the i-th line L_i is generated, the hidden-layer state information corresponding to each character of the previous line L_{i-1} is written into the current memory matrix M_2. In classical Chinese poetry, adjacent lines tend to have a strong semantic connection, so this application saves the information of L_{i-1} in the current memory to provide complete recent-memory information. The current memory information is the model's input at the current time step and represents the semantic features of that input; during text generation, this application trains the model to select among these semantic features and thereby learn the grammar and prosody of the classical-poetry corpus.
Step S1504: Compute, through the memory writing function, the hidden-state information corresponding to the character information of line i-2 of the poem, and fill the computed information into the historical memory matrix.
This step may be implemented as follows:
When saving historical memory information, the model selects some salient model-state information from the historical lines L_{1:i-2} for writing; in this way the historical memory matrix M_3 retains long-range historical memory information.
After line L_i has been generated and before line L_{i+1} is generated, each character of the historical line L_{i-1} has a corresponding hidden state h_t after passing through the encoder; the memory writing function computes, for each such state, a historical memory slot, and the state h_t is then filled into the historical memory matrix:
α_w = A_w(M_3, [h_t; v_{i-1}]),
β = tanh(γ × (α_w - 1 × max(α_w))) + 1,
M_3[k,:] := (1 - β[k,:]) × M_3[k,:] + β[k,:] × h_t,
where, in the formula α_w = A_w(M_3, [h_t; v_{i-1}]), the function A_w is computed in the same way as in the formula α_r = A_r(M, query), and α_w denotes the probability of writing the hidden state h_t into each memory segment;
in the formula β = tanh(γ × (α_w - 1 × max(α_w))) + 1, 1 denotes a vector whose elements are all 1, and γ denotes an empirically chosen positive threshold. The formula is differentiable, and through it the hidden state h_t can be filled into the historical memory slots that have a higher write probability.
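The differentiable write step can be sketched as follows (sizes, γ, and the stand-in write probabilities α_w are illustrative assumptions; note that β equals exactly 1 at the slot where α_w attains its maximum, so that slot is overwritten with h_t, while slots with low write probability are left nearly unchanged):

```python
import numpy as np

rng = np.random.default_rng(1)
K3, d_m = 6, 8                    # hypothetical sizes
M3 = rng.normal(size=(K3, d_m))   # historical memory matrix
h_t = rng.normal(size=d_m)        # encoder hidden state to be written

alpha_w = rng.random(K3)          # stand-in for A_w(M3, [h_t; v_{i-1}])
alpha_w /= alpha_w.sum()
gamma = 10.0                      # empirically chosen positive threshold

# beta = tanh(gamma * (alpha_w - 1 * max(alpha_w))) + 1, with 1 the all-ones vector
beta = np.tanh(gamma * (alpha_w - np.ones(K3) * alpha_w.max())) + 1

# M3[k,:] := (1 - beta[k]) * M3[k,:] + beta[k] * h_t
M3 = (1 - beta)[:, None] * M3 + beta[:, None] * h_t
```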
Step S1505: Read the keyword matrix and the current memory matrix and historical memory matrix information of line L_i, and generate line L_i through the sequence-to-sequence structure.
The i-th line of verse is generated based on the information L_{1:i-1} of the preceding i-1 lines and the image keyword information
Figure PCTCN2020092724-appb-000019
so that a poem is generated line by line, yielding verses strongly associated with the picture, with strong correlation between the lines.
For generating classical poems from keywords, this application adopts a sequence-to-sequence neural network model with a memory mechanism. The model's memory consists of three segments: keyword memory information, historical memory information, and current memory information. The characteristics of the model memory are described as follows:
Because the multi-label model in the first stage expands the predicted labels with person-related words, and these expanded words carry rich semantic meaning, the final output keywords are not only relevant to the image but also carry the semantic characteristics of the image representation. Exploiting this property, this application stores each keyword embedding learned by the model independently in the model's memory, so that during decoding the model can select the key parts of the keyword information according to the model state and global information; in this way the key information in the image is also used indirectly.
At the same time, by dynamically reading and writing the model's historical memory, the model memory retains only limited historical information. This differs from existing methods, which retain all state information produced during generation. This approach requires the model to be trained to focus on information closely related to poem generation while filtering out and ignoring interfering information produced during generation, further ensuring the coherence between the generated lines.
Finally, the current memory information is the model's input at the current time step and represents the semantic features of that input; during text generation, this solution trains the model to select among these semantic features and thereby learn the grammar and prosody of the classical-poetry corpus.
During generation, the model must selectively read the memory information and selectively update it. This also differs from existing patented methods, which feed only the keyword information and the preceding lines into the model decoder. The key steps are as follows:
When generating a line of verse, the poem-generation model of this solution progressively learns to read from memory the keyword information and memory information most relevant to that line, to guide the generation of the current line. This reading is an implementation of an attention mechanism: through training, the model learns to find, when generating each character of the verse, the information in memory that most deserves attention, for example choosing whether to generate a character associated with the keywords or a character semantically coherent with the historical lines.
After generating a line of verse, the model is trained to pick out the parts of its current memory state that contributed most to generating that line and to write them into the model's historical memory, completing the update of the model's overall memory. This writing is likewise an implementation of an attention mechanism: through dynamic reading and writing of the historical memory, the model is trained to focus on information relevant to verse generation and to ignore interfering information produced during generation, thereby ensuring coherence between the lines.
After the model memory matrix is constructed, the image information of the target image is input into the poem-generation model, and the entire poem associated with the image is generated according to the poem-generation method of the model memory matrix. This step may be implemented as follows:
In this application, each classical poem is generated with a sequence-to-sequence structure consisting of an encoder and a decoder. A bidirectional gated recurrent unit (GRU) is used as the model's encoder, and a unidirectional GRU as the model's decoder. The encoder input X is a line of verse L_{i-1}, X = (x_1, x_2, ..., x_{L_enc}), where L_enc denotes the maximum input length allowed by the encoder; the decoder output Y is likewise a line of verse L_i, Y = (y_1, y_2, ..., y_{L_dec}), where L_dec denotes the maximum output length allowed by the decoder.
h_t and s_t denote the hidden-layer states of the encoder and the decoder, respectively, and e(y_t) denotes the word-embedding vector of the character y_t output by the decoder at time t. The model computes the probability distribution over the character generated at time t of line L_i:
s_t = GRU(s_{t-1}, [e(y_{t-1}); o_t; p_t; v_{i-1}])
Figure PCTCN2020092724-appb-000020
In the formula s_t = GRU(s_{t-1}, [e(y_{t-1}); o_t; p_t; v_{i-1}]), o_t denotes the output vector read from the model memory matrix, and p_t is the concatenation vector of the prosody embedding and the line-length embedding learned by the model during training.
In the formula
Figure PCTCN2020092724-appb-000021
W denotes the model parameters to be trained. The formula indicates that, when learning to generate line L_i, the model must refer to the information y_{1:t-1} already generated in that line, the information L_{1:i-1} already generated in the preceding lines, and the keyword information
Figure PCTCN2020092724-appb-000022
v_{i-1} denotes the global tracking vector, whose initial value is an all-zero vector; it records the content information generated up to the current line L_{i-1} and provides global information to the model when generating the next line L_i.
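One decoder step can be sketched as follows (a from-scratch numpy GRU cell with random weights; the sizes, the composition of the input vector, and the softmax over a linear map of s_t as the output distribution are illustrative assumptions, not the application's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(2)
d_s, d_in, V = 16, 32, 50  # hidden size, input size, vocab size (hypothetical)

def gru_cell(s_prev, x, p):
    """One GRU step: update gate z, reset gate r, candidate state."""
    Wz, Wr, Wh = p
    zx = np.concatenate([s_prev, x])
    z = 1 / (1 + np.exp(-(Wz @ zx)))                  # update gate
    r = 1 / (1 + np.exp(-(Wr @ zx)))                  # reset gate
    h = np.tanh(Wh @ np.concatenate([r * s_prev, x])) # candidate state
    return (1 - z) * s_prev + z * h

params = tuple(rng.normal(scale=0.1, size=(d_s, d_s + d_in)) for _ in range(3))
W_out = rng.normal(scale=0.1, size=(V, d_s))

# input at step t: stand-in for the concatenation [e(y_{t-1}); o_t; p_t; v_{i-1}]
x_t = rng.normal(size=d_in)
s_t = gru_cell(np.zeros(d_s), x_t, params)

# distribution over the next character, assumed here to be softmax(W_out @ s_t)
logits = W_out @ s_t
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(round(probs.sum(), 6))  # 1.0
```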
Finally, after the generation of line L_i is finished, the global tracking vector needs to be updated using an ordinary recurrent neural network (RNN):
Figure PCTCN2020092724-appb-000023
In the formula
Figure PCTCN2020092724-appb-000024
σ denotes the nonlinear layer mapping function.
The above model memory matrix is trained so as to maximize the log likelihood of the training data, and the entire poem associated with the image is output from the sequence-to-sequence structure:
Figure PCTCN2020092724-appb-000025
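As an illustration of this training objective, maximizing the log likelihood of the training data is equivalent to minimizing the summed negative log probability that the model assigns to the gold characters (the per-step distributions below are made-up numbers, not model outputs):

```python
import numpy as np

# per-step model distributions over a 5-character vocabulary (illustrative)
probs = np.array([
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.2, 0.6, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.05, 0.05],
])
targets = [0, 1, 2]  # gold character indices y_1..y_3

log_likelihood = sum(np.log(probs[t, y]) for t, y in enumerate(targets))
loss = -log_likelihood  # training minimizes this negative log likelihood
print(round(loss, 4))  # 1.2242
```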
This application adopts a sequence-to-sequence neural network model with a memory mechanism. The model is trained to attend effectively to, and select, the important information in the keywords and the key information in the historically generated lines, while filtering out and ignoring the interfering parts of the information. When generating a line of verse, the memory information is read to find what the model most needs to attend to when generating a single character, such as emphasizing the association with the keywords or coherence with the historical lines, thereby guiding the generation of each character. After a line has been generated, the historical memory information is updated to ensure that the model writes the state information relevant to verse generation into memory. By dynamically reading and writing the model memory, the relevance of the generated poem to the keywords, and the coherence between the lines, are ultimately guaranteed.
It should be emphasized that, to further ensure the privacy and security of the above multi-label classification model, single-label model, and poem-generation model, these models may also be stored in a node of a blockchain. Blockchain storage enables data to be shared between different platforms and prevents the data from being tampered with.
A blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
Fig. 4 shows an image-based auxiliary writing apparatus 300 according to another exemplary embodiment, including:
an image information acquisition module 310, configured to acquire image information of a target image;
a first-attribute keyword module 320, configured to input the image information into the multi-label classification model to obtain keyword tags of a first attribute of the image;
a second-attribute keyword module 330, configured to further input the image information into the single-label classification model to obtain candidate tags of a second attribute of the image;
a keyword mapping module 340, configured to map the keyword tags of the first attribute to the candidate keyword tags of the second attribute to obtain image keyword tag information; and
a poem generation module 350, configured to read the keyword matrix and the current memory matrix and historical memory matrix information of the i-th line, and to generate the i-th line of verse through the sequence-to-sequence structure.
The multi-label classification model extracts person-related keyword tag information from the image, and the multi-label model expands the predicted labels with person-related words; because these expanded words carry rich semantic meaning, the final output keywords are not only relevant to the image but also carry the semantic characteristics of the image representation. This guarantees the relevance between the extracted person keyword information and the image information. The image-attribute keyword tag information obtained from the image classification models is then mapped to select the image keywords that have a definite relevance to the image, ensuring that the image-attribute keyword parameters initially fed into the poetry model are strongly relevant.
In addition, in the model memory matrix, the historical memory is read and written dynamically so that the model memory retains only limited historical information, which differs from existing methods that retain all state information produced during generation. In this approach, the model is trained to focus on information closely related to poem generation while filtering out and ignoring interfering information, further ensuring coherence between the generated lines. The user's needs can thus be met: the generated verses are strongly associated with the target image, and the coherence between lines is enhanced.
An electronic device 400 according to this embodiment of the present application is described with reference to Fig. 5. The electronic device 400 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 5, the electronic device 400 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 410, the at least one storage unit 420, and a bus 430 connecting the different system components (including the storage unit 420 and the processing unit 410).
The storage unit stores program code executable by the processing unit 410, so that the processing unit 410 performs the steps of the various exemplary embodiments of the present application described in the "Exemplary Method" section of this specification. For example, the processing unit 410 may perform step S110 shown in Fig. 1: acquiring image information of a target image (the image information may include various kinds of feature information, such as people, landscapes, or animals); step S120: inputting the image information into the multi-label model to obtain keyword tags of a first attribute of the target image; step S130: inputting the image information into the single-label classification model to obtain keyword candidate tags of a second attribute of the target image; step S140: mapping the keyword tags of the first attribute to the candidate keyword tags of the second attribute to obtain image keyword tag information; and step S150: inputting the image keyword tag information into the poem-generation model and generating poem content based on the model memory matrix.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 4201 and/or a cache 4202, and may further include a read-only memory (ROM) 4203.
The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205, including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage-unit bus or storage-unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 400 may also communicate with one or more external devices 600 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router or modem) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 450. The electronic device 400 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 460. As shown in the figure, the network adapter 460 communicates with the other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实 施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、终端装置、或者网络设备等)执行根据本申请实施方式的方法。Through the description of the above embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or can be implemented by combining software with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network , Including several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present application.
在本申请的示例性实施例中,还提供了一种计算机可读存储介质,其上存储有能够实现本说明书上述方法的程序产品。其中,该计算机可读存储介质可以是非易失性,也可以是易失性。在一些可能的实施方式中,本申请的各个方面还可以实现为一种程序产品的形式,其包括程序代码,当所述程序产品在终端设备上运行时,所述程序代码用于使所述终端设备执行本说明书上述“示例性方法”部分中描述的根据本申请各种示例性实施方式的步骤。In the exemplary embodiment of the present application, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. Wherein, the computer-readable storage medium may be non-volatile or volatile. In some possible implementation manners, each aspect of the present application can also be implemented in the form of a program product, which includes program code. When the program product runs on a terminal device, the program code is used to make the The terminal device executes the steps according to various exemplary embodiments of the present application described in the above-mentioned "Exemplary Method" section of this specification.
参考图6所示,描述了根据本申请的实施方式的用于实现上述方法的程序产品500,其可以采用便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在终端设备,例如个人电脑上运行。然而,本申请的程序产品不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。Referring to FIG. 6, a program product 500 for implementing the above-mentioned method according to an embodiment of the present application is described. It can adopt a portable compact disk read-only memory (CD-ROM) and include program code, and can be installed in a terminal device, For example, running on a personal computer. However, the program product of this application is not limited to this. In this document, the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
所述程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以为但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。The program product can use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable Type programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。The computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、有线、光缆、RF等等,或者上述的任意合适的组合。The program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
可以以一种或多种程序设计语言的任意组合来编写用于执行本申请操作的程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。The program code used to perform the operations of the present application can be written in any combination of one or more programming languages. The programming languages include object-oriented programming languages—such as Java, C++, etc., as well as conventional procedural programming languages. Programming language-such as "C" language or similar programming language. The program code can be executed entirely on the user's computing device, partly on the user's device, executed as an independent software package, partly on the user's computing device and partly executed on the remote computing device, or entirely on the remote computing device or server Executed on. In the case of a remote computing device, the remote computing device can be connected to a user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, using Internet service providers). Business to connect via the Internet).
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链（Blockchain），本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性（防伪）和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain can include a blockchain underlying platform, a platform product service layer, and an application service layer.
此外,上述附图仅是根据本申请示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。In addition, the above-mentioned drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present application, and are not intended for limitation. It is easy to understand that the processing shown in the above drawings does not indicate or limit the time sequence of these processings. In addition, it is easy to understand that these processes can be executed synchronously or asynchronously in multiple modules, for example.
应当理解的是，本申请并不局限于上面已经描述并在附图中示出的精确结构，并且可以在不脱离其范围的情况下执行各种修改和改变。本申请的范围仅由所附的权利要求来限制。It should be understood that the present application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (20)

  1. 一种基于图像的辅助写作方法,其中,包括:An image-based auxiliary writing method, which includes:
    获取目标图像的图像信息;Acquiring image information of the target image;
    将所述图像信息输入多标签分类模型中，得到所述图像的第一属性的关键词标签；Inputting the image information into a multi-label classification model to obtain a keyword tag of the first attribute of the image;
    将所述图像信息输入单标签分类模型中,得到所述图像的第二属性的候选关键词标签;所述第二属性与所述第一属性为所述目标图像中不同信息的特征;Inputting the image information into a single-label classification model to obtain candidate keyword tags of the second attribute of the image; the second attribute and the first attribute are features of different information in the target image;
    将所述第一属性的关键词标签与所述第二属性的候选关键词标签进行映射,得到图像关键词标签信息;Mapping the keyword tag of the first attribute with the candidate keyword tag of the second attribute to obtain image keyword tag information;
    将所述图像关键词标签信息输入至诗歌生成模型中，基于模型记忆矩阵生成诗词内容，所述模型记忆矩阵包括关键词矩阵、当前记忆矩阵及历史记忆矩阵；所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息。Inputting the image keyword tag information into a poetry generation model, and generating poem content based on a model memory matrix, where the model memory matrix includes a keyword matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses.
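Read as a pipeline, the four steps of this claim can be sketched as follows; the two classifiers and the mapping step are stubs with invented names and outputs, standing in for the trained models described in the claim, not the patent's actual implementation.

```python
# Illustrative sketch of the claimed pipeline (claim 1). All names and the
# stub outputs below are hypothetical, not the patent's actual models.
def multi_label_classify(image):
    return ["mountain", "river"]        # first-attribute keyword tags (stub)

def single_label_classify(image):
    return ["landscape"]                # second-attribute candidate tags (stub)

def map_tags(keyword_tags, candidate_tags):
    # stand-in for the word2vec-based mapping step of claim 4
    return sorted(set(keyword_tags + candidate_tags))

def image_keywords(image):
    return map_tags(multi_label_classify(image), single_label_classify(image))

print(image_keywords(None))  # these keywords would then seed the poem model
```

In a real system the returned keywords would be written into the keyword memory matrix of the poetry generation model rather than printed.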
  2. 根据权利要求1所述的方法，其中，所述模型记忆矩阵存储于所述诗歌生成模型中，所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息，包括：The method according to claim 1, wherein the model memory matrix is stored in the poetry generation model, the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses, including:
    将所述图像关键词标签嵌入所述模型记忆矩阵的关键词矩阵中作为关键词标签信息;Embedding the image keyword tags into the keyword matrix of the model memory matrix as keyword tag information;
    从所述关键词矩阵中通过记忆读取函数读取与生成第i行诗句信息关联的关键词标签信息;Read the keyword tag information associated with the generation of the ith line verse information from the keyword matrix through a memory reading function;
    将第i-1行诗句中字符信息对应的隐藏状态信息填充到所述模型记忆矩阵的当前记忆矩阵中作为当前记忆信息;Filling the hidden state information corresponding to the character information in the i-1th line of the poem into the current memory matrix of the model memory matrix as the current memory information;
    将第i-2行诗句中字符信息对应的隐藏状态信息通过记忆写入函数计算后，将计算后的信息填充到所述模型记忆矩阵的历史记忆矩阵中；Computing the hidden state information corresponding to the character information in the (i-2)-th line of the poem with the memory write function, and filling the computed information into the historical memory matrix of the model memory matrix;
    读取所述关键词矩阵以及所述第i行诗句的当前记忆矩阵以及历史记忆矩阵信息通过诗歌生成模型生成所述第i行诗句。Read the keyword matrix and the current memory matrix and historical memory matrix information of the i-th line of verses to generate the i-th line of verses through a poetry generation model.
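The per-line memory bookkeeping of this claim can be sketched as a loop, assuming an unbounded history and a caller-supplied generator in place of the encoder-decoder; every name here is illustrative.

```python
# Hedged sketch of claim 2's bookkeeping: when line i is generated, the
# "current" memory holds line i-1 and the "history" memory holds lines up
# to i-2. gen_line is a stub for the actual memory-augmented model.
def generate_poem_lines(keywords, n_lines, gen_line):
    current, history = "", []
    lines = []
    for i in range(n_lines):
        lines.append(gen_line(keywords, current, history))
        if current:                  # the previous line moves into history
            history.append(current)
        current = lines[-1]          # the new line becomes the current memory
    return lines
```

In the patent the "lines" stored would be per-character hidden states rather than raw text, but the rotation of current and historical memory is the same.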
  3. 根据权利要求1所述的方法,其中,还包括:The method according to claim 1, further comprising:
    所述多标签分类模型是基于多标签数据集ML-Images进行多标签图像分类任务的模型训练得到的;The multi-label classification model is obtained by model training for multi-label image classification tasks based on the multi-label data set ML-Images;
    所述单标签模型是基于ImageNet-ILSVRC2012数据集在所述多标签分类模型上进行单标签模型的微调训练得到的。The single-label model is obtained by fine-tuning the single-label model on the multi-label classification model based on the ImageNet-ILSVRC2012 data set.
  4. 根据权利要求1所述的方法,其中,将所述第一属性关键词标签与所述第二属性候选关键词标签进行映射,得到图像关键词标签信息,包括:The method according to claim 1, wherein mapping the first attribute keyword tag and the second attribute candidate keyword tag to obtain image keyword tag information includes:
    将所述第一属性的关键词标签和所述第二属性的候选标签输入预训练的word2vec模型进行词嵌入，计算每个所述第一属性的关键词标签和每个所述第二属性的候选标签的词语相似度；Inputting the keyword tags of the first attribute and the candidate tags of the second attribute into a pre-trained word2vec model for word embedding, and calculating the word similarity between each keyword tag of the first attribute and each candidate tag of the second attribute;
    当存在t个词语相似度大于等于预定阈值时，选择最大的词语相似度对应的关键词标签及候选标签作为图像主题词，并从所述图像主题词对应的古诗词语中随机选择一个词语映射为映射关键词标签；When there are t word similarities greater than or equal to a predetermined threshold, selecting the keyword tag and candidate tag corresponding to the largest word similarity as the image topic word, and randomly selecting one word from the classical-poetry words corresponding to the image topic word as the mapped keyword tag;
    当存在t个词语相似度小于预定阈值时,根据预设关键词词典,对t个所述关键词标签及候选标签进行映射,并选择出一个词语映射为映射关键词标签;When there are t words with similarity less than a predetermined threshold, map the t keyword tags and candidate tags according to the preset keyword dictionary, and select one word to map as the mapped keyword tag;
    基于所述映射关键词标签,从所述映射关键词标签中选择多个所述映射关键词标签作为输入至模型记忆矩阵中的图像关键词标签信息。Based on the mapped keyword tags, a plurality of the mapped keyword tags are selected from the mapped keyword tags as image keyword tag information input into the model memory matrix.
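A minimal sketch of the similarity-and-mapping step above, using toy two-dimensional vectors in place of pre-trained word2vec embeddings; the threshold, dictionaries, and vectors are invented for illustration and the fallback picks a dictionary entry for the best-scoring tag rather than reproducing the patent's exact selection rule.

```python
# Toy version of claim 4: score every (keyword tag, candidate tag) pair by
# cosine similarity, then map via the poetry-word dictionary if the best
# score clears the threshold, otherwise via the preset fallback dictionary.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def map_keywords(tag_vecs, cand_vecs, threshold, poem_dict, fallback_dict):
    scored = [(cosine(tv, cv), t, c)
              for t, tv in tag_vecs.items()
              for c, cv in cand_vecs.items()]
    best_sim, best_tag, _ = max(scored)
    if best_sim >= threshold:
        return poem_dict[best_tag]      # classical-poetry word for the topic
    return fallback_dict[best_tag]      # preset keyword-dictionary mapping

print(map_keywords({"mountain": [1.0, 0.1]}, {"peak": [0.9, 0.2]},
                   0.8, {"mountain": "青山"}, {"mountain": "山"}))
```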
  5. 根据权利要求2所述的方法，其中，从所述关键词矩阵中通过记忆读取函数读取与生成第i行诗句信息关联的关键词标签信息包括：The method according to claim 2, wherein reading, from the keyword matrix through a memory reading function, the keyword tag information associated with generating the i-th line of verse includes:
    根据记忆读取函数，确定每个记忆片段被选中的概率，并以该概率对各记忆片段加权平均，得到从所述模型记忆矩阵中读取的记忆信息向量，从而生成第i行诗句的第t时刻字符。According to the memory reading function, determining the probability of each memory segment being selected and taking the probability-weighted average of the segments to obtain the vector of memory information read from the model memory matrix, so as to generate the character at time t of the i-th line of verse.
  6. 根据权利要求1所述的方法,其中,所述将第i-2行诗句中字符信息对应的隐藏状态信息通过记忆写入函数计算后将计算后的信息填充到历史记忆矩阵中,包括:The method according to claim 1, wherein said calculating the hidden state information corresponding to the character information in the i-2th line of the poem through the memory write function and then filling the calculated information into the historical memory matrix comprises:
    在第i-1行诗句生成后并且在第i行诗句生成前，将历史时刻的第i-2行诗句的每个字符经过编码器后对应的隐藏状态，通过记忆写入函数计算后为其在所述历史记忆矩阵中选择一个记忆片段，然后将所述隐藏状态填入所述历史记忆矩阵中。After the (i-1)-th line of the poem is generated and before the i-th line is generated, the hidden state corresponding to each character of the (i-2)-th line after passing through the encoder is processed by the memory write function, a memory segment in the historical memory matrix is selected for it, and the hidden state is then filled into the historical memory matrix.
  7. 根据权利要求1所述的方法，其中，所述模型记忆矩阵包括关键词记忆矩阵M_1∈R^(K_1×d_m)、当前记忆矩阵M_2∈R^(K_2×d_m)以及历史记忆矩阵M_3∈R^(K_3×d_m)；The method according to claim 1, wherein the model memory matrix includes a keyword memory matrix M_1 ∈ R^(K_1×d_m), a current memory matrix M_2 ∈ R^(K_2×d_m), and a historical memory matrix M_3 ∈ R^(K_3×d_m);
    所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和历史记忆矩阵用于存储所述已生成的诗句的信息；The keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store the information of the generated verses;
    所述模型记忆矩阵的每一行表示一个记忆片段，d_m表示记忆片段的尺寸，K_2和K_3分别表示当前记忆和历史记忆的片段长度；所述模型记忆矩阵的记忆表示为三段记忆的拼接M=[M_1; M_2; M_3]，M∈R^(K×d_m)，其中[M_1; M_2; M_3]表示矩阵拼接，且K=K_1+K_2+K_3。Each row of the model memory matrix represents a memory segment, d_m denotes the size of a memory segment, and K_2 and K_3 denote the segment lengths of the current memory and the historical memory, respectively; the memory of the model memory matrix is expressed as the concatenation of the three memories, M = [M_1; M_2; M_3], M ∈ R^(K×d_m), where [M_1; M_2; M_3] denotes matrix concatenation and K = K_1 + K_2 + K_3.
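The three-segment concatenation M = [M_1; M_2; M_3] can be illustrated with NumPy; the sizes below are arbitrary examples, not values from the patent.

```python
# Row-wise concatenation of the three memory segments of claim 7.
import numpy as np

d_m, K1, K2, K3 = 8, 4, 2, 6                # example sizes only
M1 = np.zeros((K1, d_m))                    # keyword memory
M2 = np.zeros((K2, d_m))                    # current memory
M3 = np.zeros((K3, d_m))                    # history memory
M = np.concatenate([M1, M2, M3], axis=0)    # [M1; M2; M3]
print(M.shape)                              # (K1 + K2 + K3, d_m)
```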
  8. 根据权利要求5所述的方法，其中，所述记忆读取函数为：α_r = A_r(M, query)，所述每个记忆片段被选择的概率为α_r[k,:] = softmax(z_k)；The method according to claim 5, wherein the memory reading function is α_r = A_r(M, query), and the probability of each memory segment being selected is α_r[k,:] = softmax(z_k);
    其中，α_r表示所述记忆读取函数，query表示所述诗歌生成模型的状态信息，z_k表示根据每个记忆片段M[k,:]和所述诗歌生成模型的状态信息query计算的相关性程度变量，且z_k = v^T σ(M[k,:], q)，v^T表示所述诗歌生成模型的模型参数。Where α_r denotes the memory reading function, query denotes the state information of the poetry generation model, z_k denotes the degree of relevance between each memory segment M[k,:] and the state information query of the poetry generation model, z_k = v^T σ(M[k,:], q), and v^T denotes a model parameter of the poetry generation model.
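A plain-Python sketch of the softmax-weighted memory read described in claims 5 and 8: the relevance score here is a simple dot product standing in for v^T σ(M[k,:], q), so this illustrates the mechanism rather than the patent's exact scoring function.

```python
# Memory read: softmax over per-segment relevance scores, then a
# probability-weighted average of the segments gives the read vector.
import math

def memory_read(M, query):
    # relevance z_k of each segment (dot product as a stand-in score)
    z = [sum(m * q for m, q in zip(row, query)) for row in M]
    mx = max(z)                              # shift for numerical stability
    e = [math.exp(zk - mx) for zk in z]
    s = sum(e)
    alpha = [ek / s for ek in e]             # softmax selection probabilities
    d = len(M[0])
    return [sum(alpha[k] * M[k][j] for k in range(len(M))) for j in range(d)]

M = [[1.0, 0.0], [0.0, 1.0]]   # two toy memory segments
q = [1.0, 0.0]                 # toy query (model state)
print(memory_read(M, q))
```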
  9. 根据权利要求1-8任一项所述的方法,其中,还包括:The method according to any one of claims 1-8, further comprising:
    将所述单标签分类模型,所述多标签分类模型以及所述诗歌生成模型存储至区块链中。The single-label classification model, the multi-label classification model, and the poetry generation model are stored in a blockchain.
  10. 一种基于图像的辅助写作装置,其中,包括:An image-based auxiliary writing device, which includes:
    图像信息获取模块,用于获取目标图像的信息;The image information acquisition module is used to acquire the information of the target image;
    第一属性关键词模块,用于将图像信息输入多标签分类模型中,得到所述图像的第一属性的关键词标签;The first attribute keyword module is used to input the image information into the multi-label classification model to obtain the keyword label of the first attribute of the image;
    第二属性关键词模块,用于将图像信息输入单标签分类模型中,得到所述图像的第二属性的候选标签;所述第二属性与所述第一属性为目标图像中不同信息的特征信息;The second attribute keyword module is used to input image information into a single-label classification model to obtain candidate labels of the second attribute of the image; the second attribute and the first attribute are features of different information in the target image information;
    关键词映射模块,用于将所述第一属性的关键词标签与所述第二属性的候选标签进行映射,得到图像关键词标签信息;The keyword mapping module is used to map the keyword tags of the first attribute with the candidate tags of the second attribute to obtain image keyword tag information;
    古诗词生成模块，用于将所述图像关键词标签信息输入至诗歌生成模型中，基于模型记忆矩阵生成诗词内容，所述模型记忆矩阵包括关键词矩阵、当前记忆矩阵及历史记忆矩阵；所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息。The ancient poetry generation module is configured to input the image keyword tag information into a poetry generation model and generate poem content based on a model memory matrix, where the model memory matrix includes a keyword matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses.
  11. 一种电子设备,其中,包括:An electronic device, including:
    处理器;以及Processor; and
    存储器,用于存储所述处理器的可执行指令;A memory for storing executable instructions of the processor;
    其中,所述处理器配置为经由执行所述可执行指令来执行:Wherein, the processor is configured to execute by executing the executable instruction:
    获取目标图像的图像信息;Acquiring image information of the target image;
    将所述图像信息输入多标签分类模型中，得到所述图像的第一属性的关键词标签；Inputting the image information into a multi-label classification model to obtain a keyword tag of the first attribute of the image;
    将所述图像信息输入单标签分类模型中,得到所述图像的第二属性的候选关键词标签;所述第二属性与所述第一属性为所述目标图像中不同信息的特征;Inputting the image information into a single-label classification model to obtain candidate keyword tags of the second attribute of the image; the second attribute and the first attribute are features of different information in the target image;
    将所述第一属性的关键词标签与所述第二属性的候选关键词标签进行映射,得到图像关键词标签信息;Mapping the keyword tag of the first attribute with the candidate keyword tag of the second attribute to obtain image keyword tag information;
    将所述图像关键词标签信息输入至诗歌生成模型中，基于模型记忆矩阵生成诗词内容，所述模型记忆矩阵包括关键词矩阵、当前记忆矩阵及历史记忆矩阵；所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息。Inputting the image keyword tag information into a poetry generation model, and generating poem content based on a model memory matrix, where the model memory matrix includes a keyword matrix, a current memory matrix, and a historical memory matrix; the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses.
  12. 根据权利要求11所述的电子设备，其中，所述模型记忆矩阵存储于所述诗歌生成模型中，所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和所述历史记忆矩阵用于存储已生成的诗句的信息，所述处理器配置为经由执行所述可执行指令来执行：The electronic device according to claim 11, wherein the model memory matrix is stored in the poetry generation model, the keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store information about the already-generated verses, and the processor is configured to execute, by executing the executable instructions:
    将所述图像关键词标签嵌入所述模型记忆矩阵的关键词矩阵中作为关键词标签信息;Embedding the image keyword tags into the keyword matrix of the model memory matrix as keyword tag information;
    从所述关键词矩阵中通过记忆读取函数读取与生成第i行诗句信息关联的关键词标签信息;Read the keyword tag information associated with the generation of the ith line verse information from the keyword matrix through a memory reading function;
    将第i-1行诗句中字符信息对应的隐藏状态信息填充到所述模型记忆矩阵的当前记忆矩阵中作为当前记忆信息;Filling the hidden state information corresponding to the character information in the i-1th line of the poem into the current memory matrix of the model memory matrix as the current memory information;
    将第i-2行诗句中字符信息对应的隐藏状态信息通过记忆写入函数计算后，将计算后的信息填充到所述模型记忆矩阵的历史记忆矩阵中；Computing the hidden state information corresponding to the character information in the (i-2)-th line of the poem with the memory write function, and filling the computed information into the historical memory matrix of the model memory matrix;
    读取所述关键词矩阵以及所述第i行诗句的当前记忆矩阵以及历史记忆矩阵信息通过诗歌生成模型生成所述第i行诗句。Read the keyword matrix and the current memory matrix and historical memory matrix information of the i-th line of verses to generate the i-th line of verses through a poetry generation model.
  13. 根据权利要求11所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 11, wherein the processor is configured to execute via execution of the executable instructions:
    所述多标签分类模型是基于多标签数据集ML-Images进行多标签图像分类任务的模型训练得到的;The multi-label classification model is obtained by model training for multi-label image classification tasks based on the multi-label data set ML-Images;
    所述单标签模型是基于ImageNet-ILSVRC2012数据集在所述多标签分类模型上进行单标签模型的微调训练得到的。The single-label model is obtained by fine-tuning the single-label model on the multi-label classification model based on the ImageNet-ILSVRC2012 data set.
  14. 根据权利要求11所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 11, wherein the processor is configured to execute via execution of the executable instructions:
    将所述第一属性的关键词标签和所述第二属性的候选标签输入预训练的word2vec模型进行词嵌入，计算每个所述第一属性的关键词标签和每个所述第二属性的候选标签的词语相似度；Inputting the keyword tags of the first attribute and the candidate tags of the second attribute into a pre-trained word2vec model for word embedding, and calculating the word similarity between each keyword tag of the first attribute and each candidate tag of the second attribute;
    当存在t个词语相似度大于等于预定阈值时，选择最大的词语相似度对应的关键词标签及候选标签作为图像主题词，并从所述图像主题词对应的古诗词语中随机选择一个词语映射为映射关键词标签；When there are t word similarities greater than or equal to a predetermined threshold, selecting the keyword tag and candidate tag corresponding to the largest word similarity as the image topic word, and randomly selecting one word from the classical-poetry words corresponding to the image topic word as the mapped keyword tag;
    当存在t个词语相似度小于预定阈值时,根据预设关键词词典,对t个所述关键词标签及候选标签进行映射,并选择出一个词语映射为映射关键词标签;When there are t words with similarity less than a predetermined threshold, map the t keyword tags and candidate tags according to the preset keyword dictionary, and select one word to map as the mapped keyword tag;
    基于所述映射关键词标签,从所述映射关键词标签中选择多个所述映射关键词标签作为输入至模型记忆矩阵中的图像关键词标签信息。Based on the mapped keyword tags, a plurality of the mapped keyword tags are selected from the mapped keyword tags as image keyword tag information input into the model memory matrix.
  15. 根据权利要求12所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 12, wherein the processor is configured to execute via execution of the executable instructions:
    根据记忆读取函数，确定每个记忆片段被选中的概率，并以该概率对各记忆片段加权平均，得到从所述模型记忆矩阵中读取的记忆信息向量，从而生成第i行诗句的第t时刻字符。According to the memory reading function, determining the probability of each memory segment being selected and taking the probability-weighted average of the segments to obtain the vector of memory information read from the model memory matrix, so as to generate the character at time t of the i-th line of verse.
  16. 根据权利要求11所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device of claim 11, wherein the processor is configured to execute via execution of the executable instructions:
    在第i-1行诗句生成后并且在第i行诗句生成前，将历史时刻的第i-2行诗句的每个字符经过编码器后对应的隐藏状态，通过记忆写入函数计算后为其在所述历史记忆矩阵中选择一个记忆片段，然后将所述隐藏状态填入所述历史记忆矩阵中。After the (i-1)-th line of the poem is generated and before the i-th line is generated, the hidden state corresponding to each character of the (i-2)-th line after passing through the encoder is processed by the memory write function, a memory segment in the historical memory matrix is selected for it, and the hidden state is then filled into the historical memory matrix.
  17. 根据权利要求11所述的电子设备，其中，所述模型记忆矩阵包括关键词记忆矩阵M_1∈R^(K_1×d_m)、当前记忆矩阵M_2∈R^(K_2×d_m)以及历史记忆矩阵M_3∈R^(K_3×d_m)；The electronic device according to claim 11, wherein the model memory matrix includes a keyword memory matrix M_1 ∈ R^(K_1×d_m), a current memory matrix M_2 ∈ R^(K_2×d_m), and a historical memory matrix M_3 ∈ R^(K_3×d_m);
    所述关键词记忆矩阵用于存储所述图像关键词标签信息，所述当前记忆矩阵和历史记忆矩阵用于存储所述已生成的诗句的信息；The keyword memory matrix is used to store the image keyword tag information, and the current memory matrix and the historical memory matrix are used to store the information of the generated verses;
    所述模型记忆矩阵的每一行表示一个记忆片段，d_m表示记忆片段的尺寸，K_2和K_3分别表示当前记忆和历史记忆的片段长度；所述模型记忆矩阵的记忆表示为三段记忆的拼接M=[M_1; M_2; M_3]，M∈R^(K×d_m)，其中[M_1; M_2; M_3]表示矩阵拼接，且K=K_1+K_2+K_3。Each row of the model memory matrix represents a memory segment, d_m denotes the size of a memory segment, and K_2 and K_3 denote the segment lengths of the current memory and the historical memory, respectively; the memory of the model memory matrix is expressed as the concatenation of the three memories, M = [M_1; M_2; M_3], M ∈ R^(K×d_m), where [M_1; M_2; M_3] denotes matrix concatenation and K = K_1 + K_2 + K_3.
  18. 根据权利要求15所述的电子设备，其中，所述记忆读取函数为：α_r = A_r(M, query)，所述每个记忆片段被选择的概率为α_r[k,:] = softmax(z_k)；The electronic device according to claim 15, wherein the memory reading function is α_r = A_r(M, query), and the probability of each memory segment being selected is α_r[k,:] = softmax(z_k);
    其中，α_r表示所述记忆读取函数，query表示所述诗歌生成模型的状态信息，z_k表示根据每个记忆片段M[k,:]和所述诗歌生成模型的状态信息query计算的相关性程度变量，且z_k = v^T σ(M[k,:], q)，v^T表示所述诗歌生成模型的模型参数。Where α_r denotes the memory reading function, query denotes the state information of the poetry generation model, z_k denotes the degree of relevance between each memory segment M[k,:] and the state information query of the poetry generation model, z_k = v^T σ(M[k,:], q), and v^T denotes a model parameter of the poetry generation model.
  19. 根据权利要求11-18任一项所述的电子设备,其中,所述处理器配置为经由执行所述可执行指令来执行:The electronic device according to any one of claims 11-18, wherein the processor is configured to execute via execution of the executable instruction:
    将所述单标签分类模型,所述多标签分类模型以及所述诗歌生成模型存储至区块链中。The single-label classification model, the multi-label classification model, and the poetry generation model are stored in a blockchain.
  20. 一种计算机可读存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时实现权利要求1-9任一项所述的基于图像的辅助写作方法。A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the image-based auxiliary writing method according to any one of claims 1-9 is realized.
PCT/CN2020/092724 2020-04-24 2020-05-27 Image-based writing assisting method and apparatus, medium, and device WO2021212601A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010332409.9A CN111611805B (en) 2020-04-24 2020-04-24 Auxiliary writing method, device, medium and equipment based on image
CN202010332409.9 2020-04-24

Publications (1)

Publication Number Publication Date
WO2021212601A1 true WO2021212601A1 (en) 2021-10-28

Family

ID=72197888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092724 WO2021212601A1 (en) 2020-04-24 2020-05-27 Image-based writing assisting method and apparatus, medium, and device

Country Status (2)

Country Link
CN (1) CN111611805B (en)
WO (1) WO2021212601A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080786A (en) * 2022-08-22 2022-09-20 科大讯飞股份有限公司 Picture poetry-based method, device and equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766284B (en) * 2021-01-26 2023-11-21 北京有竹居网络技术有限公司 Image recognition method and device, storage medium and electronic equipment
CN113010717B (en) * 2021-04-26 2022-04-22 中国人民解放军国防科技大学 Image verse description generation method, device and equipment
CN114419402B (en) * 2022-03-29 2023-08-18 中国人民解放军国防科技大学 Image story description generation method, device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120269436A1 (en) * 2011-04-20 2012-10-25 Xerox Corporation Learning structured prediction models for interactive image labeling
CN107480132A (en) * 2017-07-25 2017-12-15 浙江工业大学 A kind of classic poetry generation method of image content-based
CN108664989A (en) * 2018-03-27 2018-10-16 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108829667A (en) * 2018-05-28 2018-11-16 南京柯基数据科技有限公司 It is a kind of based on memory network more wheels dialogue under intension recognizing method
CN109214409A (en) * 2018-07-10 2019-01-15 上海斐讯数据通信技术有限公司 A kind of vegetable recognition methods and system
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 A kind of answering method and system based on local attention mechanism memory network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8873867B1 (en) * 2012-07-10 2014-10-28 Google Inc. Assigning labels to images
CN110297933A (en) * 2019-07-01 2019-10-01 山东浪潮人工智能研究院有限公司 A kind of theme label recommended method and tool based on deep learning
CN110728255B (en) * 2019-10-22 2022-12-16 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111368514B (en) * 2019-12-10 2024-04-19 爱驰汽车有限公司 Model training and ancient poem generating method, ancient poem generating device, equipment and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120269436A1 (en) * 2011-04-20 2012-10-25 Xerox Corporation Learning structured prediction models for interactive image labeling
CN107480132A (en) * 2017-07-25 2017-12-15 浙江工业大学 A kind of classic poetry generation method of image content-based
CN108664989A (en) * 2018-03-27 2018-10-16 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108829667A (en) * 2018-05-28 2018-11-16 南京柯基数据科技有限公司 It is a kind of based on memory network more wheels dialogue under intension recognizing method
CN109214409A (en) * 2018-07-10 2019-01-15 上海斐讯数据通信技术有限公司 A kind of vegetable recognition methods and system
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 A kind of answering method and system based on local attention mechanism memory network


Also Published As

Publication number Publication date
CN111611805A (en) 2020-09-01
CN111611805B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
WO2021212601A1 (en) Image-based writing assisting method and apparatus, medium, and device
CN110717017B (en) Method for processing corpus
WO2021223323A1 (en) Image content automatic description method based on construction of chinese visual vocabulary list
CN111026861B (en) Text abstract generation method, training device, training equipment and medium
CN112685565A (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN109543820B (en) Image description generation method based on architecture phrase constraint vector and double vision attention mechanism
JP2021152963A (en) Word meaning feature generating method, model training method, apparatus, device, medium, and program
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111881292B (en) Text classification method and device
US20220292805A1 (en) Image processing method and apparatus, and device, storage medium, and image segmentation method
RU2712101C2 (en) Prediction of probability of occurrence of line using sequence of vectors
CN109710842B (en) Business information pushing method and device and readable storage medium
EP4302234A1 (en) Cross-modal processing for vision and language
CN114416995A (en) Information recommendation method, device and equipment
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN115129839A (en) Visual dialogue answer generation method and device based on graph perception
CN113392265A (en) Multimedia processing method, device and equipment
CN113360683B (en) Method for training cross-modal retrieval model and cross-modal retrieval method and device
Cao et al. Visual question answering research on multi-layer attention mechanism based on image target features
CN110717316B (en) Topic segmentation method and device for subtitle dialog flow
CN117093687A (en) Question answering method and device, electronic equipment and storage medium
CN116977992A (en) Text information identification method, apparatus, computer device and storage medium
CN116561272A (en) Open domain visual language question-answering method and device, electronic equipment and storage medium
CN114092931B (en) Scene character recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932255

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932255

Country of ref document: EP

Kind code of ref document: A1