CN110807118B - Image comment generation method and device and electronic equipment - Google Patents

Info

Publication number: CN110807118B
Application number: CN201911050319.4A
Authority: CN (China)
Prior art keywords: comments, image, comment, training, model
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN110807118A
Inventors: 张宏龙, 雷瑞生
Current assignee: Guangdong 3vjia Information Technology Co Ltd
Application filed by Guangdong 3vjia Information Technology Co Ltd; priority to CN201911050319.4A; published as CN110807118A; application granted and published as CN110807118B.


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an image comment generation method and apparatus, and an electronic device. The method includes: determining a keyword set according to features in an image to be commented on; generating a plurality of candidate comments according to the keyword set and a pre-trained language generation model; and screening, from the plurality of candidate comments, the comments that conform to a preset language rule, and determining those comments as the comments of the image to be commented on. The invention can effectively reduce the cost of manually generating comments, makes the generated comments closer to human language expression, and improves the authenticity of the comments.

Description

Image comment generation method and device and electronic equipment
Technical Field
The present invention relates to the field of image description technologies, and in particular, to a method and an apparatus for generating an image comment, and an electronic device.
Background
Many merchants in the home furnishing industry display home furnishing renderings on online websites, mini programs, and the like, and attach comment sentences to attract more users to their products. At present, most such comments are written manually, and few are synthesized by machine. However, manually generating comments is costly, while machine-synthesized comments often differ greatly from the way humans express themselves in language and therefore lack authenticity.
Disclosure of Invention
In view of the above, an object of the present invention is to provide an image comment generation method and apparatus, and an electronic device, which can effectively reduce the cost of manually generating comments, make the generated comments closer to human language expression, and improve the authenticity of the comments.
In a first aspect, an embodiment of the present invention provides a method for generating an image comment, including: determining a keyword set according to features in an image to be commented on; generating a plurality of candidate comments according to the keyword set and a pre-trained language generation model; and screening, from the plurality of candidate comments, the comments that conform to a preset language rule, and determining those comments as the comments of the image to be commented on.
In one embodiment, the step of determining the keyword set according to the features in the image to be commented on includes: inputting the image into a keyword generation model; generating keywords corresponding to the image according to its features; ranking the keywords by the occurrence frequency of each keyword; and, based on the ranking result, selecting a first preset number of keywords in order, starting from the keyword with the highest occurrence frequency, as the keyword set of the image.
In one embodiment, the step of screening, from the plurality of candidate comments, the comments that conform to the preset language rule and determining them as the comments of the image to be commented on includes: judging, according to the candidate comments and a pre-trained screening model, whether each candidate comment conforms to the preset language rule; if so, determining that candidate comment as a comment to be output; ranking the comments to be output by the occurrence frequency of each; and, based on the ranking result, selecting a second preset number of comments to be output in order, starting from the comment with the highest occurrence frequency, as the comments of the image.
In one embodiment, the method for generating an image comment further includes: acquiring a training image and a comment corresponding to the training image; extracting keywords of the corresponding comments of the training images according to the corresponding comments of the training images and a preset extraction model; and inputting the extracted keywords and training images into a first neural network model for training to obtain a keyword generation model.
In one embodiment, the method for generating an image comment further includes: inputting the extracted keywords and the comments corresponding to the training images into a second neural network model for training to obtain a language generation model.
In one embodiment, the keyword generation model includes an image-description-based neural network model and/or a multi-label classification neural network model.
In one embodiment, the language generation model includes a recurrent neural network model based on a long short-term memory network and/or a language generation model based on a Seq2Seq neural network architecture.
In a second aspect, an embodiment of the present invention provides an apparatus for generating an image comment, including: the keyword set acquisition module is used for determining a keyword set according to the characteristics in the image to be reviewed; the comment generation module is used for generating a plurality of candidate comments according to the keyword set and the pre-trained language generation model; the comment determining module is used for screening comments conforming to a preset language rule from the plurality of candidate comments and determining that the comments conforming to the preset language rule are comments of the image to be commented.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory; the memory stores a computer program which, when run by the processor, performs the method according to any one of the first aspects.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing a computer program which, when executed, performs any one of the methods provided in the first aspect.
The embodiments of the present invention provide an image comment generation method and apparatus, and an electronic device. A keyword set corresponding to an image to be commented on (comprising several keywords with high occurrence frequency) can be obtained according to the features in the image; a plurality of candidate comments can then be generated according to the keyword set and a pre-trained language generation model; and the comments conforming to a preset language rule are screened out from the candidate comments and determined as the comments of the image. Through keyword extraction and comment generation and screening, this method yields comments that are closer to human language expression, effectively reduces the cost of manually generating comments, and improves the authenticity of the comments.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention or of the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flow chart of a method for generating an image comment according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training process of a keyword generation model and a language generation model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus for generating an image comment according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At present, comments on home furnishing products are mostly generated manually, and only a few are synthesized by machine. However, manually generating comments is costly, while machine-synthesized comments often differ greatly from human language expression and lack authenticity.
On this basis, the image comment generation method and apparatus and the electronic device provided by the embodiments of the present invention can effectively reduce the cost of manually generating comments, make the generated comments closer to human language expression, and improve the authenticity of the comments.
To facilitate understanding of the present embodiment, the image comment generation method disclosed in the embodiment of the present invention is first described in detail. Referring to the flowchart shown in Fig. 1, the method mainly includes the following steps S101 to S103:
step S101: and determining a keyword set according to the characteristics in the image to be reviewed.
The image to be commented on may be a two-dimensional home furnishing image drawn with drawing software, or one downloaded from a network. Specifically, the corresponding keyword set, such as warm-toned, minimalist, and the like, may be determined according to features in the two-dimensional image such as color, style, and structure.
Step S102: and generating a plurality of candidate comments according to the keyword set and the pre-trained language generation model.
The pre-trained language generation model can be a cyclic neural network model based on a long-short-term memory network and/or a language generation model based on a Seq2Seq neural network architecture. The model input may be a keyword set corresponding to the home image output in step S101, and is output as an alternative comment of the home image. Each keyword can correspond to a plurality of comments, and the comment with the highest occurrence probability is selected to be output as the comment corresponding to the keyword.
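The per-keyword selection just described can be sketched in miniature as follows (a hedged illustration only; the function name and data layout are assumptions and not part of this disclosure, and in practice a trained language generation model would produce the scored candidates):

```python
def best_comment_per_keyword(scored_candidates):
    """scored_candidates: {keyword: [(comment, probability), ...]}.

    For each keyword, keep only the candidate comment with the highest
    occurrence probability, as described for step S102.
    """
    return {
        keyword: max(candidates, key=lambda pair: pair[1])[0]
        for keyword, candidates in scored_candidates.items()
    }
```

Any real implementation would replace the probability pairs with scores emitted by the LSTM or Seq2Seq decoder; only the max-selection step is shown here.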
Step S103: and selecting comments conforming to the preset language rule from the plurality of candidate comments, and determining the comments conforming to the preset language rule as the comments of the image to be commented.
The preset language rule may be a rule that conforms to human language expression, for example requiring a structure that satisfies the subject-predicate-object grammatical order. Several comments may be determined to conform to the preset language rule; the first N with the highest occurrence probability are selected as the comments corresponding to the image to be commented on.
The embodiment of the present invention provides an image comment generation method. A keyword set corresponding to an image to be commented on (comprising several keywords with high occurrence frequency) can be obtained according to the features in the image; a plurality of candidate comments can then be generated according to the keyword set and a pre-trained language generation model; and the comments conforming to a preset language rule are screened out from the candidate comments and determined as the comments of the image. Through keyword extraction and comment generation and screening, this method yields comments that are closer to human language expression, effectively reduces the cost of manually generating comments, and improves the authenticity of the comments.
For step S101, in one embodiment, the keyword set may be determined according to the features in the image to be commented on by the following steps a1 to a4:
step a1: and inputting the images to be reviewed into the keyword generation model. In a specific embodiment, the keyword extraction model may be a neural network model based on image description, or may be a neural network model classified according to multiple labels, and specifically may be a show-attention-and-toll model with attention mechanisms introduced.
Step a2: generating keywords corresponding to the image according to its features. Several keywords may be generated. In a specific embodiment, the image feature extraction module of the keyword generation model may be based on the VGG16 network model, a 16-layer deep convolutional neural network. Features contained in the image to be commented on are extracted through the VGG16 network, and the corresponding keywords are generated from the extracted features.
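Independent of the VGG16 backbone, the mapping from an extracted feature vector to keywords can be illustrated as a simple linear multi-label scoring head (a toy sketch under stated assumptions; the function names, weights, and threshold are illustrative, not the trained model of this disclosure):

```python
def score_keywords(features, keyword_weights):
    """Score each keyword as the dot product between the image feature
    vector and that keyword's weight vector (a linear multi-label head).
    features: list of floats; keyword_weights: {keyword: list of floats}.
    """
    return {
        keyword: sum(f * w for f, w in zip(features, weights))
        for keyword, weights in keyword_weights.items()
    }

def predicted_keywords(features, keyword_weights, threshold=0.0):
    """Keep every keyword whose score clears the threshold; a real model
    would learn the weights during the training of steps S201-S203."""
    scores = score_keywords(features, keyword_weights)
    return sorted(kw for kw, s in scores.items() if s > threshold)
```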
Step a3: ranking the keywords by the occurrence frequency of each keyword.
Step a4: based on the ranking result, selecting a first preset number of keywords in order, starting from the keyword with the highest occurrence frequency, as the keyword set of the image to be commented on.
In a specific application there may be many keywords, so they can be ranked by occurrence frequency, and the first N results with the highest frequency taken as the output keyword set (that is, starting from the keyword with the highest occurrence frequency, a first preset number of keywords are selected in order as the keyword set of the image to be commented on).
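The frequency ranking and top-N selection of steps a3 and a4 can be sketched as follows (a minimal illustration; the function name is an assumption):

```python
from collections import Counter

def top_n_keywords(keywords, n):
    """Rank keywords by occurrence frequency and keep the n most frequent.
    Counter.most_common breaks frequency ties by first-seen order."""
    return [keyword for keyword, _ in Counter(keywords).most_common(n)]
```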
For step S103, in one embodiment, the comments conforming to the preset language rule may be screened from the plurality of candidate comments and determined as the comments of the image to be commented on by the following steps b1 to b4:
step b1: and judging whether each candidate comment meets a preset language rule according to the candidate comments and the pre-trained screening model.
The pre-trained screening model incorporates the preset language rule, which may be a rule conforming to human language expression, such as a structure satisfying the subject-predicate-object grammatical order.
Step b2: if so, determining the candidate comment as a comment to be output.
In a specific embodiment, the screening model may remove the candidate comments that do not conform to the preset language rule, and determine those that do conform as the comments to be output.
Step b3: ranking the comments to be output by the occurrence frequency of each.
Step b4: based on the ranking result, selecting a second preset number of comments to be output in order, starting from the comment with the highest occurrence frequency, as the comments of the image to be commented on.
In a specific embodiment, since there may be many comments to be output, they can all be ranked by occurrence probability, and according to the ranking result the first N with the highest probability are selected as the comments corresponding to the image to be commented on (that is, starting from the comment with the highest occurrence frequency, a second preset number of comments to be output are selected in order as the comments of the image).
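Steps b1 to b4 can be sketched together as follows (a toy illustration; the rule-checking callable stands in for the pre-trained screening model, which in practice would be a learned classifier, and the function names are assumptions):

```python
from collections import Counter

def select_comments(candidate_comments, satisfies_rule, n):
    """Steps b1-b4 in miniature: drop candidates failing the language-rule
    check, then keep the n most frequent surviving comments."""
    kept = [c for c in candidate_comments if satisfies_rule(c)]
    return [c for c, _ in Counter(kept).most_common(n)]
```

A usage example with a deliberately crude rule (at least three words) in place of the screening model:

```python
rule = lambda comment: len(comment.split()) >= 3
comments = ["Nice", "A warm cozy room", "A warm cozy room", "Bright and airy space"]
selected = select_comments(comments, rule, 1)
```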
Further, the embodiment of the present invention also provides a schematic diagram of the training process of the keyword generation model and the language generation model. Referring to Fig. 2, and taking home furnishing image comments as an example, the process includes the following steps S201 to S204:
step S201: and obtaining the training image and the corresponding comment of the training image.
In a specific embodiment, the training images and their corresponding comments may be pairs of home furnishing images and comments crawled from a home furnishing community platform.
Step S202: extracting the keywords of each training image's comment according to the comment and a preset extraction model.
In a specific application, the preset extraction model can convert the home image comments into home image keywords.
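As a hedged stand-in for such an extraction model, a purely frequency-based extractor illustrates the idea (the stopword list and function name are illustrative assumptions; the actual preset extraction model of this disclosure is not specified and would be far more capable):

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "very", "with"}  # illustrative only

def extract_keywords(comment, n=3):
    """Toy stand-in for the preset extraction model: keep the n most
    frequent non-stopword tokens of a comment as its keywords."""
    tokens = [t for t in comment.lower().split() if t not in STOPWORDS]
    return [t for t, _ in Counter(tokens).most_common(n)]
```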
Step S203: inputting the extracted keywords and the training images into a first neural network model for training, to obtain the keyword generation model.
Specifically, the first neural network model may be a neural network model based on image description, or may be a neural network model classified according to multiple labels.
Step S204: inputting the extracted keywords and the comments corresponding to the training images into a second neural network model for training to obtain a language generation model.
In one embodiment, the second neural network model may be a recurrent neural network model based on a long-short-term memory network, or may be a language generation model based on a Seq2Seq neural network architecture.
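The data flow of steps S201 to S204 can be sketched as the construction of two training sets (the data layout is an illustrative assumption; `extract_keywords` below stands in for the preset extraction model of step S202):

```python
def build_training_sets(image_comment_pairs, extract_keywords):
    """From (image, comment) pairs, derive the two training sets described
    in steps S201-S204: (keywords, image) pairs for the keyword generation
    model and (keywords, comment) pairs for the language generation model."""
    keyword_model_set, language_model_set = [], []
    for image, comment in image_comment_pairs:
        keywords = extract_keywords(comment)
        keyword_model_set.append((keywords, image))
        language_model_set.append((keywords, comment))
    return keyword_model_set, language_model_set
```

Feeding the two returned sets into the first and second neural network models, respectively, would then correspond to steps S203 and S204.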
In addition, the image feature extraction module of the home furnishing image keyword generation model may be a VGG16-based feature extraction module, pre-trained on an open-source furniture classification dataset. In the embodiment of the present invention, the image labels are the keywords corresponding to the images rather than the comments, mainly because keywords are deterministic, whereas comments carry the coloring of human language expression and are therefore uncertain; training the model to output keywords directly is simpler and more accurate than training it to output comments.
Corresponding to the image comment generation method provided in the foregoing embodiments, an embodiment of the present invention further provides an image comment generation apparatus. Referring to the schematic structural diagram shown in Fig. 3, the apparatus may include the following parts:
the keyword set obtaining module 301 is configured to determine a keyword set according to features in an image to be reviewed.
Comment generation module 302 is configured to generate a plurality of candidate comments according to the keyword set and the pre-trained language generation model.
The comment determining module 303 is configured to screen comments conforming to a preset language rule from the plurality of candidate comments, and determine that the comment conforming to the preset language rule is a comment of the image to be reviewed.
The embodiment of the present invention provides an image comment generation apparatus. A keyword set corresponding to an image to be commented on (comprising several keywords with high occurrence frequency) can be obtained according to the features in the image; a plurality of candidate comments can then be generated according to the keyword set and a pre-trained language generation model; and the comments conforming to a preset language rule are screened out from the candidate comments and determined as the comments of the image. Through keyword extraction and comment generation and screening, the apparatus yields comments that are closer to human language expression, effectively reduces the cost of manually generating comments, and improves the authenticity of the comments.
In one embodiment, the keyword set obtaining module 301 is further configured to: input the image to be commented on into the keyword generation model; generate keywords corresponding to the image according to its features; rank the keywords by the occurrence frequency of each keyword; and, based on the ranking result, select a first preset number of keywords in order, starting from the keyword with the highest occurrence frequency, as the keyword set of the image.
In one embodiment, the comment determination module 303 is further configured to: judge, according to the candidate comments and the pre-trained screening model, whether each candidate comment conforms to the preset language rule; if so, determine that candidate comment as a comment to be output; rank the comments to be output by the occurrence frequency of each; and, based on the ranking result, select a second preset number of comments to be output in order, starting from the comment with the highest occurrence frequency, as the comments of the image to be commented on.
In one embodiment, the generating device of the image comments further includes a model training module, configured to obtain a training image and a comment corresponding to the training image; extracting keywords of the corresponding comments of the training images according to the corresponding comments of the training images and a preset extraction model; and inputting the extracted keywords and training images into a first neural network model for training to obtain a keyword generation model.
In one embodiment, the model training module is further configured to input the extracted keyword and the comment corresponding to the training image into the second neural network model for training, so as to obtain a language generation model.
The implementation principle and technical effects of the apparatus provided by the embodiment of the present invention are the same as those of the foregoing method embodiment; for brevity, where the apparatus embodiment is silent, reference may be made to the corresponding content of the foregoing method embodiment.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the embodiments described above. In one embodiment, the electronic device may include a smart phone, tablet, car computer, smart wearable device, or the like.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 100 includes: a processor 40, a memory 41, a bus 42 and a communication interface 43, the processor 40, the communication interface 43 and the memory 41 being connected by the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.
The memory 41 may include a high-speed Random Access Memory (RAM), and may further include non-volatile memory, such as at least one magnetic disk memory. The communication connection between this system network element and at least one other network element is achieved via at least one communication interface 43 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.
Bus 42 may be an ISA bus, a PCI bus, an EISA bus, or the like; buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one bi-directional arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory 41 is configured to store a program, and the processor 40 executes the program after receiving an execution instruction. The method disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 40.
The processor 40 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or instructions in software in processor 40. The processor 40 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but may also be a digital signal processor (Digital Signal Processing, DSP for short), application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory 41 and the processor 40 reads the information in the memory 41 and in combination with its hardware performs the steps of the method described above.
An embodiment of the present invention further provides a computer program product comprising a computer-readable storage medium storing program code, where the program code includes instructions for executing the method described in the foregoing method embodiments; for specific implementation, reference may be made to the foregoing method embodiments, which are not repeated here.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions of some technical features, while remaining within the technical scope of the present disclosure. Such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included in the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for generating image comments, characterized by comprising the following steps:
determining a keyword set according to features in an image to be commented on;
generating a plurality of candidate comments according to the keyword set and a pre-trained language generation model;
screening, from the plurality of candidate comments, comments that conform to a preset language rule, and determining the comments conforming to the preset language rule as comments of the image to be commented on;
the method further comprises: acquiring a training image and comments corresponding to the training image; extracting keywords of the comments corresponding to the training image according to the comments corresponding to the training image and a preset extraction model; and inputting the extracted keywords and the training image into a first neural network model for training to obtain a keyword generation model;
the method further comprises: inputting the extracted keywords and the comments corresponding to the training image into a second neural network model for training to obtain the language generation model.
2. The method for generating image comments according to claim 1, wherein the step of determining a keyword set according to features in the image to be commented on comprises:
inputting the image to be commented on into a pre-trained keyword generation model;
generating keywords corresponding to the image to be commented on according to the features in the image to be commented on;
ranking the keywords based on the occurrence frequency of each keyword; and
based on the ranking result, starting from the keyword with the highest occurrence frequency, sequentially selecting a first preset number of keywords as the keyword set of the image to be commented on.
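The frequency-based selection in claim 2 amounts to a top-N pick over keyword counts. A minimal sketch, in which the sample keywords and the preset number are invented for illustration:

```python
from collections import Counter

def select_keyword_set(generated_keywords, first_preset_number):
    # rank keywords by occurrence frequency, highest first, then take
    # the first preset number of them as the keyword set
    ranked = [kw for kw, _ in Counter(generated_keywords).most_common()]
    return ranked[:first_preset_number]

# keywords as they might be emitted repeatedly by the keyword generation model
generated = ["sofa", "bright", "sofa", "wood", "bright", "sofa"]
keyword_set = select_keyword_set(generated, 2)
print(keyword_set)  # ['sofa', 'bright']
```

`Counter.most_common` already returns entries sorted by descending count, so the "starting from the highest frequency" ordering comes for free.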
3. The method for generating image comments according to claim 1, wherein the step of screening, from the plurality of candidate comments, comments conforming to the preset language rule and determining the comments conforming to the preset language rule as comments of the image to be commented on comprises:
judging, according to the candidate comments and a pre-trained screening model, whether each candidate comment conforms to the preset language rule;
if so, determining the candidate comment as a comment to be output;
ranking the comments to be output based on the occurrence frequency of each comment to be output; and
based on the ranking result, starting from the comment to be output with the highest occurrence frequency, sequentially selecting a second preset number of comments to be output as comments of the image to be commented on.
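Claim 3 combines the screening model with the same frequency-ranked selection used for keywords. A minimal sketch, assuming a toy screening model in place of the pre-trained one (the rule, sample candidates, and preset number are all illustrative):

```python
from collections import Counter

def select_output_comments(candidates, screening_model, second_preset_number):
    # step 1: keep candidates the screening model judges rule-conforming
    to_output = [c for c in candidates if screening_model(c)]
    # step 2: rank by occurrence frequency, take the second preset number
    ranked = [c for c, _ in Counter(to_output).most_common()]
    return ranked[:second_preset_number]

# toy screening model: accept comments that end with a full stop
screening_model = lambda c: c.endswith(".")
candidates = ["Nice sofa.", "bad", "Nice sofa.", "Warm lighting.", "Nice sofa."]
selected = select_output_comments(candidates, screening_model, 2)
print(selected)  # ['Nice sofa.', 'Warm lighting.']
```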
4. The method for generating image comments according to claim 1, wherein the keyword generation model comprises an image-description-based neural network model and/or a multi-label classification neural network model.
5. The method for generating image comments according to claim 1, wherein the language generation model comprises a recurrent neural network model based on a long short-term memory (LSTM) network and/or a language generation model based on a Seq2Seq neural network.
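The LSTM basis named in claim 5 rests on the standard LSTM cell update. Below is a minimal single-step sketch in NumPy; the weight shapes, gate layout, and sizes are the conventional ones, not details specified by the patent.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # one LSTM cell update: input, forget, and output gates plus the
    # candidate state, all computed from the current input x and the
    # previous hidden state h_prev
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # stacked pre-activations, shape (4H,)
    i = sigmoid(z[:H])               # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:])           # candidate cell state
    c = f * c_prev + i * g           # new cell (long-term) state
    h = o * np.tanh(c)               # new hidden (short-term) state
    return h, c

rng = np.random.default_rng(0)
D, H = 8, 4                          # input and hidden sizes (illustrative)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

A language generation model in the sense of claim 5 would unroll this step over the keyword and comment token sequences and add embedding and softmax output layers.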
6. An apparatus for generating image comments, characterized by comprising:
a keyword set acquisition module, configured to determine a keyword set according to features in an image to be commented on;
a comment generation module, configured to generate a plurality of candidate comments according to the keyword set and a pre-trained language generation model;
a comment determination module, configured to screen, from the plurality of candidate comments, comments conforming to a preset language rule, and determine the comments conforming to the preset language rule as comments of the image to be commented on;
the apparatus further comprises a model training module, configured to: acquire a training image and comments corresponding to the training image; extract keywords of the comments corresponding to the training image according to the comments corresponding to the training image and a preset extraction model; and input the extracted keywords and the training image into a first neural network model for training to obtain a keyword generation model;
the model training module is further configured to: input the extracted keywords and the comments corresponding to the training image into a second neural network model for training to obtain the language generation model.
7. An electronic device, comprising a processor and a memory;
wherein the memory stores a computer program which, when executed by the processor, performs the method of any one of claims 1 to 5.
8. A computer storage medium storing a computer program which, when executed, performs the method of any one of claims 1 to 5.
CN201911050319.4A 2019-10-30 2019-10-30 Image comment generation method and device and electronic equipment Active CN110807118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911050319.4A CN110807118B (en) 2019-10-30 2019-10-30 Image comment generation method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN110807118A CN110807118A (en) 2020-02-18
CN110807118B true CN110807118B (en) 2023-10-03

Family

ID=69489735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911050319.4A Active CN110807118B (en) 2019-10-30 2019-10-30 Image comment generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110807118B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143540B (en) * 2020-04-03 2020-07-21 腾讯科技(深圳)有限公司 Intelligent question and answer method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544524A (en) * 2018-11-15 2019-03-29 中共中央办公厅电子科技学院 A kind of more attribute image aesthetic evaluation systems based on attention mechanism
CN109657212A (en) * 2018-12-13 2019-04-19 武汉大学 A kind of word-based music official documents and correspondence generation method moved distance and combine term vector
CN110278482A (en) * 2018-03-16 2019-09-24 优酷网络技术(北京)有限公司 Generate the method and device of the comment of video


Also Published As

Publication number Publication date
CN110807118A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN107273269B (en) Log analysis method and device
CN109325146B (en) Video recommendation method and device, storage medium and server
CN108681541B (en) Picture searching method and device and computer equipment
US11023766B2 (en) Automatic optical character recognition (OCR) correction
CN111428273A (en) Dynamic desensitization method and device based on machine learning
CN111310037B (en) Household material recommendation method and device and electronic equipment
CN110309301B (en) Enterprise category classification method and device and intelligent terminal
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN110826340A (en) Evaluation text generation method and device and electronic equipment
CN111291551B (en) Text processing method and device, electronic equipment and computer readable storage medium
JP6172332B2 (en) Information processing method and information processing apparatus
CN110807118B (en) Image comment generation method and device and electronic equipment
CN111783812A (en) Method and device for identifying forbidden images and computer readable storage medium
KR102410715B1 (en) Apparatus and method for analyzing sentiment of text data based on machine learning
CN112017777B (en) Method and device for predicting similar pair problem and electronic equipment
CN111651674B (en) Bidirectional searching method and device and electronic equipment
CN110598115A (en) Sensitive webpage identification method and system based on artificial intelligence multi-engine
CN111597966B (en) Expression image recognition method, device and system
CN111050194B (en) Video sequence processing method, video sequence processing device, electronic equipment and computer readable storage medium
CN113033542A (en) Method and device for generating text recognition model
CN112785095A (en) Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium
CN113704623A (en) Data recommendation method, device, equipment and storage medium
CN112115300A (en) Text processing method and device, electronic equipment and readable storage medium
CN110308905B (en) Page component matching method and device
CN112669204B (en) Image processing method, training method and device of image processing model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant