CN115830610A - Multi-mode advertisement recognition method and system, electronic equipment and storage medium - Google Patents

Publication number
CN115830610A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211387710.5A
Other languages
Chinese (zh)
Inventor
李琳
崔健
吴小华
袁景凌
李国强
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-mode advertisement recognition method and system, electronic equipment and a storage medium. The multi-mode advertisement recognition system comprises an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module. Advertising technique detection is implemented as an image-text multi-modal matching task, based on natural language processing, computer vision, OCR technology and multi-modal pre-training tasks, so that the text features and visual features of an advertising poster are combined from a multi-modal perspective and the accuracy of advertising technique detection is greatly improved. Further, reverse text features are generated from the label information to increase the number of samples available to the model, and an advertisement technology detection module based on the distance between image features and text features is introduced to predict the advertising technique used in the advertising data, which greatly improves the accuracy of advertising technique recognition on small-sample datasets.

Description

Multi-mode advertisement recognition method and system, electronic device and storage medium
Technical Field
The invention relates to the technical field of advertisement propaganda recognition, in particular to a multi-mode advertisement recognition method, a multi-mode advertisement recognition system, electronic equipment and a storage medium.
Background
An advertisement, as the name implies, informs the general public of something. The term has a broad sense and a narrow sense. In the broad sense, advertising covers both non-economic and economic advertising. Non-economic advertising is advertising that is not for profit, such as announcements, notices and statements issued by government, educational, cultural, municipal and social organizations. Economic advertising, which is advertising in the narrow sense, is advertising for profit, typically commercial advertising: a means of disseminating information about goods or services to consumers or users through advertising media, for a fee, in order to promote those goods or services.
In an era of diversified information, advertisements pervade people's lives. In an advertising poster, a merchant can improve the promotional effect by using promotional techniques such as exaggeration, contrast and repetition in both the visual and the text modality. Analyzing advertisements from the perspective of promotional techniques is therefore key to grasping market advertising trends and improving the advertising effect of enterprises. Currently, the analysis of advertisement information still targets the goods and the advertisement content, and is carried out on small-sample datasets. As a result, traditional methods cannot learn the characteristics of promotional techniques from small-sample datasets, and cannot analyze the effect of advertisements from the perspective of multi-modal promotional techniques. How to effectively analyze the promotional techniques of advertising posters is therefore an urgent problem to be solved.
Disclosure of Invention
In view of the technical problems in the prior art, the invention provides a multi-mode advertisement identification method, a multi-mode advertisement identification system, electronic equipment and a storage medium, which aim to solve the problem of how to effectively analyze the promotional techniques of advertisement posters.
According to a first aspect of the present invention, there is provided a multimodal advertisement recognition system comprising: the system comprises an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module;
the advertisement image-text identification module is used for converting characters in the target advertisement into text information based on optical character recognition (OCR) technology and acquiring a manual labeling label of the target advertisement;
the information augmentation module is used for augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information;
the feature coding module is used for coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained;
the model training module is used for training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model;
and the advertisement technology detection module is used for detecting the propaganda effect of the advertisement to be detected based on the advertisement technology detection model.
On the basis of the technical scheme, the invention can be improved as follows.
Preferably, the augmenting the text information into a forward text and a reverse text based on the artificial labeling tag includes:
acquiring an artificial labeling label and text information of the target advertisement;
and performing semantic modification on the text information based on the artificial labeling labels to obtain the text information with the semantic opposite to that of the artificial labeling labels.
Preferably, the encoding the image information of the target advertisement and the augmented text information includes:
coding the image information and the augmented text information based on a preset language model and a preset visual coding model to obtain a forward text feature pair and a reverse text feature pair of the target advertisement; the preset language model is BERT, XLNet or RoBERTa; the preset visual coding model is ResNet, VGG or Faster RCNN.
Preferably, the model training module includes: an attribute prediction unit and a relationship prediction unit;
the attribute prediction unit is used for replacing the attribute information in the forward text feature pair by using a preset replacement label and predicting the replaced attribute information;
and the relation prediction unit is used for shielding the relation information in the forward text feature pair by using the preset replacement label and predicting the relation in the forward text feature pair.
Preferably, the model training module includes: a masked region unit;
the masked region unit is used for masking region information in the image information and predicting the masked image information.
Preferably, the model training module further comprises: an image-text matching unit;
the image-text matching unit is used for predicting the image-text relationship for the forward text feature pair and the reverse text feature pair.
Preferably, the preset training tasks include an attribute prediction task, a relationship prediction task, a masked region task and an image-text matching task.
According to a second aspect of the present invention, there is provided a multi-modal advertisement recognition method, comprising:
converting characters in the target advertisement into text information based on optical character recognition (OCR) technology, and acquiring an artificial labeling label of the target advertisement;
augmenting the text information into forward text and reverse text based on the artificial labeling label to obtain augmented text information;
encoding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained;
training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertising technology detection model;
and carrying out propaganda effect detection on the advertisement to be detected based on the advertisement technology detection model.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, and a processor, wherein the processor is configured to implement the steps of any one of the multimodal advertisement recognition methods of the second aspect when executing a computer management class program stored in the memory.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer management-like program, which when executed by a processor, performs the steps of any of the multi-modal advertisement recognition methods of the second aspect.
The invention provides a multi-mode advertisement recognition method and system, electronic equipment and a storage medium. The multi-mode advertisement recognition system comprises an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module. The advertisement image-text identification module is used for converting characters in the target advertisement into text information based on optical character recognition (OCR) technology and acquiring an artificial labeling label of the target advertisement; the information augmentation module is used for augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information; the feature coding module is used for coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained; the model training module is used for training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model; and the advertisement technology detection module is used for detecting the propaganda effect of the advertisement to be detected based on the advertisement technology detection model.
Advertising technique detection is implemented as an image-text multi-modal matching task, based on natural language processing, computer vision, OCR technology and multi-modal pre-training tasks, so that the text features and visual features of an advertising poster are combined from a multi-modal perspective and the accuracy of advertising technique detection is greatly improved. Further, reverse text features are generated from the label information to increase the number of samples available to the model, and an advertisement technology detection module based on the distance between image features and text features is introduced to predict the advertising technique used in the advertising data, which greatly improves the accuracy of advertising technique recognition on small-sample datasets.
Drawings
FIG. 1 is a schematic structural diagram of a multi-modal advertisement recognition system according to the present invention;
FIG. 2 is a schematic diagram of a process for constructing advertisement training data according to the present invention;
FIG. 3 is a diagram illustrating encoding of image-text data provided by the present invention;
FIG. 4 is a schematic diagram of the flow of pre-training task data provided by the present invention;
FIG. 5 is a schematic diagram of an advertisement detection process provided by the present invention;
FIG. 6 is a schematic overall flow chart of the multi-modal advertisement recognition method provided by the present invention;
FIG. 7 is a flow chart of a multi-modal advertisement recognition method provided by the present invention;
FIG. 8 is a schematic diagram of the hardware structure of a possible electronic device provided by the present invention;
FIG. 9 is a schematic diagram of the hardware structure of a possible computer-readable storage medium provided by the present invention.
Detailed Description
The following detailed description of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
FIG. 1 is a schematic structural diagram of a multi-modal advertisement recognition system provided by the present invention. As shown in FIG. 1, the system comprises: an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module;
the advertisement image-text identification module is used for converting characters in a target advertisement into text information based on optical character recognition (OCR) technology and acquiring a manual labeling label of the target advertisement; the information augmentation module is used for augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information; the feature coding module is used for coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained; the model training module is used for training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model; and the advertisement technology detection module is used for detecting the propaganda effect of the advertisement to be detected based on the advertisement technology detection model.
It should be noted that the preset model to be trained may be an image-text prediction model composed of an image encoder and a text encoder. The text encoder may be BERT, XLNet or RoBERTa; the image encoder may be ResNet, VGG or Faster RCNN.
It will be appreciated that the OCR technology is mainly used to recognize the characters in an advertising poster and convert them into machine-readable text information. The manual labeling label may be the label that an annotator assigns to each advertising poster, including but not limited to exaggeration, contrast and/or repetition. After the advertisement image-text recognition module, the system organizes the data of each advertising poster as <Image, Text, Label>.
As an embodiment, the augmenting the text information into forward text and backward text based on the artificial labeling tag includes: acquiring an artificial labeling label and text information of the target advertisement; and performing semantic modification on the text information based on the artificial labeling labels to obtain the text information with the semantic opposite to that of the artificial labeling labels.
It can be understood that the information augmentation module augments the text of the advertisement according to the Label information of the poster to obtain two texts, one consistent with the Label information and the other opposite to it. For example, prompt engineering is used to add two different prompt prefixes to the text. Text data prefixed with "My technology is <Label>" is denoted Text+, and text data prefixed with "My technology is Not <Label>" is denoted Text-; that is, semantic modification is performed based on the manual labeling label and the text information to obtain forward text information and reverse text information. Each of the two texts forms an image-text pair with the original visual information of the poster. The image-text pair <Image, Text+> is used as a positive sample pair in subsequent model training, and the image-text pair <Image, Text-> participates in subsequent model training as an additional negative sample pair. The flow is shown in FIG. 2.
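The prefix-based augmentation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `AdSample` class, its field names, the `augment` function and the ". " separator between prefix and OCR text are assumptions made for the example; only the two prefix templates come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AdSample:
    """One poster in the <Image, Text, Label> structure (field names are hypothetical)."""
    image_path: str  # location of the poster image
    text: str        # text recognized by OCR
    label: str       # manually annotated promotional technique


# Prefix templates taken from the description; the trailing ". " is an assumption.
POSITIVE_PREFIX = "My technology is {label}. "
NEGATIVE_PREFIX = "My technology is Not {label}. "

# (image_path, augmented_text, is_positive)
ImageTextPair = Tuple[str, str, int]


def augment(sample: AdSample) -> Tuple[ImageTextPair, ImageTextPair]:
    """Build the <Image, Text+> positive pair and the <Image, Text-> negative pair."""
    text_pos = POSITIVE_PREFIX.format(label=sample.label) + sample.text
    text_neg = NEGATIVE_PREFIX.format(label=sample.label) + sample.text
    return (sample.image_path, text_pos, 1), (sample.image_path, text_neg, 0)


sample = AdSample("poster.png", "Buy one get one free", "exaggeration")
positive_pair, negative_pair = augment(sample)
```

Both pairs share the same image, so each annotated poster contributes one positive and one additional negative image-text pair to training.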
As an embodiment, the encoding the image information and the augmented text information of the target advertisement includes: coding the image information and the augmented text information based on a preset language model and a preset visual coding model to obtain a forward text feature pair and a reverse text feature pair of the target advertisement; the preset language model is BERT, XLNet or RoBERTa; the preset visual coding model is ResNet, VGG or Faster RCNN.
Referring to FIG. 3, FIG. 3 is a schematic diagram of encoding the image-text data provided by the present invention. An image encoder and a text encoder extract features from the image-text pair <Image, Text+> and the image-text pair <Image, Text->, yielding two types of image-text feature pairs. Image encoders and text encoders are mature in both industry and academia. For text, there are pre-trained language models such as BERT, XLNet and RoBERTa. For images, there are visual coding models such as ResNet, VGG and Faster RCNN. Both types of encoder are usually obtained by pre-training on large-scale data. The user can call such models directly to encode the image data and the text data, finally obtaining the two types of image-text feature pairs <I, T+> and <I, T->.
As an embodiment, the model training module comprises: an attribute prediction unit, a relation prediction unit, a masked region unit and an image-text matching unit;
the attribute prediction unit is used for replacing the attribute information in the forward text feature pair with a preset replacement label and predicting the replaced attribute information; the relation prediction unit is used for masking the relation information in the forward text feature pair with the preset replacement label and predicting the relation in the forward text feature pair; the masked region unit is used for masking region information in the image information and predicting the masked image information; and the image-text matching unit is used for predicting the image-text relationship for the forward text feature pair and the reverse text feature pair.
It is to be understood that the model training module takes the two types of image-text feature pairs as input and trains the model with four cross-modal pre-training tasks: an attribute prediction task, a relation prediction task, a masked region task and an image-text matching task.
Referring to FIG. 4, FIG. 4 is a schematic diagram of the data flow of the pre-training tasks provided by the present invention. For the attribute prediction task, the model identifies the words in the text information that express attributes of an object, masks them and replaces them with a [MASK] token (which may be an alphanumeric or character string). The masked attribute information is then predicted from the information in the image. For the relation prediction task, the relation words between objects are masked, again using [MASK] as the replacement, and the relation between the two objects is predicted from the content of the image. For the masked region task, region information in the image is masked, usually the most salient part of the image, and the model predicts the masked image region from the description in the text. These three are among the most popular cross-modal pre-training tasks and train the model well, improving its ability to learn from multi-modal data. In practice, the three tasks correspond to two cases: first, text information is masked and the correct masked information is predicted; second, image information is masked and the text is used as a prompt for prediction.
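The two text-side masking tasks can be illustrated with a small Python sketch. It only shows how training inputs and targets might be prepared; the function `mask_words` and the fixed word sets are hypothetical stand-ins, since the description does not say how attribute and relation words are identified, and the prediction itself (which relies on the image) is not shown.

```python
from typing import List, Set, Tuple

MASK = "[MASK]"


def mask_words(tokens: List[str], targets: Set[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Replace every token found in `targets` with [MASK].

    Returns the masked token list and the (position, original word) pairs
    that the model would be asked to predict from the image.
    """
    masked: List[str] = []
    answers: List[Tuple[int, str]] = []
    for position, token in enumerate(tokens):
        if token.lower() in targets:
            masked.append(MASK)
            answers.append((position, token))
        else:
            masked.append(token)
    return masked, answers


# Hypothetical vocabularies; a real system would detect these words with a parser or tagger.
ATTRIBUTE_WORDS = {"red", "huge", "cheap"}
RELATION_WORDS = {"on", "beside", "holding"}

tokens = "a huge red bottle on the table".split()

# Attribute prediction task: mask the words describing object attributes.
attr_masked, attr_answers = mask_words(tokens, ATTRIBUTE_WORDS)

# Relation prediction task: mask the words relating two objects.
rel_masked, rel_answers = mask_words(tokens, RELATION_WORDS)
```

The masked region task is the mirror image of this procedure: a region of the image is hidden instead of a word, and the text serves as the prompt.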
The T- obtained in the feature coding module is text that contradicts the original promotional technique, so involving it in the three pre-training tasks above would undoubtedly increase the difficulty of model training. Therefore, these three cross-modal pre-training tasks use only the positive image-text feature pairs <I, T+>. In addition, the invention combines a self-supervised contrastive learning task with the image-text matching task. Specifically, positive and negative sample pairs are constructed automatically within a batch and optimized with the InfoNCE loss. The purpose of this task is to shorten the distance between positive image-text pairs and enlarge the distance between negative image-text pairs. Here, a positive sample pair is an image feature and a text feature from the same sample in <I, T+>, while image features and text features from different samples are treated as negative pairs. All <I, T-> pairs participate in this task as negative sample pairs. This design can effectively avoid the defects caused by insufficient information in small-sample data.
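A minimal pure-Python sketch of an InfoNCE-style loss with in-batch negatives and the extra T- negatives follows. Several details are assumptions not stated in the description: cosine similarity as the score, a temperature of 0.07, computing only the image-to-text direction, and treating every T- feature as a negative for every image. The names `cosine` and `info_nce` are hypothetical, and feature vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(images: List[Vector], texts_pos: List[Vector],
             texts_neg: List[Vector], temperature: float = 0.07) -> float:
    """Mean image-to-text InfoNCE loss over one batch.

    For image i the positive is texts_pos[i]; the negatives are texts_pos[j]
    for j != i (other samples in the batch) and every vector in texts_neg.
    """
    candidates = list(texts_pos) + list(texts_neg)
    total = 0.0
    for i, image in enumerate(images):
        logits = [cosine(image, text) / temperature for text in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Negative log-probability of the positive candidate.
        total += log_denominator - logits[i]
    return total / len(images)


images = [[1.0, 0.0], [0.0, 1.0]]
texts_pos = [[1.0, 0.0], [0.0, 1.0]]    # T+ features, aligned with their images
texts_neg = [[-1.0, 0.0], [0.0, -1.0]]  # T- features, used only as negatives
loss = info_nce(images, texts_pos, texts_neg)
```

Minimizing this quantity raises the similarity of each image to its own T+ feature relative to all other candidates, which is the "closer for positives, farther for negatives" behaviour described above.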
This embodiment further includes a step of detection using the trained model. Referring to FIG. 5, FIG. 5 is a schematic diagram of the advertisement detection process provided by the present invention. In this stage, the advertising poster is preprocessed, and whether a given promotional technique is used in the poster is then predicted from the distance between the image and the text. As can be seen from FIG. 5, the advertisement technology detection module processes data in the same way as in the training phase: text information is recognized by OCR, the text data is processed by prompt engineering, and the image encoder and text encoder from the training stage are used for encoding, yielding two text features and the corresponding image feature. The distance between the image feature and each of the two text features is then calculated, and whether a given promotional technique is used in the current sample is judged from these distances. It is worth mentioning that the classification task is redefined in the prediction stage as a distance-based image-text matching task, which follows the same idea as the contrastive-learning-based image-text matching task used in pre-training; this design can further improve the detection effect on small-sample categories.
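The distance-based decision can be sketched as below. The description does not fix the distance metric or the decision rule, so cosine distance and a simple "closer to T+ than to T-" comparison are assumptions of this example, as are the function names; feature vectors are assumed to be non-zero.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def uses_technique(image_feat: Vector, text_pos_feat: Vector, text_neg_feat: Vector) -> bool:
    """Judge that the technique is used when the image feature lies closer to
    the "My technology is <Label>" text feature than to the "Not <Label>" one."""
    return cosine_distance(image_feat, text_pos_feat) < cosine_distance(image_feat, text_neg_feat)


# Toy features standing in for encoder outputs.
image_feature = [1.0, 0.0]
t_plus_feature = [0.9, 0.1]
t_minus_feature = [-1.0, 0.0]
prediction = uses_technique(image_feature, t_plus_feature, t_minus_feature)
```

To test a poster against several candidate techniques, the same comparison would be repeated with the prompt pair built for each label.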
It can be appreciated that, in view of the deficiencies in the background art, embodiments of the present invention provide a multi-modal advertisement recognition system. The system comprises: an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module. The advertisement image-text identification module is used for converting characters in the target advertisement into text information based on optical character recognition (OCR) technology and acquiring a manual labeling label of the target advertisement; the information augmentation module is used for augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information; the feature coding module is used for coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained; the model training module is used for training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model; and the advertisement technology detection module is used for detecting the propaganda effect of the advertisement to be detected based on the advertisement technology detection model.
Advertising technique detection is implemented as an image-text multi-modal matching task, based on natural language processing, computer vision, OCR technology and multi-modal pre-training tasks, so that the text features and visual features of an advertising poster are combined from a multi-modal perspective and the accuracy of advertising technique detection is greatly improved. Further, reverse text features are generated from the label information to increase the number of samples available to the model, and an advertisement technology detection module based on the distance between image features and text features is introduced to predict the advertising technique used in the advertising data, which greatly improves the accuracy of advertising technique recognition on small-sample datasets.
In one possible application scenario, referring to fig. 6, fig. 6 is a schematic overall flowchart of a multi-modal advertisement recognition method provided by the present invention; as shown in fig. 6, the present embodiment is mainly divided into two stages: a training phase and a prediction phase.
In the training phase, text recognition technology is used to recognize the characters in the image, and the promotional technique of the poster is annotated manually, giving the data structure <Image, Text, Label>. The text in the advertisement then passes through prompt engineering, which adds the two prompt templates. Text data prefixed with "My technology is <Label>" is denoted Text+. Text data prefixed with "My technology is Not <Label>" is denoted Text-. Text+ forms the positive image-text pair, which matches the visual information, and Text- forms the negative image-text pair, which does not match the visual information. The two types of image-text pairs are then input into an image encoder and a text encoder, respectively, to generate the two types of image-text features <I, T+> and <I, T->. In the pre-training stage of the model, the two types of image-text features participate in several cross-modal pre-training tasks. Specifically, the attribute prediction task, the relationship prediction task and the masked region task use only the positive image-text features, while the image-text matching task uses both types of image-text features. For the image-text matching task, the invention combines the task with self-supervised contrastive learning, thereby increasing the number of negative image-text pairs during training and strengthening the learning of promotional techniques for small-sample categories.
In the prediction phase, the processing of the advertisement data in this embodiment is the same as in the training phase. Specifically, the text information in the advertisement is extracted, the two prompt templates are added through prompt engineering, and the image and the resulting texts are input into the image encoder and the text encoder, respectively. Unlike the training stage, in the final step the distance between the image feature and the text feature under each of the two prompt templates is calculated, and the distances are used as the criterion for judging whether a given promotional technique is used in the advertisement.
Referring to fig. 7, fig. 7 is a flowchart of a multi-modal advertisement recognition method according to an embodiment of the present invention, and as shown in fig. 7, the multi-modal advertisement recognition method includes:
Step S100: converting characters in the target advertisement into text information based on optical character recognition (OCR) technology, and acquiring an artificial labeling label of the target advertisement;
It should be noted that the method of this embodiment may be executed by a computer terminal device having data processing, network communication and program execution functions, for example a computer or a tablet computer; it may also be executed by a server device or a cloud server having the same or similar functions, which is not limited in this embodiment. For ease of understanding, this embodiment and the following embodiments are described by taking a server device as an example.
Step S200: augmenting the text information into forward text and reverse text based on the artificial labeling label to obtain augmented text information;
Step S300: encoding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained;
Step S400: training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model;
Step S500: carrying out propaganda effect detection on the advertisement to be detected based on the advertisement technology detection model.
It can be understood that the multi-modal advertisement recognition method provided by the present invention corresponds to the multi-modal advertisement recognition system provided by the foregoing embodiments, and the relevant technical features of the multi-modal advertisement recognition method can refer to the relevant technical features of the multi-modal advertisement recognition system, and are not described herein again.
Referring to fig. 8, fig. 8 is a schematic view illustrating an embodiment of an electronic device according to an embodiment of the invention. As shown in fig. 8, an embodiment of the present invention provides an electronic device, which includes a memory 1310, a processor 1320, and a computer program 1311 stored in the memory 1310 and executable on the processor 1320, where the processor 1320 executes the computer program 1311 to implement the following steps:
converting characters in the target advertisement into text information based on optical character recognition (OCR) technology, and acquiring an artificial labeling label of the target advertisement; augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information; encoding the image information of the target advertisement and the augmented text information to construct a multi-modal data set to be trained; training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model; and carrying out propaganda effect detection on the advertisement to be detected based on the advertisement technology detection model.
Referring to fig. 9, fig. 9 is a schematic diagram of an embodiment of a computer-readable storage medium according to the present invention. As shown in fig. 9, the present embodiment provides a computer-readable storage medium 1400, on which a computer program 1411 is stored, which computer program 1411, when executed by a processor, implements the steps of:
converting characters in the target advertisement into text information based on optical character recognition (OCR) technology, and acquiring an artificial labeling label of the target advertisement; augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information; encoding the image information of the target advertisement and the augmented text information to construct a multi-modal data set to be trained; training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model; and carrying out propaganda effect detection on the advertisement to be detected based on the advertisement technology detection model.
The embodiment of the invention provides a multi-mode advertisement recognition method, a multi-mode advertisement recognition system, electronic equipment and a storage medium, wherein the multi-mode advertisement recognition system comprises: an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module; the advertisement image-text recognition module is used for converting characters in the target advertisement into text information based on optical character recognition (OCR) technology and acquiring an artificial labeling label of the target advertisement; the information augmentation module is used for augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information; the feature coding module is used for coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained; the model training module is used for training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model; and the advertisement technology detection module is used for detecting the propaganda effect of the advertisement to be detected based on the advertisement technology detection model.
The advertising technology detection function, based on the image-text multi-modal matching task, is realized using natural language processing, computer vision, OCR technology and multi-modal pre-training tasks, so that text features and visual features in the advertising poster are combined from a multi-modal perspective, which improves the accuracy of advertising technology detection. Further, reverse text features are generated based on the label information, which increases the number of samples available to the model, and an advertising technology detection module based on the distance between the image features and the text features is introduced to predict the advertising technology used in the advertising data, which improves the accuracy of advertising technology identification on small-sample data sets.
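The distance-based prediction mentioned above can be illustrated with a minimal sketch. It assumes cosine distance and one encoded text feature per candidate advertising technique; the embodiment does not state which distance measure is used, and the function names and the feature layout are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def predict_technique(
    image_feature: Sequence[float],
    text_features: Dict[str, Sequence[float]],
) -> str:
    """Pick the technique whose text feature is closest to the image feature.

    `text_features` maps a candidate technique name to its encoded text
    feature (an assumed layout for this sketch).
    """
    if not text_features:
        raise ValueError("at least one candidate technique is required")
    return min(
        text_features,
        key=lambda name: cosine_distance(image_feature, text_features[name]),
    )
```

In practice the feature vectors would come from the trained language and visual encoders; here they are plain lists of floats so that the sketch stays self-contained.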
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A multimodal advertisement recognition system, the system comprising: an advertisement image-text recognition module, an information augmentation module, a feature coding module, a model training module and an advertisement technology detection module;
the advertisement image-text recognition module is used for converting characters in the target advertisement into text information based on optical character recognition (OCR) technology and acquiring an artificial labeling label of the target advertisement;
the information augmentation module is used for augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information;
the feature coding module is used for coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained;
the model training module is used for training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertisement technology detection model;
and the advertisement technology detection module is used for detecting the propaganda effect of the advertisement to be detected based on the advertisement technology detection model.
2. The multi-modal advertisement recognition system of claim 1, wherein the augmenting the text information into a forward text and a reverse text based on the artificial labeling label comprises:
acquiring an artificial labeling label and text information of the target advertisement;
and performing semantic modification on the text information based on the artificial labeling labels to obtain the text information with the semantic opposite to that of the artificial labeling labels.
3. The system of claim 1, wherein the encoding of the image information of the target advertisement and the augmented text information comprises:
coding the image information and the augmented text information based on a preset language model and a preset visual coding model to obtain a forward text feature pair and a reverse text feature pair of the target advertisement; the preset language model is BERT, XLNet or RoBERTa; the preset visual coding model is ResNet, VGG or Faster R-CNN.
4. The multi-modal advertisement recognition system of claim 3, wherein the model training module comprises: an attribute prediction unit and a relationship prediction unit;
the attribute prediction unit is used for replacing the attribute information in the forward text feature pair by using a preset replacement tag and predicting the replaced attribute information;
and the relation prediction unit is used for shielding the relation information in the forward text feature pair by using the preset replacement label and predicting the relation in the forward text feature pair.
5. The multi-modal advertisement recognition system of claim 4, wherein the model training module further comprises: a mask region unit;
and the mask region unit is used for masking region information in the image information and predicting the masked image information.
6. The multi-modal advertisement recognition system of claim 4, wherein the model training module further comprises: an image-text matching unit;
and the image-text matching unit is used for predicting the image-text relationship between the forward text feature pair and the reverse text feature pair.
7. The system of claim 1, wherein the preset training task includes an attribute prediction task, a relationship prediction task, a mask region task, and an image-text matching task.
8. A method for multi-modal advertisement recognition, comprising:
converting characters in the target advertisement into text information based on optical character recognition (OCR) technology, and acquiring an artificial labeling label of the target advertisement;
augmenting the text information into a forward text and a reverse text based on the artificial labeling label to obtain augmented text information;
coding the image information of the target advertisement and the augmented text information to construct a multi-mode data set to be trained;
training a preset model to be trained by using the multi-mode data set to be trained based on a preset training task to obtain an advertising technology detection model;
and carrying out propaganda effect detection on the advertisement to be detected based on the advertisement technology detection model.
9. An electronic device comprising a memory and a processor, the processor being configured to implement the steps of the multi-modal advertisement recognition method of claim 8 when executing a computer management class program stored in the memory.
10. A computer-readable storage medium, having stored thereon a computer management class program, which when executed by a processor, performs the steps of the multi-modal advertisement recognition method of claim 8.
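The replace-and-predict pattern recited for the attribute prediction unit and the relationship prediction unit in claim 4 might be sketched as follows, purely as an illustration. The `[MASK]` tag, the function name `mask_tokens`, and the assumption that the positions of attribute or relation words are supplied by some upstream parser are all hypothetical; the claims only require a preset replacement tag.

```python
from typing import Dict, Iterable, List, Tuple

MASK_TAG = "[MASK]"  # assumed form of the preset replacement tag


def mask_tokens(
    tokens: List[str], positions: Iterable[int]
) -> Tuple[List[str], Dict[int, str]]:
    """Replace the tokens at `positions` with MASK_TAG.

    Returns the masked token list and a mapping from each masked
    position to the original token, which serves as the prediction
    target during training.
    """
    masked = list(tokens)  # copy so the input list is not modified
    targets: Dict[int, str] = {}
    for index in sorted(set(positions)):
        if not 0 <= index < len(tokens):
            raise IndexError(f"position {index} is out of range")
        targets[index] = tokens[index]
        masked[index] = MASK_TAG
    return masked, targets
```

For the token list `["red", "car", "on", "road"]`, masking position 0 hides an attribute word and masking position 2 hides a relation word; a model trained on such inputs would be asked to recover the hidden tokens from the remaining text and the image.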
CN202211387710.5A 2022-11-07 2022-11-07 Multi-mode advertisement recognition method and system, electronic equipment and storage medium Pending CN115830610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211387710.5A CN115830610A (en) 2022-11-07 2022-11-07 Multi-mode advertisement recognition method and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211387710.5A CN115830610A (en) 2022-11-07 2022-11-07 Multi-mode advertisement recognition method and system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115830610A true CN115830610A (en) 2023-03-21

Family

ID=85526996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211387710.5A Pending CN115830610A (en) 2022-11-07 2022-11-07 Multi-mode advertisement recognition method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115830610A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824278A (en) * 2023-08-29 2023-09-29 腾讯科技(深圳)有限公司 Image content analysis method, device, equipment and medium
CN116824278B (en) * 2023-08-29 2023-12-19 腾讯科技(深圳)有限公司 Image content analysis method, device, equipment and medium
CN117575702A (en) * 2023-11-16 2024-02-20 北京鸿途信达科技股份有限公司 Multi-mode advertisement putting system

Similar Documents

Publication Publication Date Title
CN110363252B (en) End-to-end trend scene character detection and identification method and system
CN115830610A (en) Multi-mode advertisement recognition method and system, electronic equipment and storage medium
CN110168535B (en) Information processing method and terminal, computer storage medium
CN113435529B (en) Model pre-training method, model training method and image processing method
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN111681681A (en) Voice emotion recognition method and device, electronic equipment and storage medium
CN110149265B (en) Message display method and device and computer equipment
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
CN113051380A (en) Information generation method and device, electronic equipment and storage medium
CN115953788A (en) Green financial attribute intelligent identification method and system based on OCR (optical character recognition) and NLP (natural language processing) technologies
CN115002491A (en) Network live broadcast method, device, equipment and storage medium based on intelligent machine
CN113205814A (en) Voice data labeling method and device, electronic equipment and storage medium
CN111311364A (en) Commodity recommendation method and system based on multi-mode commodity comment analysis
CN113806574A (en) Software and hardware integrated artificial intelligent image recognition data processing method
CN113254814A (en) Network course video labeling method and device, electronic equipment and medium
CN110866172B (en) Data analysis method for block chain system
CN117351336A (en) Image auditing method and related equipment
CN112084788A (en) Automatic marking method and system for implicit emotional tendency of image captions
CN111881900A (en) Corpus generation, translation model training and translation method, apparatus, device and medium
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN116303951A (en) Dialogue processing method, device, electronic equipment and storage medium
CN112506405B (en) Artificial intelligent voice large screen command method based on Internet supervision field
CN116306506A (en) Intelligent mail template method based on content identification
CN115439850A (en) Image-text character recognition method, device, equipment and storage medium based on examination sheet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination