CN114638307A

CN114638307A - Information detection method, information detection device, electronic equipment and storage medium

Info

Publication number: CN114638307A
Application number: CN202210279038.1A
Authority: CN
Inventors: 何永明; 李涛; 梅丰
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2022-03-21
Filing date: 2022-03-21
Publication date: 2022-06-17

Abstract

The disclosure relates to an information detection method, an information detection device, electronic equipment and a storage medium, and relates to the technical field of data processing. The embodiment of the disclosure at least solves the problem that in the related art, the result of detecting whether different information representation modes of the same object are consistent is inaccurate due to inaccurate classification models. The method comprises the following steps: acquiring first characteristic data and second characteristic data of a target object; searching a target preset object corresponding to the target object in the preset object set according to the first characteristic data; obtaining second modal similarity according to fourth characteristic data and second characteristic data of the target preset object; and if the second modality similarity is larger than a second threshold value, determining that the first modality data corresponding to the target object is matched with the second modality data corresponding to the target object. The single modal data of the target object are respectively compared to judge whether the different modal data of the same target object are matched, so that the accuracy of detecting whether the representation modes are consistent can be improved.

Description

Information detection method, information detection device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to an information detection method and apparatus, an electronic device, and a storage medium.

Background

In an information release scenario, the same object can be represented in at least two of the following information representation modes: text, video, image, and voice. In fields such as search, recommendation, and advertising, whether a piece of information meets the user's needs is usually judged from the representation mode that carries less information, so it is very important that the at least two information representation modes corresponding to the same information are consistent.

In the related art, assume that the same object is represented by a target text and a target video. To detect whether the target text is consistent with the target video, a data set is first obtained; the data set includes positive samples, in which the text and the video are consistent, and negative samples, in which the text and the video are inconsistent. The data set is then input into an encoder to obtain feature data, a classification model is trained on the feature data with a binary cross-entropy loss, and finally the classification model is used to detect whether the target text is consistent with the target video.

However, the classification model strongly depends on the data set, and the training requires high-quality, high-confidence samples that are collected manually. Such samples are difficult to collect, so the classification model is inaccurate, and consequently the result of detecting whether different information representation modes of the same object are consistent is also inaccurate.

Disclosure of Invention

The disclosure provides an information detection method, an information detection device, an electronic device and a storage medium, which at least solve the problem in the related art that high-quality confidence samples are difficult to collect, so that the classification model is inaccurate and the result of detecting whether different information representation modes of the same object are consistent is therefore inaccurate. The technical scheme of the disclosure is as follows:

According to a first aspect of the embodiments of the present disclosure, there is provided an information detection method, including: acquiring first characteristic data of a target object according to a first single-mode double-tower model, wherein the first characteristic data is characteristic data of first-mode data corresponding to the target object, and the target object is a multimedia resource object; acquiring second characteristic data of the target object according to a second single-mode double-tower model, wherein the second characteristic data is characteristic data of second-mode data corresponding to the target object; searching a target preset object corresponding to the target object in a preset object set according to the first characteristic data, wherein a first modal similarity between third characteristic data of the target preset object and the first characteristic data is greater than or equal to a first threshold value, and the third characteristic data is characteristic data of the first modal data corresponding to the target preset object; the preset object set comprises at least one preset object, and the characteristic data of the first modal data corresponding to the preset object is matched with the characteristic data of the second modal data corresponding to the preset object; obtaining a second modal similarity between the target preset object and the target object according to fourth characteristic data of the target preset object and the second characteristic data, wherein the fourth characteristic data is characteristic data of the second modal data corresponding to the target preset object; and in a case where the second modal similarity is greater than a second threshold value, determining that the first modal data corresponding to the target object is matched with the second modal data corresponding to the target object.

Optionally, when the number of the target preset objects is greater than 1, the obtaining a second modal similarity between the target preset object and the target object according to the fourth feature data and the second feature data of the target preset object includes: obtaining a third modal similarity between each target preset object and the target object according to the fourth characteristic data and the second characteristic data of each target preset object; and determining the average value of the third modal similarity between each target preset object and the target object as the second modal similarity between the target preset object and the target object.

Optionally, the information detection method further includes: before first characteristic data of a target object are obtained according to a first single-mode double-tower model, determining the first single-mode double-tower model according to the data type of the first mode data corresponding to the target object, wherein the first single-mode double-tower model is obtained by training according to first sample data, and the data type of the first sample data is the same as that of the first mode data; and determining a second single-mode double-tower model according to the data type of second-mode data corresponding to the target object before acquiring second characteristic data of the target object according to the second single-mode double-tower model, wherein the second single-mode double-tower model is obtained by training according to second sample data, and the data type of the second sample data is the same as that of the second mode data.
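By way of illustration only, selecting a single-mode model according to the data type of the modality data might be sketched as a lookup in a registry. The registry keys, the function names, and the two stand-in models below are assumptions made for this sketch; in practice each entry would be a trained double-tower model rather than a hand-written function.

```python
from typing import Any, Callable, Dict, List

# Hypothetical extractor type: raw modality data in, feature vector out.
Extractor = Callable[[Any], List[float]]


def text_model(data: Any) -> List[float]:
    """Stand-in for a model trained on text sample data."""
    text = str(data)
    return [float(len(text)), float(len(text.split()))]


def video_model(data: Any) -> List[float]:
    """Stand-in for a model trained on video sample data (a list of frames)."""
    frames = list(data)
    return [float(len(frames))]


# Registry keyed by data type; keys and models are illustrative assumptions.
MODEL_REGISTRY: Dict[str, Extractor] = {
    "text": text_model,
    "video": video_model,
}


def select_model(data_type: str) -> Extractor:
    """Return the model whose training data type equals data_type."""
    if data_type not in MODEL_REGISTRY:
        raise ValueError(f"no model registered for data type {data_type!r}")
    return MODEL_REGISTRY[data_type]
```

Keeping the selection in one lookup means a further data type (for example audio) only requires registering another model, without changing the detection flow.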

Optionally, the information detection method further includes: before searching a target preset object corresponding to the target object in the preset object set according to the first characteristic data, extracting and storing, according to the first single-modal double-tower model, characteristic data of the first modal data corresponding to each preset object in the preset object set; and extracting and storing, according to the second single-modal double-tower model, characteristic data of the second modal data corresponding to each preset object in the preset object set.
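As a minimal sketch of this pre-extraction, the characteristic data of every preset object could be computed once and kept in a store keyed by object identifier, so that the search does not re-run the models at query time. The names PresetObject, StoredFeatures and build_feature_store are hypothetical, and the two models are passed in as plain callables.

```python
from typing import Any, Callable, Dict, Iterable, List, NamedTuple

Vector = List[float]


class PresetObject(NamedTuple):
    object_id: str
    first_modality_data: Any
    second_modality_data: Any


class StoredFeatures(NamedTuple):
    first: Vector   # characteristic data of the first modal data
    second: Vector  # characteristic data of the second modal data


def build_feature_store(
    presets: Iterable[PresetObject],
    first_model: Callable[[Any], Vector],
    second_model: Callable[[Any], Vector],
) -> Dict[str, StoredFeatures]:
    """Extract and store both feature vectors for every preset object."""
    store: Dict[str, StoredFeatures] = {}
    for preset in presets:
        store[preset.object_id] = StoredFeatures(
            first=first_model(preset.first_modality_data),
            second=second_model(preset.second_modality_data),
        )
    return store
```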

Optionally, the information detection method further includes: after determining, in a case where the second modal similarity is greater than the second threshold, that the first modal data corresponding to the target object matches the second modal data corresponding to the target object, adding the target object to the preset object set, and recording the first characteristic data and the second characteristic data.
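The following sketch shows one possible way to grow the preset object set with objects that have just been judged as matched. The store layout (a dictionary mapping an object identifier to a pair of feature vectors) and the function name add_if_matched are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
FeatureStore = Dict[str, Tuple[Vector, Vector]]


def add_if_matched(
    store: FeatureStore,
    object_id: str,
    first_features: Sequence[float],
    second_features: Sequence[float],
    second_similarity: float,
    second_threshold: float,
) -> bool:
    """Record the target object in the preset set only when it is a match."""
    if second_similarity <= second_threshold:
        return False  # not matched: the preset set is left unchanged
    # Copy the vectors so later changes by the caller do not alter the store.
    store[object_id] = (list(first_features), list(second_features))
    return True
```

Because every added object enlarges the set that later searches draw on, the coverage of the preset object set can increase over time without further manual collection.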

Optionally, in the information detection method, the first modality data is any one of: text data, video data, audio data, and image data; the second modality data is any one of: text data, video data, audio data, and image data.

According to a second aspect of the embodiments of the present disclosure, there is provided an information detecting apparatus including an acquiring unit, a searching unit and a determining unit; the acquiring unit is used for acquiring first characteristic data of a target object according to a first single-mode double-tower model, wherein the first characteristic data is characteristic data of first-mode data corresponding to the target object, and the target object is a multimedia resource object; the acquiring unit is further used for acquiring second characteristic data of the target object according to a second single-mode double-tower model, wherein the second characteristic data is characteristic data of second-mode data corresponding to the target object; the searching unit is used for searching a target preset object corresponding to the target object in a preset object set according to the first characteristic data acquired by the acquiring unit, wherein the first modal similarity between third characteristic data of the target preset object and the first characteristic data is greater than or equal to a first threshold, and the third characteristic data is characteristic data of the first modal data corresponding to the target preset object; the preset object set comprises at least one preset object, and the characteristic data of the first modal data corresponding to the preset object is matched with the characteristic data of the second modal data corresponding to the preset object; the acquiring unit is further used for obtaining a second modal similarity between the target preset object and the target object according to fourth characteristic data of the target preset object found by the searching unit and the second characteristic data, wherein the fourth characteristic data is characteristic data of second modal data corresponding to the target preset object; and the determining unit is used for determining that the first modal data corresponding to the target object is matched with the second modal data corresponding to the target object under the condition that the second modal similarity acquired by the acquiring unit is greater than a second threshold.

Optionally, when the number of the target preset objects is greater than 1, the acquiring unit is specifically configured to: obtain a third modal similarity between each target preset object and the target object according to the fourth characteristic data of each target preset object and the second characteristic data; and determine the average value of the third modal similarities between the target preset objects and the target object as the second modal similarity between the target preset object and the target object.

Optionally, in the information detecting apparatus, the determining unit is further configured to: before the acquiring unit acquires the first feature data of the target object according to the first single-mode double-tower model, determine the first single-mode double-tower model according to the data type of the first-mode data corresponding to the target object, where the first single-mode double-tower model is obtained by training according to first sample data, and the data type of the first sample data is the same as the data type of the first-mode data; and before the acquiring unit acquires the second feature data of the target object according to the second single-mode double-tower model, determine the second single-mode double-tower model according to the data type of the second-mode data corresponding to the target object, where the second single-mode double-tower model is obtained by training according to second sample data, and the data type of the second sample data is the same as that of the second-mode data.

Optionally, the information detecting apparatus further includes an extracting unit; the extracting unit is used for, before the searching unit searches the target preset object corresponding to the target object in the preset object set according to the first characteristic data, extracting and storing, according to the first single-modal double-tower model, the characteristic data of the first modal data corresponding to each preset object in the preset object set; and extracting and storing, according to the second single-modal double-tower model, the characteristic data of the second modal data corresponding to each preset object in the preset object set.

Optionally, the information detecting apparatus further includes an adding unit; the adding unit is used for adding the target object to the preset object set and recording the first characteristic data and the second characteristic data after the determining unit determines, under the condition that the second modal similarity is greater than the second threshold, that the first modal data corresponding to the target object is matched with the second modal data corresponding to the target object.

Optionally, in the information detecting apparatus, the first modality data is any one of: text data, video data, audio data, and image data; the second modality data is any one of: text data, video data, audio data, and image data.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute the above instructions to implement the information detection method as provided by the first aspect and any one of its possible design forms.

According to a fourth aspect of embodiments of the present disclosure, there is provided a readable storage medium, wherein instructions of the readable storage medium, when executed by a processor, can implement the information detection method as provided in the first aspect and any one of the possible design manners thereof.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the information detection method as provided by the first aspect and any one of its possible designs.

The technical scheme provided by the disclosure at least brings the following beneficial effects: firstly, acquiring first characteristic data of a target object according to a first single-mode double-tower model, wherein the first characteristic data is characteristic data of first-mode data corresponding to the target object, and the target object is a multimedia resource object; acquiring second characteristic data of the target object according to the second single-mode double-tower model, wherein the second characteristic data is characteristic data of second mode data corresponding to the target object; then, according to the first feature data, a target preset object corresponding to the target object is searched in a preset object set, a first modal similarity between third feature data of the target preset object and the first feature data is larger than or equal to a first threshold, and the third feature data is feature data of the first modal data corresponding to the target preset object; obtaining second modal similarity between the target preset object and the target object according to fourth characteristic data and second characteristic data of the target preset object, wherein the fourth characteristic data is characteristic data of second modal data corresponding to the target preset object; and finally, under the condition that the similarity of the second modality is larger than a second threshold value, determining that the first modality data corresponding to the target object is matched with the second modality data corresponding to the target object. 
Therefore, for different modality data corresponding to the same target object, the first characteristic data and the second characteristic data are respectively extracted, then a target preset object corresponding to third characteristic data similar to the first characteristic data is searched in the same modality, then the fourth characteristic data corresponding to the second modality data of the target preset object is compared with the second characteristic data, and whether the first modality data and the second modality data of the target object are consistent or not is judged according to the similarity of the characteristic data in the same modality, so that the accuracy of a result for detecting whether different information representation modes of the same target object are consistent or not can be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a first schematic flowchart of an information detection method according to an exemplary embodiment;

FIG. 2 is a second schematic flowchart of an information detection method according to an exemplary embodiment;

FIG. 3 is a third schematic flowchart of an information detection method according to an exemplary embodiment;

FIG. 4 is a fourth schematic flowchart of an information detection method according to an exemplary embodiment;

FIG. 5 is a fifth schematic flowchart of an information detection method according to an exemplary embodiment;

FIG. 6 is a block diagram of an information detection apparatus according to an exemplary embodiment;

FIG. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, "/" means "or"; for example, A/B may indicate A or B. "And/or" herein merely describes an association between associated objects and covers three relationships; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, in the description of the embodiments of the present disclosure, "a plurality" means two or more.

In the information distribution scenario, it is assumed that a work published by a user account includes a video file and a text file describing the video. When user account A searches on the platform on which the information is published, search results are displayed according to the search terms input by user account A. The server generally searches for text files corresponding to the search terms and then returns the works corresponding to those text files as the search results. If the text file of a work is inconsistent with its video file, the displayed search results are inaccurate. Therefore, in order to present accurate search results, it is also necessary to detect whether the data of different modalities in a work (the video file and the text file) are consistent.

In the related art, it is detected whether the text file is consistent with the video file according to the classification model. However, the classification model strongly depends on the data set, and the data set required by the training mode of the classification model is a high-quality confidence sample obtained by manual acquisition, so that the difficulty in acquiring the high-quality confidence sample is high, the classification model obtained by training may be inaccurate, and further, the result of detecting whether different information representation modes of the same object are consistent is inaccurate.

Based on this, the embodiments of the present disclosure provide an information detection method to solve the above problems. The electronic device executing the information detection method may be a personal intelligent device such as a mobile phone and a tablet computer, or may also be an electronic device such as a notebook computer, a handheld computer, a desktop computer, an ultra-mobile personal computer (UMPC), a server, or may also be another electronic device that can store and process first-modality data and second-modality data of a target object, where the form of the electronic device is not limited.

The information detection method provided by the embodiment of the present disclosure is described below with reference to the drawings, taking an information detection apparatus as the execution subject by way of example.

FIG. 1 is a schematic flowchart of an information detection method according to an embodiment of the present disclosure. As shown in FIG. 1, the information detection method provided by the embodiment of the present disclosure includes the following steps 101 to 105.

Step 101, the information detection device obtains first characteristic data of a target object according to a first single-mode double-tower model.

In the embodiment of the present disclosure, the first feature data is feature data of first modality data corresponding to the target object; the target object is a multimedia asset object.

In an embodiment of the present disclosure, the target object may be characterized by data of a plurality of modalities, and the first modality data is one of the ways of characterizing the target object. It should be noted that each source or form of information may be referred to as a modality: for example, senses such as smell, hearing, vision, and touch; media such as voice, video, and text; or signal sources such as radar, infrared, Bluetooth, and broadcast. In the embodiments of the present disclosure, a modality is used to represent a data type.

Illustratively, at the information distribution platform, a work is distributed, the work including a video file and a text file for describing the video, and then the first modality data may be the video file or the text file.

Optionally, in an embodiment of the present disclosure, the first modality data is any one of: text data, video data, audio data, and image data. The technical scheme provided by the disclosure at least brings the following beneficial effects: the first modal data can be of multiple data types, so that the information of the multiple data types can be detected, and the diversity of information detection is improved.

In the embodiment of the present disclosure, the first single-modality double-tower model is used to extract feature data of the first-modality data, where the first-modality data refers to data of a single data type.

It should be noted that the first single-mode double-tower model processes the input data simultaneously with two different submodels, and then further processes the output results of the two submodels to obtain a final output result. Using a double-tower model avoids the inaccurate feature extraction that can result from relying on a single submodel.
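The structure described above (two submodels applied to the same input, followed by a further processing step on their outputs) can be sketched as follows. This is a toy illustration under stated assumptions: the two towers here are hand-written statistics over a token list and the fusion step is concatenation followed by L2 normalization, whereas a practical model would use trained encoders. None of the names come from the disclosure.

```python
import math
from typing import List, Sequence

Vector = List[float]


def tower_a(tokens: Sequence[str]) -> Vector:
    """Hypothetical first submodel: token count and average token length."""
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n if n else 0.0
    return [float(n), avg_len]


def tower_b(tokens: Sequence[str]) -> Vector:
    """Hypothetical second submodel: number of distinct tokens."""
    return [float(len(set(tokens)))]


def fuse(a: Vector, b: Vector) -> Vector:
    """Assumed fusion step: concatenate both outputs, then L2-normalize."""
    combined = a + b
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined] if norm else combined


def double_tower_features(tokens: Sequence[str]) -> Vector:
    """Run both towers on the same input and fuse their outputs."""
    return fuse(tower_a(tokens), tower_b(tokens))
```

Normalizing the fused vector is a design choice of this sketch; it keeps later similarity computations independent of vector magnitude.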

Step 102, the information detection device acquires second characteristic data of the target object according to the second single-mode double-tower model.

In an embodiment of the present disclosure, the second feature data is feature data of second modality data corresponding to the target object.

Optionally, in an embodiment of the present disclosure, the second modality data is any one of: text data, video data, audio data, and image data. The technical scheme provided by the disclosure at least brings the following beneficial effects: the second modal data can be of multiple data types, so that the information of the multiple data types can be detected, and the diversity of information detection is improved.

In the embodiment of the present disclosure, the data types of the first modality data and the second modality data may be the same or different, for example, the first modality data is text data, and the second modality data is also text data, or the first modality data is text data, and the second modality data is video data. It is to be appreciated that the first modality data is different from the second modality data even though the data type of the first modality data is the same as the data type of the second modality data.

This step is similar to step 101 described above, and is not described again in the embodiments of the present disclosure.

Step 103, the information detection device searches a preset object set for a target preset object corresponding to the target object according to the first characteristic data.

In the embodiment of the present disclosure, a first modality similarity between third characteristic data of the target preset object and the first characteristic data is greater than or equal to a first threshold, where the third characteristic data is characteristic data of the first modality data corresponding to the target preset object; the preset object set comprises at least one preset object, and the characteristic data of the first modal data corresponding to the preset object is matched with the characteristic data of the second modal data corresponding to the preset object.

In the embodiment of the present disclosure, the preset object set includes preset objects. Like the target object, each preset object can be characterized by data of multiple modalities: each preset object is characterized by first modality data and second modality data, and the first modality data corresponding to the preset object matches the second modality data corresponding to the preset object.

It should be noted that the preset objects in the preset object set are collected in advance through manual review or other review modes, and satisfy the collection condition that the first modality data corresponding to the preset object matches (is consistent with) the second modality data corresponding to the preset object.

In the embodiment of the present disclosure, to search for the target preset object, a first modality similarity between the first feature data and the feature data of the first modality data of each preset object in the preset object set is calculated according to a preset algorithm. Depending on the data characteristics of the first modality feature data, the first modality similarity may be a semantic similarity, an image similarity, a volume similarity, an audio similarity, or the like.
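The disclosure does not name the preset algorithm. As one common choice for comparing feature vectors, cosine similarity is sketched below; treating it as the preset algorithm is an assumption of this example, and the function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention of this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)
```

The same function can serve both comparisons in the method, since the first modality similarity and the second modality similarity are each computed between feature data of a single modality.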

In the embodiment of the present disclosure, the first modality similarity of each preset object in the preset object set is compared with the first threshold, and each preset object whose first modality similarity is greater than or equal to the first threshold is determined as a target preset object.

Optionally, in this embodiment of the present disclosure, if no preset object in the preset object set has a first modality similarity greater than or equal to the first threshold, the preset objects are ranked by first modality similarity from high to low, and a preset number (at least 1) of top-ranked preset objects are determined as the target preset objects.

It should be noted that, because the number of preset objects in the preset object set is limited, all of the preset objects may differ from the target object, in which case no preset object reaches the first threshold and none can be used directly to determine whether the first modal data of the target object matches the second modal data corresponding to the target object. In that case, the ranking-based selection described above is performed, and the preset objects closest to the target object are selected as the basis for the determination.
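The selection logic of step 103, including the fallback to the closest preset objects, can be sketched as follows. The function takes already computed first modality similarities as input; the function name and the parameter fallback_count (standing for the preset number) are hypothetical.

```python
from typing import Dict, List


def select_target_presets(
    first_similarities: Dict[str, float],
    first_threshold: float,
    fallback_count: int = 1,
) -> List[str]:
    """Return ids of the target preset objects, most similar first."""
    if fallback_count < 1:
        raise ValueError("fallback_count must be at least 1")
    # Rank all preset objects by first modality similarity, high to low.
    ranked = sorted(
        first_similarities,
        key=lambda object_id: first_similarities[object_id],
        reverse=True,
    )
    above = [oid for oid in ranked if first_similarities[oid] >= first_threshold]
    if above:
        return above
    # No preset object reaches the threshold: fall back to the closest ones.
    return ranked[:fallback_count]
```

For example, with similarities {"p1": 0.9, "p2": 0.4, "p3": 0.85} and a first threshold of 0.8, the preset objects p1 and p3 are selected; with a first threshold of 0.95, none qualifies and the fallback returns p1 only.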

Step 104, the information detection device obtains a second modal similarity between the target preset object and the target object according to the fourth characteristic data of the target preset object and the second characteristic data.

In this embodiment of the present disclosure, the fourth feature data is feature data of the second modality data corresponding to the target preset object.

Optionally, in this embodiment of the present disclosure, regardless of whether the target preset objects are determined by comparing the first modality similarity with the first threshold or by ranking the first modality similarity from high to low, in a case where the number of target preset objects is greater than 1, step 104 may be implemented by step 201 and step 202, as shown in FIG. 2.

Step 201, the information detection apparatus obtains a third modal similarity between each target preset object and the target object according to the fourth characteristic data and the second characteristic data of each target preset object.

Step 202, the information detection apparatus determines an average value of the third modal similarity between each target preset object and the target object as the second modal similarity between the target preset object and the target object.

In the embodiment of the present disclosure, if the number of the target preset objects is greater than 1, an average value may be used so that no single target preset object unduly influences the second modality similarity through factors such as an unsuitable algorithm, calculation deviation, or inaccuracy of the data itself.

The technical scheme provided by the disclosure at least brings the following beneficial effects: averaging the third modal similarities of the target preset objects reduces the influence of factors such as an unsuitable similarity calculation method, calculation deviation, or inaccurate data, and improves the accuracy of the obtained second modal similarity.
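Steps 201 and 202 reduce to an arithmetic mean over the per-object similarities, which might be sketched as follows. The function name is hypothetical, and the third modal similarities are assumed to have been computed beforehand, one per target preset object.

```python
from statistics import fmean
from typing import Sequence


def average_second_similarity(third_similarities: Sequence[float]) -> float:
    """Average the per-object similarities into one second modal similarity."""
    if not third_similarities:
        raise ValueError("at least one target preset object is required")
    return fmean(third_similarities)
```

With three target preset objects whose third modal similarities are 0.8, 0.6 and 0.7, the second modal similarity is their mean, 0.7, so a single unusually low or high value moves the result by only one third of its deviation.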

In the embodiment of the present disclosure, the second modality similarity may be a semantic similarity, an image similarity, a volume similarity, an audio similarity, or the like, according to the data characteristics of the second modality feature data.

Step 105, the information detection apparatus determines that the first modality data corresponding to the target object matches the second modality data corresponding to the target object in a case that the second modal similarity is greater than a second threshold.

In the embodiment of the present disclosure, the second modal similarity is compared with the second threshold. When the second modal similarity is greater than the second threshold, it is determined that the first modality data corresponding to the target object matches the second modality data corresponding to the target object. When the second modal similarity is less than or equal to the second threshold, it is determined that the first modality data corresponding to the target object does not match the second modality data corresponding to the target object.

In the embodiment of the present disclosure, the target object is assumed to correspond to the first modality data and the second modality data, and it is determined whether the first modality data corresponding to the target object matches the second modality data corresponding to the target object. Similarly, if the target object further corresponds to third modality data, a third modality similarity between the feature data of the third modality data of the target preset object and the feature data of the third modality data of the target object needs to be calculated, and in a case that the second modality similarity is greater than the second threshold and the third modality similarity is greater than a third threshold, it is determined that the first modality data, the second modality data and the third modality data corresponding to the target object match. It should be noted that if the target object further includes data of other modalities, then, similarly to the above steps, a modality similarity corresponding to the data of each other modality needs to be calculated and compared with a corresponding threshold.
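The extension to additional modalities can be sketched as follows. This is only an illustration under stated assumptions: the modality names used as dictionary keys and the function name `all_modalities_match` are hypothetical, and each modality is assumed to have its own threshold, compared with a strict "greater than" as in step 105.

```python
from typing import Mapping


def all_modalities_match(
    similarities: Mapping[str, float],
    thresholds: Mapping[str, float],
) -> bool:
    """Return True only if every modality similarity exceeds its threshold."""
    if not similarities:
        raise ValueError("at least one modality similarity is required")
    missing = set(similarities) - set(thresholds)
    if missing:
        raise KeyError(f"no threshold configured for: {sorted(missing)}")
    # A similarity equal to its threshold counts as "not matched".
    return all(similarities[name] > thresholds[name] for name in similarities)
```

For example, calling the function with similarities `{"second": 0.9, "third": 0.6}` and thresholds `{"second": 0.8, "third": 0.7}` reports a mismatch, because the third modality falls below its threshold even though the second modality passes.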

Illustratively, the first modality data corresponding to the target object is a video file (Yao Ming brush), and the second modality data corresponding to the target object is a text file (how Yao Ming teaches a child to dunk). If a classification model is used to judge the pair directly, the video file and the text file may be considered consistent. By adopting step 101 to step 105, however, the video file and the text file can be judged to be inconsistent, because the video content is the Yao Ming brush while the text describes how Yao Ming teaches a child to dunk.

The technical scheme provided by the disclosure at least brings the following beneficial effects: first feature data and second feature data are extracted from the different modality data corresponding to the same target object; a target preset object whose third feature data is similar to the first feature data in the same modality is searched for; the fourth feature data of the target preset object, which is the feature data of its second modality data, is then compared with the second feature data; and whether the first modality data and the second modality data of the target object are consistent is judged according to the similarity of feature data within the same modality. This improves the accuracy of the result of detecting whether different information representation modes of the same target object are consistent.
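The flow of steps 103 to 105 can be sketched end to end, taking the feature data of steps 101 and 102 as given. The following is a minimal sketch, not the implementation of the disclosure: cosine similarity, the `PresetObject` record, the `detect` function, and the choice to return `None` when no target preset object is found are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed metric; the disclosure does not fix one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class PresetObject:
    object_id: str
    third_feature: List[float]   # feature data of its first modality data
    fourth_feature: List[float]  # feature data of its second modality data


def detect(
    first_feature: Sequence[float],
    second_feature: Sequence[float],
    preset_set: Sequence[PresetObject],
    first_threshold: float,
    second_threshold: float,
) -> Optional[bool]:
    # Step 103: keep preset objects that are similar in the first modality.
    targets = [
        p for p in preset_set
        if cosine(p.third_feature, first_feature) >= first_threshold
    ]
    if not targets:
        return None  # assumption: no reference object, so no decision is made
    # Step 104: compare within the second modality and average (steps 201-202).
    sims = [cosine(p.fourth_feature, second_feature) for p in targets]
    second_similarity = sum(sims) / len(sims)
    # Step 105: matched only if strictly greater than the second threshold.
    return second_similarity > second_threshold
```

The key design point is that no similarity is ever computed across modalities: the first modality is used only to find reference objects, and the decision is made entirely within the second modality.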

Optionally, in order to improve the data reliability of the first feature data and the second feature data, as shown in fig. 3, before step 101, the information detection method provided in the embodiment of the present disclosure further includes step 301 and step 302.

Step 301, the information detection apparatus determines a first single-mode double-tower model according to the data type of the first-mode data corresponding to the target object.

In an embodiment of the present disclosure, the first single-mode double-tower model is trained according to first sample data, and a data type of the first sample data is the same as a data type of the first-mode data.

In this embodiment of the disclosure, the sub-model in the first single-mode double-tower model may be configured according to the data type. For example, if the data type is video, the sub-model may use a convolutional neural network algorithm; if the data type is text, the sub-model may use a semantic similarity algorithm.

Step 302, the information detection apparatus determines a second single-mode double-tower model according to the data type of the second-mode data corresponding to the target object.

In the embodiment of the present disclosure, the second single-mode double-tower model is obtained by training according to the second sample data, and the data type of the second sample data is the same as the data type of the second-mode data.

In the embodiment of the present disclosure, the second single-mode double-tower model is similar to the first single-mode double-tower model, and the sub-model in the second single-mode double-tower model may be configured according to the data type. For example, if the data type is video, the sub-model may use a convolutional neural network algorithm; if the data type is text, the sub-model may use a semantic similarity algorithm.

The technical scheme provided by the disclosure at least brings the following beneficial effects: before the first feature data and the second feature data are obtained, the models used for obtaining the feature data are pre-trained on sample data of the corresponding data type, so that the obtained first feature data and second feature data are more accurate.
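Steps 301 and 302 amount to selecting a model by data type. A minimal sketch of such a selection is given below; the registry `MODEL_REGISTRY`, the function `determine_model`, and the toy extractors `text_tower` and `video_tower` are hypothetical stand-ins for trained single-mode double-tower models and do not reflect any real network.

```python
from typing import Callable, Dict, List

# A feature extractor maps raw modality data to a feature vector.
FeatureExtractor = Callable[..., List[float]]


def text_tower(text: str) -> List[float]:
    """Toy stand-in for a text model: character count and word count."""
    return [float(len(text)), float(len(text.split()))]


def video_tower(frames: List[List[float]]) -> List[float]:
    """Toy stand-in for a video model: mean pixel value of each frame."""
    return [sum(frame) / len(frame) for frame in frames]


# Hypothetical registry: one pre-trained model per supported data type.
MODEL_REGISTRY: Dict[str, FeatureExtractor] = {
    "text": text_tower,
    "video": video_tower,
}


def determine_model(data_type: str) -> FeatureExtractor:
    """Steps 301/302: choose the model whose training data type matches."""
    try:
        return MODEL_REGISTRY[data_type]
    except KeyError:
        raise ValueError(f"no model registered for data type {data_type!r}") from None
```

In use, the first model would be chosen by calling `determine_model` with the data type of the first modality data, and the second model by calling it with the data type of the second modality data.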

Optionally, as shown in fig. 4, before step 101, the information detection method provided in the embodiment of the present disclosure further includes step 401 and step 402.

Step 401, the information detection apparatus extracts feature data of first modality data corresponding to each preset object in the preset object set according to the first single-modality double-tower model.

Step 402, the information detection apparatus extracts feature data corresponding to second modal data corresponding to each preset object in the preset object set according to the second single-modal double-tower model.

In the embodiment of the present disclosure, the preset object set is used for storing the preset objects, and since the feature data of the first modality data of the preset objects and the feature data corresponding to the second modality data corresponding to the preset objects are often used in the information detection process, after the preset object set is generated, the data can be extracted and stored, so as to improve the speed of information detection.

The technical scheme provided by the disclosure at least brings the following beneficial effects: the feature data of the first modality data and the feature data of the second modality data corresponding to each preset object in the preset object set are extracted and stored in advance, so that the data can be used directly in the subsequent information detection process, which improves the information detection speed.
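The pre-extraction of steps 401 and 402 can be sketched as building a lookup table once, before any detection takes place. The function name `build_feature_store` and the in-memory dictionary layout are assumptions; an actual deployment might store the features in a database or a vector index instead.

```python
from typing import Any, Callable, Dict, List, Mapping, Tuple

FeatureExtractor = Callable[[Any], List[float]]
# object id -> (feature data of first modality, feature data of second modality)
FeatureStore = Dict[str, Tuple[List[float], List[float]]]


def build_feature_store(
    preset_objects: Mapping[str, Tuple[Any, Any]],
    first_model: FeatureExtractor,
    second_model: FeatureExtractor,
) -> FeatureStore:
    """Extract and store both feature vectors of every preset object once."""
    store: FeatureStore = {}
    for object_id, (first_data, second_data) in preset_objects.items():
        store[object_id] = (
            first_model(first_data),    # step 401
            second_model(second_data),  # step 402
        )
    return store
```

Because the models are applied to the preset object set only once, each later detection needs to run the models only on the target object itself.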

Optionally, as shown in fig. 5, on the basis of the method shown in fig. 1, after step 105, the information detection method provided in the embodiment of the present disclosure further includes step 501.

Step 501, the information detection device adds a target object into a preset object set, and records first characteristic data and second characteristic data.

The technical scheme provided by the disclosure at least brings the following beneficial effects: a target object whose first modality data matches its second modality data satisfies the screening condition for preset objects in the preset object set, and adding such a target object to the preset object set expands the coverage of the preset object set, so that the accuracy of the result of detecting whether different information representation modes of the same target object are consistent can be improved.
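Step 501 can be sketched as a conditional update of the stored preset object set. The function name `add_if_matched` and the dictionary-based store are hypothetical; the only behavior taken from the text is that a target object is added, together with its first and second feature data, after its modalities have been determined to match.

```python
from typing import Dict, List, Sequence, Tuple

FeatureStore = Dict[str, Tuple[List[float], List[float]]]


def add_if_matched(
    store: FeatureStore,
    object_id: str,
    first_feature: Sequence[float],
    second_feature: Sequence[float],
    matched: bool,
) -> bool:
    """Record the target object as a new preset object only if it matched."""
    if not matched:
        return False
    # Copy the vectors so later changes by the caller do not alter the store.
    store[object_id] = (list(first_feature), list(second_feature))
    return True
```

Objects that fail the check in step 105 are never added, which preserves the property that every preset object has mutually matching first and second modality data.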

The foregoing describes aspects of embodiments of the present disclosure primarily from a methodological perspective. It is to be understood that, in order to realize the above-described functions, the information detecting apparatus includes at least one of a hardware structure and a software module corresponding to each function. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The information detection device according to the embodiments of the present disclosure may be divided into functional units according to the above method examples, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the units in the embodiments of the present disclosure is schematic, and is only one logical function division, and there may be another division manner in actual implementation.

Fig. 6 is a schematic diagram illustrating a structure of an information detecting apparatus according to an exemplary embodiment. Referring to fig. 6, the information detecting apparatus provided in the embodiment of the present disclosure includes an obtaining unit 61, a searching unit 62, and a determining unit 63.

The obtaining unit 61 is configured to obtain first feature data of a target object according to a first single-mode double-tower model, where the first feature data is feature data of first-mode data corresponding to the target object, and the target object is a multimedia resource object; for example, as shown in fig. 1, the obtaining unit 61 may be configured to perform step 101.

The obtaining unit 61 is further configured to obtain second feature data of the target object according to the second single-mode double-tower model, where the second feature data is feature data of second-mode data corresponding to the target object; for example, as shown in fig. 1, the obtaining unit 61 may be configured to perform step 102.

A searching unit 62, configured to search, according to the first feature data obtained by the obtaining unit 61, a target preset object corresponding to the target object in the preset object set, where a first modal similarity between third feature data of the target preset object and the first feature data is greater than or equal to a first threshold, and the third feature data is feature data of first modal data corresponding to the target preset object; the preset object set comprises at least one preset object, and the feature data of the first modal data corresponding to the preset object is matched with the feature data of the second modal data corresponding to the preset object; for example, as shown in fig. 1, the searching unit 62 may be configured to perform step 103.

The obtaining unit 61 is further configured to obtain a second modal similarity between the target preset object and the target object according to fourth feature data and the second feature data of the target preset object searched by the searching unit 62, where the fourth feature data is feature data of second modal data corresponding to the target preset object; for example, as shown in fig. 1, the obtaining unit 61 may be configured to perform step 104.

A determining unit 63, configured to determine that the first modality data corresponding to the target object matches the second modality data corresponding to the target object when the second modality similarity acquired by the acquiring unit 61 is greater than the second threshold. For example, as shown in fig. 1, the determining unit 63 may be configured to perform step 105.

Optionally, as shown in fig. 6, when the number of the target preset objects is greater than 1, the obtaining unit 61 is specifically configured to: obtaining a third modal similarity between each target preset object and the target object according to the fourth characteristic data and the second characteristic data of each target preset object; and determining the average value of the third modal similarity between each target preset object and the target object as the second modal similarity between the target preset object and the target object. For example, as shown in fig. 2, the obtaining unit 61 may be configured to perform step 201 and step 202.

Optionally, as shown in fig. 6, the determining unit 63 is further configured to, before the obtaining unit 61 obtains the first feature data of the target object according to the first single-mode double-tower model, determine the first single-mode double-tower model according to a data type of first-mode data corresponding to the target object, where the first single-mode double-tower model is obtained by training according to first sample data, and the data type of the first sample data is the same as the data type of the first-mode data; and determine a second single-mode double-tower model according to the data type of second-mode data corresponding to the target object, where the second single-mode double-tower model is obtained by training according to second sample data, and the data type of the second sample data is the same as that of the second-mode data. For example, as shown in fig. 3, the determining unit 63 may be configured to perform step 301 and step 302.

Optionally, as shown in fig. 6, the information detecting apparatus further includes a processing unit 64; the processing unit 64 is configured to, before the searching unit 62 searches for a target preset object corresponding to the target object in the preset object set according to the first feature data, extract and store feature data of first modal data corresponding to each preset object in the preset object set according to the first single-modal double-tower model; and extract and store feature data corresponding to second modal data corresponding to each preset object in the preset object set according to the second single-modal double-tower model. For example, as shown in fig. 4, the processing unit 64 may be configured to perform steps 401 and 402.

Optionally, as shown in fig. 6, the information detecting apparatus further includes an adding unit 65; the adding unit 65 is configured to, after it is determined, in a case that the second modality similarity is greater than the second threshold, that the first modality data corresponding to the target object matches the second modality data corresponding to the target object, add the target object to the preset object set, and record the first feature data and the second feature data. For example, as shown in fig. 5, the adding unit 65 may be configured to perform step 501.

With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.

Fig. 7 is a schematic structural diagram of an electronic device provided in the present disclosure. As shown in fig. 7, the electronic device may include a processor 71 and a memory 72 for storing instructions executable by the processor 71; wherein the processor 71 is configured to execute the instructions to implement the information detection method in the above embodiment.

In addition, the electronic device may also include a communication bus 73 and at least one communication interface 74.

The processor 71 may be a Central Processing Unit (CPU), a micro-processing unit, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits for controlling the execution of programs in accordance with the disclosed aspects.

The communication bus 73 is a signal path for transmitting information between the above components.

The communication interface 74 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.

The memory 72 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 72 may be separate and coupled to the processor 71 via the communication bus 73. The memory 72 may also be integrated with the processor 71.

The memory 72 is used for storing instructions for executing the disclosed solution, and execution of the instructions is controlled by the processor 71. The processor 71 is configured to execute programs or instructions stored in the memory 72 to implement the functions in the method of the present disclosure.

As an example, in conjunction with fig. 6, the acquisition unit 61, the search unit 62, and the determination unit 63 in the information detection apparatus implement the same functions as those of the processor 71 in fig. 7.

In particular implementations, processor 71 may include one or more CPUs such as CPU0 and CPU1 in fig. 7 as one embodiment.

In a specific implementation, as an embodiment, the electronic device may include a plurality of processors 71, and each of the processors 71 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. Processor 71 herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In particular implementations, the electronic device may also include an output device 75 and an input device 76, as one embodiment. The output device 75 is in communication with the processor 71 and may display information in a variety of ways. For example, the output device 75 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector, or the like. The input device 76 is in communication with the processor 71 and can accept user input in a variety of ways. For example, the input device 76 may be a mouse, keyboard, touch screen device, or sensing device, among others.

Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the electronic device, which may include more or fewer components than those shown, combine certain components, or employ a different arrangement of components. The electronic device in fig. 7 may be a server, a client, or another device.

In addition, the present disclosure also provides a readable storage medium on which a program or instructions are stored, and when the instructions in the readable storage medium are executed by a processor, the electronic device is enabled to execute the information detection method provided in the above embodiment. Optionally, the readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In addition, the present disclosure also provides a computer program product comprising computer programs/instructions, the computer program product being stored in a non-volatile readable storage medium, the computer program product, when executed by at least one processor, causing an electronic device to perform the information detection method as provided in the above embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An information detection method, comprising:

acquiring first characteristic data of a target object according to a first single-mode double-tower model, wherein the first characteristic data is characteristic data of first-mode data corresponding to the target object, and the target object is a multimedia resource object;

acquiring second characteristic data of the target object according to a second single-mode double-tower model, wherein the second characteristic data is characteristic data of second mode data corresponding to the target object;

searching a target preset object corresponding to the target object in a preset object set according to the first characteristic data, wherein the first modal similarity between third characteristic data of the target preset object and the first characteristic data is greater than or equal to a first threshold value, and the third characteristic data is characteristic data of the first modal data corresponding to the target preset object; the preset object set comprises at least one preset object, and the characteristic data of the first modal data corresponding to the preset object is matched with the characteristic data of the second modal data corresponding to the preset object;

according to fourth feature data of the target preset object and the second feature data, obtaining second modal similarity between the target preset object and the target object, wherein the fourth feature data are feature data of second modal data corresponding to the target preset object;

and under the condition that the second modality similarity is larger than a second threshold, determining that the first modality data corresponding to the target object is matched with the second modality data corresponding to the target object.

2. The information detection method according to claim 1, wherein in a case that the number of the target preset objects is greater than 1, the obtaining a second modal similarity between the target preset object and the target object according to the fourth feature data and the second feature data of the target preset object includes:

obtaining a third modal similarity between each target preset object and the target object according to the fourth characteristic data and the second characteristic data of each target preset object;

and determining the average value of the third modal similarity between each target preset object and the target object as the second modal similarity between the target preset object and the target object.

3. The information detection method according to claim 1, wherein before the obtaining of the first feature data of the target object according to the first single-modality double-tower model, the method further comprises:

determining a first single-mode double-tower model according to the data type of first-mode data corresponding to the target object, wherein the first single-mode double-tower model is obtained by training according to first sample data, and the data type of the first sample data is the same as that of the first-mode data;

and determining a second single-mode double-tower model according to the data type of second-mode data corresponding to the target object, wherein the second single-mode double-tower model is obtained by training according to second sample data, and the data type of the second sample data is the same as that of the second-mode data.

4. The information detecting method according to claim 3, wherein before searching for the target preset object corresponding to the target object in the preset object set according to the first feature data, the method further comprises:

extracting and storing the characteristic data of the first modal data corresponding to each preset object in the preset object set according to the first single-modal double-tower model;

and extracting and storing feature data corresponding to second modal data corresponding to each preset object in the preset object set according to the second single-modal double-tower model.

5. The information detection method according to claim 1, wherein after determining that the first modality data corresponding to the target object matches the second modality data corresponding to the target object if the second modality similarity is greater than a second threshold, the method further comprises:

and adding the target object into the preset object set, and recording the first characteristic data and the second characteristic data.

6. The information detection method according to any one of claims 1 to 5, wherein the first modality data is any one of: text data, video data, audio data, and image data; the second modality data is any one of: text data, video data, audio data, and image data.

7. An information detection device, characterized by comprising an acquisition unit, a search unit and a determination unit;

the acquiring unit is configured to acquire first feature data of a target object according to a first single-mode double-tower model, where the first feature data is feature data of first-mode data corresponding to the target object, and the target object is a multimedia resource object;

the obtaining unit is further configured to obtain second feature data of the target object according to a second single-mode double-tower model, where the second feature data is feature data of second mode data corresponding to the target object;

the searching unit is configured to search a target preset object corresponding to the target object in a preset object set according to the first feature data acquired by the acquiring unit, where a first modal similarity between third feature data of the target preset object and the first feature data is greater than or equal to a first threshold, and the third feature data is feature data of the first modal data corresponding to the target preset object; the preset object set comprises at least one preset object, and the characteristic data of the first modal data corresponding to the preset object is matched with the characteristic data of the second modal data corresponding to the preset object;

the obtaining unit is further configured to obtain a second modal similarity between the target preset object and the target object according to fourth feature data of the target preset object and the second feature data, which are searched by the searching unit, where the fourth feature data is feature data of second modal data corresponding to the target preset object;

the determining unit is configured to determine that the first modality data corresponding to the target object matches the second modality data corresponding to the target object when the second modality similarity acquired by the acquiring unit is greater than a second threshold.

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the information detection method of any one of claims 1-6.

9. A readable storage medium, wherein instructions in the readable storage medium, when executed by a processor, are capable of implementing the information detection method of any one of claims 1-6.

10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the information detection method according to any of claims 1-6.