CN113435485A

CN113435485A - Picture detection method and device, electronic equipment and storage medium

Info

Publication number: CN113435485A
Application number: CN202110661487.8A
Authority: CN
Inventors: 唐勇平; 李瑞锋
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-15
Filing date: 2021-06-15
Publication date: 2021-09-24

Abstract

The disclosure provides a picture detection method and device, electronic equipment and a storage medium, and relates to the technical field of computers. The specific implementation scheme is as follows: a picture to be detected is obtained and input into a Convolutional Neural Network (CNN) model, and the picture to be detected is marked as a picture to be secondarily detected when the similarity between the picture to be detected and a first sample picture is smaller than a first preset value and larger than a second preset value; the picture to be secondarily detected is then input into an edit distance detection model, and the picture to be secondarily detected and a second sample picture are judged to be the same picture when the edit distance between them is smaller than a third preset value. The CNN model thus performs a primary identification of the picture to be detected, and the edit distance model performs a secondary identification of the marked picture, so that the picture detection is robust and the accuracy of detecting and identifying pictures can be further improved.

Description

Picture detection method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting a picture, an electronic device, and a storage medium.

Background

With the rapid development of internet and multimedia technology, the number of pictures on the internet has grown explosively, and many of them contain harmful content, which brings considerable risk-control exposure to picture storage websites; accurately and effectively identifying pictures that contain harmful content, and handling them accordingly, is therefore important.

In the related art, such pictures are identified by manual review, but the labor cost is too high: a trained reviewer reviews on average only a few thousand pictures per day, which is difficult to sustain given the current massive volume of pictures, and missed detections and false detections still occur.

Disclosure of Invention

The disclosure provides a picture detection method and device, electronic equipment and a storage medium.

According to a first aspect of the present disclosure, there is provided a picture detection method, including: acquiring a picture to be detected; inputting the picture to be detected into a Convolutional Neural Network (CNN) model, and comparing the picture with a first sample picture in a similar vector search library; marking the picture to be detected as a picture to be secondarily detected when the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value; inputting the picture to be secondarily detected into an edit distance detection model, and comparing it with a second sample picture in a Hash search library; and judging that the picture to be secondarily detected and the second sample picture are the same picture when the edit distance between them is smaller than a third preset value. The CNN model thus performs a primary identification of the picture to be detected, and the edit distance model performs a secondary identification of the marked picture, so that the picture detection is robust and the accuracy of detecting and identifying pictures can be further improved.

According to a second aspect of the present disclosure, there is provided a picture detection apparatus comprising: a detection picture acquisition unit, a CNN model processing unit, a first judgment unit, an edit distance model processing unit and a second judgment unit.

The detection picture acquisition unit is used for acquiring a picture to be detected; the CNN model processing unit is used for inputting the picture to be detected into a convolutional neural network (CNN) model and comparing the picture with a first sample picture in a similar vector search library; the first judgment unit is used for marking the picture to be detected as a picture to be secondarily detected when the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value; the edit distance model processing unit is used for inputting the picture to be secondarily detected into an edit distance detection model and comparing it with a second sample picture in a Hash search library; and the second judgment unit is used for judging that the picture to be secondarily detected and the second sample picture are the same picture when the edit distance between them is smaller than a third preset value.

According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the first aspects.

According to a fifth aspect of the present disclosure, there is provided a computer program product which, when its instructions are executed by a processor, implements the method according to any one of the first aspects.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;

FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;

FIG. 3 is another schematic diagram according to the second embodiment of the present disclosure;

FIG. 4 is another schematic diagram according to the first embodiment of the present disclosure;

FIG. 5 is a schematic diagram according to a third embodiment of the present disclosure;

FIG. 6 is a schematic diagram according to a fourth embodiment of the present disclosure;

FIG. 7 is another schematic diagram according to the fourth embodiment of the present disclosure;

FIG. 8 is another schematic diagram according to the third embodiment of the present disclosure;

FIG. 9 is a block diagram of an electronic device for implementing a picture detection method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The present disclosure provides a picture detection method, and fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.

As shown in fig. 1, the method includes:

S1: Acquire the picture to be detected.

In the embodiment of the present disclosure, the picture to be detected may be a picture uploaded directly by a user, or a picture uploaded through an access port by one or more third-party applications that need the picture to be checked. It is to be understood that there may be one or more pictures to be detected, which is not limited in this disclosure.

S2: Input the picture to be detected into a convolutional neural network (CNN) model, and compare it with a first sample picture in a similar vector search library.

In the embodiment of the disclosure, the picture to be detected is input into the CNN model and compared with a first sample picture in the similar vector search library. The similar vector search library includes a plurality of sample pictures; the first sample picture may be any one of them, and each sample picture in the library may in turn serve as the first sample picture to be compared with the picture to be detected.

S3: When the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value, mark the picture to be detected as a picture to be secondarily detected.

The values of the first preset value and the second preset value may be set arbitrarily as required, which is not limited in this disclosure.

It can be understood that the higher the first preset value and the second preset value are set, the higher the similarity between the picture to be detected and the first sample picture must be; conversely, the lower they are set, the lower the required similarity.

In the embodiment of the present disclosure, if the similarity between the picture to be detected and the first sample picture is smaller than the first preset value and larger than the second preset value, the two pictures have a certain similarity, but the similarity is not high. In this case, the embodiment of the present disclosure does not conclude that the picture to be detected differs from the first sample picture; instead, the picture is marked as a picture to be secondarily detected, and the marked picture continues to be processed in the subsequent steps.

S4: Input the picture to be secondarily detected into the edit distance detection model, and compare it with a second sample picture in the Hash search library.

In the embodiment of the disclosure, the picture to be secondarily detected is input into the edit distance detection model and compared with the second sample picture in the Hash search library. The Hash search library includes a plurality of sample pictures; the second sample picture may be any one of them, and each sample picture in the library may in turn serve as the second sample picture to be compared with the picture to be secondarily detected.

S5: When the edit distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value, judge that the picture to be secondarily detected and the second sample picture are the same picture.

The third preset value may be set arbitrarily as required. It can be understood that the lower the third preset value is set, the smaller the edit distance between the picture to be secondarily detected and the second sample picture must be, that is, the more similar the two pictures must be, for them to be judged the same; conversely, the higher the third preset value is set, the larger the edit distance, and hence the larger the difference between the two pictures, that is still accepted.

The picture detection method provided by the embodiment of the disclosure includes: acquiring a picture to be detected; inputting the picture to be detected into a CNN model, and comparing the picture with a first sample picture in a similar vector search library; marking the picture to be detected as a picture to be secondarily detected when the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value; inputting the picture to be secondarily detected into the edit distance detection model, and comparing it with a second sample picture in the Hash search library; and judging that the picture to be secondarily detected and the second sample picture are the same picture when the edit distance between them is smaller than a third preset value. The CNN model thus performs a primary identification of the picture to be detected, and the edit distance model performs a secondary identification of the marked picture, so that the picture detection is robust and the accuracy of detecting and identifying pictures can be further improved.
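The control flow of steps S1 to S5 can be sketched as follows. This is only an illustrative sketch: the function name `detect`, the callables `similarity` and `edit_distance`, and the representation of the two libraries as plain lists are assumptions made for the example and are not specified by the disclosure.

```python
from typing import Callable, Optional, Sequence, TypeVar

P = TypeVar("P")  # type of a picture (or of its precomputed representation)


def detect(
    picture: P,
    vector_library: Sequence[P],
    hash_library: Sequence[P],
    similarity: Callable[[P, P], float],    # hypothetical stage-1 comparison
    edit_distance: Callable[[P, P], int],   # hypothetical stage-2 comparison
    first_preset: float,
    second_preset: float,
    third_preset: int,
) -> Optional[P]:
    """Return the second sample picture judged identical, or None."""
    # S2/S3: mark the picture for secondary detection if its similarity to
    # some first sample picture lies strictly between the two preset values.
    needs_secondary = any(
        second_preset < similarity(picture, sample) < first_preset
        for sample in vector_library
    )
    if not needs_secondary:
        # This embodiment only describes the in-between band, so the sketch
        # simply reports that no secondary match was made.
        return None

    # S4/S5: compare with each second sample picture in the Hash library.
    for sample in hash_library:
        if edit_distance(picture, sample) < third_preset:
            return sample
    return None
```

The two comparison functions are injected so that the same flow applies whichever similarity measure and edit distance an implementation chooses.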

Fig. 2 is a schematic diagram of a second embodiment of the present disclosure.

As shown in fig. 2, the picture detection method provided by the embodiment of the present disclosure includes:

S10: Acquire the picture to be detected.

For description of S10 in the embodiment of the present disclosure, reference may be made to the description in S1 in the above embodiment, which is not described herein again.

S20: Input the picture to be detected into the CNN model, obtain the feature vector of the picture to be detected, and compare it with the feature vector of the first sample picture in the similar vector search library.

The structure of the CNN model includes an input layer, a convolution layer, an activation layer, a pooling layer, a fully connected layer, an output layer, and so on. The input layer receives the data of the picture to be detected; it can be understood that the obtained picture to be detected is a color picture formed by superimposing red, green and blue (RGB) channels. The convolution layer uses convolution kernels to perform feature extraction and feature mapping; the activation layer adds a nonlinear mapping; the pooling layer performs down-sampling, sparsifying the feature map and reducing the amount of computation; the fully connected layer usually performs refitting at the tail of the CNN to reduce the loss of feature information; and the output layer outputs the result, yielding the feature vector of the picture to be detected.

Of course, the CNN model may also use other functional layers, for example: a normalization layer, which normalizes the features in the CNN model; a segmentation layer, which learns certain picture data independently by region; and a fusion layer, which fuses branches that perform feature learning independently.

In the embodiment of the disclosure, the picture to be detected is input into the CNN model, the feature vector of the picture to be detected is generated, and the feature vector is compared with the feature vector of the first sample picture in the similar vector search library.

In the embodiment of the disclosure, the feature vectors of a plurality of sample pictures, including the feature vector of the first sample picture, are stored in the similar vector search library; after the feature vector of the picture to be detected is obtained, it is compared with the feature vectors of the sample pictures in the similar vector search library.
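As an illustration of comparing a feature vector against the stored vectors, the sketch below uses cosine similarity. The disclosure does not name a similarity measure, so cosine similarity, the function names and the dictionary-based library are all assumptions made for this example.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return (sample id, similarity) of the closest stored vector.

    `library` maps hypothetical sample ids to feature vectors and must not
    be empty.
    """
    scored = ((sid, cosine_similarity(query, vec)) for sid, vec in library.items())
    return max(scored, key=lambda pair: pair[1])
```

A linear scan is used here for clarity; a production vector search library would typically rely on an index rather than comparing against every stored vector.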

S30: When the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value, mark the picture to be detected as a picture to be secondarily detected.

For description of S30 in the embodiment of the present disclosure, reference may be made to the description in S3 in the above embodiment, which is not described herein again.

In some embodiments of the present disclosure, when the similarity between the picture to be detected and the first sample picture is greater than the first preset value, the picture to be detected and the first sample picture are determined to be the same picture.

The first preset value can be set as required. It can be understood that the higher the first preset value is set, the higher the similarity between the picture to be detected and the first sample picture must be for the picture to be detected to be recognized as the same picture as the first sample picture; conversely, the lower it is set, the lower the similarity at which the picture to be detected is still recognized as the same picture as the first sample picture.

However, a higher first preset value is not always better. When the value is high, the picture to be detected is recognized as the same picture as the first sample picture only if it is very similar to it; if the picture to be detected is the first sample picture after rotation or a simple transformation such as adding a watermark, it will not be recognized as the same picture as the first sample picture when the first preset value is high, which may result in low recognition accuracy. Likewise, a lower first preset value is not always better: when the value is low, a picture to be detected that differs considerably from the first sample picture is also recognized as the same picture, so that pictures which should not be recognized are recognized, and the recognition is again inaccurate.

Similarly, the second preset value has an effect analogous to that of the first preset value; setting appropriate values for the first preset value and the second preset value can improve the accuracy of detecting the picture to be detected.
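The way the two preset values partition the similarity score can be sketched as below. The names `Stage1Outcome` and `triage` are hypothetical. The disclosure does not state what happens when the score equals a preset value or falls at or below the second preset value, so this sketch, as an assumption, groups those cases under a generic outcome that triggers no secondary detection.

```python
from enum import Enum


class Stage1Outcome(Enum):
    SAME = "same picture as the first sample picture"
    SECONDARY = "marked as a picture to be secondarily detected"
    NO_SECONDARY = "not sent to secondary detection"  # assumed catch-all


def triage(similarity: float, first_preset: float, second_preset: float) -> Stage1Outcome:
    """Classify a CNN similarity score using the two preset values."""
    if second_preset >= first_preset:
        raise ValueError("the second preset value must be below the first")
    if similarity > first_preset:
        return Stage1Outcome.SAME
    if second_preset < similarity < first_preset:
        return Stage1Outcome.SECONDARY
    return Stage1Outcome.NO_SECONDARY
```

Raising the first preset value narrows the band that is accepted outright and widens the band that is deferred to the second stage, which is the trade-off discussed above.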

With continued reference to fig. 2, in an embodiment of the present disclosure, S40 is performed after S30.

S40: Input the picture to be secondarily detected into a PDQ model (a perceptual hash based on the discrete cosine transform and a quality metric), generate the Hash string of the picture to be secondarily detected, and compare it with the Hash string of a second sample picture in the Hash search library.

In the embodiment of the disclosure, the picture to be secondarily detected is input into the PDQ model based on the discrete cosine transform and a quality metric, the Hash string of the picture to be secondarily detected is generated, and this Hash string is compared with the Hash string of the second sample picture in the Hash search library.

As shown in fig. 3, S40 in the embodiment of the present disclosure includes the following sub-steps:

S41: Adjust the size of the picture to be secondarily detected.

Illustratively, the picture to be secondarily detected is reduced in size, for example adjusted to a 512 × 512 picture.

S42: Convert the resized picture to be secondarily detected into the YUV (luminance and chrominance) color space to obtain a luminance picture.

It can be understood that the acquired picture to be detected is a color picture formed by superimposing red, green and blue (RGB) channels.
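A minimal sketch of extracting the luminance (Y) channel from RGB pixels is given below. The disclosure does not specify the conversion coefficients; the ITU-R BT.601 luma weights used here, and the function name `rgb_to_luma`, are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in the range 0..255


def rgb_to_luma(pixels: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Return the Y (luminance) plane of an RGB picture given as rows of pixels.

    Uses the BT.601 weights 0.299, 0.587 and 0.114 (an assumption of this
    sketch, not a value taken from the disclosure).
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in pixels
    ]
```

Only the Y plane is kept because the later steps operate on the luminance picture alone.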

S43: Divide the luminance picture into a plurality of sub-pictures using a two-layer tent convolution filter.

Illustratively, a 512 × 512 luminance picture is divided into a plurality of 64 × 64 sub-pictures using a two-layer tent convolution filter.

S44: During down-sampling of the sub-pictures, compute the two-dimensional discrete cosine transform (DCT) to obtain a plurality of sub-blocks corresponding to the sub-pictures.

Illustratively, during down-sampling of the 64 × 64 sub-pictures, the two-dimensional DCT is computed to obtain a plurality of 16 × 16 sub-blocks corresponding to the 64 × 64 sub-pictures.
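To make the transform step concrete, the sketch below computes an orthonormal two-dimensional DCT-II of a square block and keeps only a `keep` × `keep` corner of low-frequency coefficients (for example, 16 × 16 from a 64 × 64 input). Which coefficients are retained, the normalization, and the name `dct2_low_freq` are assumptions of this example; the disclosure does not specify them.

```python
import math
from typing import List, Sequence


def dct2_low_freq(block: Sequence[Sequence[float]], keep: int) -> List[List[float]]:
    """Return the keep x keep lowest-frequency 2-D DCT-II coefficients of a square block."""
    n = len(block)
    if any(len(row) != n for row in block):
        raise ValueError("block must be square")
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and the block size")

    # basis[u][x] = cos(pi * (2x + 1) * u / (2n)), the 1-D DCT-II basis.
    basis = [
        [math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
        for u in range(keep)
    ]

    def scale(u: int) -> float:
        # Orthonormal scaling factors of the DCT-II.
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    # The transform is separable: first along the row index, then the column index.
    partial = [
        [sum(basis[u][x] * block[x][y] for x in range(n)) for y in range(n)]
        for u in range(keep)
    ]
    return [
        [
            scale(u) * scale(v) * sum(partial[u][y] * basis[v][y] for y in range(n))
            for v in range(keep)
        ]
        for u in range(keep)
    ]
```

For a constant block all the energy lands in the first coefficient, which gives a quick sanity check of the implementation.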

S45: Calculate the median of the elements in the transform space corresponding to the sub-blocks.

Illustratively, the median of the elements in the transform space corresponding to the 16 × 16 sub-blocks is calculated. The median is the value at the middle position when these elements are arranged in order: it is found by sorting all the elements in the transform space corresponding to the sub-blocks by magnitude and taking the middle one; if the number of elements is even, the average of the two middle values is usually taken as the median.

S46: Generate the Hash string of the picture to be secondarily detected according to the elements in the transform space corresponding to the sub-blocks and the median.

For example, in the embodiment of the present disclosure, the Hash string is generated according to the elements in the transform space corresponding to the 16 × 16 sub-blocks and their median. Specifically, for each element in the transform space corresponding to the 16 × 16 sub-blocks, a 1 is output if the element is greater than the median and a 0 otherwise, which yields a 256-bit Hash string, namely the Hash string of the picture to be secondarily detected.
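Steps S45 and S46 can be sketched as a median threshold over the transform coefficients. The function name `hash_from_coefficients` and the row-by-row bit order are assumptions for illustration; `statistics.median` averages the two middle values when the number of elements is even, which matches the convention described above.

```python
from statistics import median
from typing import Sequence


def hash_from_coefficients(coeffs: Sequence[Sequence[float]]) -> str:
    """Turn a grid of transform coefficients into a bit string.

    Each coefficient greater than the median yields '1', otherwise '0'.
    A 16 x 16 grid therefore yields a 256-character string.
    """
    flat = [value for row in coeffs for value in row]
    threshold = median(flat)
    return "".join("1" if value > threshold else "0" for value in flat)
```

Because the threshold is the median, roughly half of the bits are set whenever the coefficients are distinct, which keeps the Hash strings balanced.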

In the embodiment of the disclosure, the Hash strings of the multiple sample pictures are stored in the Hash search library, the Hash string of the second sample picture may be the Hash string of any one of the multiple sample pictures in the Hash search library, and the Hash strings of the sample pictures in the Hash search library may be respectively used as the Hash string of the second sample picture to be compared with the Hash string of the picture to be secondarily detected.

In the embodiment of the disclosure, the picture to be secondarily detected is input to the PDQ model based on the discrete cosine transform and the quality metric, the Hash string of the picture to be secondarily detected is generated, and is compared with the Hash string of the second sample picture in the Hash search library.

With continued reference to fig. 2, S50: When the Hamming distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value, judge that the picture to be secondarily detected and the second sample picture are the same picture.

In the embodiment of the disclosure, after the Hash string of the picture to be secondarily detected is obtained, it is compared with the Hash string of the second sample picture in the Hash search library, and the Hamming distance between the two Hash strings is calculated.

Specifically, when the Hamming distance between the picture to be secondarily detected and the second sample picture is smaller than the third preset value, the picture to be secondarily detected and the second sample picture are judged to be the same picture; conversely, when the Hamming distance between them is larger than the third preset value, they are judged to be different pictures.
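The Hamming distance comparison in S50 can be sketched as follows, assuming the Hash strings are represented as equal-length strings of '0' and '1' characters. The function names are hypothetical.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of positions at which two equal-length Hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("Hash strings must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


def is_same_picture(hash_a: str, hash_b: str, third_preset: int) -> bool:
    """Judge two pictures to be the same when the distance is below the preset value."""
    return hamming_distance(hash_a, hash_b) < third_preset
```

For 256-bit Hash strings an implementation could equally store the hashes as integers and count the set bits of their XOR; the string form is used here only for readability.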

The third preset value may be set according to the specific situation. It can be understood that the lower the third preset value is set, the smaller the Hamming distance between the picture to be secondarily detected and the second sample picture must be, that is, the more similar the two pictures must be, for them to be judged the same; conversely, the higher the third preset value is set, the larger the Hamming distance, and hence the larger the difference between the two pictures, that is still accepted.

As with the first preset value and the second preset value, a lower third preset value is not always better: when the value is low, the picture to be secondarily detected is recognized as the same picture as the second sample picture only if the Hamming distance between them is very small. Nor is a higher third preset value always better: when the value is high, a picture to be secondarily detected that differs considerably from the second sample picture is also recognized as the same picture as the second sample picture, so that pictures which should not be recognized are recognized, and the recognition is again inaccurate.

On this basis, setting an appropriate third preset value can improve the accuracy of detecting the picture to be detected.

As shown in fig. 4, in the picture detection method provided in the embodiment of the present disclosure, before S1 (acquiring the picture to be detected), the method further includes:

S100: Acquire a plurality of sample pictures.

In the disclosed embodiment, the sample pictures can be collected manually or by machine. It is understood that there may be one or more sample pictures, which is not limited in this disclosure.

S200: Input the sample picture into the CNN model, obtain the feature vector of the sample picture, and store it to generate the similar vector search library.

For the description of S200 in the embodiment of the present disclosure, reference may be made to the relevant parts of S2 and S20 in the above embodiments: the steps of inputting the picture to be detected into the CNN model and generating its feature vector apply equally to inputting the sample picture into the CNN model and generating the feature vector of the sample picture, and are not described herein again.

In the embodiment of the disclosure, after the feature vector of the sample picture is generated, the feature vector of the sample picture is stored, and a similar vector search library is generated.

S300: Input the sample picture into the edit distance model, obtain the Hash string of the sample picture, and store it to generate the Hash search library.

The description of S300 in the embodiment of the present disclosure may refer to the partial descriptions in S4 and S40 in the above embodiments, wherein the related step of inputting the picture to be secondarily detected into the edit distance model may be equally applied to inputting the sample picture into the edit distance model; the related steps of inputting the picture to be secondarily detected into the PDQ model and generating the Hash string of the picture to be secondarily detected can be applied to inputting the sample picture into the PDQ model and generating the Hash string of the sample picture, which is not described herein again.

In the embodiment of the disclosure, after the Hash string of the sample picture is generated, the Hash string of the sample picture is stored to generate a Hash search library.
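The offline preparation in S100 to S300 can be sketched as building two lookup tables from the same set of sample pictures. The function name `build_libraries`, the use of dictionaries keyed by a sample id, and the injected `embed` and `hash_picture` callables (standing in for the CNN model and the PDQ model) are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Mapping, Tuple, TypeVar

P = TypeVar("P")  # type of a sample picture


def build_libraries(
    samples: Mapping[str, P],
    embed: Callable[[P], List[float]],   # stands in for the CNN model (S200)
    hash_picture: Callable[[P], str],    # stands in for the PDQ model (S300)
) -> Tuple[Dict[str, List[float]], Dict[str, str]]:
    """Return (similar vector search library, Hash search library)."""
    vector_library: Dict[str, List[float]] = {}
    hash_library: Dict[str, str] = {}
    for sample_id, picture in samples.items():
        vector_library[sample_id] = embed(picture)        # store feature vector
        hash_library[sample_id] = hash_picture(picture)   # store Hash string
    return vector_library, hash_library
```

Both libraries are keyed by the same ids here so that a match found in either stage can be traced back to the same sample picture; the disclosure itself does not prescribe a storage layout.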

According to the picture detection method provided by the embodiment of the disclosure, the feature vector of the picture to be detected is extracted by the CNN model for primary detection and identification, and secondary confirmation is performed by the PDQ model, so that the method combines the generalization capability of the CNN model with the higher detection and identification robustness of the PDQ model.

Fig. 5 is a schematic diagram of a third embodiment of the present disclosure.

As shown in fig. 5, a third embodiment of the present disclosure further provides a picture detection apparatus 10, where the picture detection apparatus 10 includes: a detected picture acquiring unit 11, a CNN model processing unit 12, a first determining unit 13, an edit distance model processing unit 14, and a second determining unit 15.

The detected picture acquiring unit 11 is used for acquiring a picture to be detected.

The CNN model processing unit 12 is configured to input the picture to be detected into the convolutional neural network CNN model, and compare the picture with the first sample picture in the similarity vector search library.

The first determining unit 13 is configured to mark the picture to be detected as a picture to be detected for the second time when the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value.

The edit distance model processing unit 14 is configured to input the picture to be secondarily detected to the edit distance detection model, and compare the edit distance model with the second sample picture in the Hash search library.

The second determining unit 15 is configured to determine that the picture to be secondarily detected and the second sample picture are the same picture when the editing distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value.

In the picture detection device 10 provided by the embodiment of the present disclosure, the detected picture acquiring unit 11 acquires a picture to be detected; the CNN model processing unit 12 inputs the picture to be detected into a convolutional neural network (CNN) model, and compares the picture with a first sample picture in a similar vector search library; the first determining unit 13 marks the picture to be detected as a picture to be secondarily detected when the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value; the edit distance model processing unit 14 inputs the picture to be secondarily detected to the edit distance detection model, and compares the picture with a second sample picture in the Hash search library; and the second determining unit 15 determines that the picture to be secondarily detected and the second sample picture are the same picture when the edit distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value. Therefore, the CNN model performs primary identification and confirmation on the picture to be detected, and the edit distance model performs secondary identification and confirmation on the marked picture to be secondarily detected, so that the picture detection has good identification robustness, and the accuracy of detecting and identifying the picture can be further improved.
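
The two-stage flow of the picture detection device 10 can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the names `Presets` and `detect`, the dictionary-based libraries, and the use of the best similarity over all first sample pictures are assumptions made for the example, and the similarity and edit distance functions are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Presets:
    """Hypothetical container for the three preset values."""
    first: float   # upper similarity bound
    second: float  # lower similarity bound
    third: int     # edit distance bound for a match


def detect(picture, vector_library: Dict[str, object],
           hash_library: Dict[str, object],
           similarity: Callable[[object, object], float],
           edit_distance: Callable[[object, object], int],
           presets: Presets) -> Optional[str]:
    """Return the id of the matching second sample picture, or None."""
    # Primary detection: best similarity against the first sample pictures.
    best = max((similarity(picture, s) for s in vector_library.values()),
               default=None)
    if best is None or not (presets.second < best < presets.first):
        return None  # not marked as a picture to be secondarily detected
    # Secondary detection: edit distance against the second sample pictures.
    for sample_id, sample in hash_library.items():
        if edit_distance(picture, sample) < presets.third:
            return sample_id
    return None
```

In this sketch a picture whose similarity falls outside the open interval between the second and first preset values is simply not passed to the second stage.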

Fig. 6 is a schematic diagram according to a fourth embodiment of the present disclosure.

As shown in fig. 6, an embodiment of the present disclosure provides a picture detection apparatus 20.

The picture detection apparatus 20 provided in the embodiment of the present disclosure includes: a detected picture acquiring unit 21, a CNN model processing unit 22, a first determining unit 23, a PDQ model processing unit 24, and a second determining unit 25.

The detected picture acquiring unit 21 is configured to acquire a picture to be detected.

The CNN model processing unit 22 is configured to input the picture to be detected into the CNN model, obtain the feature vector of the picture to be detected, and compare the feature vector with the feature vector of the first sample picture in the similar vector search library.
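
The feature-vector comparison can be illustrated with a small sketch. The disclosure does not name the similarity measure, so cosine similarity is an assumption here, as are the function names `cosine_similarity` and `most_similar` and the dictionary that stands in for the similar vector search library.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 library: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (sample id, similarity) of the closest stored feature vector."""
    if not library:
        raise ValueError("the similar vector search library is empty")
    scored = ((sid, cosine_similarity(query, vec))
              for sid, vec in library.items())
    return max(scored, key=lambda item: item[1])
```

A linear scan is used only for clarity; a large library would normally rely on an approximate nearest-neighbor index instead.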

The first determining unit 23 is configured to mark the picture to be detected as a picture to be secondarily detected when the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value; and is further configured to determine that the picture to be detected and the first sample picture are the same picture when the similarity between the picture to be detected and the first sample picture is greater than the first preset value.
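
The outcomes of the first determining unit 23 can be written as a small classifier. This is a sketch under stated assumptions: the enum `FirstStageResult` and the function `first_stage` are hypothetical names, and the `NO_MATCH` outcome for a similarity at or below the second preset value, or exactly equal to the first preset value, is an assumption because the text does not specify those cases.

```python
from enum import Enum


class FirstStageResult(Enum):
    SAME_PICTURE = "same picture"
    SECONDARY_DETECTION = "picture to be secondarily detected"
    NO_MATCH = "no match"  # assumed outcome, not stated in the text


def first_stage(similarity: float, first_preset: float,
                second_preset: float) -> FirstStageResult:
    """Classify a similarity score against the two preset values."""
    if second_preset >= first_preset:
        raise ValueError("second preset must be below first preset")
    if similarity > first_preset:
        return FirstStageResult.SAME_PICTURE
    if second_preset < similarity < first_preset:
        return FirstStageResult.SECONDARY_DETECTION
    # Boundary values and low similarities are left unspecified by the
    # text; this sketch treats them as no match.
    return FirstStageResult.NO_MATCH
```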

The PDQ model processing unit 24 is configured to input the picture to be secondarily detected to a PDQ model based on discrete cosine transform and quality metric, generate a Hash string of the picture to be secondarily detected, and compare the Hash string with a Hash string of a second sample picture in a Hash search library.

As shown in fig. 7, in some embodiments, the PDQ model processing unit 24 includes: a size adjusting unit 241, a picture converting unit 242, a filter processing unit 243, a DCT processing unit 244, a median obtaining unit 245, and a Hash string obtaining unit 246.

The size adjusting unit 241 is used for adjusting the size of the picture to be secondarily detected.

The picture conversion unit 242 is configured to convert the adjusted picture to be secondarily detected into the luminance-chrominance (YUV) color space, so as to obtain a luminance picture.

The filtering processing unit 243 is configured to divide the luminance picture into a plurality of sub-pictures by using a two-layer tent convolution filter.

The DCT processing unit 244 is configured to calculate a two-dimensional discrete cosine transform DCT during sub-picture down-sampling to obtain a plurality of sub-blocks corresponding to the sub-picture.

The median obtaining unit 245 is configured to calculate medians of a plurality of elements in the transform space corresponding to the plurality of sub-blocks.

The Hash string obtaining unit 246 is configured to generate a Hash string of the picture to be secondarily detected according to the elements and the median in the transform space corresponding to the sub-blocks.
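
The last three units (DCT, median, Hash string) can be illustrated with a simplified sketch. It is not the PDQ reference implementation: resizing and the tent filter are omitted, the DCT is unnormalized, the block and output sizes are arbitrary, and the hexadecimal encoding and all function names are assumptions for the example. The luminance weights are the BT.601 values.

```python
import math
from statistics import median
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def luminance(r: float, g: float, b: float) -> float:
    """Luma (Y) of an RGB pixel using BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def dct_2d(block: Matrix, size: int) -> List[List[float]]:
    """Low-frequency size x size corner of the unnormalized 2-D DCT-II."""
    n = len(block)
    out = []
    for u in range(size):
        row = []
        for v in range(size):
            total = 0.0
            for x in range(n):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for y in range(n):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                    total += block[x][y] * cx * cy
            row.append(total)
        out.append(row)
    return out


def hash_string(block: Matrix, size: int = 4) -> str:
    """One bit per DCT coefficient: 1 if above the median, else 0."""
    coeffs = [c for row in dct_2d(block, size) for c in row]
    mid = median(coeffs)
    bits = "".join("1" if c > mid else "0" for c in coeffs)
    width = (len(bits) + 3) // 4  # number of hex digits
    return format(int(bits, 2), "0{}x".format(width))
```

Because each bit depends only on whether a coefficient lies above the median, multiplying every luminance value by the same positive factor leaves the Hash string unchanged in this sketch.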

Referring to fig. 6, in a fourth embodiment of the present disclosure, the second determining unit 25 is configured to determine that the picture to be secondarily detected and the second sample picture are the same picture when the hamming distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value.
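
The Hamming distance test of the second determining unit 25 can be sketched as below, assuming the Hash strings are equal-length hexadecimal strings; that encoding and the function names are assumptions for illustration.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two equal-length hex Hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("Hash strings must have the same length")
    # XOR leaves a 1 bit wherever the two hashes differ.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_same_picture(hash_a: str, hash_b: str, third_preset: int) -> bool:
    """True when the Hamming distance is smaller than the third preset value."""
    return hamming_distance(hash_a, hash_b) < third_preset
```

For example, `hamming_distance("ff00", "ff01")` is 1, so the two hashes count as the same picture for any third preset value of 2 or more.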

As shown in fig. 8, the picture detection apparatus 10 in the embodiments of the present disclosure further includes: a sample picture acquiring unit 100, a similar vector search library generating unit 200, and a Hash search library generating unit 300.

The sample picture acquiring unit 100 is configured to acquire a plurality of sample pictures.

The similar vector search library generating unit 200 is configured to input the sample picture to the CNN model, obtain a feature vector of the sample picture, and store the feature vector to generate a similar vector search library.

The Hash search library generating unit 300 is configured to input the sample picture to the edit distance detection model, obtain a Hash string of the sample picture, and store the Hash string to generate the Hash search library.
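
Building the two libraries ahead of detection can be sketched as follows. The function `build_libraries`, the in-memory dictionaries, and the caller-supplied `extract_features` and `compute_hash` callables are hypothetical stand-ins for the CNN model and the edit distance detection model.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def build_libraries(
    samples: Iterable[Tuple[str, object]],
    extract_features: Callable[[object], Sequence[float]],
    compute_hash: Callable[[object], str],
) -> Tuple[Dict[str, List[float]], Dict[str, str]]:
    """Return (similar vector search library, Hash search library)."""
    vector_library: Dict[str, List[float]] = {}
    hash_library: Dict[str, str] = {}
    for sample_id, picture in samples:
        # Each sample picture is processed once by both models.
        vector_library[sample_id] = list(extract_features(picture))
        hash_library[sample_id] = compute_hash(picture)
    return vector_library, hash_library
```

Keying both libraries by the same sample id is a design choice of the sketch; it lets a match in either stage be traced back to the same sample picture.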

It is understood that, in the picture detection apparatus 20 shown in fig. 6 of the present embodiment, the detected picture acquiring unit 21, the first determining unit 23, and the second determining unit 25 may have the same functions and structures as the detected picture acquiring unit 11, the first determining unit 13, and the second determining unit 15 of the picture detection apparatus 10 in the above-described embodiment.

It should be noted that the above explanation of the picture detection method is also applicable to the picture detection apparatus of the present embodiment, and is not repeated herein.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above, for example, the picture detection method.

For example, in some embodiments, the picture detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the picture detection method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the picture detection method in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the picture detection method of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

Throughout the specification and claims, the term "comprising" is to be interpreted as open-ended, inclusive, meaning that it is "including, but not limited to," unless the context requires otherwise. In the description herein, the terms "some embodiments," "exemplary embodiments," "examples," and the like are intended to indicate that a particular feature, structure, material, or characteristic described in connection with the embodiments or examples is included in at least one embodiment or example of the disclosure. The schematic representations of the above terms are not necessarily referring to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be included in any suitable manner in any one or more embodiments or examples.

"Plurality" means two or more unless otherwise specified. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

The use of "for" herein means open and inclusive language that does not exclude devices adapted or configured to perform additional tasks or steps.

Additionally, the use of "based on" means open and inclusive, as a process, step, calculation, or other action that is "based on" one or more stated conditions or values may in practice be based on additional conditions or values beyond those stated.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A picture detection method comprises the following steps:

acquiring a picture to be detected;

inputting the picture to be detected into a Convolutional Neural Network (CNN) model, and comparing the picture with a first sample picture in a similar vector search library;

under the condition that the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value, marking the picture to be detected as a picture to be secondarily detected;

inputting the picture to be secondarily detected to an editing distance detection model, and comparing the picture to be secondarily detected with a second sample picture in a Hash search library;

and under the condition that the editing distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value, judging that the picture to be secondarily detected and the second sample picture are the same picture.

2. The method according to claim 1, wherein the inputting the picture to be detected into a Convolutional Neural Network (CNN) model, and comparing the picture with a first sample picture in a similar vector search library, comprises:

inputting the picture to be detected into the CNN model, acquiring the feature vector of the picture to be detected, and comparing the feature vector with the feature vector of the first sample picture in the similar vector search library.

3. The method of claim 1 or 2, further comprising:

and under the condition that the similarity between the picture to be detected and the first sample picture is greater than a first preset value, judging that the picture to be detected and the first sample picture are the same picture.

4. The method as claimed in claim 1, wherein the inputting the picture to be secondarily detected to an edit distance detection model for comparison with a second sample picture in a Hash search library comprises:

and inputting the picture to be secondarily detected into a PDQ model based on discrete cosine transform and quality metric, generating a Hash string of the picture to be secondarily detected, and comparing the Hash string with a Hash string of a second sample picture in a Hash search library.

5. The method according to claim 4, wherein the inputting the picture to be secondarily detected into a PDQ model based on discrete cosine transform and quality metric, and generating a Hash string of the picture to be secondarily detected comprises:

adjusting the size of the picture to be secondarily detected;

converting the adjusted picture to be secondarily detected into the luminance-chrominance (YUV) color space to obtain a brightness picture;

dividing the brightness picture into a plurality of sub-pictures by utilizing a two-layer tent convolution filter;

in the sub-picture down-sampling, calculating two-dimensional Discrete Cosine Transform (DCT) to obtain a plurality of sub-blocks corresponding to the sub-picture;

calculating median of a plurality of elements in a transformation space corresponding to a plurality of sub-blocks;

and generating a Hash string of the picture to be secondarily detected according to the elements in the transformation space corresponding to the sub-blocks and the median.

6. The method according to claim 4, wherein the determining that the picture to be secondarily detected and the second sample picture are the same picture in the case that the edit distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value comprises:

and under the condition that the Hamming distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value, judging that the picture to be secondarily detected and the second sample picture are the same picture.

7. The method according to claim 1, wherein before the acquiring the picture to be detected, the method further comprises:

acquiring a plurality of sample pictures;

inputting the sample picture into the CNN model, acquiring a feature vector of the sample picture, and storing the feature vector to generate the similar vector search library;

and inputting the sample picture into the editing distance detection model, obtaining a Hash string of the sample picture, and storing the Hash string to generate the Hash search library.

8. A picture detection apparatus comprising:

the detection picture acquisition unit is used for acquiring a picture to be detected;

the CNN model processing unit is used for inputting the picture to be detected into a convolutional neural network CNN model and comparing the picture with a first sample picture in a similar vector search library;

the first judging unit is used for marking the picture to be detected as a picture to be secondarily detected under the condition that the similarity between the picture to be detected and the first sample picture is smaller than a first preset value and larger than a second preset value;

the editing distance model processing unit is used for inputting the picture to be secondarily detected to an editing distance detection model and comparing the picture to be secondarily detected with a second sample picture in a Hash search library;

and the second judging unit is used for judging that the picture to be secondarily detected and the second sample picture are the same picture under the condition that the editing distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value.

9. The apparatus according to claim 8, wherein the CNN model processing unit is further configured to input the picture to be detected into the CNN model, obtain a feature vector of the picture to be detected, and compare the feature vector with the feature vector of the first sample picture in the similar vector search library.

10. The apparatus according to claim 8 or 9, wherein the first determining unit is further configured to determine that the picture to be detected and the first sample picture are the same picture if the similarity between the picture to be detected and the first sample picture is greater than a first preset value.

11. The apparatus of claim 8, wherein the edit distance model processing unit comprises:

and the PDQ model processing unit is used for inputting the picture to be secondarily detected to a perception algorithm PDQ model based on discrete cosine transform and quality measurement, generating a Hash string of the picture to be secondarily detected, and comparing the Hash string with a Hash string of a second sample picture in a Hash search library.

12. The apparatus of claim 11, wherein the PDQ model processing unit comprises:

the size adjusting unit is used for adjusting the size of the picture to be secondarily detected;

the picture conversion unit is used for converting the adjusted picture to be secondarily detected into the luminance-chrominance (YUV) color space to obtain a brightness picture;

the filtering processing unit is used for dividing the brightness picture into a plurality of sub-pictures by utilizing a two-layer tent convolution filter;

the DCT processing unit is used for calculating two-dimensional Discrete Cosine Transform (DCT) in the sub-picture downsampling to obtain a plurality of sub-blocks corresponding to the sub-picture;

a median obtaining unit, configured to calculate median of a plurality of elements in a transform space corresponding to the plurality of sub-blocks;

and the Hash string obtaining unit is used for generating the Hash string of the picture to be secondarily detected according to the elements in the transformation space corresponding to the sub-blocks and the median.

13. The apparatus according to claim 11, wherein the second determining unit is further configured to determine that the picture to be secondarily detected and the second sample picture are the same picture if a hamming distance between the picture to be secondarily detected and the second sample picture is smaller than a third preset value.

14. The apparatus of claim 8, wherein the apparatus further comprises:

a sample picture acquiring unit for acquiring a plurality of sample pictures;

a similar vector search library generating unit, configured to input the sample picture to the CNN model, obtain a feature vector of the sample picture, and store the feature vector to generate the similar vector search library;

and the Hash search library generating unit is used for inputting the sample picture into the edit distance detection model, acquiring a Hash string of the sample picture, and storing the Hash string to generate the Hash search library.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.

16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.

17. A computer program product, wherein instructions in the computer program product, when executed by a processor, implement the method of any one of claims 1 to 7.