CN117636370A - Method and device for detecting image content


Info

Publication number: CN117636370A
Application number: CN202311686653.5A
Authority: CN (China)
Prior art keywords: image, content, result, variance, small
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 兴百桥
Current Assignee: Shenzhen Xingtong Technology Co ltd
Original Assignee: Shenzhen Xingtong Technology Co ltd

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a method and a device for detecting image content, wherein the method comprises the following steps: acquiring a first image to be detected and a second image serving as a reference; determining a first result based on the image features of the first image and the image features of the second image; determining a second result based on the variance features of the first image and the variance features of the second image; determining a third result based on a text similarity between the text content of the first image and the text content of the second image; and determining, based on the first result, the second result, and the third result, whether the image content of the first image changes relative to the image content of the second image. The method and the device detect, from three different dimensions, whether the image content of the image to be detected has changed relative to the image content of the reference image, and can improve the accuracy of the detection result.

Description

Method and device for detecting image content
Technical Field
The disclosure relates to the field of image technology, and in particular, to a method and a device for detecting image content.
Background
At present, the textbooks used by students in different regions differ in content because they come from different publishers. The textbook reading module of a learning machine therefore supports the textbooks of each publisher, for different subjects and uses. The content of textbook pages is updated to some degree in almost every school term, so the pages whose content has been updated need to be identified, and the corresponding textbook pages in the reading module of the learning machine updated accordingly.
At present, whether the content of a page has changed between the new and old editions is generally checked manually in order to find the textbook pages with updated content. However, with hundreds of thousands to millions of pages involved, manual checking is time-consuming, labor-intensive, and inefficient.
Disclosure of Invention
In order to solve the above technical problems or at least partially solve the above technical problems, embodiments of the present disclosure provide a method and an apparatus for detecting image content.
According to an aspect of the present disclosure, there is provided a method of detecting image content, including:
acquiring a first image to be detected and a second image serving as a reference;
determining a first result based on image features of the first image and image features of the second image;
determining a second result based on the variance feature of the first image and the variance feature of the second image;
determining a third result based on a text similarity between text content of the first image and text content of the second image;
based on the first result, the second result, and the third result, it is determined whether the image content of the first image changes relative to the image content of the second image.
According to another aspect of the present disclosure, there is provided an apparatus for detecting image content, including:
the image acquisition module is used for acquiring a first image to be detected and a second image serving as a reference;
a first determining module configured to determine a first result based on image features of the first image and image features of the second image;
a second determining module configured to determine a second result based on the variance feature of the first image and the variance feature of the second image;
a third determining module, configured to determine a third result based on a text similarity between text content of the first image and text content of the second image;
and a fourth determining module, configured to determine whether an image content of the first image changes relative to an image content of the second image based on the first result, the second result, and the third result.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method of detecting image content according to the preceding aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of detecting image content according to the previous aspect.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of detecting image content of the preceding aspect.
According to one or more technical solutions provided by the embodiments of the present disclosure, a first image to be detected and a second image serving as a reference are acquired; a first result is determined based on the image features of the first image and the image features of the second image; a second result is determined based on the variance features of the first image and the variance features of the second image; a third result is determined based on the text similarity between the text content of the first image and the text content of the second image; and whether the image content of the first image changes relative to the image content of the second image is determined based on the first result, the second result, and the third result. By combining image features, variance features, and text content to judge whether the image to be detected has changed relative to the reference image, differences in image content are detected from three different dimensions. This improves the accuracy of the detection result, realizes automatic detection of image content differences without manual participation, and improves detection efficiency.
Drawings
Further details, features and advantages of the present disclosure are disclosed in the following description of exemplary embodiments, with reference to the following drawings, wherein:
FIG. 1 illustrates a flowchart of a method of detecting image content according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a method of detecting image content according to another exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a method of detecting image content according to yet another exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic block diagram of an apparatus for detecting image content according to an exemplary embodiment of the present disclosure;
fig. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 1 illustrates a flowchart of a method of detecting image content according to an exemplary embodiment of the present disclosure, which may be performed by an apparatus for detecting image content provided by an embodiment of the present disclosure, where the apparatus may be implemented in software and/or hardware, and may be generally integrated in an electronic device, where the electronic device includes a computer, a tablet, a mobile phone, a server, and so on.
As shown in fig. 1, the method of detecting image content may include the steps of:
step 101, a first image to be detected and a second image as a reference are acquired.
The first image and the second image are images which need to be compared whether the content is changed or not, the first image is an image to be detected, and the second image is an image serving as a reference. For example, in an educational scenario, the first image may be a new version of a textbook page image and the second image may be an old version of a textbook page image.
For example, when acquiring the first image and the second image, the two images to be compared may be uploaded by the user: after the user clicks the upload entry for the image to be detected and uploads an image, the electronic device takes the received image as the first image; after the user clicks the upload entry for the reference image and uploads an image, the electronic device takes that image as the second image.
In an exemplary embodiment, when the first image and the second image are acquired, an image may be acquired from the electronic device as the first image, the first image is displayed to the user, then a reference image to be compared with the first image is uploaded by the user according to the content of the first image, and the electronic device receives the image uploaded by the user and then uses the image as the second image.
Step 102, determining a first result based on the image features of the first image and the image features of the second image.
The dimension of the image feature may be preset. Image features reflect the distribution of lines and illustration boundary information in an image; they are well suited to detecting whether charts, geometric figures, and other line drawings in the image have changed, and also to detecting whether the content of some weakly textured illustrations has changed.
In this embodiment of the present disclosure, for the obtained first image and second image, corresponding image features may be extracted respectively, and then, based on the image features corresponding to the first image and the image features corresponding to the second image, whether the image content of the first image changes (i.e., whether there is a difference) with respect to the image content of the second image is determined, so as to obtain a first result.
For example, the euclidean distance between the image feature corresponding to the first image and the image feature corresponding to the second image may be calculated, and it is determined whether the calculated euclidean distance value is smaller than a preset distance threshold, if so, a first result that the image content of the first image and the image content of the second image are unchanged (i.e. there is no difference) is obtained, otherwise, a first result that the image content of the first image and the image content of the second image are changed (i.e. there is a difference) is obtained.
For example, the cosine similarity between the image features corresponding to the first image and the image features corresponding to the second image may be calculated, and whether the calculated cosine similarity value is greater than a preset similarity threshold value may be determined, if so, a first result that the image contents of the first image and the second image are unchanged (i.e., there is no difference) may be obtained, otherwise, a first result that the image contents of the first image and the second image are changed (i.e., there is a difference) may be obtained.
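As a minimal sketch of the comparisons above, the following Python snippet computes the cosine similarity of two feature vectors and thresholds it; the vectors are assumed to be NumPy arrays produced by whatever feature extractor is used, and the 0.95 threshold is an illustrative value, not one taken from the disclosure.

```python
import numpy as np

def first_result_unchanged(feat_a: np.ndarray, feat_b: np.ndarray,
                           sim_threshold: float = 0.95) -> bool:
    """Return True if the two feature vectors count as 'no difference'.

    Uses cosine similarity; a Euclidean-distance variant with a distance
    threshold, as also described above, would work the same way.
    """
    cos_sim = float(np.dot(feat_a, feat_b) /
                    (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-12))
    return cos_sim > sim_threshold
```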
Step 103, determining a second result based on the variance feature of the first image and the variance feature of the second image.
In the embodiment of the disclosure, corresponding variance features may be extracted from the obtained first image and second image respectively, and then, based on the variance feature corresponding to the first image and the variance feature corresponding to the second image, whether the image content of the first image changes relative to that of the second image (i.e. whether there is a difference) is determined, so as to obtain a second result. When calculating the variance features, the variance of all pixels in the first image is calculated as the variance feature of the first image; for the second image, the variance of all pixels in the second image is calculated as the variance feature of the second image. Variance features reflect the texture distribution of an image and are well suited to detecting whether illustrations with rich texture information in the image have changed.
For example, a currently-used feature similarity calculation manner may be adopted to calculate feature similarity between variance features of the first image and variance features of the second image, and determine whether the obtained feature similarity is greater than a preset value, if so, a second result is obtained that image contents of the first image and the second image are unchanged (i.e. there is no difference), otherwise, a second result is obtained that image contents of the first image and the second image are changed (i.e. there is a difference).
For example, the variance feature of the first image and the variance feature of the second image may be input into a fully-connected network trained in advance, detected by the fully-connected network according to the input variance feature, and a second result of whether the image content of the first image changes with respect to the image content of the second image may be output.
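A minimal sketch of this whole-image variant, assuming grayscale input and a hand-picked tolerance; the disclosure fixes neither, and it equally allows a pre-trained fully-connected network in place of the direct comparison:

```python
import cv2
import numpy as np

def variance_feature(path: str) -> float:
    # Variance of all pixel values of the image, read as grayscale.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return float(np.var(img))

def second_result_unchanged(path_a: str, path_b: str, eps: float = 50.0) -> bool:
    # "Similar enough" if the two variances are within eps; eps is an
    # assumed tolerance standing in for the preset value in the text.
    return abs(variance_feature(path_a) - variance_feature(path_b)) < eps
```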
Step 104, determining a third result based on the text similarity between the text content of the first image and the text content of the second image.
For example, text recognition may be performed on the acquired first image and second image respectively, for example by optical character recognition (OCR), so as to obtain first text content corresponding to the first image and second text content corresponding to the second image. Then, the text similarity between the first text content and the second text content may be calculated, for example using a commonly used text similarity calculation method. The obtained text similarity is then compared with a preset text similarity threshold: if the text similarity is greater than the threshold, a third result is obtained that the image content of the first image is unchanged (i.e. there is no difference) relative to the image content of the second image; if it is not greater than the threshold, a third result is obtained that the image content of the first image has changed (i.e. there is a difference) relative to the image content of the second image. The text similarity threshold may be set according to actual requirements; for example, it may be set to 0.9.
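A sketch of this text path, with pytesseract standing in for the OCR engine (the disclosure only specifies OCR generically) and difflib's ratio standing in for a commonly used text similarity; the 0.9 threshold follows the example value above, and lang="chi_sim" assumes Chinese textbook pages:

```python
import difflib

import pytesseract  # assumed OCR backend; any OCR engine would do
from PIL import Image

def text_of(path: str) -> str:
    # lang="chi_sim" assumes Chinese pages; adjust to the actual material.
    return pytesseract.image_to_string(Image.open(path), lang="chi_sim")

def third_result_unchanged(path_a: str, path_b: str,
                           threshold: float = 0.9) -> bool:
    # SequenceMatcher.ratio() stands in for "a commonly used text
    # similarity calculation"; 0.9 is the example threshold from the text.
    sim = difflib.SequenceMatcher(None, text_of(path_a), text_of(path_b)).ratio()
    return sim > threshold
```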
It should be noted that, in the embodiment of the present disclosure, no particular execution order is imposed on step 102, step 103, and step 104; the three may be executed one after another in any order or executed simultaneously. This embodiment describes steps 102 to 104 in the above order only by way of example, which should not be taken as a limitation of the present disclosure.
Step 105, determining whether the image content of the first image changes relative to the image content of the second image based on the first result, the second result and the third result.
In the embodiment of the disclosure, after the first result, the second result and the third result are obtained, the obtained three results may be combined to finally determine whether the image content of the first image changes relative to the image content of the second image.
For example, if the first, second, and third results each indicate that the image content of the first image does not change relative to the image content of the second image, it is determined that the image content of the first image does not change relative to the image content of the second image, and if at least one of the first, second, and third results indicates that the image content of the first image changes relative to the image content of the second image, it is determined that the image content of the first image changes relative to the image content of the second image.
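Expressed as code, the combination rule above is a plain logical AND over the three single-dimension verdicts (True meaning "no difference"); a minimal sketch:

```python
def content_unchanged(first_result: bool, second_result: bool,
                      third_result: bool) -> bool:
    # Unchanged only if all three detections report "no difference";
    # any single dissenting result flags the image content as changed.
    return first_result and second_result and third_result
```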
Considering that image content may contain illustrations, text, and background information irrelevant to the content, it is difficult to detect changes in all of these at once using features of a single dimension, such as image features or text features alone. The scheme of the disclosure therefore judges whether the image content changes from three dimensions: the first result, obtained from the feature vectors, mainly judges at a micro level whether the line and illustration boundary information in the image has changed, which helps detect changes in charts, geometric figures, and other line drawings, as well as in the content of some weakly textured illustrations; the second result, obtained from the variance features, mainly detects whether illustrations with relatively rich texture information in the image have changed; and the third result, obtained from the text content, mainly detects whether the text in the image has changed. The three dimensions thus have different emphases and complement one another in detecting whether the image content has changed, which ensures the accuracy of the detection result.
According to the method for detecting image content provided by the embodiment of the present disclosure, a first image to be detected and a second image serving as a reference are acquired; a first result is determined based on the image features of the first image and the image features of the second image; a second result is determined based on the variance features of the first image and the variance features of the second image; a third result is determined based on the text similarity between the text content of the first image and the text content of the second image; and whether the image content of the first image changes relative to the image content of the second image is determined based on the first result, the second result, and the third result. By combining image features, variance features, and text content to judge whether the image to be detected has changed relative to the reference image, differences in image content are detected from three different dimensions, which improves the accuracy of the detection result, realizes automatic detection of image content differences without manual participation, and improves detection efficiency.
Each time the content of textbook pages is updated, the changes can be small; for example, individual characters on a page are modified, or a few lines of an illustration are redrawn. Based on this, an alternative embodiment of the present disclosure provides a scheme that segments the first image and the second image into a plurality of image blocks, detects whether the content of each image block changes, and then determines from those per-block results whether the image content of the whole image changes, so that small changes in the image can be detected and the detection accuracy improved. Thus, as shown in fig. 2, the method of detecting image content of the present disclosure may include the following steps:
in step 201, a first image to be detected and a second image as a reference are acquired.
Step 202, respectively segmenting the first image and the second image into a plurality of image blocks according to the same segmentation rule.
The segmentation rule may be preset. For example, the segmentation rule may be to segment the two images according to a preset number of grid cells, which may be set according to actual requirements, for example to 8×8, 4×4, 6×9, and so on. For another example, the segmentation rule may be to segment the two images into a plurality of image blocks of different sizes, where the image blocks obtained in the first segmentation occupy the same proportion of the whole image in both images, and the image blocks obtained in the second segmentation likewise occupy the same proportion of the whole image in both images, although that proportion may differ from the proportion used in the first segmentation, and so on. For example, the image blocks obtained in the first segmentation occupy 1/8 of the first image and likewise 1/8 of the second image, while the image blocks obtained in the second segmentation occupy 1/6 of the first image and 1/6 of the second image.
In the embodiment of the disclosure, for the acquired first image and second image, the first image may be segmented into a plurality of image blocks according to a segmentation rule, and the second image may also be segmented into a plurality of image blocks according to the same segmentation rule.
For example, assuming that the segmentation rule is to segment two images according to an 8×8 grid number, the first image is equally divided into 64 image blocks according to an 8×8 grid number, and the second image is equally divided into 64 image blocks according to an 8×8 grid number.
It should be noted that, in an alternative embodiment of the present disclosure, in order to further ensure accuracy of a detection result, before the first image and the second image are segmented into image blocks, the first image and the second image may be scaled to the same size, so that the sizes of two image blocks at the same position obtained by segmentation according to the same segmentation rule are the same, so that features extracted from images with the same content are consistent, and accuracy of the detection result is ensured.
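A minimal sketch of steps 201 and 202 under the grid-number rule, assuming an 8×8 grid and an arbitrary 512×512 working size; the disclosure only requires that both images be scaled to the same size before segmentation:

```python
import cv2
import numpy as np

def split_into_blocks(img: np.ndarray, rows: int = 8, cols: int = 8,
                      size: tuple = (512, 512)) -> list:
    # Scale to a common working size first (cv2.resize takes (width, height)),
    # then cut the image into rows x cols equal blocks in row-major order.
    img = cv2.resize(img, size)
    h, w = img.shape[:2]
    bh, bw = h // rows, w // cols
    return [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]
```

Applying the same call to both images yields two lists of 64 blocks whose indices line up position for position.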
Step 203, extracting the features of each image block to obtain the image features corresponding to each image block.
In the embodiment of the disclosure, for each image block, feature extraction may be performed on the image block to obtain an image feature corresponding to the image block. The present embodiment is not limited to a specific manner of feature extraction.
For example, assuming that the segmentation rule is to segment the two images according to an 8×8 grid, 64 image blocks are obtained from the first image and 64 image blocks from the second image. Feature extraction is then performed on these 128 image blocks to obtain the image feature corresponding to each image block, namely 64 feature vectors corresponding to the first image and 64 feature vectors corresponding to the second image, 128 image features in total.
In an optional embodiment of the present disclosure, when extracting the image feature of each image block, the image block may be input into a feature extraction model trained in advance, and feature extraction is performed on the image block by using the feature extraction model, so as to obtain the image feature corresponding to the image block.
In an optional embodiment of the present disclosure, when extracting the image feature of each image block, the image block may first be converted to grayscale to obtain a gray small image; the gray small image is then segmented to obtain a segmentation small image, for example using a Sobel operator, although other segmentation algorithms may also be used, and the specific segmentation method is not limited in the present disclosure. Then, based on the pixel value of each pixel point in the segmentation small image and the mean pixel value of all pixel points in the segmentation small image, the segmentation small image is binarized to obtain a binary small image. Finally, the gray small image, the segmentation small image, and the binary small image are spliced along the channel dimension to generate a three-channel small image, and the three-channel small image is input into a pre-trained feature extraction model for feature extraction, obtaining the image feature corresponding to the image block.
Specifically, after obtaining the segmentation small image, the pixel value of each pixel point in the segmentation small image can be counted, an average value is calculated based on the pixel values to be used as the pixel average value of all the pixel points in the segmentation small image, each pixel point in the segmentation small image is traversed, the pixel value of each pixel point in the segmentation small image is compared with the pixel average value, if the pixel value of one pixel point is larger than the pixel average value, the pixel value of the pixel point is set to be 1, and if the pixel value of one pixel point is smaller than or equal to the pixel average value, the pixel value of the pixel point is set to be 0, so that the binarization processing of the segmentation small image is realized, and a binary small image is obtained.
In the embodiment of the disclosure, the channel splicing sequence of the gray-scale small image, the segmentation small image and the binary small image is consistent with the channel splicing sequence of the gray-scale image, the segmentation small image and the binary image in the training sample when the feature extraction model is obtained by training. For example, when a training sample of a training feature extraction model is constructed, a gray scale image is used as a first channel, a segmentation image is used as a second channel, and a binary image is used as a third channel for channel stitching, when image features are extracted, channel stitching is performed according to the sequence of the gray scale image as the first channel, the segmentation image as the second channel, and the binary image as the third channel, so as to obtain three-channel images, and then the three-channel images are input into the feature extraction model for feature extraction, so that image features corresponding to image blocks are obtained. In addition, the feature extraction model can output feature vectors with fixed dimensions, and the present disclosure does not limit the feature extraction model used, for example, a LeNet-5 model may be used as the feature extraction model, or other models may be used as the feature extraction model.
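A sketch of the three-channel construction with OpenCV, where the Sobel gradient magnitude plays the role of the segmentation step (the disclosure permits other segmentation algorithms) and the gray/segmented/binary channel order mirrors the order described above; the 0/1 binary values follow the text, and any rescaling of them is a training-time choice:

```python
import cv2
import numpy as np

def three_channel_thumbnail(block: np.ndarray) -> np.ndarray:
    """Build the gray / segmented / binary three-channel input for one block.

    The channel order must match the order used when the feature
    extraction model was trained.
    """
    gray = cv2.cvtColor(block, cv2.COLOR_BGR2GRAY)  # assumes a BGR block
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    seg = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    # Binarize against the mean pixel value of the segmented image:
    # 1 where a pixel exceeds the mean, 0 otherwise, as described above.
    binary = (seg > seg.mean()).astype(np.uint8)
    return np.stack([gray, seg, binary], axis=-1)
```

The stacked result would then be fed to the pre-trained feature extraction model, for example a LeNet-5-style network as mentioned above.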
In the embodiment of the disclosure, each image block is processed into a gray small image, a segmentation small image is obtained from the gray small image, binarization is performed on the segmentation small image to obtain a binary small image, and the three small images are spliced by channel into a three-channel small image for feature extraction, so as to obtain the image feature corresponding to each image block. The gray small image retains the complete image texture features of the image block; the segmentation small image removes invalid background information and extracts the features of line drawings; and the binary small image further filters background information so that the feature extraction model focuses on foreground features. The resulting three-channel small image thus effectively retains the texture features and line features of the image block while removing background information, which improves the accuracy of feature extraction and helps cope with the varied image content within an image block.
Step 204, determining a first result based on the image feature corresponding to each image block in the first image and the image feature corresponding to each image block in the second image.
In the embodiment of the disclosure, after obtaining the image features of each image block in the first image and the image features of each image block in the second image, a first result of whether the image content of the first image changes relative to the image content of the second image may be determined based on the image features.
In an optional embodiment of the disclosure, the image features of the image blocks in the first image may be spliced in the order of the blocks within the first image to obtain the image feature corresponding to the first image, and the image features of the image blocks in the second image may be spliced in the same way to obtain the image feature corresponding to the second image. The feature similarity between the image feature of the first image and that of the second image is then calculated; if the feature similarity is greater than a preset similarity threshold, a first result is obtained that the image content of the first image is unchanged relative to the image content of the second image; otherwise, a first result is obtained that the image content of the first image has changed relative to the image content of the second image.
In an optional embodiment of the disclosure, a sum value, an average value, and the like of image features of each image block in the first image may be calculated as the image features of the first image, and image features of the second image are calculated in the same manner, so as to calculate feature similarity between the image features of the first image and the image features of the second image, and if the feature similarity is greater than a preset similarity threshold, a first result that the image content of the first image is unchanged relative to the image content of the second image is obtained, otherwise, a first result that the image content of the first image is changed relative to the image content of the second image is obtained.
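Both aggregation options above reduce to a one-liner each; a sketch, assuming the per-block features are NumPy vectors in grid order:

```python
import numpy as np

def whole_image_feature(block_feats: list, mode: str = "concat") -> np.ndarray:
    # "concat" follows the block-order splicing option above;
    # "mean" follows the sum/average option.
    if mode == "concat":
        return np.concatenate(block_feats)
    return np.mean(np.stack(block_feats), axis=0)
```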
In an optional embodiment of the disclosure, whether the content of the image blocks at corresponding positions in the first image and the second image has changed may be compared, and whether the image content of the first image has changed relative to the image content of the second image is finally determined according to the comparison result for each image block. Specifically, whether the contents of the two image blocks at the same position are the same can be determined based on the image features of the image blocks at that position in the first image and the second image; in response to the contents of the two image blocks at every position in the first image and the second image being the same, a first result is obtained that the image content of the first image is unchanged relative to the image content of the second image. If the contents of the two image blocks at one or more positions differ, a first result is obtained that the image content of the first image has changed relative to the image content of the second image.
Wherein, the image blocks at the same position refer to the rows and columns of the image blocks in the first image, which are the same as the rows and columns in the second image. For example, the image block in the third column of the first row in the first image and the image block in the third column of the first row in the second image are two image blocks in the same position, and the image block in the second column of the fifth row in the first image and the image block in the second column of the fifth row in the second image are two image blocks in the same position.
In the embodiment of the disclosure, since the first image and the second image are segmented according to the same segmentation rule, each image block in the first image can find an image block in the second image in a corresponding position, that is, the same position corresponds to one image block in the first image and one image block in the second image, if the image contents of the two image blocks are the same, the extracted image features should be the same, so that whether the contents of the two image blocks are the same can be determined based on the feature vectors of the two image blocks in the same position. For example, the euclidean distance of the feature vectors of the two image blocks at the same position can be calculated, whether the euclidean distance is smaller than a preset distance threshold value is judged, if yes, the content of the two image blocks at the position is identical, and if not, the content of the two image blocks at the position is different. After determining whether the contents of the two image blocks at each position are the same, whether the contents of the two image blocks at each position are the same can be judged, if the contents of the two image blocks at each position are the same, a first result that the image content of the first image is unchanged relative to the image content of the second image is obtained, and if the contents of the two image blocks at least one position are different, a first result that the image content of the first image is changed relative to the image content of the second image is obtained.
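A sketch of this per-position comparison, assuming per-block feature vectors in the same grid order for both images and an illustrative distance threshold:

```python
import numpy as np

def first_result_unchanged(feats_a: list, feats_b: list,
                           dist_threshold: float = 0.5) -> bool:
    # Compare the two blocks at each grid position by Euclidean distance;
    # a single differing position marks the whole image as changed.
    for fa, fb in zip(feats_a, feats_b):
        if np.linalg.norm(fa - fb) >= dist_threshold:
            return False
    return True
```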
In the embodiment of the disclosure, the image is segmented into a plurality of image blocks, and whether the contents of the image blocks at the same position have changed is compared to determine whether the content of the whole image has changed, so that even minor adjustments to the content can be detected well, improving detection precision.
Step 205, determining a second result based on the variance feature of the first image and the variance feature of the second image.
Step 206, determining a third result based on the text similarity between the text content of the first image and the text content of the second image.
Step 207 of determining whether the image content of the first image changes relative to the image content of the second image based on the first result, the second result and the third result.
It should be noted that, in the embodiment of the present disclosure, the explanation of step 205 to step 207 may be referred to the related descriptions of step 103 to step 105 in the previous embodiment, and will not be repeated here.
According to the method for detecting image content provided by this embodiment of the present disclosure, the first image and the second image are segmented into a plurality of image blocks according to a preset number of grid cells, features are extracted from each image block to obtain the corresponding image features, and the first result of whether the image content of the two images has changed is determined based on the image features corresponding to the image blocks in the first image and in the second image. Finer features can thus be extracted, which guarantees the accuracy of the first result.
In an alternative embodiment of the present disclosure, the first image and the second image have the same size. As shown in fig. 3, on the basis of the foregoing embodiments, step 103 may include the following substeps:
and 300, respectively segmenting the first image and the second image into a plurality of image blocks according to the same segmentation rule.
The segmentation rule may be preset. For example, the segmentation rule may be to segment the two images according to a preset number of grid cells, which may be set according to actual requirements, for example to 8×8, 4×4, 6×9, and so on. For another example, the segmentation rule may be to segment the two images into a plurality of image blocks of different sizes, where the image blocks obtained in the first segmentation occupy the same proportion of the whole image in both images, and the image blocks obtained in the second segmentation likewise occupy the same proportion of the whole image in both images, although that proportion may differ from the proportion used in the first segmentation, and so on. For example, the image blocks obtained in the first segmentation occupy 1/8 of the first image and likewise 1/8 of the second image, while the image blocks obtained in the second segmentation occupy 1/6 of the first image and 1/6 of the second image.
In the embodiment of the disclosure, for the obtained first image and second image with the same size, or the obtained first image and second image with the same size after the scaling treatment, the first image may be segmented into a plurality of image blocks according to a segmentation rule, and the second image may also be segmented into a plurality of image blocks according to the same segmentation rule.
For example, assuming that the segmentation rule is to segment two images according to an 8×8 grid number, the first image is equally divided into 64 image blocks according to an 8×8 grid number, and the second image is equally divided into 64 image blocks according to an 8×8 grid number.
Step 301, scaling each image block to a preset size to obtain a scaled small image.
The preset size may be preset according to actual requirements, for example, the preset size may be 64×64, 128×128, 64×128, etc.
In the embodiment of the disclosure, after the first image and the second image are respectively segmented into a plurality of image blocks according to the same segmentation rule, each image block may be scaled to a preset size to obtain a scaled small image. It can be understood that scaling an image block to a preset size refers to scaling the image block horizontally and vertically to the number of pixels corresponding to the preset size. For example, if the preset size is 64×128, the image block is scaled according to the size, and the obtained scaled small image has 64 pixels in the horizontal direction and 128 pixels in the vertical direction.
Step 302, calculating the variance based on the pixel value of each pixel point in the scaled small image to obtain the variance feature corresponding to the scaled small image.
In an alternative embodiment of the disclosure, for each scaled small image in the first image, a pixel value of each pixel point in the scaled small image may be obtained, and a variance of the pixel values is found in one scaled small image, and the variance is used as a variance feature corresponding to the scaled small image. Similarly, for each scaled thumbnail in the second image, the variance of the pixel values is found in one scaled thumbnail and is used as the variance feature of the scaled thumbnail.
In an alternative embodiment of the present disclosure, a scaled small image may be divided into a grid of m^i × m^i cells, and the variance is calculated from the pixel values within each cell, yielding a variance feature of dimension m^i × m^i, where m is a positive integer and i is a natural number whose value is successively reduced from an initial value down to 0 by a preset step, the initial value being no greater than log_m(n), where n is the number of pixel rows of the scaled small image. The variance features of dimension m^i × m^i obtained at each level are then spliced together to obtain the variance feature corresponding to the scaled small image.
For example, if an image block is scaled to 64 × 64, a 64 × 64 scaled small image is obtained, n is 64, m is 2, and the initial value is no greater than log_2(64) = 6, i.e. the value of i is no greater than 6. Assume the initial value is 4 and the preset step is 1, i.e. i decreases from the initial value by 1 each time until i reaches 0. The variance feature of the scaled small image is then determined as follows. On the 64 × 64 scaled small image, the image is equally divided into a 2^4 × 2^4 (i.e. 16 × 16) grid, each cell being 4 × 4, and the variance of the pixel values is computed within each 4 × 4 cell, giving a variance feature of dimension 16 × 16 = 256. Next, i - 1 gives i = 3; on the same 64 × 64 scaled small image, the image is equally divided into a 2^3 × 2^3 (i.e. 8 × 8) grid, each cell being 8 × 8, and the variance of the pixel values is computed within each 8 × 8 cell, giving a variance feature of dimension 8 × 8 = 64. Next, i - 1 gives i = 2; the image is equally divided into a 2^2 × 2^2 (i.e. 4 × 4) grid, each cell being 16 × 16, and the per-cell variances give a variance feature of dimension 4 × 4 = 16. Next, i - 1 gives i = 1; the image is equally divided into a 2 × 2 grid, each cell being 32 × 32, and the per-cell variances give a variance feature of dimension 2 × 2 = 4. Finally, i - 1 gives i = 0; the image is divided into a 2^0 × 2^0 grid, i.e. the variance of the pixel values is computed over the whole 64 × 64 scaled small image, giving a variance feature of dimension 1. All these variance features are then spliced into a single row, yielding a variance feature of dimension 256 + 64 + 16 + 4 + 1 = 341, i.e. the 341-dimensional variance feature corresponding to the scaled small image.
For another example, if an image block is scaled to 64 × 64, n is 64, m is 2, and the value of i is again no greater than log_2(64) = 6. Assume the initial value is 5 and the preset step is 2, i.e. i decreases from 5 by 2 each time until i reaches 0. On the 64 × 64 scaled small image, the image is equally divided into a 2^5 × 2^5 (i.e. 32 × 32) grid, each cell being 2 × 2, and the variance of the pixel values is computed within each 2 × 2 cell, giving a variance feature of dimension 32 × 32 = 1024. Next, i - 2 gives i = 3; the image is equally divided into a 2^3 × 2^3 (i.e. 8 × 8) grid, each cell being 8 × 8, giving a variance feature of dimension 8 × 8 = 64. Next, i - 2 gives i = 1; the image is equally divided into a 2 × 2 grid, each cell being 32 × 32, giving a variance feature of dimension 2 × 2 = 4. Next, i - 2 would give i = -1, which is taken as i = 0; the image is divided into a 2^0 × 2^0 grid, i.e. the variance is computed over the whole 64 × 64 scaled small image, giving a variance feature of dimension 1. All these variance features are then spliced into a single row, yielding a variance feature of dimension 1024 + 64 + 4 + 1 = 1093, i.e. the 1093-dimensional variance feature corresponding to the scaled small image.
In the embodiment of the disclosure, by extracting variance features on each scaled small image with cells that grow from small to large, the extracted image texture distribution features go from fine to coarse, like a pyramid, which helps detect whether illustrations with rich texture information in the image have changed and improves the detection precision.
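A sketch of the variance pyramid, written to reproduce both worked examples above (341 dimensions for m = 2, initial value 4, step 1; 1093 dimensions for initial value 5, step 2, with the final level clamped to i = 0); the square-thumbnail assumption matches the 64 × 64 examples:

```python
import numpy as np

def pyramid_variance_feature(thumb: np.ndarray, m: int = 2,
                             start: int = 4, step: int = 1) -> np.ndarray:
    """Variance pyramid over a square n x n scaled small image.

    For each level i the image is divided into an m^i x m^i grid and the
    per-cell pixel variances are concatenated; i runs from `start` down
    towards 0 in steps of `step`.
    """
    n = thumb.shape[0]
    levels = list(range(start, -1, -step))
    if levels[-1] != 0:
        levels.append(0)  # clamp the final level to i = 0, as in the text
    feats = []
    for i in levels:
        g = m ** i        # the grid is g x g cells
        cell = n // g     # each cell is cell x cell pixels
        for r in range(g):
            for c in range(g):
                patch = thumb[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
                feats.append(float(np.var(patch)))
    return np.asarray(feats, dtype=np.float32)
```

On a 64 × 64 array the defaults return a 341-dimensional vector; start=5, step=2 returns the 1093-dimensional one.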
Step 303, determining a second result based on the variance feature corresponding to each scaled small image in the first image and the variance feature corresponding to each scaled small image in the second image.
In the embodiment of the disclosure, after the variance features of each scaled small image in the first image and the second image are extracted, a second result of whether the image content of the first image changes relative to the image content of the second image may be determined based on the variance features.
In an optional embodiment of the present disclosure, the variance features of the scaled small images in the same row of the first image may be summed or averaged, and the result used as the variance feature of that row; the variance features of the rows are then spliced in row order to obtain the variance feature corresponding to the first image. The variance feature corresponding to the second image can be obtained in the same manner. The feature similarity between the variance feature of the first image and that of the second image can then be calculated; if the feature similarity is greater than a preset similarity threshold, a second result is obtained that the image content of the first image is unchanged relative to the image content of the second image; otherwise, a second result is obtained that the image content of the first image has changed relative to the image content of the second image.
In an optional embodiment of the disclosure, a sum value, an average value, and the like of variance features of each scaled small image in the first image may be calculated as the variance features of the first image, and variance features of the second image may be calculated in the same manner, so as to calculate feature similarity between the variance features of the first image and the variance features of the second image, and if the feature similarity is greater than a preset similarity threshold, a second result is obtained that the image content of the first image is unchanged relative to the image content of the second image, otherwise, a second result is obtained that the image content of the first image is changed relative to the image content of the second image.
In an optional embodiment of the present disclosure, variance features of scaled small images at the same position in the first image and the second image may be stitched to obtain stitching features; then, inputting the spliced characteristics into a pre-trained fully-connected network to perform two-classification to obtain a classification result, wherein the classification result comprises a first confidence coefficient with the same content of two scaled small images at the same position and a second confidence coefficient with different content of the two scaled small images at the same position; determining that the contents of the two scaled small figures at the same position are the same in response to the first confidence coefficient being greater than the second confidence coefficient; in response to the first confidence being no greater than the second confidence, it is determined that there is a difference in the contents of the two scaled thumbnail images at the same location. Then, comparing whether the contents of the two scaled small images at each position in the first image and the second image are the same or not, and responding to the fact that the contents of the two scaled small images at each position in the first image and the second image are the same to obtain a second result that the image contents of the first image are unchanged relative to the image contents of the second image; and responding to the fact that the contents of the two scaled small images at least one position in the first image and the second image are different, and obtaining a second result that the image contents of the first image are changed relative to the image contents of the second image.
For example, assuming that the variance feature corresponding to a scaled small image is 341-dimensional, the variance feature of the scaled small image in the first row and first column of the first image is spliced with the variance feature of the scaled small image in the first row and first column of the second image to form a 682-dimensional splicing feature. The 682-dimensional splicing feature is input into a pre-trained fully-connected network, which performs binary classification on the feature vector and outputs a classification result; if the first confidence (same content) in the classification result is greater than the second confidence (different content), it is determined that the contents of the two scaled small images in the first row and first column of the first image and the second image are the same. In the same manner, it may be determined whether the contents of the two scaled small images at each position are the same. If the contents of the two scaled small images at every position are the same, a second result is obtained that the image content of the first image is unchanged relative to the image content of the second image; if the contents of the two scaled small images at one or more positions differ, a second result is obtained that the image content of the first image has changed relative to the image content of the second image.
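A PyTorch sketch of the pair classification; the two-way softmax head and the 682-dimensional spliced input come from the text, while the hidden-layer width and the rest of the architecture of the pre-trained fully-connected network are assumptions:

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    # Stand-in for the pre-trained fully-connected network: only the
    # 682-dim input (two 341-dim variance features) and the 2-way head
    # come from the text; the hidden width of 256 is an assumption.
    def __init__(self, feat_dim: int = 341):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def blocks_same(model: nn.Module, var_a: torch.Tensor, var_b: torch.Tensor) -> bool:
    # Splice the two variance features, classify, and compare the two
    # confidences: "same content" wins only if it is strictly greater.
    logits = model(torch.cat([var_a, var_b]).unsqueeze(0))
    conf_same, conf_diff = torch.softmax(logits, dim=-1)[0]
    return bool(conf_same > conf_diff)
```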
In the embodiment of the disclosure, the variance features of the two scaled small images at the same position are spliced to obtain the splicing feature, whether the contents of the two scaled small images at that position are the same is detected based on the splicing feature, and based on the per-position detection results, the second result of whether the image content of the first image has changed relative to the content of the second image is finally determined. By detecting content changes on the small images, even minor adjustments to the content can be detected well, improving detection precision.
According to the method for detecting the image content, the first image and the second image with the same size are respectively segmented into the plurality of image blocks according to the same segmentation rule, each image block is scaled to the preset size to obtain the scaled small image, variance calculation is conducted on the basis of the pixel value of each pixel point in the scaled small image to obtain variance characteristics corresponding to the scaled small image, and further the second result is determined on the basis of the variance characteristics corresponding to each scaled small image in the first image and the variance characteristics corresponding to each scaled small image in the second image, so that texture changes in a small range in the images can be captured, and detection accuracy is improved.
In an alternative embodiment of the present disclosure, the variance feature and the conventional convolution feature may be combined to detect whether the contents of two scaled small images at the same location are the same, where in this case, when training to obtain a fully connected network, the convolution feature and the variance feature of the image need to be combined as training samples to train to obtain the fully connected network, where the convolution feature may be extracted by using a currently commonly used convolution network. Thus, in this embodiment, the method for detecting an image content difference of the present disclosure further includes: inputting the scaled small image into a pre-trained convolution feature extraction model to obtain corresponding convolution features. That is, for each scaled thumbnail, it may be input into a convolution feature extraction model, and the convolution feature extraction model performs convolution feature extraction to obtain a corresponding convolution feature, thereby obtaining a convolution feature corresponding to each scaled thumbnail in the first image and the second image. Furthermore, the variance features and the convolution features corresponding to the scaled small images at the same position in the first image and the second image can be spliced to obtain splicing features, and whether the contents of the two scaled small images at the same position are the same or not is detected by using the splicing features. When the variance features and the convolution features of the two scaled small images at the same position are spliced, the two scaled small images can be spliced according to a preset splicing mode, the preset splicing mode can be determined according to the splicing mode of the convolution features and the variance features in the training samples when the full-connection network is obtained through training, and the method is not limited.
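The fusion itself is just a concatenation whose order must match training; a sketch, assuming a convolution-then-variance order:

```python
import torch

def fused_feature(conv_feat: torch.Tensor, var_feat: torch.Tensor) -> torch.Tensor:
    # The splicing order (convolution features first, variance features
    # second) is an assumption; it must match the order used when the
    # fully-connected network was trained.
    return torch.cat([conv_feat, var_feat])
```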
In the embodiment of the disclosure, combining the variance feature with the convolution feature to detect whether the scaled small images at the same position have the same content makes it possible to capture changes in texture-rich images more effectively, improving the accuracy of the detection result.
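By way of illustration only, the following Python sketch shows how the convolution feature and the variance feature of two same-position patches might be spliced. The extractor `conv_model`, the tensor shapes, and the splicing order are assumptions; the disclosure requires only that the splicing order match the one used when the fully connected network's training samples were built.

```python
import torch

def spliced_feature(conv_model: torch.nn.Module,
                    patch_a: torch.Tensor, patch_b: torch.Tensor,
                    var_a: torch.Tensor, var_b: torch.Tensor) -> torch.Tensor:
    """Splice conv + variance features of two same-position patches.

    conv_model is an assumed pre-trained convolution feature extraction
    model mapping a (1, C, H, W) image tensor to a feature tensor;
    var_a / var_b are the (1, V) variance features of the two patches.
    """
    with torch.no_grad():
        conv_a = conv_model(patch_a).flatten(1)   # (1, D)
        conv_b = conv_model(patch_b).flatten(1)
    # Assumed splicing order; whatever order is chosen, it must match
    # the order used for the training samples of the fully connected network.
    return torch.cat([conv_a, var_a, conv_b, var_b], dim=-1)
```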
The exemplary embodiments of the present disclosure also provide an apparatus for detecting image content. Fig. 4 shows a schematic block diagram of an apparatus for detecting image content according to an exemplary embodiment of the present disclosure, and as shown in fig. 4, the apparatus 40 for detecting image content includes: the image acquisition module 410, the first determination module 420, the second determination module 430, the third determination module 440, and the fourth determination module 450.
The image acquisition module 410 is configured to acquire a first image to be detected and a second image serving as a reference;
a first determining module 420, configured to determine a first result based on the image feature of the first image and the image feature of the second image;
a second determining module 430, configured to determine a second result based on the variance feature of the first image and the variance feature of the second image;
a third determining module 440, configured to determine a third result based on a text similarity between the text content of the first image and the text content of the second image;
a fourth determining module 450, configured to determine whether the image content of the first image changes relative to the image content of the second image based on the first result, the second result, and the third result.
Optionally, the first determining module 420 includes:
the first segmentation unit is used for segmenting the first image and the second image into a plurality of image blocks according to the same segmentation rule;
the feature extraction unit is used for extracting features of each image block to obtain image features corresponding to each image block;
and the first determining unit is used for determining a first result based on the image characteristics corresponding to each image block in the first image and the image characteristics corresponding to each image block in the second image.
Optionally, the feature extraction unit is further configured to:
carrying out gray processing on the image block to obtain a gray small image;
dividing the gray level small image to obtain a divided small image;
based on the pixel value of each pixel point in the segmentation small image and the pixel average value of all the pixel points in the segmentation small image, carrying out binarization processing on the segmentation small image to obtain a binary small image;
performing channel splicing on the gray-scale small image, the segmentation small image and the binary small image to generate a three-channel small image;
and inputting the three-channel small image into a pre-trained feature extraction model to perform feature extraction, so as to obtain the image features.
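As a concrete illustration of the three-channel construction described above, the following Python/OpenCV sketch uses Otsu thresholding as an assumed stand-in for the unspecified segmentation step and reads the binarization step as thresholding at the mean pixel value; neither choice is fixed by the disclosure.

```python
import cv2
import numpy as np

def build_three_channel_patch(block_bgr: np.ndarray) -> np.ndarray:
    """Assemble the gray / segmented / binary channels of one image block."""
    # Gray processing of the image block.
    gray = cv2.cvtColor(block_bgr, cv2.COLOR_BGR2GRAY)

    # Segment the gray small image (assumed: Otsu thresholding).
    _, segmented = cv2.threshold(gray, 0, 255,
                                 cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Binarize each pixel against the mean pixel value of the block.
    binary = (gray > gray.mean()).astype(np.uint8) * 255

    # Channel-splice the three single-channel images into one 3-channel
    # image, ready for the pre-trained feature extraction model.
    return cv2.merge([gray, segmented, binary])
```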
Optionally, the first determining unit is further configured to:
determining, based on the image features of the image blocks at the same position in the first image and the second image, whether the contents of the two image blocks at that position are the same;
in response to the contents of the two image blocks at every position in the first image and the second image being the same, obtaining a first result that the image content of the first image is unchanged relative to the image content of the second image;
and in response to the contents of the two image blocks at at least one position being different, obtaining a first result that the image content of the first image has changed relative to the image content of the second image.
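By way of example only, the position-wise comparison above might be realized by thresholding the similarity of the two blocks' feature vectors; the cosine measure and the 0.98 threshold in the sketch below are illustrative assumptions, not values taken from the disclosure.

```python
import numpy as np

def blocks_same(feat_a: np.ndarray, feat_b: np.ndarray,
                thr: float = 0.98) -> bool:
    """Assumed rule: same content when cosine similarity exceeds thr."""
    cos = float(np.dot(feat_a, feat_b) /
                (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-8))
    return cos >= thr

def first_result_unchanged(feats_a, feats_b) -> bool:
    """True (unchanged) only if every same-position pair of blocks matches."""
    return all(blocks_same(a, b) for a, b in zip(feats_a, feats_b))
```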
Optionally, the first image and the second image are the same size; the second determining module 430 includes:
the second segmentation unit is used for segmenting the first image and the second image into a plurality of image blocks according to the same segmentation rule;
the scaling unit is used for scaling each image block to a preset size to obtain a scaled small image;
the calculating unit is used for carrying out variance calculation based on the pixel value of each pixel point in the scaled small image to obtain variance characteristics corresponding to the scaled small image;
and the second determining unit is used for determining a second result based on the variance characteristic corresponding to each scaled small image in the first image and the variance characteristic corresponding to each scaled small image in the second image.
Optionally, the computing unit is further configured to:
dividing the scaled small image into m^i × m^i grids and performing a variance calculation on the pixel values of the pixel points within each grid, to obtain a variance feature of dimension m^i × m^i, where m is a positive integer, i is a natural number whose value decreases successively from an initial value to 0 according to a preset step length, and the initial value is not greater than log_m(n), where n is the number of rows/columns (the side length in pixels) of the scaled small image;
and splicing the obtained variance features of dimension m^i × m^i for each value of i to obtain the variance feature corresponding to the scaled small image.
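Read with a step length of 1, the rule above computes per-grid variances for i = initial value, ..., 0 and splices all levels into one vector. The following NumPy sketch assumes a single-channel (grayscale) scaled small image and simply ignores leftover pixels when the side length is not divisible by the grid count.

```python
import math
import numpy as np

def variance_feature(patch: np.ndarray, m: int = 2) -> np.ndarray:
    """Multi-scale grid variance feature for one scaled small image.

    For i = i0, i0-1, ..., 0 the patch is divided into m**i x m**i
    grids; the variance of the pixel values inside each grid forms a
    feature of dimension m**i * m**i, and all levels are spliced.
    Choosing i0 <= log_m(n) guarantees every grid contains pixels.
    """
    n = min(patch.shape[:2])
    i0 = int(math.log(n, m) + 1e-9)        # initial value of i
    levels = []
    for i in range(i0, -1, -1):            # preset step length assumed 1
        g = m ** i                         # number of grids per side
        h, w = patch.shape[0] // g, patch.shape[1] // g
        level = np.empty((g, g), dtype=np.float64)
        for r in range(g):
            for c in range(g):
                cell = patch[r * h:(r + 1) * h, c * w:(c + 1) * w]
                level[r, c] = cell.var()   # variance within one grid
        levels.append(level.ravel())       # m**i * m**i values
    return np.concatenate(levels)          # spliced variance feature
```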
Optionally, the second determining unit is further configured to:
splicing the variance features of the scaled small images at the same position in the first image and the second image to obtain a spliced feature;
inputting the spliced feature into a pre-trained fully connected network for binary classification to obtain a classification result, the classification result including a first confidence that the contents of the two scaled small images at the same position are the same and a second confidence that the contents of the two scaled small images at the same position are different;
determining that the contents of the two scaled small images at the same position are the same in response to the first confidence being greater than the second confidence;
in response to the contents of the two scaled small images at every position in the first image and the second image being the same, obtaining a second result that the image content of the first image is unchanged relative to the image content of the second image;
and in response to the contents of the two scaled small images at at least one position being different, obtaining a second result that the image content of the first image has changed relative to the image content of the second image.
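The following PyTorch sketch shows one possible form of the fully connected network for this binary classification; the hidden width and layer count are assumptions, since the disclosure fixes only the input (the spliced feature) and the output (two confidences).

```python
import torch
import torch.nn as nn

class SamenessClassifier(nn.Module):
    """Fully connected two-class head over a spliced variance feature."""

    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),   # spliced features of both patches
            nn.ReLU(),
            nn.Linear(hidden, 2),              # logits: [same, different]
        )

    def forward(self, var_a: torch.Tensor, var_b: torch.Tensor) -> torch.Tensor:
        spliced = torch.cat([var_a, var_b], dim=-1)
        return torch.softmax(self.net(spliced), dim=-1)

# conf[..., 0] is the first confidence (contents the same) and
# conf[..., 1] the second; the contents count as the same when
# conf[..., 0] > conf[..., 1].
```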
Optionally, the fully connected network is trained from a combination of the convolution features and variance features of images; the apparatus 40 for detecting image content further includes:
the feature extraction module is used for inputting the scaled small image into a pre-trained convolution feature extraction model to obtain corresponding convolution features;
The second determining unit is further configured to:
and splicing the variance features and the convolution features of the scaled small images at the same position in the first image and the second image to obtain a spliced feature.
Optionally, the third determining module 440 is further configured to:
performing text recognition on the first image to obtain first text content corresponding to the first image;
performing text recognition on the second image to obtain second text content corresponding to the second image;
calculating text similarity between the first text content and the second text content;
in response to the text similarity being greater than a text similarity threshold, obtaining a third result that the image content of the first image is unchanged relative to the image content of the second image;
and in response to the text similarity being not greater than the text similarity threshold, obtaining a third result that the image content of the first image has changed relative to the image content of the second image.
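As an illustration of this branch only, the sketch below uses pytesseract as a stand-in OCR engine and difflib's similarity ratio as the text-similarity measure; both choices, the chi_sim language setting, and the 0.95 threshold are assumptions not fixed by the disclosure.

```python
import difflib

import pytesseract  # assumed OCR engine; the disclosure does not name one

def third_result_unchanged(img_a, img_b, thr: float = 0.95) -> bool:
    """True (unchanged) when the OCR text of the two pages is similar enough."""
    text_a = pytesseract.image_to_string(img_a, lang="chi_sim")
    text_b = pytesseract.image_to_string(img_b, lang="chi_sim")
    similarity = difflib.SequenceMatcher(None, text_a, text_b).ratio()
    return similarity > thr  # strictly greater than the threshold => unchanged
```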
Optionally, the fourth determining module 450 is further configured to:
determining that the image content of the first image is unchanged relative to the image content of the second image in response to the first result, the second result, and the third result each indicating that the image content of the first image is unchanged relative to the image content of the second image;
Or,
in response to at least one of the first result, the second result, and the third result indicating that the image content of the first image has changed relative to the image content of the second image, determining that the image content of the first image has changed relative to the image content of the second image.
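In effect, the fusion above is a logical AND over the three branch verdicts, each verdict being True when the corresponding branch reports no change; a minimal sketch:

```python
def content_unchanged(first: bool, second: bool, third: bool) -> bool:
    # One dissenting branch is enough to flag the page as changed.
    return first and second and third
```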
The apparatus for detecting image content provided by the embodiments of the disclosure can execute any of the methods for detecting image content described above and has the functional modules and beneficial effects corresponding to the executed method. For details not described in the apparatus embodiments, refer to the description of any method embodiment of the disclosure.
The exemplary embodiments of the present disclosure also provide an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method of detecting image content according to embodiments of the present disclosure when executed by the at least one processor.
The present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method of detecting image content according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method of detecting image content according to an embodiment of the present disclosure.
Referring to fig. 5, a block diagram of an electronic device 1100, which may be a server or a client of the present disclosure and is an example of a hardware device applicable to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100, and the input unit 1106 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 1108 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through computer networks such as the internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, Wi-Fi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the respective methods and processes described above. For example, in some embodiments, the method of detecting image content may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto electronic device 1100 via ROM 1102 and/or communication unit 1109. In some embodiments, the computing unit 1101 may be configured to perform the method of detecting image content by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The terms "machine-readable medium" and "computer-readable medium" as used in this disclosure refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (11)

1. A method of detecting image content, wherein the method comprises:
Acquiring a first image to be detected and a second image serving as a reference;
determining a first result based on image features of the first image and image features of the second image;
determining a second result based on the variance feature of the first image and the variance feature of the second image;
determining a third result based on a text similarity between text content of the first image and text content of the second image;
determining, based on the first result, the second result, and the third result, whether the image content of the first image changes relative to the image content of the second image.
2. The method for detecting image content as recited in claim 1, wherein,
the determining a first result based on the image features of the first image and the image features of the second image comprises:
dividing the first image and the second image into a plurality of image blocks according to the same dividing rule;
extracting the characteristics of each image block to obtain the image characteristics corresponding to each image block;
and determining a first result based on the image characteristics corresponding to each image block in the first image and the image characteristics corresponding to each image block in the second image.
3. The method for detecting image content according to claim 2, wherein the feature extraction of each image block to obtain the image feature corresponding to each image block includes:
carrying out gray processing on the image block to obtain a gray small image;
dividing the gray level small image to obtain a divided small image;
based on the pixel value of each pixel point in the segmentation small image and the pixel average value of all the pixel points in the segmentation small image, carrying out binarization processing on the segmentation small image to obtain a binary small image;
performing channel splicing on the gray-scale small image, the segmentation small image and the binary small image to generate a three-channel small image;
and inputting the three-channel small image into a pre-trained feature extraction model to perform feature extraction, so as to obtain the image features.
4. The method for detecting image content according to claim 2, wherein said determining a first result based on the image feature corresponding to each image block in the first image and the image feature corresponding to each image block in the second image comprises:
determining, based on the image features of the image blocks at the same position in the first image and the second image, whether the contents of the two image blocks at that position are the same;
in response to the contents of the two image blocks at every position in the first image and the second image being the same, obtaining a first result that the image content of the first image is unchanged relative to the image content of the second image;
and in response to the contents of the two image blocks at at least one position being different, obtaining a first result that the image content of the first image has changed relative to the image content of the second image.
5. The method of detecting image content according to claim 1, wherein the first image and the second image are the same size; the determining a second result based on the variance feature of the first image and the variance feature of the second image comprises:
dividing the first image and the second image into a plurality of image blocks according to the same dividing rule;
scaling each image block to a preset size to obtain a scaled small image;
performing variance calculation based on the pixel value of each pixel point in the scaled small image to obtain variance characteristics corresponding to the scaled small image;
and determining a second result based on the variance characteristic corresponding to each scaled small image in the first image and the variance characteristic corresponding to each scaled small image in the second image.
6. The method for detecting image content according to claim 5, wherein said calculating a variance based on the pixel value of each pixel point in the scaled small image to obtain a variance feature corresponding to the scaled small image includes:
dividing the scaled small image into m^i × m^i grids and performing a variance calculation on the pixel values of the pixel points within each grid, to obtain a variance feature of dimension m^i × m^i, where m is a positive integer, i is a natural number whose value decreases successively from an initial value to 0 according to a preset step length, and the initial value is not greater than log_m(n), where n is the number of rows/columns (the side length in pixels) of the scaled small image;
and splicing the obtained variance features of dimension m^i × m^i for each value of i to obtain the variance feature corresponding to the scaled small image.
7. The method of detecting image content according to claim 6, wherein the determining a second result based on the variance feature corresponding to each scaled thumbnail in the first image and the variance feature corresponding to each scaled thumbnail in the second image comprises:
splicing the variance features of the scaled small images at the same position in the first image and the second image to obtain a spliced feature;
inputting the spliced feature into a pre-trained fully connected network for binary classification to obtain a classification result, the classification result including a first confidence that the contents of the two scaled small images at the same position are the same and a second confidence that the contents of the two scaled small images at the same position are different;
determining that the contents of the two scaled small images at the same position are the same in response to the first confidence being greater than the second confidence;
in response to the contents of the two scaled small images at every position in the first image and the second image being the same, obtaining a second result that the image content of the first image is unchanged relative to the image content of the second image;
and in response to the contents of the two scaled small images at at least one position being different, obtaining a second result that the image content of the first image has changed relative to the image content of the second image.
8. The method of detecting image content according to claim 7, wherein the fully connected network is trained from a combination of the convolution features and variance features of images;
the method further comprises the steps of:
inputting the scaled small image into a pre-trained convolution feature extraction model to obtain corresponding convolution features;
the step of splicing the variance features of the scaled small images at the same position in the first image and the second image to obtain a spliced feature includes:
splicing the variance features and the convolution features of the scaled small images at the same position in the first image and the second image to obtain the spliced feature.
9. The method of detecting image content according to claim 1, wherein the determining a third result based on a text similarity between the text content of the first image and the text content of the second image comprises:
performing text recognition on the first image to obtain first text content corresponding to the first image;
performing text recognition on the second image to obtain second text content corresponding to the second image;
calculating text similarity between the first text content and the second text content;
in response to the text similarity being greater than a text similarity threshold, obtaining a third result that the image content of the first image is unchanged relative to the image content of the second image;
and in response to the text similarity being not greater than the text similarity threshold, obtaining a third result that the image content of the first image has changed relative to the image content of the second image.
10. The method of detecting image content according to claim 1, wherein the determining whether the image content of the first image changes relative to the image content of the second image based on the first result, the second result, and the third result comprises:
determining that the image content of the first image is unchanged relative to the image content of the second image in response to the first result, the second result, and the third result each indicating that the image content of the first image is unchanged relative to the image content of the second image;
or,
in response to at least one of the first result, the second result, and the third result indicating that the image content of the first image has changed relative to the image content of the second image, determining that the image content of the first image has changed relative to the image content of the second image.
11. An apparatus for detecting image content, wherein the apparatus comprises:
the image acquisition module is used for acquiring a first image to be detected and a second image serving as a reference;
a first determining module configured to determine a first result based on image features of the first image and image features of the second image;
A second determining module configured to determine a second result based on the variance feature of the first image and the variance feature of the second image;
a third determining module, configured to determine a third result based on a text similarity between text content of the first image and text content of the second image;
and a fourth determining module, configured to determine whether an image content of the first image changes relative to an image content of the second image based on the first result, the second result, and the third result.
Priority application: CN202311686653.5A, filed 2023-12-06, "Method and device for detecting image content" (status: pending)
Publication: CN117636370A, published 2024-03-01
Family ID: 90018141

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination