WO2019184464A1 - Detecting near-duplicate images - Google Patents

Detecting near-duplicate images

Info

Publication number
WO2019184464A1
WO2019184464A1 (PCT/CN2018/122069)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
network
target image
approximate
Prior art date
Application number
PCT/CN2018/122069
Other languages
English (en)
French (fr)
Inventor
康丽萍
魏晓明
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Priority to US17/043,656 (published as US20210019872A1)
Priority to EP18911696.5A (published as EP3772036A4)
Publication of WO2019184464A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • The present application relates to the field of computer technology, and in particular to detecting near-duplicate images.
  • A near-duplicate image is an image that differs from another only in color, saturation, cropping, shooting angle, watermark, or the like; many types of near-duplicate images exist in UGC (User Generated Content) data. Their presence negatively affects the training of search and recommendation models and the display of search and recommendation results, and thereby the user experience.
  • In view of this, the present application provides a near-duplicate image detection method to improve on the accuracy of near-duplicate image detection in the prior art.
  • In a first aspect, an embodiment of the present application provides a near-duplicate image detection method, including: determining, through a multi-task network model, a first feature and a second feature of an input target image, where the first feature is an image feature containing inter-class information and intra-class information, and the second feature is an image feature reflecting intra-class differences; constructing a fusion feature of the target image according to the first feature and the second feature of the target image; and determining, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
  • In a second aspect, an embodiment of the present application provides an information retrieval apparatus, including: a feature extraction module, configured to determine, through a multi-task network model, a first feature and a second feature of an input target image, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences; a feature fusion module, configured to construct a fusion feature of the target image according to the first feature and the second feature extracted by the feature extraction module; and a near-duplicate image detection module, configured to determine, according to the fusion feature determined by the feature fusion module, whether the target image is a near-duplicate of a candidate image.
  • In a third aspect, an embodiment of the present application further discloses an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the near-duplicate image detection method described in the embodiments of the present application.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the near-duplicate image detection method disclosed in the embodiments of the present application are performed.
  • The near-duplicate image detection method disclosed in the embodiments of the present application determines a first feature and a second feature of an input target image through a multi-task network model, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences; constructs a fusion feature of the target image from the first feature and the second feature; and determines, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
  • By combining the intra-class and inter-class information of an image into a fusion feature, the method disclosed in the present application makes the image representation more comprehensive and thereby improves the accuracy of near-duplicate image detection.
  • FIG. 1 is a flowchart of a near-duplicate image detection method according to Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of a near-duplicate image detection method according to Embodiment 2 of the present application;
  • FIG. 3 is a schematic structural diagram of a classification model according to Embodiment 2 of the present application;
  • FIG. 4 is a schematic structural diagram of a multi-task network according to Embodiment 2 of the present application;
  • FIG. 5 is a schematic diagram of near-duplicate images;
  • FIG. 6 is a schematic diagram of a non-near-duplicate image pair;
  • FIG. 7 is a first schematic structural diagram of a near-duplicate image detection apparatus according to Embodiment 3 of the present application;
  • FIG. 8 is a second schematic structural diagram of a near-duplicate image detection apparatus according to Embodiment 3 of the present application;
  • FIG. 9 is a third schematic structural diagram of a near-duplicate image detection apparatus according to Embodiment 3 of the present application.
  • In some applications, image features are first obtained by a classification method and the images are then compared. Features obtained this way distinguish images of different categories well. In a near-duplicate detection scenario, however, the images in the candidate set mostly belong to the same category, and detecting images with such feature expressions suffers from low detection accuracy.
  • To this end, the near-duplicate image detection method provided by the present application can improve the detection accuracy for near-duplicate images. It is described below in conjunction with specific embodiments.
  • Embodiment 1 discloses a near-duplicate image detection method. As shown in FIG. 1, the method includes steps 110 to 130.
  • Step 110: Determine a first feature and a second feature of the input target image through a multi-task network model.
  • The first feature contains image features reflecting inter-class differences and intra-class differences; the second feature is an image feature reflecting intra-class differences.
  • In a specific implementation of the present application, the multi-task network model is pre-trained. The model includes a plurality of sub-networks, such as a classification network and a similarity metric network, which share a base network. The base network is used to extract the first feature of an input image, and the similarity metric network is used to extract the second feature.
  • The optimization target of the classification network is to maximize inter-class variance; the optimization target of the similarity metric network is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images. Increasing inter-class variance during optimization makes the features of different categories more distinguishable; reducing intra-class variance makes the features of the same category as close as possible.
  • The classification network and the similarity metric network are convolutional neural networks, each including a plurality of convolutional layers and a feature extraction layer. In a specific implementation, the classification network may be a deep convolutional neural network such as MobileNet, and the output of a certain feature extraction layer of the base network may be taken as the first feature of the input image. The feature extraction layer is essentially also a convolutional layer; in this embodiment, the last convolutional layer in the network structure of the base network serves as the feature expression of the image and may therefore be called the feature extraction layer. The similarity metric network may be a symmetric convolutional neural network, and the output of one of its convolutional layers (such as the last convolutional layer of the similarity metric network) is taken as the second feature of the input image.
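  • For illustration, the following is a minimal PyTorch sketch of such a multi-task structure. The tiny base network stands in for MobileNet, and the 1024- and 256-dimensional feature sizes follow the embodiment described below; everything else (layer shapes, the linear heads, the default of 22 categories mentioned later) is an assumption of the sketch, not part of the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNearDuplicateNet(nn.Module):
    """Sketch of the multi-task model: a shared base network whose
    pooled output is the first feature (1024-d, inter- and intra-class
    information), a classification branch, and a similarity-metric
    branch that maps the first feature to the finer-grained second
    feature (256-d, intra-class detail)."""

    def __init__(self, num_classes: int = 22):
        super().__init__()
        # Stand-in base network; in the document this is MobileNet
        # up to its pool6 layer (1024 output channels).
        self.base = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1024, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),      # pool6-like global pooling
        )
        # Classification branch: produces per-class confidences,
        # trained with softmax loss.
        self.classifier = nn.Linear(1024, num_classes)
        # Similarity-metric branch: further processing of the first
        # feature down to the 256-d second feature.
        self.metric_head = nn.Linear(1024, 256)

    def forward(self, x: torch.Tensor):
        first = self.base(x).flatten(1)              # first feature (N, 1024)
        logits = self.classifier(first)              # classification output
        second = F.normalize(self.metric_head(first), dim=1)  # (N, 256)
        return first, second, logits
```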
  • Each input image is referred to as a target image; for each target image input to the multi-task network model, a first feature and a second feature can be obtained.
  • Step 120: Construct a fusion feature of the target image according to the first feature and the second feature of the target image.
  • After the first and second features of a target image are determined, its fusion feature is constructed from them. In one implementation, the first feature and the second feature of the image are directly spliced to obtain the fusion feature. In another implementation, the first feature and the second feature may each first be encoded, for example by hash coding, and the fusion feature of the image is then determined from the converted codes.
  • Because the first feature of each target image is a general feature of the input image extracted by the base network, that is, a feature containing image features reflecting both inter-class and intra-class differences, the first feature may further be convolved by the classification network to obtain a third feature that reflects inter-class differences alone; the fusion feature of the image is then constructed from the third feature and the second feature. In one implementation, the third feature and the second feature are directly spliced to obtain the fusion feature. In another implementation, the third feature and the second feature may each be encoded, for example by hash coding, and the fusion feature is determined from the converted codes.
  • Step 130: Determine, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
  • The fusion feature of the target image is obtained through the foregoing steps. In addition, a candidate image set is stored in advance, and the fusion feature of each image in the set is obtained in the same way. The fusion feature of the target image can then be compared with the fusion feature of any candidate image using an existing similarity measure to determine whether the target image and the candidate image are near-duplicates of each other.
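  • As an illustration of this comparison step, a minimal sketch follows; the text leaves the concrete similarity measure and decision threshold to existing techniques, so the Euclidean distance and the threshold value here are assumptions.

```python
import numpy as np

def is_near_duplicate(fusion_a: np.ndarray, fusion_b: np.ndarray,
                      threshold: float = 1.0) -> bool:
    """Compare two fusion features; images whose features are closer
    than the (illustrative) threshold are treated as near-duplicates.
    A cosine distance would work the same way."""
    return float(np.linalg.norm(fusion_a - fusion_b)) < threshold

# Querying a stored candidate set for near-duplicates of a target:
# candidates = {image_id: fusion_feature, ...}
# hits = [i for i, f in candidates.items() if is_near_duplicate(target, f)]
```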
  • The near-duplicate image detection method disclosed in this embodiment determines a first feature and a second feature of an input target image through a multi-task network model, where the first feature contains image features reflecting inter-class and intra-class differences and the second feature reflects intra-class differences; constructs the fusion feature of the target image from them; and determines, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
  • By combining the intra-class and inter-class information of an image into a fusion feature, the disclosed method makes the image representation more comprehensive and thereby improves the accuracy of near-duplicate image detection.
  • Embodiment 2 discloses a near-duplicate image detection method. As shown in FIG. 2, the method includes steps 210 to 260.
  • Step 210: Train a classification model based on a plurality of image samples including a plurality of near-duplicate images.
  • The near-duplicate images described in this embodiment may consist of different images of the same object, for example multiple images of the same object captured under different lighting conditions, or of an original image of an object together with images obtained from the original by cropping, rotation, and brightness adjustment. The non-near-duplicate images described in the embodiments of the present application are at least two images of different subjects.
  • The plurality of training samples used to train the classification model includes a plurality of near-duplicate images. Because it is difficult in practice to collect a large number of images that nearly duplicate an original, the training samples can be synthesized from existing images. The choice of image categories mainly considers the actual application scenario: image categories match business scenarios, such as hotels, dishes, and beauty, and the distribution of image types should cover as many business scenarios as possible to improve the accuracy of the model trained on the images.
  • In a practical application scenario, a near-duplicate of an original image may be a transformed version of that image; common transformations include geometric affine transformation, blurring, noise pollution, image content enhancement, and compression. Accordingly, to approximate near-duplicates in real scenarios, the image processing types involved in composing the image samples may include brightness change, contrast change, cropping, rotation, watermarking, and the like.
  • Multiple images of each category can be processed by automatic synthesis to obtain the image samples for training the classification model, where the automatic synthesis includes transformations such as adjusting brightness, adjusting contrast, cropping, rotating, and watermarking.
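  • A minimal sketch of such automatic synthesis, using the Pillow imaging library, is shown below; the concrete enhancement factors, crop box, rotation angle, and watermark text are illustrative assumptions.

```python
from PIL import Image, ImageDraw, ImageEnhance

def synthesize_near_duplicates(path: str) -> list[Image.Image]:
    """Derive near-duplicate training samples from one original image
    by adjusting brightness and contrast, cropping, rotating, and
    stamping a watermark (all parameters are illustrative)."""
    original = Image.open(path).convert("RGB")
    w, h = original.size
    variants = [
        ImageEnhance.Brightness(original).enhance(1.4),   # brighter copy
        ImageEnhance.Contrast(original).enhance(0.7),     # lower contrast
        original.crop((w // 10, h // 10, w * 9 // 10, h * 9 // 10)),  # crop
        original.rotate(15, expand=True),                 # rotated copy
    ]
    watermarked = original.copy()
    ImageDraw.Draw(watermarked).text((10, 10), "watermark", fill="white")
    variants.append(watermarked)
    return variants
```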
  • The classification model is then trained on the foregoing image samples, which include a plurality of near-duplicate images. For example, the classification model can be a convolutional neural network based on the MobileNet architecture. MobileNet is a streamlined architecture that uses depthwise separable convolutions to build a lightweight deep neural network with a good trade-off between speed and accuracy.
  • As shown in FIG. 3, the classification model includes a plurality of convolutional layers 310 and one feature extraction layer 320. The pool6 layer of the MobileNet network, which has 1024 nodes, can be selected as the feature extraction layer, and the output of this layer is taken as the expression of the feature vector of the input image. The classification model further includes a last convolutional layer 330, which produces the confidence that a sample belongs to each category, and a loss function softmaxLoss 340, which measures how well the model learns.
  • In a specific implementation, the optimization target of the classification model is to maximize inter-class variance, and the features obtained by the trained classification model are mainly used to distinguish image features of different categories. After training, the optimal weight parameters of the classification model are obtained.
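  • A minimal sketch of this pre-training stage follows; the optimizer, learning rate, and epoch count are illustrative assumptions, and `model` is assumed to map an image batch to class logits.

```python
import torch
import torch.nn as nn

def pretrain_classifier(model: nn.Module, loader, epochs: int = 10) -> None:
    """Plain cross-entropy (softmax loss) training on {image, category}
    samples, which pushes inter-class variance up."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()   # the softmaxLoss of FIG. 3
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            loss = criterion(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```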
  • Step 220: Initialize the multi-task network model based on the parameters of the trained classification model.
  • As shown in FIG. 4, the multi-task network model in this embodiment includes a classification network 410 and a similarity metric network 420. Both are convolutional neural networks, each including a plurality of convolutional layers and a feature extraction layer, and both include a base network consisting of the feature extraction layer 430 and the convolutional layers before it. In one implementation, the classification network 410 and the similarity metric network 420 can share the base network; as shown in FIG. 4, they share a portion of the convolutional layers and the feature extraction layer. In another implementation, the classification network 410 and the similarity metric network 420 may also be separate network structures whose base-network parameters are shared.
  • The multi-task network model further includes: a convolutional layer 440, the feature extraction layer learned for the similarity metric network; a normalization layer, which normalizes the features obtained from an image so that they are comparable; and the loss function contrastiveLoss, used to optimize the network.
  • Whether or not the classification network 410 and the similarity metric network 420 share part of the convolutional layers and the feature extraction layer, to improve training efficiency the base-network parameters of both networks can be initialized with the network parameters of the classification model trained in the previous step. The network parameters are then fine-tuned and optimized by training the multi-task network model.
  • Step 230: Train the multi-task network model based on a plurality of image pair samples including a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs.
  • After the multi-task network is initialized, a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs are constructed from test images of real business scenarios for training the multi-task network model. The image pair samples include image pairs matching a specified image processing type, which is determined as follows: the image features of each image in a test image pair are determined through the trained classification model; near-duplicate discrimination is performed according to the distance between the image features of the two images in the test pair; and the specified image processing type is determined according to the accuracy of near-duplicate discrimination on image pairs matching different image processing types. Here the distance is a distance metric between feature vectors and may be a Euclidean distance or a cosine distance; this disclosure does not limit it.
  • An original image can be subjected to various kinds of preset image processing to obtain at least one processed image, and a near-duplicate image pair consists of any two images among the at least one processed image and the original image. A near-duplicate pair may thus be a processed image and an original image, or two processed images. The preset image processing includes, but is not limited to, any of the following: cropping, rotation, watermarking, brightness change, contrast change, and the like. For example, if the original image 510 in FIG. 5 is cropped to obtain image 520 and rotated to obtain image 530, then the original image 510 and image 520 may constitute a near-duplicate pair, the original image 510 and image 530 may constitute a near-duplicate pair, and image 520 and image 530 may also constitute a near-duplicate pair. A non-near-duplicate pair consists of two different images; as shown in FIG. 6, image 610 and image 620 form a non-near-duplicate pair.
  • The near-duplicate and non-near-duplicate image pairs constructed above are then used as test image pairs and input to the classification model trained in step 210 to determine the image features of each image in each test pair. For each image, the 1024-dimensional feature output by the model's pool6 layer is taken as the image feature, and the distance between the image features of the two images in a pair is computed to determine whether the two images are similar. Each test pair carries a label indicating whether it is a near-duplicate pair. If the two images of a near-duplicate pair are recognized as dissimilar, the near-duplicate discrimination for that pair is considered to have failed; likewise, if the two images of a non-near-duplicate pair are judged similar according to the feature distance, the discrimination for that pair fails.
  • Finally, the image processing types of the near-duplicate pairs for which discrimination failed are counted, that is, it is determined through which kind of image processing the failed near-duplicate pairs were obtained. If the discrimination accuracy for near-duplicate pairs obtained by a certain kind of image processing is lower than a set accuracy threshold, it is determined that the classification model has difficulty classifying and recognizing near-duplicate pairs obtained by that processing, and that kind of image processing is marked as a hard type.
  • A near-duplicate pair of a hard type is a pair that is difficult to distinguish using only features obtained by classification, such as near-duplicate pairs obtained by image processing like cropping, blurring, logo overlay, or rotation. Based on the above method, at least one hard type can be obtained, and multiple near-duplicate pairs are then constructed based on the hard types: for example, test image pairs marked as hard types are used as near-duplicate pairs, or a large number of hard-type test pairs are selected to construct near-duplicate pairs. Meanwhile, according to the labels of the test pairs, non-near-duplicate pairs are constructed from different images of the same category.
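  • A sketch of this hard-type mining step is given below; the accuracy threshold value and the pre-aggregated form of the per-pair results are assumptions of the sketch.

```python
from collections import defaultdict

def find_hard_types(test_pairs, accuracy_threshold: float = 0.8):
    """Flag any image-processing type whose discrimination accuracy on
    labeled test pairs falls below a set threshold (the 0.8 value is
    illustrative). Each element of `test_pairs` is assumed to be a
    tuple (processing_type, correctly_discriminated: bool) produced by
    the feature-distance judgment described above."""
    totals, correct = defaultdict(int), defaultdict(int)
    for processing_type, ok in test_pairs:
        totals[processing_type] += 1
        correct[processing_type] += int(ok)
    return [t for t in totals
            if correct[t] / totals[t] < accuracy_threshold]
```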
  • The multi-task network model is then trained on the plurality of near-duplicate image pairs constructed for the hard types and the plurality of non-near-duplicate image pairs.
  • As shown in FIG. 4, the multi-task network model includes the classification network and the similarity metric network, which share a base network. The model is trained by feeding in multiple samples and continually adjusting the network parameters so that the outputs of the classification network and the similarity metric network simultaneously approach, as closely as possible, the optimization targets of the two networks. The optimization target of the classification network is to increase the inter-class variance between near-duplicate images; the optimization target of the similarity metric network is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images.
  • Each image in the near-duplicate and non-near-duplicate pairs input to the multi-task network model carries a category label. When training the classification network, two images or a single image may be input at a time; the input data form of the classification network may be {image, category}, that is, an image with a category label. The loss function of the classification network is softmaxLoss(), and its optimization target is to maximize inter-class variance.
  • When training the similarity metric network, the input data form may be {(image 1, category), (image 2, category), whether the two form a near-duplicate pair}. The similarity metric network is a symmetric convolutional neural network whose loss function is contrastiveLoss(); its optimization target is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images. Note that the above input data forms are used only during training; different input formats can be used in subsequent queries.
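  • For illustration, a minimal sketch of the contrastive loss described here follows; the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feat1: torch.Tensor, feat2: torch.Tensor,
                     label: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Sketch of contrastiveLoss(): pulls the second features of
    near-duplicate pairs together (shrinking their intra-class
    variance) and pushes non-near-duplicate pairs at least `margin`
    apart (growing theirs). `label` is 1 for a near-duplicate pair
    and 0 otherwise; the margin value is illustrative."""
    d = F.pairwise_distance(feat1, feat2)
    loss = label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)
    return loss.mean()
```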
  • For the specific training procedure of the multi-task network model, reference can be made to the prior art; it is not repeated here. The multi-task network model in the present application differs from prior-art network models in the network structure, the input training data, and the optimization targets, as well as in initializing the corresponding networks of the multi-task model with the network parameters of the preliminarily trained classification model. For the optimization procedure of the model, see the prior-art model training process; by training the classification network and the similarity metric network, the optimal parameters of each layer of both networks in the multi-task network are obtained.
  • Step 240: Determine a first feature and a second feature of the input target image through the multi-task network model.
  • The first feature contains image features reflecting inter-class differences and intra-class differences; the second feature is an image feature reflecting intra-class differences. The first feature of the target image is determined through the base network of the multi-task network model; the second feature is determined by the similarity metric network performing a convolution operation on the first feature.
  • After training is finished, the trained multi-task network model can extract image features for images input to it. According to the network structure of the multi-task network model, the output of the last feature extraction layer of the base network shared by the classification network and the similarity metric network is selected as the first feature of the input image, that is, the first feature contains image features reflecting both inter-class and intra-class differences, namely the image features output by a feature extraction layer 430 in FIG. 4 (the last convolutional layer of the base network). Although two feature extraction layers 430 are drawn in FIG. 4, they share parameters, so their outputs are identical; when only one image is input, the output of either branch can be used. The output of the last convolutional layer of the similarity metric network, such as the output of layer 440 in FIG. 4, is then selected as the second feature of the input image; the second feature is the image feature obtained by performing a convolution operation on the first feature.
  • In this embodiment, taking MobileNet as the classification network, the 1024-dimensional feature extracted by the MobileNet pool6 layer can be used as the first feature of the input image, and the 256-dimensional feature obtained by the similarity metric network's further convolution of that 1024-dimensional pool6 feature is used as the second feature. The 1024-dimensional feature extracted by the pool6 layer is a general feature of the input image, reflecting both inter-class and intra-class differences, whereas the 256-dimensional feature obtained by convolution through the similarity metric network is a finer-grained image feature reflecting intra-class differences.
  • Step 250: Construct a fusion feature of the target image according to the first feature and the second feature of the target image.
  • In one implementation, the first feature and the second feature of the target image can be directly spliced, and the spliced feature is taken as the fusion feature of the target image. For example, the 1024-dimensional first feature and the 256-dimensional second feature are concatenated in order into a 1280-dimensional feature vector as the fusion feature of the target image.
  • In another implementation, constructing the fusion feature of the target image from its first and second features further includes: performing a convolution operation on the first feature of the target image through the classification network to obtain a third feature of the target image, where the third feature is an image feature reflecting inter-class differences; and constructing the fusion feature of the target image from the third feature and the second feature. For example, the convolutional layer of the classification network convolves the first feature to obtain a third feature reflecting inter-class differences alone, and the third feature and the second feature are then concatenated in order into a multi-dimensional feature vector as the fusion feature of the image. By extracting from the first feature a third feature that isolates inter-class differences and constructing the fusion feature from the third and second features, the data volume of the image feature can be reduced and the efficiency of image comparison improved.
  • Optionally, constructing the fusion feature of the target image from its third and second features further includes: determining a hash code corresponding to the third feature of the target image and a hash code corresponding to the second feature; and splicing the two hash codes to obtain the fusion feature of the target image. To further reduce the data volume of image features, the features can be hash coded and the hash codes used to represent the fusion feature. For example, suppose a 512-dimensional third feature is [0.7, 0.6, 0.2, ..., 0.3, 0.8]. Comparing each dimension with the threshold 0.5, a value greater than 0.5 yields hash bit 1 for that dimension and a value less than or equal to 0.5 yields hash bit 0, so the 512-dimensional third feature is hash coded as [110...01]. In the same way, the 256-dimensional second feature is hash coded as [10...000]. After splicing the hash codes, the resulting image fusion feature is [110...0110...000].
  • If the fusion feature is constructed directly from the first and second features, the first and second features can each be hash coded and the hash codes spliced. Optionally, constructing the fusion feature of the target image from its first and second features includes: determining the hash code corresponding to the first feature of the target image and the hash code corresponding to the second feature; and splicing them to obtain the fusion feature of the target image. For example, suppose the 1024-dimensional first feature is [0.6, 0.6, 0.3, ..., 0.7, 0.2]. Comparing with the threshold 0.5 as above, the 1024-dimensional first feature is hash coded as [110...10]; the 256-dimensional second feature is hash coded as [10...000]; and after splicing the hash codes, the resulting image fusion feature is [110...1010...000].
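  • The worked examples above translate directly into a short routine; a minimal NumPy sketch follows, with the 0.5 threshold taken from the text.

```python
import numpy as np

def hash_fuse(first_or_third: np.ndarray, second: np.ndarray,
              threshold: float = 0.5) -> np.ndarray:
    """Hash-coded fusion as described above: each feature dimension
    greater than 0.5 becomes bit 1, dimensions at or below 0.5 become
    bit 0, and the two bit strings are concatenated."""
    bits_a = (first_or_third > threshold).astype(np.uint8)
    bits_b = (second > threshold).astype(np.uint8)
    return np.concatenate([bits_a, bits_b])

# e.g. a 1024-d first feature and a 256-d second feature fuse into a
# 1280-bit code, matching the code lengths chosen in the document.
```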
  • For the multi-task network, the code length of the second feature needs to lie in a reasonable range: if it is too short, the intra-class constraint has little effect, similar to under-fitting; if it is too long, the granularity is too fine, similar to over-fitting. This application selects a first-feature code length of 1024 and a second-feature code length of 256, which balances well the influence of intra-class differences on the detection result. With the training network unchanged, if only the first-feature code were used, the detection result would be strongly affected by the second-feature code length. By splicing the first feature or the third feature with the second feature, the present application makes the fusion feature of the resulting image more stable.
  • Step 260: Determine, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
  • The fusion feature of the target image is obtained through the foregoing steps. In addition, a candidate image set is stored in advance, and the fusion feature of each image in the set is obtained in the same way. The fusion feature of the target image and that of any candidate image in the set can then be compared with an existing similarity measure to determine whether the target image and the candidate image are near-duplicates of each other.
  • In the near-duplicate image detection method disclosed in this embodiment, a classification model is trained on image samples that include near-duplicate images; the multi-task network model is then initialized from the parameters of the trained classification model and trained on image pair samples that include near-duplicate and non-near-duplicate pairs. The first and second features of an input target image are determined through the multi-task network model, the fusion feature of the target image is constructed from them, and finally the fusion feature is used to determine whether the target image is a near-duplicate of a candidate image.
  • By pre-training a classification model based on a convolutional neural network, the disclosed method enlarges the inter-class feature variance of images, and by fine-tuning the classification and similarity-metric multi-task network with the learned parameters of the classification model, it further enlarges the intra-class variance of non-near-duplicates. In detection, the optimized image features reflecting inter-class and intra-class differences are spliced with the finer-grained image features reflecting intra-class differences to form the fusion feature of the image, which increases inter-class feature variance while further increasing the intra-class variance of non-near-duplicates, helping to improve the accuracy of near-duplicate image detection.
  • In the multi-task network model disclosed in the present application, splicing the features output by the classification network and the similarity metric network improves the expressive power of the features. Meanwhile, compared with using only image features that mix inter-class and intra-class differences, the spliced image features are less affected by the features reflecting intra-class differences, and their stability is higher.
  • Embodiment 3 discloses a near-duplicate image detection apparatus. As shown in FIG. 7, the apparatus includes:
  • a feature extraction module 710, configured to determine, through a multi-task network model, a first feature and a second feature of an input target image, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences;
  • a feature fusion module 720, configured to construct a fusion feature of the target image according to the first feature and the second feature extracted by the feature extraction module 710; and
  • a near-duplicate image detection module 730, configured to determine, according to the fusion feature determined by the feature fusion module 720, whether the target image is a near-duplicate of a candidate image.
  • Optionally, the multi-task network model includes a classification network and a similarity metric network that share a base network. Optionally, the multi-task network model is trained by solving for network parameters that simultaneously satisfy the optimization targets of the classification network and of the similarity metric network, where the optimization target of the classification network is to increase the inter-class variance between near-duplicate images, and the optimization target of the similarity metric network is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images.
  • Optionally, as shown in FIG. 8, the feature extraction module 710 further includes:
  • a first feature extraction unit 7101, configured to determine the first feature of the target image through the base network; and
  • a second feature extraction unit 7102, configured to determine the second feature of the target image in an input image pair by performing a convolution operation on the first feature through the similarity metric network, where the optimization target of the similarity metric network is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images.
  • The near-duplicate image detection apparatus disclosed in this embodiment determines the first and second features of an input target image through a multi-task network model, where the first feature contains image features reflecting inter-class and intra-class differences and the second feature reflects intra-class differences; constructs the fusion feature of the target image from them; and determines, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
  • By combining the intra-class and inter-class information of an image into a fusion feature, the disclosed apparatus obtains a more comprehensive image representation and thereby improves the accuracy of near-duplicate image detection.
  • Optionally, as shown in FIG. 8, the apparatus further includes:
  • a classification model training module 740, configured to train the classification model based on a plurality of image samples including a plurality of near-duplicate images;
  • a multi-task network model initialization module 750, configured to initialize the multi-task network model based on the parameters of the trained classification model; and
  • a multi-task network model training module 760, configured to train the multi-task network model based on image pair samples including a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs, where a near-duplicate image pair consists of any two images among an original image and at least one image obtained from the original image by preset image processing, and a non-near-duplicate image pair consists of two different images.
  • Optionally, the image pair samples include image pairs matching a specified image processing type, which is determined as follows: the image features of each image in a test image pair are determined through the trained classification model; near-duplicate discrimination is performed according to the distance between the image features of the two images in the test pair; and the specified image processing type is determined according to the accuracy of near-duplicate discrimination on image pairs matching different image processing types.
  • Optionally, as shown in FIG. 8, the feature fusion module 720 further includes:
  • a third feature extraction unit 7201, configured to determine a third feature of the target image by performing a convolution operation on the first feature of the target image, where the third feature is an image feature reflecting inter-class differences; and
  • a first feature fusion unit 7202, configured to construct the fusion feature of the target image according to the third feature and the second feature of the target image.
  • Optionally, the first feature fusion unit 7202 further includes:
  • a first coding unit 72021, configured to determine the hash code corresponding to the third feature of the target image and the hash code corresponding to the second feature; and
  • a first feature splicing unit 72022, configured to splice the hash code corresponding to the third feature and the hash code corresponding to the second feature to obtain the fusion feature of the target image.
  • In another embodiment, as shown in FIG. 9, the feature fusion module 720 further includes:
  • a second coding unit 7204, configured to determine the hash code corresponding to the first feature of the target image and the hash code corresponding to the second feature; and
  • a second feature fusion unit 7205, configured to splice the hash code corresponding to the first feature and the hash code corresponding to the second feature to obtain the fusion feature of the target image.
  • The near-duplicate image detection apparatus disclosed in the present application pre-trains a classification model based on a convolutional neural network to enlarge the inter-class feature variance of images, and fine-tunes the classification and similarity-metric multi-task network with the learned parameters of the classification model to further enlarge the intra-class variance of non-near-duplicates. In detection, the optimized image features reflecting inter-class and intra-class differences are spliced with the finer-grained image features reflecting intra-class differences as the fusion feature of the image, which increases inter-class feature variance while further increasing the intra-class variance of non-near-duplicates, helping to improve the accuracy of near-duplicate image detection.
  • In the multi-task network model disclosed in the present application, splicing the features output by the classification network and the similarity metric network improves the expressive power of the features. Meanwhile, compared with using only image features that mix inter-class and intra-class differences, the spliced image features are less affected by the features reflecting intra-class differences, and their stability is higher.
  • Correspondingly, the present application also discloses an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the near-duplicate image detection method described in Embodiments 1 and 2 of the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like.
  • The present application also discloses a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the near-duplicate image detection method described in Embodiments 1 and 2 of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A near-duplicate image detection method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: determining, through a multi-task network model, a first feature and a second feature of an input target image, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences; constructing a fusion feature of the target image according to the first feature and the second feature of the target image; and determining, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.

Description

Detecting near-duplicate images
CROSS-REFERENCE TO RELATED APPLICATION
This patent application claims priority to Chinese Patent Application No. 201810277058.9, filed on March 30, 2018 and entitled "Near-duplicate image detection method and apparatus, and electronic device", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of computer technology, and in particular to detecting near-duplicate images.
BACKGROUND
A near-duplicate image is an image that differs from another only in color, saturation, cropping, shooting angle, watermark, or the like; many types of near-duplicate images exist in UGC (User Generated Content) data. Their presence negatively affects the training of search and recommendation models and the display of search and recommendation results, and thereby the user experience.
SUMMARY
In view of this, the present application provides a near-duplicate image detection method to improve on the accuracy of near-duplicate image detection in the prior art.
In a first aspect, an embodiment of the present application provides a near-duplicate image detection method, including: determining, through a multi-task network model, a first feature and a second feature of an input target image, where the first feature is an image feature containing inter-class information and intra-class information, and the second feature is an image feature reflecting intra-class differences; constructing a fusion feature of the target image according to the first feature and the second feature of the target image; and determining, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
In a second aspect, an embodiment of the present application provides an information retrieval apparatus, including: a feature extraction module, configured to determine, through a multi-task network model, a first feature and a second feature of an input target image, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences; a feature fusion module, configured to construct a fusion feature of the target image according to the first feature and the second feature extracted by the feature extraction module; and a near-duplicate image detection module, configured to determine, according to the fusion feature determined by the feature fusion module, whether the target image is a near-duplicate of a candidate image.
In a third aspect, an embodiment of the present application further discloses an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the near-duplicate image detection method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the near-duplicate image detection method disclosed in the embodiments of the present application are performed.
The near-duplicate image detection method disclosed in the embodiments of the present application determines a first feature and a second feature of an input target image through a multi-task network model, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences; constructs a fusion feature of the target image from the first feature and the second feature; and determines, according to the fusion feature, whether the target image is a near-duplicate of a candidate image. By combining the intra-class and inter-class information of an image into a fusion feature, the disclosed method makes the image representation more comprehensive and thereby improves the accuracy of near-duplicate image detection.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a near-duplicate image detection method according to Embodiment 1 of the present application;
FIG. 2 is a flowchart of a near-duplicate image detection method according to Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of a classification model according to Embodiment 2 of the present application;
FIG. 4 is a schematic structural diagram of a multi-task network according to Embodiment 2 of the present application;
FIG. 5 is a schematic diagram of near-duplicate images;
FIG. 6 is a schematic diagram of a non-near-duplicate image pair;
FIG. 7 is a first schematic structural diagram of a near-duplicate image detection apparatus according to Embodiment 3 of the present application;
FIG. 8 is a second schematic structural diagram of a near-duplicate image detection apparatus according to Embodiment 3 of the present application;
FIG. 9 is a third schematic structural diagram of a near-duplicate image detection apparatus according to Embodiment 3 of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort fall within the scope of protection of the present application.
In some applications, image features are first obtained by a classification method and the images are then compared. Features obtained this way distinguish images of different categories well. In a near-duplicate detection scenario, however, the images in the candidate set mostly belong to the same category, and detecting images with such feature expressions suffers from low detection accuracy.
To this end, the near-duplicate image detection method provided by the present application can improve the detection accuracy for near-duplicate images. It is described below in conjunction with specific embodiments.
Embodiment 1
This embodiment discloses a near-duplicate image detection method. As shown in FIG. 1, the method includes steps 110 to 130.
Step 110: Determine a first feature and a second feature of the input target image through a multi-task network model.
The first feature contains image features reflecting inter-class differences and intra-class differences; the second feature is an image feature reflecting intra-class differences.
In a specific implementation of the present application, the multi-task network model is pre-trained. The multi-task network model includes a plurality of sub-networks, such as a classification network and a similarity metric network, which share a base network. The base network is used to extract the first feature of an input image, and the similarity metric network is used to extract the second feature. The optimization target of the classification network is to maximize inter-class variance; the optimization target of the similarity metric network is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images. Increasing inter-class variance during optimization makes the features of different categories more distinguishable; reducing intra-class variance makes the features of the same category as close as possible.
The classification network and the similarity metric network are convolutional neural networks, each including a plurality of convolutional layers and a feature extraction layer. In a specific implementation, the classification network may be a deep convolutional neural network such as MobileNet, and the output of a certain feature extraction layer of the base network may be taken as the first feature of the input image. The feature extraction layer is essentially also a convolutional layer; in this embodiment, the last convolutional layer in the network structure of the base network serves as the feature expression of the image and may be called the feature extraction layer. The similarity metric network may be a symmetric convolutional neural network, and the output of one of its convolutional layers (such as the last convolutional layer of the similarity metric network) is taken as the second feature of the input image. Each input image is referred to as a target image; for each target image input to the multi-task network model, a first feature and a second feature can be obtained.
Step 120: Construct a fusion feature of the target image according to the first feature and the second feature of the target image.
After the first feature and the second feature of each target image in the image pair to be detected are determined, the fusion feature of each target image is constructed from its first and second features. In one implementation, the first feature and the second feature of the image are directly spliced to obtain its fusion feature. In another implementation, the first feature and the second feature may each first be encoded, for example by hash coding, and the fusion feature of the image is determined from the converted codes.
Because the first feature of each target image is a general feature of the input image extracted by the base network, that is, a feature containing image features that reflect both inter-class and intra-class differences, the first feature may further be convolved by the classification network to obtain a third feature reflecting inter-class differences alone, and the fusion feature of the image is then constructed from the third feature and the second feature. In one implementation, the third feature and the second feature are directly spliced to obtain the fusion feature. In another implementation, the third feature and the second feature may each be encoded, for example by hash coding, and the fusion feature determined from the converted codes.
Step 130: Determine, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
For a target image, its fusion feature is obtained through the foregoing steps. In addition, a candidate image set is stored in advance, and the fusion feature of each image in the set is obtained through the same steps. The fusion feature of the target image and that of any candidate image can then be compared with an existing similarity measure to determine whether the target image and the candidate image are near-duplicates of each other.
The near-duplicate image detection method disclosed in this embodiment determines a first feature and a second feature of an input target image through a multi-task network model, where the first feature contains image features reflecting inter-class and intra-class differences and the second feature reflects intra-class differences; constructs the fusion feature of the target image from them; and determines, according to the fusion feature, whether the target image is a near-duplicate of a candidate image. By combining the intra-class and inter-class information of an image into a fusion feature, the disclosed method makes the image representation more comprehensive and thereby improves the accuracy of near-duplicate image detection.
Embodiment 2
This embodiment discloses a near-duplicate image detection method. As shown in FIG. 2, the method includes steps 210 to 260.
Step 210: Train a classification model based on a plurality of image samples including a plurality of near-duplicate images.
The near-duplicate images described in this embodiment may consist of different images of the same object, for example multiple images of the same object captured under different lighting conditions, or of an original image of an object together with images obtained from the original by cropping, rotation, and brightness adjustment. The non-near-duplicate images described in the embodiments of the present application are at least two images of different subjects.
First, multiple similar images of different categories already available on the platform are collected as image samples for training the classification model. The training samples include a plurality of near-duplicate images. Because in practice it is difficult to collect a large number of images that nearly duplicate an original as training data, training samples can be synthesized from existing images. Taking the Meituan-Dianping platform as an example, multiple images covering all business types of all the platform's business scenarios (for example, 22 business types) are first collected. The choice of image categories mainly considers the actual application scenario: image categories match business scenarios, such as hotels, dishes, and beauty, and the distribution of image types should cover as many business scenarios as possible to improve the accuracy of the model trained on the images. Multiple images of each category are then synthesized into groups of near-duplicate images as image samples. In practical application scenarios, a near-duplicate of an original image may be a transformed version of it; common transformations include geometric affine transformation, blurring, noise pollution, image content enhancement, and compression. Accordingly, to approximate near-duplicates in real scenarios, the image processing types involved in composing the image samples may include brightness change, contrast change, cropping, rotation, watermarking, and the like. Multiple images of each category can be processed by automatic synthesis to obtain the image samples for training the classification model, where the automatic synthesis includes transformations such as adjusting brightness, adjusting contrast, cropping, rotating, and watermarking.
Then, the classification model is trained based on the plurality of image samples.
The classification model is trained on the foregoing image samples, which include a plurality of near-duplicate images. For example, the classification model may be a convolutional neural network based on the MobileNet architecture. MobileNet is a streamlined architecture that uses depthwise separable convolutions to build a lightweight deep neural network with a good trade-off between speed and accuracy. As shown in FIG. 3, the classification model includes a plurality of convolutional layers 310 and one feature extraction layer 320. The pool6 layer of the MobileNet network, which has 1024 nodes, can be selected as the feature extraction layer, and the output of this layer is taken as the expression of the feature vector of the input image. The classification model further includes a last convolutional layer 330, which produces the confidence that a sample belongs to each category, and a loss function softmaxLoss 340, which measures how well the model learns. In a specific implementation, the optimization target of the classification model is to maximize inter-class variance, and the features obtained by the trained classification model are mainly used to distinguish image features of different categories. After training, the optimal weight parameters of the classification model are obtained.
Step 220: Initialize the multi-task network model based on the parameters of the trained classification model.
As shown in FIG. 4, the multi-task network model in this embodiment of the present application includes a classification network 410 and a similarity metric network 420. Both are convolutional neural networks, each including a plurality of convolutional layers and a feature extraction layer, and both include a base network consisting of the feature extraction layer 430 and the convolutional layers before it. In one implementation, the classification network 410 and the similarity metric network 420 can share the base network; as shown in FIG. 4, they share a portion of the convolutional layers and the feature extraction layer. In another implementation, the classification network 410 and the similarity metric network 420 may also be separate network structures whose base-network parameters are shared. The multi-task network model further includes: a convolutional layer 440, the feature extraction layer learned for the similarity metric network; a normalization layer, which normalizes the features obtained from an image so that they are comparable; and the loss function contrastiveLoss, used to optimize the network.
Whether or not the classification network 410 and the similarity metric network 420 share a portion of the convolutional layers and the feature extraction layer, to improve training efficiency the base-network parameters of the two networks can be initialized with the network parameters of the classification model trained in the previous step. The network parameters are then fine-tuned and optimized by training the multi-task network model.
Step 230: Train the multi-task network model based on a plurality of image pair samples including a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs.
After the multi-task network is initialized, a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs are constructed from test images of real business scenarios for training the multi-task network model. The image pair samples include image pairs matching a specified image processing type, which is determined as follows: the image features of each image in a test image pair are determined through the trained classification model; near-duplicate discrimination is performed according to the distance between the image features of the two images in the test pair; and the specified image processing type is determined according to the accuracy of near-duplicate discrimination on image pairs matching different image processing types. Here the distance is a distance metric between feature vectors and may be a Euclidean distance or a cosine distance; this disclosure does not limit it.
An original image can be subjected to various kinds of preset image processing to obtain at least one processed image, and a near-duplicate image pair consists of any two images among the at least one processed image and the original image. A near-duplicate pair may be a processed image and an original image, or two processed images. The preset image processing includes, but is not limited to, any of the following: cropping, rotation, watermarking, brightness change, contrast change, and the like. For example, if the original image 510 in FIG. 5 is cropped to obtain image 520 and rotated to obtain image 530, then the original image 510 and image 520 may constitute a near-duplicate pair, the original image 510 and image 530 may constitute a near-duplicate pair, and image 520 and image 530 may also constitute a near-duplicate pair. A non-near-duplicate pair consists of two different images; as shown in FIG. 6, image 610 and image 620 form a non-near-duplicate pair.
Then, the near-duplicate and non-near-duplicate image pairs constructed above are used as test image pairs and input to the classification model trained in step 210 to determine the image features of each image in each test pair. An image is input to the classification model, the 1024-dimensional feature output by the model's pool6 layer is taken as its image feature, and the distance between the image features of the two images in a pair is computed to determine whether the two images are similar. Each test pair carries a label indicating whether it is a near-duplicate pair. If the two images of a near-duplicate pair are recognized as dissimilar, the near-duplicate discrimination for that pair is considered to have failed; likewise, if the two images of a non-near-duplicate pair are judged similar according to the feature distance, the discrimination for that pair fails. Finally, the image processing types of the near-duplicate pairs for which discrimination failed are counted, that is, it is determined through which kind of image processing the failed near-duplicate pairs were obtained. If the discrimination accuracy for near-duplicate pairs obtained by a certain kind of image processing is lower than a set accuracy threshold, it is determined that the classification model has difficulty classifying and recognizing near-duplicate pairs obtained by that processing, and that kind of image processing is marked as a hard type.
A near-duplicate pair of a hard type is a pair that is difficult to distinguish using only features obtained by classification, such as near-duplicate pairs obtained by image processing like cropping, blurring, logo overlay, or rotation. Based on the above method, at least one hard type can be obtained, and multiple near-duplicate pairs are then constructed based on the hard types: for example, test image pairs marked as hard types are used as near-duplicate pairs, or a large number of hard-type test pairs are selected to construct near-duplicate pairs. Meanwhile, according to the labels of the test pairs, non-near-duplicate pairs are constructed from different images of the same category.
Then, the multi-task network model is trained on the plurality of near-duplicate pairs constructed for the hard types and the plurality of non-near-duplicate pairs.
As shown in FIG. 4, the multi-task network model includes the classification network and the similarity metric network, which share a base network. The model is trained by feeding in multiple samples and continually adjusting the network parameters so that the outputs of the classification network and the similarity metric network simultaneously approach, as closely as possible, the optimization targets of the two networks. The optimization target of the classification network is to increase the inter-class variance between near-duplicate images; the optimization target of the similarity metric network is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images. Each image in the near-duplicate and non-near-duplicate pairs input to the multi-task network model carries a category label. When training the classification network, two images or a single image may be input at a time; the input data form of the classification network may be {image, category}, that is, an image with a category label. The loss function of the classification network is softmaxLoss(), and its optimization target is to maximize inter-class variance.
When training the similarity metric network, the input data form may be {(image 1, category), (image 2, category), whether the two form a near-duplicate pair}. The similarity metric network is a symmetric convolutional neural network whose loss function is contrastiveLoss(); its optimization target is to reduce the intra-class variance between near-duplicate images and increase the intra-class variance between non-near-duplicate images. Note that the above input data forms are used only during training; different input formats can be used in subsequent queries.
For the specific training procedure of the multi-task network model, reference can be made to the prior art; it is not repeated in this embodiment. The multi-task network model in the present application differs from prior-art network models in the network structure, the input training data, and the optimization targets, as well as in initializing the corresponding networks of the multi-task model with the network parameters of the preliminarily trained classification model. For the optimization procedure of the model, see the prior-art model training process. By training the classification network and the similarity metric network, the optimal parameters of each layer of the two networks in the multi-task network are obtained.
Step 240: Determine a first feature and a second feature of the input target image through the multi-task network model.
The first feature contains image features reflecting inter-class differences and intra-class differences; the second feature is an image feature reflecting intra-class differences.
The first feature of the target image is determined through the base network of the multi-task network model; the second feature of the target image is determined by the similarity metric network performing a convolution operation on the first feature.
After training of the multi-task network model is finished, the trained model can extract image features for images input to it. According to the network structure of the multi-task network model, the output of the last feature extraction layer of the base network shared by the classification network and the similarity metric network is selected as the first feature of the input image, that is, the first feature contains image features reflecting both inter-class and intra-class differences, such as the image features output by a feature extraction layer 430 in FIG. 4 (the last convolutional layer of the base network). Although two feature extraction layers 430 are drawn in FIG. 4, they share parameters, so their outputs are identical; when only one image is input, the output features of either branch can be used. Then, the output of the last convolutional layer of the similarity metric network, for example the output of layer 440 in FIG. 4, is selected as the second feature of the input image; the second feature is the image feature obtained by performing a convolution operation on the first feature.
In this embodiment, taking MobileNet as the classification network, the 1024-dimensional feature extracted by the MobileNet pool6 layer can be used as the first feature of the input image, and the 256-dimensional feature obtained by the similarity metric network's further convolution of that 1024-dimensional pool6 feature is used as the second feature. The 1024-dimensional feature extracted by the pool6 layer is a general feature of the input image, an image feature reflecting both inter-class and intra-class differences, whereas the 256-dimensional feature obtained by convolution through the similarity metric network is a finer-grained image feature reflecting intra-class differences.
Step 250: Construct a fusion feature of the target image according to the first feature and the second feature of the target image.
In one implementation, the first feature and the second feature of the target image can be directly spliced, and the spliced feature is taken as the fusion feature of the target image. For example, the 1024-dimensional first feature and the 256-dimensional second feature are concatenated in order into a 1280-dimensional feature vector as the fusion feature of the target image.
In another implementation, constructing the fusion feature of the target image from its first and second features further includes: performing a convolution operation on the first feature of the target image through the classification network to obtain a third feature of the target image, where the third feature is an image feature reflecting inter-class differences; and constructing the fusion feature of the target image from the third feature and the second feature. For example, the convolutional layer of the classification network convolves the first feature to obtain a third feature reflecting inter-class differences alone, and the third feature and the second feature are then concatenated in order into a multi-dimensional feature vector as the fusion feature of the image. By extracting from the first feature a third feature that isolates inter-class differences and constructing the fusion feature from the third and second features, the data volume of the image feature can be reduced and the efficiency of image comparison improved.
Optionally, constructing the fusion feature of the target image from its third and second features further includes: determining a hash code corresponding to the third feature of the target image and a hash code corresponding to the second feature; and splicing the two hash codes to obtain the fusion feature of the target image. To further reduce the data volume of image features, the features can be hash coded and the hash codes used to represent the fusion feature. For example, suppose a 512-dimensional third feature is [0.7, 0.6, 0.2, ..., 0.3, 0.8]. Comparing each dimension with the threshold 0.5, a value greater than 0.5 yields hash bit 1 for that dimension and a value less than or equal to 0.5 yields hash bit 0, so the 512-dimensional third feature is hash coded as [110...01]. In the same way, the 256-dimensional second feature is hash coded as [10...000]. After splicing the hash codes, the resulting image fusion feature is [110...0110...000].
If the fusion feature is constructed directly from the first and second features, the first and second features can each be hash coded and the hash codes spliced. Optionally, constructing the fusion feature of the target image from its first and second features includes: determining the hash code corresponding to the first feature of the target image and the hash code corresponding to the second feature; and splicing them to obtain the fusion feature of the target image. For example, suppose the 1024-dimensional first feature is [0.6, 0.6, 0.3, ..., 0.7, 0.2]. Comparing with the threshold 0.5 as above, the 1024-dimensional first feature is hash coded as [110...10]; the 256-dimensional second feature is hash coded as [10...000]; and after splicing the hash codes, the resulting image fusion feature is [110...1010...000].
For the multi-task network, the code length of the second feature needs to lie in a reasonable range: if it is too short, the intra-class constraint has little effect, similar to under-fitting; if it is too long, the granularity is too fine, similar to over-fitting. This application selects a first-feature code length of 1024 and a second-feature code length of 256, which balances well the influence of intra-class differences on the detection result. With the training network unchanged, if only the first-feature code were used, the detection result would be strongly affected by the second-feature code length. By splicing the first feature or the third feature with the second feature, the present application makes the fusion feature of the resulting image more stable.
Step 260: Determine, according to the fusion feature, whether the target image is a near-duplicate of a candidate image.
For a target image, its fusion feature is obtained through the foregoing steps. In addition, a candidate image set is stored in advance, and the fusion feature of each image in the set is obtained through the same steps. The fusion feature of the target image and that of any candidate image in the set can then be compared with an existing similarity measure to determine whether the target image and the candidate image are near-duplicates of each other.
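For illustration only: when the fusion features are hash codes as above, one natural similarity measure among existing techniques is a Hamming distance over the fused bit strings. The following sketch, including its bit-distance threshold, is an assumption rather than part of the disclosed method.

```python
import numpy as np

def near_duplicates(target_code: np.ndarray, candidates: dict,
                    max_bits: int = 64) -> list:
    """Return candidate image ids whose fused hash code differs from
    the target's code in at most `max_bits` positions (the threshold
    is an illustrative assumption). `candidates` maps image id to
    its fused bit code."""
    return [img_id for img_id, code in candidates.items()
            if np.count_nonzero(code != target_code) <= max_bits]
```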
In the near-duplicate image detection method disclosed in the embodiments of the present application, a classification model is first trained on image samples that include near-duplicate images; the multi-task network model is then initialized with the parameters of the trained classification model and trained on image-pair samples that include near-duplicate image pairs and non-near-duplicate image pairs. A first feature and a second feature of an input target image are determined through the multi-task network model; a fused feature of the target image is constructed from the first feature and the second feature; and finally, according to the fused feature, it is determined whether the target image is a near-duplicate of a candidate image. The method disclosed in the present application pre-trains a classification model based on a convolutional neural network to increase the inter-class feature variance between images, and uses the learned parameters of the classification model to fine-tune the classification and similarity-metric multi-task network, further increasing the intra-class variance between non-near-duplicate images. During image detection, the optimized image feature reflecting inter-class and intra-class differences is concatenated with the finer-grained image feature reflecting intra-class differences to form the fused feature of the image, which increases the inter-class feature variance while further increasing the intra-class variance between non-near-duplicate images, thereby helping to improve the accuracy of near-duplicate image detection.
In the multi-task network model disclosed in the present application, concatenating the features output by the classification network and the similarity metric network enhances the expressive power of the features. Moreover, compared with using only image features that reflect both inter-class and intra-class differences, the concatenated image feature is less affected by the feature reflecting intra-class differences and is therefore more stable.
Embodiment 3
This embodiment discloses a near-duplicate image detection apparatus. As shown in FIG. 7, the apparatus includes:
a feature extraction module 710, configured to determine, through a multi-task network model, a first feature and a second feature of an input target image respectively, where the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences;
a feature fusion module 720, configured to construct a fused feature of the target image according to the first feature and the second feature of the target image extracted by the feature extraction module 710; and
a near-duplicate image detection module 730, configured to determine, according to the fused feature determined by the feature fusion module 720, whether the target image is a near-duplicate image of a candidate image.
Optionally, the multi-task network model includes a classification network and a similarity metric network, the classification network and the similarity metric network sharing a base network. Optionally, the multi-task network model is trained by solving for network parameters that simultaneously satisfy the optimization objectives of the classification network and of the similarity metric network, where the optimization objective of the classification network is to increase the inter-class variance between near-duplicate images, and the optimization objective of the similarity metric network is to decrease the intra-class variance between near-duplicate images and to increase the intra-class variance between non-near-duplicate images.
Optionally, as shown in FIG. 8, the feature extraction module 710 further includes:
a first feature extraction unit 7101, configured to determine the first feature of the target image through the base network; and
a second feature extraction unit 7102, configured to determine the second feature of the target image in an input image pair by performing a convolution operation on the first feature through the similarity metric network, where the optimization objective of the similarity metric network is to decrease the intra-class variance between near-duplicate images and to increase the intra-class variance between non-near-duplicate images.
The near-duplicate image detection apparatus disclosed in the embodiments of the present application determines, through a multi-task network model, a first feature and a second feature of an input target image respectively, where the first feature contains image features reflecting inter-class and intra-class differences and the second feature is an image feature reflecting intra-class differences; constructs a fused feature of the target image according to the first feature and the second feature; and determines, according to the fused feature, whether the target image is a near-duplicate image of a candidate image. By combining the intra-class and inter-class information of the image to construct its fused feature, the apparatus obtains a more comprehensive fused image feature, thereby improving the accuracy of near-duplicate image detection.
Optionally, as shown in FIG. 8, the apparatus further includes:
a classification model training module 740, configured to train a classification model based on a plurality of image samples including a plurality of near-duplicate images;
a multi-task network model initialization module 750, configured to initialize the multi-task network model based on parameters of the trained classification model; and
a multi-task network model training module 760, configured to train the multi-task network model based on image-pair samples including a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs, where a near-duplicate image pair consists of any two images among an original image and at least one image obtained by subjecting the original image to preset image processing, and a non-near-duplicate image pair consists of two different images.
Optionally, the image-pair samples include image pairs matching a specified image-processing type, and the specified image-processing type is determined by the following method:
determining, through the trained classification model, an image feature of each image in a test image pair; performing near-duplicate image discrimination according to the distance between the image features of the two images in the test image pair; and determining the specified image-processing type according to the accuracy of near-duplicate image discrimination for image pairs matching different image-processing types.
Optionally, as shown in FIG. 8, the feature fusion module 720 further includes:
a third feature extraction unit 7201, configured to determine a third feature of the target image by performing a convolution operation on the first feature of the target image, where the third feature is an image feature reflecting inter-class differences; and
a first feature fusion unit 7202, configured to construct the fused feature of the target image according to the third feature and the second feature of the target image.
Optionally, the first feature fusion unit 7202 further includes:
a first encoding unit 72021, configured to determine a hash code corresponding to the third feature of the target image and a hash code corresponding to the second feature; and
a first feature concatenation unit 72022, configured to concatenate the hash code corresponding to the third feature with the hash code corresponding to the second feature to obtain the fused feature of the target image.
In another embodiment, as shown in FIG. 9, the feature fusion module 720 further includes:
a second encoding unit 7204, configured to respectively determine a hash code corresponding to the first feature of the target image and a hash code corresponding to the second feature; and
a second feature fusion unit 7205, configured to concatenate the hash code corresponding to the first feature with the hash code corresponding to the second feature to obtain the fused feature of the target image.
The near-duplicate image detection apparatus disclosed in the present application pre-trains a classification model based on a convolutional neural network to increase the inter-class feature variance between images, and uses the learned parameters of the classification model to fine-tune the classification and similarity-metric multi-task network, further increasing the intra-class variance between non-near-duplicate images. During image detection, the optimized image feature reflecting inter-class and intra-class differences is concatenated with the finer-grained image feature reflecting intra-class differences to form the fused feature of the image, which increases the inter-class feature variance while further increasing the intra-class variance between non-near-duplicate images, thereby helping to improve the accuracy of near-duplicate image detection.
In the multi-task network model disclosed in the present application, concatenating the features output by the classification network and the similarity metric network enhances the expressive power of the features. Moreover, compared with using only image features that reflect both inter-class and intra-class differences, the concatenated image feature is less affected by the feature reflecting intra-class differences and is therefore more stable.
Correspondingly, the present application further discloses an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the near-duplicate image detection method described in Embodiments 1 and 2 of the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like.
The present application further discloses a computer-readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements the steps of the near-duplicate image detection method described in Embodiments 1 and 2 of the present application.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts among the embodiments, reference may be made to one another. Since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the relevant parts of the method embodiments.
The near-duplicate image detection method and apparatus provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
From the description of the above implementations, a person skilled in the art can clearly understand that each implementation may be realized by means of software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments or in certain parts of the embodiments.

Claims (11)

  1. A near-duplicate image detection method, comprising:
    determining, through a multi-task network model, a first feature and a second feature of an input target image respectively, wherein the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences;
    constructing a fused feature of the target image according to the first feature and the second feature of the target image; and
    determining, according to the fused feature, whether the target image is a near-duplicate image of a candidate image.
  2. The method according to claim 1, wherein
    the multi-task network model comprises a classification network and a similarity metric network, the classification network and the similarity metric network sharing a base network, and
    the multi-task network model is trained by:
    solving for network parameters that simultaneously satisfy optimization objectives of the classification network and of the similarity metric network, wherein
    the optimization objective of the classification network is to increase inter-class variance between near-duplicate images; and
    the optimization objective of the similarity metric network is to decrease intra-class variance between near-duplicate images and to increase intra-class variance between non-near-duplicate images.
  3. The method according to claim 2, wherein determining, through the multi-task network model, the first feature and the second feature of the target image respectively comprises:
    determining the first feature of the target image through the base network; and
    determining the second feature of the target image by performing a convolution operation on the first feature through the similarity metric network.
  4. The method according to claim 2 or 3, further comprising, before determining the first feature and the second feature of the target image respectively through the multi-task network model:
    training a classification model based on a plurality of image samples comprising a plurality of near-duplicate images;
    initializing the multi-task network model based on parameters of the trained classification model; and
    training the multi-task network model based on image-pair samples comprising a plurality of near-duplicate image pairs and a plurality of non-near-duplicate image pairs, wherein
    each near-duplicate image pair consists of any two images among an original image and at least one image obtained by subjecting the original image to preset image processing, and
    each non-near-duplicate image pair consists of two different images.
  5. The method according to claim 4, wherein
    the image-pair samples comprise image pairs matching a specified image-processing type, and
    the specified image-processing type is determined by:
    determining, through the trained classification model, an image feature of each image in a test image pair;
    performing near-duplicate image discrimination according to a distance between the image features of the two images in the test image pair; and
    determining the specified image-processing type according to an accuracy of near-duplicate image discrimination for image pairs matching different image-processing types.
  6. The method according to claim 1, wherein constructing the fused feature of the target image according to the first feature and the second feature of the target image comprises:
    determining a third feature of the target image by performing a convolution operation on the first feature of the target image, wherein the third feature is an image feature reflecting inter-class differences; and
    constructing the fused feature of the target image according to the third feature and the second feature of the target image.
  7. The method according to claim 6, wherein constructing the fused feature of the target image according to the third feature and the second feature of the target image further comprises:
    respectively determining a hash code corresponding to the third feature of the target image and a hash code corresponding to the second feature; and
    concatenating the hash code corresponding to the third feature with the hash code corresponding to the second feature to obtain the fused feature of the target image.
  8. The method according to any one of claims 1 to 3, wherein constructing the fused feature of the target image according to the first feature and the second feature of the target image comprises:
    respectively determining a hash code corresponding to the first feature of the target image and a hash code corresponding to the second feature; and
    concatenating the hash code corresponding to the first feature with the hash code corresponding to the second feature to obtain the fused feature of the target image.
  9. A near-duplicate image detection apparatus, comprising:
    a feature extraction module, configured to determine, through a multi-task network model, a first feature and a second feature of an input target image respectively, wherein the first feature contains image features reflecting inter-class differences and intra-class differences, and the second feature is an image feature reflecting intra-class differences;
    a feature fusion module, configured to construct a fused feature of the target image according to the first feature and the second feature of the target image extracted by the feature extraction module; and
    a near-duplicate image detection module, configured to determine, according to the fused feature determined by the feature fusion module, whether the target image is a near-duplicate image of a candidate image.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the near-duplicate image detection method according to any one of claims 1 to 8.
  11. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the near-duplicate image detection method according to any one of claims 1 to 8.
PCT/CN2018/122069 2018-03-30 2018-12-19 Detecting near-duplicate images WO2019184464A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/043,656 US20210019872A1 (en) 2018-03-30 2018-12-19 Detecting near-duplicate image
EP18911696.5A EP3772036A4 (en) 2018-03-30 2018-12-19 NEAR-DUPLICATE IMAGE DETECTION

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810277058.9A CN108665441B (zh) 2018-03-30 2018-03-30 Near-duplicate image detection method and apparatus, and electronic device
CN201810277058.9 2018-03-30

Publications (1)

Publication Number Publication Date
WO2019184464A1 true WO2019184464A1 (zh) 2019-10-03

Family

ID=63782025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122069 WO2019184464A1 (zh) 2018-03-30 2018-12-19 Detecting near-duplicate images

Country Status (4)

Country Link
US (1) US20210019872A1 (zh)
EP (1) EP3772036A4 (zh)
CN (1) CN108665441B (zh)
WO (1) WO2019184464A1 (zh)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665441B (zh) * 2018-03-30 2019-09-17 北京三快在线科技有限公司 Near-duplicate image detection method and apparatus, and electronic device
JP7122625B2 (ja) * 2018-07-02 2022-08-22 パナソニックIpマネジメント株式会社 Learning data collection device, learning data collection system, and learning data collection method
CN109523532B (zh) * 2018-11-13 2022-05-03 腾讯医疗健康(深圳)有限公司 Image processing method and apparatus, computer-readable medium, and electronic device
WO2020110224A1 (ja) * 2018-11-28 2020-06-04 Eizo株式会社 Information processing method and computer program
CN109615017B (zh) * 2018-12-21 2021-06-29 大连海事大学 Stack Overflow duplicate question detection method considering multiple reference factors
CN111626085A (zh) * 2019-02-28 2020-09-04 中科院微电子研究所昆山分所 Detection method, apparatus, device, and medium
CN110189279B (zh) * 2019-06-10 2022-09-30 北京字节跳动网络技术有限公司 Model training method and apparatus, electronic device, and storage medium
CN110413603B (zh) * 2019-08-06 2023-02-24 北京字节跳动网络技术有限公司 Method and apparatus for determining duplicate data, electronic device, and computer storage medium
CN110413812B (zh) * 2019-08-06 2022-04-26 北京字节跳动网络技术有限公司 Neural network model training method and apparatus, electronic device, and storage medium
TWI719713B (zh) * 2019-11-14 2021-02-21 緯創資通股份有限公司 Object detection method, electronic device, and object detection system
CN111126264A (zh) * 2019-12-24 2020-05-08 北京每日优鲜电子商务有限公司 Image processing method, apparatus, device, and storage medium
CN111860542B (zh) * 2020-07-22 2024-06-28 海尔优家智能科技(北京)有限公司 Method and apparatus for identifying item category, and electronic device
CN112418303B (zh) * 2020-11-20 2024-07-12 浙江大华技术股份有限公司 Training method and apparatus for a state recognition model, and computer device
CN113204664B (zh) * 2021-04-25 2022-11-04 北京三快在线科技有限公司 Image clustering method and apparatus
US11875496B2 (en) * 2021-08-25 2024-01-16 Genpact Luxembourg S.à r.l. II Dimension estimation using duplicate instance identification in a multiview and multiscale system
CN116029556B (zh) * 2023-03-21 2023-05-30 支付宝(杭州)信息技术有限公司 Business risk assessment method, apparatus, device, and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930537A (zh) * 2012-10-23 2013-02-13 深圳市宜搜科技发展有限公司 Image detection method and system
CN107688823A (zh) * 2017-07-20 2018-02-13 北京三快在线科技有限公司 Image feature acquisition method and apparatus, and electronic device
CN108665441A (zh) * 2018-03-30 2018-10-16 北京三快在线科技有限公司 Near-duplicate image detection method and apparatus, and electronic device

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110085728A1 (en) * 2009-10-08 2011-04-14 Yuli Gao Detecting near duplicate images
US9111183B2 (en) * 2013-01-04 2015-08-18 International Business Machines Corporation Performing a comparison between two images which are scaled to a common resolution
CN103092935A (zh) * 2013-01-08 2013-05-08 杭州电子科技大学 Near-copy image detection method based on SIFT quantization
US9530072B2 (en) * 2013-03-15 2016-12-27 Dropbox, Inc. Duplicate/near duplicate detection and image registration
US9218701B2 (en) * 2013-05-28 2015-12-22 Bank Of America Corporation Image overlay for duplicate image detection
CN106537379A (zh) * 2014-06-20 2017-03-22 谷歌公司 Fine-grained image similarity
US9454713B2 (en) * 2014-12-30 2016-09-27 Ebay Inc. Similar item detection
US9824299B2 (en) * 2016-01-04 2017-11-21 Bank Of America Corporation Automatic image duplication identification
CN106056067B (zh) * 2016-05-27 2019-04-12 南京邮电大学 Low-resolution face image recognition method based on correspondence prediction
US11176423B2 (en) * 2016-10-24 2021-11-16 International Business Machines Corporation Edge-based adaptive machine learning for object recognition
CN106570141B (zh) * 2016-11-04 2020-05-19 中国科学院自动化研究所 Near-duplicate image detection method
CN107330750B (zh) * 2017-05-26 2019-03-08 北京三快在线科技有限公司 Method and apparatus for recommending product images, and electronic device
US10726254B2 (en) * 2018-03-16 2020-07-28 Bank Of America Corporation Dynamic duplicate detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930537A (zh) * 2012-10-23 2013-02-13 深圳市宜搜科技发展有限公司 Image detection method and system
CN107688823A (zh) * 2017-07-20 2018-02-13 北京三快在线科技有限公司 Image feature acquisition method and apparatus, and electronic device
CN108665441A (zh) * 2018-03-30 2018-10-16 北京三快在线科技有限公司 Near-duplicate image detection method and apparatus, and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAO, YUDONG: "Similar Image Detection Method with Fusional Feature", COMPUTER TECHNOLOGY AND DEVELOPMENT, vol. 22, no. 8, 31 August 2012 (2012-08-31), pages 103-106, XP055745392, ISSN: 1673-629X *
See also references of EP3772036A4 *

Also Published As

Publication number Publication date
US20210019872A1 (en) 2021-01-21
EP3772036A1 (en) 2021-02-03
CN108665441B (zh) 2019-09-17
CN108665441A (zh) 2018-10-16
EP3772036A4 (en) 2021-05-19

Similar Documents

Publication Publication Date Title
WO2019184464A1 (zh) Detecting near-duplicate images
CN112465008B (zh) Method for enhancing audio-visual correlation based on self-supervised curriculum learning
CN109815770B (zh) Two-dimensional code detection method, apparatus, and system
JP5282658B2 (ja) Image learning, automatic annotation, and retrieval method and apparatus
Johnson et al. Sparse coding for alpha matting
CN111814620B (zh) Face image quality evaluation model building method, selection method, medium, and apparatus
US20120269441A1 (en) Image quality assessment
CN105243376A (zh) Liveness detection method and apparatus
CN112651333B (zh) Silent liveness detection method and apparatus, terminal device, and storage medium
CN113656660B (zh) Cross-modal data matching method, apparatus, device, and medium
WO2010043954A1 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
KR20120066462A (ko) Face recognition method and system, and feature vector extraction apparatuses for training and testing for face recognition
KR20190080388A (ko) Image horizontal correction method using CNN and residual network structure
US20220292877A1 (en) Systems, methods, and storage media for creating image data embeddings to be used for image recognition
CN114548274A (zh) Rumor detection method and system based on multi-modal interaction
Peng et al. Document image quality assessment using discriminative sparse representation
CN105678349A (zh) Method for generating context descriptors of visual vocabulary
JP2006293720A (ja) Face detection device, face detection method, and face detection program
CN111062338B (zh) Method and system for checking consistency between certificate photographs and portraits
CN116048682B (zh) Terminal system interface layout comparison method and electronic device
CN111062199A (zh) Method and apparatus for identifying harmful information
CN115862119A (zh) Face age estimation method and apparatus based on attention mechanism
Liu et al. Video retrieval based on object discovery
CN113569684A (zh) Short-video scene classification method and system, electronic device, and storage medium
Chen et al. Big Visual Data Analysis: Scene Classification and Geometric Labeling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18911696

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2018911696

Country of ref document: EP