WO2023142753A1 - Image similarity measurement method and device therefor - Google Patents

Image similarity measurement method and device therefor - Download PDF

Info

Publication number
WO2023142753A1
WO2023142753A1 (PCT/CN2022/139238)
Authority
WO
WIPO (PCT)
Prior art keywords
image
depth
neural network
features
feature
Prior art date
Application number
PCT/CN2022/139238
Other languages
English (en)
French (fr)
Inventor
张培科
林永兵
马莎
万蕾
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023142753A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0475: Generative networks
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; proximity measures in feature spaces
    • G06V 10/82: Arrangements using neural networks
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; context of image processing
    • G06T 2207/30168: Image quality inspection

Definitions

  • The embodiments of the present application relate to the field of machine vision and, more specifically, to an image similarity measurement method and a device therefor.
  • Image quality assessment (IQA), also called similarity measurement, is widely used to evaluate image processing results and the quality of image or video encoding and decoding. Common methods include the mean squared error (MSE), the structural similarity measurement (SSIM), and multi-scale structural similarity measurement, but the evaluation results they produce often fail to match human perception: given an original image and two distorted images a and b produced by image processing with different methods or configurations, the metric may rate a as better than b while, to the naked eye, b has the better quality.
  • Embodiments of the present application provide an image similarity measurement method and device that can reduce the computational complexity while guaranteeing the measurement quality.
  • In a first aspect, an image similarity measurement method is provided, comprising: acquiring depth features of a first image and depth features of a second image, the depth features including pixel features; and determining the similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image.
  • In this solution, the similarity is obtained mainly from the depth features of the two images rather than from pixels alone, so, unlike methods such as MSE and SSIM, the metric does not contradict human perception. Moreover, compared with obtaining the perceptual distance from a traditional deep neural network model, many network layers are not needed: the structure is simpler, the amount of computation is smaller, and the computational complexity is lower.
  • The aforementioned depth features may further include at least one of the following image features: edge features, texture features, structural features, brightness features, or color features.
  • When acquiring the depth features of the first image and the depth features of the second image, a first neural network may be used to perform feature extraction on the first image to obtain the depth features of the first image, and a second neural network may be used to perform feature extraction on the second image to obtain the depth features of the second image.
  • A traditional deep neural network model needs, for example, 5 to 10 convolutional layers to obtain the perceptual distance. Analysis shows, however, that the depth features produced by shallower layers contain more image information (image features), while those produced by deeper layers relate increasingly to the specific semantic perception task; for a deep network that computes the perceptual distance, the features obtained at a deep layer, say the 5th or 10th, are almost exclusively related to semantic information and only weakly correlated with image quality information. The embodiments of the present application exploit this property and use only the first or second convolutional layer of a typical deep neural network model to extract the depth features of the image. Such depth features contain richer image features, and computing the perceptual distance from them simplifies the network structure; for example, a 10-layer deep neural network can be simplified into a 2-layer network, greatly reducing the computational complexity. Because the method of the present application uses only one to two convolutional layers, the amount of computation of the algorithm is reduced.
  • When determining the similarity between the first image and the second image, a third neural network may be used to convolve the residual of the depth features of the first image and the depth features of the second image to obtain a first perceptual distance; the first perceptual distance represents the similarity, where a larger value means a lower similarity and, conversely, a smaller value means a higher similarity. The third neural network may be a one-layer convolutional network, and the residual may be a mean squared error.
  • To determine the concrete values of the parameters involved, a big-data training method is used to find the optimal parameters. That is, the neural network can be trained with training data to obtain its parameters (or, equivalently, to update them); the training data include images to be trained and their quality labels. A quality label may be a score for the image to be trained: for example, if the quality score of distorted image X is 75 points (out of 100), X can serve as the image to be trained and 75 as its label. The neural network here may include any one or more of the neural networks in the embodiments of the present application, for example at least one of the first, second, third, or fourth neural networks.
  • Optionally, a second perceptual distance between the first image and the second image may be obtained, and the similarity derived from the first perceptual distance and the second perceptual distance together. A fourth neural network, which may be a deformable convolutional network, may be used to convolve the depth features of the first image and the depth features of the second image to obtain the second perceptual distance. Alternatively, structure and texture indices may be computed from the depth features of the two images and used as the second perceptual distance; the indices may further be weighted, with the weighted structure and texture indices used as the second perceptual distance. The first and second perceptual distances may then be superimposed and the sum used as the similarity measure, or weighted and then superimposed, with the weighted sum used as the similarity measure.
  • In a second aspect, an image similarity measurement device is provided, comprising modules for executing the method of any implementation of the first aspect.
  • In a third aspect, a computing device is provided, comprising a memory for storing a program and a processor for executing the program stored in the memory; when the stored program is executed, the processor performs the method of any implementation of the first aspect.
  • The device may be a vehicle-mounted terminal, a host computer, a computer, a server, a cloud device, or another device or system that needs to perform image quality detection, or it may be a component installed in one of the above devices or systems. The device may also be a chip.
  • In a fourth aspect, a computer-readable medium is provided that stores program code for execution by a device, the program code comprising instructions for performing the method of any implementation of the first aspect.
  • In a fifth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, it causes the computer to perform the method of any implementation of the first aspect.
  • In a sixth aspect, a chip is provided, comprising a processor and a data interface; the processor reads, through the data interface, instructions stored in a memory and performs the method of any implementation of the first aspect. Optionally, the chip may further include the memory, in which the instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor performs the method of any implementation of the first aspect.
  • Fig. 1 is a schematic flowchart of the image similarity measurement method provided by an embodiment of the present application.
  • Fig. 2 is a schematic flowchart of an example of the image similarity measurement method provided by an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of another example of the image similarity measurement method provided by an embodiment of the present application.
  • Fig. 4 is a schematic flowchart of yet another example of the image similarity measurement method provided by an embodiment of the present application.
  • Fig. 5 is a schematic block diagram of the image similarity measurement device provided by an embodiment of the present application.
  • Fig. 6 is a schematic diagram of the hardware structure of the image similarity measurement device provided by an embodiment of the present application.
  • Fig. 1 is a schematic flowchart of the image similarity measurement method provided by an embodiment of the present application. The method can be used for image quality detection of cameras, displays, and the like, for example vehicle-mounted cameras and displays. The solution of the embodiments may be built into such devices in software or hardware form, or may be an image similarity measurement device independent of them.
  • The first image may be a reference image and the second image an image whose quality is to be evaluated, for example an image obtained with some application algorithm: the first image may be an original image and the second image an image obtained by applying super-resolution or colorization processing to it. The two roles may also be interchanged, that is, the first image may be the image under evaluation and the second image the reference. For ease of understanding, the following mainly takes the first image as the reference image and the second image as the image to be evaluated.
  • The first image and/or the second image may be acquired by a sensing device such as a camera, or read from a storage device.
  • Feature extraction may be performed on the acquired first and second images to obtain their depth features: the first neural network extracts the depth features of the first image, and the second neural network extracts the depth features of the second image. The two networks may be the same or different; for example, both may be a one-layer convolutional network or a deep neural network, and they may use the same or different convolution kernels.
  • Depth features can be understood as features, extracted with a deep neural network or the like, that include the pixel features of the image. Besides pixel features, the depth features may include at least one of the following image features: edge features, texture features, structural features, brightness features, or color features. Note that pixels and pixel features are different: a pixel is the pixel value itself, whereas a pixel feature is a feature of a pixel neighborhood obtained by feature extraction and contains more comprehensive, broader information. In effect, pixel features capture image information more comprehensively, while a pixel is only a single piece of image information.
  • The third neural network may be used to convolve the residual of the depth features of the first image and the depth features of the second image to obtain the first perceptual distance between the two images, whose value represents the similarity. The third neural network may be a one-layer convolutional network, and the residual may be a mean squared error.
  • Optionally, a second perceptual distance between the first image and the second image may be obtained, and the similarity derived from the first and second perceptual distances. In some implementations, a fourth neural network, which may be a deformable convolutional network, convolves the depth features of the two images to obtain the second perceptual distance. For example, when a generative adversarial network (GAN) is used to restore an image, false details may be introduced; the depth features obtained in this application can then be convolved by a deformable convolutional network to obtain the second perceptual distance, which is combined with the first perceptual distance to yield the similarity of the two images. In other implementations, structure and texture indices are computed from the depth features of the two images and used as the second perceptual distance, optionally after a weighting step that produces weighted structure and texture indices.
  • the first perceptual distance and the second perceptual distance can be superimposed, and the superimposed value can be used as a measure of similarity; the larger value of the first perceptual distance and the second perceptual distance can also be used as a measure of similarity ;
  • the first perception distance and the second perception distance can also be weighted and superimposed, and the weighted superposition value can be used as a measure of similarity.
  • The training data may be used to train the neural network to obtain its parameters (or, equivalently, to update them); the training data include images to be trained and their quality labels. A quality label may be a score for the image: for example, if the quality score of distorted image X is 75 points (out of 100), X can serve as the training image and 75 as its label. The neural network here may include any one or more of the neural networks in the embodiments, for example at least one of the first, second, third, or fourth neural networks. Because a trained neural network is used to obtain the perceptual distance, the embodiments are more accurate than the traditional approach of manually setting parameters.
  • The similarity is obtained mainly from the depth features of the two images rather than from pixels alone, so, compared with MSE, SSIM, and similar methods, the metric does not contradict human perception. Moreover, far fewer layers are needed than in a traditional deep neural network model, giving a simpler structure, less computation, and lower complexity. A traditional deep model needs, for example, 5 to 10 layers to obtain the perceptual distance, but the depth features of shallower layers contain more image information (image features), while those of deeper layers relate increasingly to the specific task; the embodiments therefore extract the depth features with only one layer. Computing the perceptual distance from these richer features effectively splits the task and simplifies the complex network structure; for example, a 10-layer deep neural network can be simplified into a 2-layer network, greatly reducing the computational complexity.
  • The method shown in Figure 1 uses one to two convolutional layers, and the smaller number of layers reduces the amount of computation, as shown in Table 1, which compares the computation of the present method with the deep-model methods LPIPS and DISTS. In addition, when the method of Figure 1 is applied in artificial intelligence models, for example as the similarity measure in the loss functions of object detection and semantic segmentation models, experiments show that it improves the perceptual accuracy of the images produced by the trained models.
  • Fig. 1 is further described below with reference to Figs. 2 to 4, which can be regarded as specific examples of Fig. 1.
  • Fig. 2 is a schematic flowchart of an example of an image similarity measurement method provided by an embodiment of the present application.
  • The reference image is processed by deep feature extraction network #1 to obtain depth feature #1, and the distorted image is processed by deep feature extraction network #2 to obtain depth feature #2. Deep feature extraction network #1 can be regarded as an example of the first neural network, deep feature extraction network #2 as an example of the second neural network, the reference image as an example of the first image, and the distorted image as an example of the second image.
  • The residual of depth feature #1 and depth feature #2 is computed, and a convolutional network, which can be regarded as an example of the third neural network, then convolves the residual to obtain the perceptual distance.
  • Each channel of the depth features can be regarded as a different "image", and the basic idea is to reuse pixel-domain image quality assessment or similarity measurement methods, such as MSE or SSIM, on these channels. The depth features may have, for example, 64 channels. Because there are multiple channels and different channels carry different information (one channel mainly image edge information, i.e., edge features; one mainly image texture information, i.e., texture features; one image color information, i.e., color features), obtaining the final scalar similarity value requires fusing the channels with a specific weight configuration, or taking a weighted average of the per-channel metrics.
  • The method shown in Figure 2 can also be regarded as a weighted depth-feature mean squared error: it draws on the idea of MSE but differs significantly in that the difference is fed into the convolutional network after being squared and before being averaged, converting the 64 channels into 1 channel before the weighted mean is taken; it is therefore more accurate than MSE.
  • Fig. 3 is a schematic flowchart of another example of the image similarity measurement method provided by an embodiment of the present application. In this method, the perceptual distance is obtained mainly by combining the mean squared error with the structure and texture indices.
  • As in Fig. 2, the reference image is processed by deep feature extraction network #1 to obtain depth feature #1, and the distorted image is processed by deep feature extraction network #2 to obtain depth feature #2; the networks are examples of the first and second neural networks, and the reference and distorted images are examples of the first and second images, respectively.
  • The residual of depth feature #1 and depth feature #2 is computed, and a convolutional network, an example of the third neural network, convolves it to obtain the MSE index; the MSE index here can be regarded as an example of the first perceptual distance.
  • Structure and texture indices are computed from depth feature #1 and depth feature #2 and then weighted; the weighted structure and texture indices here can be regarded as an example of the second perceptual distance.
  • The method shown in Figure 3 can be expressed by the formula

    $$\mathrm{STSIM} = \sum_{i} \sum_{j} \left( \alpha_{ij}\, s_{ij} + \beta_{ij}\, t_{ij} \right),$$

    where STSIM denotes the weighted structure and texture index, $\alpha$ and $\beta$ denote the weights, $s$ the structure index, $t$ the texture index, the subscript $i$ the feature layer (0 referring to the original image, 1 to the depth features), and $j$ the channel. The structure similarity index of image a and image b is expressed by

    $$s(a,b) = \frac{2\sigma_{a,b} + C}{\sigma_a^2 + \sigma_b^2 + C},$$

    where $C$ is a constant, $\sigma_a^2$ and $\sigma_b^2$ denote the pixel variance of the respective image, and $\sigma_{a,b}$ denotes the covariance of the pixel values of image a and image b. The texture similarity index of image a and image b is expressed by

    $$t(a,b) = \frac{2\mu_a \mu_b + C}{\mu_a^2 + \mu_b^2 + C},$$

    where $C$ is the constant $10^{-6}$ and $\mu$ denotes the mean pixel value of the respective image.
  • The weighted feature mean squared error (WFMSE) index and the weighted structure and texture indices are superimposed to obtain the perceptual distance. The per-channel results of the structure and texture indices are weighted and summed to obtain the final indices, so the method can also be called weighted feature structural similarity (WFSSIM) on the feature domain; WFSSIM is more accurate than SSIM while its computational complexity is about 130 times lower than that of similar methods such as DISTS.
  • Fig. 4 is a schematic flowchart of yet another example of the image similarity measurement method provided by an embodiment of the present application. The reference image is processed by deep feature extraction network #1 to obtain depth feature #1, and the distorted image is processed by deep feature extraction network #2 to obtain depth feature #2; the networks are examples of the first and second neural networks, and the reference and distorted images are examples of the first and second images, respectively.
  • The residual of depth feature #1 and depth feature #2 is computed, and a convolutional network, an example of the third neural network, convolves it to obtain the WFMSE index; the WFMSE index here can be regarded as an example of the first perceptual distance.
  • Depth feature #1 and depth feature #2 are convolved by a deformable convolutional network to obtain distance index #2; distance index #2 can be regarded as an example of the second perceptual distance, and the deformable convolutional network as an example of the fourth neural network.
  • The perceptual distance is obtained by superimposing the WFMSE index and distance index #2. The method shown in Figure 4 can be used to evaluate and improve the performance of GAN networks.
  • Fig. 5 is a schematic block diagram of an image similarity measurement device provided by an embodiment of the present application.
  • The image similarity measurement device 2000 shown in Fig. 5 includes an acquisition unit 2001 and a processing unit 2002, which can be used to implement the image similarity measurement method of the embodiments: specifically, the acquisition unit 2001 can perform step 101 above, and the processing unit 2002 can perform step 102. It should be understood that the processing unit 2002 in the device 2000 may be equivalent to the processor 3002 in the device 3000 below.
  • Fig. 6 is a schematic diagram of the hardware structure of the image similarity measurement device provided by an embodiment of the present application. The image similarity measurement device 3000 shown in Fig. 6 includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004; the memory 3001, the processor 3002, and the communication interface 3003 are connected to one another through the bus 3004.
  • The memory 3001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 3001 may store a program; when the program stored in the memory 3001 is executed by the processor 3002, the processor 3002 and the communication interface 3003 perform the steps of the image similarity measurement method of the embodiments.
  • The processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a field-programmable gate array (FPGA), or one or more integrated circuits, used to execute related programs so as to realize the functions required by the units of the image similarity measurement device of the embodiments, or to perform the image similarity measurement method of the method embodiments.
  • The processor 3002 may also be an integrated circuit chip with signal processing capability; in that case, each step of the image similarity measurement method can be completed by an integrated hardware logic circuit in the processor 3002 or by instructions in software form. The processor 3002 may further be a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • It may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments may be embodied directly as being completed by a hardware decoding processor, or completed by a combination of the hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory 3001; the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, completes the functions required by the units included in the image similarity measurement device of the embodiments, or performs the image similarity measurement method of the method embodiments.
  • The communication interface 3003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the device 3000 and other devices or communication networks; for example, the image to be evaluated, or the depth features of the image to be evaluated, may be obtained through the communication interface 3003.
  • The bus 3004 may include a pathway for transferring information between the components of the device 3000 (for example, the memory 3001, the processor 3002, and the communication interface 3003).
  • Although the device 3000 shown in Fig. 6 shows only a memory, a processor, and a communication interface, in a specific implementation the device 3000 also includes other components necessary for normal operation; depending on specific needs, it may also include hardware components implementing other additional functions; and it may instead include only the components necessary to realize the embodiments of the present application, without all the components shown in Fig. 6.
  • The disclosed systems, methods, and devices may be implemented in other ways; the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and in actual implementation there may be other divisions: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. The mutual or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a Universal Serial Bus flash disk (UFD, also called a U disk or USB drive), a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides an image similarity measurement method and an image similarity measurement device, relating to the field of artificial intelligence, comprising: acquiring depth features of a first image and depth features of a second image, the depth features including pixel features; and determining the similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image. The solution obtains the similarity mainly from the depth features of the two images rather than relying on pixels alone, so the metric does not contradict human eye perception. Moreover, compared with obtaining the perceptual distance from a traditional deep neural network model, many network layers are not needed: the structure is simpler, the amount of computation is small, and the computational complexity is significantly reduced.

Description

Image similarity measurement method and device therefor
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 27, 2022, with application number 202210097126.X and entitled "Image similarity measurement method and device therefor", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of machine vision and, more specifically, to an image similarity measurement method and a device therefor.
Background
Image quality assessment (IQA), also called similarity measurement, is widely used in scenarios such as evaluating image processing results and evaluating the quality of image or video encoding and decoding. Common methods include the mean squared error (MSE), the structural similarity measurement (SSIM), and multi-scale structural similarity measurement, but the evaluation results they produce often fail to match human perception; that is, for an original image and two distorted images a and b produced by image processing with different methods or configurations, the evaluation may say that a is of better quality than b while, observed with the naked eye, b has the better quality.
To address these shortcomings, methods have appeared that use a deep neural network model to compute a perceptual distance in the depth feature space, training the model on annotation data based on human perception. Although such methods overcome the drawbacks of the traditional methods to some extent, their computational complexity is high and their applicable scenarios are very limited.
Therefore, how to reduce the computational complexity while guaranteeing the measurement quality is a technical problem to be solved urgently.
Summary
Embodiments of the present application provide an image similarity measurement method and a device therefor that can reduce the computational complexity while guaranteeing the measurement quality.
In a first aspect, an image similarity measurement method is provided, comprising: acquiring depth features of a first image and depth features of a second image, the depth features including pixel features; and determining the similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image.
In the solution of the present application, the similarity is obtained mainly from the depth features of the two images rather than from pixels alone, so, compared with methods such as MSE and SSIM, the metric does not contradict human eye perception. Moreover, compared with obtaining the perceptual distance from a traditional deep neural network model, many network layers are not needed: the structure is simpler, the amount of computation is smaller, and the computational complexity is lower.
With reference to the first aspect, in some implementations, the depth features further include at least one of the following image features: edge features, texture features, structural features, brightness features, or color features. Extracting more kinds of image features for the similarity measurement effectively improves the accuracy of the measurement.
With reference to the first aspect, in some implementations, when acquiring the depth features of the first image and the depth features of the second image, a first neural network may be used to perform feature extraction on the first image to obtain the depth features of the first image, and a second neural network may be used to perform feature extraction on the second image to obtain the depth features of the second image. A traditional deep neural network model needs, for example, 5 to 10 convolutional layers to obtain the perceptual distance, but analysis shows that the shallower the layer, the more image information (image features) its depth features contain, while the deeper the layer, the more its depth features relate to the specific semantic perception task; thus, for a deep network computing the perceptual distance, the depth features obtained at a deep layer, say the 5th or 10th layer, are almost exclusively related to semantic information and have low correlation with image quality information. The embodiments of the present application exploit this property and use only the first or second convolutional layer of a typical deep neural network model to extract the depth features of the image; such depth features contain richer image features, and computing the perceptual distance from them simplifies the network structure; for example, a 10-layer deep neural network can be simplified into a 2-layer network, greatly reducing the computational complexity.
As stated above, compared with MSE, using depth features instead of the pixel domain and adding perceptual information dimensions (the depth features also include texture features and other image features) instead of single, isolated pixel differences to measure the perceptual distance reduces the cases where the evaluation results contradict subjective human visual quality assessment, and the method is therefore superior to MSE; determining the weight parameters of the feature distances of each dimension by training approximates the perceptual characteristics of subjective human vision as closely as possible. At the same time, compared with a deep neural network model, the method of the present application uses one to two convolutional layers, reducing the amount of computation of the algorithm.
With reference to the first aspect, in some implementations, when determining the similarity between the first image and the second image, a third neural network may be used to convolve the residual of the depth features of the first image and the depth features of the second image to obtain a first perceptual distance between the two images; the first perceptual distance represents the similarity, where a larger value means a lower similarity and, conversely, a smaller value means a higher similarity. The third neural network may be a one-layer convolutional network, and the residual may be a mean squared error.
With reference to the first aspect, in some implementations, to determine the concrete values of the parameter set involved in the solution of the present application, a big-data training approach is used to determine the optimal parameters; compared with traditional manual parameter setting, this is more accurate, more flexible, and performs better. That is, the neural network can be trained with training data to obtain its parameters (or, equivalently, to update them); the training data include images to be trained and quality labels of those images. A quality label may be a score for the image to be trained: for example, if the quality score of distorted image X is 75 points (out of 100), X can serve as the image to be trained and 75 as its label.
It should be understood that the neural network here may include any one or more of the neural networks in the embodiments of the present application, for example at least one of the first, second, third, or fourth neural networks.
Optionally, a second perceptual distance between the first image and the second image may be obtained, and the similarity obtained according to the first perceptual distance and the second perceptual distance.
With reference to the first aspect, in some implementations, a fourth neural network may be used to convolve the depth features of the first image and the depth features of the second image to obtain the second perceptual distance, and the similarity is obtained according to the first and second perceptual distances. The fourth neural network may be a deformable convolutional network.
With reference to the first aspect, in some implementations, structure and texture indices may be computed from the depth features of the first image and the depth features of the second image and used as the second perceptual distance; the structure and texture indices may further be weighted, and the weighted structure and texture indices used as the second perceptual distance.
Optionally, the first and second perceptual distances may be superimposed and the sum used as the similarity measure, or the two may be weighted and then superimposed, with the weighted sum used as the similarity measure.
In a second aspect, an image similarity measurement device is provided, comprising modules for executing the method of any implementation of the first aspect.
In a third aspect, a computing device is provided, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor performing, when the stored program is executed, the method of any implementation of the first aspect. The device may be a vehicle-mounted terminal, a host computer, a computer, a server, a cloud device, or another device or system that needs to perform image quality detection, or it may be a component installed in one of the above devices or systems. The device may also be a chip.
In a fourth aspect, a computer-readable medium is provided that stores program code for execution by a device, the program code comprising instructions for performing the method of any implementation of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, it causes the computer to perform the method of any implementation of the first aspect.
In a sixth aspect, a chip is provided, comprising a processor and a data interface; the processor reads, through the data interface, instructions stored in a memory to perform the method of any implementation of the first aspect.
Optionally, as one implementation, the chip may further include the memory, in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor performs the method of any implementation of the first aspect.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the image similarity measurement method provided by an embodiment of the present application.
Fig. 2 is a schematic flowchart of an example of the image similarity measurement method provided by an embodiment of the present application.
Fig. 3 is a schematic flowchart of another example of the image similarity measurement method provided by an embodiment of the present application.
Fig. 4 is a schematic flowchart of yet another example of the image similarity measurement method provided by an embodiment of the present application.
Fig. 5 is a schematic block diagram of the image similarity measurement device provided by an embodiment of the present application.
Fig. 6 is a schematic diagram of the hardware structure of the image similarity measurement device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of the image similarity measurement method provided by an embodiment of the present application. The method can be used for image quality detection of cameras, displays, and the like, for example vehicle-mounted cameras and displays. The solution of the embodiments may be built into such devices in software or hardware form, or may be an image similarity measurement device independent of them.
101. Acquire depth features of a first image and depth features of a second image, the depth features including pixel features.
Optionally, the first image may be a reference image and the second image an image whose quality is to be evaluated, for example an image obtained with some application algorithm. For example, the first image may be an original image and the second image an image obtained by applying super-resolution or colorization processing to the first image.
It should be noted that the first image and the second image may also be interchanged; that is, the first image may be the image whose quality is to be evaluated and the second image the reference image. For ease of understanding, the following mainly takes the first image as the reference image and the second image as the image to be evaluated.
Optionally, the first image and/or the second image may be acquired by a sensing device such as a camera, or read from a storage device.
Optionally, feature extraction may be performed on the acquired first and second images to obtain the depth features of the first image and the depth features of the second image. For example, a first neural network may be used to extract features from the first image to obtain its depth features, and a second neural network to extract features from the second image to obtain its depth features. The two networks may be the same or different: for example, both may be a one-layer convolutional network or a deep neural network, and they may use the same or different convolution kernels.
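As an illustration only, a minimal PyTorch sketch of such a one-layer extractor follows; the 3-channel input, the 64-channel output, the 3x3 kernel, and the ReLU activation are assumptions of the sketch, since the embodiment fixes none of these choices.

    import torch
    import torch.nn as nn

    # A one-layer convolutional "depth feature" extractor, as permitted above.
    class DepthFeatureExtractor(nn.Module):
        def __init__(self, in_ch=3, feat_ch=64):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)

        def forward(self, img):
            return torch.relu(self.conv(img))

    net1 = DepthFeatureExtractor()  # first neural network (reference image)
    net2 = DepthFeatureExtractor()  # second neural network (image under evaluation)
    ref = torch.rand(1, 3, 128, 128)
    dist = torch.rand(1, 3, 128, 128)
    feat1, feat2 = net1(ref), net2(dist)  # depth feature #1 and #2, 64 channels each

Instantiating two separate networks matches the allowance that the two extractors may differ; sharing one instance would realize the case where they are the same.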
Depth features can be understood as features, extracted with a deep neural network or the like, that include the pixel features of the image. In some implementations, besides pixel features, the depth features may include at least one of the following image features: edge features, texture features, structural features, brightness features, or color features. Extracting more kinds of image features for the similarity measurement effectively improves the accuracy of the measurement.
It should also be understood that pixels and pixel features are different: what is usually called a pixel is the pixel value itself, while a pixel feature is a feature of a pixel neighborhood obtained by feature extraction and contains more comprehensive, broader information. In effect, pixel features capture image information more comprehensively, while a pixel is only a single piece of image information.
102. Determine the similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image.
Optionally, a third neural network may be used to convolve the residual of the depth features of the first image and the depth features of the second image to obtain a first perceptual distance between the two images, whose value represents the similarity: the larger the value of the first perceptual distance, the lower the similarity; conversely, the smaller the value, the higher the similarity. The third neural network may be a one-layer convolutional network, and the residual may be a mean squared error.
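A minimal sketch of this step under the same assumptions as the extractor sketch above; the 1x1 kernel of the third network and the final spatial mean are illustrative choices, as the text only requires a one-layer convolution over the residual.

    import torch
    import torch.nn as nn

    feat1 = torch.rand(1, 64, 128, 128)  # depth features of the first image
    feat2 = torch.rand(1, 64, 128, 128)  # depth features of the second image

    conv3 = nn.Conv2d(64, 1, kernel_size=1)  # third neural network: one conv layer

    def first_perceptual_distance(f1, f2):
        residual = (f1 - f2) ** 2   # squared residual, in the spirit of MSE
        fused = conv3(residual)     # learned fusion of the 64 channels
        return fused.mean()         # scalar: the larger, the lower the similarity

    d1 = first_perceptual_distance(feat1, feat2)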
Optionally, a second perceptual distance between the first image and the second image may be obtained, and the similarity obtained according to the first perceptual distance and the second perceptual distance.
In some implementations, a fourth neural network may be used to convolve the depth features of the first image and the depth features of the second image to obtain the second perceptual distance, and the similarity is then obtained according to the first and second perceptual distances. The fourth neural network may be a deformable convolutional network. For example, when a generative adversarial network (GAN) is used to restore an image, false details may be introduced into the image; the depth features obtained in this application can then be convolved by a deformable convolutional network to obtain the second perceptual distance, which is combined with the first perceptual distance obtained by the method above to give the similarity of the two images.
In other implementations, structure and texture indices may be computed from the depth features of the first image and the depth features of the second image and used as the second perceptual distance; the indices may further be weighted, with the weighted structure and texture indices used as the second perceptual distance.
Optionally, the first and second perceptual distances may be superimposed and the sum used as the similarity measure; the larger of the two may be used as the similarity measure; or the two may be weighted and then superimposed, with the weighted sum used as the similarity measure.
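The three combination rules can be written down directly; the numeric values and the weights w1, w2 below are purely illustrative placeholders, not values given by the embodiment.

    # Combining the two perceptual distances into one similarity measure.
    d1 = 0.12  # first perceptual distance (illustrative value)
    d2 = 0.30  # second perceptual distance (illustrative value)

    similarity_sum = d1 + d2                 # plain superposition
    similarity_max = max(d1, d2)             # larger of the two distances
    w1, w2 = 0.7, 0.3                        # assumed weights
    similarity_weighted = w1 * d1 + w2 * d2  # weighted superposition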
In the method shown in Fig. 1, to determine the concrete values of the parameter set of the claimed method, a big-data training approach is used to determine the optimal parameters; compared with traditional manual parameter setting, this is more accurate, more flexible, and performs better. That is, the neural network can be trained with training data to obtain its parameters (or, equivalently, to update them); the training data include images to be trained and quality labels of those images. A quality label may be a score for the image to be trained: for example, if the quality score of distorted image X is 75 points (out of 100), X can serve as the image to be trained and 75 as its label. The neural network here may include any one or more of the neural networks in the embodiments of the present application, for example at least one of the first, second, third, or fourth neural networks. In other words, the embodiments use a trained neural network to obtain the perceptual distance, which is more accurate than the traditional approach of manually setting parameters.
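One plausible training loop is sketched below. The loss function and the mapping from a 0-100 quality score to a target distance (1 - score/100) are assumptions for illustration; the text only states that the networks are trained on images with quality labels.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Conv2d(64, 1, kernel_size=1)      # stands in for the trainable networks
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    feat_residual = torch.rand(1, 64, 128, 128)  # squared depth-feature residual of image X
    target = torch.tensor(1.0 - 75.0 / 100.0)    # quality label 75/100 -> target distance 0.25

    for step in range(100):
        pred = model(feat_residual).mean()       # predicted first perceptual distance
        loss = F.mse_loss(pred, target)          # regress onto the labeled target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()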
In the method shown in Fig. 1, the similarity is obtained mainly from the depth features of the two images rather than from pixels alone, so, compared with MSE, SSIM, and similar methods, the metric does not contradict human eye perception. Moreover, compared with obtaining the perceptual distance from a traditional deep neural network model, many network layers are not needed: the structure is simpler, the amount of computation is smaller, and the computational complexity is lower. A traditional deep model needs, for example, 5 to 10 layers to obtain the perceptual distance, but the shallower the layer, the more image information (image features) its depth features contain, and the deeper the layer, the more its depth features relate to the specific task; for a deep network computing the perceptual distance, the features obtained at a deep layer, say the 5th or 10th, are almost exclusively related to the task and hardly related to the image information. The embodiments of the present application exploit this property and use only a one-layer deep neural network to extract the depth features of the image; such depth features contain richer image features, and computing the perceptual distance from them effectively splits the task and simplifies the complex network structure; for example, a 10-layer deep neural network can be simplified into a 2-layer network, greatly reducing the computational complexity.
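The idea of keeping only the shallowest layer(s) of a standard deep model can be shown in two lines; VGG16 is an assumed backbone chosen for illustration, as the embodiment names no particular model.

    import torch
    from torchvision.models import vgg16

    # Slice off the first convolution + ReLU and use the slice as the extractor.
    backbone = vgg16(weights=None).features[:2]  # Conv2d(3, 64, 3x3) + ReLU
    img = torch.rand(1, 3, 224, 224)
    shallow_features = backbone(img)             # 64-channel depth features
    print(shallow_features.shape)                # torch.Size([1, 64, 224, 224])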
As stated above, compared with MSE, using depth features instead of the pixel domain and adding perceptual information dimensions (the depth features also include texture features and other image features) instead of single, isolated pixel differences to measure the perceptual distance reduces the occasional contradictions between the evaluation results and subjective human visual quality assessment, and the method is therefore superior to MSE; training the weight parameters of the perceptual-distance fusion approximates the perceptual-distance characteristics of subjective human vision as closely as possible. At the same time, compared with a deep neural network model, the method shown in Fig. 1 uses one to two convolutional layers, and the smaller number of layers reduces the amount of computation of the algorithm, as shown in Table 1, which compares the computation of the method of the present application with that of the deep-model methods learned perceptual image patch similarity (LPIPS) and deep image structure and texture similarity (DISTS).
Table 1

    Method                                    Computation (FLOPs)
    DISTS                                     40.125 G
    LPIPS                                     40.141 G
    New method proposed in this application   0.309 G
Furthermore, when the method shown in Fig. 1 is applied in artificial intelligence models, for example as the similarity measure in the loss functions of object detection and semantic segmentation models, experimental tests show that, compared with other methods, the present application improves the perceptual accuracy of the images produced by the trained models' encoding and decoding.
For further understanding, Fig. 1 is described below in conjunction with Figs. 2 to 4, which can be regarded as specific examples of Fig. 1.
Fig. 2 is a schematic flowchart of an example of the image similarity measurement method provided by an embodiment of the present application. As shown in Fig. 2, deep feature extraction network #1 performs feature extraction on the reference image to obtain depth feature #1, and deep feature extraction network #2 performs feature extraction on the distorted image to obtain depth feature #2. Deep feature extraction network #1 can be regarded as an example of the first neural network, deep feature extraction network #2 as an example of the second neural network, the reference image as an example of the first image, and the distorted image as an example of the second image.
As shown in Fig. 2, the depth-feature residual of depth feature #1 and depth feature #2 is computed, and a convolutional network then convolves the residual to obtain the perceptual distance. The convolutional network can be regarded as an example of the third neural network.
In the method shown in Fig. 2, each channel of the depth features can be regarded as a different "image", and the basic idea is then to reuse pixel-domain image quality assessment or similarity measurement methods, such as MSE or SSIM. The depth features may, for example, have 64 channels. Because there are multiple channels and different channels carry different information (one channel mainly image edge information, i.e., edge features; one mainly image texture information, i.e., texture features; one image color information, i.e., color features), obtaining the final scalar similarity value requires fusing the information of the different channels with a specific weight configuration, or taking a weighted average of the per-channel metrics.
The method shown in Fig. 2 can also be regarded as a weighted depth-feature mean squared error: it draws on the idea of MSE but differs significantly in that the difference is fed into the convolutional network after being squared and before being averaged, converting the 64 channels into 1 channel before the weighted mean is taken; it is therefore more accurate than MSE.
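The 64-to-1-channel convolution is exactly a learned weighted average over channels, one weight per channel (edges, texture, color, and so on). A minimal check of that equivalence, with random placeholder data:

    import torch
    import torch.nn as nn

    sq_residual = torch.rand(1, 64, 32, 32)             # squared depth-feature difference
    conv = nn.Conv2d(64, 1, kernel_size=1, bias=False)  # the channel-fusing convolution

    out_conv = conv(sq_residual)
    w = conv.weight.view(1, 64, 1, 1)                   # the 64 channel weights
    out_manual = (sq_residual * w).sum(dim=1, keepdim=True)

    assert torch.allclose(out_conv, out_manual, atol=1e-5)
    wfmse = out_conv.mean()                             # weighted depth-feature MSE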
Fig. 3 is a schematic flowchart of another example of the image similarity measurement method provided by an embodiment of the present application. In the method shown in Fig. 3, the perceptual distance is obtained mainly by combining the mean squared error with the structure and texture indices.
As shown in Fig. 3, deep feature extraction network #1 performs feature extraction on the reference image to obtain depth feature #1, and deep feature extraction network #2 performs feature extraction on the distorted image to obtain depth feature #2; the networks are examples of the first and second neural networks, and the reference and distorted images are examples of the first and second images, respectively.
As shown in Fig. 3, the depth-feature residual of depth feature #1 and depth feature #2 is computed, and a convolutional network then convolves it to obtain the MSE index; the MSE index here can be regarded as an example of the first perceptual distance, and the convolutional network as an example of the third neural network.
As shown in Fig. 3, structure and texture indices are computed from depth feature #1 and depth feature #2 and then weighted; the weighted structure and texture indices here can be regarded as an example of the second perceptual distance.
The method shown in Fig. 3 can be expressed by the formula

$$\mathrm{STSIM} = \sum_{i} \sum_{j} \left( \alpha_{ij}\, s_{ij} + \beta_{ij}\, t_{ij} \right),$$

where STSIM denotes the weighted structure and texture index, $\alpha$ and $\beta$ denote the weights, $s$ denotes the structure index, $t$ the texture index, the subscript $i$ denotes the feature layer (0 referring to the original image, 1 to the depth features), and $j$ denotes the channel. The structure similarity index of image a and image b is expressed by the formula

$$s(a,b) = \frac{2\sigma_{a,b} + C}{\sigma_a^2 + \sigma_b^2 + C},$$

where $C$ is a constant, $\sigma_a^2$ and $\sigma_b^2$ denote the pixel variance of the respective image, and $\sigma_{a,b}$ denotes the covariance of the pixel values of image a and image b. The texture similarity index of image a and image b is expressed by the formula

$$t(a,b) = \frac{2\mu_a \mu_b + C}{\mu_a^2 + \mu_b^2 + C},$$

where $C$ is the constant $10^{-6}$ and $\mu$ denotes the mean pixel value of the respective image.
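The reconstructed formulas can be implemented per channel in a few lines. The sketch below computes the per-channel structure and texture indices from global channel statistics (an assumption, since the text does not specify the pooling window) and forms the weighted sum for one feature layer; alpha and beta are random placeholders for the trained weights.

    import torch

    def structure_texture(a, b, C=1e-6):
        # a, b: depth features of shape (channels, H, W)
        mu_a = a.mean(dim=(1, 2))
        mu_b = b.mean(dim=(1, 2))
        var_a = a.var(dim=(1, 2), unbiased=False)
        var_b = b.var(dim=(1, 2), unbiased=False)
        cov = ((a - mu_a[:, None, None]) * (b - mu_b[:, None, None])).mean(dim=(1, 2))
        s = (2 * cov + C) / (var_a + var_b + C)                  # structure index per channel
        t = (2 * mu_a * mu_b + C) / (mu_a ** 2 + mu_b ** 2 + C)  # texture index per channel
        return s, t

    feat1 = torch.rand(64, 32, 32)
    feat2 = torch.rand(64, 32, 32)
    s, t = structure_texture(feat1, feat2)
    alpha = torch.rand(64)                # placeholders for the trained weights alpha_ij
    beta = torch.rand(64)                 # placeholders for the trained weights beta_ij
    stsim = (alpha * s + beta * t).sum()  # one feature layer (i = 1) of the formula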
As shown in Fig. 3, the weighted feature mean squared error (WFMSE) index on the feature domain and the weighted structure and texture indices are superimposed to obtain the perceptual distance. In the method of Fig. 3, the per-channel results of the structure and texture indices are weighted and summed to obtain the final structure and texture indices; the method can therefore also be called weighted feature structural similarity (WFSSIM) on the feature domain. WFSSIM is more accurate than SSIM while, compared with similar methods such as DISTS, its computational complexity is reduced by a factor of about 130.
Fig. 4 is a schematic flowchart of yet another example of the image similarity measurement method provided by an embodiment of the present application. As shown in Fig. 4, deep feature extraction network #1 performs feature extraction on the reference image to obtain depth feature #1, and deep feature extraction network #2 performs feature extraction on the distorted image to obtain depth feature #2; the networks are examples of the first and second neural networks, and the reference and distorted images are examples of the first and second images, respectively.
As shown in Fig. 4, the depth-feature residual of depth feature #1 and depth feature #2 is computed, and a convolutional network then convolves it to obtain the WFMSE index; the WFMSE index here can be regarded as an example of the first perceptual distance, and the convolutional network as an example of the third neural network.
As shown in Fig. 4, a deformable convolutional network convolves depth feature #1 and depth feature #2 to obtain distance index #2; distance index #2 here can be regarded as an example of the second perceptual distance, and the deformable convolutional network as an example of the fourth neural network.
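A hedged sketch of this step: concatenating the two depth features and driving torchvision's DeformConv2d with a small offset branch is one plausible arrangement, not one mandated by the text, and the layer sizes are illustrative.

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    feat1 = torch.rand(1, 64, 32, 32)  # depth feature #1
    feat2 = torch.rand(1, 64, 32, 32)  # depth feature #2
    x = torch.cat([feat1, feat2], dim=1)                     # (1, 128, 32, 32)

    # Offsets are predicted from the input, the standard way to drive DeformConv2d.
    offset_net = nn.Conv2d(128, 2 * 3 * 3, kernel_size=3, padding=1)
    deform = DeformConv2d(128, 1, kernel_size=3, padding=1)  # fourth neural network

    offset = offset_net(x)                                   # per-location sampling offsets
    distance_index_2 = deform(x, offset).mean()              # distance index #2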
As shown in Fig. 4, the WFMSE index and distance index #2 are superimposed to obtain the perceptual distance. The method shown in Fig. 4 can be used to evaluate and improve the performance of GAN networks.
The image similarity measurement device of the embodiments of the present application is described below with reference to the accompanying drawings.
Fig. 5 is a schematic block diagram of the image similarity measurement device provided by an embodiment of the present application. The image similarity measurement device 2000 shown in Fig. 5 includes an acquisition unit 2001 and a processing unit 2002.
The acquisition unit 2001 and the processing unit 2002 can be used to perform the image similarity measurement method of the embodiments: specifically, the acquisition unit 2001 can perform step 101 above, and the processing unit 2002 can perform step 102. It should be understood that the processing unit 2002 in the device 2000 may be equivalent to the processor 3002 in the device 3000 below.
Fig. 6 is a schematic diagram of the hardware structure of the image similarity measurement device provided by an embodiment of the present application. The image similarity measurement device 3000 shown in Fig. 6 (which may specifically be a computer device) includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004; the memory 3001, the processor 3002, and the communication interface 3003 are communicatively connected to one another through the bus 3004.
The memory 3001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 3001 may store a program; when the program stored in the memory 3001 is executed by the processor 3002, the processor 3002 and the communication interface 3003 perform the steps of the image similarity measurement method of the embodiments of the present application.
The processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a field-programmable gate array (FPGA), or one or more integrated circuits, used to execute related programs so as to realize the functions required by the units of the image similarity measurement device of the embodiments, or to perform the image similarity measurement method of the method embodiments of the present application.
The processor 3002 may also be an integrated circuit chip with signal processing capability. In the course of implementation, each step of the image similarity measurement method of the present application may be completed by an integrated hardware logic circuit in the processor 3002 or by instructions in software form. The processor 3002 may further be a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments may be embodied directly as being completed by a hardware decoding processor, or completed by a combination of the hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory 3001; the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, completes the functions required by the units included in the image similarity measurement device of the embodiments, or performs the image similarity measurement method of the method embodiments.
The communication interface 3003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the device 3000 and other devices or communication networks. For example, the image to be evaluated, or the depth features of the image to be evaluated, may be obtained through the communication interface 3003.
The bus 3004 may include a pathway for transferring information between the components of the device 3000 (for example, the memory 3001, the processor 3002, and the communication interface 3003).
It should be noted that, although the device 3000 shown in Fig. 6 shows only a memory, a processor, and a communication interface, those skilled in the art will understand that, in a specific implementation, the device 3000 also includes other components necessary for normal operation; depending on specific needs, it may also include hardware components implementing other additional functions; and it may instead include only the components necessary to realize the embodiments of the present application, without all the components shown in Fig. 6.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different means to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, methods, and devices may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and in actual implementation there may be other divisions; multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. Further, the mutual or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a Universal Serial Bus flash disk (UFD, also called a U disk or USB drive), a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above is only the specific implementation of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, and these shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. An image similarity measurement method, characterized by comprising:
    acquiring depth features of a first image and depth features of a second image, the depth features including pixel features;
    determining a similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image.
  2. The method according to claim 1, characterized in that the depth features further include at least one of the following image features: edge features, texture features, structural features, brightness features, or color features.
  3. The method according to claim 1 or 2, characterized in that acquiring the depth features of the first image and the depth features of the second image comprises:
    performing feature extraction on the first image with a first neural network to obtain the depth features of the first image;
    performing feature extraction on the second image with a second neural network to obtain the depth features of the second image.
  4. The method according to any one of claims 1 to 3, characterized in that determining the similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image comprises:
    convolving, with a third neural network, a residual of the depth features of the first image and the depth features of the second image to obtain a first perceptual distance between the first image and the second image, the first perceptual distance being used to represent the similarity.
  5. The method according to claim 4, characterized in that the method further comprises:
    convolving, with a fourth neural network, the depth features of the first image and the depth features of the second image to obtain a second perceptual distance between the first image and the second image, the fourth neural network being a deformable convolutional network;
    obtaining the similarity according to the first perceptual distance and the second perceptual distance.
  6. The method according to claim 4 or 5, characterized in that parameters of the third neural network and/or the fourth neural network are obtained by training with data, the training data including images to be trained and quality labels of the images to be trained.
  7. An image similarity measurement device, characterized by comprising:
    an acquisition unit configured to acquire depth features of a first image and depth features of a second image, the depth features including pixel features;
    a processing unit configured to determine a similarity between the first image and the second image according to the depth features of the first image and the depth features of the second image.
  8. The device according to claim 7, characterized in that the depth features further include at least one of the following image features: edge features, texture features, structural features, brightness features, or color features.
  9. The device according to claim 7 or 8, characterized in that the acquisition unit is specifically configured to:
    perform feature extraction on the first image with a first neural network to obtain the depth features of the first image;
    perform feature extraction on the second image with a second neural network to obtain the depth features of the second image.
  10. The device according to any one of claims 7 to 9, characterized in that the processing unit is specifically configured to:
    convolve, with a third neural network, a residual of the depth features of the first image and the depth features of the second image to obtain a first perceptual distance between the first image and the second image, the first perceptual distance being used to represent the similarity.
  11. The device according to claim 10, characterized in that the processing unit is further configured to:
    convolve, with a fourth neural network, the depth features of the first image and the depth features of the second image to obtain a second perceptual distance between the first image and the second image, the fourth neural network being a deformable convolutional network;
    obtain the similarity according to the first perceptual distance and the second perceptual distance.
  12. The device according to claim 10 or 11, characterized in that parameters of the third neural network and/or the fourth neural network are obtained by training with data, the training data including images to be trained and quality labels of the images to be trained.
  13. A computer-readable storage medium, characterized in that the computer-readable medium stores program code for execution by a device, the program code comprising instructions for performing the method according to any one of claims 1 to 6.
  14. A computing device, characterized in that the device comprises a processor and a data interface, the processor reading, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1 to 6.
  15. A computer program product, characterized in that, when the computer program is executed on a computer, it causes the computer to perform the method according to any one of claims 1 to 6.
PCT/CN2022/139238 2022-01-27 2022-12-15 Image similarity measurement method and device therefor WO2023142753A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210097126.XA CN116563193A (zh) 2022-01-27 2022-01-27 Image similarity measurement method and device therefor
CN202210097126.X 2022-01-27

Publications (1)

Publication Number Publication Date
WO2023142753A1 (zh)

Family

ID=87470374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139238 WO2023142753A1 (zh) 2022-12-15 Image similarity measurement method and device therefor

Country Status (2)

Country Link
CN (1) CN116563193A (zh)
WO (1) WO2023142753A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528776A (zh) * 2015-08-07 2016-04-27 上海仙梦软件技术有限公司 Saliency-detail-preserving quality assessment method for the JPEG image format
CN107578404A (zh) * 2017-08-22 2018-01-12 浙江大学 Objective full-reference stereoscopic image quality assessment method based on visually salient feature extraction
WO2021083241A1 (zh) * 2019-10-31 2021-05-06 Oppo广东移动通信有限公司 Facial image quality assessment method, feature extraction model training method, image processing system, computer-readable medium, and wireless communication terminal
CN111311595A (zh) * 2020-03-16 2020-06-19 清华大学深圳国际研究生院 No-reference image quality assessment method and computer-readable storage medium
CN112052868A (zh) * 2020-06-15 2020-12-08 上海集成电路研发中心有限公司 Model training method, image similarity measurement method, terminal, and storage medium
CN112529846A (zh) * 2020-11-25 2021-03-19 深圳市商汤科技有限公司 Image processing method and apparatus, electronic device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883698A (zh) * 2023-09-07 2023-10-13 腾讯科技(深圳)有限公司 Image comparison method and related apparatus
CN116883698B (zh) * 2023-09-07 2023-12-26 腾讯科技(深圳)有限公司 Image comparison method and related apparatus

Also Published As

Publication number Publication date
CN116563193A (zh) 2023-08-08

Similar Documents

Publication Publication Date Title
  • JP7058373B2 (ja) Method, apparatus, device, and storage medium for detecting and locating lesions in medical images
  • CN109886997B (zh) Recognition-box determination method and apparatus based on object detection, and terminal device
  • JP2015215895A (ja) Method and system for restoring depth values of a depth image
  • CN110570435B (zh) Method and apparatus for damage segmentation of vehicle damage images
  • US11967181B2 Method and device for retinal image recognition, electronic equipment, and storage medium
  • US11310475B2 Video quality determination system and method
  • CN112784874B (zh) Binocular-vision stereo matching method and apparatus, electronic device, and storage medium
  • WO2023142753A1 (zh) Image similarity measurement method and device therefor
  • CN110766708B (zh) Image comparison method based on contour similarity
  • CN113240655B (zh) Method, storage medium, and apparatus for automatically detecting the type of a fundus image
  • CN113706472B (zh) Highway pavement defect detection method, apparatus, device, and storage medium
  • CN112396605B (zh) Network training method and apparatus, image recognition method, and electronic device
  • KR20230116735A (ko) Three-dimensional pose adjustment method and apparatus, electronic device, and storage medium
  • CN113313680A (zh) Auxiliary prognosis prediction method and system for colorectal cancer pathology images
  • CN110930386B (zh) Image processing method, apparatus, device, and storage medium
  • CN111275673A (zh) Lung lobe extraction method, apparatus, and storage medium
  • CN112966687B (zh) Image segmentation model training method and apparatus, and communication device
  • CN111401102A (zh) Deep learning model training method and apparatus, electronic device, and storage medium
  • Tolie et al. DICAM: Deep Inception and Channel-wise Attention Modules for underwater image enhancement
  • CN114511556B (zh) Gastric mucosal bleeding risk early-warning method and apparatus, and medical image processing device
  • CN109377524B (zh) Single-image depth recovery method and system
  • CN110633630A (zh) Behavior recognition method and apparatus, and terminal device
  • CN113593707B (zh) Early gastric cancer model training method and apparatus, computer device, and storage medium
  • CN107273801B (zh) Method for detecting abnormal points in video multi-object tracking
  • CN116797510A (zh) Image processing method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22923539

Country of ref document: EP

Kind code of ref document: A1