WO2020224405A1 - Image processing method and apparatus, computer-readable medium, and electronic device - Google Patents

Image processing method and apparatus, computer-readable medium, and electronic device

Info

Publication number
WO2020224405A1
WO2020224405A1 (PCT/CN2020/085021)
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
feature vector
processed
image processing
Prior art date
Application number
PCT/CN2020/085021
Other languages
English (en)
French (fr)
Inventor
金坤
赵世杰
易阳
李峰
左小祥
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2021542181A (patent JP7163504B2)
Priority to EP20801637.8A (patent EP3968180A4)
Publication of WO2020224405A1
Priority to US17/352,822 (patent US11978241B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/56Information retrieval; Database structures therefor; File system structures therefor of still image data having vectorial format
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2111Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Definitions

  • This application relates to the field of computer and communication technologies, and in particular to an image processing method, device, computer readable medium, and electronic equipment.
  • In image processing, the feature vectors extracted from images greatly affect the accuracy of the processing results.
  • The feature extraction methods proposed by the related art have many unreasonable aspects, so the extracted feature vectors are inaccurate, which in turn affects the final processing result.
  • the embodiments of the present application provide an image processing method, device, computer-readable medium, and electronic device, which can improve, at least to a certain extent, the accuracy and rationality of the determined image feature vector.
  • an image processing method is provided, which includes: extracting a feature map of an image to be processed; dividing the feature map into multiple target regions; determining the weight of each target region according to the feature vector of each target region; and generating the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
  • an image processing method is provided, including: inputting an image to be processed into an image processing model, the image processing model including a convolution module, a visual attention module, and a feature merging module, where:
  • the convolution module is used to extract the feature map of the image to be processed;
  • the visual attention module is used to divide the feature map into a plurality of target regions and determine the weight of each target region according to the feature vector of each target region;
  • the feature merging module is used to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region; the feature vector of the image to be processed generated by the image processing model is then obtained.
  • an image processing device is provided, including: an extraction unit, configured to extract the feature map of an image to be processed; a dividing unit, configured to divide the feature map into multiple target regions; a determining unit, configured to determine the weight of each target region according to the feature vector of each target region; and a generating unit, configured to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
  • the division unit is configured to: divide the feature map according to a predetermined region division method to obtain the multiple target regions; or perform an ROI (Region Of Interest) pooling operation on the feature map to map ROIs into the feature map to obtain the multiple target regions.
  • the dividing unit is configured to: divide the feature map according to at least one predetermined region division method to obtain the feature map regions corresponding to each region division method; and use the feature map regions corresponding to the various region division methods as the target regions.
  • the determining unit is configured to: perform dimensionality reduction on the feature vector of each target region to obtain the feature scalar corresponding to each target region; and normalize the feature scalars corresponding to the target regions to obtain the weight of each target region.
  • the determining unit is configured to: input the feature vector of each target region into a fully connected layer whose output dimension is 1, and determine the feature scalar corresponding to each target region according to the output of the fully connected layer.
  • the generating unit is configured to: calculate the weighted feature vector of each target region according to the weight of each target region and the feature vector of each target region; and generate the feature vector of the image to be processed according to the weighted feature vectors of the target regions.
  • the generating unit is configured to: merge the weighted feature vectors of the target regions to obtain the feature vector of the image to be processed; or merge the weighted feature vectors of the target regions and normalize the merged feature vector to obtain the feature vector of the image to be processed.
  • the image processing device further includes: a retrieval unit configured to retrieve an image matching the image to be processed according to the feature vector of the image to be processed.
  • an image processing device is provided, including: a processing unit for inputting an image to be processed into an image processing model, the image processing model including a convolution module, a visual attention module, and a feature merging module, where the convolution module is used to extract the feature map of the image to be processed, the visual attention module is used to divide the feature map into a plurality of target regions and determine the weight of each target region according to the feature vector of each target region, and the feature merging module is configured to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region; and an acquisition unit for obtaining the feature vector of the image to be processed generated by the image processing model.
  • the image processing device further includes: a training unit configured to obtain image samples marked with feature vectors, and train the image processing model through the image samples.
  • the processing unit is configured to extract the feature map of the image to be processed through any convolution layer in the convolution module.
  • a computer-readable medium having a computer program stored thereon, and the computer program, when executed by a processor, implements the image processing method as described in the foregoing embodiment.
  • an electronic device is provided, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method described in the foregoing embodiments.
  • the feature map of the image to be processed is divided into multiple target regions, and the weight of each target region is determined according to the feature vector of each target region, so that the feature vector of the image to be processed is generated according to the weight of each target region and the feature vector of each target region.
  • When determining the feature vector of the image, each target region can thus be weighted according to the feature vectors of the target regions in the image, thereby weakening the insignificant regions of the image and highlighting its salient regions, which effectively improves the accuracy and rationality of the generated image feature vector and helps improve the effect of image retrieval.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied;
  • Fig. 2 shows a flowchart of an image processing method according to an embodiment of the present application
  • FIG. 3 shows a flowchart of determining the weight of each target area according to an embodiment of the present application
  • FIG. 4 shows a flowchart of generating a feature vector of an image to be processed according to the weight of each target area and the feature vector of each target area according to an embodiment of the present application
  • Fig. 5 shows a flowchart of an image processing method according to an embodiment of the present application
  • Fig. 6 shows a flowchart of an image processing method according to an embodiment of the present application
  • Fig. 7 shows a schematic diagram of an area division manner according to an embodiment of the present application.
  • Fig. 8 shows a schematic structural diagram of an image retrieval model according to an embodiment of the present application.
  • Fig. 9 shows a schematic diagram of the weight of each region in an image according to an embodiment of the present application.
  • FIG. 10 shows a schematic diagram of image retrieval results according to an embodiment of the present application.
  • Fig. 11 shows a block diagram of an image processing device according to an embodiment of the present application.
  • Fig. 12 shows a block diagram of an image processing device according to an embodiment of the present application.
  • FIG. 13 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • the system architecture may include terminal devices (one or more of the smart phone 101, the tablet computer 102, and the portable computer 103 shown in FIG. 1; of course, it may also be a desktop computer, etc.), a network 104, and a server 105.
  • the network 104 is used as a medium for providing a communication link between the terminal device and the server 105.
  • the network 104 may include various connection types, such as wired communication links, wireless communication links, and so on.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks and servers.
  • the server 105 may be a server cluster composed of multiple servers.
  • the user can specify the image to be processed through a terminal device (such as the smart phone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1); for example, the user sends the image to be processed to the server 105 through the terminal device, or selects the image to be processed from the images provided by the server 105 through the terminal device.
  • after determining the image to be processed, the server 105 can extract its feature map, for example through any convolutional layer in a CNN (Convolutional Neural Network) model. After extracting the feature map of the image to be processed, the feature map can be divided into multiple target regions; the weight of each target region can then be determined according to the feature vector of each target region, and the feature vector of the image to be processed can be generated according to the weights and the feature vectors of the target regions.
  • the technical solution of the embodiments of the present application can weight each target region according to the feature vector of each target region in the image, thereby weakening the insignificant regions in the image and highlighting its salient regions, which effectively improves the accuracy and rationality of the generated image feature vector and is conducive to improving the effect of image processing, such as the effect of image retrieval and the accuracy of image recognition.
  • the image processing method provided in the embodiment of the present application can be executed by the server 105, and accordingly, the image processing device can be set in the server 105.
  • the terminal device may have similar functions to the server, so as to execute the image processing solution provided in the embodiments of the present application.
  • FIG. 2 shows a flowchart of an image processing method according to an embodiment of the present application.
  • the image processing method may be executed by a device having a computing processing function, for example, may be executed by the server 105 shown in FIG. 1.
  • the image processing method includes at least step S210 to step S240, which are described in detail as follows:
  • step S210 the server extracts the feature map of the image to be processed.
  • the image to be processed may be an image for which feature vectors need to be extracted, or an image for retrieval, or an image for recognition.
  • the feature map of the image to be processed can be extracted through any convolutional layer in the CNN model.
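As a concrete illustration of this extraction step, here is a minimal sketch assuming a PyTorch/torchvision backbone; the choice of resnet50 and of layer3 as the tap point is an assumption for illustration, not something the embodiment prescribes.

```python
# Sketch: tapping an intermediate convolutional layer of a CNN to obtain
# the feature map of the image to be processed (hypothetical setup).
import torch
import torchvision.models as models

backbone = models.resnet50(weights=None)
# Keep everything up to and including layer3; drop the pooling/fc head.
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,
)

image = torch.randn(1, 3, 224, 224)      # stand-in for the image to be processed
feature_map = feature_extractor(image)   # shape (1, C, H, W)
print(feature_map.shape)                 # torch.Size([1, 1024, 14, 14])
```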
  • step S220 the server divides the feature map into multiple target regions.
  • the feature map of the image to be processed may be divided according to a predetermined area division manner to obtain the multiple target areas.
  • at least one area division method can be predetermined (for example, three methods); the feature map is then divided by the at least one method to obtain the feature map regions corresponding to each method, and the feature map regions corresponding to the various methods are used as the target regions obtained by the division.
  • step S230 the server determines the weight of each target area according to the feature vector of each target area.
  • the process of determining the weight of each target area in step S230 may include the following steps S310 and S320:
  • step S310 the server performs dimensionality reduction processing on the feature vector of each target area to obtain the feature scalar corresponding to each target area.
  • the feature scalar is a quantity that characterizes the magnitude of a feature.
  • for example, the feature vector of each target region can be input into a fully connected layer whose output dimension is 1, so as to determine the feature scalar corresponding to each target region from the output of the fully connected layer.
  • step S320 the server performs normalization processing on the feature scalar corresponding to each target area to obtain the weight of each target area.
  • the feature scalar corresponding to each target region may be normalized with the L1 norm, the L2 norm, or the softmax function (the normalized exponential function).
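Putting steps S310 and S320 together, a minimal sketch assuming PyTorch; the channel count, the 14-region count, and the random region vectors are placeholders, and a softmax variant is shown alongside an L1-norm alternative.

```python
# Sketch: reduce each region's C-dimensional feature vector to a scalar
# with a fully connected layer of output dimension 1, then normalize the
# scalars into per-region weights.
import torch
import torch.nn as nn

C, num_regions = 1024, 14                     # placeholder sizes
fc = nn.Linear(C, 1, bias=False)              # dimensionality reduction to a scalar

region_vectors = torch.randn(num_regions, C)  # stand-ins for the region features
scalars = fc(region_vectors).squeeze(-1)      # one feature scalar per region

weights = torch.softmax(scalars, dim=0)       # softmax normalization
weights_l1 = scalars / scalars.abs().sum()    # L1-norm alternative
```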
  • the technical solution of the embodiment shown in FIG. 3 makes it possible to determine the weight of each target region according to its feature vector, and then, through the determined weights, to weaken the insignificant regions of the image (such as the background region) and highlight its salient regions (such as the foreground region), which helps improve the accuracy and rationality of the generated image feature vector.
  • step S240 the server generates the feature vector of the image to be processed according to the weight of each target area and the feature vector of each target area.
  • the process of generating the feature vector of the image to be processed according to the weight of each target area and the feature vector of each target area in step S240 may include the following steps S410 and S420:
  • step S410 the server calculates the weighted feature vector of each target area according to the weight of each target area and the feature vector of each target area.
  • the weight of each target region can be multiplied with the feature vector of that region (a scalar-vector product) to obtain the weighted feature vector of each target region.
  • step S420 the server generates the feature vector of the image to be processed according to the weighted feature vector of each target area.
  • the weighted feature vectors of the target regions may be merged to obtain the feature vector of the image to be processed. Alternatively, after merging the weighted feature vectors of the target regions, the merged feature vector can be normalized (for example, with the L2 norm) to obtain the feature vector of the image to be processed.
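The weighting-and-merging step might look as follows; this is a sketch assuming PyTorch and summation as the merge operation (concatenating the weighted vectors would be the other reading of "merge").

```python
# Sketch: weight each region vector, merge by summation, then apply
# L2 normalization to obtain the feature vector of the image.
import torch
import torch.nn.functional as F

def merge_regions(region_vectors: torch.Tensor,
                  weights: torch.Tensor) -> torch.Tensor:
    # region_vectors: (num_regions, C); weights: (num_regions,)
    weighted = weights.unsqueeze(-1) * region_vectors  # scalar-vector products
    merged = weighted.sum(dim=0)                       # merge the weighted vectors
    return F.normalize(merged, p=2, dim=0)             # L2 normalization

feature = merge_regions(torch.randn(14, 1024),
                        torch.softmax(torch.randn(14), dim=0))
```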
  • after the feature vector of the image to be processed is obtained, an image matching the image to be processed can be retrieved according to that feature vector, or further image recognition can be performed based on it.
  • when determining the feature vector of an image, each target region can be weighted according to the feature vector of each target region in the image, which weakens the insignificant regions and highlights the salient regions of the image, effectively improving the accuracy and rationality of the generated image feature vectors and benefiting both image retrieval and image recognition.
  • FIG. 5 shows a flowchart of an image processing method according to an embodiment of the present application.
  • the image processing method may be executed by a device having a computing processing function, for example, it may be executed by the server 105 shown in FIG. 1.
  • the image processing method at least includes steps S510 to S520, which are described in detail as follows:
  • step S510 the image to be processed is input into an image processing model, which includes a convolution module, a visual attention module, and a feature merging module.
  • the convolution module is used to extract the feature map of the image to be processed
  • the visual attention module is used to divide the feature map into multiple target regions and determine the weight of each target region according to the feature vector of each target region;
  • the feature merging module is configured to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region;
  • the convolution module may be a convolutional neural network, and the feature map of the image to be processed may be extracted through any convolution layer in the convolution module.
  • the visual attention module may divide the feature map of the image to be processed according to a predetermined region division method to obtain multiple target regions. For example, at least one region division method can be predetermined; the feature map is then divided by the at least one method to obtain the feature map regions corresponding to each method, and the feature map regions corresponding to the various methods serve as the target regions obtained by the division.
  • the visual attention module can also set the size of the output feature map of the ROI pooling operation and then perform the ROI pooling operation on the feature map of the image to be processed, so as to map ROIs into the feature map and obtain multiple target regions.
  • the solution for the visual attention module to determine the weight of each target region according to the feature vector of each target region is similar to the solution shown in FIG. 3 in the foregoing embodiment, and will not be repeated here.
  • the feature merging module generates the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region; the scheme is similar to the one shown in FIG. 4 in the foregoing embodiment and is not repeated here.
  • step S520 the server obtains the feature vector of the image to be processed generated by the image processing model.
  • an image matching the image to be processed can be retrieved according to the feature vector of the image to be processed.
  • the image to be processed can be identified according to the feature vector of the image to be processed.
  • the technical solution of the embodiment shown in FIG. 5 generates the feature vector of the image to be processed through an image processing model. While ensuring the accuracy and rationality of the generated feature vector, it enables end-to-end training of the model, which in turn makes it convenient to generate image feature vectors through the image processing model.
  • the image processing model may be trained by obtaining image samples labeled with feature vectors and training the model with these samples until the loss function of the image processing model converges.
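The embodiments do not fix a particular training objective; below is one hedged possibility, a fine-tuning loop with a triplet metric-learning loss, in which the model, the data loader, and the hyperparameters are all placeholders.

```python
# Sketch: fine-tuning an image processing model until the loss converges,
# here with a triplet margin loss (one of the metric learning options).
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    criterion = nn.TripletMarginLoss(margin=0.5)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for anchor, positive, negative in loader:  # hypothetical image triplets
            loss = criterion(model(anchor), model(positive), model(negative))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```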
  • the image processing method according to the embodiment of the present application includes the following steps S610 to S660, which are described in detail as follows:
  • step S610 the server trains a convolutional neural network model on any data set.
  • the convolutional neural network model may be ResNet (Residual Network), ResNeXt, VGGNet (Visual Geometry Group Network), InceptionNet, etc.
  • training on any data set may refer to using the data set as a training set to train the convolutional neural network model.
  • step S620 the server inputs the image into the trained convolutional neural network model, and obtains a set of feature maps output by any convolutional layer.
  • the size of the feature map output by the convolutional neural network model may be C ⁇ W ⁇ H, where C represents the number of channels, and H and W represent length and width respectively.
  • if the convolutional neural network model has two or more convolutional layers, these layers can be parallel; that is, the image is processed by each convolutional layer separately, and each layer outputs its corresponding feature map, which together form the set of feature maps described above.
  • step S630 the server divides the obtained feature map into several regions, and determines the feature vector of each region.
  • diagram (1) in FIG. 7 treats the whole image as one region, denoted R1. Diagram (2) in FIG. 7 divides the whole image into approximately 4 regions (to avoid the unclarity caused by excessive overlap, only two of them are shown), with the overlap ratio of two adjacent regions set to α (0<α<1); these 4 regions are denoted R2, R3, R4, and R5. Diagram (3) in FIG. 7 divides the whole image into approximately 9 regions (only three of them are shown for the same reason), again with the overlap ratio of adjacent regions set to α (0<α<1); these 9 regions are denoted R6, R7, R8, R9, R10, R11, R12, R13, and R14.
  • of course, the whole image can also be divided into more regions.
  • the image may be divided according to the three methods shown in FIG. 7 to obtain the 14 regions R1 to R14. A max-pooling operation is then performed within each region according to its coordinate position to determine the feature vector v of each region.
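A sketch of this 1 + 4 + 9 = 14-region division with overlap ratio α and per-region max pooling, assuming PyTorch; the integer rounding of region boundaries is an implementation choice, not specified by the embodiment.

```python
# Sketch: overlapping grid regions (1x1, 2x2, 3x3) over a feature map,
# followed by max pooling inside each region.
import torch

def grid_regions(n: int, size: int, alpha: float = 0.5):
    # n regions along one axis of length `size`, adjacent regions
    # overlapping by ratio alpha; returns (start, end) index pairs.
    s = size / ((n - 1) * (1 - alpha) + 1)       # region side length
    stride = s * (1 - alpha)
    return [(round(i * stride), min(size, round(i * stride + s)))
            for i in range(n)]

def region_vectors(fmap: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    # fmap: (C, H, W); returns the 14 region feature vectors as (14, C).
    _, H, W = fmap.shape
    vecs = []
    for n in (1, 2, 3):                          # the three division manners
        for y0, y1 in grid_regions(n, H, alpha):
            for x0, x1 in grid_regions(n, W, alpha):
                vecs.append(fmap[:, y0:y1, x0:x1].amax(dim=(1, 2)))
    return torch.stack(vecs)
```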
  • the size of the output feature map of the ROI Pooling layer can also be set. For example, if the output size is set to 3×3, then after an input feature map of size W×H is fed into the ROI Pooling layer, the algorithm divides it into approximately equal 3×3 parts and takes one maximum value from each part as output, thereby producing a 3×3 feature map.
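This fixed-size-output behavior can be imitated with adaptive max pooling, as sketched below; this is an analogy rather than the embodiment's ROI Pooling layer itself (torchvision's torchvision.ops.roi_pool offers true per-ROI pooling).

```python
# Sketch: a W x H feature map is split into roughly equal 3x3 bins and
# each bin contributes its maximum value.
import torch

pool = torch.nn.AdaptiveMaxPool2d((3, 3))
fmap = torch.randn(1, 1024, 14, 14)   # placeholder input feature map
out = pool(fmap)                      # torch.Size([1, 1024, 3, 3])
```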
  • the following takes the 14 regions R1 to R14 mentioned above as an example. The feature vectors of these 14 regions are v1 to v14, and the dimension of each feature vector is C, representing the features within the corresponding region.
  • in step S640, the server inputs the acquired feature vectors v1 to v14 into a fully connected layer, outputs the scalar corresponding to each region, and normalizes the scalars corresponding to the regions to obtain the weight of each region.
  • the parameter of the fully connected layer may be w ∈ R^{C×1}, indicating that the input dimension of the fully connected layer is C and the output dimension is 1.
  • after the 14 scalars are obtained, they can be normalized, for example with the L1 norm, the L2 norm, or the softmax function, to obtain β1 to β14, which are the weights of the feature vectors v1 to v14, i.e., the weight of each region. Taking L1-norm normalization as an example, the weight of a feature vector can be calculated by the following formula (1), reconstructed here from the surrounding description (with s_i denoting the fully connected layer's scalar output for region i): β_i = s_i / Σ_{j=1..14} |s_j|    (1)
  • in step S650, the server multiplies the obtained feature vectors v1 to v14 by the corresponding weights β1 to β14, respectively, to obtain the weighted feature vector of each region.
  • the weighted feature vectors of these 14 regions can be written as β1·v1 to β14·v14; this applies the visual attention mechanism to the image, and the processing can be implemented as a scalar-vector multiplication.
  • the design of this process is simple: no specific neural network layer needs to be added, and the multiplication step just multiplies each region's feature vector by its region weight.
  • in step S660, the server sums the weighted feature vectors of the regions and performs L2-norm normalization to obtain the final feature vector of the image.
  • based on this feature vector, processing such as image retrieval or image recognition can then be performed.
  • the final feature vector of the image can be calculated by the following formula (2), reconstructed here from the surrounding description: f = (Σ_{i=1..14} β_i·v_i) / ‖Σ_{i=1..14} β_i·v_i‖₂    (2)
  • an image retrieval model that can be trained end-to-end can be constructed based on the technical solution shown in FIG. 6.
  • as shown in FIG. 8, it may include a CNN network 801, a visual attention module 803, a summation module 804, and an L2 normalization layer 805, where the visual attention module 803 may include an ROI Pooling layer, a fully connected layer, a normalization layer, and a multiplication module.
  • the CNN network 801 performs step S620 shown in FIG. 6 to obtain the feature map 802; the visual attention module 803 performs steps S630 to S650 shown in FIG. 6; and the summation module 804 and the L2 normalization layer 805 perform step S660 shown in FIG. 6 to obtain the feature vector 806 of the image.
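Pulling the pieces together, a minimal sketch of a model in the spirit of FIG. 8, reusing the region_vectors helper sketched earlier; the class name and channel count are assumptions, and this is an illustration rather than the patented model.

```python
# Sketch: backbone -> region pooling -> attention weights -> weighted sum
# -> L2 normalization, producing one descriptor per image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveRetrievalNet(nn.Module):
    def __init__(self, backbone: nn.Module, channels: int = 1024):
        super().__init__()
        self.backbone = backbone              # outputs (N, C, H, W)
        self.fc = nn.Linear(channels, 1, bias=False)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fmaps = self.backbone(images)
        feats = []
        for fmap in fmaps:                               # per image
            v = region_vectors(fmap)                     # (14, C)
            beta = torch.softmax(self.fc(v).squeeze(-1), dim=0)
            f = (beta.unsqueeze(-1) * v).sum(dim=0)      # weighted sum
            feats.append(F.normalize(f, p=2, dim=0))     # L2 normalization
        return torch.stack(feats)                        # (N, C) descriptors
```

Since the descriptors come out L2-normalized, a similarity module like the one described next can score a pair with a plain dot product (cosine similarity).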
  • the image retrieval model may further include a similarity determination module for determining the similarity between the images based on the feature vectors of different images, thereby determining similar images based on the similarity.
  • the constructed image retrieval model can be fine-tuned on a classification task or with a metric learning method, etc., until the loss function of the image retrieval model converges.
  • the calculated weight of each region is annotated in the images, as shown in FIG. 9.
  • the "GT" shown in FIG. 9 marks the region where the salient object is located in each image.
  • the weight of a region containing the salient object is usually larger, while the weight of a region not containing it is relatively small; this strengthens the features of the foreground region and weakens those of the background region, achieving more reasonable and accurate image feature encoding and greatly improving image retrieval performance.
  • the technical solutions of the embodiments of the present application can be applied in the fields of image retrieval and video retrieval, and can be specifically used for similar video recommendation, similar video de-duplication, image recommendation or de-duplication, etc.
  • Fig. 11 shows a block diagram of an image processing device according to an embodiment of the present application.
  • the image processing device 1100 includes: an extraction unit 1102, a division unit 1104, a determination unit 1106, and a generation unit 1108.
  • the extraction unit 1102 is used to extract the feature map of the image to be processed; the dividing unit 1104 is used to divide the feature map into multiple target regions; the determining unit 1106 is used to determine the weight of each target region according to the feature vector of each target region; and the generating unit 1108 is configured to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
  • the dividing unit 1104 is configured to: divide the feature map according to a predetermined region division method to obtain the multiple target regions; or perform an ROI pooling operation on the feature map to map ROIs into the feature map to obtain the multiple target regions.
  • the dividing unit 1104 is configured to: divide the feature map according to at least one predetermined region division method to obtain the feature map regions corresponding to each region division method; and use the feature map regions corresponding to the various region division methods as the target regions.
  • the determining unit 1106 is configured to: perform dimensionality reduction on the feature vector of each target region to obtain the feature scalar corresponding to each target region; and normalize the feature scalars corresponding to the target regions to obtain the weight of each target region.
  • the determining unit 1106 is configured to: input the feature vector of each target region into a fully connected layer whose output dimension is 1, and determine the feature scalar corresponding to each target region according to the output of the fully connected layer.
  • the generating unit 1108 is configured to: calculate the weighted feature vector of each target region according to the weight of each target region and the feature vector of each target region; and generate the feature vector of the image to be processed according to the weighted feature vectors of the target regions.
  • the generating unit 1108 is configured to: merge the weighted feature vectors of the target regions to obtain the feature vector of the image to be processed; or merge the weighted feature vectors of the target regions and normalize the merged feature vector to obtain the feature vector of the image to be processed.
  • the image processing apparatus 1100 further includes: a retrieval unit configured to retrieve an image matching the image to be processed according to the feature vector of the image to be processed.
  • Fig. 12 shows a block diagram of an image processing apparatus according to an embodiment of the present application.
  • an image processing apparatus 1200 includes: a processing unit 1202 and an acquiring unit 1204.
  • the processing unit 1202 is used to input the image to be processed into an image processing model, and the image processing model includes a convolution module, a visual attention module, and a feature merging module.
  • the convolution module is used to extract the feature map of the image to be processed; the visual attention module is used to divide the feature map into multiple target regions and determine the weight of each target region according to the feature vector of each target region; the feature merging module is used to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region; and the acquiring unit 1204 is used to obtain the feature vector of the image to be processed generated by the image processing model.
  • the image processing device 1200 further includes: a training unit configured to obtain image samples labeled with feature vectors, and train the image processing model through the image samples.
  • the processing unit 1202 is configured to extract the feature map of the image to be processed through any convolution layer in the convolution module.
  • FIG. 13 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • the computer system 1300 includes a central processing unit (CPU) 1301, which can perform various appropriate actions and processing, for example the methods described in the foregoing embodiments, according to a program stored in a read-only memory (ROM) 1302 or a program loaded from the storage part 1308 into a random access memory (RAM) 1303.
  • the RAM 1303 also stores various programs and data required for system operation.
  • the CPU 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304.
  • An Input/Output (I/O) interface 1305 is also connected to the bus 1304.
  • the following components are connected to the I/O interface 1305: an input part 1306 including a keyboard, a mouse, etc.; an output part 1307 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage part 1308 including a hard disk, etc.; and a communication part 1309 including a network interface card such as a LAN (Local Area Network) card or a modem.
  • the communication section 1309 performs communication processing via a network such as the Internet.
  • the driver 1310 is also connected to the I/O interface 1305 as needed.
  • a removable medium 1311, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 1310 as needed, so that the computer program read from it is installed into the storage part 1308 as needed.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 1309, and/or installed from the removable medium 1311.
  • the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium may send, propagate, or transmit the program for use by or in combination with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of the code, and the above-mentioned module, program segment, or part of the code includes one or more executables for realizing the specified logic function. instruction.
  • the functions marked in the blocks may also occur in an order different from the order marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagrams or flowcharts, and combinations of blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in this application can be implemented in software or hardware, and the described units can also be provided in a processor. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves.
  • this application also provides a computer-readable medium.
  • the computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist alone without being assembled into the electronic device.
  • the foregoing computer-readable medium carries one or more programs, and when the foregoing one or more programs are executed by an electronic device, the electronic device realizes the method described in the foregoing embodiment.
  • although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • the exemplary embodiments described here can be implemented by software, or by software combined with necessary hardware. The technical solutions according to the embodiments of this application can therefore be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to make a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) execute the methods according to the embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Library & Information Science (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An image processing method and apparatus, a computer-readable medium, and an electronic device. The image processing method includes: extracting a feature map of an image to be processed (S210); dividing the feature map into a plurality of target regions (S220); determining a weight of each target region according to the feature vector of each target region (S230); and generating a feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region (S240). The above solution can weight each target region according to the feature vectors of the target regions in the image, thereby weakening the non-salient regions of the image and highlighting its salient regions, effectively improving the accuracy and rationality of the generated image feature vector.

Description

Image processing method and apparatus, computer-readable medium, and electronic device
This application claims priority to Chinese Patent Application No. 201910369974X, entitled "Image processing method and apparatus, computer-readable medium, and electronic device", filed on May 6, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer and communication technologies, and in particular to an image processing method and apparatus, a computer-readable medium, and an electronic device.
Background
In the field of image processing, for example in image retrieval and image recognition, the feature vectors extracted from an image greatly affect the accuracy of the processing results. The feature extraction schemes proposed by the related art have many unreasonable aspects, so the extracted feature vectors are inaccurate, which in turn affects the final processing result.
Summary
The embodiments of this application provide an image processing method and apparatus, a computer-readable medium, and an electronic device, which can improve, at least to a certain extent, the accuracy and rationality of the determined image feature vector.
Other characteristics and advantages of this application will become apparent from the following detailed description, or will be learned in part through practice of this application.
According to one aspect of the embodiments of this application, an image processing method is provided, including: extracting a feature map of an image to be processed; dividing the feature map into a plurality of target regions; determining a weight of each target region according to the feature vector of each target region; and generating a feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
According to one aspect of the embodiments of this application, an image processing method is provided, including: inputting an image to be processed into an image processing model, the image processing model including a convolution module, a visual attention module, and a feature merging module, where the convolution module is used to extract a feature map of the image to be processed, the visual attention module is used to divide the feature map into a plurality of target regions and determine a weight of each target region according to the feature vector of each target region, and the feature merging module is used to generate a feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region; and obtaining the feature vector of the image to be processed generated by the image processing model.
According to one aspect of the embodiments of this application, an image processing apparatus is provided, including: an extraction unit, configured to extract a feature map of an image to be processed; a dividing unit, configured to divide the feature map into a plurality of target regions; a determining unit, configured to determine a weight of each target region according to the feature vector of each target region; and a generating unit, configured to generate a feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
In some embodiments of this application, based on the foregoing solution, the dividing unit is configured to: divide the feature map according to a predetermined region division manner to obtain the plurality of target regions; or perform an ROI (Region Of Interest) pooling operation on the feature map to map ROIs into the feature map to obtain the plurality of target regions.
In some embodiments of this application, based on the foregoing solution, the dividing unit is configured to: divide the feature map according to at least one predetermined region division manner to obtain the feature map regions corresponding to each region division manner; and use the feature map regions corresponding to the various region division manners as the target regions.
In some embodiments of this application, based on the foregoing solution, the determining unit is configured to: perform dimensionality reduction on the feature vector of each target region to obtain a feature scalar corresponding to each target region; and normalize the feature scalars corresponding to the target regions to obtain the weight of each target region.
In some embodiments of this application, based on the foregoing solution, the determining unit is configured to: input the feature vector of each target region into a fully connected layer whose output dimension is 1, and determine the feature scalar corresponding to each target region according to the output of the fully connected layer.
In some embodiments of this application, based on the foregoing solution, the generating unit is configured to: calculate a weighted feature vector of each target region according to the weight of each target region and the feature vector of each target region; and generate the feature vector of the image to be processed according to the weighted feature vectors of the target regions.
In some embodiments of this application, based on the foregoing solution, the generating unit is configured to: merge the weighted feature vectors of the target regions to obtain the feature vector of the image to be processed; or merge the weighted feature vectors of the target regions and normalize the merged feature vector to obtain the feature vector of the image to be processed.
In some embodiments of this application, based on the foregoing solution, the image processing apparatus further includes: a retrieval unit, configured to retrieve, according to the feature vector of the image to be processed, an image matching the image to be processed.
According to one aspect of the embodiments of this application, an image processing apparatus is provided, including: a processing unit, configured to input an image to be processed into an image processing model, the image processing model including a convolution module, a visual attention module, and a feature merging module, where the convolution module is used to extract a feature map of the image to be processed, the visual attention module is used to divide the feature map into a plurality of target regions and determine a weight of each target region according to the feature vector of each target region, and the feature merging module is used to generate a feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region; and an obtaining unit, configured to obtain the feature vector of the image to be processed generated by the image processing model.
In some embodiments of this application, based on the foregoing solution, the image processing apparatus further includes: a training unit, configured to obtain image samples labeled with feature vectors and train the image processing model with the image samples.
In some embodiments of this application, based on the foregoing solution, the processing unit is configured to extract the feature map of the image to be processed through any convolutional layer in the convolution module.
According to one aspect of the embodiments of this application, a computer-readable medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the image processing method described in the foregoing embodiments.
According to one aspect of the embodiments of this application, an electronic device is provided, including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method described in the foregoing embodiments.
In the technical solutions provided by some embodiments of this application, the feature map of the image to be processed is divided into a plurality of target regions, the weight of each target region is determined according to the feature vector of each target region, and the feature vector of the image to be processed is generated according to the weight and the feature vector of each target region. When determining the feature vector of the image, each target region can thus be weighted according to the feature vectors of the target regions in the image, thereby weakening the non-salient regions of the image (such as the background region) and highlighting its salient regions (such as the foreground region), effectively improving the accuracy and rationality of the generated image feature vector and benefiting image retrieval.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit this application.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification, show embodiments consistent with this application, and together with the specification serve to explain the principles of this application. Obviously, the drawings in the following description are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied;
FIG. 2 shows a flowchart of an image processing method according to an embodiment of this application;
FIG. 3 shows a flowchart of determining the weight of each target region according to an embodiment of this application;
FIG. 4 shows a flowchart of generating the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region, according to an embodiment of this application;
FIG. 5 shows a flowchart of an image processing method according to an embodiment of this application;
FIG. 6 shows a flowchart of an image processing method according to an embodiment of this application;
FIG. 7 shows a schematic diagram of region division manners according to an embodiment of this application;
FIG. 8 shows a schematic structural diagram of an image retrieval model according to an embodiment of this application;
FIG. 9 shows a schematic diagram of the weight of each region in an image according to an embodiment of this application;
FIG. 10 shows a schematic diagram of image retrieval results according to an embodiment of this application;
FIG. 11 shows a block diagram of an image processing apparatus according to an embodiment of this application;
FIG. 12 shows a block diagram of an image processing apparatus according to an embodiment of this application;
FIG. 13 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of this application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that this application will be more thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of this application. However, those skilled in the art will appreciate that the technical solutions of this application may be practiced without one or more of the specific details, or that other methods, components, apparatuses, steps, and the like may be employed. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring aspects of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are merely illustrative; they need not include all the contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed, and some may be combined or partially combined, so the actual execution order may change according to the actual situation.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
As shown in FIG. 1, the system architecture may include terminal devices (one or more of the smartphone 101, the tablet computer 102, and the portable computer 103 shown in FIG. 1; of course, desktop computers and the like are also possible), a network 104, and a server 105. The network 104 is the medium that provides the communication link between the terminal devices and the server 105, and may include various connection types, such as wired communication links and wireless communication links.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers.
In an embodiment of this application, a user may specify the image to be processed through a terminal device (such as the smartphone 101, the tablet computer 102, or the portable computer 103 shown in FIG. 1); for example, the user sends the image to be processed to the server 105 through the terminal device, or selects the image to be processed from images provided by the server 105 through the terminal device.
In an embodiment of this application, after determining the image to be processed, the server 105 may extract a feature map of the image, for example through any convolutional layer in a CNN (Convolutional Neural Network) model. After extracting the feature map of the image to be processed, it may divide the feature map into a plurality of target regions, determine the weight of each target region according to the feature vector of each target region, and then generate the feature vector of the image to be processed according to the weights and the feature vectors of the target regions. It can be seen that, when determining the feature vector of an image, the technical solutions of the embodiments of this application can weight each target region according to the feature vectors of the target regions in the image, thereby weakening the non-salient regions of the image and highlighting its salient regions, effectively improving the accuracy and rationality of the generated image feature vector and benefiting image processing, for example improving the effect of image retrieval and the accuracy of image recognition.
It should be noted that the image processing method provided by the embodiments of this application may be executed by the server 105; accordingly, the image processing apparatus may be provided in the server 105. However, in other embodiments of this application, a terminal device may have functions similar to those of the server and thus execute the image processing solutions provided by the embodiments of this application.
The implementation details of the technical solutions of the embodiments of this application are elaborated below:
FIG. 2 shows a flowchart of an image processing method according to an embodiment of this application. The method may be executed by a device having computing and processing capabilities, for example by the server 105 shown in FIG. 1. Referring to FIG. 2, the image processing method includes at least step S210 to step S240, described in detail as follows:
In step S210, the server extracts a feature map of the image to be processed.
In an embodiment of this application, the image to be processed may be an image from which feature vectors need to be extracted, an image to be retrieved, or an image to be recognized, among others.
In an embodiment of this application, the feature map of the image to be processed may be extracted through any convolutional layer in a CNN model.
In step S220, the server divides the feature map into a plurality of target regions.
In an embodiment of this application, the feature map of the image to be processed may be divided according to a predetermined region division manner to obtain the plurality of target regions. For example, at least one region division manner may be predetermined (e.g., three manners); the feature map is then divided by the at least one manner to obtain the feature map regions corresponding to each manner, and the feature map regions corresponding to the various manners are used as the target regions obtained by the division.
In an embodiment of this application, the size of the output feature map of an ROI pooling operation may also be set, and the ROI pooling operation is then performed on the feature map of the image to be processed, so as to map ROIs into the feature map of the image to be processed and obtain the plurality of target regions.
Continuing to refer to FIG. 2, in step S230, the server determines the weight of each target region according to the feature vector of each target region.
In an embodiment of this application, as shown in FIG. 3, the process of determining the weight of each target region in step S230 may include the following steps S310 and S320:
In step S310, the server performs dimensionality reduction on the feature vector of each target region to obtain a feature scalar corresponding to each target region.
In an embodiment of this application, the feature scalar is a quantity that characterizes the magnitude of a feature. For example, the feature vector of each target region may be input into a fully connected layer whose output dimension is 1, so as to determine the feature scalar corresponding to each target region according to the output of the fully connected layer.
In step S320, the server normalizes the feature scalars corresponding to the target regions to obtain the weight of each target region.
In an embodiment of this application, the feature scalars corresponding to the target regions may be normalized with the L1 norm, the L2 norm, or the softmax function (the normalized exponential function).
The technical solution of the embodiment shown in FIG. 3 makes it possible to determine the weight of each target region according to the feature vector of each target region, and then, through the determined weights, to weaken the non-salient regions of the image (such as the background region) and highlight its salient regions (such as the foreground region), which helps improve the accuracy and rationality of the generated image feature vector.
Continuing to refer to FIG. 2, in step S240, the server generates the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
In an embodiment of this application, as shown in FIG. 4, the process of generating the feature vector of the image to be processed in step S240 may include the following steps S410 and S420:
In step S410, the server calculates the weighted feature vector of each target region according to the weight of each target region and the feature vector of each target region.
In an embodiment of this application, the weight of each target region may be multiplied with the feature vector of that target region (a scalar-vector product) to obtain the weighted feature vector of each target region.
In step S420, the server generates the feature vector of the image to be processed according to the weighted feature vectors of the target regions.
In an embodiment of this application, the weighted feature vectors of the target regions may be merged to obtain the feature vector of the image to be processed. Alternatively, after merging the weighted feature vectors of the target regions, the merged feature vector may be normalized (for example, L2-norm normalization) to obtain the feature vector of the image to be processed.
Based on the technical solution of the embodiment shown in FIG. 2, in an embodiment of this application, after the feature vector of the image to be processed is obtained, an image matching the image to be processed may be retrieved according to the feature vector, or further image recognition may be performed based on the feature vector.
The technical solutions of the embodiments shown in FIG. 2 to FIG. 4 make it possible, when determining the feature vector of an image, to weight each target region according to the feature vectors of the target regions in the image, thereby weakening the non-salient regions and highlighting the salient regions of the image, effectively improving the accuracy and rationality of the generated image feature vector and benefiting both image retrieval and image recognition.
FIG. 5 shows a flowchart of an image processing method according to an embodiment of this application. The method may be executed by a device having computing and processing capabilities, for example by the server 105 shown in FIG. 1. Referring to FIG. 5, the image processing method includes at least step S510 to step S520, described in detail as follows:
In step S510, the image to be processed is input into an image processing model, the image processing model including a convolution module, a visual attention module, and a feature merging module. The convolution module is used to extract the feature map of the image to be processed; the visual attention module is used to divide the feature map into a plurality of target regions and to determine the weight of each target region according to the feature vector of each target region; and the feature merging module is used to generate the feature vector of the image to be processed according to the weight of each target region and the feature vector of each target region.
In an embodiment of this application, the convolution module may be a convolutional neural network, and the feature map of the image to be processed may be extracted through any convolutional layer in the convolution module.
In an embodiment of this application, the visual attention module may divide the feature map of the image to be processed according to a predetermined region division manner to obtain the plurality of target regions. For example, at least one region division manner may be predetermined; the feature map is then divided by the at least one manner to obtain the feature map regions corresponding to each manner, and the feature map regions corresponding to the various manners serve as the target regions obtained by the division.
In an embodiment of this application, the visual attention module may also set the size of the output feature map of the ROI pooling operation and then perform the ROI pooling operation on the feature map of the image to be processed, so as to map ROIs into the feature map of the image to be processed and obtain the plurality of target regions.
In an embodiment of this application, the scheme by which the visual attention module determines the weight of each target region according to the feature vector of each target region is similar to the scheme shown in FIG. 3 in the foregoing embodiment and is not repeated here.
In an embodiment of this application, the scheme by which the feature merging module generates the feature vector of the image to be processed according to the weights and the feature vectors of the target regions is similar to the scheme shown in FIG. 4 in the foregoing embodiment and is not repeated here.
Continuing to refer to FIG. 5, in step S520, the server obtains the feature vector of the image to be processed generated by the image processing model.
In an embodiment of this application, after the feature vector of the image to be processed generated by the image processing model is obtained, an image matching the image to be processed may be retrieved according to that feature vector.
In an embodiment of this application, after the feature vector of the image to be processed generated by the image processing model is obtained, the image to be processed may be recognized according to that feature vector.
The technical solution of the embodiment shown in FIG. 5 generates the feature vector of the image to be processed through an image processing model. While ensuring the accuracy and rationality of the generated image feature vector, it enables the model to be trained end to end, which in turn makes it convenient to generate image feature vectors through the image processing model. In an embodiment of this application, the image processing model may be trained by obtaining image samples labeled with feature vectors and training the model with these samples until the loss function of the image processing model converges.
The implementation details of the technical solutions of the embodiments of this application are elaborated below with reference to FIG. 6 to FIG. 10:
As shown in FIG. 6, the image processing method according to an embodiment of this application includes the following steps S610 to S660, described in detail as follows:
In step S610, the server trains a convolutional neural network model on an arbitrary data set.
In an embodiment of this application, the convolutional neural network model may be ResNet (Residual Network), ResNeXt, VGGNet (Visual Geometry Group Network), InceptionNet, or the like.
Here, training on an arbitrary data set may mean using that data set as the training set to train the convolutional neural network model.
In step S620, the server inputs the image into the trained convolutional neural network model and obtains a set of feature maps output by an arbitrary convolutional layer.
In an embodiment of this application, the size of a feature map output by the convolutional neural network model may be C×W×H, where C denotes the number of channels and H and W denote the length and width, respectively.
For a convolutional neural network model with two or more convolutional layers, these convolutional layers may be parallel; that is, the image is processed by each convolutional layer separately to output the feature map corresponding to each convolutional layer, i.e., the above set of feature maps.
In step S630, the server divides the obtained feature map into several regions and determines the feature vector of each region.
In an embodiment of this application, several regions may be designed for the image in advance, and a max-pooling operation is then performed within each region to obtain the feature of each region. As shown in FIG. 7, diagram (1) in FIG. 7 treats the whole image as one region, denoted R1. Diagram (2) in FIG. 7 divides the whole image into approximately 4 regions (to avoid unclarity from excessive overlap, only two of them are shown), with the overlap ratio of two adjacent regions set to α (0<α<1); these 4 regions are denoted R2, R3, R4, and R5. Diagram (3) in FIG. 7 divides the whole image into approximately 9 regions (only three of them are shown for the same reason), again with the overlap ratio of two adjacent regions set to α (0<α<1); these 9 regions are denoted R6, R7, R8, R9, R10, R11, R12, R13, and R14. Of course, the whole image may also be divided into more regions.
In an embodiment of this application, the image may be divided in the three manners shown in FIG. 7 to obtain the 14 regions R1 to R14. A max-pooling operation is then performed within each region according to its coordinate position to determine the feature vector v of each region.
In an embodiment of this application, the size of the output feature map of the ROI Pooling layer may also be set. For example, if the output size is set to 3×3, then after an input feature map of size W×H is fed into the ROI Pooling layer, the algorithm divides it into approximately equal 3×3 parts and takes one maximum value from each part as output, thereby outputting a 3×3 feature map.
The following description takes the 14 regions R1 to R14 as an example. The feature vectors of these 14 regions are v1 to v14, and the dimension of each feature vector is C, which represents the features within the corresponding region.
In step S640, the server inputs the obtained feature vectors v1 to v14 into a fully connected layer, outputs the scalar corresponding to each region, and normalizes the scalars corresponding to the regions to obtain the weight of each region.
In an embodiment of this application, the parameter of the fully connected layer may be w ∈ R^{C×1}, indicating that the input dimension of the fully connected layer is C and the output dimension is 1. After the 14 scalars are obtained through the fully connected layer, they may be normalized, for example with the L1 norm, the L2 norm, or the softmax function, to obtain β1 to β14, which are the weights of the feature vectors v1 to v14, i.e., the weight of each region. Taking L1-norm normalization as an example, the weight of a feature vector can be calculated by the following formula (1), reconstructed here from the surrounding description (with s_i denoting the fully connected layer's scalar output for region i):
β_i = s_i / Σ_{j=1..14} |s_j|    (1)
In step S650, the server multiplies the obtained feature vectors v1 to v14 by the corresponding weights β1 to β14, respectively, to obtain the weighted feature vector of each region.
In an embodiment of this application, the weighted feature vectors of these 14 regions may be written as β1·v1 to β14·v14; this applies the visual-attention processing to the image, and the processing may be implemented as a scalar-vector multiplication. The design of this process is simple: no specific neural network layer needs to be added, and the multiplication step simply multiplies each region's feature vector by its region weight.
在步骤S660中,服务器将每个区域加权后的特征向量进行求和,并进行L2范数的归一化处理,得到图像最终的特征向量。在得到图像的特征向量之后,可以基于该特征向量进行处理,例如进行图像检索处理或图像识别处理等。在本申请的一个实施例中,可以通过如下公式(2)计算得到图像最终的特征向量:
$$v = \frac{\sum_{i=1}^{14} \beta_i v_i}{\left\lVert \sum_{i=1}^{14} \beta_i v_i \right\rVert_2} \tag{2}$$
In an embodiment of this application, an image retrieval model capable of end-to-end training may be built based on the technical solution shown in FIG. 6. As shown in FIG. 8, the model may include a CNN network 801, a visual attention module 803, a summation module 804, and an L2 normalization layer 805, where the visual attention module 803 may include an ROI Pooling layer, a fully connected layer, a normalization layer, and a dot-multiplication module. The CNN network 801 performs step S620 shown in FIG. 6 to obtain a feature map 802; the visual attention module 803 performs steps S630 to S650 shown in FIG. 6; and the summation module 804 and the L2 normalization layer 805 perform step S660 shown in FIG. 6 to obtain a feature vector 806 of the image. The image retrieval model may further include a similarity determination module configured to determine the similarity between images based on the feature vectors of different images, so that similar images can be determined based on the similarity.
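For illustration only, the following end-to-end sketch (not part of the original disclosure) assembles the pipeline of FIG. 8, assuming PyTorch/torchvision and a batch size of 1; the class and variable names are hypothetical, and the region boxes may come, for example, from the grid_regions() sketch above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class AttentionRetrievalNet(nn.Module):
    def __init__(self, feat_dim=2048):
        super().__init__()
        resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # CNN 801
        self.fc = nn.Linear(feat_dim, 1, bias=False)                  # attention FC

    def forward(self, x, boxes):
        fmap = self.backbone(x)                      # feature map 802: (1, C, H, W)
        # Step S630: max-pool each region box (boxes are given directly in
        # feature-map coordinates here for simplicity).
        feats = []
        for (x0, y0, x1, y1) in boxes:
            crop = fmap[..., int(y0):max(int(y1), int(y0) + 1),
                             int(x0):max(int(x1), int(x0) + 1)]
            feats.append(crop.amax(dim=(-2, -1)))    # (1, C)
        v = torch.cat(feats, dim=0)                  # (R, C)
        scores = self.fc(v).squeeze(-1)              # step S640: R scalars
        beta = scores / scores.abs().sum()           # L1-normalized weights
        pooled = (beta.unsqueeze(-1) * v).sum(dim=0) # steps S650/S660: weighted sum
        return F.normalize(pooled, p=2, dim=0)       # L2 normalization, vector 806
```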
In an embodiment of this application, the built image retrieval model may be fine-tuned on a classification task or with a metric learning method, among others, until the loss function of the image retrieval model converges.
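For illustration only, one possible metric-learning fine-tuning setup (not part of the original disclosure), assuming PyTorch and a hypothetical data loader that yields (anchor, positive, negative) image triplets with batch size 1; `regions` is the box list from the sketches above:

```python
import torch

model = AttentionRetrievalNet()
criterion = torch.nn.TripletMarginLoss(margin=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

for anchor, positive, negative in loader:            # hypothetical triplet loader
    fa = model(anchor, regions)
    fp = model(positive, regions)
    fn = model(negative, regions)
    loss = criterion(fa.unsqueeze(0), fp.unsqueeze(0), fn.unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```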
To facilitate the description of the effect of the technical solutions of the embodiments of this application, the computed weights of the regions are annotated on the images, as shown in FIG. 9. The "GT" in FIG. 9 marks the region in which the salient object is located in each image. As can be seen from FIG. 9, regions containing the salient object generally receive larger weights, while regions not containing it receive relatively smaller weights. The features of the foreground region can thereby be strengthened and the features of the background region weakened, achieving a more reasonable and accurate image feature encoding, which helps to substantially improve image retrieval performance.
In an embodiment of this application, tests were conducted on the image retrieval datasets Paris6k, Oxford5k, Paris106k, and Oxford105k, which are widely recognized in academia, based on the VGG-16 or ResNet-101 network architecture. The test results are quantified using Mean Average Precision (MAP), and the specific results are shown in Table 1:
[Table 1: MAP test results on Paris6k, Oxford5k, Paris106k, and Oxford105k under the VGG-16 and ResNet-101 architectures; rendered as an image in the source document]
Table 1
As can be seen from Table 1, the technical solutions of the embodiments of this application effectively improve the quantitative metric; in particular, with the ResNet-101 framework, the metric improves by 7.36% on the Paris106k dataset and by 11.25% on the Oxford105k dataset.
To further verify the effect of the technical solutions of the embodiments of this application, in an embodiment of this application, after the feature vector of a query image is extracted according to the technical solutions of the embodiments of this application, retrieval may be performed according to the extracted feature vector, and the retrieved images are returned in descending order of similarity. The 5th, 10th, 20th, and 30th returned images are shown in FIG. 10. It can be seen that, because the technical solutions of the embodiments of this application extract reasonable and accurate features, even images with a relatively large non-target area can still be retrieved well.
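For illustration only, a minimal retrieval sketch (not part of the original disclosure), assuming PyTorch: because the descriptors are L2-normalized, cosine similarity reduces to a dot product, and a gallery can be ranked against the query in one matrix multiplication; all sizes below are hypothetical:

```python
import torch

query = torch.randn(2048)
query = query / query.norm()                                   # stand-in descriptor
gallery = torch.nn.functional.normalize(torch.randn(1000, 2048), dim=1)

similarity = gallery @ query                # (1000,) cosine similarities
ranked = similarity.argsort(descending=True)
top30 = ranked[:30]                         # indices of the 30 most similar images
```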
In addition, the technical solutions of the embodiments of this application can be applied in the fields of image retrieval and video retrieval, and specifically to similar-video recommendation, similar-video deduplication, image recommendation or deduplication, and the like.
The following describes apparatus embodiments of this application, which can be used to perform the image processing methods in the foregoing embodiments of this application. For details not disclosed in the apparatus embodiments of this application, refer to the foregoing embodiments of the image processing method of this application.
FIG. 11 is a block diagram of an image processing apparatus according to an embodiment of this application.
Referring to FIG. 11, an image processing apparatus 1100 according to an embodiment of this application includes an extraction unit 1102, a division unit 1104, a determination unit 1106, and a generation unit 1108.
The extraction unit 1102 is configured to extract a feature map of a to-be-processed image; the division unit 1104 is configured to divide the feature map into a plurality of target regions; the determination unit 1106 is configured to determine a weight of each target region according to a feature vector of each target region; and the generation unit 1108 is configured to generate a feature vector of the to-be-processed image according to the weight of each target region and the feature vector of each target region.
In some embodiments of this application, based on the foregoing solutions, the division unit 1104 is configured to: divide the feature map according to a predetermined region division manner to obtain the plurality of target regions; or perform an ROI pooling operation on the feature map to map an ROI onto the feature map to obtain the plurality of target regions.
In some embodiments of this application, based on the foregoing solutions, the division unit 1104 is configured to: divide the feature map according to at least one predetermined region division manner to obtain the feature map regions corresponding to each region division manner; and take the feature map regions corresponding to the region division manners as the target regions.
In some embodiments of this application, based on the foregoing solutions, the determination unit 1106 is configured to: perform dimension reduction on the feature vector of each target region to obtain a feature scalar corresponding to each target region; and normalize the feature scalars corresponding to the target regions to obtain the weight of each target region.
In some embodiments of this application, based on the foregoing solutions, the determination unit 1106 is configured to: input the feature vector of each target region into a fully connected layer having an output dimension of 1, and determine the feature scalar corresponding to each target region according to the output of the fully connected layer.
In some embodiments of this application, based on the foregoing solutions, the generation unit 1108 is configured to: calculate a weighted feature vector of each target region according to the weight of each target region and the feature vector of each target region; and generate the feature vector of the to-be-processed image according to the weighted feature vectors of the target regions.
In some embodiments of this application, based on the foregoing solutions, the generation unit 1108 is configured to: merge the weighted feature vectors of the target regions to obtain the feature vector of the to-be-processed image; or merge the weighted feature vectors of the target regions and normalize the merged feature vector to obtain the feature vector of the to-be-processed image.
In some embodiments of this application, based on the foregoing solutions, the image processing apparatus 1100 further includes a retrieval unit configured to retrieve, according to the feature vector of the to-be-processed image, an image matching the to-be-processed image.
FIG. 12 is a block diagram of an image processing apparatus according to an embodiment of this application.
Referring to FIG. 12, an image processing apparatus 1200 according to an embodiment of this application includes a processing unit 1202 and an obtaining unit 1204.
The processing unit 1202 is configured to input a to-be-processed image into an image processing model, the image processing model comprising a convolution module, a visual attention module, and a feature merging module, where the convolution module is configured to extract a feature map of the to-be-processed image, the visual attention module is configured to divide the feature map into a plurality of target regions and determine a weight of each target region according to a feature vector of each target region, and the feature merging module is configured to generate a feature vector of the to-be-processed image according to the weight of each target region and the feature vector of each target region. The obtaining unit 1204 is configured to obtain the feature vector of the to-be-processed image generated by the image processing model.
In some embodiments of this application, based on the foregoing solutions, the image processing apparatus 1200 further includes a training unit configured to obtain image samples labeled with feature vectors and train the image processing model with the image samples.
In some embodiments of this application, based on the foregoing solutions, the processing unit 1202 is configured to extract the feature map of the to-be-processed image by any convolutional layer in the convolution module.
FIG. 13 is a schematic structural diagram of a computer system adapted to implement an electronic device according to an embodiment of this application.
It should be noted that the computer system 1300 of the electronic device shown in FIG. 13 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
As shown in FIG. 13, the computer system 1300 includes a central processing unit (CPU) 1301, which can perform various appropriate actions and processing, for example the methods described in the foregoing embodiments, according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage part 1308 into a random access memory (RAM) 1303. The RAM 1303 also stores various programs and data required for system operation. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to one another through a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the I/O interface 1305: an input part 1306 including a keyboard, a mouse, and the like; an output part 1307 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage part 1308 including a hard disk and the like; and a communication part 1309 including a network interface card such as a LAN (local area network) card or a modem. The communication part 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is installed on the drive 1310 as needed, so that a computer program read from it can be installed into the storage part 1308 as needed.
In particular, according to the embodiments of this application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of this application includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1309 and/or installed from the removable medium 1311. When the computer program is executed by the central processing unit (CPU) 1301, the various functions defined in the system of this application are performed.
It should be noted that the computer-readable medium shown in the embodiments of this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, the computer-readable storage medium may be any tangible medium containing or storing a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, capable of sending, propagating, or transmitting a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless or wired transmission, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of this application. Each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and combinations of boxes in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application may be implemented by software or by hardware, and the described units may also be arranged in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, this application further provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the foregoing embodiments, or may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to implement the methods described in the foregoing embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the foregoing detailed description, such division is not mandatory. In fact, according to the implementations of this application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided to be embodied by a plurality of modules or units.
Through the description of the foregoing implementations, a person skilled in the art can readily understand that the exemplary implementations described here may be implemented by software or by software in combination with necessary hardware. Therefore, the technical solutions according to the implementations of this application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a touch terminal, a network device, or the like) to perform the methods according to the implementations of this application.
After considering the specification and practicing the implementations disclosed here, a person skilled in the art will easily conceive of other implementations of this application. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary technical means in the art that are not disclosed in this application. It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (15)

  1. An image processing method, comprising:
    extracting a feature map of a to-be-processed image;
    dividing the feature map into a plurality of target regions;
    determining a weight of each of the target regions according to a feature vector of each of the target regions; and
    generating a feature vector of the to-be-processed image according to the weight of each of the target regions and the feature vector of each of the target regions.
  2. The image processing method according to claim 1, wherein dividing the feature map into a plurality of target regions comprises:
    dividing the feature map according to a predetermined region division manner to obtain the plurality of target regions; or
    performing a region-of-interest (ROI) pooling operation on the feature map to map an ROI onto the feature map to obtain the plurality of target regions.
  3. The image processing method according to claim 2, wherein dividing the feature map according to a predetermined region division manner comprises:
    dividing the feature map according to at least one predetermined region division manner to obtain feature map regions corresponding to each of the region division manners; and
    taking the feature map regions corresponding to the region division manners as the target regions.
  4. The image processing method according to claim 1, wherein determining a weight of each of the target regions according to a feature vector of each of the target regions comprises:
    performing dimension reduction on the feature vector of each of the target regions to obtain a feature scalar corresponding to each of the target regions; and
    normalizing the feature scalars corresponding to the target regions to obtain the weight of each of the target regions.
  5. The image processing method according to claim 4, wherein performing dimension reduction on the feature vector of each of the target regions to obtain a feature scalar corresponding to each of the target regions comprises:
    inputting the feature vector of each of the target regions into a fully connected layer having an output dimension of 1, and determining the feature scalar corresponding to each of the target regions according to the output of the fully connected layer.
  6. The image processing method according to claim 1, wherein generating the feature vector of the to-be-processed image according to the weight of each of the target regions and the feature vector of each of the target regions comprises:
    calculating a weighted feature vector of each of the target regions according to the weight of each of the target regions and the feature vector of each of the target regions; and
    generating the feature vector of the to-be-processed image according to the weighted feature vectors of the target regions.
  7. The image processing method according to claim 6, wherein generating the feature vector of the to-be-processed image according to the weighted feature vectors of the target regions comprises:
    merging the weighted feature vectors of the target regions to obtain the feature vector of the to-be-processed image; or
    merging the weighted feature vectors of the target regions and normalizing the merged feature vector to obtain the feature vector of the to-be-processed image.
  8. The image processing method according to any one of claims 1 to 7, further comprising:
    retrieving an image matching the to-be-processed image according to the feature vector of the to-be-processed image.
  9. An image processing method, comprising:
    inputting a to-be-processed image into an image processing model, the image processing model comprising a convolution module, a visual attention module, and a feature merging module,
    wherein the convolution module is configured to extract a feature map of the to-be-processed image; the visual attention module is configured to divide the feature map into a plurality of target regions and determine a weight of each of the target regions according to a feature vector of each of the target regions; and the feature merging module is configured to generate a feature vector of the to-be-processed image according to the weight of each of the target regions and the feature vector of each of the target regions; and
    obtaining the feature vector of the to-be-processed image generated by the image processing model.
  10. The image processing method according to claim 9, further comprising:
    obtaining image samples labeled with feature vectors; and
    training the image processing model with the image samples.
  11. The image processing method according to claim 9 or 10, wherein the feature map of the to-be-processed image is extracted by any convolutional layer in the convolution module.
  12. An image processing apparatus, comprising:
    an extraction unit, configured to extract a feature map of a to-be-processed image;
    a division unit, configured to divide the feature map into a plurality of target regions;
    a determination unit, configured to determine a weight of each of the target regions according to a feature vector of each of the target regions; and
    a generation unit, configured to generate a feature vector of the to-be-processed image according to the weight of each of the target regions and the feature vector of each of the target regions.
  13. An image processing apparatus, comprising:
    a processing unit, configured to input a to-be-processed image into an image processing model, the image processing model comprising a convolution module, a visual attention module, and a feature merging module,
    wherein the convolution module is configured to extract a feature map of the to-be-processed image; the visual attention module is configured to divide the feature map into a plurality of target regions and determine a weight of each of the target regions according to a feature vector of each of the target regions; and the feature merging module is configured to generate a feature vector of the to-be-processed image according to the weight of each of the target regions and the feature vector of each of the target regions; and
    an obtaining unit, configured to obtain the feature vector of the to-be-processed image generated by the image processing model.
  14. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 8 or the image processing method according to any one of claims 9 to 11.
  15. An electronic device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the image processing method according to any one of claims 1 to 8 or the image processing method according to any one of claims 9 to 11.
PCT/CN2020/085021 2019-05-06 2020-04-16 Image processing method and apparatus, computer-readable medium, and electronic device WO2020224405A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021542181A JP7163504B2 (ja) 2019-05-06 2020-04-16 画像処理方法並びにその、装置、コンピュータプログラム及び電子機器
EP20801637.8A EP3968180A4 (en) 2019-05-06 2020-04-16 METHOD AND APPARATUS FOR IMAGE PROCESSING, COMPUTER READABLE MEDIUM AND ELECTRONIC DEVICE
US17/352,822 US11978241B2 (en) 2019-05-06 2021-06-21 Image processing method and apparatus, computer-readable medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910369974.X 2019-05-06
CN201910369974.XA CN110222220B (zh) 2019-05-06 Image processing method and apparatus, computer-readable medium, and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/352,822 Continuation US11978241B2 (en) 2019-05-06 2021-06-21 Image processing method and apparatus, computer-readable medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2020224405A1 true WO2020224405A1 (zh) 2020-11-12

Family

ID=67820356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085021 WO2020224405A1 (zh) 2019-05-06 2020-04-16 图像处理方法、装置、计算机可读介质及电子设备

Country Status (5)

Country Link
US (1) US11978241B2 (zh)
EP (1) EP3968180A4 (zh)
JP (1) JP7163504B2 (zh)
CN (1) CN110222220B (zh)
WO (1) WO2020224405A1 (zh)

Also Published As

Publication number Publication date
EP3968180A1 (en) 2022-03-16
CN110222220A (zh) 2019-09-10
CN110222220B (zh) 2024-05-10
US11978241B2 (en) 2024-05-07
JP2022517835A (ja) 2022-03-10
US20210319243A1 (en) 2021-10-14
JP7163504B2 (ja) 2022-10-31
EP3968180A4 (en) 2022-07-06

Legal Events

Code 121 (Ep: the EPO has been informed by WIPO that EP was designated in this application): Ref document number: 20801637; Country of ref document: EP; Kind code of ref document: A1
Code ENP (Entry into the national phase): Ref document number: 2021542181; Country of ref document: JP; Kind code of ref document: A
Code NENP (Non-entry into the national phase): Ref country code: DE
Code ENP (Entry into the national phase): Ref document number: 2020801637; Country of ref document: EP; Effective date: 20211206