WO2021249520A1 - Image processing method, apparatus, and storage medium - Google Patents

Image processing method, apparatus, and storage medium

Info

Publication number
WO2021249520A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
texture
features
feature map
primitives
Application number
PCT/CN2021/099560
Other languages
English (en)
French (fr)
Inventor
程捷
蒋磊
曹洋
查正军
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
中国科学技术大学 (University of Science and Technology of China)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 中国科学技术大学 (University of Science and Technology of China)
Priority to EP21821901.2A (published as EP4156078A4)
Publication of WO2021249520A1
Priority to US18/064,144 (published as US20230109317A1)

Classifications

    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06T7/40: Analysis of texture
    • G06T7/49: Analysis of texture based on structural texture description, e.g. using primitives or placement rules
    • G06T7/10: Segmentation; Edge detection
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods (neural networks)
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/771: Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06T2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20132: Image cropping
    • G06T2207/30196: Human being; Person

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular, to an image processing method, device, and storage medium.
  • Texture representation is an important research field of computer vision, and it has broad application prospects in image processing fields such as image recognition, image segmentation and image synthesis.
  • In image recognition, the image to be recognized can first be represented by its texture, and then recognized based on the texture representation result; for example, people and buildings in the image to be recognized can be recognized.
  • However, when the image is processed according to the result of such a texture representation, the effect is poor; for example, the recognition accuracy is low.
  • the embodiments of the present application provide an image processing method, device, and storage medium, which can improve the effect of image processing, for example, can improve the accuracy of image recognition.
  • an embodiment of the present application provides an image processing method, which can be implemented through a neural network.
  • the method includes: obtaining the dependency relationship between the features of each texture primitive in the image according to the direction information and the multi-scale feature map of the image, wherein the multi-scale feature map includes the features of multiple texture primitives of the image at a plurality of different scales.
  • the direction information includes one or more directions.
  • At least one set of texture features of the image is obtained according to a feature map of at least one scale of the image, wherein one set of the texture features is obtained according to a feature map of one scale. The texture representation result of the image is then obtained according to the dependency relationship and the at least one set of texture features, and the image is processed according to the texture representation result of the image.
  • Since the texture representation result of the image can include not only the texture features of the image but also the dependency relationships between the features of different texture primitives in the image, the texture representation result can reflect the texture information of the image more completely. Therefore, the effect of image processing will be better when image processing such as image recognition, image segmentation, or image synthesis is performed according to the texture representation result of the image; for example, it can effectively improve the accuracy of image recognition.
  • the direction information includes a first direction and a second direction opposite to the first direction.
  • the direction information may include multiple sets of directions, and each set of directions may include two opposite first and second directions. That is, the directions included in the direction information may be an even number of directions appearing in pairs.
  • the foregoing obtaining at least one set of texture features of the image according to the feature map of at least one scale of the image includes: extracting the features of each texture primitive in the feature map of at least one scale of the image to obtain the features of multiple texture primitives; and pooling the features of the multiple texture primitives to obtain the at least one set of texture features.
  • the spatially ordered texture features of the image can be obtained.
  • obtaining the dependency relationship between the features of the texture primitives in the image according to the direction information and the multi-scale feature map of the image includes: extracting the features of each texture primitive in the multi-scale feature map of the image according to the direction information to obtain the features of the texture primitives in multiple regions of the image; obtaining, according to the features of the texture primitives in the multiple regions, multiple sets of dependency relationships between the features of the texture primitives in each region; and determining the dependency relationship between the features of the texture primitives in the image according to the multiple sets of dependency relationships.
  • For example, the features of each texture primitive in the multi-scale feature map of the image can be extracted to obtain multiple first matrices, where a first matrix contains the features of the texture primitives in one region of the image. Then, the second matrix corresponding to each first matrix can be determined to obtain multiple second matrices, where a second matrix contains the dependency relationships between the features of the texture primitives in the image region corresponding to its first matrix; thereby, multiple sets of dependency relationships corresponding to the multiple regions can be obtained. Aggregating the foregoing multiple sets of dependency relationships yields the dependency relationships between the features of the texture primitives in the image.
  • In a possible design, before determining the dependency relationship between the features of the texture primitives in the image according to the multiple sets of dependency relationships, the method further includes: updating, according to a first function, the bidirectional relationship value between the features of any two texture primitives in each of the multiple sets of dependency relationships. In this way, each set of dependency relationships can be strengthened and associations are established between the features of any two texture primitives in each set, which makes it easier for the neural network to learn the spatial structure dependencies between texture primitives.
  • extracting the features of each texture primitive in the multi-scale feature map of the image according to the direction information includes: extracting the features of each texture primitive in the multi-scale feature map of the image along one or more directions.
  • one directional map or multiple directional maps can be used as the spatial context guide condition, and the features of each texture primitive in the multi-scale feature map of the image can be extracted along the direction corresponding to the directional map, thereby effectively improving the ability to extract spatial context clues. Furthermore, it can better perceive the features of texture primitives, extract as many potential texture primitive features as possible in the multi-scale feature map, and obtain a more comprehensive dependence relationship between the features of texture primitives in the image.
  • the image processing method further includes: extracting feature maps of multiple scales of the image.
  • Then, the feature maps of multiple scales of the image are scaled to the same scale and spliced to obtain the multi-scale feature map of the image.
  • the image processing method may further include: adjusting the size of the original image to a first size by bilinear interpolation.
  • the image processing method may further include: cropping, from the original image of the first size, an image block of a second size, and using the image block of the second size as the image to be processed.
  • the image processing method may further include: standardizing the image.
  • In this way, the feature data of each texture primitive in the image can be centered, and the generalization ability of image processing can be increased.
  • the foregoing processing of the image includes any one of: recognizing the image, segmenting the image, and performing image synthesis based on the image.
  • an embodiment of the present application provides an image processing device, which can be implemented through a neural network.
  • the device has the function of realizing the method described in the first aspect.
  • the function can be realized by hardware, or the corresponding software can be executed by hardware.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions, for example, a texture representation module, a processing module, and so on.
  • the texture representation module can be used to obtain the dependency relationship between the features of each texture primitive in the image according to the direction information and the multi-scale feature map of the image.
  • the multi-scale feature map includes multiple texture primitives of the image.
  • the direction information includes one or more directions; at least one set of texture features of the image is obtained according to a feature map of at least one scale of the image, wherein one set of the texture features of the image is obtained according to a feature map of one scale; and the texture representation result of the image is obtained according to the dependency relationship and the at least one set of texture features.
  • the processing module can be used to process the image according to the result of the texture representation of the image.
  • an embodiment of the present application provides an image processing device, including: an interface circuit for receiving data of an image to be processed; and a processor, connected to the interface circuit, configured to perform the method described in the first aspect or any possible design of the first aspect.
  • an embodiment of the present application further provides an image processing device, including: a processor, which is configured to be connected to a memory and to call a program stored in the memory to execute the method described in the first aspect or any possible design of the first aspect.
  • the embodiments of the present application also provide a computer-readable storage medium, including: computer software instructions; when the computer software instructions run in an image processing device or a chip built into the image processing device, the image processing device executes the method described in the first aspect or any of the possible designs of the first aspect.
  • the embodiments of the present application also provide a computer program product, which can implement the method described in the first aspect or any one of the possible designs of the first aspect when the computer program product is executed.
  • the embodiments of the present application also provide a chip system, which is applied to an image processing device; the chip system includes one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected by wires; The processor receives and executes computer instructions from the memory of the electronic device through the interface circuit to implement the method as described in the first aspect or any of the possible designs of the first aspect.
  • Figure 1 shows a schematic diagram of a waffle image
  • Figure 2 shows a schematic diagram of an existing image recognition network
  • FIG. 3 shows a schematic diagram of the composition of an image processing device provided by an embodiment of the present application
  • FIG. 4 shows a schematic flowchart of an image processing method provided by an embodiment of the present application
  • FIG. 5 shows another schematic flowchart of an image processing method provided by an embodiment of the present application
  • FIG. 6 shows a schematic diagram of the composition of a neural network provided by an embodiment of the present application.
  • Fig. 7 shows a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • Image texture is an important visual means and a ubiquitous feature in images.
  • the image texture is usually composed of multiple texture primitives, and multiple texture primitives may be of the same type or of different types.
  • Figure 1 shows a schematic diagram of a waffle image.
  • the texture primitive can be a quadrilateral grid in the waffle image (primitive 1 shown in Figure 1), or a fork texture in the waffle image (primitive 2 shown in Figure 1). That is, the texture primitives of the waffle image can include two types: primitive 1 and primitive 2.
  • The above division of the texture primitives of the waffle image is only an exemplary description; texture primitives can also be divided in other ways, which is not limited in this application.
  • image texture representation has a wide range of applications in the fields of portrait detection, medical image analysis, industrial vision detection, image classification and retrieval.
  • In image recognition, people, buildings, animals, etc. in the image to be recognized can be recognized based on the texture representation result of the image to be recognized.
  • In image segmentation, the image to be segmented can be divided into a number of specific regions with unique properties according to the result of the texture representation of the image to be segmented.
  • In image synthesis, multiple different images can be synthesized into one image according to the texture representation results of the multiple different images. For example, a person in an image with a desert background can be embedded into an image with a beach background.
  • Fig. 2 shows a schematic diagram of an existing image recognition network.
  • the existing image recognition network may include: an input layer, a feature extraction layer, a texture coding layer, a fully connected layer, and an output layer.
  • a dictionary base containing multiple codewords is preset in the texture coding layer, and it also includes a residual coding module, a weight distribution module, and a feature aggregation module.
  • the image to be recognized can be input to the image recognition network through the input layer.
  • the feature extraction layer can perform feature extraction on the image input by the input layer to obtain the features of each texture primitive in the image.
  • the residual coding module can calculate the residuals corresponding to the features of the texture primitives of the image according to the features of the texture primitives of the image extracted by the feature extraction layer and the codewords in the dictionary base.
  • the weight distribution module can calculate the weights corresponding to the features of the texture primitives of the image according to the features of the texture primitives of the image extracted by the feature extraction layer and the codewords in the dictionary base.
  • the feature aggregation module can aggregate the residuals obtained by the residual coding module and the weights obtained by the weight distribution module to obtain the texture representation result of the image.
  • the fully connected layer can recognize the image according to the texture representation result of the image obtained by the texture coding layer, for example, it can perform portrait recognition, material detection, and item classification according to the texture representation result of the image.
  • the texture coding layer merely aggregates the features of each texture primitive in the image in a disorderly manner to obtain the texture representation result of the image, so the obtained texture representation result can reflect only limited texture information of the image. Therefore, when subsequent image processing such as image recognition, image segmentation, or image synthesis is performed based on this texture representation result, the effect of image processing is poor; for example, the accuracy of image recognition will be low.
  • the embodiment of the present application provides an image processing method, which can be implemented through a neural network.
  • the method can obtain at least one set of texture features of the image according to the feature map of at least one scale of the image, obtain the dependency relationship between the features of each texture primitive in the image according to the direction information and the multi-scale feature map of the image, and obtain the texture representation result of the image according to the dependency relationship and the aforementioned at least one set of texture features.
  • the image can be processed according to the result of the texture representation of the image.
  • the processing of the image may be any one of image recognition, image segmentation, and image synthesis based on the image, which is not limited here.
  • In this way, the texture representation result of the image is obtained according to the dependency relationship between the features of the texture primitives in the image and at least one set of texture features of the image, so that the texture representation result can include not only the texture features of the image but also the dependency relationships between the features of different texture primitives in the image. The texture representation result of the image can therefore reflect the texture information of the image more completely, and the effect of image processing will be better when image processing such as image recognition, image segmentation, or image synthesis is performed according to the texture representation result; for example, it can effectively improve the accuracy of image recognition.
  • the embodiment of the present application provides an image processing device that can be used to execute the image processing method.
  • the image processing device may be an electronic device such as a desktop computer, a server, a TV, a monitor, a mobile phone, a tablet computer, or a scanner; this application does not limit the specific type of the image processing device.
  • Fig. 3 shows a schematic diagram of the composition of an image processing device provided by an embodiment of the present application.
  • the image processing device of the embodiment of the present application may include: a processor 310, an external memory interface 320, an internal memory 321, and a universal serial bus (USB) interface 330.
  • the processor 310 may include one or more processing units.
  • For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the above-mentioned controller may be a decision maker who directs the various components of the image processing equipment to coordinate work according to instructions. It is the nerve center and command center of image processing equipment.
  • the above-mentioned controller generates an operation control signal according to the instruction operation code and the timing signal, and completes the control of fetching and executing instructions.
  • NPU is a neural-network (NN) computing processor.
  • NPU can realize the intelligent recognition of image processing equipment and other applications, such as: image recognition, face recognition, voice recognition, text understanding, etc.
  • a memory may also be provided in the processor 310 to store instructions and data.
  • the memory in the processor 310 is a cache memory, which can store instructions or data that have just been used or recycled by the processor 310. If the processor 310 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 310 is reduced, and efficiency is improved.
  • the processor 310 may include an interface.
  • the interface can include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM card interface, and/or a USB interface, etc.
  • the external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the image processing device.
  • the external memory card communicates with the processor 310 through the external memory interface 320 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 321 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 310 executes various functional applications and data processing of the image processing device by running instructions stored in the internal memory 321.
  • the image processing method provided in the embodiment of the present application can be executed.
  • the internal memory 321 may include a storage program area and a storage data area.
  • the storage program area can store an operating system and at least one application program required by at least one function (such as a sound playback function, an image playback function, etc.).
  • the data storage area can store the data created during the use of the image processing equipment.
  • the internal memory 321 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, other volatile solid-state storage devices, universal flash storage (UFS), etc.
  • the USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 330 can be used to connect a charger to charge the image processing device, and can also be used to transfer data between the image processing device and peripheral devices. For example, the image to be processed can be transmitted to the processor 310.
  • the structure illustrated in the embodiment of the present invention does not constitute a limitation on the image processing device.
  • the image processing device may also include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the image processing device may also include a charging management module, a power management module, a battery, an antenna, a radio frequency module, a communication module, an audio module, a speaker, a receiver, a microphone, a headphone jack, and the like.
  • Fig. 4 shows a schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in Fig. 4, the image processing method may include S401-S409.
  • the image may be an image captured by a photographing device such as a mobile phone or a video camera, or an image scanned by a scanner, or may be a certain frame of image in some videos or a screenshot of a certain screen.
  • In S401, feature maps of multiple scales of the image can be extracted. In the neural network, the part used to extract the feature maps of multiple scales of the image can be a deep residual network (Resnet), a VGG network, an Alexnet network, a GoogLeNet network, etc.; this application does not limit this.
  • Resnet can perform feature extraction on the input image and output feature maps of the image at multiple scales.
  • the feature map at each scale contains the features of multiple texture primitives of the image at that scale.
  • In S402, the feature maps of multiple scales of the image are scaled to the same scale and then spliced to obtain a multi-scale feature map of the image.
  • the multi-scale feature map includes the features of multiple texture primitives of the image at multiple different scales.
  • For example, taking the aforementioned Resnet as Resnet50, suppose the outputs of the third residual module (Res3), the fourth residual module (Res4), and the fifth residual module (Res5) in Resnet50 are feature map 3, feature map 4, and feature map 5, respectively.
  • Feature map 3, feature map 4, and feature map 5 are the feature maps of the image at three different scales.
  • the matrices corresponding to feature map 4 and feature map 5 can be scaled to the same spatial size as feature map 3, while their numbers of channels remain different.
  • the scaling method may be interpolation scaling.
  • feature map 3, feature map 4, and feature map 5 can be spliced along the channel dimension to obtain a multi-scale feature map of the image.
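  • As an illustration of S401-S402 and the Resnet50 example above, the following is a minimal PyTorch sketch (not the patent's implementation; the function name multi_scale_feature_map and the use of torchvision's layer2/layer3/layer4 stages as Res3/Res4/Res5 are assumptions): features from three residual stages are rescaled to a common spatial size by interpolation and spliced along the channel dimension.

```python
# Hedged sketch of S401-S402 with a ResNet-50 backbone.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet50(weights=None)

def multi_scale_feature_map(image: torch.Tensor) -> torch.Tensor:
    """image: (N, 3, H, W) -> multi-scale feature map (N, 512+1024+2048, H/8, W/8)."""
    x = backbone.conv1(image)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    x = backbone.layer1(x)
    f3 = backbone.layer2(x)    # "feature map 3" (stride 8)
    f4 = backbone.layer3(f3)   # "feature map 4" (stride 16)
    f5 = backbone.layer4(f4)   # "feature map 5" (stride 32)
    size = f3.shape[-2:]
    # Scale feature maps 4 and 5 to the same spatial size as feature map 3
    # by interpolation; the channel counts remain different.
    f4 = F.interpolate(f4, size=size, mode="bilinear", align_corners=False)
    f5 = F.interpolate(f5, size=size, mode="bilinear", align_corners=False)
    # Splice along the channel dimension to obtain the multi-scale feature map.
    return torch.cat([f3, f4, f5], dim=1)
```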
  • the dependency relationship between the features of the texture primitives in the image can be obtained according to the direction information and the multi-scale feature map of the image.
  • the direction information may include one or more directions.
  • the direction information may include at least one preset direction map, each direction map may be used to indicate a direction, and different direction maps correspond to different directions.
  • Taking direction information including 8 direction maps as an example, the 8 direction maps can indicate, in turn, the following 8 directions: up, down, left, right, upper left, lower left, upper right, and lower right.
  • the matrix corresponding to a direction map can be numerically graded along the direction corresponding to that direction map, so as to indicate the direction.
  • For example, the matrix of the direction map corresponding to the up direction can be graded so that its values increase from bottom to top; the matrix for the down direction increases from top to bottom; and the matrices for the upper-left and lower-right directions increase along the corresponding diagonals.
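  • Illustrative 3*3 direction-map matrices are sketched below; the concrete values are assumptions, since the text only requires the values to be graded along the indicated direction (and, as described below, opposite directions come in pairs).

```python
# Hypothetical 3x3 direction maps, graded along the indicated direction.
import torch

up = torch.tensor([[ 1.0,  1.0,  1.0],
                   [ 0.0,  0.0,  0.0],
                   [-1.0, -1.0, -1.0]])          # values grow from bottom to top
down = -up                                        # paired opposite direction

upper_left = torch.tensor([[ 1.0,  0.5,  0.0],
                           [ 0.5,  0.0, -0.5],
                           [ 0.0, -0.5, -1.0]])   # grows toward the upper left
lower_right = -upper_left                         # paired opposite direction
```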
  • the direction information may include a first direction and a second direction opposite to the first direction.
  • the number of directional patterns included in the direction information may be an even number, and for any one of the first directional patterns in the directional information, the direction information further includes a second directional pattern that is opposite to the direction corresponding to the first directional pattern. That is, the directional patterns in the directional information are all in pairs. For example, if the direction information includes 2 directional patterns, the direction corresponding to the 2 directional patterns may be a pair of left and right, upper and lower, upper left and lower right, and so on. Similarly, if there are 4 directional patterns, they can be one or more pairs of the foregoing. When the directional patterns included in the directional information are an even number of pairs appearing, the dependency relationship between the features of the texture primitives in the image can be fully obtained.
  • In some other embodiments, the direction information can also include more (e.g., 16 or 32) or fewer (e.g., 1 or 2) direction maps for indicating different directions. In addition, the direction information may also be implemented in other ways, such as relative coordinates or absolute coordinates, which is not limited in this application.
  • S403 Extract the features of each texture primitive in the multi-scale feature map of the image according to the direction information to obtain the features of the texture primitives in multiple regions of the image.
  • the feature of each texture primitive in the multi-scale feature map of the image can be extracted along one or more directions included in the direction information.
  • the neural network can extract the features of each texture primitive in the multi-scale feature map of the image along the direction corresponding to the direction map according to one or more directional maps to obtain multiple first matrices.
  • each first matrix contains the features of the texture primitives of the local regions in the multi-scale feature map of the image.
  • different first matrices correspond to different local regions of the multi-scale feature map of the image.
  • the part of the neural network used to extract the features of each texture primitive in the multi-scale feature map of the image may be a convolutional network.
  • Through the convolutional network, multiple convolution operations can be performed on the multi-scale feature map according to the direction map to obtain multiple first matrices.
  • When this convolutional network performs multiple convolution operations on the multi-scale feature map according to the direction map, the values of the direction map can first be mapped to a fixed range, such as [-1, 1], using a linear or nonlinear function, and a convolutional network can be used to map the direction map to the same feature space as the multi-scale feature map. Normalizing the direction map and the multi-scale feature map in this way reduces the numerical difference between them, so that the neural network converges more easily and can also capture the features of the texture primitives more accurately.
  • For example, suppose the matrix of the multi-scale feature map is a 9*9 matrix (9 rows and 9 columns) and the convolution kernel of the convolutional network is a 3*3 matrix; then the matrix of the direction map should also be a 3*3 matrix (refer to the earlier example matrices). The convolutional network can convolve the 9*9 matrix corresponding to the multi-scale feature map with the 3*3 convolution kernel along the direction corresponding to the direction map, and each convolution extracts a 3*3 matrix, which is one of the first matrices described above. For example, the first matrix obtained by the first convolution contains the features of the texture primitives belonging to rows 1 to 3 and columns 1 to 3 of the 9*9 matrix corresponding to the multi-scale feature map; the first matrix obtained by the second convolution contains the features of the local texture primitives belonging to rows 4 to 6 and columns 4 to 6.
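  • A hedged sketch of this extraction step follows. The fusion scheme (tiling the direction map over the image and stacking it as an extra channel) and the raster-scan traversal of non-overlapping patches are assumptions; the patent does not fix how the direction condition is injected or how the direction-guided traversal is ordered.

```python
# Hypothetical sketch of S403: extract local "first matrices" (k x k blocks
# of texture-primitive features) from a multi-scale feature map, with the
# direction map injected as a spatial-context condition.
import torch
import torch.nn.functional as F

def extract_first_matrices(fmap: torch.Tensor, dir_map: torch.Tensor,
                           k: int = 3) -> torch.Tensor:
    """fmap: (N, C, H, W) multi-scale feature map; dir_map: (k, k) direction map.
    Returns (N, L, C + 1, k, k): one (C+1)-channel k x k block per region."""
    n, c, h, w = fmap.shape
    # Map the direction-map values to [-1, 1] to reduce the numeric gap
    # between the direction map and the feature map (see above).
    d = dir_map.float()
    d = 2 * (d - d.min()) / (d.max() - d.min() + 1e-8) - 1
    # Tile the k x k direction map across the image and stack it as an
    # extra channel: one simple way to inject the directional condition.
    d_full = d.repeat(h // k + 1, w // k + 1)[:h, :w]
    x = torch.cat([fmap, d_full[None, None].expand(n, 1, h, w)], dim=1)
    # Non-overlapping k x k patches; each patch is one "first matrix" region.
    patches = F.unfold(x, kernel_size=k, stride=k)     # (N, (C+1)*k*k, L)
    num_regions = patches.shape[-1]
    return patches.transpose(1, 2).reshape(n, num_regions, c + 1, k, k)
```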
  • In this way, the neural network can extract, according to one or more direction maps, the features of each texture primitive in the multi-scale feature map of the image along the direction corresponding to each direction map to obtain multiple first matrices, thereby obtaining the features of the texture primitives in multiple regions of the image. Further, the neural network can also obtain the dependency relationships between the features of the texture primitives in each region according to the multiple first matrices, and obtain multiple sets of dependency relationships corresponding to the multiple regions (S404).
  • In other words, the second matrix corresponding to each first matrix can be determined to obtain multiple second matrices, where a second matrix contains the dependency relationships between the features of the texture primitives in the local region contained in the corresponding first matrix.
  • For example, record a first matrix as the A matrix. The size of the A matrix is (k_w, k_h, c), where k_w represents the number of rows of the A matrix, k_h represents the number of columns of the A matrix, and c represents the channel dimension of the A matrix.
  • the A matrix can be subjected to two different nonlinear transformations to obtain the two nonlinear transformed matrices corresponding to the A matrix (for example, the transformation can be carried out by two nonlinear functions), which are referred to here as the B1 matrix and the B2 matrix.
  • the sizes of the B1 matrix and the B2 matrix are both (k_w, k_h, c).
  • After obtaining the B1 matrix and the B2 matrix, the B1 matrix can first be reshaped and transposed into a matrix of size (k_w*k_h, 1, c), and the B2 matrix can be reshaped into a matrix of size (1, k_w*k_h, c).
  • The matrix obtained by reshaping and transposing the B1 matrix can be called the B1' matrix, and the matrix obtained by reshaping the B2 matrix can be called the B2' matrix.
  • the B1' matrix and the B2' matrix can be multiplied to obtain a C matrix.
  • The C matrix is the second matrix containing the dependency relationships between the aforementioned features of the local texture primitives contained in the corresponding first matrix (the A matrix). The size of the C matrix is (k_w*k_h, k_w*k_h).
  • By applying two different nonlinear transformations, the differing features of the texture primitives in the A matrix can be polarized, so that the dependency relationships between the features of the texture primitives established later are more reliable.
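  • The following is a minimal sketch of the above computation, an assumption of one reasonable realization: tensor layouts are adapted to the PyTorch channels-first convention, and the choice of 1x1 convolution plus tanh for the two nonlinear transformations is illustrative, not the patent's stated functions.

```python
# Hedged sketch: compute a second matrix (C) from a first matrix (A).
import torch
import torch.nn as nn

class DependencyMatrix(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.t1 = nn.Sequential(nn.Conv2d(c, c, 1), nn.Tanh())  # 1st nonlinear transform
        self.t2 = nn.Sequential(nn.Conv2d(c, c, 1), nn.Tanh())  # 2nd nonlinear transform

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        """a: (N, c, k_w, k_h) first matrix -> C: (N, k_w*k_h, k_w*k_h)."""
        n, c, kw, kh = a.shape
        b1 = self.t1(a).reshape(n, c, kw * kh)     # B1 -> (N, c, k_w*k_h)
        b2 = self.t2(a).reshape(n, c, kw * kh)     # B2 -> (N, c, k_w*k_h)
        # (k_w*k_h, c) @ (c, k_w*k_h) per sample: pairwise dependency values
        # between the features of the local texture primitives.
        return torch.bmm(b1.transpose(1, 2), b2)   # (N, k_w*k_h, k_w*k_h)
```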
  • S405 Determine the dependence relationship between the features of the texture primitives in the image according to the multiple sets of dependence relationships.
  • the multiple sets of dependency relationships can be aggregated together through S405 as the dependency relationship between the features of the texture primitives in the image.
  • For example, the neural network can determine, according to each first matrix and the second matrix corresponding to that first matrix, the feature vector corresponding to the features of the texture primitives of the local region contained in the first matrix, obtain multiple feature vectors, and aggregate the multiple feature vectors together as the dependency relationship between the features of the texture primitives in the image. Each feature vector is used to indicate the dependency relationships between the features of the texture primitives of the local region contained in the corresponding first matrix.
  • Optionally, before determining the dependency relationship between the features of the texture primitives in the image according to the multiple sets of dependency relationships, the neural network can also update, according to the first function, the bidirectional relationship value between the features of any two texture primitives in each of the multiple sets of dependency relationships. That is, before determining the feature vector corresponding to each first matrix according to the first matrix and its second matrix, the dependency relationships contained in the second matrix can be updated.
  • Take the case where the first matrix is the aforementioned A matrix and the second matrix is the aforementioned C matrix as an example. After the C matrix is obtained, based on the first function, the C matrix can be subjected to a bidirectional cooperative operation through the neural network to obtain the D matrix.
  • the size of the D matrix is the same as that of the C matrix.
  • The two-way coordination strategy (that is, the first function) for performing the bidirectional cooperative operation on the C matrix can, for example, be a softmax-style weighting of each pair of relationship values: r'_ij = exp(r_ij) / (exp(r_ij) + exp(r_ji)) and r'_ji = exp(r_ji) / (exp(r_ij) + exp(r_ji)), where r_ij and r_ji represent the bidirectional relationship values between texture primitive i and texture primitive j in the C matrix, and r'_ij and r'_ji represent the bidirectional relationship values between texture primitive i and texture primitive j in the D matrix (after performing the bidirectional cooperative operation on the C matrix). Performing the bidirectional cooperative operation on the C matrix thus refers to calculating the weight ratio between texture primitive i and texture primitive j according to the two-way coordination strategy to obtain a new D matrix.
  • weighting functions such as softmax and logit can be used, and the application does not limit the type of the function.
  • the dependency between the local features of the texture primitives contained in the first matrix can be strengthened.
  • After obtaining the D matrix, the A matrix corresponding to the D matrix (the A matrix corresponds to the C matrix, and the D matrix is obtained from the C matrix, so the A matrix also corresponds to the D matrix) can first be reshaped into a matrix of size (1, k_w*k_h, c), for example called the A' matrix. Then, the A' matrix and the D matrix can be multiplied, and the resulting matrix can be reshaped to obtain the E matrix, whose size is (k_w, k_h, c). It can be understood that, through the aforementioned series of matrix operations, each A matrix (first matrix) yields a corresponding E matrix.
  • the E matrix can be pooled to obtain the feature vector at the center of the E matrix, which is the feature vector corresponding to the feature of each local texture primitive contained in the A matrix (the first matrix).
  • the size of the feature vector is (1, 1, c).
  • pooling the E matrix may include average pooling, maximum pooling, etc., which is not limited here.
  • In this way, the feature vector corresponding to each first matrix can be determined, thereby obtaining multiple feature vectors.
  • Afterwards, a fourth matrix, for example called the F matrix, can be formed from the multiple feature vectors. The size of the F matrix is (ww, hh, c), where ww represents the length of the multi-scale feature map and hh represents the width of the multi-scale feature map.
  • the fourth matrix can be used to indicate the dependency between the features of the texture primitives in the image.
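  • As a concrete illustration of S404-S405, the following is a minimal PyTorch sketch of the bidirectional cooperative operation, the E-matrix aggregation, the pooling to per-region feature vectors, and the assembly of the fourth (F) matrix. Tensor layouts follow the channels-first convention, and the pairwise softmax used as the first function is one of the weighting functions named above, not a confirmed formula.

```python
# Hedged sketch of S404-S405 under the assumptions stated above.
import torch

def bidirectional_cooperate(c_mat: torch.Tensor) -> torch.Tensor:
    """c_mat: (N, L, L) second matrix -> D: (N, L, L), with
    d_ij = exp(r_ij) / (exp(r_ij) + exp(r_ji))."""
    e = c_mat.exp()
    return e / (e + e.transpose(1, 2))

def region_vector(a: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """a: (N, c, k_w, k_h) first matrix; d: (N, k_w*k_h, k_w*k_h) D matrix.
    Returns one (N, c) feature vector per region (average pooling here)."""
    n, c, kw, kh = a.shape
    a_prime = a.reshape(n, c, kw * kh)     # A' (flattened spatial dimension)
    e_mat = torch.bmm(a_prime, d)          # multiply A' by D, then reshape: E
    e_mat = e_mat.reshape(n, c, kw, kh)
    return e_mat.mean(dim=(2, 3))          # pool E to the region's feature vector

def assemble_f(vectors: torch.Tensor, ww: int, hh: int) -> torch.Tensor:
    """vectors: (N, ww*hh, c) region feature vectors arranged over the region
    grid -> F: (N, c, ww, hh), indicating the dependency relationships between
    the features of the texture primitives in the image."""
    n, _, c = vectors.shape
    return vectors.transpose(1, 2).reshape(n, c, ww, hh)
```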
  • At least one set of texture features of the image can also be obtained according to the feature map of at least one scale of the image.
  • the following S406-S407 can be executed.
  • S406 Extract the features of each texture primitive in the feature map of at least one scale of the image to obtain the features of multiple texture primitives.
  • S407 Pool the features of the multiple texture primitives to obtain at least one set of texture features.
  • In this application, the part of the neural network used to extract the features of each texture primitive in the feature map of at least one scale of the image can also be implemented by a convolutional network; the principle is similar to that of extracting the features of the texture primitives in the multi-scale feature map described above, and details are not repeated here.
  • In some embodiments, extracting the features of each texture primitive in the feature map of at least one scale of the image in S406 may refer to: performing feature extraction on one or more of the feature maps of multiple scales of the image obtained in S401, so as to obtain the texture features of the one or more feature maps. Correspondingly, in S407, the texture features of the one or more feature maps are pooled.
  • Alternatively, extracting the features of each texture primitive in the feature map of at least one scale of the image in S406 may also refer to: performing feature extraction on the multi-scale feature map of the image obtained in S402 to obtain the texture features of the multi-scale feature map; correspondingly, in S407, the texture features of the multi-scale feature map are pooled to obtain a set of texture features of the image, which may for example be recorded as a fifth matrix.
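  • A minimal sketch of S406-S407 under these assumptions is given below: a convolution extracts the texture-primitive features from the multi-scale feature map, and pooling yields a set of texture features (the fifth matrix), shaped here so that it can later be broadcast-added to the fourth matrix in S408. The specific convolution block is illustrative, not the patent's stated network.

```python
# Hedged sketch of S406-S407.
import torch
import torch.nn as nn

class GlobalTextureFeatures(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, msfm: torch.Tensor) -> torch.Tensor:
        """msfm: (N, C, H, W) multi-scale feature map -> (N, out, 1, 1)."""
        feats = self.conv(msfm)                     # texture-primitive features (S406)
        # Average pooling (S407); keepdim so the result can be broadcast-added
        # to the fourth (F) matrix in S408.
        return feats.mean(dim=(2, 3), keepdim=True)
```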
  • After obtaining the dependency relationship between the features of the texture primitives in the image and at least one set of texture features of the image, the dependency relationship and the at least one set of texture features can be aggregated to obtain the texture representation result of the image. For example, S408 can be executed.
  • S408 Obtain a texture representation result of the image according to the dependency relationship and at least one set of texture features.
  • For example, obtaining the texture representation result of the image according to the dependency relationship and at least one set of texture features may refer to: adding the fourth matrix and the fifth matrix, so as to aggregate the dependency relationships between the features of the texture primitives in the image with the texture features of the image. The sum of the fourth matrix and the fifth matrix is the texture representation result of the image.
  • After the texture representation result of the image is obtained, the image can be processed accordingly. For example, S409 can be executed.
  • S409 Process the image according to the texture representation result of the image.
  • processing the image may refer to: recognizing the image, segmenting the image, or performing image synthesis based on the image, or the like.
  • the image processing method provided in the embodiments of the present application can be applied to any scene where image processing needs to be performed according to the texture representation result of the image.
  • In this way, the embodiment of the present application obtains the texture representation result of the image according to the dependency relationships between the features of the texture primitives in the image and at least one set of texture features of the image, so that the texture representation result contains not only the texture features of the image but also the dependency relationships between the features of different texture primitives. The texture representation result can therefore reflect the texture information of the image more completely, which improves the effect of subsequent image processing, such as image recognition, image segmentation, or image synthesis, performed according to the texture representation result. For example, it can effectively improve the accuracy of image recognition based on the texture representation result.
  • In addition, when the neural network extracts the features of each texture primitive in the multi-scale feature map of the image along the direction corresponding to a direction map, at least one direction map is used as the spatial context guidance condition, which can effectively improve the extraction of spatial context clues.
  • Moreover, before determining the dependency relationships between the features of each texture primitive in the image according to the multiple sets of dependency relationships, the neural network updates, according to the first function, the bidirectional relationship value between the features of any two texture primitives in each of the multiple sets of dependency relationships. This strengthens each set of dependencies and establishes associations between the features of any two texture primitives, so that the neural network can more easily learn the spatial structure dependencies between texture primitives.
  • In some embodiments, before the original image is processed according to the aforementioned procedures of S401-S409, the original image may first be preprocessed to obtain a preprocessed image. Then, the preprocessed image can be processed in accordance with the aforementioned procedures of S401-S409. That is, the image processing method may further include a step of preprocessing the image.
  • FIG. 5 shows another schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the image processing method may further include S501-S503.
  • S501 Adjust the size of the original image to a first size by bilinear interpolation.
  • For example, the first size may be 512*512, and the size of the original image may be adjusted to 512*512 through bilinear interpolation.
  • the specific size of the first size can also be other values, such as 256*256, which is not limited in this application.
  • S502 From the original image whose size is the first size, crop an image block whose size is the second size as an image to be processed.
  • the second size can be 224*224.
  • the original image with a size of 512*512 can be cropped to obtain an image block with a size of 224*224 as the subsequent image to be processed.
  • the cropping method can be random cropping, or cropping based on the center position of the original image with a size of 512*512 as the center, which is not limited in this application.
  • S503 Perform z-score standardization on the image block obtained in S502, so as to center the feature data of each texture primitive in the image block. In this way, the generalization ability of image processing can be increased.
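  • The preprocessing steps S501-S503 can be sketched as follows (the center crop is one of the cropping options named above, and computing the z-score from per-image statistics is an assumption; dataset-level statistics could equally be used).

```python
# Hedged sketch of S501-S503.
import torch
import torch.nn.functional as F

def preprocess(original: torch.Tensor) -> torch.Tensor:
    """original: (N, 3, H, W) image in [0, 1] -> (N, 3, 224, 224)."""
    # S501: adjust the original image to the first size (512*512)
    # by bilinear interpolation.
    x = F.interpolate(original, size=(512, 512), mode="bilinear",
                      align_corners=False)
    # S502: crop an image block of the second size (224*224); a center
    # crop is used here, though random cropping is also an option.
    top = (512 - 224) // 2
    x = x[:, :, top:top + 224, top:top + 224]
    # S503: z-score standardization, centering the feature data.
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True)
    return (x - mean) / (std + 1e-8)
```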
  • In some other embodiments, multiple image blocks of the second size may be cropped from the original image of the first size, and each image block can be processed according to the process described above (S401-S402). In this way, multiple multi-scale feature maps corresponding to the image blocks can be obtained to form a multi-scale feature pool.
  • Then, for each multi-scale feature map in the multi-scale feature pool, the operations described in S403-S405 can be followed to extract the dependency relationships between the features of the texture primitives contained in that multi-scale feature map. The dependency relationships between the features of the texture primitives contained in the multiple multi-scale feature maps together constitute the dependency relationship between the features of the texture primitives in the image block.
  • S409 may specifically refer to: determining the predicted classification label of the image through a neural network according to the texture representation result of the image.
  • the predicted classification label of the image is the recognition result of the image.
  • For example, if the image to be recognized contains a user M, the output predicted classification label may be any one or more of user M's occupation, gender, name, and so on.
  • the specific type of the predicted classification label is related to the actual classification label in the training sample during training. That is, it is related to the specific recognition function of the neural network used for image recognition.
  • FIG. 6 shows a schematic diagram of the composition of a neural network provided by an embodiment of the present application.
  • As shown in FIG. 6, the neural network provided by the embodiment of the present application may include: an input layer, a feature extraction layer, a texture coding layer, a fully connected layer, and an output layer.
  • the input layer can be used to input the original image or the image after the original image is preprocessed.
  • the feature extraction layer may include a Resnet50 network and a zooming and splicing module.
  • the Resnet50 network can perform feature extraction on an image and output feature maps of multiple scales of the image (such as implementing the aforementioned S401 function).
  • the zooming and splicing module can zoom the feature maps of multiple scales of the image to the same size and then splice them to obtain the multi-scale feature maps of the image (for example, to realize the function of S402).
  • the texture coding layer may include: a structure revealing module, a first pooling module, a convolutional network, a second pooling module, and a feature aggregation module.
  • the structure revealing module can use the directional map as the spatial context guide condition, and obtain the dependency relationship between the features of each texture primitive in the image according to the multi-scale feature map output by the feature extraction layer (such as realizing the aforementioned functions of S403-S405).
  • the first pooling module can pool the output results of the structure revealing module.
  • the convolutional network can extract the global texture feature of the image according to the multi-scale feature map output by the feature extraction layer, or the feature map of at least one scale (such as implementing the function of S406 described above).
  • the second pooling module can pool the global texture features of the image output by the convolutional network (such as implementing the function of S407 mentioned above).
  • the feature aggregation module can aggregate the texture features of the image and the dependency relationships between the features of the texture primitives in the image to obtain the texture representation result of the image (for example, implementing the function of S408 described above).
  • the fully connected layer can recognize the image according to the texture representation result of the image output by the texture coding layer and output the predicted classification label of the image.
  • the predicted classification label is the recognition result of the image (similar to the fully connected layer shown in FIG. 2 above; details are not repeated here).
  • the image processing method described in the embodiments of the present application can be implemented by program code in a memory, and can be run or used for inference on high-performance computing devices such as CPUs and GPUs.
  • the training process of the neural network can be as follows: first, the architecture of the neural network shown in FIG. 6 can be constructed, and the weight parameters of the entire neural network initialized. Then, forward inference computation can be performed on devices such as GPUs and CPUs using the current network weights, and an error value computed from the results of the forward inference computation and the ground-truth values. From the error value it can be judged whether the neural network has met the convergence requirement; if not, all trainable weights in the neural network are updated by back propagation according to the error value. The foregoing steps can then be repeated until the error value converges, at which point all parameters in the neural network can be frozen, no longer updated, and stored.
  • the inference process of the neural network can be: storing the neural network trained in 1) on a GPU, CPU, or other computing device. Then, the image to be recognized can be input to the neural network, forward inference computation performed using the current network weights, and the output of the neural network is the recognition result of the image.
  • the texture coding layer shown in FIG. 6 can also be embedded in other neural networks to implement the corresponding functions. Regardless of the neural network to which the texture coding layer is applied, it retains good robustness; examples are not given one by one here.
  • the neural network or image processing device may include corresponding hardware structures and/or software modules for performing various functions.
  • the embodiment of the present application may also provide an image processing device.
  • Fig. 7 shows a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the image processing apparatus may include: a texture representation module 701, which may be used to obtain the dependency relationships between the features of the texture primitives in the image according to the direction information and the multi-scale feature map of the image, where the multi-scale feature map includes the features of multiple texture primitives of the image at multiple different scales and the direction information includes one or more directions; obtain at least one set of texture features of the image according to a feature map of at least one scale of the image, where one set of the texture features of the image is obtained according to the feature map of one scale; and obtain the texture representation result of the image according to the dependency relationships and the at least one set of texture features.
  • the processing module 702 may be used to process the image according to the texture representation result of the image.
  • the direction information may include a first direction and a second direction opposite to the first direction.
  • the texture representation module can be specifically used to extract the features of the texture primitives in a feature map of at least one scale of the image to obtain the features of multiple texture primitives, and to pool the features of the multiple texture primitives to obtain the at least one set of texture features.
  • the texture representation module is also specifically used to extract the features of the texture primitives in the multi-scale feature map of the image according to the direction information, to obtain the features of the texture primitives in multiple regions of the image; obtain, according to the features of the texture primitives in the multiple regions, the dependency relationships between the features of the texture primitives in each region, to obtain multiple groups of dependency relationships respectively corresponding to the multiple regions; and determine, according to the multiple groups of dependency relationships, the dependency relationships between the features of the texture primitives in the image.
  • the texture representation module is specifically configured to extract the features of each texture primitive in the multi-scale feature map of the image along one or more directions.
  • the texture representation module can also be used to update, according to the first function, the bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships.
  • the division of modules or units in the above apparatus is only a division of logical functions; in actual implementation, they may be fully or partially integrated into one physical entity, or may be physically separate.
  • the modules in the device can be all implemented in the form of software called by processing elements; they can also be all implemented in the form of hardware; part of the units can also be implemented in the form of software called by the processing elements, and some of the units can be implemented in the form of hardware.
  • each unit can be a separately set up processing element, or it can be integrated in a certain chip of the device for implementation.
  • a unit can also be stored in the memory in the form of a program, which is called and executed by a certain processing element of the device.
  • All or part of these units can be integrated together or implemented independently.
  • the processing element described here may also be called a processor, and may be an integrated circuit with signal processing capability.
  • each step of the above method or each of the above units may be implemented by an integrated logic circuit of hardware in a processor element or implemented in a form of being called by software through a processing element.
  • the unit in any of the above devices may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (ASIC), or one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
  • the processing element can be a general-purpose processor, such as a central processing unit (CPU) or other processors that can call programs.
  • these units can be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • an embodiment of the present application may also provide an image processing device, which may include: an interface circuit for receiving data of an image to be processed; a processor, connected to the interface circuit and used for executing each step in the above method.
  • the processor may include one or more.
  • the modules that respectively implement each corresponding step in the above method can be implemented in the form of a program scheduled by a processing element.
  • the image processing device may include a processing element and a storage element, and the processing element calls a program stored in the storage element to execute the method described in the above method embodiment.
  • the storage element may be a storage element on the same chip as the processing element, that is, an on-chip storage element.
  • the program used to implement the above method may be in a storage element on a different chip from the processing element, that is, an off-chip storage element.
  • the processing element calls or loads a program from the off-chip storage element to the on-chip storage element to call and execute the method described in the above method embodiment.
  • an embodiment of the present application may also provide an image processing device, which may include a processor, which is configured to connect to a memory and call a program stored in the memory to execute the method described in the foregoing method embodiment.
  • the memory may be located in the image processing device or outside the image processing device.
  • the processor includes one or more.
  • the module used to implement each step in the above method may be configured as one or more processing elements, and these processing elements may be provided on the terminal, where a processing element may be an integrated circuit, for example: one or more ASICs, or one or more DSPs, or one or more FPGAs, or a combination of these types of integrated circuits. These integrated circuits can be integrated together to form a chip.
  • the modules used to implement each step in the above method can be integrated together and implemented in the form of an SOC, and the SOC chip is used to implement the corresponding method.
  • at least one processing element and a storage element may be integrated in the chip, and the corresponding method may be implemented by the processing element calling the program stored in the storage element; or at least one integrated circuit may be integrated in the chip to implement the corresponding method; or the above implementations may be combined.
  • the functions of some units are implemented in the form of calling programs by processing elements, and the functions of some units are implemented in the form of integrated circuits.
  • the processing element here is the same as described above; it can be a general-purpose processor, such as a CPU, or one or more integrated circuits configured to implement the above method, for example: one or more ASICs, or one or more microprocessor DSPs, or one or more FPGAs, or a combination of at least two of these integrated circuit forms.
  • the storage element can be a memory or a collective term for multiple storage elements.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • in actual implementation there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate.
  • the parts displayed as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the software product is stored in a program product, such as a computer-readable storage medium, and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
  • an embodiment of the present application may also provide a computer-readable storage medium, including computer software instructions; when the computer software instructions run in an image processing device or a chip built into the image processing device, the image processing device can be enabled to perform the method described in the foregoing method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method and apparatus, and a storage medium, relating to the field of image processing. Dependency relationships between the features of texture primitives in an image can be obtained based on direction information and a multi-scale feature map of the image, at least one set of texture features of the image can be obtained according to a feature map of at least one scale of the image, and a texture representation result of the image can be obtained according to the dependency relationships and the foregoing at least one set of texture features. The image can then be processed according to its texture representation result. Because the texture representation result reflects more complete texture information of the image, image processing such as image recognition, image segmentation, or image synthesis performed according to the texture representation result achieves a better effect.

Description

Image processing method, apparatus, and storage medium
Technical Field
The embodiments of this application relate to the field of image processing, and in particular, to an image processing method and apparatus, and a storage medium.
Background
Texture representation is an important research area of computer vision and has broad application prospects in image processing fields such as image recognition, image segmentation, and image synthesis. For example, in image recognition, a texture representation of a to-be-recognized image may first be computed, and the image may then be recognized according to its texture representation result, for example, to recognize persons, buildings, and the like in the image.
In the prior art, however, processing an image according to its texture representation result yields a poor effect. For example, when a to-be-recognized image is recognized, the recognition accuracy is low.
Summary
The embodiments of this application provide an image processing method and apparatus, and a storage medium, which can improve the effect of image processing, for example, can improve the accuracy of image recognition.
According to a first aspect, an embodiment of this application provides an image processing method, which may be implemented by a neural network. The method includes: obtaining dependency relationships between the features of texture primitives in an image according to direction information and a multi-scale feature map of the image, where the multi-scale feature map includes the features of multiple texture primitives of the image at multiple different scales, and the direction information includes one or more directions; obtaining at least one set of texture features of the image according to a feature map of at least one scale of the image, where one set of texture features of the image is obtained according to the feature map of one scale; obtaining a texture representation result of the image according to the dependency relationships and the at least one set of texture features; and processing the image according to the texture representation result of the image.
In this image processing method, because the texture representation result of the image can include both the texture features of the image and the dependency relationships between the features of different texture primitives in the image, the texture representation result reflects more complete texture information of the image. Therefore, subsequent image processing, such as image recognition, image segmentation, or image synthesis, performed according to the texture representation result achieves a better effect. For example, the accuracy of image recognition can be effectively improved.
In a possible design, the direction information includes a first direction and a second direction opposite to the first direction.
For example, the direction information may include multiple groups of directions, and each group may include two opposite directions, namely a first direction and a second direction. In other words, the directions included in the direction information may be an even number of directions occurring in pairs.
In this design, when the direction information includes an even number of directions occurring in pairs, the dependency relationships between the features of the texture primitives in the image can be obtained more fully.
In a possible design, the obtaining at least one set of texture features of the image according to a feature map of at least one scale of the image includes: extracting the features of the texture primitives in the feature map of at least one scale of the image to obtain the features of multiple texture primitives; and pooling the features of the multiple texture primitives to obtain the at least one set of texture features.
In this design, by extracting the features of the texture primitives in the feature map of at least one scale of the image, spatially ordered texture features of the image can be obtained.
In a possible design, the obtaining dependency relationships between the features of the texture primitives in the image according to the direction information and the multi-scale feature map of the image includes: extracting the features of the texture primitives in the multi-scale feature map of the image according to the direction information, to obtain the features of the texture primitives in multiple regions of the image; obtaining, according to the features of the texture primitives in the multiple regions, the dependency relationships between the features of the texture primitives in each region, to obtain multiple groups of dependency relationships respectively corresponding to the multiple regions; and determining the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships.
For example, the features of the texture primitives in the multi-scale feature map of the image may be extracted according to the direction information to obtain multiple first matrices, where each first matrix contains the features of the texture primitives of one region of the image. Then, a corresponding second matrix may be determined from each first matrix to obtain multiple second matrices; a second matrix may contain the dependency relationships between the features of the texture primitives in the image region corresponding to the respective first matrix, so that multiple groups of dependency relationships respectively corresponding to the multiple regions can be obtained. Aggregating the foregoing multiple groups of dependency relationships yields the dependency relationships between the features of the texture primitives in the image.
In a possible design, before the determining the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships, the method further includes: updating, according to a first function, the bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships.
In this design, by updating the bidirectional relation values between the features of any two texture primitives in each group of dependency relationships, each group of dependency relationships can be strengthened and associations established between the features of any two texture primitives in each group, so that the neural network more easily learns the spatial structural dependency between texture primitives.
In a possible design, the extracting the features of the texture primitives in the multi-scale feature map of the image according to the direction information includes: extracting the features of the texture primitives in the multi-scale feature map of the image along the one or more directions.
For example, one or more direction maps may be used as spatial-context guiding conditions, and the features of the texture primitives in the multi-scale feature map of the image may be extracted along the directions corresponding to the direction maps. This effectively improves the ability to extract spatial-context cues and therefore better perceives the features of texture primitives, so that as many latent texture-primitive features as possible are extracted from the multi-scale feature map and more comprehensive dependency relationships between the features of the texture primitives in the image are obtained.
Optionally, before the obtaining dependency relationships between the features of the texture primitives in the image according to the direction information and the multi-scale feature map of the image, the image processing method further includes: extracting feature maps of multiple scales of the image; and scaling the feature maps of the multiple scales of the image to the same size and then concatenating them to obtain the multi-scale feature map of the image.
In a possible design, before the extracting feature maps of multiple scales of the image, the image processing method may further include: resizing an original image to a first size by using bilinear interpolation.
In a possible design, after the resizing the original image to the first size by using bilinear interpolation, the image processing method may further include: cropping, from the original image of the first size, an image block of a second size as the to-be-processed image.
In a possible design, before the extracting feature maps of multiple scales of the image, the image processing method may further include: normalizing the image.
Normalizing the image centers the feature data of the texture primitives in the image and can increase the generalization capability of image processing.
In a possible design, the processing the image includes any one of: recognizing the image, segmenting the image, and performing image synthesis according to the image.
According to a second aspect, an embodiment of this application provides an image processing apparatus, which may be implemented by a neural network. The apparatus has the functions of implementing the method according to the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions, for example, a texture representation module and a processing module.
The texture representation module may be configured to: obtain dependency relationships between the features of texture primitives in an image according to direction information and a multi-scale feature map of the image, where the multi-scale feature map includes the features of multiple texture primitives of the image at multiple different scales, and the direction information includes one or more directions; obtain at least one set of texture features of the image according to a feature map of at least one scale of the image, where one set of the texture features of the image is obtained according to the feature map of one scale; and obtain a texture representation result of the image according to the dependency relationships and the at least one set of texture features. The processing module may be configured to process the image according to the texture representation result of the image.
According to a third aspect, an embodiment of this application provides an image processing apparatus, including: an interface circuit, configured to receive data of a to-be-processed image; and a processor, connected to the interface circuit and configured to perform the method according to the first aspect or any possible design of the first aspect.
According to a fourth aspect, an embodiment of this application further provides an image processing apparatus, including a processor, where the processor is configured to connect to a memory and invoke a program stored in the memory to perform the method according to the first aspect or any possible design of the first aspect.
According to a fifth aspect, an embodiment of this application further provides a computer-readable storage medium, including computer software instructions; when the computer software instructions are run in an image processing apparatus or a chip built into an image processing apparatus, the image processing apparatus is enabled to perform the method according to the first aspect or any possible design of the first aspect.
According to a sixth aspect, an embodiment of this application further provides a computer program product; when the computer program product is executed, the method according to the first aspect or any possible design of the first aspect can be implemented.
According to a seventh aspect, an embodiment of this application further provides a chip system, applied to an image processing device. The chip system includes one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected through lines; the processor receives computer instructions from a memory of the electronic device through the interface circuit and executes them, to implement the method according to the first aspect or any possible design of the first aspect.
It can be understood that, for the beneficial effects achievable by the second aspect to the seventh aspect, refer to the beneficial effects of the first aspect and any possible design thereof; details are not repeated here.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a waffle image;
FIG. 2 is a schematic diagram of an existing image recognition network;
FIG. 3 is a schematic diagram of the composition of an image processing device according to an embodiment of this application;
FIG. 4 is a schematic flowchart of an image processing method according to an embodiment of this application;
FIG. 5 is another schematic flowchart of the image processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of the composition of a neural network according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.
Detailed Description
Image texture is an important visual cue and a feature commonly present in images. For an image, the image texture is usually composed of multiple texture primitives, which may be of the same type or of different types. For example, FIG. 1 is a schematic diagram of a waffle image. As shown in FIG. 1, for the waffle image, a texture primitive may be a quadrilateral grid cell in the waffle image (primitive 1 shown in FIG. 1) or a cross-shaped texture in the waffle image (primitive 2 shown in FIG. 1). That is, the texture primitives of the waffle image may include two types: primitive 1 and primitive 2.
Certainly, it can be understood that the foregoing description of the texture primitives of the waffle image is merely an example. In other images, texture primitives may be partitioned in other ways, which is not limited in this application.
By extracting the features of the texture primitives of an image, a representation of the image texture can be implemented. According to the representation result of the image texture, image processing operations such as image recognition, image segmentation, and image synthesis can then be performed. Therefore, image texture representation is widely applied in fields such as portrait detection, medical image analysis, industrial visual inspection, and image classification and retrieval.
For example, in image recognition, persons, buildings, animals, and the like present in a to-be-recognized image may be recognized according to the texture representation result of the image. In image segmentation, a to-be-segmented image may be divided, according to its texture representation result, into several specific regions with distinctive properties. In image synthesis, multiple different images may be synthesized into one image according to their texture representation results; for example, a person in an image whose background is a desert may be embedded into an image whose background is a beach.
Taking image recognition as an example, the following describes an existing image texture representation process with reference to FIG. 2.
FIG. 2 is a schematic diagram of an existing image recognition network. As shown in FIG. 2, the existing image recognition network may include an input layer, a feature extraction layer, a texture encoding layer, a fully connected layer, and an output layer. The texture encoding layer is preset with a dictionary base containing multiple codewords, and further includes a residual encoding module, a weight assignment module, and a feature aggregation module.
Through the input layer, a to-be-recognized image can be input into the image recognition network. The feature extraction layer can perform feature extraction on the image input by the input layer to obtain the features of the texture primitives in the image. In the texture encoding layer, the residual encoding module can compute, according to the features of the texture primitives of the image extracted by the feature extraction layer and the codewords in the dictionary base, the residuals corresponding to the features of the texture primitives of the image. The weight assignment module can compute, likewise according to the extracted features and the codewords, the weights corresponding to the features of the texture primitives of the image. The feature aggregation module can aggregate the residuals obtained by the residual encoding module and the weights obtained by the weight assignment module to obtain the texture representation result of the image. The fully connected layer can recognize the image according to the texture representation result obtained by the texture encoding layer, for example, perform portrait recognition, material detection, and object classification according to the texture representation result.
However, in the image recognition network shown in FIG. 2, the texture encoding layer merely aggregates the features of the texture primitives in the image in an orderless manner to obtain the texture representation result, and the texture information of the image that this result can reflect is limited. Therefore, subsequent image processing, such as image recognition, image segmentation, or image synthesis, performed according to the texture representation result achieves a poor effect. For example, the accuracy of image recognition is low.
An embodiment of this application provides an image processing method, which may be implemented by a neural network. The method can obtain at least one set of texture features of an image according to a feature map of at least one scale of the image, obtain the dependency relationships between the features of the texture primitives in the image according to direction information and the multi-scale feature map of the image, and obtain the texture representation result of the image according to the dependency relationships and the foregoing at least one set of texture features. The image can then be processed according to its texture representation result.
The processing of the image may be any one of recognizing the image, segmenting the image, and performing image synthesis according to the image, which is not limited here.
In the image processing method provided by this embodiment, obtaining the texture representation result of the image according to the dependency relationships between the features of the texture primitives in the image and the at least one set of texture features of the image allows the texture representation result to include both the texture features of the image and the dependency relationships between the features of different texture primitives in the image, so that the texture representation result reflects more complete texture information. Therefore, subsequent image processing such as image recognition, image segmentation, or image synthesis performed according to the texture representation result achieves a better effect; for example, the accuracy of image recognition can be effectively improved.
The following exemplarily describes, with reference to the accompanying drawings, the image processing method provided by the embodiments of this application.
It should be noted that, in the description of this application, "at least one" means one or more, and "multiple" means two or more. Terms such as "first" and "second" are merely used to distinguish descriptions and are not intended to specially limit a feature. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists. The character "/" generally indicates an "or" relationship between the associated objects.
In an exemplary embodiment, an embodiment of this application provides an image processing device that can be used to perform the image processing method. The image processing device may be an electronic device such as a desktop computer, a server, a television, a monitor, a mobile phone, a tablet computer, or a scanner; the specific type of the image processing device is not limited in this application.
FIG. 3 is a schematic diagram of the composition of an image processing device according to an embodiment of this application.
As shown in FIG. 3, the image processing device in this embodiment may include a processor 310, an external memory interface 320, an internal memory 321, and a universal serial bus (USB) interface 330.
The processor 310 may include one or more processing units. For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), where different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the decision-maker that directs the components of the image processing device to work in coordination according to instructions; it is the nerve center and command center of the image processing device. The controller generates operation control signals according to instruction operation codes and timing signals, to control instruction fetching and instruction execution.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer pattern between neurons in the human brain, it rapidly processes input information and can also continuously self-learn. Applications such as intelligent cognition of the image processing device, for example, image recognition, face recognition, speech recognition, and text understanding, can be implemented by the NPU.
A memory may also be provided in the processor 310 to store instructions and data. In some embodiments, the memory in the processor 310 is a cache that can hold instructions or data that the processor 310 has just used or uses cyclically. If the processor 310 needs to use the instructions or data again, it can invoke them directly from the memory, which avoids repeated accesses, reduces the waiting time of the processor 310, and thus improves efficiency.
In some embodiments, the processor 310 may include interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM card interface, and/or a USB interface.
The external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the image processing device. The external memory card communicates with the processor 310 through the external memory interface 320 to implement a data storage function, for example, saving files such as music and videos in the external memory card.
The internal memory 321 may be used to store computer-executable program code, where the executable program code includes instructions. By running the instructions stored in the internal memory 321, the processor 310 performs the various functional applications and data processing of the image processing device; for example, the image processing method provided by the embodiments of this application may be performed. The internal memory 321 may include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required by at least one function (for example, a sound playback function and an image playback function), and the data storage area may store data created during use of the image processing device. In addition, the internal memory 321 may include high-speed random access memory, and may also include nonvolatile memory, for example, at least one magnetic disk storage device, a flash memory device, other volatile solid-state storage devices, or a universal flash storage (UFS).
The USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 330 may be used to connect a charger to charge the image processing device, or to transfer data between the image processing device and peripheral devices, for example, to transfer a to-be-processed image to the processor 310.
The structure illustrated in this embodiment does not constitute a limitation on the image processing device. The image processing device may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented by hardware, software, or a combination of software and hardware.
For example, in some embodiments, the image processing device may further include a charging management module, a power management module, a battery, an antenna, a radio frequency module, a communication module, an audio module, a speaker, a receiver, a microphone, a headset jack, sensors, buttons, an indicator, a camera, a display, a subscriber identity module (SIM) card interface, and the like, which are not described one by one here.
FIG. 4 is a schematic flowchart of the image processing method according to an embodiment of this application. As shown in FIG. 4, the image processing method may include S401-S409.
S401: Extract feature maps of multiple scales of the image.
Optionally, the image may be an image captured by a photographing device such as a mobile phone or a video camera, an image obtained by a scanner, or a frame of a video or a screenshot of a picture.
In the neural network implementing the image processing method, the part used to extract the feature maps of multiple scales of the image may be a deep residual network (Resnet), a VGG network, an Alexnet network, a GoogLeNet network, or the like, which is not limited in this application.
Taking Resnet as an example, Resnet can perform feature extraction on the input image and output feature maps of the image at multiple scales, where the feature map at each scale contains the features of the multiple texture primitives of the image at that scale.
S402: Scale the feature maps of the multiple scales of the image to the same size and concatenate them to obtain the multi-scale feature map of the image.
The multi-scale feature map includes the features of the multiple texture primitives of the image at multiple different scales.
For example, taking Resnet50 as the foregoing Resnet, assume that the outputs of the third residual block (Res3), the fourth residual block (Res4), and the fifth residual block (Res5) in Resnet50 are feature map 3, feature map 4, and feature map 5, respectively. Feature map 3, feature map 4, and feature map 5 are then the feature maps of the image at three different scales. Using the size of feature map 3 as the reference, the matrices corresponding to feature map 4 and feature map 5 may be scaled to the same spatial size as feature map 3 while keeping different channel numbers; for example, the scaling may be interpolation scaling. After the scaling is completed, feature map 3, feature map 4, and feature map 5 may be concatenated along the channel dimension to obtain the multi-scale feature map of the image.
It should be noted, however, that in other implementations of this application, when feature maps of multiple scales of the image are extracted by other networks such as VGG, Alexnet, or GoogLeNet, the feature maps output by different network layers of these networks may likewise be concatenated to obtain the multi-scale feature map. The basic principle is similar to the foregoing example and is not repeated here.
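For illustration, a minimal PyTorch-style sketch of this scaling-and-concatenation step might be as follows; the channel counts assume the Res3-Res5 output shapes of a standard Resnet50 for a 224*224 input, and bilinear interpolation is one possible choice of interpolation scaling:

    import torch
    import torch.nn.functional as F

    # Assumed Resnet50 stage outputs for a 224*224 input:
    # Res3 -> (N, 512, 28, 28), Res4 -> (N, 1024, 14, 14), Res5 -> (N, 2048, 7, 7).
    feat3 = torch.randn(1, 512, 28, 28)
    feat4 = torch.randn(1, 1024, 14, 14)
    feat5 = torch.randn(1, 2048, 7, 7)

    # Scale feature maps 4 and 5 to the spatial size of feature map 3, keeping
    # their channel numbers, then concatenate along the channel dimension.
    size = feat3.shape[-2:]
    feat4_up = F.interpolate(feat4, size=size, mode='bilinear', align_corners=False)
    feat5_up = F.interpolate(feat5, size=size, mode='bilinear', align_corners=False)
    multi_scale = torch.cat([feat3, feat4_up, feat5_up], dim=1)  # (1, 3584, 28, 28)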
In one aspect, after the multi-scale feature map of the image is obtained, the dependency relationships between the features of the texture primitives in the image can be obtained according to the direction information and the multi-scale feature map of the image.
The direction information may include one or more directions.
For example, in a possible design, the direction information may include at least one preset direction map, where each direction map may be used to indicate one direction, and different direction maps correspond to different directions. Taking direction information including 8 direction maps as an example, the 8 direction maps may indicate 8 directions in sequence: up, down, left, right, upper-left, lower-left, upper-right, and lower-right. The matrix corresponding to a direction map may have values that change gradually along the direction corresponding to the direction map, so as to indicate the direction.
The matrix of the direction map whose direction is up may be as follows:
[formula image PCTCN2021099560-appb-000001]
The matrix of the direction map whose direction is down may be as follows:
[formula image PCTCN2021099560-appb-000002]
The matrix of the direction map whose direction is upper-left may be as follows:
[formula image PCTCN2021099560-appb-000003]
The matrix of the direction map whose direction is lower-right may be as follows:
[formula image PCTCN2021099560-appb-000004]
In some embodiments, the direction information may include a first direction and a second direction opposite to the first direction.
For example, the number of direction maps included in the direction information may be even, and for any first direction map in the direction information, the direction information further includes a second direction map whose direction is opposite to that of the first direction map. That is, the direction maps in the direction information occur in pairs. For example, if the direction information includes 2 direction maps, the directions corresponding to the 2 direction maps may be one of the pairs left and right, up and down, upper-left and lower-right, and so on. Similarly, if there are 4 direction maps, they may be one or more of the foregoing pairs. When the direction maps included in the direction information are an even number occurring in pairs, the dependency relationships between the features of the texture primitives in the image can be fully obtained.
Certainly, it can be understood that, in actual implementation, the direction information may also include more (for example, 16 or 32) direction maps indicating different directions, or fewer (for example, 1 or 2). Alternatively, in some embodiments, the direction information may also be implemented in other ways, such as relative coordinates or absolute coordinates, which is not limited in this application.
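The direction-map matrices above are given only as formula images; since a direction map's values change gradually along its direction, a minimal sketch of plausible 3*3 direction maps might look as follows (the concrete values are assumptions for illustration, not the actual matrices of the publication):

    import torch

    def direction_map(direction: str, k: int = 3) -> torch.Tensor:
        # Build a k*k map whose values increase gradually along the given direction.
        ramp = torch.linspace(-1.0, 1.0, k)   # values already in [-1, 1]
        rows = ramp.view(k, 1).expand(k, k)   # increases downward
        cols = ramp.view(1, k).expand(k, k)   # increases rightward
        maps = {
            'up': -rows, 'down': rows, 'left': -cols, 'right': cols,
            'upper_left': -(rows + cols) / 2, 'lower_right': (rows + cols) / 2,
            'upper_right': (cols - rows) / 2, 'lower_left': (rows - cols) / 2,
        }
        return maps[direction]

    # The eight directions occur in opposite pairs, as described above.
    dir_maps = torch.stack([direction_map(d) for d in
                            ['up', 'down', 'left', 'right', 'upper_left',
                             'lower_right', 'upper_right', 'lower_left']])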
For the specific steps of obtaining the dependency relationships between the features of the texture primitives in the image according to the direction information and the multi-scale feature map of the image, refer to S403-S405.
S403: Extract the features of the texture primitives in the multi-scale feature map of the image according to the direction information, to obtain the features of the texture primitives in multiple regions of the image.
Optionally, the features of the texture primitives in the multi-scale feature map of the image may be extracted along one or more directions included in the direction information.
S403 is exemplified below by using the foregoing direction maps.
The neural network may, according to one or more direction maps, extract the features of the texture primitives in the multi-scale feature map of the image along the directions corresponding to the direction maps, to obtain multiple first matrices.
Each first matrix contains the features of the texture primitives of a local region of the multi-scale feature map of the image. Among the multiple first matrices obtained from each direction map, different first matrices correspond to different local regions of the multi-scale feature map of the image.
Optionally, the part of the neural network used to extract the features of the texture primitives in the multi-scale feature map of the image may be a convolutional network. Through this convolutional network, multiple convolution operations may be performed on the multi-scale feature map according to the direction map, to obtain the multiple first matrices.
When the convolutional network performs multiple convolution operations on the multi-scale feature map according to the direction map, the values of the direction map may first be mapped into a fixed numerical range, such as [-1, 1], by using a linear or nonlinear function, and a convolutional network may be used to map the direction map into the same feature space as the multi-scale feature map. Normalizing the direction map and the multi-scale feature map in this way reduces their numerical differences, which makes the neural network easier to converge and also enables it to capture the features of texture primitives more accurately.
For example: assume that the matrix of the multi-scale feature map is a 9*9 matrix (9 rows and 9 columns) and the convolution kernel of the convolutional network is a 3*3 matrix; then the matrix of the direction map should also be a 3*3 matrix (refer to the foregoing examples). The convolutional network may convolve the 9*9 matrix corresponding to the multi-scale feature map with the 3*3 matrix corresponding to the convolution kernel along the direction corresponding to the direction map; each convolution may extract one 3*3 matrix, which is the foregoing first matrix.
Taking the direction map whose direction is lower-right as an example, the first matrix obtained by the first convolution of the convolutional network contains the features of the local texture primitives in the 9*9 matrix corresponding to the multi-scale feature map that belong to rows 1 to 3 and columns 1 to 3. Similarly, the first matrix obtained by the second convolution contains the features of the local texture primitives belonging to rows 4 to 6 and columns 4 to 6.
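A minimal sketch of this windowed extraction is shown below, assuming non-overlapping 3*3 windows traversed from the upper-left toward the lower-right as in the example (the window size, stride, and traversal order are assumptions):

    import torch

    def first_matrices(feature_map: torch.Tensor, k: int = 3, stride: int = 3):
        # Slide a k*k window over an (H, W, C) multi-scale feature map and
        # return the local blocks, i.e. the "first matrices".
        h, w, c = feature_map.shape
        blocks = []
        for i in range(0, h - k + 1, stride):        # top to bottom
            for j in range(0, w - k + 1, stride):    # left to right
                blocks.append(feature_map[i:i + k, j:j + k, :])  # (k, k, c)
        return blocks

    fm = torch.randn(9, 9, 64)   # toy 9*9 multi-scale feature map with 64 channels
    mats = first_matrices(fm)    # the first block covers rows 1-3, columns 1-3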
S404: Obtain, according to the features of the texture primitives in the multiple regions, the dependency relationships between the features of the texture primitives in each region, to obtain multiple groups of dependency relationships respectively corresponding to the multiple regions.
As described in S403, the neural network may, according to one or more direction maps, extract the features of the texture primitives in the multi-scale feature map of the image along the directions corresponding to the direction maps to obtain multiple first matrices, thereby obtaining the features of the texture primitives in multiple regions of the image. Further, the neural network may also obtain, according to the multiple first matrices, the dependency relationships between the features of the texture primitives in each region, obtaining multiple groups of dependency relationships respectively corresponding to the multiple regions. For example, the second matrix corresponding to each first matrix may be determined to obtain multiple second matrices, where a second matrix may contain the dependency relationships between the features of the texture primitives of the local region contained in the corresponding first matrix.
Taking the first matrix as matrix A, assume that the size of A is (k_w, k_h, c), where k_w is the number of rows of A, k_h is the number of columns of A, and c is the channel dimension of A. Two different nonlinear transformations may be applied to A to obtain two nonlinearly transformed matrices corresponding to A (for example, the transformations may be performed by two nonlinear functions), called here matrix B1 and matrix B2. The sizes of B1 and B2 are both (k_w, k_h, c).
After B1 and B2 are obtained, B1 may first be reshaped and transposed into a matrix of size (k_w*k_h, 1, c), and B2 reshaped into a matrix of size (1, k_w*k_h, c). The matrix obtained by reshaping and transposing B1 may be called matrix B1', and the matrix obtained by reshaping B2 may be called matrix B2'.
Then, B1' and B2' may be multiplied to obtain a matrix C, which is the aforementioned second matrix containing the dependency relationships between the features of the local texture primitives contained in the corresponding first matrix (matrix A). The size of C is (k_w*k_h, k_w*k_h).
By mapping matrix A through two nonlinear functions to obtain B1 and B2 respectively, different characteristics of the features of the texture primitives in A can be polarized, which makes the subsequently established dependency relationships between the features of the texture primitives more reliable.
It should be noted, however, that the foregoing process from matrix A to matrix C is merely an exemplary description of the process of determining the second matrix from the first matrix. For example, in other implementations, matrix A may also be directly multiplied by itself to obtain matrix C, which is not limited in this application.
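A minimal sketch of the A -> (B1, B2) -> C computation described above; the use of a linear map followed by tanh as the two nonlinear transformations is an assumption for illustration:

    import torch
    import torch.nn as nn

    k_w, k_h, c = 3, 3, 64
    A = torch.randn(k_w, k_h, c)                 # one first matrix

    # Two different nonlinear transformations of A (assumed form).
    f1 = nn.Sequential(nn.Linear(c, c), nn.Tanh())
    f2 = nn.Sequential(nn.Linear(c, c), nn.Tanh())
    B1, B2 = f1(A), f2(A)                        # both of size (k_w, k_h, c)

    # Reshape/transpose as described, then multiply over the channel dimension to
    # obtain the second matrix C, relating every pair of positions in the region.
    B1p = B1.reshape(k_w * k_h, 1, c)            # B1'
    B2p = B2.reshape(1, k_w * k_h, c)            # B2'
    C = (B1p * B2p).sum(dim=-1)                  # (k_w*k_h, k_w*k_h)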
S405: Determine the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships.
As described above, after the multiple groups of dependency relationships respectively corresponding to the multiple regions are obtained in S404, the multiple groups of dependency relationships may be aggregated together in S405 as the dependency relationships between the features of the texture primitives in the image.
Taking the foregoing first and second matrices as an example: the neural network may determine, according to each first matrix and its corresponding second matrix, the feature vector corresponding to the features of the texture primitives of the local region contained in that first matrix, obtain multiple feature vectors, and aggregate the multiple feature vectors together as the dependency relationships between the features of the texture primitives in the image. The feature vector is used to indicate the dependency relationships between the features of the texture primitives of the local region contained in the first matrix.
Optionally, before determining the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships, the neural network may also update, according to a first function, the bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships.
That is, the neural network may update the dependency relationships contained in the second matrix before determining, according to each first matrix and its corresponding second matrix, the feature vector corresponding to the features of the texture primitives of the local region contained in each first matrix.
The following example likewise takes the first matrix as the foregoing matrix A and the second matrix as the foregoing matrix C.
After matrix C is obtained, a bidirectional collaboration operation may be performed on C through the neural network based on the first function to obtain matrix D. The size of D is the same as that of C.
In some embodiments, the bidirectional collaboration policy (that is, the first function) for the bidirectional collaboration operation on matrix C may be as follows:
[formula image PCTCN2021099560-appb-000005]
[formula image PCTCN2021099560-appb-000006]
where r_ij and r_ji denote the bidirectional relation values between texture primitive i and texture primitive j in matrix C, and
[formula image PCTCN2021099560-appb-000007]
[formula image PCTCN2021099560-appb-000008]
denote the bidirectional relation values between texture primitive i and texture primitive j in matrix D (that is, after the bidirectional collaboration operation is performed on matrix C).
Performing the bidirectional collaboration operation on matrix C means computing, according to the foregoing bidirectional collaboration policy, the weight proportions between texture primitive i and texture primitive j to obtain the new matrix D. When computing the weight proportions, reweighting functions such as softmax and logit may be used; the function type is not limited in this application.
Compared with matrix C, in the matrix D obtained by performing the bidirectional collaboration operation on C, the dependency relationships between the features of the local texture primitives contained in the first matrix are strengthened.
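The first function itself appears only as formula images; one plausible softmax-based instantiation of the pairwise reweighting between r_ij and r_ji described above (the exact functional form is an assumption) might be:

    import torch

    def bidirectional_collaboration(C: torch.Tensor) -> torch.Tensor:
        # Reweight each pair (r_ij, r_ji) of C by a pairwise softmax, strengthening
        # the bidirectional dependencies between texture primitives (assumed form).
        pair = torch.stack([C, C.transpose(0, 1)], dim=-1)   # (n, n, 2)
        weights = torch.softmax(pair, dim=-1)                # weight proportions
        return weights[..., 0] * C                           # updated r_ij values

    C = torch.randn(9, 9)                 # second matrix of a 3*3 region
    D = bidirectional_collaboration(C)    # same size as C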
After matrix D is obtained, the matrix A corresponding to D (D corresponds to C, and D is obtained from C, so A corresponds to D as well) may first be reshaped into a matrix of size (1, k_w*k_h, c), called, for example, matrix A'. Then, A' and D may be multiplied, and the resulting product matrix reshaped to obtain matrix E. The size of E is (k_w, k_h, c). It can be understood that, through the foregoing series of matrix operations, each matrix A (first matrix) yields a corresponding matrix E.
After matrix E is obtained, E may be pooled to obtain the feature vector at the center position of E, which is the feature vector corresponding to the features of the local texture primitives contained in matrix A (the first matrix). The size of the feature vector is (1, 1, c).
Optionally, pooling matrix E may include average pooling, max pooling, and the like, which is not limited here.
Following the foregoing process from matrix A to the feature vector, the feature vector corresponding to each first matrix can be determined, thereby obtaining multiple feature vectors.
Aggregating the multiple feature vectors together yields the dependency relationships between the features of the texture primitives in the image.
For example, after the multiple feature vectors are obtained, a fourth matrix may be formed from them, which may be called, for example, matrix F; the size of F is (ww, hh, c), where ww is the length of the multi-scale feature map and hh is its width. The fourth matrix can then be used to indicate the dependency relationships between the features of the texture primitives in the image.
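Continuing the sketch above, the step from matrix D back to a per-region feature vector might be as follows; the average pooling and the (1, k_w*k_h, c) reshape follow the description, while the remaining values are toy data:

    import torch

    k_w, k_h, c = 3, 3, 64
    A = torch.randn(k_w, k_h, c)            # the first matrix corresponding to D
    D = torch.randn(k_w * k_h, k_w * k_h)   # updated second matrix

    # Multiply A' with D and reshape to obtain E of size (k_w, k_h, c).
    Ap = A.reshape(1, k_w * k_h, c)                              # A'
    E = torch.einsum('ij,bjc->bic', D, Ap).reshape(k_w, k_h, c)

    # Pool E to the feature vector of size (1, 1, c) at its center position.
    v = E.mean(dim=(0, 1), keepdim=True)                         # average pooling

    # Repeating this for every first matrix and arranging the resulting vectors
    # on the grid of regions yields the fourth matrix F of size (ww, hh, c).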
In another aspect, in addition to performing S403-S405 described in the foregoing aspect, at least one set of texture features of the image may be obtained according to the feature map of at least one scale of the image, for example, by performing S406-S407 below.
S406: Extract the features of the texture primitives in the feature map of at least one scale of the image, to obtain the features of multiple texture primitives.
S407: Pool the features of the multiple texture primitives to obtain at least one set of texture features.
Optionally, the part of the neural network described in this application that is used to extract the features of the texture primitives in the feature map of at least one scale of the image may also be implemented by a convolutional network; its basic principle is similar to the foregoing extraction of the features of the texture primitives in the multi-scale feature map and is not repeated here.
In one implementation, extracting the features of the texture primitives in the feature map of at least one scale of the image in S406 may mean: performing feature extraction on one or more of the feature maps of multiple scales of the image obtained in S401, to obtain the texture features of the one or more feature maps. Correspondingly, in S407, the texture features of the one or more feature maps are pooled.
In another implementation, extracting the features of the texture primitives in the feature map of at least one scale of the image in S406 may also mean: performing feature extraction on the multi-scale feature map of the image obtained in S402, to obtain the texture features of the multi-scale feature map. Correspondingly, in S407, the texture features of the multi-scale feature map are pooled.
The specific implementations of S406 and S407 are not limited in this application.
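A minimal sketch of S406-S407 under the second implementation; the single 1*1 convolution and the global average pooling are assumed concrete choices:

    import torch
    import torch.nn as nn

    multi_scale = torch.randn(1, 3584, 28, 28)   # multi-scale feature map from S402

    # S406: extract the features of the texture primitives with a convolutional network.
    conv = nn.Sequential(nn.Conv2d(3584, 256, kernel_size=1), nn.ReLU())
    primitive_feats = conv(multi_scale)          # (1, 256, 28, 28)

    # S407: pool the primitive features into one set of texture features.
    texture_feats = primitive_feats.mean(dim=(2, 3))   # (1, 256)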
Through the foregoing two aspects, after the dependency relationships between the features of the texture primitives in the image and the at least one set of texture features of the image are obtained, the dependency relationships and the at least one set of texture features may be aggregated to obtain the texture representation result of the image, for example, by performing S408.
S408: Obtain the texture representation result of the image according to the dependency relationships and the at least one set of texture features.
Assume that the dependency relationships between the features of the texture primitives in the image are the fourth matrix (matrix F) described in S405, and that the at least one set of texture features forms a fifth matrix.
In some embodiments, obtaining the texture representation result of the image according to the dependency relationships and the at least one set of texture features may mean: adding the fourth matrix and the fifth matrix, thereby aggregating the dependency relationships between the features of the texture primitives in the image and the texture features of the image; the sum of the fourth and fifth matrices is the texture representation result of the image.
In other embodiments, the fourth and fifth matrices may also be multiplied, or other more complex matrix operations may be performed, to aggregate the dependency relationships between the features of the texture primitives in the image and the texture features of the image and obtain the texture representation result of the image; this is likewise not limited in this application.
According to the texture representation result of the image obtained in S408, the image can be processed, for example, by performing S409.
S409: Process the image according to the texture representation result of the image.
Optionally, processing the image may mean recognizing the image, segmenting the image, performing image synthesis according to the image, or the like. According to different image processing requirements, the image processing method provided by this embodiment can be applied to any scenario in which image processing needs to be performed according to the texture representation result of an image.
As described above, this embodiment of this application can obtain the texture representation result of the image according to the dependency relationships between the features of the texture primitives in the image and at least one set of texture features of the image, so that the texture representation result can include both the texture features of the image and the dependency relationships between the features of different texture primitives in the image. The texture representation result therefore reflects more complete texture information of the image, which improves the effect of subsequent image processing, such as image recognition, image segmentation, or image synthesis, performed according to the texture representation result. For example, the accuracy of image recognition according to the texture representation result can be effectively improved.
In addition, in this embodiment, when the neural network extracts the features of the texture primitives in the multi-scale feature map of the image along the directions corresponding to the direction maps, using at least one direction map as a spatial-context guiding condition can effectively improve the ability to extract spatial-context cues, thereby better perceiving the features of texture primitives and extracting as many latent texture-primitive features from the multi-scale feature map as possible.
Further, in this embodiment, before determining the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships, the neural network updates, according to the first function, the bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships. This strengthens each group of dependency relationships and establishes associations between the bidirectional relations of the features of any two texture primitives, so that the neural network more easily learns the spatial structural dependency between texture primitives.
In some embodiments, before the original image is processed according to the foregoing process S401-S409, the original image may first be preprocessed to obtain a preprocessed image, which is then processed according to S401-S409. That is, the image processing method may further include a step of preprocessing the image. For example, FIG. 5 is another schematic flowchart of the image processing method according to an embodiment of this application.
As shown in FIG. 5, before S401 shown in FIG. 4, the image processing method may further include S501-S503.
S501: Resize the original image to a first size by using bilinear interpolation.
For example, the first size may be 512*512, and the original image may be resized to 512*512 by bilinear interpolation. Certainly, the specific first size may also be another value, such as 256*256, which is not limited in this application.
S502: Crop, from the original image of the first size, an image block of a second size as the to-be-processed image.
For example, the second size may be 224*224. After the original image is resized to 512*512, the 512*512 original image may be cropped to obtain a 224*224 image block as the image to be subsequently processed. The cropping may be random cropping, or cropping centered on the center position of the 512*512 original image, which is not limited here.
S503: Normalize the image.
For example, z-score normalization may be applied to the image block obtained in S502, thereby centering the feature data of the texture primitives in the image block. Normalizing the image can increase the generalization capability of image processing.
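A minimal sketch of the S501-S503 preprocessing; the 512*512 and 224*224 sizes follow the examples above, and center cropping is one of the two cropping options mentioned:

    import torch
    import torch.nn.functional as F

    img = torch.rand(1, 3, 600, 800)   # toy original image (N, C, H, W)

    # S501: resize the original image to the first size by bilinear interpolation.
    img = F.interpolate(img, size=(512, 512), mode='bilinear', align_corners=False)

    # S502: crop an image block of the second size centered on the image center.
    top = (512 - 224) // 2
    block = img[:, :, top:top + 224, top:top + 224]

    # S503: z-score normalization, centering the feature data of the image block.
    block = (block - block.mean()) / block.std()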
It can be understood that, in some embodiments, multiple image blocks of the second size may be cropped from the original image of the first size. Each image block may be processed according to the process described in S401-S409. In addition, during processing, for each image block serving as the image, multiple multi-scale feature maps corresponding to the image block may be obtained, constituting a multi-scale feature pool. For each multi-scale feature map, the dependency relationships between the features of the texture primitives contained in it may be extracted according to the operations described in S403-S405. The dependency relationships contained respectively in the multiple multi-scale feature maps then constitute the dependency relationships between the features of the texture primitives in the image block.
The following further describes the embodiments of this application by taking image recognition as an example.
In the image processing method shown in FIG. 4 or FIG. 5, S409 may specifically mean: determining the predicted classification label of the image through the neural network according to the texture representation result of the image. The predicted classification label of the image is the recognition result of the image.
For example, if the original image is a photo of user M, the output predicted classification label may be any one or more of user M's occupation, gender, name, and so on. The specific type of the predicted classification label is related to the actual classification labels in the training samples used during training, that is, to the specific recognition function of the neural network used for image recognition.
FIG. 6 is a schematic diagram of the composition of a neural network according to an embodiment of this application. As shown in FIG. 6, in a possible design, when the image processing method is applied to image recognition, the neural network provided by this embodiment may include an input layer, a feature extraction layer, a texture encoding layer, a fully connected layer, and an output layer.
The input layer may be used to input the original image or a preprocessed version of the original image.
The feature extraction layer may include a Resnet50 network and a scaling-and-concatenation module. The Resnet50 network can perform feature extraction on the image and output feature maps of multiple scales of the image (for example, implementing the function of S401). The scaling-and-concatenation module can scale the feature maps of the multiple scales of the image to the same size and then concatenate them to obtain the multi-scale feature map of the image (for example, implementing the function of S402).
The texture encoding layer may include a structure revealing module, a first pooling module, a convolutional network, a second pooling module, and a feature aggregation module. The structure revealing module can use the direction maps as spatial-context guiding conditions and obtain, according to the multi-scale feature map output by the feature extraction layer, the dependency relationships between the features of the texture primitives in the image (for example, implementing the functions of S403-S405). The first pooling module can pool the output results of the structure revealing module. The convolutional network can extract the global texture features of the image according to the multi-scale feature map output by the feature extraction layer, or the feature map of at least one scale (for example, implementing the function of S406). The second pooling module can pool the global texture features of the image output by the convolutional network (for example, implementing the function of S407). The feature aggregation module can aggregate the texture features of the image and the dependency relationships between the features of the texture primitives in the image to obtain the texture representation result of the image (for example, implementing the function of S408).
The fully connected layer can recognize the image according to the texture representation result of the image output by the texture encoding layer and output the predicted classification label of the image; the predicted classification label is the recognition result of the image (similar to the fully connected layer shown in FIG. 2; details are not repeated here).
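A minimal skeleton of the FIG. 6 composition might be organized as follows; the submodules are passed in as stand-ins for the concrete structure revealing, convolutional, and aggregation modules, and the module granularity is an assumption:

    import torch.nn as nn

    class TextureRecognitionNet(nn.Module):
        # Feature extraction -> texture encoding -> fully connected classification.
        def __init__(self, backbone, struct_reveal, texture_conv, aggregate,
                     feat_dim: int, num_classes: int):
            super().__init__()
            self.backbone = backbone            # Resnet50 + scaling/concatenation (S401-S402)
            self.struct_reveal = struct_reveal  # structure revealing module (S403-S405)
            self.pool1 = nn.AdaptiveAvgPool2d(1)   # first pooling module
            self.texture_conv = texture_conv    # convolutional network (S406)
            self.pool2 = nn.AdaptiveAvgPool2d(1)   # second pooling module
            self.aggregate = aggregate          # feature aggregation module (S408)
            self.fc = nn.Linear(feat_dim, num_classes)   # fully connected layer

        def forward(self, x):
            ms = self.backbone(x)                     # multi-scale feature map
            dep = self.pool1(self.struct_reveal(ms))  # dependency relationships
            tex = self.pool2(self.texture_conv(ms))   # global texture features
            rep = self.aggregate(dep, tex)            # texture representation result
            return self.fc(rep.flatten(1))            # predicted classification label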
As described in the foregoing embodiments, the image processing method described in the embodiments of this application can be implemented by program code in a memory, and can be run or used for inference on high-performance computing devices such as CPUs and GPUs. Taking image recognition as an example, the training process and inference process of the neural network are briefly described below.
1) The training process of the neural network may be as follows. First, the architecture of the neural network shown in FIG. 6 may be constructed, and the weight parameters of the entire neural network initialized. Then, forward inference computation may be performed on devices such as GPUs and CPUs using the current network weights, and an error value computed from the results of the forward inference computation and the ground-truth values. From the error value it can be judged whether the neural network has met the convergence requirement; if not, all trainable weights in the neural network are updated by back propagation according to the error value. The foregoing steps may then be repeated in a loop until the error value converges. When the error value converges, all parameters in the neural network may be frozen, no longer updated, and stored.
2) The inference process of the neural network may be: storing the neural network trained in 1) on a GPU, CPU, or other computing device. The to-be-recognized image may then be input into the neural network, and forward inference computation performed using the current network weights; the output of the neural network is the recognition result of the image.
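A minimal sketch of this train-until-convergence loop; the cross-entropy loss, SGD optimizer, convergence threshold, epoch cap, and the toy model and data standing in for the FIG. 6 network and a labeled training set are all assumptions:

    import torch
    import torch.nn as nn

    # Toy stand-ins for the FIG. 6 network and a labeled training set.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
    loader = [(torch.rand(4, 3, 224, 224), torch.randint(0, 10, (4,)))
              for _ in range(8)]

    device = 'cuda' if torch.cuda.is_available() else 'cpu'   # GPU or CPU
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    threshold = 1e-3
    for epoch in range(100):                 # loop until the error value converges
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)           # forward inference with current weights
            loss = criterion(logits, labels) # error value against the ground truth
            optimizer.zero_grad()
            loss.backward()                  # back propagation of the error value
            optimizer.step()                 # update all trainable weights
        if loss.item() < threshold:          # convergence requirement met
            break

    torch.save(model.state_dict(), 'texture_net.pt')   # freeze and store parameters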
Optionally, when the image processing method is applied to other image processing, such as image segmentation or image synthesis, the texture encoding layer shown in FIG. 6 may also be embedded into other neural networks to implement the corresponding functions. Regardless of the neural network to which the texture encoding layer is applied, it retains good robustness; examples are not given one by one here.
The foregoing describes the solutions provided by the embodiments of this application mainly from the perspective of the neural network or the image processing device. It can be understood that, to implement the foregoing functions, the neural network or image processing device may include corresponding hardware structures and/or software modules for performing each function.
For example, an embodiment of this application may further provide an image processing apparatus. FIG. 7 is a schematic structural diagram of the image processing apparatus according to an embodiment of this application. As shown in FIG. 7, the image processing apparatus may include: a texture representation module 701, which may be used to obtain the dependency relationships between the features of the texture primitives in the image according to the direction information and the multi-scale feature map of the image, where the multi-scale feature map includes the features of multiple texture primitives of the image at multiple different scales and the direction information includes one or more directions; obtain at least one set of texture features of the image according to a feature map of at least one scale of the image, where one set of the texture features of the image is obtained according to the feature map of one scale; and obtain the texture representation result of the image according to the dependency relationships and the at least one set of texture features. A processing module 702 may be used to process the image according to the texture representation result of the image.
In a possible design, the direction information may include a first direction and a second direction opposite to the first direction.
Optionally, the texture representation module may be specifically used to extract the features of the texture primitives in a feature map of at least one scale of the image to obtain the features of multiple texture primitives, and to pool the features of the multiple texture primitives to obtain the at least one set of texture features.
Optionally, the texture representation module is further specifically used to: extract the features of the texture primitives in the multi-scale feature map of the image according to the direction information, to obtain the features of the texture primitives in multiple regions of the image; obtain, according to the features of the texture primitives in the multiple regions, the dependency relationships between the features of the texture primitives in each region, obtaining multiple groups of dependency relationships respectively corresponding to the multiple regions; and determine the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships.
Optionally, the texture representation module is specifically used to extract the features of the texture primitives in the multi-scale feature map of the image along one or more directions.
In a possible design, the texture representation module may also be used to update, according to the first function, the bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships.
It should be understood that the foregoing division of modules or units in the apparatus is merely a division of logical functions; in actual implementation, they may be fully or partially integrated into one physical entity or may be physically separate. The modules in the apparatus may all be implemented in the form of software invoked by processing elements, or all in the form of hardware; some units may also be implemented in the form of software invoked by processing elements while others are implemented in the form of hardware.
For example, each unit may be a separately established processing element, or may be integrated into a chip of the apparatus for implementation; in addition, a unit may also be stored in a memory in the form of a program, which is invoked and executed by a processing element of the apparatus. Furthermore, all or some of these units may be integrated together or implemented independently. The processing element described here may also be called a processor and may be an integrated circuit with signal processing capability. In the implementation process, each step of the foregoing method or each of the foregoing units may be implemented by an integrated logic circuit of hardware in a processor element or in the form of software invoked by a processing element.
In an example, the unit in any of the foregoing apparatuses may be one or more integrated circuits configured to implement the foregoing method, for example: one or more application-specific integrated circuits (ASIC), or one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
For another example, when the units in the apparatus can be implemented in the form of a program scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke programs. For another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
For example, an embodiment of this application may further provide an image processing apparatus, which may include: an interface circuit, configured to receive data of a to-be-processed image; and a processor, connected to the interface circuit and configured to perform the steps of the foregoing method. There may be one or more processors.
In one implementation, the modules respectively implementing the corresponding steps of the foregoing method may be implemented in the form of a program scheduled by a processing element. For example, the image processing apparatus may include a processing element and a storage element, and the processing element invokes a program stored in the storage element to perform the method described in the foregoing method embodiments. The storage element may be a storage element on the same chip as the processing element, that is, an on-chip storage element.
In another implementation, the program used to implement the foregoing method may be in a storage element on a different chip from the processing element, that is, an off-chip storage element. In this case, the processing element invokes or loads the program from the off-chip storage element onto the on-chip storage element, to invoke and perform the method described in the foregoing method embodiments.
For example, an embodiment of this application may further provide an image processing apparatus, which may include a processor, where the processor is configured to connect to a memory and invoke a program stored in the memory to perform the method described in the foregoing method embodiments. The memory may be located inside or outside the image processing apparatus, and there may be one or more processors.
In yet another implementation, the modules used to implement the steps of the foregoing method may be configured as one or more processing elements, and these processing elements may be provided on a terminal, where a processing element may be an integrated circuit, for example: one or more ASICs, or one or more DSPs, or one or more FPGAs, or a combination of these types of integrated circuits. These integrated circuits may be integrated together to form a chip.
In yet another implementation, the modules used to implement the steps of the foregoing method may be integrated together and implemented in the form of an SOC, and the SOC chip is used to implement the corresponding method. At least one processing element and a storage element may be integrated in the chip, and the corresponding method is implemented by the processing element invoking the program stored in the storage element; alternatively, at least one integrated circuit may be integrated in the chip to implement the corresponding method; alternatively, the foregoing implementations may be combined, with the functions of some units implemented in the form of programs invoked by processing elements and the functions of others implemented in the form of integrated circuits.
The processing element here, as described above, may be a general-purpose processor, such as a CPU, or may be one or more integrated circuits configured to implement the foregoing method, for example: one or more ASICs, or one or more microprocessor DSPs, or one or more FPGAs, or a combination of at least two of these integrated circuit forms.
The storage element may be one memory or a collective term for multiple storage elements.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional modules is used as an example. In actual application, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the modules or units is merely a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, such as a program. The software product is stored in a program product, such as a computer-readable storage medium, and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
For example, an embodiment of this application may further provide a computer-readable storage medium, including computer software instructions; when the computer software instructions are run in an image processing apparatus or a chip built into an image processing apparatus, the image processing apparatus can be enabled to perform the method described in the foregoing method embodiments.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (17)

  1. An image processing method, wherein the method is implemented by a neural network, and the method comprises:
    obtaining dependency relationships between features of texture primitives in an image according to direction information and a multi-scale feature map of the image, wherein the multi-scale feature map comprises features of multiple texture primitives of the image at multiple different scales, and the direction information comprises one or more directions;
    obtaining at least one set of texture features of the image according to a feature map of at least one scale of the image, wherein one set of the texture features of the image is obtained according to the feature map of one scale;
    obtaining a texture representation result of the image according to the dependency relationships and the at least one set of texture features; and
    processing the image according to the texture representation result of the image.
  2. The method according to claim 1, wherein the direction information comprises a first direction and a second direction opposite to the first direction.
  3. The method according to claim 1 or 2, wherein the obtaining at least one set of texture features of the image according to a feature map of at least one scale of the image comprises:
    extracting features of texture primitives in the feature map of at least one scale of the image, to obtain features of multiple texture primitives; and
    pooling the features of the multiple texture primitives to obtain the at least one set of texture features.
  4. The method according to any one of claims 1-3, wherein the obtaining dependency relationships between features of texture primitives in the image according to direction information and the multi-scale feature map of the image comprises:
    extracting the features of the texture primitives in the multi-scale feature map of the image according to the direction information, to obtain features of texture primitives of multiple regions of the image;
    obtaining, according to the features of the texture primitives of the multiple regions, dependency relationships between the features of the texture primitives in each region, to obtain multiple groups of dependency relationships respectively corresponding to the multiple regions; and
    determining the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships.
  5. The method according to claim 4, wherein before the determining the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships, the method further comprises:
    updating, according to a first function, bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships.
  6. The method according to claim 4 or 5, wherein the extracting the features of the texture primitives in the multi-scale feature map of the image according to the direction information comprises:
    extracting the features of the texture primitives in the multi-scale feature map of the image along the one or more directions.
  7. The method according to any one of claims 1-6, wherein the processing the image comprises any one of: recognizing the image, segmenting the image, and performing image synthesis according to the image.
  8. An image processing apparatus, wherein the apparatus is implemented by a neural network, and the apparatus comprises:
    a texture representation module, configured to: obtain dependency relationships between features of texture primitives in an image according to direction information and a multi-scale feature map of the image, wherein the multi-scale feature map comprises features of multiple texture primitives of the image at multiple different scales, and the direction information comprises one or more directions; obtain at least one set of texture features of the image according to a feature map of at least one scale of the image, wherein one set of the texture features of the image is obtained according to the feature map of one scale; and obtain a texture representation result of the image according to the dependency relationships and the at least one set of texture features; and
    a processing module, configured to process the image according to the texture representation result of the image.
  9. The apparatus according to claim 8, wherein the direction information comprises a first direction and a second direction opposite to the first direction.
  10. The apparatus according to claim 8 or 9, wherein the texture representation module is specifically configured to: extract features of texture primitives in the feature map of at least one scale of the image, to obtain features of multiple texture primitives; and pool the features of the multiple texture primitives to obtain the at least one set of texture features.
  11. The apparatus according to any one of claims 8-10, wherein the texture representation module is specifically configured to: extract the features of the texture primitives in the multi-scale feature map of the image according to the direction information, to obtain features of texture primitives of multiple regions of the image; obtain, according to the features of the texture primitives of the multiple regions, dependency relationships between the features of the texture primitives in each region, to obtain multiple groups of dependency relationships respectively corresponding to the multiple regions; and determine the dependency relationships between the features of the texture primitives in the image according to the multiple groups of dependency relationships.
  12. The apparatus according to claim 11, wherein the texture representation module is further configured to update, according to a first function, bidirectional relation values between the features of any two texture primitives in each of the multiple groups of dependency relationships.
  13. The apparatus according to claim 11 or 12, wherein the texture representation module is specifically configured to extract the features of the texture primitives in the multi-scale feature map of the image along the one or more directions.
  14. The apparatus according to any one of claims 8-13, wherein the processing the image comprises any one of: recognizing the image, segmenting the image, and performing image synthesis according to the image.
  15. An image processing apparatus, comprising:
    an interface circuit, configured to receive data of a to-be-processed image; and
    a processor, connected to the interface circuit and configured to perform the method according to any one of claims 1 to 7.
  16. An image processing apparatus, comprising: a processor, wherein the processor is configured to connect to a memory and invoke a program stored in the memory, to perform the method according to any one of claims 1 to 7.
  17. A computer-readable storage medium, comprising: computer software instructions;
    wherein, when the computer software instructions are run in an image processing apparatus or in a chip built into the image processing apparatus, the image processing apparatus is enabled to perform the method according to any one of claims 1 to 7.
PCT/CN2021/099560 2020-06-12 2021-06-11 Image processing method, apparatus, and storage medium WO2021249520A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21821901.2A EP4156078A4 (en) 2020-06-12 2021-06-11 IMAGE PROCESSING METHOD AND APPARATUS AND STORAGE MEDIUM
US18/064,144 US20230109317A1 (en) 2020-06-12 2022-12-09 Image processing method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010537872.7 2020-06-12
CN202010537872.7A CN113807360A (zh) Image processing method, apparatus, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/064,144 Continuation US20230109317A1 (en) 2020-06-12 2022-12-09 Image processing method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2021249520A1 true WO2021249520A1 (zh) 2021-12-16

Family

ID=78846897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099560 WO2021249520A1 (zh) 2020-06-12 2021-06-11 图像处理方法、装置及存储介质

Country Status (4)

Country Link
US (1) US20230109317A1 (zh)
EP (1) EP4156078A4 (zh)
CN (1) CN113807360A (zh)
WO (1) WO2021249520A1 (zh)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6205348B1 (en) * 1993-11-29 2001-03-20 Arch Development Corporation Method and system for the computerized radiographic analysis of bone
CN103559496A (zh) * 2013-11-15 2014-02-05 中南大学 泡沫图像多尺度多方向纹理特征的提取方法
CN103942540A (zh) * 2014-04-10 2014-07-23 杭州景联文科技有限公司 基于曲波纹理分析和svm-knn分类的假指纹检测算法
CN104091333A (zh) * 2014-07-01 2014-10-08 黄河科技学院 基于区域可信融合的多类无监督彩色纹理图像分割方法
US20170287252A1 (en) * 2016-04-03 2017-10-05 Harshal Dwarkanath Laddha Counterfeit Document Detection System and Method
US20180061058A1 (en) * 2016-08-26 2018-03-01 Elekta, Inc. Image segmentation using neural network method
CN110516678A (zh) * 2019-08-27 2019-11-29 北京百度网讯科技有限公司 图像处理方法和装置
CN111078940A (zh) * 2019-12-16 2020-04-28 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机存储介质及电子设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4156078A4 *

Also Published As

Publication number Publication date
US20230109317A1 (en) 2023-04-06
EP4156078A4 (en) 2023-11-22
CN113807360A (zh) 2021-12-17
EP4156078A1 (en) 2023-03-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21821901

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021821901

Country of ref document: EP

Effective date: 20230112

NENP Non-entry into the national phase

Ref country code: DE