WO2019233244A1 - Image processing method and apparatus, computer-readable medium, and electronic device - Google Patents
Image processing method and apparatus, computer-readable medium, and electronic device
- Publication number
- WO2019233244A1 (PCT/CN2019/086384)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- convolution
- image
- residual
- convolution layer
- layer
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/60—Rotation of whole images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- the present application relates to the field of computer technology, and in particular, to an image processing method, device, computer-readable medium, and electronic device.
- a neural network is a mathematical or computational model that mimics the structure and function of a biological neural network. It is widely used in image processing, for example, for image recognition.
- ResNet (Residual Neural Network) is a neural network proposed by Kaiming He et al.
- Embodiments of the present application provide an image processing method, device, computer-readable medium, and electronic device, so as to at least to a certain extent ensure that the residual network can extract accurate image features from the image, and improve the accuracy of image recognition.
- an image processing method is provided, which is executed by an electronic device.
- the method includes: acquiring a target image to be processed; and performing feature extraction on the target image based on a residual network to obtain image features.
- the residual network includes a plurality of sequentially connected residual blocks, each of which includes a convolution branch and a residual branch; the convolution kernel size of a first convolution layer in the convolution branch is smaller than the convolution kernel size of a second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer is larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; and performing recognition processing on the image to be processed according to the image feature information.
- an image processing apparatus including: a first obtaining unit configured to obtain a target image to be processed; a first processing unit configured to perform feature extraction on the target image based on a residual network to obtain image feature information, the residual network including multiple sequentially connected residual blocks, each of which includes a convolution branch and a residual branch, the convolution kernel size of the first convolution layer in the convolution branch being smaller than the convolution kernel size of the second convolution layer after the first convolution layer, and the convolution step of the second convolution layer being larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; and a second processing unit configured to perform recognition processing on the image to be processed according to the image feature information.
- a computer-readable medium on which a computer program is stored, and when the computer program is executed by a processor, the image processing method as described in the foregoing embodiment is implemented.
- an electronic device including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method described in the foregoing embodiments.
- FIG. 1 shows a schematic diagram of an exemplary system architecture to which an image processing method or an image processing apparatus of an embodiment of the present application can be applied;
- FIG. 2 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application
- FIG. 3 schematically illustrates a flowchart of an image processing method according to an embodiment of the present application
- FIG. 4 is a schematic structural diagram of a first residual block in each convolution stage of a residual network according to an embodiment of the present application
- FIG. 5 is a schematic structural diagram of a residual network according to an embodiment of the present application.
- FIG. 6 shows a flowchart of a method for training a residual network according to an embodiment of the present application
- FIG. 7 shows a flowchart of acquiring a training sample image according to an embodiment of the present application
- FIG. 8 shows a flowchart of performing a perturbation process on an image according to an embodiment of the present application
- FIG. 9 schematically illustrates a block diagram of an image processing apparatus according to an embodiment of the present application.
- FIG. 10 schematically illustrates a block diagram of an image processing apparatus according to another embodiment of the present application.
- FIG. 11 schematically illustrates a block diagram of an image processing apparatus according to still another embodiment of the present application.
- FIG. 1 shows a schematic diagram of an exemplary system architecture 100 to which an image processing method or an image processing apparatus of an embodiment of the present application can be applied.
- the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105.
- the network 104 is used to provide a communication link between the terminal devices 101, 102, 103 and the server 105.
- the network 104 may include various connection types, such as a wired communication link, a wireless communication link, and the like.
- the number of terminal devices, networks, and servers in FIG. 1 is only exemplary. Depending on the implementation needs, there can be any number of terminal devices, networks, and servers.
- the server 105 may be a server cluster composed of multiple servers.
- the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
- the terminal devices 101, 102, 103 may be various electronic devices with a display screen, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and so on.
- the server 105 may be a server that provides various services.
- the user uses the terminal device 103 (which may also be the terminal device 101 or 102) to collect an image to be identified, and then uploads the image to the server 105.
- the server 105 may perform feature extraction on the image based on a residual network (for example, a residual neural network) to obtain image characteristic information, and then perform recognition processing on the image based on the image characteristic information.
- the residual network used by the server 105 for feature extraction includes multiple sequentially connected residual blocks; each residual block includes a convolution branch and a residual branch, the convolution kernel size of the first convolution layer in the convolution branch is smaller than the convolution kernel size of the second convolution layer after the first convolution layer, and the convolution step size of the second convolution layer is larger than the convolution step size of the first convolution layer and smaller than the convolution kernel width of the second convolution layer.
- for example, the convolution kernel size of the first convolution layer is 1×1 pixel with a convolution step size of 1 pixel, and the convolution kernel size of the second convolution layer is 3×3 pixels with a convolution step size of 2 pixels; in this case, the convolution operation both achieves downsampling through the second convolution layer and guarantees that no feature point (i.e., no pixel on the feature map) is skipped, so no characterization ability of the feature network is lost, which ensures accurate image feature extraction and improves the accuracy of image recognition.
- the image processing method provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the image processing apparatus is generally disposed in the server 105.
- the terminal may also have functions similar to the server, so as to execute the image processing method provided by the embodiments of the present application.
- FIG. 2 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
- the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. In the RAM 203, various programs and data required for system operation are also stored.
- the CPU 201, the ROM 202, and the RAM 203 are connected to each other through a bus 204.
- An input / output (I / O) interface 205 is also connected to the bus 204.
- the following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a local area network (LAN) card, a modem, and the like.
- the communication section 209 performs communication processing via a network such as the Internet.
- a drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read from it can be installed into the storage section 208 as needed.
- a process described below with reference to a flowchart may be implemented as a computer software program.
- embodiments of the present application include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart.
- the computer program may be downloaded and installed from a network through the communication section 209, and / or installed from a removable medium 211.
- when this computer program is executed by the central processing unit (CPU) 201, the various functions defined in the system of the present application are executed.
- the computer-readable medium shown in the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
- the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline and the like, or any suitable combination of the foregoing.
- each block in the flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented with a combination of dedicated hardware and computer instructions.
- the units described in the embodiments of the present application may be implemented by software or hardware.
- the described units may also be provided in a processor.
- the names of these units do not in some cases constitute a limitation on the units themselves.
- the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.
- the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to implement the methods described in the following embodiments. For example, the electronic device can implement the steps shown in FIG. 3 and FIG. 6 to FIG. 8.
- FIG. 3 schematically illustrates a flowchart of an image processing method according to an embodiment of the present application.
- the image processing method is applicable to the electronic device described in the foregoing embodiment.
- the image processing method includes at least steps S310 to S330, which are described in detail as follows:
- in step S310, a target image to be processed is acquired.
- the target image to be processed may be an image requiring visual processing, such as an image requiring object recognition processing.
- in step S320, feature extraction is performed on the target image based on a residual network to obtain image feature information. The residual network includes multiple sequentially connected residual blocks (Residual Blocks); each residual block includes a convolution branch and a residual branch, the convolution kernel size of a first convolution layer in the convolution branch is smaller than the convolution kernel size of a second convolution layer after the first convolution layer, and the convolution step of the second convolution layer is larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer.
- the residual branch in a residual block points from the input of the convolution branch to the output of the convolution branch.
- the "block”, “branch”, “layer”, and “stage” used in describing the residual network represent various processes or steps, and the term “sequentially connected” used indicates The processes or steps are connected in sequence.
- a convolution layer here refers to performing a convolution process or operation on the target image to be processed.
- Convolution is a mathematical operator that generates a third function from two functions f and g.
- the target image is represented, for example, by the function f, and the convolution kernel is the function g. The functions f and g are usually three-dimensional discrete matrices, and the generated function is also a three-dimensional matrix. For example, the target image is represented in the (H, W, C) three-dimensional matrix format, where H and W are the height and width of the target image, respectively, indicating its resolution or size, and C is the number of channels of the target image; a color image, for instance, has three channels (R, G, B), i.e., C = 3. In the three-dimensional matrix representing the target image, the first-dimension elements are the rows of pixels of the target image, the second-dimension elements are its columns of pixels, and the third-dimension elements are the pixel values of each channel; each pixel serves as a description unit, recording the pixel values of its three channels.
- the convolution kernel is also called a filter matrix.
- the convolution kernel uses a "sliding window" method to extract features at different positions in the target image.
- the result is a feature map, and the pixels on the feature map are the feature points.
- the convolution step (Stride) is the number of pixels that the center of the convolution kernel moves on the target image at a time.
- the input target image is a grayscale image of 5 ⁇ 5 pixels
- the convolution kernel is a matrix of 3 ⁇ 3 pixels
- the convolution step is 1.
- the convolution process is as follows: the 3×3 pixel matrix is moved across the 5×5 pixel image from the upper left corner to the lower right corner, 1 pixel at a time. Each time the kernel matrix moves, it is multiplied element-wise with the corresponding feature points of the target image and the products are summed, forming a new matrix.
- the convolution step size can implement a downsampling function: through the convolution calculation, the output image has a much smaller resolution than the input target image.
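- as a minimal sketch of the sliding-window convolution and stride-based downsampling described above (plain NumPy, single channel; the function name and the example kernel values are illustrative assumptions, not from the patent):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing elementwise products at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1  # output shrinks as stride grows (downsampling)
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 grayscale image
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
print(conv2d(image, kernel, stride=1).shape)       # (3, 3)
print(conv2d(image, kernel, stride=2).shape)       # (2, 2): stride 2 downsamples
```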
- the residual network may be a deep residual network, and the residual network further includes an initial convolution layer located before the multiple residual blocks; the output of the initial convolution layer serves as the input of the first residual block among the multiple residual blocks.
- since the second convolution layer in the residual block can already implement the downsampling process, the pooling layer located before the residual blocks in some residual networks can be removed, which simplifies the structure of the residual network.
- a plurality of residual blocks in the residual network constitute a plurality of convolution stages, and the residual branch contained in each residual block in each convolution stage includes a batch normalization layer and a target convolution layer connected in sequence.
- for a residual block, if its input and output have the same dimensions, the residual branch can be an identity mapping; if its input and output dimensions differ, a convolution operation is needed to map the input and the output to the same dimensions, i.e., a non-identity-mapping residual branch (one that adds a convolution layer) is used. Meanwhile, since the convolution operation of a convolution layer has no bias term, a BN (Batch Normalization) layer can be added before the convolution layer to supply a bias term, thereby guaranteeing an optimal processing effect. The BN layer normalizes the feature maps generated by convolution over multiple samples (target images); specifically, the feature points generated by each sample are normalized, by subtracting the mean and dividing by the variance, to a distribution with a mean of 0 and a variance of 1.
- in step S330, recognition processing is performed on the image to be processed according to the image feature information.
- with the technical solution of the embodiment shown in FIG. 3, when the convolution layers in the residual block perform the convolution operation, downsampling is achieved through the second convolution layer while no feature point is skipped; this guarantees that no characterization ability of the feature network is lost, thereby ensuring the accuracy of image feature extraction and improving the accuracy of image recognition.
- as shown in FIG. 4, the structural diagram of the first residual block in each convolution stage specifically includes: a convolution branch 401 and a residual branch 402, where the residual branch 402 points from the input of the convolution branch 401 to the output of the convolution branch 401.
- the convolution branch 401 includes a first convolution layer 4011, a second convolution layer 4012, and a third convolution layer 4013. A BN layer is provided before each of these three convolution layers, and the output of each BN layer is processed by a Relu (Rectified Linear Unit). Generally, as the number of convolution layers increases, the expressive power of the residual network grows and the effect in concrete applications improves; for example, in image recognition applications, objects in the target image may be identified more accurately. The convolution kernel size of the first convolution layer 4011 is 1×1 pixel with a convolution step size of 1 pixel; the convolution kernel size of the second convolution layer 4012 is 3×3 pixels with a convolution step size of 2 pixels; and the convolution kernel size of the third convolution layer 4013 is 1×1 pixel with a convolution step size of 1 pixel. Since the second convolution layer 4012 both implements the downsampling processing and guarantees that no feature point is skipped, the residual block in the embodiment of the present application ensures that no characterization ability of the feature network is lost.
- in some residual network structures, the convolution kernel size of the first convolution layer in the convolution branch is 1×1 pixel with a convolution step of 2 pixels, and the convolution kernel size of the second convolution layer is 3×3 pixels with a convolution step of 1 pixel; in that case, when the first convolution layer performs its convolution operation, a feature point is skipped between two successive convolutions, causing a loss in the feature network. By contrast, in the residual network of the embodiments of the present application, the convolution kernel size of the first convolution layer in the convolution branch of a residual block is smaller than the convolution kernel size of the second convolution layer after it, and the convolution step of the second convolution layer is larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; thus, when the convolution layers in the residual block perform the convolution operation, downsampling is achieved through the second convolution layer without skipping any feature point, no characterization ability of the feature network is lost, the accuracy of image feature extraction is ensured, and the accuracy of image recognition is improved.
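- a small sketch (illustrative, not from the patent) of why a 3×3 kernel with stride 2 touches every input position, while a 1×1 kernel with stride 2 skips every other one:

```python
def covered_positions(size, kernel, stride):
    """Return the set of 1-D input indices read by any placement of the kernel."""
    covered = set()
    for start in range(0, size - kernel + 1, stride):
        covered.update(range(start, start + kernel))
    return covered

n = 7
print(sorted(covered_positions(n, kernel=1, stride=2)))  # [0, 2, 4, 6]: odd positions skipped
print(sorted(covered_positions(n, kernel=3, stride=2)))  # [0, 1, 2, 3, 4, 5, 6]: full coverage
```

- in general, as long as the stride is smaller than the kernel width, consecutive kernel placements overlap or abut, so no input position is skipped.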
- the calculation formula of the linear rectifier unit is, for example: y = max(0, x), where x is a feature point on the input feature map and y is the corresponding feature point on the output feature map. The linear rectifier unit introduces a non-linear characteristic into a system that has undergone the linear computation of the convolution layer.
- the residual branch 402 includes a convolution layer 4021 and a BN layer set before the convolution layer; after the BN processing, the result is processed by the Relu function.
- the outputs of the convolution branch 401 and the residual branch 402 are added at the element level to obtain the output of each residual block.
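- a minimal PyTorch sketch of the first residual block of a stage as described above (BN and Relu before every convolution, a 1×1/stride-1, 3×3/stride-2, 1×1/stride-1 convolution branch, and a BN plus projection convolution on the residual branch); the class name, channel sizes, and the 1×1/stride-2 projection kernel are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstResidualBlock(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, stride=2):
        super().__init__()
        # Convolution branch: a BN layer (followed by Relu) precedes each convolution layer.
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1, stride=1, bias=False)  # 1x1, step 1
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride,
                               padding=1, bias=False)                                # 3x3, step 2: downsampling
        self.bn3 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, kernel_size=1, stride=1, bias=False)  # 1x1, step 1
        # Residual branch: non-identity mapping (BN + convolution) to match output sizes.
        self.bn_res = nn.BatchNorm2d(in_ch)
        self.conv_res = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        out = self.conv3(F.relu(self.bn3(out)))
        res = self.conv_res(F.relu(self.bn_res(x)))
        return out + res  # element-level addition of the two branches

block = FirstResidualBlock(64, 64, 256)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 28, 28])
```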
- referring to FIG. 5, a structural diagram of a residual network according to an embodiment of the present application is shown. The structure includes, connected in sequence: an initial convolution layer 501, convolution stages 502 to 505, a global average pooling layer 506, and a fully connected layer 507. The convolution kernel size of the initial convolution layer 501 is 7×7 pixels, its convolution step is 2 pixels, and its number of channels is 64.
- each of the convolution stages 502 to 505 includes multiple residual blocks, and different convolution stages may contain different numbers of residual blocks. For example, in ResNet101, the convolution stage 502 includes 3 residual blocks, the convolution stage 503 includes 4 residual blocks, the convolution stage 504 includes 23 residual blocks, and the convolution stage 505 includes 4 residual blocks.
- the structure of the first residual block in each convolution stage is shown in FIG. 4; in the other residual blocks, the residual branch is an identity mapping and the convolution branch is the same as the convolution branch 401 shown in FIG. 4.
- as can be seen from the structures in FIG. 4 and FIG. 5, after the initial convolution layer 501, the max pooling layer found in some residual networks is removed, and the downsampling process is placed in the first convolution stage, i.e., the convolution stage 502, specifically in the second convolution layer 4012 of the first residual block of the convolution stage 502. Meanwhile, in each residual block, the downsampling process is placed in the second convolution layer, whose convolution kernel size is 3×3 pixels, thereby ensuring that the downsampling process does not skip any feature point and causes no loss of the characterization ability of the feature network. In addition, a BN layer is added not only to the convolution branch but also to the non-identity-mapping residual branch; in this way a bias term can be supplied before the convolution layer through the BN layer, which guarantees an optimal processing effect.
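- continuing the sketch above, a hypothetical assembly of the FIG. 5 layout (7×7/stride-2 initial convolution, no max pooling, four residual stages, global average pooling, fully connected layer); block counts follow the ResNet101 example in the text, while the per-stage channel widths are assumptions and `FirstResidualBlock` is reused from the previous sketch:

```python
import torch.nn as nn
import torch.nn.functional as F

class IdentityResidualBlock(nn.Module):
    """Residual block whose residual branch is an identity mapping (stride 1 throughout)."""
    def __init__(self, ch, mid_ch):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv1 = nn.Conv2d(ch, mid_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, ch, 1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        out = self.conv3(F.relu(self.bn3(out)))
        return out + x  # identity residual branch

def make_stage(in_ch, mid_ch, out_ch, num_blocks):
    # First block downsamples and projects (FIG. 4); the rest use identity shortcuts.
    blocks = [FirstResidualBlock(in_ch, mid_ch, out_ch, stride=2)]
    blocks += [IdentityResidualBlock(out_ch, mid_ch) for _ in range(num_blocks - 1)]
    return nn.Sequential(*blocks)

class ResidualNetwork(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.initial = nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False)  # 501; no max pooling after it
        self.stages = nn.Sequential(
            make_stage(64, 64, 256, 3),      # 502: downsampling happens in its first block
            make_stage(256, 128, 512, 4),    # 503
            make_stage(512, 256, 1024, 23),  # 504
            make_stage(1024, 512, 2048, 4),  # 505
        )
        self.pool = nn.AdaptiveAvgPool2d(1)     # 506: global average pooling
        self.fc = nn.Linear(2048, num_classes)  # 507

    def forward(self, x):
        x = self.stages(self.initial(x))
        return self.fc(self.pool(x).flatten(1))
```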
- based on the structure of the residual network introduced in the foregoing embodiments, the embodiments of the present application also propose a solution for training the residual network. Referring to FIG. 6, the method for training the residual network according to an embodiment of the present application includes:
- Step S610 Initialize the residual network
- Step S620 input training image samples to the residual network for iterative training until the loss function of the residual network satisfies a convergence condition.
- a loss function or cost function refers to a function that maps an event (an element of a sample space) onto a real number expressing the opportunity cost associated with the event, thereby intuitively representing some "cost" associated with the event. The goal of an optimization problem is to minimize the loss function. An objective function is usually the loss function itself or its negative; when the objective function is the negative of a loss function, the objective function value is to be maximized. The loss function is used to estimate parameters.
- a Momentum-SGD (Stochastic Gradient Descent with momentum) optimization method may be used for training, and a distributed training framework may be adopted to increase the training rate; for example, a hardware configuration of 4 machines with 32 graphics cards may be used. The specific training hyperparameters are shown in Table 1, where epoch refers to the number of iterations needed to learn all the training images once at the current Batch size.
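- a hedged sketch of one training step with momentum SGD (PyTorch, reusing the `ResidualNetwork` sketch above); the momentum value 0.9 and the model/data names are assumptions, while the 0.8 learning rate and 1e-4 weight regularization coefficient follow Table 1 later in this document:

```python
import torch

model = ResidualNetwork()  # as sketched above
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.8, momentum=0.9, weight_decay=1e-4)

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # classification loss
    loss.backward()
    optimizer.step()
    return loss.item()
```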
- the BN layers in the residual network contain learnable parameters, and these learnable parameters are strongly correlated with the distribution of the training image samples. The forward propagation of a BN layer is given by y = γ·(x − μ)/√σ + β, where β and γ denote the learning parameters, μ denotes the mean of the training image samples, σ denotes the variance of the training image samples, x denotes the training image samples, and y denotes the output.
- ⁇ and ⁇ are obtained by iterative learning on the basis of training image samples using an optimization algorithm. The learning process is to be able to minimize the loss function (or maximize it when the loss is negative) by adjusting the parameters.
- a regular term of the loss function may be generated from β and γ and added to the original loss function (i.e., the classification loss) of the residual network to improve the generalization ability of the residual network model. Specifically, the sum of squares of β and γ can be computed and the arithmetic mean of that sum of squares used as the regular term. Since the magnitudes of β and γ are usually one order of magnitude larger than the magnitude of the weights of the convolution layers in the residual network, the regular-term coefficient set for this term should be smaller than the order of magnitude of the weight coefficient, for example one order of magnitude smaller, to prevent the added regular term from unduly affecting the original loss function.
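- a sketch of adding such a regular term (the arithmetic mean of the squared BN learning parameters) to the classification loss, continuing the training sketch above; the 1e-5 coefficient is an assumption, chosen one order of magnitude below the 1e-4 weight regularization coefficient in Table 1:

```python
import torch

def bn_regular_term(model):
    """Arithmetic mean of the squared BN learning parameters (gamma and beta)."""
    squares = []
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            squares.append(m.weight.pow(2))  # gamma
            squares.append(m.bias.pow(2))    # beta
    return torch.cat(squares).mean()

reg_coeff = 1e-5  # assumed: one order of magnitude below the 1e-4 weight coefficient

def loss_with_regularizer(images, labels):
    # classification loss plus the BN-parameter regular term
    return criterion(model(images), labels) + reg_coeff * bn_regular_term(model)
```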
- a technical solution for how to obtain a training sample image is also provided. As shown in FIG. 7, the technical solution includes the following steps:
- Step S710 Obtain a sample image for training the residual network.
- the sample image may be an RGB image.
- in step S720, an image area that occupies a predetermined proportion of the sample image and whose aspect ratio is a predetermined value is cut out from the sample image.
- the technical solution of this embodiment makes it possible to intercept a plurality of image regions from a sample image, and at the same time, it can ensure translation invariance and size invariance of the captured image.
- the predetermined proportion is a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect-ratio value is a value randomly selected from a predetermined aspect-ratio interval; for example, the predetermined proportion interval may be [0.05, 1.0], and the predetermined aspect-ratio interval may be [3/4, 4/3].
- Step S730 Adjust the image area to an image of a set size.
- an image of the same size can be input to the network for training.
- Step S740 Randomly perturb the image of the set size to obtain the training image sample.
- performing random perturbation processing on the set-size image includes: performing horizontal flip processing on the set-size image with a first processing probability; and/or rotating the set-size image by a random angle with a second processing probability, the random angle being a value randomly selected from a predetermined angle interval; and/or adjusting the attributes of the set-size image with a third processing probability, where the image attributes include saturation, contrast, brightness, and chroma.
- the technical solution of the embodiment shown in FIG. 7 makes it possible to use a certain probability to choose whether to process an image, thereby increasing the diversity of the training data while avoiding the introduction of excessive data perturbation and a correspondingly large noise impact.
- this embodiment shows a process of performing a disturbance processing on an image, which specifically includes:
- step S801 an image is input.
- the image may be an RGB image.
- in step S802, an area whose proportion of the total image area is any value in [0.05, 1.0] and whose aspect ratio is any value in [3/4, 4/3] is randomly cropped from the image.
- in step S803, the cropped image is resized to 224*224 pixels.
- step S804 the image is horizontally flipped with a processing probability of 0.5.
- step S805 a random angle rotation process is performed on the image with a processing probability of 0.25.
- Step S806 Perturb the saturation, contrast, brightness, and chroma of the image with a processing probability of 0.5.
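- a sketch of this perturbation pipeline using torchvision transforms; the area/aspect ranges, output size, and probabilities follow steps S802 to S806, while the rotation range of ±30 degrees and the color-jitter strengths are assumptions (the predetermined angle interval and attribute-adjustment amounts are not specified here):

```python
from torchvision import transforms

perturb = transforms.Compose([
    # S802/S803: random crop covering 5%-100% of the area, aspect ratio in [3/4, 4/3], resized to 224x224
    transforms.RandomResizedCrop(224, scale=(0.05, 1.0), ratio=(3/4, 4/3)),
    # S804: horizontal flip with probability 0.5
    transforms.RandomHorizontalFlip(p=0.5),
    # S805: random-angle rotation applied with probability 0.25 (angle interval assumed to be [-30, 30])
    transforms.RandomApply([transforms.RandomRotation(30)], p=0.25),
    # S806: perturb saturation, contrast, brightness, and hue with probability 0.5 (strengths assumed)
    transforms.RandomApply(
        [transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)], p=0.5),
    transforms.ToTensor(),
])
```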
- the technical solutions of the above embodiments of the present application can be widely used in vision-related services, such as evaluation and recommendation of image quality, object recognition in a game scene, image understanding, and video understanding.
- FIG. 9 schematically illustrates a block diagram of an image processing apparatus according to an embodiment of the present application.
- an image processing apparatus 900 includes a first obtaining unit 901, a first processing unit 902, and a second processing unit 903.
- the first acquisition unit 901 is configured to acquire a target image to be processed. The first processing unit 902 is configured to perform feature extraction on the target image based on a residual network to obtain image feature information, the residual network including multiple sequentially connected residual blocks, each residual block including a convolution branch and a residual branch, the convolution kernel size of the first convolution layer in the convolution branch being smaller than the convolution kernel size of the second convolution layer after the first convolution layer, and the convolution step of the second convolution layer being larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer. The second processing unit 903 is configured to perform recognition processing on the image to be processed according to the image feature information.
- in some embodiments, based on the foregoing scheme, the residual network further includes an initial convolution layer before the multiple residual blocks, and the output of the initial convolution layer serves as the input of the first residual block among the multiple residual blocks.
- in some embodiments, the multiple residual blocks constitute multiple convolution stages, and the residual branch contained in the first residual block in each convolution stage includes a batch normalization layer and a target convolution layer connected in sequence.
- in some embodiments, the convolution branch further includes a third convolution layer, the first convolution layer, the second convolution layer, and the third convolution layer being connected in sequence; the convolution kernel size of the first convolution layer and the third convolution layer is 1×1 pixel with a convolution step of 1 pixel, and the convolution kernel size of the second convolution layer is 3×3 pixels with a convolution step of 2 pixels.
- a batch normalization layer is provided before each of the first convolution layer, the second convolution layer, and the third convolution layer.
- FIG. 10 schematically illustrates a block diagram of an image processing apparatus according to another embodiment of the present application.
- an image processing apparatus 1000 on the basis of having the first acquisition unit 901, the first processing unit 902, and the second processing unit 903 shown in FIG. 9, It includes: an initialization unit 1001 and a training unit 1002.
- the initialization unit 1001 is configured to initialize the residual network; the training unit 1002 is configured to input training image samples into the residual network for iterative training until a loss function of the residual network meets a convergence condition.
- FIG. 11 schematically illustrates a block diagram of an image processing apparatus according to still another embodiment of the present application.
- an image processing apparatus 1100, based on the image processing apparatus shown in FIG. 10, further includes a loss function optimization unit 1101.
- the loss function optimization unit 1101 is configured to obtain the learning parameters contained in the batch normalization layers in the residual network, generate a regular term of the loss function through the learning parameters, and add the regular term to the loss function.
- the loss function optimization unit 1101 is configured to: determine a coefficient of the regular term, the order of magnitude of the coefficient of the regular term being smaller than the order of magnitude of the weights of the convolution layers included in the residual network; and add the regular term to the loss function based on the coefficient of the regular term.
- the loss function optimization unit 1101 is configured to: calculate a sum of squares of the learning parameters; and use an arithmetic mean of the sum of squares as a regular term of the loss function.
- the image processing apparatus shown in FIG. 10 and FIG. 11 may further include: a second obtaining unit configured to obtain sample images used for training the residual network; an intercepting unit configured to cut out, from the sample image, an image area that occupies a predetermined proportion of the sample image and whose aspect ratio is a predetermined value; a size adjustment unit configured to adjust the image area to an image of a set size; and a perturbation processing unit configured to perform random perturbation processing on the set-size image to obtain the training image samples.
- the predetermined proportion is a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect-ratio value is a value randomly selected from a predetermined aspect-ratio interval.
- the perturbation processing unit is configured to: perform horizontal flip processing on the set-size image with a first processing probability; and/or rotate the set-size image by a random angle with a second processing probability, the random angle being a value randomly selected from a predetermined angle interval; and/or adjust the attributes of the set-size image with a third processing probability.
- although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory.
- the features and functions of two or more modules or units described above may be embodied in one module or unit.
- the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
- the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Algebra (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present application provide an image processing method and apparatus, a computer-readable medium, and an electronic device. The image processing method includes: acquiring a target image to be processed; performing feature extraction on the target image based on a residual network to obtain image feature information, the residual network containing a plurality of sequentially connected residual blocks, each residual block containing a convolution branch and a residual branch, the convolution kernel size of a first convolution layer in the convolution branch being smaller than the convolution kernel size of a second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer being larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; and performing recognition processing on the image to be processed according to the image feature information.
Description
This application claims priority to Chinese patent application No. 201810588686.9, entitled "Image processing method and apparatus, computer-readable medium, and electronic device" and filed with the China Patent Office on June 8, 2018, the entire contents of which are incorporated herein by reference.

The present application relates to the field of computer technology, and in particular to an image processing method and apparatus, a computer-readable medium, and an electronic device.

A neural network is a mathematical or computational model that mimics the structure and function of a biological neural network, and is widely used in image processing, for example for image recognition. ResNet (Residual Neural Network) is a neural network proposed by Kaiming He et al.

It should be noted that the information disclosed in the background section above is only intended to enhance the understanding of the background of the present application, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the Invention
Embodiments of the present application provide an image processing method and apparatus, a computer-readable medium, and an electronic device, so as to ensure, at least to a certain extent, that a residual network can extract accurate image features from an image and to improve the accuracy of image recognition.

According to one aspect of the embodiments of the present application, an image processing method executed by an electronic device is provided. The method includes: acquiring a target image to be processed; performing feature extraction on the target image based on a residual network to obtain image feature information, the residual network containing a plurality of sequentially connected residual blocks, each residual block containing a convolution branch and a residual branch, the convolution kernel size of a first convolution layer in the convolution branch being smaller than the convolution kernel size of a second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer being larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; and performing recognition processing on the image to be processed according to the image feature information.

According to one aspect of the embodiments of the present application, an image processing apparatus is provided, including: a first acquisition unit for acquiring a target image to be processed; a first processing unit for performing feature extraction on the target image based on a residual network to obtain image feature information, the residual network containing a plurality of sequentially connected residual blocks, each residual block containing a convolution branch and a residual branch, the convolution kernel size of a first convolution layer in the convolution branch being smaller than the convolution kernel size of a second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer being larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; and a second processing unit for performing recognition processing on the image to be processed according to the image feature information.

According to one aspect of the embodiments of the present application, a computer-readable medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the image processing method described in the above embodiments.

According to one aspect of the embodiments of the present application, an electronic device is provided, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method described in the above embodiments.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings

The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present application and are used together with the specification to explain the principles of the present application. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 shows a schematic diagram of an exemplary system architecture to which the image processing method or the image processing apparatus of an embodiment of the present application can be applied;

FIG. 2 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application;

FIG. 3 schematically shows a flowchart of an image processing method according to an embodiment of the present application;

FIG. 4 shows a schematic structural diagram of the first residual block in each convolution stage of a residual network according to an embodiment of the present application;

FIG. 5 shows a schematic structural diagram of a residual network according to an embodiment of the present application;

FIG. 6 shows a flowchart of a method for training a residual network according to an embodiment of the present application;

FIG. 7 shows a flowchart of acquiring training sample images according to an embodiment of the present application;

FIG. 8 shows a flowchart of performing perturbation processing on an image according to an embodiment of the present application;

FIG. 9 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present application;

FIG. 10 schematically shows a block diagram of an image processing apparatus according to another embodiment of the present application;

FIG. 11 schematically shows a block diagram of an image processing apparatus according to still another embodiment of the present application.
Modes for Carrying Out the Invention

Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present application will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments of the present application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of the specific details, or that other methods, components, devices, steps, and so on may be employed. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the present application.

The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flowcharts shown in the drawings are merely exemplary; they do not necessarily include all of the contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed, and some may be combined or partially combined, so the order of actual execution may change according to the actual situation.
FIG. 1 shows a schematic diagram of an exemplary system architecture 100 to which the image processing method or the image processing apparatus of an embodiment of the present application can be applied.

As shown in FIG. 1, the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is used to provide a communication link between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as a wired communication link, a wireless communication link, and so on.

The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers. For example, the server 105 may be a server cluster composed of multiple servers.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, portable computers, desktop computers, and so on.

The server 105 may be a server that provides various services. For example, a user uses the terminal device 103 (which may also be the terminal device 101 or 102) to collect an image to be recognized and then uploads the image to the server 105. After receiving the image, the server 105 can perform feature extraction on the image based on a residual network (for example, a residual neural network) to obtain image feature information, and then perform recognition processing on the image based on the image feature information. The residual network used by the server 105 for feature extraction includes multiple sequentially connected residual blocks; each residual block contains a convolution branch and a residual branch, the convolution kernel size of the first convolution layer in the convolution branch is smaller than the convolution kernel size of the second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer is larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer. For example, if the convolution kernel size of the first convolution layer is 1×1 pixel with a convolution step of 1 pixel and the convolution kernel size of the second convolution layer is 3×3 pixels with a convolution step of 2 pixels, then the convolution operation both achieves downsampling through the second convolution layer and guarantees that no feature point (i.e., no pixel on the feature map) is skipped, so no characterization ability of the feature network is lost; this ensures the accuracy of image feature extraction and improves the accuracy of image recognition.

It should be noted that the image processing method provided by the embodiments of the present application is generally executed by the server 105, and accordingly the image processing apparatus is generally disposed in the server 105. However, in other embodiments of the present application, a terminal may have functions similar to the server and thus execute the image processing method provided by the embodiments of the present application.
FIG. 2 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.

It should be noted that the computer system 200 of the electronic device shown in FIG. 2 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. In the RAM 203, various programs and data required for system operation are also stored. The CPU 201, the ROM 202, and the RAM 203 are connected to one another through a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.

The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a local area network (LAN) card, a modem, and the like. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read from it can be installed into the storage section 208 as needed.

In particular, according to the embodiments of the present application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209, and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the system of the present application are executed.

It should be noted that the computer-readable medium shown in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, and the like, or any suitable combination of the above.

The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented with a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be provided in a processor. The names of these units do not in some cases constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to implement the methods described in the following embodiments. For example, the electronic device can implement the steps shown in FIG. 3 and FIG. 6 to FIG. 8.
The implementation details of the technical solutions of the embodiments of the present application are elaborated below.

FIG. 3 schematically shows a flowchart of an image processing method according to an embodiment of the present application; the image processing method is applicable to the electronic device described in the foregoing embodiments. Referring to FIG. 3, the image processing method includes at least steps S310 to S330, described in detail as follows.

In step S310, a target image to be processed is acquired.

In an embodiment of the present application, the target image to be processed may be an image requiring visual processing, for example an image requiring object recognition processing.

In step S320, feature extraction is performed on the target image based on a residual network to obtain image feature information. The residual network contains a plurality of sequentially connected residual blocks (Residual Blocks); each residual block contains a convolution branch and a residual branch, the convolution kernel size of a first convolution layer in the convolution branch is smaller than the convolution kernel size of a second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer is larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer. The residual branch in a residual block points from the input of the convolution branch to the output of the convolution branch.

In the embodiments of the present application, the terms "block", "branch", "layer", and "stage" used in describing the residual network denote various processes or steps, and the term "sequentially connected" indicates that the processes or steps are connected in order.

According to the embodiments of the present application, a convolution layer also refers to performing convolution processing or a convolution operation on the target image to be processed. Convolution is a mathematical operator that generates a third function from two functions f and g. According to the embodiments of the present application, the target image is represented, for example, by the function f, and the convolution kernel is the function g. The functions f and g are usually three-dimensional discrete matrices, and the generated function is also a three-dimensional matrix. For example, the target image is represented in the (H, W, C) three-dimensional matrix format, where H and W are the height and width of the target image, respectively, indicating its resolution or size, and C is the number of channels of the target image; a color image, for instance, has three channels (R, G, B), i.e., C = 3. In the three-dimensional matrix representing the target image, for example, the first-dimension elements are the rows of pixels of the target image, the second-dimension elements are its columns of pixels, and the third-dimension elements are the pixel values of each channel; each pixel of the target image serves as a description unit, recording the pixel values of its three channels.

The convolution kernel is also called a filter matrix. The convolution kernel extracts features at different positions of the target image in a "sliding window" manner, and the result is a feature map (Feature Map); the pixels on the feature map are the feature points. The convolution step (Stride) is the number of pixels the center of the convolution kernel moves on the target image each time. For example, the input target image is a 5×5 pixel grayscale image, the convolution kernel is a 3×3 pixel matrix, and the convolution step is 1. The convolution process is as follows: the 3×3 pixel matrix is moved across the 5×5 pixel image from the upper left corner to the lower right corner, 1 pixel at a time. Each time the kernel matrix moves, it is multiplied element-wise with the corresponding feature points of the target image and the products are summed, forming a new matrix. The convolution step can implement a downsampling function: through the convolution calculation, the output image has a much smaller resolution than the input target image.
In an embodiment of the present application, the residual network may be a deep residual network, and the residual network further contains an initial convolution layer located before the plurality of residual blocks, the output of the initial convolution layer serving as the input of the first residual block among the plurality of residual blocks. In this embodiment, since the second convolution layer in the residual block can already implement the downsampling processing, the pooling layer located before the residual blocks in some residual networks can be removed, which simplifies the structure of the residual network.

In an embodiment of the present application, the plurality of residual blocks in the residual network constitute a plurality of convolution stages, and the residual branch contained in each residual block in each convolution stage contains a batch normalization layer and a target convolution layer connected in sequence.

In this embodiment, for a residual block, if its input and output have the same dimensions (including size and channels), the residual branch can be an identity mapping. However, if its input and output dimensions differ, a convolution operation is needed to map the input and the output to the same dimensions. According to the embodiments of the present application, in the first residual block of each convolution stage, a non-identity-mapping residual branch (i.e., one that adds a convolution layer) is needed to keep the input and output dimensions of the residual block consistent. Meanwhile, since the convolution operation of a convolution layer has no bias term, a BN (Batch Normalization) layer can be added before the convolution layer to add a bias term, thereby guaranteeing an optimal processing effect. The BN layer normalizes the feature maps generated by convolution over multiple samples (target images); specifically, the feature points generated by each sample are normalized, by subtracting the mean and dividing by the variance, to a distribution with a mean of 0 and a variance of 1.

Continuing to refer to FIG. 3, in step S330, recognition processing is performed on the image to be processed according to the image feature information.

With the technical solution of the embodiment shown in FIG. 3, when the convolution layers in a residual block perform the convolution operation, downsampling is achieved through the second convolution layer while no feature point is skipped; this guarantees that no characterization ability of the feature network is lost, thereby ensuring the accuracy of image feature extraction and improving the accuracy of image recognition.
Based on the structure of the residual network introduced in the foregoing embodiments, in a specific embodiment of the present application, FIG. 4 shows a schematic structural diagram of the first residual block in each convolution stage of the residual network, which specifically includes a convolution branch 401 and a residual branch 402, where the residual branch 402 points from the input of the convolution branch 401 to the output of the convolution branch 401.

The convolution branch 401 includes a first convolution layer 4011, a second convolution layer 4012, and a third convolution layer 4013. A BN layer is provided before each of the first convolution layer 4011, the second convolution layer 4012, and the third convolution layer 4013, and the output of each BN layer is processed by a Relu (Rectified Linear Unit). Generally, as the number of convolution layers increases, the expressive power of the residual network grows and the effect in concrete applications improves; for example, in image recognition applications, objects in the target image may be identified more accurately. The convolution kernel size of the first convolution layer 4011 is 1×1 pixel with a convolution step of 1 pixel; the convolution kernel size of the second convolution layer 4012 is 3×3 pixels with a convolution step of 2 pixels; and the convolution kernel size of the third convolution layer 4013 is 1×1 pixel with a convolution step of 1 pixel. Since the second convolution layer 4012 both implements the downsampling processing and guarantees that no feature point is skipped, the residual block of the embodiment of the present application guarantees that no characterization ability of the feature network is lost.

In some residual network structures, the convolution kernel size of the first convolution layer in the convolution branch is 1×1 pixel with a convolution step of 2 pixels, and the convolution kernel size of the second convolution layer is 3×3 pixels with a convolution step of 1 pixel; in that case, when the first convolution layer performs its convolution operation, a feature point is skipped between two successive convolutions, causing a loss in the feature network. By using the residual network of the embodiments of the present application instead, the convolution kernel size of the first convolution layer in the convolution branch of a residual block is smaller than the convolution kernel size of the second convolution layer located after it, and the convolution step of the second convolution layer is larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; thus, when the convolution layers in the residual block perform the convolution operation, downsampling is achieved through the second convolution layer without skipping any feature point, no characterization ability of the feature network is lost, the accuracy of image feature extraction is ensured, and the accuracy of image recognition is improved.

The calculation formula of the linear rectifier unit is, for example: y = max(0, x), where x is a feature point on the input feature map and y is the corresponding feature point on the output feature map. The linear rectifier unit introduces a non-linear characteristic into a system that has undergone the linear computation of the convolution layer.

The residual branch 402 includes a convolution layer 4021 and a BN layer set before the convolution layer; after the BN processing, the result is processed by the Relu function.

The outputs of the convolution branch 401 and the residual branch 402 are added (Addition) at the element level to obtain the output of each residual block.
In an embodiment of the present application, FIG. 5 shows a schematic structural diagram of the residual network. The structure includes, connected in sequence: an initial convolution layer 501, a convolution stage 502, a convolution stage 503, a convolution stage 504, a convolution stage 505, a global average pooling layer 506, and a fully connected layer 507. The convolution kernel size of the initial convolution layer 501 is 7×7 pixels, its convolution step is 2 pixels, and its number of channels is 64. Each of the convolution stages 502, 503, 504, and 505 contains multiple residual blocks, and different convolution stages may contain different numbers of residual blocks; for example, in ResNet101, the convolution stage 502 contains 3 residual blocks, the convolution stage 503 contains 4 residual blocks, the convolution stage 504 contains 23 residual blocks, and the convolution stage 505 contains 4 residual blocks. It should be noted that the structure of the first residual block in each convolution stage is as shown in FIG. 4; in the other residual blocks, the residual branch is an identity mapping and the convolution branch is the same as the convolution branch 401 shown in FIG. 4.

As can be seen from the structures of the residual network shown in FIG. 4 and FIG. 5, after the initial convolution layer 501 of the residual network in the embodiments of the present application, the max pooling layer found in some residual networks is removed, and the downsampling process is placed in the first convolution stage, i.e., the convolution stage 502, specifically in the second convolution layer 4012 of the first residual block of the convolution stage 502. Meanwhile, in each residual block, the downsampling process is placed in the second convolution layer, whose convolution kernel size is 3×3 pixels, thereby ensuring that the downsampling process does not skip any feature point and causes no loss of the characterization ability of the feature network. In addition, a BN layer is added not only to the convolution branch but also to the non-identity-mapping residual branch; in this way a bias term can be added before the convolution layer through the BN layer, which guarantees an optimal processing effect.
Based on the structure of the residual network introduced in the foregoing embodiments, the embodiments of the present application also propose a solution for training the residual network. Referring to FIG. 6, the method for training the residual network according to an embodiment of the present application includes:

Step S610: initialize the residual network;

Step S620: input training image samples into the residual network for iterative training until the loss function of the residual network satisfies a convergence condition.

In the field of computational neuroscience, a loss function or cost function maps an event (an element of a sample space) onto a real number expressing the opportunity cost associated with the event, thereby intuitively representing some "cost" associated with the event. The goal of an optimization problem is to minimize the loss function. An objective function is usually the loss function itself or its negative; when the objective function is the negative of a loss function, the objective function value is to be maximized. The loss function is used to estimate parameters.

In an embodiment of the present application, when iteratively training the residual network, a Momentum-SGD (Stochastic Gradient Descent with momentum) optimization method may be used for training, and a distributed training framework may be adopted to increase the training rate; for example, a hardware configuration of 4 machines with 32 graphics cards may be used. The specific training hyperparameters are shown in Table 1, where epoch refers to the number of iterations needed to learn all the training images once at the current Batch size.
Hyperparameter | Value |
---|---|
Batch Size | 64*4*32 |
Learning rate | 0.8 |
Learning rate decay coefficient | 0.1 |
Learning rate decay interval | 30 epochs |
Learning rate warmup | 0.1 |
Learning rate warmup decay coefficient | 0.1 |
Learning rate warmup decay interval | 1 epoch |
Weight regular-term coefficient | 1e-4 |

Table 1
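A hedged sketch of a per-epoch learning-rate schedule consistent with one reading of Table 1 (a 0.1 warmup factor for the first epoch, then the base rate 0.8 decayed by 0.1 every 30 epochs); the exact warmup semantics are an assumption, since Table 1 does not spell them out:

```python
def learning_rate(epoch, base_lr=0.8, warmup_factor=0.1, warmup_epochs=1,
                  decay=0.1, decay_interval=30):
    """Per-epoch learning rate under one reading of Table 1 (assumed semantics)."""
    if epoch < warmup_epochs:
        return base_lr * warmup_factor  # warmup: reduced rate for the first epoch
    return base_lr * decay ** ((epoch - warmup_epochs) // decay_interval)

print([round(learning_rate(e), 4) for e in (0, 1, 30, 31, 61)])
# [0.08, 0.8, 0.8, 0.08, 0.008]
```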
In an embodiment of the present application, the BN layers in the residual network contain learnable parameters, and these learnable parameters are strongly correlated with the distribution of the training image samples, as shown in the following forward-propagation formula of the BN layer: y = γ·(x − μ)/√σ + β, where β and γ denote the learning parameters, μ denotes the mean of the training image samples, σ denotes the variance of the training image samples, x denotes the training image samples, and y denotes the output. β and γ are obtained by iterative learning on the basis of the training image samples using an optimization algorithm; the purpose of the learning process is to minimize the loss function (or maximize it when the loss is negative) by adjusting the parameters.

Therefore, in an embodiment of the present application, a regular term of the loss function can be generated from β and γ and added to the original loss function (i.e., the classification loss) of the residual network to improve the generalization ability of the residual network model. Specifically, the sum of squares of β and γ can be computed and the arithmetic mean of that sum of squares used as the regular term of the original loss function; that is, in the embodiments of the present application, the L2 regular-term computation method implemented in TensorFlow is used to compute the regular term of the original loss function. Experiments found that, since the magnitudes of β and γ are usually one order of magnitude larger than the magnitude of the weights of the convolution layers in the residual network, when this regular term is added to the original loss function, the regular-term coefficient set for it should be smaller than the order of magnitude of the weight coefficient, for example one order of magnitude smaller, to prevent the added regular term from having an excessive impact on the original loss function.
In an embodiment of the present application, a technical solution for obtaining the training sample images is also proposed; as shown in FIG. 7, it includes the following steps:

Step S710: obtain sample images for training the residual network. In an embodiment of the present application, the sample images may be RGB images.

Step S720: cut out, from the sample image, an image area that occupies a predetermined proportion of the sample image and whose aspect ratio is a predetermined value. The technical solution of this embodiment makes it possible to cut multiple image areas out of one sample image while guaranteeing the translation invariance and size invariance of the captured images. In an embodiment of the present application, the predetermined proportion is a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect-ratio value is a value randomly selected from a predetermined aspect-ratio interval; for example, the predetermined proportion interval may be [0.05, 1.0], and the predetermined aspect-ratio interval may be [3/4, 4/3].

Step S730: adjust the image area to an image of a set size. In this embodiment, by adjusting the image areas to images of a set size, images of the same size can be input into the network when training the residual network.

Step S740: perform random perturbation processing on the set-size image to obtain the training image samples.

In an embodiment of the present application, performing random perturbation processing on the set-size image includes: performing horizontal flip processing on the set-size image with a first processing probability; and/or rotating the set-size image by a random angle with a second processing probability, the random angle being a value randomly selected from a predetermined angle interval; and/or adjusting the attributes of the set-size image with a third processing probability, where the attributes of an image include saturation, contrast, brightness, and chroma.

The technical solution of the embodiment shown in FIG. 7 makes it possible to use a certain probability to choose whether to process an image, thereby increasing the diversity of the training data while avoiding the introduction of excessive data perturbation and a correspondingly large noise impact.
In a specific embodiment of the present application, as shown in FIG. 8, a flow of performing perturbation processing on an image is shown, which specifically includes:

Step S801: input an image. The image may be an RGB image.

Step S802: randomly crop from the image an area whose proportion of the total image area is any value in [0.05, 1.0] and whose aspect ratio is any value in [3/4, 4/3].

Step S803: resize the cropped image to 224*224 pixels.

Step S804: perform horizontal flip processing on the image with a processing probability of 0.5.

Step S805: perform random-angle rotation processing on the image with a processing probability of 0.25.

Step S806: perturb the saturation, contrast, brightness, and chroma of the image with a processing probability of 0.5.

It should be noted that the specific values shown in FIG. 8 are only examples, and the processing order of steps S804 to S806 is not strictly required; that is, the execution order of these steps can be exchanged, and they can also be executed simultaneously.
With the structure of the residual network proposed in the above embodiments of the present application, currently optimal results can be achieved, as shown in Table 2:

Model | Framework | Top1 Acc (%) | Top5 Acc (%) |
---|---|---|---|
ResNet101 | Tensorflow | 78.22 | 94.00 |
ResNet152 | Tensorflow | 78.94 | 94.44 |

Table 2
The technical solutions of the above embodiments of the present application can be widely used in vision-related services, such as the evaluation and recommendation of image quality, object recognition in game scenes, image understanding, and video understanding.
The apparatus embodiments of the present application are described below; they can be used to execute the image processing method in the above embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the embodiments of the image processing method described above.

FIG. 9 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present application.

Referring to FIG. 9, an image processing apparatus 900 according to an embodiment of the present application includes a first acquisition unit 901, a first processing unit 902, and a second processing unit 903.

The first acquisition unit 901 is configured to acquire a target image to be processed; the first processing unit 902 is configured to perform feature extraction on the target image based on a residual network to obtain image feature information, the residual network containing a plurality of sequentially connected residual blocks, each residual block containing a convolution branch and a residual branch, the convolution kernel size of the first convolution layer in the convolution branch being smaller than the convolution kernel size of the second convolution layer located after the first convolution layer, and the convolution step of the second convolution layer being larger than the convolution step of the first convolution layer and smaller than the convolution kernel width of the second convolution layer; the second processing unit 903 is configured to perform recognition processing on the image to be processed according to the image feature information.
In some embodiments of the present application, based on the foregoing scheme, the residual network further contains an initial convolution layer located before the plurality of residual blocks, the output of the initial convolution layer serving as the input of the first residual block among the plurality of residual blocks.

In some embodiments of the present application, based on the foregoing scheme, the plurality of residual blocks constitute a plurality of convolution stages, and the residual branch contained in the first residual block in each convolution stage contains a batch normalization layer and a target convolution layer connected in sequence.

In some embodiments of the present application, based on the foregoing scheme, the convolution branch further includes a third convolution layer, the first convolution layer, the second convolution layer, and the third convolution layer being connected in sequence; the convolution kernel size of the first convolution layer and the third convolution layer is 1×1 pixel with a convolution step of 1 pixel, and the convolution kernel size of the second convolution layer is 3×3 pixels with a convolution step of 2 pixels.

In some embodiments of the present application, based on the foregoing scheme, a batch normalization layer is provided before each of the first convolution layer, the second convolution layer, and the third convolution layer.
FIG. 10 schematically shows a block diagram of an image processing apparatus according to another embodiment of this application.
Referring to FIG. 10, an image processing apparatus 1000 according to another embodiment of this application, in addition to the first obtaining unit 901, the first processing unit 902, and the second processing unit 903 shown in FIG. 9, further includes: an initialization unit 1001 and a training unit 1002.
The initialization unit 1001 is configured to initialize the residual network; the training unit 1002 is configured to input training image samples into the residual network for iterative training until the loss function of the residual network satisfies a convergence condition.
FIG. 11 schematically shows a block diagram of an image processing apparatus according to yet another embodiment of this application.
Referring to FIG. 11, an image processing apparatus 1100 according to yet another embodiment of this application, in addition to the units of the image processing apparatus shown in FIG. 10, further includes a loss function optimization unit 1101. The loss function optimization unit 1101 is configured to obtain the learnable parameters contained in the batch normalization layers of the residual network, generate a regularization term of the loss function from the learnable parameters, and add the regularization term to the loss function.
In some embodiments of this application, based on the foregoing solutions, the loss function optimization unit 1101 is configured to: determine a coefficient of the regularization term, the order of magnitude of the coefficient being smaller than the order of magnitude of the weights of the convolution layers included in the residual network; and add the regularization term to the loss function based on the coefficient of the regularization term.
In some embodiments of this application, based on the foregoing solutions, the loss function optimization unit 1101 is configured to: compute the sum of squares of the learnable parameters; and use the arithmetic mean of the sum of squares as the regularization term of the loss function.
In some embodiments of this application, based on the foregoing solutions, the image processing apparatus shown in FIG. 10 or FIG. 11 may further include: a second obtaining unit configured to obtain sample images used for training the residual network; a cropping unit configured to crop, from the sample image, an image region that occupies a predetermined proportion of the sample image and has a predetermined aspect ratio; a resizing unit configured to resize the image region to an image of a set size; and a perturbation processing unit configured to perform random perturbation processing on the image of the set size to obtain the training image samples.
In some embodiments of this application, based on the foregoing solutions, the predetermined proportion is a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect ratio is a value randomly selected from a predetermined aspect ratio interval.
In some embodiments of this application, based on the foregoing solutions, the perturbation processing unit is configured to: horizontally flip the image of the set size with a first processing probability; and/or rotate the image of the set size by a random angle with a second processing probability, the random angle being a value randomly selected from a predetermined angle interval; and/or adjust attributes of the image of the set size with a third processing probability.
It should be noted that although several modules or units of the device for action execution are mentioned in the foregoing detailed description, this division is not mandatory. In fact, according to the implementations of this application, the features and functions of two or more of the modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units to be embodied.
From the description of the foregoing implementations, those skilled in the art can readily understand that the example implementations described here may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solutions according to the implementations of this application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to perform the methods according to the implementations of this application.
Other implementations of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in this application. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of this application being indicated by the following claims.
It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.
Claims (15)
- An image processing method, performed by an electronic device, the method comprising: obtaining a target image to be processed; performing feature extraction on the target image based on a residual network to obtain image feature information, the residual network comprising a plurality of residual blocks connected in sequence, each of the residual blocks comprising a convolution branch and a residual branch, a convolution kernel size of a first convolution layer in the convolution branch being smaller than a convolution kernel size of a second convolution layer located after the first convolution layer, and a convolution stride of the second convolution layer being larger than a convolution stride of the first convolution layer and smaller than a convolution kernel width of the second convolution layer; and performing recognition processing on the image to be processed according to the image feature information.
- The image processing method according to claim 1, wherein the residual network further comprises an initial convolution layer located before the plurality of residual blocks, and an output of the initial convolution layer serves as an input of a first residual block among the plurality of residual blocks.
- The image processing method according to claim 1, wherein the plurality of residual blocks form a plurality of convolution stages, and the residual branch of the first residual block in each convolution stage comprises a batch normalization layer and a target convolution layer connected in sequence.
- The image processing method according to claim 1, wherein the convolution branch further comprises a third convolution layer, the first convolution layer, the second convolution layer, and the third convolution layer being connected in sequence, wherein the first convolution layer and the third convolution layer each have a convolution kernel size of 1×1 pixel and a convolution stride of 1 pixel, and the second convolution layer has a convolution kernel size of 3×3 pixels and a convolution stride of 2 pixels.
- The image processing method according to claim 4, wherein a batch normalization layer is arranged before each of the first convolution layer, the second convolution layer, and the third convolution layer.
- The image processing method according to any one of claims 1 to 5, further comprising, before performing feature extraction on the target image based on the residual network: initializing the residual network; and inputting training image samples into the residual network for iterative training until a loss function of the residual network satisfies a convergence condition.
- The image processing method according to claim 6, further comprising: obtaining learnable parameters contained in batch normalization layers of the residual network; and generating a regularization term of the loss function from the learnable parameters, and adding the regularization term to the loss function.
- The image processing method according to claim 7, wherein adding the regularization term to the loss function comprises: determining a coefficient of the regularization term, an order of magnitude of the coefficient of the regularization term being smaller than an order of magnitude of weights of convolution layers included in the residual network; and adding the regularization term to the loss function based on the coefficient of the regularization term.
- The image processing method according to claim 7, wherein generating the regularization term of the loss function from the learnable parameters comprises: computing a sum of squares of the learnable parameters; and using an arithmetic mean of the sum of squares as the regularization term of the loss function.
- The image processing method according to claim 6, further comprising, before inputting the training image samples into the residual network for iterative training: obtaining sample images used for training the residual network; cropping, from the sample image, an image region that occupies a predetermined proportion of the sample image and has a predetermined aspect ratio; resizing the image region to an image of a set size; and performing random perturbation processing on the image of the set size to obtain the training image samples.
- The image processing method according to claim 10, wherein the predetermined proportion is a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect ratio is a value randomly selected from a predetermined aspect ratio interval.
- The image processing method according to claim 10, wherein performing random perturbation processing on the image of the set size comprises: horizontally flipping the image of the set size with a first processing probability; and/or rotating the image of the set size by a random angle with a second processing probability, the random angle being a value randomly selected from a predetermined angle interval; and/or adjusting attributes of the image of the set size with a third processing probability.
- An image processing apparatus, comprising: a first obtaining unit configured to obtain a target image to be processed; a first processing unit configured to perform feature extraction on the target image based on a residual network to obtain image feature information, the residual network comprising a plurality of residual blocks connected in sequence, each of the residual blocks comprising a convolution branch and a residual branch, a convolution kernel size of a first convolution layer in the convolution branch being smaller than a convolution kernel size of a second convolution layer located after the first convolution layer, and a convolution stride of the second convolution layer being larger than a convolution stride of the first convolution layer and smaller than a convolution kernel width of the second convolution layer; and a second processing unit configured to perform recognition processing on the image to be processed according to the image feature information.
- A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 12.
- An electronic device, comprising: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method according to any one of claims 1 to 12.