CN113191377A - Method and apparatus for processing image

Info

Publication number: CN113191377A
Application number: CN202010036017.8A
Authority: CN
Inventors: 刘丹
Original assignee: Beijing Jingdong Qianshi Technology Co Ltd
Current assignee: Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2021-07-30

Abstract

The embodiments of the present application disclose a method and an apparatus for processing an image. A specific embodiment of the method includes: determining a target convolutional layer among the convolutional layers of a convolutional neural network; determining the feature images of a plurality of channels input to the target convolutional layer; determining a target number corresponding to the target convolutional layer; performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels in the target convolutional layer that correspond to those channels; and determining the output result of the target convolutional layer according to the result of each convolution calculation. This embodiment can effectively reduce the number of parameters and the amount of computation in the convolution process.

Description

Method and apparatus for processing image

Technical Field

The embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for processing an image.

Background

Many application algorithms based on deep learning already exist. For example, target detection algorithms of the RCNN series, such as RCNN, SPP net, and Fast RCNN, and end-to-end target detection algorithms represented by SSD and YOLO handle object recognition well. The key step of every target detection algorithm is extracting the features of the objects in an image, and deep-learning-based feature extraction is performed with a convolutional neural network.

A traditional convolutional neural network is composed of basic layers such as convolutional layers, pooling layers, and rectified linear (ReLU) layers, and the convolution operations in the convolutional layers account for most of the parameters and most of the computation.
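To make the cost of a convolutional layer concrete, the following minimal Python sketch counts the weights and multiply-accumulate operations of a standard convolutional layer using the usual textbook formulas. The function name `conv_layer_cost` and the example layer sizes are illustrative assumptions and do not come from the application.

```python
def conv_layer_cost(c_in, c_out, kernel_size, out_h, out_w, bias=True):
    """Return (parameter count, multiply-accumulate count) of a standard 2D convolution.

    Every output channel owns one kernel_size x kernel_size kernel per input channel.
    """
    weights = c_in * c_out * kernel_size * kernel_size
    params = weights + (c_out if bias else 0)
    # Each output position of each output channel needs c_in * k * k multiply-accumulates.
    macs = weights * out_h * out_w
    return params, macs


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernels, 56x56 output.
params, macs = conv_layer_cost(c_in=64, c_out=128, kernel_size=3, out_h=56, out_w=56)
print(params, macs)
```

Both counts grow linearly with the number of input channels read by each output channel, which is the quantity the method described below reduces.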

Disclosure of Invention

The embodiments of the present application provide a method and an apparatus for processing an image.

In a first aspect, an embodiment of the present application provides a method for processing an image, including: determining a target convolutional layer among the convolutional layers of a convolutional neural network; determining feature images of a plurality of channels input to the target convolutional layer; determining a target number corresponding to the target convolutional layer; performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer; and determining the output result of the target convolutional layer according to the result obtained by each convolution calculation.

In some embodiments, the performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer includes: performing convolution calculation on the feature images of the target number of adjacent channels and the target number of convolution kernels corresponding to the target number of adjacent channels in the target convolutional layer.

In some embodiments, the above method further comprises: taking the output result of the target convolutional layer as an input feature image of the next layer after the target convolutional layer in the convolutional neural network.

In some embodiments, the above method further comprises: identifying an object included in the image input to the convolutional neural network according to an output result of the last convolutional layer of the convolutional neural network.

In some embodiments, the target number is 2 to the power of N, where N is a natural number and N is greater than or equal to 1.

In a second aspect, an embodiment of the present application provides an apparatus for processing an image, including: a first determination unit configured to determine a target convolutional layer among the convolutional layers of a convolutional neural network; a second determination unit configured to determine feature images of a plurality of channels input to the target convolutional layer; a third determination unit configured to determine a target number corresponding to the target convolutional layer; a convolution calculation unit configured to perform convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer; and a result output unit configured to determine an output result of the target convolutional layer according to the result obtained by each convolution calculation.

In some embodiments, the convolution calculation unit is further configured to: perform convolution calculation on the feature images of the target number of adjacent channels and the target number of convolution kernels corresponding to the target number of adjacent channels in the target convolutional layer.

In some embodiments, the above apparatus further comprises: a fourth determination unit configured to take an output result of the target convolutional layer as an input feature image of a layer next to the target convolutional layer in the convolutional neural network.

In some embodiments, the above apparatus further comprises: an object recognition unit configured to recognize an object included in an image input to the convolutional neural network according to an output result of a last convolutional layer of the convolutional neural network.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments of the first aspect.

In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method as described in any one of the embodiments of the first aspect.

The method and apparatus for processing an image according to the above embodiments of the present application first determine a target convolutional layer among the convolutional layers of a convolutional neural network. Then, the feature images of a plurality of channels input to the target convolutional layer are determined, together with the target number corresponding to the target convolutional layer. Next, convolution calculation is performed on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer. Finally, the output result of the target convolutional layer is determined according to the result obtained by each convolution calculation. The method of this embodiment can effectively reduce the number of parameters and the amount of computation in convolution processing.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for processing an image according to the present application;

FIG. 3 is a schematic illustration of an application scenario of a method for processing an image according to the present application;

FIG. 4 is a flow diagram of another embodiment of a method for processing an image according to the present application;

FIG. 5 is a schematic block diagram of one embodiment of an apparatus for processing images according to the present application;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and do not restrict it. It should be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for processing images or the apparatus for processing images of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an image processing application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image processing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.

The server 105 may be a server that provides various services, such as a backend server that processes images transmitted by the terminal devices 101, 102, 103. The backend server may perform processing such as feature extraction on data such as the received image, and feed back a processing result (e.g., a target recognition result) to the terminal devices 101, 102, 103.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.

It should be noted that the method for processing an image provided in the embodiment of the present application may be executed by the terminal devices 101, 102, 103, or may be executed by the server 105. Accordingly, the apparatus for processing images may be provided in the terminal devices 101, 102, 103, or in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing an image according to the present application is shown. The method for processing the image of the embodiment comprises the following steps:

Step 201, determining a target convolutional layer among the convolutional layers of the convolutional neural network.

In this embodiment, an execution subject of the method for processing an image (e.g., the terminal devices 101, 102, 103 or the server 105 shown in fig. 1) may determine a target convolutional layer among the convolutional layers of the convolutional neural network. Convolutional neural networks generally include a plurality of convolutional layers and pooling layers. A convolutional layer includes a plurality of convolution kernels for performing feature extraction on an input image. The image input to the convolutional layer may include a plurality of channels, each channel corresponding to one convolution kernel. When the image to be processed is input from the input layer of the convolutional neural network, the execution subject may take the first convolutional layer as the target convolutional layer. During image processing, the convolutional layer that the feature image is about to enter may be taken as the target convolutional layer.

Step 202, determining the feature images of a plurality of channels input to the target convolutional layer.

After determining the target convolutional layer, the execution subject may determine the feature images of a plurality of channels input to the target convolutional layer. For example, the execution subject may take the output of the layer preceding the target convolutional layer as the input.

Step 203, determining the target number corresponding to the target convolutional layer.

In this embodiment, the execution subject may determine the target number corresponding to the target convolutional layer after determining the target convolutional layer. It is understood that different convolutional layers may correspond to different target numbers, or to the same target number. The above target number is used to represent the number of channels of the image that are calculated simultaneously. For example, if the target number is 4, the execution subject may perform convolution calculation on the images of 4 channels together, and output an image of one channel.

In some optional implementations of this embodiment, the target number is 2 to the power of N, where N is a natural number and N is greater than or equal to 1.

In this implementation, setting the target number to 2 to the power of N facilitates subsequent calculation.
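As a small illustration of this constraint, the sketch below checks whether a candidate target number is a power of two with an exponent of at least 1. The function name `is_valid_target_number` is hypothetical, and the bit trick is a common idiom rather than anything prescribed by the application.

```python
def is_valid_target_number(m):
    """Return True when m equals 2**N for some integer N >= 1 (2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so m & (m - 1) clears it to zero.
    return isinstance(m, int) and m >= 2 and (m & (m - 1)) == 0


valid = [m for m in range(1, 17) if is_valid_target_number(m)]
print(valid)
```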

Step 204, performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer.

After determining the target number M corresponding to the target convolutional layer, the execution subject may extract the feature images of M channels from the feature images of the multiple channels input to the target convolutional layer, and perform convolution calculation on the extracted feature images of the M channels and the M convolution kernels corresponding to the M channels in the target convolutional layer. The M channels may be any M channels, for example, M adjacent channels, or M channels selected every other channel.
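One way to read this step is sketched below in plain Python: each of the M selected feature images is slid against its own kernel, and the M response maps are summed into a single output channel, in line with the earlier example where 4 channels yield an image of one channel. Summation as the combining rule, the absence of padding, a spatial stride of 1, and the helper names `correlate2d_valid` and `convolve_group` are assumptions made for illustration; the application does not specify them.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map.

    As in most deep-learning frameworks, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def convolve_group(channels, kernels):
    """Convolve M feature images with their M kernels and sum the results (assumed rule)."""
    if len(channels) != len(kernels):
        raise ValueError("one kernel is required per selected channel")
    maps = [correlate2d_valid(c, k) for c, k in zip(channels, kernels)]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(w)] for i in range(h)]


# Two hypothetical 2x2 channels with 1x1 kernels, so the result is easy to check by hand.
out = convolve_group(
    channels=[[[1, 2], [3, 4]], [[1, 1], [1, 1]]],
    kernels=[[[2]], [[3]]],
)
print(out)
```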

In some optional implementations of this embodiment, the M channels are M adjacent channels. Accordingly, when performing convolution calculation, the feature images of M adjacent channels may be convolved with M convolution kernels corresponding to the M adjacent channels in the target convolution layer.
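The channel selection can be sketched as an index computation. The hypothetical helper `channel_groups` below lists the groups of M channel indices, where a spacing of 1 gives adjacent channels and a spacing of 2 picks every other channel. Moving the group one channel at a time is an assumption; the application does not say how successive groups are chosen.

```python
def channel_groups(num_channels, m, spacing=1):
    """List every group of m channel indices whose members are `spacing` apart."""
    if m < 1 or spacing < 1:
        raise ValueError("m and spacing must be positive")
    span = (m - 1) * spacing + 1  # number of consecutive indices one group covers
    return [
        tuple(range(start, start + span, spacing))
        for start in range(num_channels - span + 1)
    ]


adjacent = channel_groups(8, 4)                # windows of 4 adjacent channels
alternating = channel_groups(8, 4, spacing=2)  # every other channel
print(adjacent)
print(alternating)
```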

Step 205, determining the output result of the target convolutional layer according to the result obtained by each convolution calculation.

After obtaining the result of each convolution calculation, the execution subject may add a preset offset (bias) to the result to obtain the final output result.
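A minimal sketch of this last step follows. It assumes one preset offset per output channel, added to every position of that channel's convolution result; the application only states that an offset is added, so the per-channel layout and the name `assemble_output` are illustrative.

```python
def assemble_output(group_results, offsets):
    """Add each output channel's preset offset to its convolution result."""
    if len(group_results) != len(offsets):
        raise ValueError("one offset is required per output channel")
    return [
        [[value + offset for value in row] for row in feature_map]
        for feature_map, offset in zip(group_results, offsets)
    ]


# Two hypothetical 1x2 convolution results with offsets 10 and -1.
output = assemble_output([[[1, 2]], [[3, 4]]], offsets=[10, -1])
print(output)
```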

In some optional implementations of the embodiment, the execution subject may take an output result of the target convolutional layer as an input feature image of a layer next to the target convolutional layer in the convolutional neural network.

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing an image according to the present embodiment. In the application scenario of fig. 3, the input to a convolutional layer of the convolutional neural network is a feature image with 8 channels. The execution subject performs convolution calculation on the feature images of 4 adjacent channels and the convolution kernels of those 4 channels in the convolutional layer, and outputs a feature image with 5 channels.
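The numbers in this scenario are consistent with moving a window of 4 adjacent channels across the 8 input channels one channel at a time, which gives 8 - 4 + 1 = 5 windows. The sketch below works through that arithmetic and compares the weight count with a standard layer in which each of the 5 outputs reads all 8 channels. The one-channel step, the 3x3 kernel size, and the function names are assumptions for illustration; the application does not state them.

```python
def output_channels(c_in, m, channel_step=1):
    """Number of windows of m adjacent channels that fit into c_in channels."""
    return (c_in - m) // channel_step + 1


def weight_count(channels_per_output, c_out, kernel_size):
    """Weights needed when every output channel reads `channels_per_output` inputs."""
    return channels_per_output * c_out * kernel_size * kernel_size


c_in, m, k = 8, 4, 3                      # k = 3 is an assumed kernel size
c_out = output_channels(c_in, m)          # windows of 4 adjacent channels
standard = weight_count(c_in, c_out, k)   # each output reads all 8 channels
grouped = weight_count(m, c_out, k)       # each output reads only 4 channels
print(c_out, standard, grouped, grouped / standard)
```

Under these assumptions the grouped layer needs M / C_in of the weights of the standard layer, which is one half here.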

In the method for processing an image according to the above embodiment of the present application, a target convolutional layer is first determined among the convolutional layers of a convolutional neural network. Then, the feature images of a plurality of channels input to the target convolutional layer are determined, together with the target number corresponding to the target convolutional layer. Next, convolution calculation is performed on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer. Finally, the output result of the target convolutional layer is determined according to the result obtained by each convolution calculation. The method of this embodiment can effectively reduce the number of parameters and the amount of computation in convolution processing.

With continued reference to FIG. 4, a flow 400 of another embodiment of a method for processing an image according to the present application is shown. As shown in fig. 4, the method for processing an image of the present embodiment may include the following steps:

Step 401, determining a target convolutional layer among the convolutional layers of the convolutional neural network.

Step 402, determining the feature images of a plurality of channels input to the target convolutional layer.

Step 403, determining the target number corresponding to the target convolutional layer.

Step 404, performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer.

Step 405, determining the output result of the target convolutional layer according to the result obtained by each convolution calculation.

Step 406, identifying an object included in the image input to the convolutional neural network according to an output result of the last convolutional layer of the convolutional neural network.

In this embodiment, after determining the output result of the last convolutional layer of the convolutional neural network, the execution subject may perform processing such as pooling on the output result to determine an object included in the image input to the convolutional neural network. The object may be any object to be identified, such as a vehicle or a pedestrian.
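The application does not describe the processing after the last convolutional layer beyond "such as pooling". Purely as an illustration, the sketch below assumes that the last layer emits one channel per object class, applies global average pooling to each channel, and picks the class with the highest pooled score. The class-per-channel layout, the labels, and both function names are hypothetical.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list) to the mean of its values."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def predict_label(feature_maps, labels):
    """Return the label whose channel has the highest pooled score (assumed layout)."""
    scores = global_average_pool(feature_maps)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Two hypothetical 2x2 output channels, one per class.
label = predict_label(
    [[[0.1, 0.2], [0.3, 0.4]], [[0.9, 0.8], [0.7, 0.6]]],
    labels=["vehicle", "pedestrian"],
)
print(label)
```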

The method for processing an image provided by this embodiment of the application can effectively reduce the number of parameters and the amount of computation when objects are identified with various object recognition algorithms, and improves calculation efficiency.

With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing an image, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.

As shown in fig. 5, the apparatus 500 for processing an image of the present embodiment includes: a first determination unit 501, a second determination unit 502, a third determination unit 503, a convolution calculation unit 504, and a result output unit 505.

A first determining unit 501 configured to determine a target convolutional layer among convolutional layers of the convolutional neural network.

A second determining unit 502 configured to determine feature images of a plurality of channels input to the target convolutional layer.

A third determining unit 503 configured to determine a target number corresponding to the target convolutional layer.

A convolution calculation unit 504 configured to perform convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer.

A result output unit 505 configured to determine an output result of the target convolutional layer according to the result obtained by each convolution calculation.

In some optional implementations of this embodiment, the convolution calculation unit 504 may be further configured to: perform convolution calculation on the feature images of the target number of adjacent channels and the target number of convolution kernels corresponding to the target number of adjacent channels in the target convolutional layer.

In some optional implementations of this embodiment, the apparatus 500 may further include a fourth determining unit, not shown in fig. 5, configured to take the output result of the target convolutional layer as an input feature image of a layer next to the target convolutional layer in the convolutional neural network.

In some optional implementations of the present embodiment, the apparatus 500 may further include an object recognition unit, not shown in fig. 5, configured to recognize an object included in the image input to the convolutional neural network according to an output result of the last convolutional layer of the convolutional neural network.

In some optional implementations of this embodiment, the target number is 2 to the power of N, where N is a natural number and N is greater than or equal to 1.

It should be understood that units 501 to 505, which are described in the apparatus 500 for processing an image, correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the method for processing an image are equally applicable to the apparatus 500 and the units included therein and will not be described in detail here.

Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device or the server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiments of the present disclosure.

As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a target convolutional layer among the convolutional layers of a convolutional neural network; determine feature images of a plurality of channels input to the target convolutional layer; determine a target number corresponding to the target convolutional layer; perform convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer; and determine the output result of the target convolutional layer according to the result obtained by each convolution calculation.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first determining unit, a second determining unit, a third determining unit, a convolution calculating unit, and a result output unit. The names of the units do not in some cases constitute a limitation on the unit itself, and for example, the first determination unit may also be described as a "unit that determines a target convolutional layer among convolutional layers of a convolutional neural network".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. One example is a technical solution formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A method for processing an image, comprising:

determining a target convolutional layer among the convolutional layers of a convolutional neural network;

determining feature images of a plurality of channels input into the target convolutional layer;

determining a target number corresponding to the target convolutional layer;

performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer;

and determining the output result of the target convolutional layer according to the result obtained by each convolution calculation.
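
As a minimal sketch of one possible reading of the steps above, the grouping can be illustrated in plain Python. It assumes that the input channels are split into consecutive groups of `target_number` channels, that each group has `target_number` kernels, and that each kernel holds one 2-D slice per channel of its group. The function names `conv2d_valid` and `grouped_conv`, the nested-list data layout, and the choice to concatenate the per-kernel results as output channels are illustrative assumptions, not details taken from the claims.

```python
def conv2d_valid(image, kernel):
    """Cross-correlate one 2-D image with one 2-D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def grouped_conv(features, kernels, target_number):
    """Convolve each group of `target_number` channels with its own kernels.

    features: list of C channel images (each a 2-D list).
    kernels[g][k][c]: 2-D slice of kernel k in group g for channel c of that
    group (hypothetical layout chosen for this sketch).
    Returns one output channel per kernel, in group order.
    """
    if len(features) % target_number != 0:
        raise ValueError("channel count must be a multiple of target_number")
    outputs = []
    for g, start in enumerate(range(0, len(features), target_number)):
        group = features[start:start + target_number]
        for kernel in kernels[g]:
            # One partial map per channel in the group.
            partial = [conv2d_valid(ch, sl) for ch, sl in zip(group, kernel)]
            # Sum the partial maps element-wise into a single output channel.
            summed = [[sum(vals) for vals in zip(*rows)]
                      for rows in zip(*partial)]
            outputs.append(summed)
    return outputs
```

Under these assumptions, each kernel only ever sees the `target_number` channels of its own group, which is where the reduction in parameters and computation would come from.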

2. The method of claim 1, wherein performing convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer comprises:

performing convolution calculation on the feature images of the target number of adjacent channels and the target number of convolution kernels corresponding to the target number of adjacent channels in the target convolutional layer.

3. The method of claim 1, wherein the method further comprises:

taking the output result of the target convolutional layer as an input feature image of the layer following the target convolutional layer in the convolutional neural network.

4. The method of claim 3, wherein the method further comprises:

identifying an object included in an image input to the convolutional neural network according to an output result of the last convolutional layer of the convolutional neural network.

5. The method of any one of claims 1-4, wherein the target number is 2 to the power of N, N being a natural number greater than or equal to 1.
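
The constraint on the target number, and its effect on layer size, can be sketched as follows. The helper names `is_valid_target_number` and `conv_weight_counts` are hypothetical. The weight-count comparison further assumes a layer with as many kernels as input channels, in which each kernel spans only `target_number` channels rather than all of them; that is one reading of the method, not a figure stated in the claims.

```python
def is_valid_target_number(n):
    """Return True if n equals 2**N for some integer N >= 1."""
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    return n >= 2 and (n & (n - 1)) == 0


def conv_weight_counts(channels, target_number, kernel_size):
    """Weight counts (biases ignored) for `channels` inputs and `channels` kernels.

    Returns (full, grouped): `full` assumes every kernel spans all channels,
    `grouped` assumes every kernel spans only `target_number` channels.
    """
    area = kernel_size * kernel_size
    full = channels * channels * area
    grouped = channels * target_number * area
    return full, grouped
```

For example, with 64 channels, a target number of 4, and 3x3 kernels, `conv_weight_counts(64, 4, 3)` gives 36864 weights for the full layer against 2304 for the grouped one under these assumptions.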

6. An apparatus for processing an image, comprising:

a first determination unit configured to determine a target convolutional layer among the convolutional layers of a convolutional neural network;

a second determination unit configured to determine feature images of a plurality of channels input to the target convolutional layer;

a third determination unit configured to determine a target number corresponding to the target convolutional layer;

a convolution calculation unit configured to perform convolution calculation on the feature images of the target number of channels and the target number of convolution kernels corresponding to the target number of channels in the target convolutional layer;

a result output unit configured to determine an output result of the target convolutional layer according to a result obtained by each convolution calculation.

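7. The apparatus of claim 6, wherein the convolution calculation unit is further configured to:

perform convolution calculation on the feature images of the target number of adjacent channels and the target number of convolution kernels corresponding to the target number of adjacent channels in the target convolutional layer.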

8. The apparatus of claim 6, wherein the apparatus further comprises:

a fourth determination unit configured to take the output result of the target convolutional layer as an input feature image of the layer following the target convolutional layer in the convolutional neural network.

9. The apparatus of claim 8, wherein the apparatus further comprises:

an object recognition unit configured to recognize an object included in an image input to the convolutional neural network according to an output result of the last convolutional layer of the convolutional neural network.

10. The apparatus of any one of claims 6-9, wherein the target number is 2 to the power of N, N being a natural number greater than or equal to 1.

11. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-5.