CN113239898A - Method for processing image, roadside equipment and cloud control platform - Google Patents

Method for processing image, roadside equipment and cloud control platform

Info

Publication number
CN113239898A
CN113239898A (application CN202110670945.4A)
Authority
CN
China
Prior art keywords
result
image
processing
output result
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110670945.4A
Other languages
Chinese (zh)
Inventor
夏春龙 (Xia Chunlong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202110670945.4A
Publication of CN113239898A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08: Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method for processing images, roadside equipment and a cloud control platform, and relates to the field of computer technology, in particular to intelligent transportation and computer vision. The specific implementation scheme is as follows: acquiring an image to be processed; inputting the image to be processed into a preset non-downsampling layer to generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel; and generating an image processing result of the image to be processed based on the second output result.

Description

Method for processing image, roadside equipment and cloud control platform
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to intelligent transportation and computer vision technologies, and more particularly to a method for processing an image, a roadside device and a cloud control platform.
Background
In the prior art, conventional visual neural network models such as ResNet, GoogLeNet, Res2Net and SENet are typically deployed on large servers because of their excellent performance. At the same time, however, their large parameter counts impose high requirements on run-time GPU memory, so these models cannot be applied directly to small terminal devices.
At present, mainstream conventional models usually adopt a spindle-shaped or dumbbell-shaped module structure to change the number of channels of the feature map.
Disclosure of Invention
A method and an apparatus for processing an image, an electronic device, a storage medium, a roadside device and a cloud control platform are provided.
According to a first aspect, there is provided a method for processing an image, the method comprising: acquiring an image to be processed; inputting the image to be processed into a preset non-downsampling layer to generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel; and generating an image processing result of the image to be processed based on the second output result.
According to a second aspect, there is provided an apparatus for processing an image, the apparatus comprising: an acquisition unit configured to acquire an image to be processed; a first generation unit configured to input the image to be processed into a preset non-downsampling layer and generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; a second generation unit configured to generate a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the elements of the square matrices are used for indicating a pre-trained convolution kernel; and a third generation unit configured to generate an image processing result of the image to be processed based on the second output result.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for enabling a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the method as described in any one of the implementations of the first aspect.
According to a sixth aspect, there is provided a roadside apparatus including the electronic apparatus as described in the third aspect.
According to a seventh aspect, there is provided a cloud control platform comprising the electronic device as described in the third aspect.
According to the disclosed technology, the number of input channels and the number of output channels of the features are kept consistent in the preset non-downsampling layer, while dimension raising or dimension reduction in the channel dimension is realized through a plurality of square matrices of a preset dimension in the corresponding downsampling layer. The weight matrix can thus be kept at full rank as far as possible, which reduces information loss and improves the feature extraction effect of the model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of one application scenario in which a method for processing an image of an embodiment of the present disclosure may be implemented;
FIG. 4 is a schematic diagram of an apparatus for processing an image according to an embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device for implementing a method for processing an image according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram 100 of a first embodiment according to the present disclosure. The method for processing an image includes the following steps:
S101, acquiring an image to be processed.
In the present embodiment, the executing body of the method for processing an image may acquire the image to be processed in various ways. As an example, the executing body may obtain the image to be processed from a local store or from a communicatively connected electronic device (e.g., a database server) through a wired or wireless connection. The image to be processed can be set flexibly according to the actual application scenario. As an example, the image to be processed may be an original image serving as the input of the input layer. As another example, the image to be processed may be an image output by a hidden layer.
S102, inputting the image to be processed into a preset non-downsampling layer to generate a first output result.
In this embodiment, the executing body may input the image to be processed into the preset non-downsampling layer in various ways to obtain the first output result, in which the number of output channels is consistent with the number of input channels. As an example, "consistent" may mean that the two numbers are equal. As another example, "consistent" may also mean that the difference between the two numbers is not greater than a preset threshold (e.g., not greater than 2).
It should be noted that studies have shown that when the difference between the number of input channels and the number of output channels is large, the feature expression capability of the model often deteriorates. By keeping the number of input channels consistent with the number of output channels, the present scheme allows the weight matrix of the non-downsampling layer to be full rank as far as possible, thereby reducing the loss of information and improving the feature expression capability, as illustrated by the sketch below.
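For illustration only, the following is a minimal PyTorch-style sketch of such a non-downsampling layer; the channel count C, the 3 × 3 kernel and the use of a single convolution layer are hypothetical choices for the example, not the reference implementation of the present disclosure.

import torch
import torch.nn as nn

C = 4  # hypothetical channel count; output channels equal input channels

# A non-downsampling layer: spatial size and channel count are both preserved.
non_downsampling = nn.Conv2d(in_channels=C, out_channels=C,
                             kernel_size=3, stride=1, padding=1, bias=False)

x = torch.randn(1, C, 32, 32)         # image (or feature map) to be processed
first_output = non_downsampling(x)    # first output result
assert first_output.shape == x.shape  # output channel count == input channel count

Viewed per spatial position, the layer's weight is a C × C (square) grid of kernel units; a square matrix can be full rank, which is the property relied on here to reduce information loss.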
S103, generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer.
In the present embodiment, the executing body may generate, in various ways, the second output result corresponding to the first output result generated in step S102, based on that first output result and the downsampling layer corresponding to the preset non-downsampling layer of step S102. The downsampling layer may include a target number of square matrices of a preset dimension. The elements of such a square matrix may be pre-trained convolution kernel units (kernels) of scale K × K, where K characterizes the height and width of a convolution kernel unit.
In this embodiment, the downsampling layer corresponding to the preset non-downsampling layer may be a downsampling layer connected to the preset non-downsampling layer.
In the present embodiment, as an example, the dimension of the filter included in the downsampling layer may be (O, C, K, K). The filter may be transformed into square matrices of the preset dimension, whose rows and columns correspond to the number of input channels (i.e., C) and the number of output channels (i.e., O) of the convolution kernel of the downsampling layer, respectively. Each element of a square matrix may be a convolution kernel unit of the downsampling layer with scale (K × K), where K characterizes the height and width of the convolution kernel unit (a sketch of this square-matrix view is given after the notes below).
It should be noted that when the preset non-downsampling layer and the corresponding downsampling layer belong to a depthwise separable convolution, the square matrix of the preset dimension may be a diagonal matrix. In that case the dimension of the filter included in the downsampling layer may be (C, K, K), and the C kernel units may occupy the diagonal positions of the square matrix.
It is further noted that a neural network for image processing may generally include one or more non-downsampling layers, one or more downsampling layers, and combinations thereof. The layers may therefore be connected in various forms, such as a non-downsampling layer followed by a non-downsampling layer, a non-downsampling layer followed by a downsampling layer, a downsampling layer followed by a non-downsampling layer, or a downsampling layer followed by a downsampling layer, which is not limited herein.
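As a sketch of the square-matrix view described above (assuming the common (O, C, K, K) filter layout used by frameworks such as PyTorch; the concrete values of O, C and K are hypothetical):

import torch

O, C, K = 8, 4, 3                  # output channels, input channels, kernel height/width
filters = torch.randn(O, C, K, K)  # filter of the downsampling layer

# Split the O x C grid of K x K kernel units into O // C square matrices of
# dimension C x C (here, the target number is 8 // 4 = 2).
target_number = O // C
square_matrices = filters.reshape(target_number, C, C, K, K)

# Depthwise separable case: the filter has dimension (C, K, K); placing its C
# kernel units on the diagonal yields a diagonal square matrix.
depthwise_filters = torch.randn(C, K, K)
diagonal_matrix = torch.zeros(C, C, K, K)
diagonal_matrix[torch.arange(C), torch.arange(C)] = depthwise_filters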
In the present embodiment, as an example, the executing body may input the first output result generated in step S102 into the downsampling layer corresponding to the preset non-downsampling layer of step S102, and take the output of the downsampling layer as the second output result corresponding to the first output result.
S104, generating an image processing result of the image to be processed based on the second output result.
In the present embodiment, based on the second output result generated in step S103, the executing body may generate the image processing result of the image to be processed in various ways. As an example, the executing body may further process the feature map generated as the second output result in step S103 according to the image processing task at hand (e.g., a classification task or a detection task). For instance, the executing body may input the second output result into a preset fully connected layer and perform normalization, thereby generating the image processing result of the image to be processed (for example, the category to which the content presented in the image belongs), as sketched below.
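As an illustrative sketch of such a classification head (the pooling step, the number of categories and softmax as the normalization are assumptions of the example, not mandated by the disclosure):

import torch
import torch.nn as nn

num_classes = 10                           # hypothetical number of categories
second_output = torch.randn(1, 8, 16, 16)  # second output result (a feature map)

# Preset fully connected layer preceded by pooling and flattening,
# followed by normalization into per-category probabilities.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, num_classes),
)
logits = head(second_output)
probabilities = torch.softmax(logits, dim=1)  # normalization processing
category = probabilities.argmax(dim=1)        # e.g., the class "car" in fig. 3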
According to the method provided by this embodiment of the disclosure, the number of input channels and the number of output channels of the features are kept consistent in the preset non-downsampling layer, while dimension raising or dimension reduction of the channel count is realized through a plurality of square matrices of a preset dimension in the corresponding downsampling layer. The weight matrix can thus be kept at full rank as far as possible, which reduces information loss and improves the feature extraction effect of the model.
In some optional implementations of this embodiment, the executing body may input the image to be processed into the preset non-downsampling layer according to the following steps to obtain the first output result:
First, processing the image to be processed based on a pre-trained depthwise separable convolution module to generate a convolution result.
In these implementations, the executing body may process the image to be processed based on the pre-trained depthwise separable convolution module and generate the convolution result in various ways.
It should be noted that a depthwise separable convolution module is generally used to denote a building block of a depthwise separable convolutional network; a plurality of such modules may form a depthwise separable convolutional network.
Second, processing the convolution result with a preset first convolution kernel to generate a first output result whose size is consistent with that of the convolution result.
In these implementations, as an example, the preset first convolution kernel may be a 1 × 1 convolution kernel, so that the executing body can generate a first output result of the same size as the convolution result.
Based on this optional implementation, channel fusion can be performed on the processing result of the depthwise separable convolution module by using the preset first convolution kernel (as sketched below), thereby generating the output result.
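A minimal sketch of this two-step variant (a depthwise 3 × 3 convolution standing in for the pre-trained depthwise separable convolution module, which is an assumption of the example):

import torch
import torch.nn as nn

C = 4  # hypothetical channel count

# Depthwise part of a depthwise separable convolution module: each channel is
# convolved with its own 3 x 3 kernel (groups=C).
depthwise = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False)

# Preset first convolution kernel: a 1 x 1 convolution that fuses channels while
# keeping spatial size and channel count unchanged.
pointwise_fusion = nn.Conv2d(C, C, kernel_size=1, bias=False)

x = torch.randn(1, C, 32, 32)
convolution_result = depthwise(x)
first_output = pointwise_fusion(convolution_result)
assert first_output.shape == convolution_result.shape  # sizes are consistent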
Optionally, based on the optional implementation described in the first step above, the executing body may process the image to be processed based on the pre-trained depthwise separable convolution module and generate the convolution result according to the following steps:
S1, convolving the image to be processed with a preset second convolution kernel to generate an initial convolution result whose size is consistent with that of the image to be processed.
In these implementations, the executing body may convolve the image to be processed with the preset second convolution kernel to generate an initial convolution result consistent in size with the image to be processed. Here, as an example, the preset second convolution kernel may be a 1 × 1 convolution kernel, so that the executing body can generate an initial convolution result of the same size as the image to be processed.
The weights of the preset first convolution kernel may be the same as or different from those of the preset second convolution kernel; they may be preset values or values obtained by training in advance, which is not limited herein.
S2, post-processing the initial convolution result to generate a post-processing result.
In these implementations, the executing body may post-process the initial convolution result generated in step S1 to generate a post-processing result. The post-processing may include at least one of: batch normalization and activation function processing. As an example, the activation function may be ReLU6.
S3, inputting the post-processing result into the pre-trained depthwise separable convolution module to generate the convolution result.
In these implementations, the executing body may input the post-processing result generated in step S2 into the pre-trained depthwise separable convolution module to generate the convolution result (see the sketch after this list).
Based on this optional implementation, channel feature fusion can be performed with the preset second convolution kernel, and the post-processed result can be input into the pre-trained depthwise separable convolution module to generate the convolution result, which can improve the expression capability of the extracted features.
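The whole S1-S3 sub-pipeline might look as follows as a sketch (the layer sizes, ReLU6 as the activation and a depthwise convolution as the module are assumptions of the example):

import torch
import torch.nn as nn

C = 4
x = torch.randn(1, C, 32, 32)  # image to be processed

second_conv = nn.Conv2d(C, C, kernel_size=1, bias=False)     # S1: preset second convolution kernel (1 x 1)
post_process = nn.Sequential(nn.BatchNorm2d(C), nn.ReLU6())  # S2: batch normalization + activation
depthwise_module = nn.Conv2d(C, C, kernel_size=3, padding=1,
                             groups=C, bias=False)           # S3: depthwise separable convolution module

initial_result = second_conv(x)             # same size as the image to be processed
post_result = post_process(initial_result)  # post-processing result
convolution_result = depthwise_module(post_result)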
In some optional implementations of this embodiment, the target number may be determined according to the number of output channels of the second output result. As an example, suppose the number of output channels is 8 and the number of input channels is 4. The target number may then be 8 / 4 = 2, so the executing body may adopt two 4 × 4 square matrices instead of one 4 × 8 channel-number transformation matrix, as sketched below.
Based on this optional implementation, determining the target number allows the direct transformation of the channel count to be replaced by a plurality of square matrices of the preset dimension. By reducing the difference between the number of input channels and the number of output channels as far as possible, the weight matrix is kept close to full rank, information loss is reduced, and the feature extraction effect of the model is improved.
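A sketch of this channel-raising scheme under the 8/4 example above (the 3 × 3 kernels, stride-2 downsampling and the parallel-branch arrangement are assumptions of the example):

import torch
import torch.nn as nn

C_in, C_out = 4, 8
target_number = C_out // C_in  # 8 / 4 = 2 square matrices

# Two C_in -> C_in (square) convolutions replace one C_in -> C_out rectangular
# channel transformation; each square weight block can individually be full rank.
branches = nn.ModuleList(
    nn.Conv2d(C_in, C_in, kernel_size=3, stride=2, padding=1, bias=False)
    for _ in range(target_number)
)

x = torch.randn(1, C_in, 16, 16)
second_output = torch.cat([branch(x) for branch in branches], dim=1)
assert second_output.shape[1] == C_out  # channel dimension raised from 4 to 8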
With continued reference to fig. 2, fig. 2 is a schematic diagram 200 according to a second embodiment of the present disclosure. The method for processing an image includes the following steps:
S201, acquiring an image to be processed.
S202, inputting the image to be processed into a preset non-downsampling layer to generate a first output result.
S203, processing the first output result through a preset number of preset non-downsampling layers to generate a processing result.
In this embodiment, the executing body of the method for processing an image may pass the first output result generated in step S202 through a preset number of preset non-downsampling layers to generate a processing result.
In this embodiment, the executing body may process the first output result through a preset number of serially connected preset non-downsampling layers (i.e., the output of each non-downsampling layer serves as the input of the next), thereby generating the processing result of the preset number of preset non-downsampling layers.
S204, inputting the processing result into the downsampling layer corresponding to the preset non-downsampling layer to generate a second output result corresponding to the first output result.
In this embodiment, the executing body may input the processing result generated in step S203 into the downsampling layer corresponding to the preset non-downsampling layer and generate the second output result corresponding to the first output result.
S205, generating an image processing result of the image to be processed based on the second output result.
Steps S201, S202 and S205 may be respectively consistent with steps S101, S102 and S104 in the foregoing embodiment and their optional implementations; the above descriptions of steps S101, S102 and S104 also apply to steps S201, S202 and S205 and are not repeated here.
As can be seen from fig. 2, the flow 200 of the method for processing an image in this embodiment highlights the step of processing the first output result through a plurality of preset non-downsampling layers (equivalent to connecting a plurality of non-downsampling layers in series) to generate a processing result, and inputting that processing result into the corresponding downsampling layer to obtain the second output result corresponding to the first output result (see the sketch after this paragraph). The scheme described in this embodiment therefore provides an image processing method suitable for a series connection of multiple non-downsampling layers, thereby improving the image processing effect.
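A sketch of this second embodiment (the preset number, channel counts and stride-2 downsampling layer are assumptions of the example):

import torch
import torch.nn as nn

C = 4
preset_number = 3  # hypothetical number of serially connected non-downsampling layers

non_downsampling_layers = [
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)
    for _ in range(preset_number)
]
downsampling = nn.Conv2d(C, 2 * C, kernel_size=3, stride=2, padding=1, bias=False)

x = torch.randn(1, C, 32, 32)          # image to be processed (S201)
result = x
for layer in non_downsampling_layers:  # S202-S203: each output feeds the next layer
    result = layer(result)
second_output = downsampling(result)   # S204: corresponding downsampling layer
assert second_output.shape == (1, 2 * C, 16, 16)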
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of a method for processing an image according to an embodiment of the present disclosure. In the application scenario of fig. 3, an electronic device (e.g., a smart terminal) may first acquire an n-channel image to be processed (shown as 301). The electronic device may then input the image to be processed 301 into a preset non-downsampling layer 302 to generate an n-channel first output result (shown as 303). Based on the first output result 303 and a downsampling layer corresponding to the preset non-downsampling layer 302 (shown as 304), the electronic device may generate a second output result corresponding to the first output result 303 (shown as 305). Based on the second output result, the electronic device may generate an image processing result of the image to be processed 301 (e.g., the classification result "car").
At present, one variety of prior art usually adopts a spindle-shaped or dumbbell-shaped module structure to change the number of channels of the feature map. However, because such a model contains a large number of spindle-shaped or dumbbell-shaped module structures, the difference between the numbers of input and output channels degrades the expression capability of the features, which is unfavorable for the model's feature extraction. In the method provided by this embodiment of the disclosure, the number of input channels and the number of output channels of the features are kept consistent in the preset non-downsampling layer, while dimension raising or dimension reduction in the channel dimension is realized through a plurality of square matrices of a preset dimension in the corresponding downsampling layer, so that the weight matrix can be kept at full rank as far as possible and the feature extraction effect of the model is improved by reducing information loss.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing an image, which corresponds to the method embodiment shown in fig. 1 or fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 4, the apparatus 400 for processing an image provided by the present embodiment includes an acquisition unit 401, a first generation unit 402, a second generation unit 403 and a third generation unit 404. The acquisition unit 401 is configured to acquire an image to be processed; the first generation unit 402 is configured to input the image to be processed into a preset non-downsampling layer and generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; the second generation unit 403 is configured to generate a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer includes a target number of square matrices of a preset dimension and the square matrices are used for indicating a pre-trained convolution kernel; the third generation unit 404 is configured to generate an image processing result of the image to be processed based on the second output result.
In the present embodiment, in the apparatus 400 for processing an image: for the specific processing of the acquisition unit 401, the first generation unit 402, the second generation unit 403 and the third generation unit 404 and the technical effects thereof, reference may be made to the descriptions of steps S101, S102, S103 and S104 in the embodiment corresponding to fig. 1, respectively, which are not repeated here.
In some optional implementations of the present embodiment, the first generation unit 402 may include: a first generation module (not shown in the figure) configured to process the image to be processed based on a pre-trained depthwise separable convolution module and generate a convolution result; and a second generation module (not shown in the figure) configured to process the convolution result with a preset first convolution kernel and generate a first output result of the same size as the convolution result.
In some optional implementations of this embodiment, the first generation module may be further configured to: convolve the image to be processed with a preset second convolution kernel to generate an initial convolution result whose size is consistent with that of the image to be processed; post-process the initial convolution result to generate a post-processing result, wherein the post-processing includes at least one of: batch normalization and activation function processing; and input the post-processing result into the pre-trained depthwise separable convolution module to generate the convolution result.
In some optional implementations of this embodiment, the second generation unit 403 may be further configured to: process the first output result through a preset number of preset non-downsampling layers to generate a processing result; and input the processing result into the downsampling layer corresponding to the preset non-downsampling layer to obtain a second output result corresponding to the first output result.
In some optional implementations of this embodiment, the target number may be determined according to the number of output channels of the second output result.
In the apparatus provided by the above embodiment of the present disclosure, the first generation unit 402 keeps the number of input channels of the features consistent with the number of output channels in the preset non-downsampling layer, and the second generation unit 403 realizes dimension raising or dimension reduction in the channel dimension through a plurality of square matrices of a preset dimension in the corresponding downsampling layer, so that the weight matrix can be kept at full rank as far as possible, thereby reducing information loss and improving the feature extraction effect of the model.
In the technical solution of the present disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502 and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller or microcontroller. The computing unit 501 performs the respective methods and processes described above, such as the method for processing an image. For example, in some embodiments, the method for processing an image may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method for processing an image described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method for processing an image by any other suitable means (e.g., by means of firmware).
Optionally, the roadside device may include a communication component and the like in addition to the electronic device, and the electronic device may be integrated with the communication component or may be separately provided. The electronic device may acquire data, such as pictures and videos, from a sensing device (e.g., a roadside camera) for image video processing and data computation. Optionally, the electronic device itself may also have a sensing data acquisition function and a communication function, for example, an AI camera, and the electronic device may directly perform image video processing and data calculation based on the acquired sensing data.
Optionally, the cloud control platform performs processing at the cloud end, and the electronic device included in the cloud control platform may acquire data, such as pictures and videos, of the sensing device (such as a roadside camera), so as to perform image video processing and data calculation; the cloud control platform can also be called a vehicle-road cooperative management platform, an edge computing platform, a cloud computing platform, a central system, a cloud server and the like.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method for processing an image, comprising:
acquiring an image to be processed;
inputting the image to be processed to a preset non-downsampling layer to generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels;
generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices with a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel;
and generating an image processing result of the image to be processed based on the second output result.
2. The method of claim 1, wherein the inputting the image to be processed to a preset non-downsampling layer to obtain a first output result comprises:
processing the image to be processed based on a pre-trained depthwise separable convolution module to generate a convolution result;
and processing the convolution result with a preset first convolution kernel to generate a first output result with the size consistent with that of the convolution result.
3. The method of claim 2, wherein the processing the image to be processed based on the pre-trained depthwise separable convolution module to generate a convolution result comprises:
performing convolution on the image to be processed with a preset second convolution kernel to generate an initial convolution result with the size consistent with that of the image to be processed;
post-processing the initial convolution result to generate a post-processing result, wherein the post-processing comprises at least one of: batch normalization and activation function processing;
and inputting the post-processing result into the pre-trained depthwise separable convolution module to generate a convolution result.
4. The method of claim 1, wherein the generating a second output result corresponding to the first output result based on the first output result and the downsampling layer corresponding to the preset non-downsampling layer comprises:
processing the first output result by a preset number of preset non-downsampling layers to generate a processing result;
and inputting the processing result to the downsampling layer corresponding to the preset non-downsampling layer to obtain a second output result corresponding to the first output result.
5. The method according to one of claims 1 to 4, wherein the target number is determined according to the number of output channels of the second output result.
6. An apparatus for processing an image, comprising:
an acquisition unit configured to acquire an image to be processed;
a first generation unit configured to input the image to be processed into a preset non-downsampling layer and generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels;
a second generation unit configured to generate a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel;
a third generating unit configured to generate an image processing result of the image to be processed based on the second output result.
7. The apparatus of claim 6, wherein the first generating unit comprises:
a first generation module configured to process the image to be processed based on a pre-trained depthwise separable convolution module to generate a convolution result;
and a second generation module configured to process the convolution result with a preset first convolution kernel to generate a first output result with the size consistent with that of the convolution result.
8. The apparatus of claim 7, wherein the first generation module is further configured to:
performing convolution on the image to be processed with a preset second convolution kernel to generate an initial convolution result with the size consistent with that of the image to be processed;
post-processing the initial convolution result to generate a post-processing result, wherein the post-processing comprises at least one of: batch normalization and activation function processing;
and inputting the post-processing result into the pre-trained depthwise separable convolution module to generate a convolution result.
9. The apparatus of claim 6, wherein the second generating unit is further configured to:
processing the first output result by a preset number of preset non-downsampling layers to generate a processing result;
and inputting the processing result to the downsampling layer corresponding to the preset non-downsampling layer to obtain a second output result corresponding to the first output result.
10. The apparatus according to one of claims 6-9, wherein the target number is determined according to the number of output channels of the second output result.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
14. A roadside apparatus comprising the electronic apparatus of claim 11.
15. A cloud control platform comprising the electronic device of claim 11.
CN202110670945.4A 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform Pending CN113239898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110670945.4A CN113239898A (en) 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110670945.4A CN113239898A (en) 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform

Publications (1)

Publication Number Publication Date
CN113239898A 2021-08-10

Family

ID=77140154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110670945.4A Pending CN113239898A (en) 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform

Country Status (1)

Country Link
CN (1) CN113239898A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109996023A (en) * 2017-12-29 2019-07-09 华为技术有限公司 Image processing method and device
US20190220653A1 (en) * 2018-01-12 2019-07-18 Qualcomm Incorporated Compact models for object recognition
CN110059710A (en) * 2018-01-18 2019-07-26 Aptiv技术有限公司 Device and method for carrying out image classification using convolutional neural networks
US20190266387A1 (en) * 2017-01-03 2019-08-29 Boe Technology Group Co., Ltd. Method, device, and computer readable storage medium for detecting feature points in an image
CN111008924A (en) * 2019-12-02 2020-04-14 西安交通大学深圳研究院 Image processing method and device, electronic equipment and storage medium
CN111898733A (en) * 2020-07-02 2020-11-06 西安交通大学 Deep separable convolutional neural network accelerator architecture
WO2020252740A1 (en) * 2019-06-20 2020-12-24 深圳市汇顶科技股份有限公司 Convolutional neural network, face anti-spoofing method, processor chip, and electronic device
AU2020104006A4 (en) * 2020-12-10 2021-02-18 Naval Aviation University Radar target recognition method based on feature pyramid lightweight convolutional neural network
CN112488060A (en) * 2020-12-18 2021-03-12 北京百度网讯科技有限公司 Object detection method, device, apparatus, medium, and program product

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190266387A1 (en) * 2017-01-03 2019-08-29 Boe Technology Group Co., Ltd. Method, device, and computer readable storage medium for detecting feature points in an image
CN109996023A (en) * 2017-12-29 2019-07-09 华为技术有限公司 Image processing method and device
US20190220653A1 (en) * 2018-01-12 2019-07-18 Qualcomm Incorporated Compact models for object recognition
CN110059710A (en) * 2018-01-18 2019-07-26 Aptiv技术有限公司 Device and method for carrying out image classification using convolutional neural networks
WO2020252740A1 (en) * 2019-06-20 2020-12-24 深圳市汇顶科技股份有限公司 Convolutional neural network, face anti-spoofing method, processor chip, and electronic device
CN111008924A (en) * 2019-12-02 2020-04-14 西安交通大学深圳研究院 Image processing method and device, electronic equipment and storage medium
CN111898733A (en) * 2020-07-02 2020-11-06 西安交通大学 Deep separable convolutional neural network accelerator architecture
AU2020104006A4 (en) * 2020-12-10 2021-02-18 Naval Aviation University Radar target recognition method based on feature pyramid lightweight convolutional neural network
CN112488060A (en) * 2020-12-18 2021-03-12 北京百度网讯科技有限公司 Object detection method, device, apparatus, medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DANIEL HAASE et al.: "Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets", arXiv:2003.13549v3 [cs.CV], pages 1-10
张雨丰; 郑忠龙; 刘华文; 向道红; 何小卫; 李知菲; 何依然; KHODJA ABD ERRAOUF: "基于特征图切分的轻量级卷积神经网络" [Lightweight convolutional neural network based on feature map partitioning], Pattern Recognition and Artificial Intelligence (模式识别与人工智能), no. 03, pages 47-56

Similar Documents

Publication Publication Date Title
CN113379627A (en) Training method of image enhancement model and method for enhancing image
CN113963110B (en) Texture map generation method and device, electronic equipment and storage medium
CN113808044B (en) Encryption mask determining method, device, equipment and storage medium
CN113592932A (en) Training method and device for deep completion network, electronic equipment and storage medium
CN114693934A (en) Training method of semantic segmentation model, video semantic segmentation method and device
CN114494814A (en) Attention-based model training method and device and electronic equipment
CN115690443A (en) Feature extraction model training method, image classification method and related device
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113920313A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114092708A (en) Characteristic image processing method and device and storage medium
CN113361536A (en) Image semantic segmentation model training method, image semantic segmentation method and related device
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN112785501A (en) Method, device and equipment for processing character image and storage medium
CN115496916B (en) Training method of image recognition model, image recognition method and related device
CN113642654B (en) Image feature fusion method and device, electronic equipment and storage medium
CN114494782B (en) Image processing method, model training method, related device and electronic equipment
CN114943995A (en) Training method of face recognition model, face recognition method and device
CN112784967B (en) Information processing method and device and electronic equipment
CN113239898A (en) Method for processing image, road side equipment and cloud control platform
CN114723796A (en) Three-dimensional point cloud generation method and device and electronic equipment
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN114358198A (en) Instance segmentation method and device and electronic equipment
CN113361621A (en) Method and apparatus for training a model
CN114282664A (en) Self-feedback model training method and device, road side equipment and cloud control platform
CN114359905B (en) Text recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210810