CN113239898A - Method for processing image, roadside equipment and cloud control platform - Google Patents

Method for processing image, roadside equipment and cloud control platform

Info

Publication number
CN113239898A
CN113239898A (application CN202110670945.4A)
Authority
CN
China
Prior art keywords
result
image
processing
output result
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110670945.4A
Other languages
Chinese (zh)
Inventor
夏春龙 (Xia Chunlong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202110670945.4A
Publication of CN113239898A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08: Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method for processing images, roadside equipment and a cloud control platform, and relates to the field of computer technology, in particular to intelligent transportation and computer vision. The specific implementation scheme is as follows: acquiring an image to be processed; inputting the image to be processed into a preset non-downsampling layer to generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel; and generating an image processing result of the image to be processed based on the second output result.

Description

Method for processing image, roadside equipment and cloud control platform
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to intelligent transportation and computer vision technologies, and more particularly to a method for processing an image, a roadside device and a cloud control platform.
Background
In the prior art, conventional visual neural network models such as ResNet, GoogLeNet, Res2Net and SENet are typically deployed on large servers because of their excellent performance. At the same time, however, their large parameter counts impose high requirements on run-time GPU memory, so these models cannot be applied directly to small terminal devices.
At present, mainstream conventional models usually adopt a spindle-shaped or dumbbell-shaped module structure to change the number of channels of the feature map.
Disclosure of Invention
A method and an apparatus for processing an image, an electronic device, a storage medium, a roadside device and a cloud control platform are provided.
According to a first aspect, there is provided a method for processing an image, the method comprising: acquiring an image to be processed; inputting the image to be processed into a preset non-downsampling layer to generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel; and generating an image processing result of the image to be processed based on the second output result.
According to a second aspect, there is provided an apparatus for processing an image, the apparatus comprising: an acquisition unit configured to acquire an image to be processed; a first generation unit configured to input the image to be processed into a preset non-downsampling layer and generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; a second generation unit configured to generate a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the elements of the square matrices are used for indicating a pre-trained convolution kernel; and a third generation unit configured to generate an image processing result of the image to be processed based on the second output result.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for enabling a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the method as described in any one of the implementations of the first aspect.
According to a sixth aspect, there is provided a roadside apparatus including the electronic apparatus as described in the third aspect.
According to a seventh aspect, there is provided a cloud control platform comprising the electronic device as described in the third aspect.
According to the disclosed technology, the number of input channels and the number of output channels of the features are kept consistent in the preset non-downsampling layer, while dimension raising or dimension reduction in the channel dimension is realized through a plurality of square matrices of a preset dimension in the corresponding downsampling layer. The weight matrix can thus be kept at full rank as far as possible, which reduces information loss and improves the feature extraction effect of the model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of one application scenario in which a method for processing an image of an embodiment of the present disclosure may be implemented;
FIG. 4 is a schematic diagram of an apparatus for processing an image according to an embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device for implementing a method for processing an image according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram 100 of a first embodiment according to the present disclosure. The method for processing an image includes the following steps:
S101, acquiring an image to be processed.
In the present embodiment, the executing body of the method for processing an image may acquire the image to be processed in various ways. As an example, the executing body may obtain the image to be processed from a local store or from a communicatively connected electronic device (e.g., a database server) through a wired or wireless connection. The image to be processed can be set flexibly according to the actual application scenario. As an example, the image to be processed may be an original image serving as the input of the input layer. As another example, the image to be processed may be an image output by a hidden layer.
S102, inputting the image to be processed into a preset non-downsampling layer to generate a first output result.
In this embodiment, the executing body may input the image to be processed into the preset non-downsampling layer in various ways to obtain the first output result, in which the number of output channels is consistent with the number of input channels. As an example, "consistent" may mean that the two numbers are equal. As another example, "consistent" may also mean that the difference between the two numbers is not greater than a preset threshold (e.g., not greater than 2).
It should be noted that studies have shown that when the difference between the number of input channels and the number of output channels is large, the feature expression capability of the model often deteriorates. By keeping the number of input channels consistent with the number of output channels, the present scheme allows the weight matrix of the non-downsampling layer to be full rank as far as possible, thereby reducing the loss of information and improving the feature expression capability, as illustrated by the sketch below.
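For illustration only, the following is a minimal PyTorch-style sketch of such a non-downsampling layer; the channel count C, the 3 × 3 kernel and the use of a single convolution layer are hypothetical choices for the example, not the reference implementation of the present disclosure.

import torch
import torch.nn as nn

C = 4  # hypothetical channel count; output channels equal input channels

# A non-downsampling layer: spatial size and channel count are both preserved.
non_downsampling = nn.Conv2d(in_channels=C, out_channels=C,
                             kernel_size=3, stride=1, padding=1, bias=False)

x = torch.randn(1, C, 32, 32)         # image (or feature map) to be processed
first_output = non_downsampling(x)    # first output result
assert first_output.shape == x.shape  # output channel count == input channel count

Viewed per spatial position, the layer's weight is a C × C (square) grid of kernel units; a square matrix can be full rank, which is the property relied on here to reduce information loss.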
S103, generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer.
In the present embodiment, the executing body may generate, in various ways, the second output result corresponding to the first output result generated in step S102, based on that first output result and the downsampling layer corresponding to the preset non-downsampling layer of step S102. The downsampling layer may include a target number of square matrices of a preset dimension. The elements of such a square matrix may be pre-trained convolution kernel units (kernels) of scale K × K, where K characterizes the height and width of a convolution kernel unit.
In this embodiment, the downsampling layer corresponding to the preset non-downsampling layer may be a downsampling layer connected to the preset non-downsampling layer.
In the present embodiment, as an example, the dimension of the filter included in the downsampling layer may be (O, C, K, K). The filter may be transformed into square matrices of the preset dimension, whose rows and columns correspond to the number of input channels (i.e., C) and the number of output channels (i.e., O) of the convolution kernel of the downsampling layer, respectively. Each element of a square matrix may be a convolution kernel unit of the downsampling layer with scale (K × K), where K characterizes the height and width of the convolution kernel unit (a sketch of this square-matrix view is given after the notes below).
It should be noted that when the preset non-downsampling layer and the corresponding downsampling layer belong to a depthwise separable convolution, the square matrix of the preset dimension may be a diagonal matrix. In that case the dimension of the filter included in the downsampling layer may be (C, K, K), and the C kernel units may occupy the diagonal positions of the square matrix.
It is further noted that a neural network for image processing may generally include one or more non-downsampling layers, one or more downsampling layers, and combinations thereof. The layers may therefore be connected in various forms, such as a non-downsampling layer followed by a non-downsampling layer, a non-downsampling layer followed by a downsampling layer, a downsampling layer followed by a non-downsampling layer, or a downsampling layer followed by a downsampling layer, which is not limited herein.
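As a sketch of the square-matrix view described above (assuming the common (O, C, K, K) filter layout used by frameworks such as PyTorch; the concrete values of O, C and K are hypothetical):

import torch

O, C, K = 8, 4, 3                  # output channels, input channels, kernel height/width
filters = torch.randn(O, C, K, K)  # filter of the downsampling layer

# Split the O x C grid of K x K kernel units into O // C square matrices of
# dimension C x C (here, the target number is 8 // 4 = 2).
target_number = O // C
square_matrices = filters.reshape(target_number, C, C, K, K)

# Depthwise separable case: the filter has dimension (C, K, K); placing its C
# kernel units on the diagonal yields a diagonal square matrix.
depthwise_filters = torch.randn(C, K, K)
diagonal_matrix = torch.zeros(C, C, K, K)
diagonal_matrix[torch.arange(C), torch.arange(C)] = depthwise_filters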
In the present embodiment, as an example, the executing body may input the first output result generated in step S102 into the downsampling layer corresponding to the preset non-downsampling layer of step S102, and take the output of the downsampling layer as the second output result corresponding to the first output result.
S104, generating an image processing result of the image to be processed based on the second output result.
In the present embodiment, based on the second output result generated in step S103, the executing body may generate the image processing result of the image to be processed in various ways. As an example, the executing body may further process the feature map generated as the second output result in step S103 according to the image processing task at hand (e.g., a classification task or a detection task). For instance, the executing body may input the second output result into a preset fully connected layer and perform normalization, thereby generating the image processing result of the image to be processed (for example, the category to which the content presented in the image belongs), as sketched below.
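As an illustrative sketch of such a classification head (the pooling step, the number of categories and softmax as the normalization are assumptions of the example, not mandated by the disclosure):

import torch
import torch.nn as nn

num_classes = 10                           # hypothetical number of categories
second_output = torch.randn(1, 8, 16, 16)  # second output result (a feature map)

# Preset fully connected layer preceded by pooling and flattening,
# followed by normalization into per-category probabilities.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, num_classes),
)
logits = head(second_output)
probabilities = torch.softmax(logits, dim=1)  # normalization processing
category = probabilities.argmax(dim=1)        # e.g., the class "car" in fig. 3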
According to the method provided by this embodiment of the disclosure, the number of input channels and the number of output channels of the features are kept consistent in the preset non-downsampling layer, while dimension raising or dimension reduction of the channel count is realized through a plurality of square matrices of a preset dimension in the corresponding downsampling layer. The weight matrix can thus be kept at full rank as far as possible, which reduces information loss and improves the feature extraction effect of the model.
In some optional implementations of this embodiment, the executing body may input the image to be processed into the preset non-downsampling layer according to the following steps to obtain the first output result:
First, processing the image to be processed based on a pre-trained depthwise separable convolution module to generate a convolution result.
In these implementations, the executing body may process the image to be processed based on the pre-trained depthwise separable convolution module and generate the convolution result in various ways.
It should be noted that a depthwise separable convolution module is generally used to denote a building block of a depthwise separable convolutional network; a plurality of such modules may form a depthwise separable convolutional network.
Second, processing the convolution result with a preset first convolution kernel to generate a first output result whose size is consistent with that of the convolution result.
In these implementations, as an example, the preset first convolution kernel may be a 1 × 1 convolution kernel, so that the executing body can generate a first output result of the same size as the convolution result.
Based on this optional implementation, channel fusion can be performed on the processing result of the depthwise separable convolution module by using the preset first convolution kernel (as sketched below), thereby generating the output result.
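A minimal sketch of this two-step variant (a depthwise 3 × 3 convolution standing in for the pre-trained depthwise separable convolution module, which is an assumption of the example):

import torch
import torch.nn as nn

C = 4  # hypothetical channel count

# Depthwise part of a depthwise separable convolution module: each channel is
# convolved with its own 3 x 3 kernel (groups=C).
depthwise = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False)

# Preset first convolution kernel: a 1 x 1 convolution that fuses channels while
# keeping spatial size and channel count unchanged.
pointwise_fusion = nn.Conv2d(C, C, kernel_size=1, bias=False)

x = torch.randn(1, C, 32, 32)
convolution_result = depthwise(x)
first_output = pointwise_fusion(convolution_result)
assert first_output.shape == convolution_result.shape  # sizes are consistent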
Optionally, based on the optional implementation described in the first step above, the executing body may process the image to be processed based on the pre-trained depthwise separable convolution module and generate the convolution result according to the following steps:
S1, convolving the image to be processed with a preset second convolution kernel to generate an initial convolution result whose size is consistent with that of the image to be processed.
In these implementations, the executing body may convolve the image to be processed with the preset second convolution kernel to generate an initial convolution result consistent in size with the image to be processed. Here, as an example, the preset second convolution kernel may be a 1 × 1 convolution kernel, so that the executing body can generate an initial convolution result of the same size as the image to be processed.
The weights of the preset first convolution kernel may be the same as or different from those of the preset second convolution kernel; they may be preset values or values obtained by training in advance, which is not limited herein.
S2, post-processing the initial convolution result to generate a post-processing result.
In these implementations, the executing body may post-process the initial convolution result generated in step S1 to generate a post-processing result. The post-processing may include at least one of: batch normalization and activation function processing. As an example, the activation function may be ReLU6.
S3, inputting the post-processing result into the pre-trained depthwise separable convolution module to generate the convolution result.
In these implementations, the executing body may input the post-processing result generated in step S2 into the pre-trained depthwise separable convolution module to generate the convolution result (see the sketch after this list).
Based on this optional implementation, channel feature fusion can be performed with the preset second convolution kernel, and the post-processed result can be input into the pre-trained depthwise separable convolution module to generate the convolution result, which can improve the expression capability of the extracted features.
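The whole S1-S3 sub-pipeline might look as follows as a sketch (the layer sizes, ReLU6 as the activation and a depthwise convolution as the module are assumptions of the example):

import torch
import torch.nn as nn

C = 4
x = torch.randn(1, C, 32, 32)  # image to be processed

second_conv = nn.Conv2d(C, C, kernel_size=1, bias=False)     # S1: preset second convolution kernel (1 x 1)
post_process = nn.Sequential(nn.BatchNorm2d(C), nn.ReLU6())  # S2: batch normalization + activation
depthwise_module = nn.Conv2d(C, C, kernel_size=3, padding=1,
                             groups=C, bias=False)           # S3: depthwise separable convolution module

initial_result = second_conv(x)             # same size as the image to be processed
post_result = post_process(initial_result)  # post-processing result
convolution_result = depthwise_module(post_result)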
In some optional implementations of this embodiment, the target number may be determined according to the number of output channels of the second output result. As an example, suppose the number of output channels is 8 and the number of input channels is 4. The target number may then be 8 / 4 = 2, so the executing body may adopt two 4 × 4 square matrices instead of one 4 × 8 channel-number transformation matrix, as sketched below.
Based on this optional implementation, determining the target number allows the direct transformation of the channel count to be replaced by a plurality of square matrices of the preset dimension. By reducing the difference between the number of input channels and the number of output channels as far as possible, the weight matrix is kept close to full rank, information loss is reduced, and the feature extraction effect of the model is improved.
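A sketch of this channel-raising scheme under the 8/4 example above (the 3 × 3 kernels, stride-2 downsampling and the parallel-branch arrangement are assumptions of the example):

import torch
import torch.nn as nn

C_in, C_out = 4, 8
target_number = C_out // C_in  # 8 / 4 = 2 square matrices

# Two C_in -> C_in (square) convolutions replace one C_in -> C_out rectangular
# channel transformation; each square weight block can individually be full rank.
branches = nn.ModuleList(
    nn.Conv2d(C_in, C_in, kernel_size=3, stride=2, padding=1, bias=False)
    for _ in range(target_number)
)

x = torch.randn(1, C_in, 16, 16)
second_output = torch.cat([branch(x) for branch in branches], dim=1)
assert second_output.shape[1] == C_out  # channel dimension raised from 4 to 8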
With continued reference to fig. 2, fig. 2 is a schematic diagram 200 according to a second embodiment of the present disclosure. The method for processing an image includes the following steps:
S201, acquiring an image to be processed.
S202, inputting the image to be processed into a preset non-downsampling layer to generate a first output result.
S203, processing the first output result through a preset number of preset non-downsampling layers to generate a processing result.
In this embodiment, the executing body of the method for processing an image may pass the first output result generated in step S202 through a preset number of preset non-downsampling layers to generate a processing result.
In this embodiment, the executing body may process the first output result through a preset number of serially connected preset non-downsampling layers (i.e., the output of each non-downsampling layer serves as the input of the next), thereby generating the processing result of the preset number of preset non-downsampling layers.
S204, inputting the processing result into the downsampling layer corresponding to the preset non-downsampling layer to generate a second output result corresponding to the first output result.
In this embodiment, the executing body may input the processing result generated in step S203 into the downsampling layer corresponding to the preset non-downsampling layer and generate the second output result corresponding to the first output result.
S205, generating an image processing result of the image to be processed based on the second output result.
Steps S201, S202 and S205 may be respectively consistent with steps S101, S102 and S104 in the foregoing embodiment and their optional implementations; the above descriptions of steps S101, S102 and S104 also apply to steps S201, S202 and S205 and are not repeated here.
As can be seen from fig. 2, the flow 200 of the method for processing an image in this embodiment highlights the step of processing the first output result through a plurality of preset non-downsampling layers (equivalent to connecting a plurality of non-downsampling layers in series) to generate a processing result, and inputting that processing result into the corresponding downsampling layer to obtain the second output result corresponding to the first output result (see the sketch after this paragraph). The scheme described in this embodiment therefore provides an image processing method suitable for a series connection of multiple non-downsampling layers, thereby improving the image processing effect.
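A sketch of this second embodiment (the preset number, channel counts and stride-2 downsampling layer are assumptions of the example):

import torch
import torch.nn as nn

C = 4
preset_number = 3  # hypothetical number of serially connected non-downsampling layers

non_downsampling_layers = [
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)
    for _ in range(preset_number)
]
downsampling = nn.Conv2d(C, 2 * C, kernel_size=3, stride=2, padding=1, bias=False)

x = torch.randn(1, C, 32, 32)          # image to be processed (S201)
result = x
for layer in non_downsampling_layers:  # S202-S203: each output feeds the next layer
    result = layer(result)
second_output = downsampling(result)   # S204: corresponding downsampling layer
assert second_output.shape == (1, 2 * C, 16, 16)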
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of a method for processing an image according to an embodiment of the present disclosure. In the application scenario of fig. 3, an electronic device (e.g., a smart terminal) may first acquire an n-channel image to be processed (shown as 301). The electronic device may then input the image to be processed 301 into a preset non-downsampling layer 302 to generate an n-channel first output result (shown as 303). Based on the first output result 303 and a downsampling layer corresponding to the preset non-downsampling layer 302 (shown as 304), the electronic device may generate a second output result corresponding to the first output result 303 (shown as 305). Based on the second output result, the electronic device may generate an image processing result of the image to be processed 301 (e.g., the classification result "car").
At present, one variety of prior art usually adopts a spindle-shaped or dumbbell-shaped module structure to change the number of channels of the feature map. However, because such a model contains a large number of spindle-shaped or dumbbell-shaped module structures, the difference between the numbers of input and output channels degrades the expression capability of the features, which is unfavorable for the model's feature extraction. In the method provided by this embodiment of the disclosure, the number of input channels and the number of output channels of the features are kept consistent in the preset non-downsampling layer, while dimension raising or dimension reduction in the channel dimension is realized through a plurality of square matrices of a preset dimension in the corresponding downsampling layer, so that the weight matrix can be kept at full rank as far as possible and the feature extraction effect of the model is improved by reducing information loss.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing an image, which corresponds to the method embodiment shown in fig. 1 or fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 4, the apparatus 400 for processing an image provided by the present embodiment includes an acquisition unit 401, a first generation unit 402, a second generation unit 403 and a third generation unit 404. The acquisition unit 401 is configured to acquire an image to be processed; the first generation unit 402 is configured to input the image to be processed into a preset non-downsampling layer and generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels; the second generation unit 403 is configured to generate a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer includes a target number of square matrices of a preset dimension and the square matrices are used for indicating a pre-trained convolution kernel; the third generation unit 404 is configured to generate an image processing result of the image to be processed based on the second output result.
In the present embodiment, in the apparatus 400 for processing an image: for the specific processing of the acquisition unit 401, the first generation unit 402, the second generation unit 403 and the third generation unit 404 and the technical effects thereof, reference may be made to the descriptions of steps S101, S102, S103 and S104 in the embodiment corresponding to fig. 1, respectively, which are not repeated here.
In some optional implementations of the present embodiment, the first generation unit 402 may include: a first generation module (not shown in the figure) configured to process the image to be processed based on a pre-trained depthwise separable convolution module and generate a convolution result; and a second generation module (not shown in the figure) configured to process the convolution result with a preset first convolution kernel and generate a first output result of the same size as the convolution result.
In some optional implementations of this embodiment, the first generation module may be further configured to: convolve the image to be processed with a preset second convolution kernel to generate an initial convolution result whose size is consistent with that of the image to be processed; post-process the initial convolution result to generate a post-processing result, wherein the post-processing includes at least one of: batch normalization and activation function processing; and input the post-processing result into the pre-trained depthwise separable convolution module to generate the convolution result.
In some optional implementations of this embodiment, the second generation unit 403 may be further configured to: process the first output result through a preset number of preset non-downsampling layers to generate a processing result; and input the processing result into the downsampling layer corresponding to the preset non-downsampling layer to obtain a second output result corresponding to the first output result.
In some optional implementations of this embodiment, the target number may be determined according to the number of output channels of the second output result.
In the apparatus provided by the above embodiment of the present disclosure, the first generation unit 402 keeps the number of input channels of the features consistent with the number of output channels in the preset non-downsampling layer, and the second generation unit 403 realizes dimension raising or dimension reduction in the channel dimension through a plurality of square matrices of a preset dimension in the corresponding downsampling layer, so that the weight matrix can be kept at full rank as far as possible, thereby reducing information loss and improving the feature extraction effect of the model.
In the technical solution of the present disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502 and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller or microcontroller. The computing unit 501 performs the respective methods and processes described above, such as the method for processing an image. For example, in some embodiments, the method for processing an image may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method for processing an image described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method for processing an image by any other suitable means (e.g., by means of firmware).
Optionally, the roadside device may include a communication component and the like in addition to the electronic device, and the electronic device may be integrated with the communication component or may be separately provided. The electronic device may acquire data, such as pictures and videos, from a sensing device (e.g., a roadside camera) for image video processing and data computation. Optionally, the electronic device itself may also have a sensing data acquisition function and a communication function, for example, an AI camera, and the electronic device may directly perform image video processing and data calculation based on the acquired sensing data.
Optionally, the cloud control platform performs processing at the cloud end, and the electronic device included in the cloud control platform may acquire data, such as pictures and videos, of the sensing device (such as a roadside camera), so as to perform image video processing and data calculation; the cloud control platform can also be called a vehicle-road cooperative management platform, an edge computing platform, a cloud computing platform, a central system, a cloud server and the like.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method for processing an image, comprising:
acquiring an image to be processed;
inputting the image to be processed to a preset non-downsampling layer to generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels;
generating a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices with a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel;
and generating an image processing result of the image to be processed based on the second output result.
2. The method of claim 1, wherein the inputting the image to be processed to a preset non-downsampling layer to obtain a first output result comprises:
processing the image to be processed based on a pre-trained depthwise separable convolution module to generate a convolution result;
and processing the convolution result with a preset first convolution kernel to generate a first output result with the size consistent with that of the convolution result.
3. The method of claim 2, wherein the processing the image to be processed based on the pre-trained depthwise separable convolution module to generate a convolution result comprises:
performing convolution on the image to be processed with a preset second convolution kernel to generate an initial convolution result with the size consistent with that of the image to be processed;
post-processing the initial convolution result to generate a post-processing result, wherein the post-processing comprises at least one of: batch normalization and activation function processing;
and inputting the post-processing result into the pre-trained depthwise separable convolution module to generate a convolution result.
4. The method of claim 1, wherein the generating a second output result corresponding to the first output result based on the first output result and the downsampling layer corresponding to the preset non-downsampling layer comprises:
processing the first output result by a preset number of preset non-downsampling layers to generate a processing result;
and inputting the processing result to the downsampling layer corresponding to the preset non-downsampling layer to obtain a second output result corresponding to the first output result.
5. The method according to one of claims 1 to 4, wherein the target number is determined according to the number of output channels of the second output result.
6. An apparatus for processing an image, comprising:
an acquisition unit configured to acquire an image to be processed;
a first generation unit configured to input the image to be processed into a preset non-downsampling layer and generate a first output result, wherein the number of output channels of the first output result is consistent with the number of input channels;
a second generation unit configured to generate a second output result corresponding to the first output result based on the first output result and a downsampling layer corresponding to the preset non-downsampling layer, wherein the downsampling layer comprises a target number of square matrices of a preset dimension, and the square matrices are used for indicating a pre-trained convolution kernel;
a third generating unit configured to generate an image processing result of the image to be processed based on the second output result.
7. The apparatus of claim 6, wherein the first generating unit comprises:
a first generation module configured to process the image to be processed based on a pre-trained depthwise separable convolution module to generate a convolution result;
and a second generation module configured to process the convolution result with a preset first convolution kernel to generate a first output result with the size consistent with that of the convolution result.
8. The apparatus of claim 7, wherein the first generation module is further configured to:
performing convolution on the image to be processed with a preset second convolution kernel to generate an initial convolution result with the size consistent with that of the image to be processed;
post-processing the initial convolution result to generate a post-processing result, wherein the post-processing comprises at least one of: batch normalization and activation function processing;
and inputting the post-processing result into the pre-trained depthwise separable convolution module to generate a convolution result.
9. The apparatus of claim 6, wherein the second generating unit is further configured to:
processing the first output result by a preset number of preset non-downsampling layers to generate a processing result;
and inputting the processing result to the downsampling layer corresponding to the preset non-downsampling layer to obtain a second output result corresponding to the first output result.
10. The apparatus according to one of claims 6-9, wherein the target number is determined according to the number of output channels of the second output result.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
14. A roadside apparatus comprising the electronic apparatus of claim 11.
15. A cloud control platform comprising the electronic device of claim 11.
CN202110670945.4A 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform Pending CN113239898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110670945.4A CN113239898A (en) 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110670945.4A CN113239898A (en) 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform

Publications (1)

Publication Number Publication Date
CN113239898A 2021-08-10

Family

ID=77140154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110670945.4A Pending CN113239898A (en) 2021-06-17 2021-06-17 Method for processing image, road side equipment and cloud control platform

Country Status (1)

Country Link
CN (1) CN113239898A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109996023A (en) * 2017-12-29 2019-07-09 华为技术有限公司 Image processing method and device
US20190220653A1 (en) * 2018-01-12 2019-07-18 Qualcomm Incorporated Compact models for object recognition
CN110059710A (en) * 2018-01-18 2019-07-26 Aptiv技术有限公司 Device and method for carrying out image classification using convolutional neural networks
US20190266387A1 (en) * 2017-01-03 2019-08-29 Boe Technology Group Co., Ltd. Method, device, and computer readable storage medium for detecting feature points in an image
CN111008924A (en) * 2019-12-02 2020-04-14 西安交通大学深圳研究院 Image processing method and device, electronic equipment and storage medium
CN111898733A (en) * 2020-07-02 2020-11-06 西安交通大学 Deep separable convolutional neural network accelerator architecture
WO2020252740A1 (en) * 2019-06-20 2020-12-24 深圳市汇顶科技股份有限公司 Convolutional neural network, face anti-spoofing method, processor chip, and electronic device
AU2020104006A4 (en) * 2020-12-10 2021-02-18 Naval Aviation University Radar target recognition method based on feature pyramid lightweight convolutional neural network
CN112488060A (en) * 2020-12-18 2021-03-12 北京百度网讯科技有限公司 Object detection method, device, apparatus, medium, and program product

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190266387A1 (en) * 2017-01-03 2019-08-29 Boe Technology Group Co., Ltd. Method, device, and computer readable storage medium for detecting feature points in an image
CN109996023A (en) * 2017-12-29 2019-07-09 华为技术有限公司 Image processing method and device
US20190220653A1 (en) * 2018-01-12 2019-07-18 Qualcomm Incorporated Compact models for object recognition
CN110059710A (en) * 2018-01-18 2019-07-26 Aptiv技术有限公司 Device and method for carrying out image classification using convolutional neural networks
WO2020252740A1 (en) * 2019-06-20 2020-12-24 深圳市汇顶科技股份有限公司 Convolutional neural network, face anti-spoofing method, processor chip, and electronic device
CN111008924A (en) * 2019-12-02 2020-04-14 西安交通大学深圳研究院 Image processing method and device, electronic equipment and storage medium
CN111898733A (en) * 2020-07-02 2020-11-06 西安交通大学 Deep separable convolutional neural network accelerator architecture
AU2020104006A4 (en) * 2020-12-10 2021-02-18 Naval Aviation University Radar target recognition method based on feature pyramid lightweight convolutional neural network
CN112488060A (en) * 2020-12-18 2021-03-12 北京百度网讯科技有限公司 Object detection method, device, apparatus, medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DANIEL HAASE et al.: "Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets", arXiv:2003.13549v3 [cs.CV], pages 1-10
张雨丰; 郑忠龙; 刘华文; 向道红; 何小卫; 李知菲; 何依然; KHODJA ABD ERRAOUF: "基于特征图切分的轻量级卷积神经网络" [Lightweight convolutional neural network based on feature map partitioning], Pattern Recognition and Artificial Intelligence (模式识别与人工智能), no. 03, pages 47-56

Similar Documents

Publication Publication Date Title
CN113379627A (en) Training method of image enhancement model and method for enhancing image
CN113963110B (en) Texture map generation method and device, electronic equipment and storage medium
CN113808044B (en) Encryption mask determining method, device, equipment and storage medium
CN113592932A (en) Training method and device for deep completion network, electronic equipment and storage medium
CN114693934A (en) Training method of semantic segmentation model, video semantic segmentation method and device
CN114494814A (en) Attention-based model training method and device and electronic equipment
CN115690443A (en) Feature extraction model training method, image classification method and related device
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113920313A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114092708A (en) Characteristic image processing method and device and storage medium
CN113361536A (en) Image semantic segmentation model training method, image semantic segmentation method and related device
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN112785501A (en) Method, device and equipment for processing character image and storage medium
CN115496916B (en) Training method of image recognition model, image recognition method and related device
CN113642654B (en) Image feature fusion method and device, electronic equipment and storage medium
CN114494782B (en) Image processing method, model training method, related device and electronic equipment
CN114943995A (en) Training method of face recognition model, face recognition method and device
CN112784967B (en) Information processing method and device and electronic equipment
CN113239898A (en) Method for processing image, road side equipment and cloud control platform
CN114723796A (en) Three-dimensional point cloud generation method and device and electronic equipment
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN114358198A (en) Instance segmentation method and device and electronic equipment
CN113361621A (en) Method and apparatus for training a model
CN114282664A (en) Self-feedback model training method and device, road side equipment and cloud control platform
CN114359905B (en) Text recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210810